From derek at crojecta.com  Mon Mar 16 13:22:49 2009
From: derek at crojecta.com (Derek Lewis)
Date: Mon Mar 16 13:22:52 2009
Subject: [uf-new] First draft of projecta proposal
In-Reply-To: <ecb1e8c60903130337m271ffffcy7f3203a37e099f9c@mail.gmail.com>
References: <ecb1e8c60903130337m271ffffcy7f3203a37e099f9c@mail.gmail.com>
Message-ID: <ecb1e8c60903161422r5b6013f7g736795762fc11543@mail.gmail.com>

We need feedback on the project format.

http://microformats.org/wiki/project

All examples, formats and brainstorming can be found here:

http://microformats.org/wiki/project-examples
http://microformats.org/wiki/project-info-formats
http://microformats.org/wiki/project-brainstorming

Please provide as much feedback and criticism as possible. We'd like to
get this moved into an official draft within the next month.
Please post all comments to the list so that all may participate in
finalizing the proposal.

-- Derek
From scott at makedatamakesense.com  Tue Mar 17 06:19:45 2009
From: scott at makedatamakesense.com (Scott Reynen)
Date: Tue Mar 17 06:19:53 2009
Subject: [uf-new] First draft of projecta proposal
In-Reply-To: <ecb1e8c60903161422r5b6013f7g736795762fc11543@mail.gmail.com>
References: <ecb1e8c60903130337m271ffffcy7f3203a37e099f9c@mail.gmail.com>
	<ecb1e8c60903161422r5b6013f7g736795762fc11543@mail.gmail.com>
Message-ID: <367537FF-80A0-4E35-BC28-C9107E5997CD@makedatamakesense.com>

On Mar 16, 2009, at 3:22, Derek Lewis wrote:

> We need feedback on the project format.
>
> http://microformats.org/wiki/project

I would like to see a response to Toby's feedback here:

http://microformats.org/wiki/project-examples#Project_Examples

Also, why so few (6) examples?

--
Scott Reynen
MakeDataMakeSense.com


From inyuki at gmail.com  Sat Mar 21 02:31:40 2009
From: inyuki at gmail.com (Mindaugas Indriunas)
Date: Sat Mar 21 02:31:54 2009
Subject: [uf-new] A microformat for Machine translation software readable
	words
Message-ID: <e7160b500903210331y4f65ed7ax111771dac23c548@mail.gmail.com>

While humans can recognize the meanings of words from context, machine
translation software generally cannot distinguish between homonyms.
Statistical methods are used, but they are imperfect. If there were a
microformat to mark up the meanings of words, translation could
become much easier and more accurate.

For example:

<span class="concept" dictionary="http://www.merriam-webster.com/"
meaning="3a">idea</span> = "an image recalled by memory"

<span class="concept" dictionary="http://www.merriam-webster.com"
meaning="1c">idea</span> = "a plan for action"

Instead of specifying the dictionary on every word, the following
pattern, which declares it once on an enclosing element and marks only
the meaning on each word, might also be useful:

<p dict="http://www.merriam-webster.com"><span mean="1c">Ideas</span>
<span mean="8a">for</span> <span mean="10">free</span>. </p>

(In this example, the actual sense numbers from the dictionary are used.)
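To make the two markup patterns above more concrete, here is a minimal Python sketch (not part of the original proposal) of how a consumer might extract the annotations using only the standard library's html.parser. It assumes the attribute names exactly as written in the examples: "dictionary"/"meaning" on a word, or "dict" on an enclosing element with "mean" on each word. These are not standard HTML attributes, and the names ConceptExtractor and extract_concepts are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag in HTML; skipping them keeps the
# element stack balanced.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ConceptExtractor(HTMLParser):
    """Hypothetical extractor for the attribute names proposed in this post.

    Collects (word, dictionary, sense) triples. The dictionary comes from a
    "dictionary" or "dict" attribute on the element itself or is inherited
    from the nearest enclosing element; the sense comes from "meaning" or
    "mean".
    """

    def __init__(self):
        super().__init__()
        self.stack = []      # one (dictionary, sense, text_buffer) per open element
        self.concepts = []   # extracted (word, dictionary, sense) triples

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        a = dict(attrs)
        inherited = self.stack[-1][0] if self.stack else None
        dictionary = a.get("dictionary") or a.get("dict") or inherited
        sense = a.get("meaning") or a.get("mean")
        # Only elements that carry a sense collect their text content.
        buffer = [] if sense is not None else None
        self.stack.append((dictionary, sense, buffer))

    def handle_data(self, data):
        # Text belongs to every open annotated element (handles nesting).
        for _, _, buffer in self.stack:
            if buffer is not None:
                buffer.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        dictionary, sense, buffer = self.stack.pop()
        if buffer is not None:
            self.concepts.append(("".join(buffer).strip(), dictionary, sense))


def extract_concepts(markup):
    """Return the (word, dictionary, sense) triples found in markup."""
    parser = ConceptExtractor()
    parser.feed(markup)
    parser.close()
    return parser.concepts


if __name__ == "__main__":
    sample = ('<p dict="http://www.merriam-webster.com">'
              '<span mean="1c">Ideas</span> <span mean="8a">for</span> '
              '<span mean="10">free</span>. </p>')
    for word, dictionary, sense in extract_concepts(sample):
        print(word, dictionary, sense)
```

Fed the second example, this yields one triple per annotated span, with the dictionary inherited from the enclosing <p>; text outside annotated spans (the spaces and the final period) is ignored. A translation tool could then look up each (dictionary, sense) pair directly instead of guessing among homonyms.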

The implications of this kind of microformat could be far-reaching. It
could result in better machine translation, and possibly in something
like a Wikipedia written in one language (that is, in concepts defined
through a multitude of existing human languages and dictionaries), yet
automatically displayed in the reader's preferred language...

-- 
Mindaugas Indriūnas (Inyuki)
http://wikipedia.org/wiki/User:Inyuki

From tom at tommorris.org  Sat Mar 21 05:31:05 2009
From: tom at tommorris.org (Tom Morris)
Date: Sat Mar 21 05:31:10 2009
Subject: [uf-new] A microformat for Machine translation software readable 
	words
In-Reply-To: <e7160b500903210331y4f65ed7ax111771dac23c548@mail.gmail.com>
References: <e7160b500903210331y4f65ed7ax111771dac23c548@mail.gmail.com>
Message-ID: <d375f00f0903210631hefb7413t431dd0aa8e6e9785@mail.gmail.com>

On Sat, Mar 21, 2009 at 10:31, Mindaugas Indriunas <inyuki@gmail.com> wrote:
> The implications of this kind of microformat could be far-reaching. It
> could result in better machine translation, and possibly in something
> like a Wikipedia written in one language (that is, in concepts defined
> through a multitude of existing human languages and dictionaries), yet
> automatically displayed in the reader's preferred language...
>

The W3C have an Incubator Group in place to try and push 'CWL', the
Common Web Language. The idea is that instead of writing Web documents
in existing natural languages, one writes them in this semantically
rich markup language, which is then machine-translated.

Here is their charter:
http://www.w3.org/2005/Incubator/cwl-ei/charter

I put up a blog post about it a while back, where I snarkily called it
Esperanto-over-HTTP:
http://tommorris.org/blog/2008/07/01#When:22:29:49

A microformat that sits atop an existing machine language (X/HTML) and
existing natural languages is a lot less impractical than something
like CWL. That said, the idea that general web documents will end up
filled with semantically unambiguous identifiers instead of words is
ambitious, to say the least.

Both this proposal and the CWL proposal suffer from the problem that
they'll turn the richness of human languages into machine slop. Human
languages have given us Plato, Dante, the Song of Solomon, Eliot and
Shakespeare. A highly efficient method of turning that into something
like a Java stack trace is perhaps less than ideal. Maybe in a hundred
years' time we might get some kind of XML Esperanto thing going on,
but right now we just need to solve the big problems: the common blobs
of data, and the common relationships between the things those blobs
of data represent.

This is how it is in the real world. There's a reason why the signs at
hospitals, train stations, airports and on trams are made
internationally readable with a greater degree of urgency than, say,
television shows. If you turn up at a hospital and don't speak much of
the native language, you risk death. If you can't watch Lost, big deal.

If you think that this approach has a shot, I think the best way
forward is to produce a demo: write an example in X/HTML and show how
linguistic disambiguation could make for better machine translation.
You need to get the guts working first; then, if it's necessary, a
microformat can come later.

-- 
Tom Morris
http://tommorris.org/