[uf-new] A microformat for Machine translation software readable words

Mindaugas Indriunas inyuki at gmail.com
Sat Mar 21 02:31:40 PST 2009


While humans can recognize the meanings of words from context, machine
translation software generally cannot distinguish between homonyms.
Statistical methods are used, but they are imperfect. If there were a
microformat to mark up the meanings of words, translation could
become much easier and more accurate.

For example:

<span class="concept" dictionary="http://www.merriam-webster.com/"
meaning="3a">idea</span> = "an image recalled by memory"

<span class="concept" dictionary="http://www.merriam-webster.com"
meaning="1c">idea</span> = "a plan for action"

Instead of naming the dictionary on every word, it could be declared
once on an enclosing element, with each word carrying only its sense
index:

<p dict="http://www.merriam-webster.com"><span mean="1c">Ideas</span>
<span mean="8a">for</span> <span mean="10">free</span>. </p>

(In this example, the real sense indices from the dictionary are used.)
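
As a rough illustration of how software might consume the second
pattern, here is a minimal Python sketch that reads the markup and
returns (word, dictionary, sense) triples. The "dict" and "mean"
attributes come from the example above; the names SenseExtractor and
extract_senses are hypothetical and only for illustration, and the
rule that a word inherits the dictionary of its nearest enclosing
element is my assumption about how the pattern would be interpreted.

```python
from html.parser import HTMLParser


class SenseExtractor(HTMLParser):
    """Collect (word, dictionary, sense) triples from the proposed markup.

    Hypothetical helper: the inheritance rule (nearest enclosing element
    with a dict attribute wins) is an assumption, not part of any spec.
    """

    def __init__(self):
        super().__init__()
        self.stack = []    # one [tag, dict_url, sense, text_parts] per open element
        self.triples = []  # collected (word, dictionary, sense) results

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        self.stack.append([tag, a.get("dict"), a.get("mean"), []])

    def handle_data(self, data):
        # Text belongs to the innermost open element that carries a sense.
        for entry in reversed(self.stack):
            if entry[2] is not None:
                entry[3].append(data)
                break

    def handle_endtag(self, tag):
        if not any(e[0] == tag for e in self.stack):
            return  # stray closing tag, ignore it
        # Pop back to the matching element, discarding unclosed children.
        while self.stack:
            entry = self.stack.pop()
            if entry[0] == tag:
                break
        _, own_dict, sense, parts = entry
        if sense is None:
            return
        # Prefer a dictionary on the element itself, then on its ancestors.
        candidates = [own_dict] + [e[1] for e in reversed(self.stack)]
        dictionary = next((d for d in candidates if d), None)
        self.triples.append(("".join(parts).strip(), dictionary, sense))


def extract_senses(markup):
    """Return a list of (word, dictionary, sense) triples found in markup."""
    parser = SenseExtractor()
    parser.feed(markup)
    parser.close()
    return parser.triples


if __name__ == "__main__":
    sample = (
        '<p dict="http://www.merriam-webster.com"><span mean="1c">Ideas</span>\n'
        '<span mean="8a">for</span> <span mean="10">free</span>. </p>'
    )
    for triple in extract_senses(sample):
        print(triple)
```

A translation tool could then look up each sense index in the named
dictionary before choosing a target-language word. Text that carries
no sense index, such as the trailing period, is simply skipped here.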

The implications of this kind of microformat could be far-reaching. It
could lead to better machine translation, and possibly to something
like a Wikipedia written in a single language of concepts (that is,
concepts defined through the multitude of existing human languages and
dictionaries), yet displayed automatically in each reader's preferred
human language.

-- 
Mindaugas Indriūnas (Inyuki/みんでぃ/明迪)
http://wikipedia.org/wiki/User:Inyuki


