From derek at crojecta.com Mon Mar 16 13:22:49 2009
From: derek at crojecta.com (Derek Lewis)
Date: Mon Mar 16 13:22:52 2009
Subject: [uf-new] First draft of projecta proposal
In-Reply-To:
References:
Message-ID:

We need feedback on the project format.

http://microformats.org/wiki/project

All examples, formats and brainstorming can be found here:

http://microformats.org/wiki/project-examples
http://microformats.org/wiki/project-info-formats
http://microformats.org/wiki/project-brainstorming

Please provide as much feedback and criticism as possible. We'd like to get this moved into an official draft within the next month. Please post all comments to the list so that all may participate in finalizing the proposal.

--
Derek

From scott at makedatamakesense.com Tue Mar 17 06:19:45 2009
From: scott at makedatamakesense.com (Scott Reynen)
Date: Tue Mar 17 06:19:53 2009
Subject: [uf-new] First draft of projecta proposal
In-Reply-To:
References:
Message-ID: <367537FF-80A0-4E35-BC28-C9107E5997CD@makedatamakesense.com>

On Mar 16, at 3:22, Derek Lewis wrote:

> We need feedback on the project format.
>
> http://microformats.org/wiki/project

I would like to see a response to Toby's feedback here:

http://microformats.org/wiki/project-examples#Project_Examples

Also, why so few (6) examples?

--
Scott Reynen
MakeDataMakeSense.com

From inyuki at gmail.com Sat Mar 21 02:31:40 2009
From: inyuki at gmail.com (Mindaugas Indriunas)
Date: Sat Mar 21 02:31:54 2009
Subject: [uf-new] A microformat for Machine translation software readable words
Message-ID:

While humans can recognize the meanings of words from context, machine translation software generally cannot distinguish between homonyms. Statistical methods are used, but they are imperfect. If there were a microformat to mark up the meanings of words, translation could become much easier and better.

For example:

idea = "an image recalled by memory"
idea = "a plan for action"

Instead of marking the meaning for each word, the following pattern might also be useful:

Ideas for free.

(In this example, the real indices of the dictionary are used.)

The implications of this kind of microformat could be far reaching. It could result in better machine translation, and possibly something like Wikipedia written in one language (that is, in concepts defined through use of multitude of all existing human languages and dictionaries), yet displayed in a preferred human language automatically...

--
Mindaugas Indriūnas (Inyuki/????/??)
http://wikipedia.org/wiki/User:Inyuki

From tom at tommorris.org Sat Mar 21 05:31:05 2009
From: tom at tommorris.org (Tom Morris)
Date: Sat Mar 21 05:31:10 2009
Subject: [uf-new] A microformat for Machine translation software readable words
In-Reply-To:
References:
Message-ID:

On Sat, Mar 21, 2009 at 10:31, Mindaugas Indriunas wrote:
> The implications of this kind of microformat could be far reaching. It
> could result in better machine translation, and possibly something
> like Wikipedia written in one language (that is, in concepts defined
> through use of multitude of all existing human languages and
> dictionaries), yet displayed in a preferred human language
> automatically...
>

The W3C have an Incubator Group in place to try and push 'CWL', the Common Web Language. The idea of it is that instead of writing Web documents in existing natural languages, one writes them in this semantically-rich markup language, which then gets machine translated.

Here is their charter:
http://www.w3.org/2005/Incubator/cwl-ei/charter

I put up a blog post about it a while back, where I snarkily called it Esperanto-over-HTTP:
http://tommorris.org/blog/2008/07/01#When:22:29:49

A microformat that sits atop an existing machine language (X/HTML) and existing natural languages is a lot less impractical than something like CWL. That said, the idea that general web documents will end up filled with semantically unambiguous identifiers instead of words is ambitious to say the least.

Both this proposal and the CWL proposal suffer from the problem that they'll turn the richness of human languages into machine slop. Human languages have given us Plato, Dante, the Song of Solomon, Eliot and Shakespeare. A highly efficient method to turn that into something like a Java stack trace is perhaps less than ideal.

Maybe, in a hundred years' time, we might get some kind of XML Esperanto thing going on, but we need to just solve the big problems - the common blobs of data, the common relationships between the things those blobs of data represent. This is how it is in the real world - there's a reason why things like the signs at hospitals, train stations, airports and trams are made internationally readable with a greater degree of urgency than, say, television shows. If you turn up at a hospital and don't speak much of the native language, you risk death. If you can't watch Lost, big deal.

If you think that this approach has a shot, I think the best way is to produce a demo - write an example in X/HTML and show how linguistic disambiguation could make for better machine translation. You need to get the guts working first, then if it's necessary, a microformat can come later.

--
Tom Morris
http://tommorris.org/