From philipj at opera.com Mon Feb 1 01:51:24 2010
From: philipj at opera.com (Philip Jägenstedt)
Date: Mon Feb 1 01:51:44 2010
Subject: [uf-discuss] Fwd: Removing the FN magic in the vCard microdata vocabulary
In-Reply-To: <101535998-1264784845-cardhu_decombobulator_blackberry.rim.net-1587876118-@bda088.bisx.prod.on.blackberry>
References: <101535998-1264784845-cardhu_decombobulator_blackberry.rim.net-1587876118-@bda088.bisx.prod.on.blackberry>
Message-ID:

Hi microformateers,

Please see the below forwarded question about removing the guessing of
names when exporting vCard. Since Hixie wants the microdata vCard
vocab/extraction to be compatible with microformats, I'm taking it to
the source...

In short, I think that guessing the names will create problems for
Vietnamese names (family-name given-name given-name), Chinese names
(written without a space), transcribed Chinese names (family-name
given-name) and probably Japanese and Korean names, for the same
reasons.

Are there compatibility issues with not outputting an N line at all?
If there are, would there be any issues with simply outputting N:;;;;?

The current algorithm is used on http://foolip.org/microdatajs/live/
for reference.

--
Philip Jägenstedt
Core Developer
Opera Software

------- Forwarded message -------
From: "Tantek Celik"
To: "Ian Hickson", "Philip Jägenstedt", "Tantek Çelik", "Jeremy Keith"
Cc: "whatwg@whatwg.org List"
Subject: Re: Removing the FN magic in the vCard microdata vocabulary (Was: [whatwg] Microdata feedback)
Date: Fri, 29 Jan 2010 18:08:33 +0100

Several issues have been filed specifically regarding the 'n' and 'fn'
optimizations in hCard, in particular the i18n problem that is
mentioned in this thread, and they were resolved with errata updates
to these algorithms. This particular issue is documented on the
hcard-issues-resolved page of the microformats wiki.

If there are further problems regarding these property optimizations,
I'm certainly open to seeing (and would like to see) them raised and
documented so that we can fix them in hCard. (There shouldn't be any
divergence, and frankly I'd prefer that vCard microdata simply
reference hCard, but I realize that is waiting on hCard 1.0.1.)

As I'm editing hCard 1.0.1 now and making changes to address issues
just like this, now is a very good time to give this feedback.

Please either send them to microformats-discuss@microformats.org or
feel free to add them directly to the hCard issues wiki page
(preferable):

http://microformats.org/wiki/hcard-issues

And we can follow up there.

Thanks,

Tantek

------Original Message------
From: Ian Hickson
To: Philip Jägenstedt
To: Tantek Çelik
To: Jeremy Keith
Cc: whatwg@whatwg.org List
Subject: Removing the FN magic in the vCard microdata vocabulary (Was: [whatwg] Microdata feedback)
Sent: Jan 29, 2010 01:04

On Thu, 21 Jan 2010, Philip Jägenstedt wrote:
> On Mon, 18 Jan 2010 16:24:46 +0100, Jeremy Keith wrote:
> > Hixie wrote:
> > > > Finally on vCard, the final part of the extraction algorithm goes
> > > > to great trouble to guess what is the family name and what is the
> > > > given name. This guess will be broken for transliterated east
> > > > Asian names (CJKV that I know of, maybe others too). Just saying.
> > > > Also, why is it important to explicitly add N:;;;; for
> > > > organizations?
> > >
> > > This is intended to be compatible with Microformats vCard, which has
> > > these weird rules. If you think we should remove them, please at
> > > least first speak to Tantek and see what he thinks.
> >
> > The fn optimisation pattern isn't intended to catch 100% of cases,
> > just the situation "Firstname Lastname" or "Firstname Middlename
> > Lastname". So if you just use fn (formatted name) and don't use n
> > (name), the name will be extracted/guessed using the optimisation
> > pattern.
> >
> > In cases where the pattern doesn't work (e.g. "Anne van Kesteren", or
> > east Asian names) you can still explicitly specify the family name and
> > given name, over-riding the fn optimisation pattern. If you do this,
> > you need to explicitly state this is the name (n) as well as the
> > formatted name (fn).
>
> This is going to break badly whenever a template uses vCard microdata
> and its author either doesn't know the family name and given name
> (because the data was never collected) or doesn't even consider that the
> vcard conversion does this funny guesswork. If a social network site or
> similar does this, then Anne van Kesteren and Zhang Min (fictional name)
> will have their names messed up with no way of fixing it. At least I
> haven't seen a site which asks users to both fill in their full name and
> each component, which is what you need to get this right.
>
> > Similarly, for organisations, you don't have to explicitly set n
> > (name) if you apply both fn (formatted name) and org (organisation
> > name) to a string. This time, the optimisation pattern assumes that
> > the fn is the name of the organisation.
> >
> > Technically, the n property is *always* required but if you use either
> > of those two optimisation patterns, the n is inferred from fn.
>
> If this is just a technical problem with some software requiring N to be
> present, would it be OK to just output an empty N like for
> organizations?

That's a good question...

As I mentioned above, the rule is here to be compatible with
Microformats. I'd be happy to remove it, but I'd like confirmation
from the Microformats community that it's ok for us to diverge in this
way from their vocabulary, and to find out if they have any experience
regarding how much of a problem generating a blank N in the output
when it's missing would be.

Tantek, Jeremy, any opinions?
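
As a rough illustration of the guessing discussed in this thread, the
Python sketch below splits a two-word fn into a given name and a family
name and otherwise falls back to an empty N. It is a simplification
written for this example, not the extraction algorithm from the
microdata draft or the hCard spec; the helper name `guess_n` and its
`org` parameter are hypothetical.

```python
EMPTY_N = ";;;;"  # vCard N with all five components left blank


def guess_n(fn, org=None):
    """Guess a vCard N value (family;given;additional;prefix;suffix) from fn.

    Hypothetical helper for illustration only; the rules are a simplified
    version of the "fn" guessing discussed in the thread.
    """
    # Organisation case: fn and org are the same string, so no personal name.
    if org is not None and fn == org:
        return EMPTY_N

    words = fn.split()
    # Only attempt a guess for exactly two whitespace-separated words.
    if len(words) != 2:
        return EMPTY_N

    first, second = words
    second_is_initial = len(second) == 1 or (
        len(second) == 2 and second.endswith(".")
    )
    if first.endswith(",") or second_is_initial:
        # "Keith, Jeremy" or "Keith J.": first word is the family name.
        family, given = first.rstrip(","), second
    else:
        # Default assumption: "Givenname Familyname".
        given, family = first, second
    return f"{family};{given};;;"


if __name__ == "__main__":
    for name in ["Jeremy Keith", "Zhang Min", "Anne van Kesteren"]:
        print(f"FN:{name}  ->  N:{guess_n(name)}")
```

Under these assumptions "Zhang Min" comes out as N:Min;Zhang;;; which
swaps the family and given names, while "Anne van Kesteren" has three
words and gets the empty N:;;;; instead of a guess.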
From tantek at cs.stanford.edu Mon Feb 8 13:31:09 2010
From: tantek at cs.stanford.edu (Tantek Çelik)
Date: Mon Feb 8 13:31:35 2010
Subject: [uf-discuss] geo shorthand in anchor
In-Reply-To: <1263564099.2546.24.camel@csarven-laptop>
References: <1261936306.2543.29.camel@csarven-laptop>
 <21e770780912271046i3bedc485m59aee2df1c469bc0@mail.gmail.com>
 <1262206249.4426.96.camel@csarven-laptop>
 <21e770780912301354h2b1c638fj7019592fc17f6fb@mail.gmail.com>
 <1262210946.9728.14.camel@csarven-laptop>
 <21e770780912301430p71b37da9i93ccd7acd658d9c1@mail.gmail.com>
 <1262259010.4580.28.camel@csarven-laptop>
 <60cb038a0912310710o3287b374h6e48226af499d3b4@mail.gmail.com>
 <1263564099.2546.24.camel@csarven-laptop>
Message-ID: <60cb038a1002081331k5334d812oaf991672ce294c55@mail.gmail.com>

On Fri, Jan 15, 2010 at 6:01 AM, Sarven Capadisli wrote:
> I've noted my observations on your observations
> http://microformats.org/wiki/index.php?title=geo-brainstorming&diff=41657&oldid=41586

Thanks Sarven, you raised some good questions - I've followed up on
the wiki as well.

> I see two things there:
>
> 1. changing the problem i.e., intended visible readable text content

In general we should seek to make content more visible when possible.

> 2. "45.5140800" and "-73.6111000" as text values is no more human
> readable and listenable than as "45.5140800;-73.6111000" title value.

But that's not an exact comparison of the renderings: it leaves out
the key difference, the labels:

    lat:45.5140800; long:-73.6111000

which is then more readable/listenable/understandable than a pair of
semicolon-separated numbers.

It may not be perfect, but it is an improvement.

Thanks,

Tantek

From palmisano at fbk.eu Fri Feb 19 01:37:39 2010
From: palmisano at fbk.eu (Davide Palmisano)
Date: Fri Feb 19 01:49:51 2010
Subject: [uf-discuss] (no subject)
Message-ID:

Dear all,

we are proud to announce a new release of any23 -- Anything to Triples.

    http://developers.any23.org/

Any23 is a Java library that parses RDF from a variety of Web document
formats. The currently supported input formats are RDFa, RDF/XML,
Turtle, N3, N-Triples, and a number of Microformats.
Any23 is an Open Source project that originated from the code created
within the Sindice project and is now used both inside Sindice and in
related projects, e.g. Sig.Ma.

Any23 comes with a handy command-line tool for parsing RDF and
converting between formats.

We have also set up a demo service where you can try any23 online and
use a REST API to convert between different RDF formats, similar in
spirit to triplr.org:

    http://any23.org/

The major new features in this release are:

* Redesigned Java API
  - Input from string, stream, file, or URI
  - Allow choosing which extractors to use
  - Report origin of triples (document/extractor) to client processors
  - Various processors/serializers for extracted triples
* Added flexible command-line tool for easy testing
* Vastly improved website and documentation
* Media type and encoding detection via Apache Tika
* Switched RDF library from Jena to Sesame
* Added Maven build
* Better RDF extraction from Microformats
* Extractors come with example file to document typical in- and output
* Major refactoring
* Lots and lots of bugfixes

The following people have contributed to this release: Michele
Mostarda and Davide Palmisano (FBK, Trento, Italy, Web of Data Unit
(WED)); Richard Cyganiak and Jürgen Umbrich (DERI, NUI Galway,
Ireland); Michele Catasta (EPFL, Lausanne, Switzerland), Giovanni
Tummarello

All the best,
Davide Palmisano on behalf of the contributors

Davide Palmisano
Web of Data Research Unit
Technologist @ Fondazione Bruno Kessler
http://wed.fbk.eu/en/home
---
http://davidepalmisano.wordpress.com
http://twitter.com/dpalmisano
http://www.slideshare.net/dpalmisano

From tantek at cs.stanford.edu Fri Feb 19
18:01:28 2010
From: tantek at cs.stanford.edu (Tantek Çelik)
Date: Fri Feb 19 18:01:52 2010
Subject: [uf-discuss] [ANN] any23 v0.2 released
In-Reply-To:
References:
Message-ID: <60cb038a1002191801i268bf84ak9c90144cbfc585bf@mail.gmail.com>

On Fri, Feb 19, 2010 at 1:59 AM, Davide Palmisano wrote:
> Dear all,
>
> we are proud to announce a new release of any23 -- Anything to Triples.
>
>     http://developers.any23.org/

Davide, congratulations on your release!

> Any23 is a Java library that parses RDF from a variety of Web document
> formats. The currently supported input formats are RDFa, RDF/XML,
> Turtle, N3, N-Triples, and a number of Microformats.
> Any23 is an Open Source project originated from the code created
> within the Sindice project and now used both inside sindice and in
> related projects e.g. Sig.Ma
>
> Any23 comes with a handy command-line tool for parsing RDF and
> converting between formats.
>
> We have also set up a demo service where you can try any23 online and
> use a REST API to convert between different RDF formats, similar in
> spirit to triplr.org:
>
>     http://any23.org/
>
> The major new features in this release are:
>
> * Redesigned Java API
>   - Input from string, stream, file, or URI
>   - Allow choosing which extractors to use
>   - Report origin of triples (document/extractor) to client processors
>   - Various processors/serializers for extracted triples
> * Added flexible command-line tool for easy testing
> * Vastly improved website and documentation
> * Media type and encoding detection via Apache Tika
> * Switched RDF library from Jena to Sesame
> * Added Maven build
> * Better RDF extraction from Microformats

This is great to hear. Tom Morris has already kindly added any23 to
the parsers page:

http://microformats.org/wiki/parsers

Could you list the specific microformats that are parsed by any23?

And even better, feel free to add any23 to the *-implementations pages
of the microformats that it supports, e.g. if it supports hCard, add
it to:

http://microformats.org/wiki/hcard-implementations#Open_Source

> The following people have contributed to this release: Michele
> Mostarda and Davide Palmisano (FBK, Trento, Italy, Web of Data Unit
> (WED)); Richard Cyganiak and Jürgen Umbrich (DERI, NUI Galway,
> Ireland); Michele Catasta (EPFL, Lausanne, Switzerland), Giovanni
> Tummarello
>
> All the best,
> Davide Palmisano on behalf of the contributors

Thanks again for all your excellent work and for contributing to
bettering the interoperability of semantic data on the web.

Tantek

--
http://tantek.com/

From palmisano at fbk.eu Mon Feb 22 01:55:53 2010
From: palmisano at fbk.eu (Davide Palmisano)
Date: Mon Feb 22 02:01:07 2010
Subject: [uf-discuss] [ANN] any23 v0.2 released
In-Reply-To: <60cb038a1002191801i268bf84ak9c90144cbfc585bf@mail.gmail.com>
References: , <60cb038a1002191801i268bf84ak9c90144cbfc585bf@mail.gmail.com>
Message-ID:

________________________________________
From: microformats-discuss-bounces@microformats.org [microformats-discuss-bounces@microformats.org] On Behalf Of Tantek Çelik [tantek@cs.stanford.edu]
Sent: Saturday, February 20, 2010 3:01 AM
To: Microformats Discuss
Subject: Re: [uf-discuss] [ANN] any23 v0.2 released

On Fri, Feb 19, 2010 at 1:59 AM, Davide Palmisano wrote:
> Dear all,
>
> we are proud to announce a new release of any23 -- Anything to Triples.
>
>     http://developers.any23.org/

Davide, congratulations on your release!

Many thanks Tantek!

> Any23 is a Java library that parses RDF from a variety of Web document
> formats. The currently supported input formats are RDFa, RDF/XML,
> Turtle, N3, N-Triples, and a number of Microformats.
> Any23 is an Open Source project originated from the code created
> within the Sindice project and now used both inside sindice and in
> related projects e.g. Sig.Ma
>
> Any23 comes with a handy command-line tool for parsing RDF and
> converting between formats.
>
> We have also set up a demo service where you can try any23 online and
> use a REST API to convert between different RDF formats, similar in
> spirit to triplr.org:
>
>     http://any23.org/
>
> The major new features in this release are:
>
> * Redesigned Java API
>   - Input from string, stream, file, or URI
>   - Allow choosing which extractors to use
>   - Report origin of triples (document/extractor) to client processors
>   - Various processors/serializers for extracted triples
> * Added flexible command-line tool for easy testing
> * Vastly improved website and documentation
> * Media type and encoding detection via Apache Tika
> * Switched RDF library from Jena to Sesame
> * Added Maven build
> * Better RDF extraction from Microformats

This is great to hear. Tom Morris has already kindly added any23 to
the parsers page:

http://microformats.org/wiki/parsers

Wow this is great!

Could you list the specific microformats that are parsed by any23?

Of course: Adr, Geo, hCalendar, hCard, hListing, hResume, hReview,
License and XFN. They are also listed here:
http://developers.any23.org

And even better, feel free to add any23 to the *-implementations pages
of the microformats that it supports, e.g. if it supports hCard, add
it to:

http://microformats.org/wiki/hcard-implementations#Open_Source

Sure, will do!

> The following people have contributed to this release: Michele
> Mostarda and Davide Palmisano (FBK, Trento, Italy, Web of Data Unit
> (WED)); Richard Cyganiak and Jürgen Umbrich (DERI, NUI Galway,
> Ireland); Michele Catasta (EPFL, Lausanne, Switzerland), Giovanni
> Tummarello
>
> All the best,
> Davide Palmisano on behalf of the contributors

Thanks again for all your excellent work and for contributing to
bettering the interoperability of semantic data on the web.

Thanks to you for your quick feedback! We are glad to hear suggestions
and improvements from the community.

Tantek

--
http://tantek.com/
_______________________________________________
microformats-discuss mailing list
microformats-discuss@microformats.org
http://microformats.org/mailman/listinfo/microformats-discuss