From lists at ben-ward.co.uk Fri Apr 4 03:08:48 2008
From: lists at ben-ward.co.uk (Ben Ward)
Date: Fri Apr 4 03:08:52 2008
Subject: [uf-dev] Re: [uf-discuss] jCard draft
In-Reply-To: <47F5750D.8030708@onlinehome.de>
References: <47F5750D.8030708@onlinehome.de>
Message-ID:

On 4 Apr 2008, at 01:23, Gordon Oheim wrote:

> I have added a preliminary draft for a possible jCard specification
> to the wiki at http://microformats.org/wiki/jcard.
> The content is based on what I read from the discussion list so far.
> The intention was to have a reference for further discussion and for
> solidifying a candidate for a jCard standard.

Hi,

This is great work, and it's something that I found a number of developers asking about during South By South West. I think it was Glenn Jones who suggested that parsers are now mature enough that some thought needs to be given to having interoperable JSON structures.

I have two points of initial followup, one with my admin hat on, the other without.

1. ADMIN: This discussion should probably take place on the microformats-dev mailing list, rather than -discuss. That way it should come to the attention of all parser developers, and hopefully stay focused on this very parser-centric work. I've cross-posted this thread to microformats-dev@microformats.org; please continue the development discussion there.

2. In my view: I'm totally supportive and in favour of this work, but I think "jCard" is a bad name for it. I think this work would be better presented connected to the hCard specification itself (and future equivalents for the other microformats too), whether that ends up as an "Object Model" section of the relevant specs or as new documents (e.g. hcard-object-model). It doesn't need its own, separate format name; it's really further specifying hCard itself.
What's more, whilst JSON is the obvious driver technology for this work, I think it would make more sense to produce an implementation-agnostic Object Model that would work in JSON, XML, YAML or whatever other transport people might want to implement for. (I think it's unlikely we'd want to specify "jCard", "xCard", "yCard" and so on.)

> Please forgive my poor wiki editing skills and feel free to add to
> the page.

The page is off to a great start! Keep it up.

Thanks,

Ben

From donohoe at nytimes.com Wed Apr 9 13:38:43 2008
From: donohoe at nytimes.com (michael)
Date: Wed Apr 9 13:38:47 2008
Subject: [uf-dev] Feedback on XFN implementation
Message-ID: <8ebe8ca30804091338y71222a91md85e6c5a4c76504c@mail.gmail.com>

Hello,

I'm trying to get some initial feedback on XFN support for a project I am working on. I've included some sample text of a user's page.

Essentially there are three components:

1. User info (name and basic summary)
2. A list of actions/activities from the user and other people in their network
3. A list of people within the user's network (this can also include the user)

There really aren't any levels of friend designation, and we expect that the user will not really know in RL many of the people in their network. From that perspective I use the designation "acquaintance" only.

With that in mind, does the following seem appropriate (ignore href values as they're all bogus):

Michael

NYC 

... ...
John Coleman Earth
Shane Sweeney New York
Nick Burke Austin, TX
John Coleman recommended something: Yahoo
This is a web site
Apr, 1 2008
Michael recommended another thing: Something about Coffee
This is a summary with description information.
Apr, 1 2008
...

Thoughts and feedback appreciated!

-Michael

From julian_bond at voidstar.com Fri Apr 11 04:12:46 2008
From: julian_bond at voidstar.com (Julian Bond)
Date: Fri Apr 11 04:13:40 2008
Subject: [uf-dev] Parsing XFN in PHP
Message-ID: <2P$lv5Due0$HFApB@jblaptop.voidstar.com>

Continuing a thread that started on the Discuss list.

My experiments have led me to two approaches, depending on PHP release. First PHP 5, with error handling left as an exercise for the reader:

    $url = 'http://ciaranmcnulty.com/';
    if ($html = @file_get_contents($url)) {
        $dom = new DomDocument();
        // @ hides the warnings loadHtml() raises on sloppy markup
        if (@$dom->loadHtml($html)) {
            if ($nodes = $dom->getElementsByTagName('a')) {
                foreach ($nodes as $node) {
                    // exact match: only catches rel="me" on its own
                    if ($node->getAttribute('rel') == 'me') {
                        echo $node->getAttribute('href');
                    }
                }
            }
        }
    }

Pretty easy, huh? Clearly this same approach could be used for other values of rel=. It's probably not too hard to extend this approach to find hCard and other uFs.

loadHtml() doesn't exist in the PHP 4 dom-xml extension. In theory it should be possible to use HTML Tidy's tidy_repair_string to clean the HTML first and then feed it to domxml_open_mem. In practice, I'm having real trouble finding the right collection of tidy_repair_string configuration parameters to generate clean enough XML for dom to accept it. If that can be done, then this should work:

    $url = 'http://ciaranmcnulty.com/';
    if ($html = @file_get_contents($url)) {
        // no Tidy configuration is passed here
        $html = @tidy_repair_string($html);
        if ($dom = @domxml_open_mem($html)) {
            if ($nodes = $dom->get_elements_by_tagname('a')) {
                foreach ($nodes as $node) {
                    if ($node->get_attribute('rel') == 'me') {
                        echo $node->get_attribute('href');
                    }
                }
            }
        }
    }

Typical errors are things like:
- Space required after the Public Identifier
- SystemLiteral " or ' expected
- xmlParseExternalID: PUBLIC, no URI in
- invalid entity nbsp

Maybe it's possible to get Tidy's output to avoid all these, but I haven't managed it yet.
I had a look at hkit but that makes no attempt to configure the Tidy module so I'd expect lots of problems when trying to parse arbitrary web pages. -- Julian Bond E&MSN: julian_bond at voidstar.com M: +44 (0)77 5907 2173 Webmaster: http://www.ecademy.com/ T: +44 (0)192 0412 433 Personal WebLog: http://www.voidstar.com/ skype:julian.bond?chat Tastes Like Milk From mark at markng.me.uk Fri Apr 11 04:36:03 2008 From: mark at markng.me.uk (Mark Ng) Date: Fri Apr 11 04:36:12 2008 Subject: [uf-dev] Parsing XFN in PHP In-Reply-To: <2P$lv5Due0$HFApB@jblaptop.voidstar.com> References: <2P$lv5Due0$HFApB@jblaptop.voidstar.com> Message-ID: $html = tidy_repair_string($html,array('output-xhtml' => true, 'numeric-entities' => 'true', )); was what I was using - does it work for you ? Mark On 11/04/2008, Julian Bond wrote: > Continuing a thread that started on the Discuss list. > > My experiments have led me to 2 approaches depending on PHP release. > First php5. With error handling left as an exercise for the reader > > > $url = 'http://ciaranmcnulty.com/'; > if($html = @file_get_contents($url)){ > $dom = new DomDocument(); > if(@$dom->loadHtml($html)){ > > if ($nodes = $dom->getElementsByTagName('a')) { > foreach($nodes as $node){ > if ($node->getAttribute('rel')=='me') { > echo $node->getAttribute('href'); > } > } > } > } > } > > Pretty easy, huh? Clearly this same approach could be used for other > values of rel= It's probably not too hard to extend this approach to > find hCard and other uFs. > > loadHtml() doesn't exist in php4 dom-xml. In theory it should be > possible to use HTML-Tidy tidy_repair_string to clean the html first and > then feed it to domxml_open_mem. In practice, I'm having real trouble > getting the right collection of tidy_repair_string configuration > parameters to generate clean enough XML for dom to accept it. If that > can be done, then this should work. 
> > > $url = 'http://ciaranmcnulty.com/'; > if($html = @file_get_contents($url)){ > > $html = @tidy_repair_string($html); > if ($dom = @domxml_open_mem($html)) ) { > if ($nodes = $dom->get_elements_by_tagname('a')) { > foreach($nodes as $node){ > if ($node->get_attribute('rel')=='me') { > echo $node->get_attribute('href'); > } > } > } > } > } > > Typical errors are things like:- > - Space required after the Public Identifier > - SystemLiteral " or ' expected > - xmlParseExternalID: PUBLIC, no URI in > - invalid entity nbsp > Maybe, it's possible to get Tidy's output to avoid all these but I > haven't managed it yet. I had a look at hkit but that makes no attempt > to configure the Tidy module so I'd expect lots of problems when trying > to parse arbitrary web pages. > > > -- > Julian Bond E&MSN: julian_bond at voidstar.com M: +44 (0)77 5907 2173 > Webmaster: http://www.ecademy.com/ T: +44 (0)192 0412 433 > Personal WebLog: http://www.voidstar.com/ skype:julian.bond?chat > Tastes Like Milk > _______________________________________________ > > microformats-dev mailing list > microformats-dev@microformats.org > http://microformats.org/mailman/listinfo/microformats-dev > From foolistbar at googlemail.com Fri Apr 11 04:45:03 2008 From: foolistbar at googlemail.com (Geoffrey Sneddon) Date: Fri Apr 11 04:52:51 2008 Subject: [uf-dev] Parsing XFN in PHP In-Reply-To: References: <73766b160804091118t1c5ad3bbof0bc5456898c2d1a@mail.gmail.com> <006f01c89afa$b5afadb0$116bacca@COMCEN> Message-ID: <11156B01-48DD-435A-BFE4-F41F1CE661CE@googlemail.com> On 10 Apr 2008, at 18:34, Toby A Inkster wrote: > Ryan Parman wrote: > >> "But we can do it in web browsers!" What do web browsers have that >> PHP >> developers don't? An HTML parser. As far as I know there are no HTML >> parsers written for PHP (or any other language that I'm aware of). > > http://www.php.net/manual/en/function.dom-domdocument-loadhtml.php That doesn't really work. 
libxml2's HTML parsing is nothing like what is actually needed for real world compatibility. Just take a look at things like <b><i>foo</b>bar</i>, or <plaintext>foo</plaintext><b>bar.

On 11 Apr 2008, at 08:33, Toby A Inkster wrote:

> Another option is XML_HTMLSax3 from PEAR:
> http://pear.php.net/package/XML_HTMLSax3

This really seems like nothing more than a subset of SGML similar to XML, and is therefore equally useless at parsing HTML. See the above two examples again, as well as things like <b<i>hi</i></b> (note the omitted >).

Real world HTML content really does rely on specific parsing rules, and attempting to deviate from them will just result in issues. In terms of anything useful, you'd really need to implement your own HTML parser, likely starting from HTML 5. Then you can run into issues with DOM requiring XML well-formedness, so you can't have "a@" as a localName (to reuse the example on public-html a few days ago, you need to parse <a@> <a#> </a@> correctly, despite all those tags having characters that you can't legally store in the DOM).

--
Geoffrey Sneddon
<http://gsnedders.com/>

From julian_bond at voidstar.com Fri Apr 11 05:09:15 2008
From: julian_bond at voidstar.com (Julian Bond)
Date: Fri Apr 11 05:10:21 2008
Subject: [uf-dev] Parsing XFN in PHP
In-Reply-To: <d6fe3b060804110436h78192a6eod774a24fd21da64d@mail.gmail.com>
References: <2P$lv5Due0$HFApB@jblaptop.voidstar.com> <d6fe3b060804110436h78192a6eod774a24fd21da64d@mail.gmail.com>
Message-ID: <vfuj7pGrT1$HFAbo@jblaptop.voidstar.com>

Mark Ng <mark@markng.me.uk> Fri, 11 Apr 2008 12:36:03
>$html = tidy_repair_string($html,array('output-xhtml' => true,
>'numeric-entities' => 'true', )); was what I was using - does it work
>for you ?

I must have been getting tired last night. I'm sure I tried that. But today it's handling everything I can throw at it.
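A note on the rel comparison used in the PHP 5 snippet earlier in this thread: rel holds a space-separated list of values, so comparing the whole attribute against 'me' misses links such as rel="nofollow me". Below is a minimal sketch that splits the attribute into tokens first. The function name xfn_links() and the sample markup are made up for illustration, and since it still relies on loadHTML(), the parsing caveats raised above apply to it as well.

```php
<?php
// Hypothetical helper (not from the thread): return the href of every <a>
// whose rel attribute contains $value as one of its space-separated tokens.
function xfn_links($html, $value)
{
    $found = array();
    $dom = new DOMDocument();
    // Collect libxml warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return $found;
    }
    foreach ($dom->getElementsByTagName('a') as $node) {
        // Split on runs of whitespace; link types are compared case-insensitively.
        $tokens = preg_split('/\s+/', strtolower(trim($node->getAttribute('rel'))));
        if (in_array(strtolower($value), $tokens, true)) {
            $found[] = $node->getAttribute('href');
        }
    }
    return $found;
}

// Made-up sample markup; the hrefs are placeholders.
$html = '<p><a rel="me" href="http://example.com/a">A</a>'
      . '<a rel="friend met" href="http://example.com/b">B</a>'
      . '<a rel="nofollow me" href="http://example.com/c">C</a></p>';
print_r(xfn_links($html, 'me'));
```

With the sample markup this should list the first and third links only; the second carries "friend met", which contains no "me" token.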
My test rig is here http://www.voidstar.com/xfnexplorer -- Julian Bond E&MSN: julian_bond at voidstar.com M: +44 (0)77 5907 2173 Webmaster: http://www.ecademy.com/ T: +44 (0)192 0412 433 Personal WebLog: http://www.voidstar.com/ skype:julian.bond?chat No Wife, No Horse, No Moustache From ryan.lists.warpshare at gmail.com Fri Apr 11 09:38:54 2008 From: ryan.lists.warpshare at gmail.com (Ryan Parman) Date: Fri Apr 11 09:39:02 2008 Subject: [uf-dev] Fwd: (Off-list) Parsing XFN in PHP References: <11156B01-48DD-435A-BFE4-F41F1CE661CE@googlemail.com> Message-ID: <7EE313F0-F45C-420F-BBFB-31C6AECED526@gmail.com> Forwarding Geoffrey's off-list message sent to the original thread: Begin forwarded message: > From: Geoffrey Sneddon <foolistbar@googlemail.com> > Date: April 11, 2008 4:45:03 AM PDT > To: Toby A Inkster <mail@tobyinkster.co.uk>, Ryan Parman <ryan.lists.warpshare@gmail.com > > > Subject: Re: (Off-list) Parsing XFN in PHP > > > On 10 Apr 2008, at 18:34, Toby A Inkster wrote: >> Ryan Parman wrote: >> >>> "But we can do it in web browsers!" What do web browsers have that >>> PHP >>> developers don't? An HTML parser. As far as I know there are no HTML >>> parsers written for PHP (or any other language that I'm aware of). >> >> http://www.php.net/manual/en/function.dom-domdocument-loadhtml.php > > That doesn't really work. libxml2's HTML parsing is nothing like > what is actually needed for real world compatibility. Just take a > look at things like <b><i>foo</b>bar</i>, or <plaintext>foo</ > plaintext><b>bar. > > > On 11 Apr 2008, at 08:33, Toby A Inkster wrote: >> Another option is XML_HTMLSax3 from PEAR: >> http://pear.php.net/package/XML_HTMLSax3 > > This really seems like nothing more than a subset of SGML similar to > XML, and is therefore equally useless at parsing HTML. See the above > two examples again, as well as things like <b<i>hi</i></b> (note the > omitted >). 
> > Real world HTML content really does rely on specific parsing rules, > and attempting to deviate from them will just result in issues. In > terms of anything useful, you'd really need to implement your own > HTML parser, likely starting from HTML 5. Then you can run into > issues with DOM requiring XML well-formedness, so you can't have as > a localName "a@" (to reuse the example on public-html a few days > ago, you need to parse <a@> <a#> </a@> correctly, despite all those > tags having characters that you can't legally store in the DOM) > > > -- > Geoffrey Sneddon > <http://gsnedders.com/> > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://microformats.org/discuss/mail/microformats-dev/attachments/20080411/bd79448b/attachment.html From gordon at onlinehome.de Sat Apr 12 02:54:02 2008 From: gordon at onlinehome.de (Gordon Oheim) Date: Sat Apr 12 02:54:06 2008 Subject: [uf-dev] Finalizing jCard Message-ID: <480086BA.3000108@onlinehome.de> Hi all, the discussion about a standardized jCard output format seems to have slept in a bit - so I am here to revive it. I'd say we are pretty much done with the specs, but there is one major point missing (see Section 2.2 in the wiki). Can we do a vote on whether Arrays or Objects may be reduced in case they only contain a single property. Though reducing Objects and Arrays would benefit a more compact JSON format, it would also require a little bit more business logic in the receiving system. +1 Enclosing Arrays or Objects must NOT be reduced. 
Cheers,
Gordon

Wiki Page: http://microformats.org/wiki/jcard

From lists at ben-ward.co.uk Mon Apr 14 07:12:55 2008
From: lists at ben-ward.co.uk (Ben Ward)
Date: Mon Apr 14 07:13:08 2008
Subject: [uf-dev] Finalizing jCard
In-Reply-To: <480086BA.3000108@onlinehome.de>
References: <480086BA.3000108@onlinehome.de>
Message-ID: <B2CAFE18-7E81-4D1D-B9C1-B60EDEAB40B5@ben-ward.co.uk>

On 12 Apr 2008, at 10:54, Gordon Oheim wrote:

> the discussion about a standardized jCard output format seems to
> have slept in a bit - so I am here to revive it.
>
> I'd say we are pretty much done with the specs, but there is one
> major point missing (see Section 2.2 in the wiki).
> Can we do a vote on whether Arrays or Objects may be reduced in case
> they only contain a single property.

I think it's somewhat premature to suggest that we're "pretty much done with" the specs. I'd like to see input from Mike Kaply, Glenn Jones, Brian Suda, Drew McLellan and David Janes (if he has time!) since they all work on parsers too. Any attempt to standardise the object model of microformats is going to need their assistance, and they're also amongst the most experienced working with parsing. It's important they're given an opportunity to raise their own issues before this work gets pushed into finalisation.

Ben

From dmitry at baranovskiy.com Mon Apr 14 15:19:24 2008
From: dmitry at baranovskiy.com (Dmitry Baranovskiy)
Date: Mon Apr 14 15:19:27 2008
Subject: [uf-dev] Finalizing jCard
In-Reply-To: <B2CAFE18-7E81-4D1D-B9C1-B60EDEAB40B5@ben-ward.co.uk>
References: <480086BA.3000108@onlinehome.de> <B2CAFE18-7E81-4D1D-B9C1-B60EDEAB40B5@ben-ward.co.uk>
Message-ID: <8a52ddad0804141519l40833892sc0efb6b925832d7d@mail.gmail.com>

Just an input from me:

+1 Enclosing Arrays or Objects must NOT be reduced.

I implemented it the opposite way in Optimus, but I am pretty sure it is time to change it.
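To make the trade-off in this vote concrete, here is a minimal sketch of the two output styles and what each asks of the receiving code. The property names and values are made up for illustration and are not taken from the jCard draft; json_encode() needs PHP 5.2 or later.

```php
<?php
// Illustrative data only: one card with a single tel, one with two.
$not_reduced = array(
    array('fn' => 'Jane Doe', 'tel' => array('+44 20 7946 0000')),
    array('fn' => 'John Roe', 'tel' => array('+44 20 7946 0001', '+44 20 7946 0002')),
);
// "Reduced" style: a single-item array collapses to a bare string.
$reduced = array(
    array('fn' => 'Jane Doe', 'tel' => '+44 20 7946 0000'),
    array('fn' => 'John Roe', 'tel' => array('+44 20 7946 0001', '+44 20 7946 0002')),
);

// Consumer of the non-reduced form: tel is always a list, so one loop does.
function tels_not_reduced($cards)
{
    $out = array();
    foreach ($cards as $card) {
        foreach ($card['tel'] as $tel) {
            $out[] = $tel;
        }
    }
    return $out;
}

// Consumer of the reduced form: has to check the type of tel first.
function tels_reduced($cards)
{
    $out = array();
    foreach ($cards as $card) {
        $tels = is_array($card['tel']) ? $card['tel'] : array($card['tel']);
        foreach ($tels as $tel) {
            $out[] = $tel;
        }
    }
    return $out;
}

echo json_encode($not_reduced[0]), "\n"; // {"fn":"Jane Doe","tel":["+44 20 7946 0000"]}
echo json_encode($reduced[0]), "\n";     // {"fn":"Jane Doe","tel":"+44 20 7946 0000"}
```

The reduced form saves two characters per single-valued property, at the cost of a type check in every consumer.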
From brian.suda at gmail.com Tue Apr 15 01:25:34 2008 From: brian.suda at gmail.com (Brian Suda) Date: Tue Apr 15 01:25:37 2008 Subject: [uf-dev] Finalizing jCard In-Reply-To: <B2CAFE18-7E81-4D1D-B9C1-B60EDEAB40B5@ben-ward.co.uk> References: <480086BA.3000108@onlinehome.de> <B2CAFE18-7E81-4D1D-B9C1-B60EDEAB40B5@ben-ward.co.uk> Message-ID: <21e770780804150125m65d0ecfemf3cd078dc39d7d90@mail.gmail.com> 2008/4/14, Ben Ward <lists@ben-ward.co.uk>: > On 12 Apr 2008, at 10:54, Gordon Oheim wrote: > > > the discussion about a standardized jCard output format seems to have > slept in a bit - so I am here to revive it. --- my first suggestion is not to call it jCard, but something more like JSON output of vCard or JSON to hCard mapping. As Ben said earlier, if we start using jCard, then we'll have xCard, aCard, pCard... all meaningless words. The same json mappings we make for hCard will be effective for hCalendar, hReview, etc. so the terminology should reflect this. > I think it's somewhat premature to suggest that we're 'pretty much done > with' the specs. --- i am not a JSON expert, so i can't weigh in on specifics, but here's what i would suggest to help move things along. Have a look at the current test suite. It has HTML and .vcf/.ics output for the pages. http://hg.microformats.org/tests We should also create a .json output as well. Then we can have a better point of discussion around real examples. This will help clear-up any outstanding issues and at the same time give various developers something to test their own code against. > Any attempt to standardise the object model of microformats is going to > need their assistance, and they're also amongst the most experienced working > with parsing. It's important they're give an opportunity to raise their own > issues before this work gets pushed into finalisation. --- i think the sample .json output from the tests will really help. Without that, it is difficult to discuss exact parsing rules and expected behaviours. 
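As a rough sketch of what checking a parser against such .json files could look like: decode both the expected and the actual output, then compare the structures rather than the raw strings, so that whitespace and key order do not matter. The function names are made up for illustration, and the expected strings here stand in for files the test suite does not yet contain.

```php
<?php
// Hypothetical helper: sort object keys recursively so that key order does
// not affect the comparison. List order is preserved because list keys are
// already 0, 1, 2, ...
function canonical($value)
{
    if (!is_array($value)) {
        return $value;
    }
    ksort($value);
    foreach ($value as $key => $item) {
        $value[$key] = canonical($item);
    }
    return $value;
}

// True when both strings are valid JSON describing the same structure.
// Caveat of this sketch: a bare JSON "null" document is treated as invalid,
// and {} and [] are not told apart.
function same_json($expected, $actual)
{
    $a = json_decode($expected, true);
    $b = json_decode($actual, true);
    if ($a === null || $b === null) {
        return false;
    }
    return canonical($a) === canonical($b);
}

$expected = '{"fn":"Jane Doe","tel":["+44 20 7946 0000"]}';
var_dump(same_json($expected, '{ "tel": ["+44 20 7946 0000"], "fn": "Jane Doe" }')); // bool(true)
var_dump(same_json($expected, '{"fn":"Jane Doe","tel":"+44 20 7946 0000"}'));        // bool(false)
```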
-brian

--
brian suda
http://suda.co.uk

From drew.mclellan at gmail.com Tue Apr 15 01:42:56 2008
From: drew.mclellan at gmail.com (Drew McLellan)
Date: Tue Apr 15 01:42:59 2008
Subject: [uf-dev] Finalizing jCard
In-Reply-To: <21e770780804150125m65d0ecfemf3cd078dc39d7d90@mail.gmail.com>
References: <480086BA.3000108@onlinehome.de> <B2CAFE18-7E81-4D1D-B9C1-B60EDEAB40B5@ben-ward.co.uk> <21e770780804150125m65d0ecfemf3cd078dc39d7d90@mail.gmail.com>
Message-ID: <83a9a59b0804150142g4ccf15d1r2a69da32d1e3a93d@mail.gmail.com>

On 15/04/2008, Brian Suda <brian.suda@gmail.com> wrote:
>
> > the discussion about a standardized jCard output format seems to have
> > slept in a bit - so I am here to revive it.
>
> --- my first suggestion is not to call it jCard, but something more
> like JSON output of vCard or JSON to hCard mapping. As Ben said
> earlier, if we start using jCard, then we'll have xCard, aCard,
> pCard... all meaningless words. The same json mappings we make for
> hCard will be effective for hCalendar, hReview, etc. so the
> terminology should reflect this.
>
> > I think it's somewhat premature to suggest that we're 'pretty much done
> > with' the specs.
>
> --- i am not a JSON expert, so i can't weigh in on specifics, but
> here's what i would suggest to help move things along.
>
> Have a look at the current test suite. It has HTML and .vcf/.ics
> output for the pages.
> http://hg.microformats.org/tests
>
> We should also create a .json output as well. Then we can have a
> better point of discussion around real examples. This will help
> clear-up any outstanding issues and at the same time give various
> developers something to test their own code against.
>
> > Any attempt to standardise the object model of microformats is going to
> > need their assistance, and they're also amongst the most experienced working
> > with parsing. It's important they're give an opportunity to raise their own
> > issues before this work gets pushed into finalisation.
>
> --- i think the sample .json output from the tests will really help.
> Without that, it is difficult to discuss exact parsing rules and
> expected behaviours.
>

Apologies that I'm late to this conversation ... I've been watching the idea unfold but haven't had a moment to contribute so far.

I'd echo Brian's point about the name, but I'm not going to get hung up on that. However, the point about the test suites is crucial. If this is viable and useful then having the hCard tests in JSON format will both help confirm and encourage that. I'd suggest a compact format for the output (no whitespace etc.) so that it becomes simple to perform a basic string comparison to verify results.

+1 to never reducing single-item arrays. This is something we're changing in hkit already.

drew.

From brian.suda at gmail.com Tue Apr 15 05:57:38 2008
From: brian.suda at gmail.com (Brian Suda)
Date: Tue Apr 15 05:57:41 2008
Subject: [uf-dev] Finalizing jCard
In-Reply-To: <93E6A4D1-D835-4C14-B318-4269E46C45F3@tobyinkster.co.uk>
References: <93E6A4D1-D835-4C14-B318-4269E46C45F3@tobyinkster.co.uk>
Message-ID: <21e770780804150557x67ff3a57m7afc23007b0ed910@mail.gmail.com>

2008/4/15, Toby A Inkster <mail@tobyinkster.co.uk>:
> Brian Suda wrote:
> > my first suggestion is not to call it jCard, but something more
> > like JSON output of vCard or JSON to hCard mapping. As Ben said
> > earlier, if we start using jCard, then we'll have xCard
> An XML version of vCard
> <http://www.watersprings.org/pub/id/draft-dawson-vcard-xml-dtd-03.txt>
> already exists and predates hCard by a number of years, though it never
> reached RFC stage.

--- i must not have explained myself well enough, but your example proves the point i was trying to make. Rather than calling it jCard, for a JSON representation of vCard details.
I was suggesting to call it something like "JSON representation of vCard", just as the XML representation of a vCard is not called xCard, but "An XML version of vCard".

> > aCard, pCard
>
> Not sure what those would be, but for other hierarchical
> markup/serialisation languages, I'd suggest that formats could be defined
> as:

--- i'm not talking about definitions of the serializations... just giving examples that if we start putting [A-Za-z0-9] in front of Card, we'll have an alphabet soup of formats which tell us nothing.

> I would say that there exists no such function g() which allows for jCard -
> or anything *like* jCard - to be defined in those terms, thus it is
> justified to dedicate effort into defining jCard explicitly.

--- the other thing i think we are hung-up on is solving the JSON representation for a single format. We have several design patterns to map VCF/ICS data to HTML: the class design pattern, the rel design pattern and others. IMHO the best way forward is to map microformatted HTML to JSON in a similar manner, through patterns, not specific formats. Let's not worry about mapping format XYZ to JSON; we should look at an mf2json() mapping.

-brian

--
brian suda
http://suda.co.uk

From lists at ben-ward.co.uk Tue Apr 15 06:42:35 2008
From: lists at ben-ward.co.uk (Ben Ward)
Date: Tue Apr 15 06:42:39 2008
Subject: [uf-dev] Microformat Object Models (was: Finalizing jCard)
In-Reply-To: <21e770780804150557x67ff3a57m7afc23007b0ed910@mail.gmail.com>
References: <93E6A4D1-D835-4C14-B318-4269E46C45F3@tobyinkster.co.uk> <21e770780804150557x67ff3a57m7afc23007b0ed910@mail.gmail.com>
Message-ID: <AD436B3D-DC5E-45B1-8BB3-02625F1A4C41@ben-ward.co.uk>

On 15 Apr 2008, at 13:57, Brian Suda wrote:

>> I would say that there exists no such function g() which allows for
>> jCard -
>> or anything *like* jCard - to be defined in those terms, thus it is
>> justified to dedicate effort into defining jCard explicitly.
>
> --- the other thing i think we are hung-up on is solving the JSON
> representation for a single format. We have several design patterns to
> map VCF/ICS data to HTML, the class design pattern, the rel-design
> pattern and others. IMHO This is the best way forward to map
> Microformatted HTML to JSON in a similar manner, through patterns -
> not specific formats. Lets not worry about XYZ format mapping to JSON,
> we should look at a mf2json() mapping.

* Defining "jCard" explicitly is a perfectly valid effort, but within the microformats community, where we're working within the scope of HTML, the focus is to solve the problem of parsers producing inconsistent output; hence my emphasis on this being the "hCard Object Model" (vis a vis the DOM, CSS OM). My view is that if that effort produces a defined vCard in JSON format as well then so be it, but for me, the lack of a vCard->JSON format is not the problem itself.

* Object Model consistency needs to be fixed for all other microformats, too, which gives weight to Brian's generic approach. If a set of generic parsing rules and patterns is robust enough and can be documented tightly enough to be implemented, then it's probably the way to go. Should we perhaps be looking to better define the data types at a schema level, which then map to parsing rules?

To Glenn Jones: You said you might have an example of the kind of model documentation you'd like to implement against. Were you able to find any examples of this?
B From aconbere at gmail.com Tue Apr 15 10:34:52 2008 From: aconbere at gmail.com (anders conbere) Date: Tue Apr 15 10:34:57 2008 Subject: [uf-dev] Microformat Object Models (was: Finalizing jCard) In-Reply-To: <AD436B3D-DC5E-45B1-8BB3-02625F1A4C41@ben-ward.co.uk> References: <93E6A4D1-D835-4C14-B318-4269E46C45F3@tobyinkster.co.uk> <21e770780804150557x67ff3a57m7afc23007b0ed910@mail.gmail.com> <AD436B3D-DC5E-45B1-8BB3-02625F1A4C41@ben-ward.co.uk> Message-ID: <8ca3fbe80804151034i6c01d651j528013803d66c571@mail.gmail.com> On Tue, Apr 15, 2008 at 6:42 AM, Ben Ward <lists@ben-ward.co.uk> wrote: > > On 15 Apr 2008, at 13:57, Brian Suda wrote: > > > > > > I would say that there exists no such function g() which allows for > jCard - > > > or anything *like* jCard - to be defined in those terms, thus it is > > > justified to dedicate effort into defining jCard explicitly. > > > > > > > --- the other thing i think we are hung-up on is solving the JSON > > representation for a single format. We have several design patterns to > > map VCF/ICS data to HTML, the class design pattern, the rel-design > > pattern and others. IMHO This is the best way forward to map > > Microformatted HTML to JSON in a similar manner, through patterns - > > not specific formats. Lets not worry about XYZ format mapping to JSON, > > we should look at a mf2json() mapping. > > > > ?? Defining 'jCard' explicitly is a perfectly valid effort, but within the > microformats community ? ? where we're working within the scope of HTML ?? > the focus is to solve the problem of parsers producing inconsistent output, > hence my emphasis on this being the 'hCard Object Model' (vis a vis the DOM, > CSS OM). My view is that If that effort produces a defined vCard in JSON > format as well then so be it, but for me, the lack of a vCard->JSON format > is not the problem itself. > > ? Object Model consistency needs to be fixed for all other microformats, > too, which gives weight to Brian's generic approach. 
If a set of generic
> parsing rules and patterns is robust enough and can be documented tightly
> enough to be implemented, then it's probably the way to go. Should we
> perhaps be looking to better define the data types at a schema level, which
> then map to parsing rules?

Dan Brickley and I had a couple of good conversations at BlogTalk about how microformats could really use an assertion-based approach to parsing. If you see every data item as a claim then everything becomes tuples.

(Anders Conbere, has a, Hcard)
(Hcard, has a, Address)
(Address, has a, Street)
(Street, is, 7511 Jones Ave NW)

When you organize data structures like this it becomes trivially easy to define what a correct set of claims is for any given microformat, and to test the correctness of a parser's output.

Some of you might recognize this as the stance that RDF takes with its testing:

http://www.w3.org/TR/rdf-testcases/

When I brought this up a month ago there was some strong pushback from Tantek, which I felt came from a reluctance to begin solidifying the definitions of what is a very loose set of specs.

That being said, it's REALLY REALLY hard to parse microformats properly today. Having a test harness to run my parser against would help immensely, but that requires the organization to put some work into solidifying the way the specs work.

(One of the other nice things about speccing your formats as RDF is that you can easily create GRDDL documents for them, and parsers are really good at parsing RDF.)

~ Anders

>
> To Glenn Jones: You said you might have an example of the kind of model
> documentation you'd like to implement against. Were you able to find any
> examples of this?
> > B > _______________________________________________ > microformats-dev mailing list > microformats-dev@microformats.org > http://microformats.org/mailman/listinfo/microformats-dev > From msporny at digitalbazaar.com Tue Apr 15 13:05:48 2008 From: msporny at digitalbazaar.com (Manu Sporny) Date: Tue Apr 15 13:05:56 2008 Subject: [uf-dev] Microformat Object Models In-Reply-To: <8ca3fbe80804151034i6c01d651j528013803d66c571@mail.gmail.com> References: <93E6A4D1-D835-4C14-B318-4269E46C45F3@tobyinkster.co.uk> <21e770780804150557x67ff3a57m7afc23007b0ed910@mail.gmail.com> <AD436B3D-DC5E-45B1-8BB3-02625F1A4C41@ben-ward.co.uk> <8ca3fbe80804151034i6c01d651j528013803d66c571@mail.gmail.com> Message-ID: <48050A9C.7000100@digitalbazaar.com> anders conbere wrote: > Some of you might recognize this as the stance the rdf takes with it's testing > > http://www.w3.org/TR/rdf-testcases/ It is also the approach that RDFa takes when checking parser conformance against the RDFa specification. Check out the RDFa Test Harness and Unit Tests: http://rdfa.digitalbazaar.com/rdfa-test-harness/ You can plug in different parsers and test them for conformance using the utility above - which has helped when tracking down parser issues. It also allows a developer to check their implementation against a test suite that the community has agreed upon. However, to get something like the above working for this community, we'd have to: - Agree on a parser specification (or set of specifications) for Microformats. - Agree on a serialization format for Microformats (JSON/XML/N3/etc). - Agree on a set of unit tests for Microformats. - Agree on a method of checking the results of parsers. 
In the RDFa community, this is what happened:

- Agree on a parser specification: Standardized by the W3C
- Agree on a serialization format: RDF
- Agree on a set of unit tests: Standardized by the W3C
- Agree on a method of checking the results of parsers: SPARQL

-- manu

From danny.ayers at gmail.com Tue Apr 15 14:00:29 2008
From: danny.ayers at gmail.com (Danny Ayers)
Date: Tue Apr 15 14:07:40 2008
Subject: [uf-dev] Microformat Object Models
In-Reply-To: <48050A9C.7000100@digitalbazaar.com>
References: <93E6A4D1-D835-4C14-B318-4269E46C45F3@tobyinkster.co.uk> <21e770780804150557x67ff3a57m7afc23007b0ed910@mail.gmail.com> <AD436B3D-DC5E-45B1-8BB3-02625F1A4C41@ben-ward.co.uk> <8ca3fbe80804151034i6c01d651j528013803d66c571@mail.gmail.com> <48050A9C.7000100@digitalbazaar.com>
Message-ID: <1f2ed5cd0804151400h74db8320rf025431d1f9bc8b1@mail.gmail.com>

Using RDF as a model would have its advantages:

* the W3C test harness could be reused
* it's straightforward
* some of the modelling has already been done
* Semantic Web integration comes free

At http://esw.w3.org/topic/CustomRdfDialects there are links to several microformat2rdfxml XSLT transformations - at least some of them are less-than-perfect, but should be good enough to bootstrap (incidentally a lot of the material there originated on a page called http://esw.w3.org/topic/MicroModels - it got rebranded :-)

SPARQL-capable RDF tools are available for pretty much every language/platform, and test SPARQL would be pretty easy to write.

SPARQL results can appear in XML or JSON - which could be handy in this context.

http://www.w3.org/TR/rdf-sparql-json-res/

There's also a JSON syntax for RDF, and at least two online converters:

http://n2.talis.com/wiki/RDF_JSON_Specification
http://triplr.org
http://convert.test.talis.com/

The RDF/JSON result would no doubt look different from the intended microformat/JSON, but it shouldn't take much script to convert for testing purposes.
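The claim-based checking discussed in this thread boils down to comparing two sets of tuples. Below is a minimal sketch in plain PHP, without any RDF or SPARQL tooling. The function names are made up for illustration, the expected claims reuse the example tuples from earlier in the thread, and the "parser output" is invented.

```php
<?php
// Encode each claim as a string so that lists of claims can be compared as sets.
function claim_keys($claims)
{
    $keys = array();
    foreach ($claims as $claim) {
        $keys[] = json_encode($claim);
    }
    return $keys;
}

// Report which expected claims the parser missed and which it added.
function check_claims($expected, $actual)
{
    $e = claim_keys($expected);
    $a = claim_keys($actual);
    return array(
        'missing'    => array_values(array_diff($e, $a)),
        'unexpected' => array_values(array_diff($a, $e)),
    );
}

$expected = array(
    array('Anders Conbere', 'has a', 'Hcard'),
    array('Hcard', 'has a', 'Address'),
    array('Address', 'has a', 'Street'),
    array('Street', 'is', '7511 Jones Ave NW'),
);
// Invented parser output: the same claims in another order, minus one.
$actual = array(
    array('Address', 'has a', 'Street'),
    array('Hcard', 'has a', 'Address'),
    array('Anders Conbere', 'has a', 'Hcard'),
);
$result = check_claims($expected, $actual);
print_r($result);
```

Because the comparison is set-based, the order in which a parser emits claims does not matter; here only the street claim is reported as missing.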
On 15/04/2008, Manu Sporny <msporny@digitalbazaar.com> wrote: Sorry Manu, nitpicking, expanding your shorthand - - Agree on a serialization format for Microformats (JSON/XML/N3/etc). presumably=> Agree on a Microformats model Agree on a serialization format for Microformats model (JSON/XML/N3/etc). - Agree on a serialization format: RDF presumably=> Agree on an RDF model : RDF (easy one that) Agree on a serialization format for RDF model : RDF/XML (I'm assuming) Cheers, Danny. -- http://dannyayers.com ~ http://blogs.talis.com/nodalities/this_weeks_semantic_web/ -------------- next part -------------- An HTML attachment was scrubbed... URL: http://microformats.org/discuss/mail/microformats-dev/attachments/20080415/189f6b13/attachment.html From rff.rff at gmail.com Thu Apr 17 04:53:40 2008 From: rff.rff at gmail.com (gabriele renzi) Date: Thu Apr 17 04:53:43 2008 Subject: [uf-dev] doubts intepreting the hListing spec draft Message-ID: <828083e70804170453hf8924balcab358e61a5210d4@mail.gmail.com> Hi everyone, this is my first post to this list so sorry if I ask something stupid, but I could not find details on this. I'm trying to write an hListing parser/extractor but there is something not clear in the draft spec page. The schema does not have reference to item type, which is then described later. I'd fix the page by myself but I'm not sure if we have to keep the item-type (fix schema) or if it's not there anymore (fix summary of changes+field details). Also, I'm not sure: where a field is described as hCard | (fn || email || url || tel) how shall I read the or's ? I believe that the single pipe is to be read as an exclusive or (use hcard or values), while the double pipe is inclusive (use fn, possibly with email, url etc), is this correct? If not is there documentation for this short-hand syntax somewhere? Thanks in advance. 
From lists at ben-ward.co.uk Thu Apr 17 06:12:18 2008 From: lists at ben-ward.co.uk (Ben Ward) Date: Thu Apr 17 06:12:23 2008 Subject: [uf-dev] doubts intepreting the hListing spec draft In-Reply-To: <828083e70804170453hf8924balcab358e61a5210d4@mail.gmail.com> References: <828083e70804170453hf8924balcab358e61a5210d4@mail.gmail.com> Message-ID: <09305870-B939-477D-8B41-9721C532BEB3@ben-ward.co.uk> Hi Gabriele, Thanks for posting about hListing. First up, the entire wiki page you're working from is due a BIG update, which I've got pending and which I'll follow through on very soon. I'm sorry for the delay on that. On 17 Apr 2008, at 12:53, gabriele renzi wrote: > The schema does not have reference to item type, which is then > described later. > I'd fix the page by myself but I'm not sure if we have to keep the > item-type (fix schema) or if it's not there anymore (fix summary of > changes+field details). So, item type somewhat conflicts with ?listing action? and also the inferred type from item itself (using hCalendar would imply being an event, for example). My advice right now is to ignore that field, or just parse <foo class="item"><bar class="type"> as plain text if you can find evidence of it being used (by way of example, we didn't publish ?type? on Kelkoo as the definition was fuzzy and we didn't want to accidentally steamroller it into the spec). > Also, I'm not sure: where a field is described as > hCard | (fn || email || url || tel) > how shall I read the or's ? That's just badly phrased. fn, email, url and tel are all fields of hcard; every lister should be an hcard (in spec terms, probably ?must? but until the draft is updated I'll avoid such firm terms). Thanks very much for your effort on hListing. 
If you've got any issues you find please post to the microformats-new list, or add them to the hlisting-issues page on the wiki: http://microformats.org/wiki/hlisting-issues Regards, Ben From rff.rff at gmail.com Thu Apr 17 07:05:08 2008 From: rff.rff at gmail.com (gabriele renzi) Date: Thu Apr 17 07:05:15 2008 Subject: [uf-dev] doubts intepreting the hListing spec draft In-Reply-To: <09305870-B939-477D-8B41-9721C532BEB3@ben-ward.co.uk> References: <828083e70804170453hf8924balcab358e61a5210d4@mail.gmail.com> <09305870-B939-477D-8B41-9721C532BEB3@ben-ward.co.uk> Message-ID: <828083e70804170705o1114f8a4ka3f71cb1e6579c8e@mail.gmail.com> On Thu, Apr 17, 2008 at 2:12 PM, Ben Ward <lists@ben-ward.co.uk> wrote: > Hi Gabriele, > > Thanks for posting about hListing. > > First up, the entire wiki page you're working from is due a BIG update, > which I've got pending and which I'll follow through on very soon. I'm sorry > for the delay on that. No worries, thanks for the quick and detailed answer. I'll ask on microformats-new if I find something else that's unclear to me, and wait for the updated spec. Meanwhile I'm more than happy to skip doubtful things :) -- blog it: http://riffraff.blogsome.com blog en: http://www.riffraff.info From microformats at kaply.com Mon Apr 21 10:38:29 2008 From: microformats at kaply.com (Mike Kaply) Date: Mon Apr 21 11:43:32 2008 Subject: [uf-dev] Proper use of value Message-ID: <e06e0e0b0804211038x3597f24ey63d58a546024755d@mail.gmail.com> Can someone please tell me if Roger Costellos examples for value (Pages 13, 14, and 15 here - http://www.xfront.com/microformats/hCard.html) are correct? There seems to be some confusion around whitespace with regards to value and I like to get it clarified so I do the right thing in FF3. Basically I am allowing all whitespace in "value" but apparently others are not. Also note that I don't get any notes to them mailing list for some strange reason, so please email me as well as the list. Thank you. 
Mike Kaply From mail at tobyinkster.co.uk Mon Apr 21 23:09:34 2008 From: mail at tobyinkster.co.uk (Toby A Inkster) Date: Mon Apr 21 23:09:41 2008 Subject: [uf-dev] Proper use of value Message-ID: <D6FFB98F-2B47-4097-AEF3-06505AB2FBB0@tobyinkster.co.uk> > Can someone please tell me if Roger Costellos examples for value > (Pages 13, 14, and 15 here - > http://www.xfront.com/microformats/hCard.html) are correct? They look OK to me. Thanks for posting the examples though because they've helped me fix an annoying bug in Cognition's handling of this. (It has some code for specifically avoiding trimming white space from value-excerpted parts, but that code wasn't being triggered correctly, and white space was being trimmed resulting in fn="JohnSmith". I've fixed it now and will include the fix in my next release.) -- Toby A Inkster <mailto:mail@tobyinkster.co.uk> <http://tobyinkster.co.uk> From microformats at kaply.com Tue Apr 22 07:53:57 2008 From: microformats at kaply.com (Mike Kaply) Date: Tue Apr 22 07:54:06 2008 Subject: [uf-dev] Proper use of value In-Reply-To: <D6FFB98F-2B47-4097-AEF3-06505AB2FBB0@tobyinkster.co.uk> References: <D6FFB98F-2B47-4097-AEF3-06505AB2FBB0@tobyinkster.co.uk> Message-ID: <e06e0e0b0804220753y11014d16y30d730d949201dd7@mail.gmail.com> On Tue, Apr 22, 2008 at 1:09 AM, Toby A Inkster <mail@tobyinkster.co.uk> wrote: > > Can someone please tell me if Roger Costellos examples for value > > (Pages 13, 14, and 15 here - > > http://www.xfront.com/microformats/hCard.html) are correct? > > > > They look OK to me. Thanks for posting the examples though because they've > helped me fix an annoying bug in Cognition's handling of this. (It has some > code for specifically avoiding trimming white space from value-excerpted > parts, but that code wasn't being triggered correctly, and white space was > being trimmed resulting in fn="JohnSmith". I've fixed it now and will > include the fix in my next release.) 
For the record, other parsers do this differently - they trim all whitespace (even in values). What's I'm looking for is the definitive answer as to what the "right thing" to do is. There are a ton of edge cases that are simply poorly defined within the microformats spec. Mike Kaply From brian.suda at gmail.com Tue Apr 22 09:57:21 2008 From: brian.suda at gmail.com (Brian Suda) Date: Tue Apr 22 09:57:23 2008 Subject: [uf-dev] Proper use of value In-Reply-To: <e06e0e0b0804220753y11014d16y30d730d949201dd7@mail.gmail.com> References: <D6FFB98F-2B47-4097-AEF3-06505AB2FBB0@tobyinkster.co.uk> <e06e0e0b0804220753y11014d16y30d730d949201dd7@mail.gmail.com> Message-ID: <21e770780804220957t18233eddjcae0e54693f515e8@mail.gmail.com> 2008/4/22, Mike Kaply <microformats@kaply.com>: > For the record, other parsers do this differently - they trim all > whitespace (even in values). --- we should certainly try to get them inline and decide on a single way to do this. > What's I'm looking for is the definitive answer as to what the "right > thing" to do is. There are a ton of edge cases that are simply poorly > defined within the microformats spec. --- i can't give you a definitive answer, but i think and parse any class="value" and do NOT trim white-space, but i do collapse it. Value is something extra that the user adds, so i take the assumption they know what they are doing and that they meant to include that space. (i do think i reduce multiple spaces, tabs, returns to a single space - i need to confirm this) There was/is also some parsers that intentionally ADD a space, i would say that this is incorrect. If we add this to the wiki as an issue, hopefully we can document a correct answer in some form, that way we have a reference for parser updates. 
-brian

--
brian suda
http://suda.co.uk

From mail at tobyinkster.co.uk Tue Apr 22 11:01:28 2008
From: mail at tobyinkster.co.uk (Toby A Inkster)
Date: Tue Apr 22 11:10:22 2008
Subject: [uf-dev] Proper use of value
Message-ID: <DED7B391-6E99-4219-BAC1-0CBDC988A87B@tobyinkster.co.uk>

Brian Suda wrote:

> i can't give you a definitive answer, but i think and parse any
> class="value" and do NOT trim white-space, but i do collapse it. Value
> is something extra that the user adds, so i take the assumption they
> know what they are doing and that they meant to include that space. (i
> do think i reduce multiple spaces, tabs, returns to a single space - i
> need to confirm this)

For the record, the behaviour used by Cognition (or at least its intended behaviour - as I said, there is a bug in the latest released version pertaining to this issue) is:

* Within each element with class="value", expanses of white space are collapsed into single spaces.
* Within each element with class="value", white space is *not* trimmed from the beginning or end of the value (although it is collapsed as per above).
* All the elements with class="value" are then joined together without any interleaving white space to form a combined string.
* Within the combined string, expanses of white space are collapsed into single spaces.
* Within the combined string, white space *is* trimmed from the beginning and end.

In my experience, this seems to work well for the vast majority of real-world cases. (The percentage of pages that actually *use* multiple elements with class="value" for a single property is tiny anyway.)

--
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>

From mkaply at us.ibm.com Tue Apr 22 12:00:27 2008
From: mkaply at us.ibm.com (Michael Kaply)
Date: Tue Apr 22 12:00:42 2008
Subject: [uf-dev] Proper use of value
Message-ID: <OF0C855401.4FD0009F-ON86257433.0067F6D6-86257433.0068694E@us.ibm.com>

OK, how about this.
When retrieving individual values from the document, if there is any whitespace, it is collapsed into one space, and leading and trailing white space is NOT removed.

After the values have been concatenated to create the final value, if there is any whitespace, it is collapsed into one space, and leading and trailing whitespace IS removed.

So all of these:

<fn>
  <value>John</value>
  <value> </value>
  <value>Doe</value>
</fn>
<fn>
  <value>John</value>
  <value> </value>
  <value>Doe</value>
</fn>
<fn>
  <value> John</value>
  <value> </value>
  <value>Doe </value>
</fn>
<fn>
  <value>John </value>
  <value> </value>
  <value> Doe</value>
</fn>
<fn>
  <value>John </value>
  <value> Doe</value>
</fn>
<fn>
  <value> John </value>
  <value> Doe </value>
</fn>

become

|John Doe|

but this:

<fn>
  <value>John</value>
  <value>Doe</value>
</fn>

becomes

|JohnDoe|

Does that sound right?

Michael Kaply
Firefox Advocate
mkaply@us.ibm.com
http://www.kaply.com/weblog/ (External Blog)
http://blogs.tap.ibm.com/weblogs/page/mkaply@us.ibm.com (Internal Blog)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://microformats.org/discuss/mail/microformats-dev/attachments/20080422/8ba8eeec/attachment.html

From msporny at digitalbazaar.com Tue Apr 22 12:37:56 2008
From: msporny at digitalbazaar.com (Manu Sporny)
Date: Tue Apr 22 12:38:20 2008
Subject: [uf-dev] Proper use of value
In-Reply-To: <OF0C855401.4FD0009F-ON86257433.0067F6D6-86257433.0068694E@us.ibm.com>
References: <OF0C855401.4FD0009F-ON86257433.0067F6D6-86257433.0068694E@us.ibm.com>
Message-ID: <480E3E94.6050209@digitalbazaar.com>

Michael Kaply wrote:
> OK, how about this.
>
> When retrieving individual values from the document, if there is any
> whitespace, it is collapsed into one space, and leading and trailing
> white space is NOT removed.

Just my $0.02 on this - we had a very involved discussion (lasting several months) when tackling this problem at the W3C with regards to how to do whitespace canonicalization in RDFa.
In the end, we stated that the parser should keep the original text as is (including all whitespace), and it's up to the application to normalize spaces in a way that makes sense to the application. Note that we make a strong distinction between the parser (eg: librdfa[1]) and the application using the parser (Firefox + Fuzzbot[2]). The primary reasoning for this is that several people had different ways that they wanted to canonicalize whitespace and at the end of the day, we didn't want to force application writers into a certain method of whitespace canonicalization. Here's the actual text that we settled upon at the W3C with regard to whitespace canonicalization: PLAIN LITERAL (aka: basic text) CANONICALIZATION: "The actual literal is ... a string created by concatenating the text content of each of the descendant elements of the [current element] in document order." This means that all new lines, tabs, spaces and other whitespace characters are preserved for processing at a later time by the application that is using the parser. I think the above is the proper approach - otherwise you end up with the issues that we had with whitespace canonicalization and Internet Explorer 6. IE6 assumes that you want the whitespace canonicalized in a certain way, thus the non-canonicalized whitespace isn't available in the DOM accessed via Javascript. When you choose to perform whitespace canonicalization in a certain way - you're bound to tick off a sub-set of developers/authors. :) Does this approach sound like a better one to take? -- manu [1] http://rdfa.digitalbazaar.com/librdfa/ [2] http://rdfa.digitalbazaar.com/fuzzbot/ -- Manu Sporny President/CEO - Digital Bazaar, Inc. 
blog: RDFa Basics in 8 minutes (video) http://blog.digitalbazaar.com/2008/01/07/rdfa-basics/ From rff.rff at gmail.com Tue Apr 22 12:41:10 2008 From: rff.rff at gmail.com (gabriele renzi) Date: Tue Apr 22 12:41:16 2008 Subject: [uf-dev] Proper use of value In-Reply-To: <OF0C855401.4FD0009F-ON86257433.0067F6D6-86257433.0068694E@us.ibm.com> References: <OF0C855401.4FD0009F-ON86257433.0067F6D6-86257433.0068694E@us.ibm.com> Message-ID: <828083e70804221241l27b42de2h809ebbe53257aef7@mail.gmail.com> On Tue, Apr 22, 2008 at 8:00 PM, Michael Kaply <mkaply@us.ibm.com> wrote: > > > OK, how about this. > > When retrieving individual values from the documenting if there is any > whitespace, it is collapsed into one space, and leading and trailing white > space is NOT removed. > > After the values have been concatenated to create the final value, if there > is any whitespace, it is collapsed into one space, and leading and trailing > whitespace IS removed. Isn't the first pass of removing multiple spaces implicit in the second pass? Is it different from just saying * concat all values * collapse whitespaces * trim ? anyway my modest opinion as an incompetent who joined this list just few days ago is that this sound correct :) From mail at tobyinkster.co.uk Tue Apr 22 13:38:47 2008 From: mail at tobyinkster.co.uk (Toby A Inkster) Date: Tue Apr 22 13:38:56 2008 Subject: [uf-dev] Proper use of value Message-ID: <0F65BA62-AEDC-4FB6-853F-B2ABB032BA2D@tobyinkster.co.uk> Manu Sporny wrote: > Just my $0.02 on this - we had a very involved discussion (lasting > several months) when tackling this problem at the W3C with regards to > how to do whitespace canonicalization in RDFa. In the end, we stated > that the parser should keep the original text as is (including all > whitespace), and it's up to the application to normalize spaces in > a way > that makes sense to the application. Unfortunately for some microformats, the parser *needs* to know about white space. 
The example which springs to mind is N-optimisation in hCard.

This:

<span class="fn">JohnDoe</span>

is parsed as:

FN:JohnDoe
NICKNAME:JohnDoe

Whereas this:

<span class="fn">John Doe</span>

is parsed as:

FN:John Doe
N:Doe;John

In RDF terms, the white space in the object literal affects the choice of predicate. So it is important to know how white space should be interpreted, at least in some situations.

--
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>

From msporny at digitalbazaar.com Tue Apr 22 19:00:07 2008
From: msporny at digitalbazaar.com (Manu Sporny)
Date: Tue Apr 22 19:26:01 2008
Subject: [uf-dev] Proper use of value
In-Reply-To: <0F65BA62-AEDC-4FB6-853F-B2ABB032BA2D@tobyinkster.co.uk>
References: <0F65BA62-AEDC-4FB6-853F-B2ABB032BA2D@tobyinkster.co.uk>
Message-ID: <480E9827.4020300@digitalbazaar.com>

Toby A Inkster wrote:
> Unfortunately for some microformats, the parser *needs* to know about
> white space. The example which springs to mind is N-optimisation in
> hCard.

Hmm... That's not evident to me. I understand your point, and it's certainly valid - but there's a nuance. To say that the parser "*needs* to know about whitespace" is different from saying that "we should preserve the original whitespace". We can have both.

My previous post stated differently could read: "As a general rule, we should preserve any and all whitespace in the parser model. Only when the information is displayed or exported from the parser model should we canonicalize whitespace, and only when it makes sense to do so."

> This:
>
> <span class="fn">JohnDoe</span>
>
> is parsed as:
>
> FN:JohnDoe
> NICKNAME:JohnDoe
>
> Whereas this:
>
> <span class="fn">John Doe</span>
>
> is parsed as:
>
> FN:John Doe
> N:Doe;John
>
> In RDF terms, the white space in the object literal affects the choice
> of predicate. So it is important to know how white space should be
> interpreted, at least in some situations.

I don't think the above is a good example.
I'm racking my brain to come up with a reason to canonicalize whitespace in the parser. I don't think throwing away the original stuff buys us anything. For example: <span class="fn"> John Doe </span> <span class="fn">John Doe</span> Both of the above would parse to: FN:John Doe N:Doe;John However, I think the proper thing to give the developer back when they ask for the contents of FN should be " John Doe ". The application can then make the decision to canonicalize the whitespace when a) displaying it in an interface or b) exporting it to another format, such as VCARD. As far as the example you gave above... I would expect that the hCard optimization step would be performed after the parser acquired all of the data from the page. FN would contain " John Doe ", and thus the N-optimization would trim all whitespace, split the string and encode it as "Doe;John". In other words, N-optimization is a post-processing step performed after the parser-proper runs. -- manu -- Manu Sporny President/CEO - Digital Bazaar, Inc. blog: RDFa Basics in 8 minutes (video) http://blog.digitalbazaar.com/2008/01/07/rdfa-basics/ From mkaply at us.ibm.com Wed Apr 23 09:11:01 2008 From: mkaply at us.ibm.com (Michael Kaply) Date: Wed Apr 23 13:41:17 2008 Subject: [uf-dev] Proper use of value In-Reply-To: <480E9827.4020300@digitalbazaar.com> Message-ID: <OFD2BF50DC.6AB2156C-ON86257434.0058AC00-86257434.0058E641@us.ibm.com> I think the RDF situation is very different than the microformats with regards to the whitespace problem. With microformats, you are adding the microformat classes to existing content, so you are probably putting them around a lot of various whitespace (carriage returns, line feed, etc.) With RDF, things are done a little more granular. I think parsers should definitely remove the whitespace because what we are making available should equate to the HTML content and the HTML content has whitespace collapsed and removed. 
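For reference, the two-pass rule proposed earlier in this thread (collapse inside each class="value" element without trimming, join with nothing in between, then collapse and trim the result) can be sketched roughly as below. This is only an illustration, not code from any of the parsers discussed: the function names are made up, and the input is assumed to be the already-extracted text of each class="value" element rather than a DOM. The second function is the shorter "concat, collapse, trim" variant suggested in the thread, included so the two can be compared.

```python
import re

# Any run of white space (spaces, tabs, newlines) counts as one "expanse".
_WS_RUN = re.compile(r"\s+")


def collapse(text):
    """Collapse each run of white space into a single space (no trimming)."""
    return _WS_RUN.sub(" ", text)


def combine_values(values):
    """Two-pass rule: per-value collapse, join, then collapse and trim.

    `values` is assumed to be the text content of each class="value"
    element, in document order (hypothetical input shape).
    """
    parts = [collapse(v) for v in values]      # pass 1: collapse, keep edges
    combined = "".join(parts)                  # join without adding spaces
    return collapse(combined).strip(" ")       # pass 2: collapse and trim


def combine_values_single_pass(values):
    """The shorter variant: concatenate everything, collapse once, trim."""
    return collapse("".join(values)).strip(" ")


if __name__ == "__main__":
    print("|%s|" % combine_values(["John ", " Doe"]))
    print("|%s|" % combine_values(["John", "Doe"]))
```

On the examples posted in the thread, both functions give the same answer, which suggests the first pass only matters if a parser wants to expose the individual values as well as the combined string.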
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://microformats.org/discuss/mail/microformats-dev/attachments/20080423/58468ac8/attachment.html

From msporny at digitalbazaar.com Wed Apr 23 14:14:04 2008
From: msporny at digitalbazaar.com (Manu Sporny)
Date: Wed Apr 23 14:14:28 2008
Subject: [uf-dev] Proper use of value
In-Reply-To: <OFD2BF50DC.6AB2156C-ON86257434.0058AC00-86257434.0058E641@us.ibm.com>
References: <OFD2BF50DC.6AB2156C-ON86257434.0058AC00-86257434.0058E641@us.ibm.com>
Message-ID: <480FA69C.5000308@digitalbazaar.com>

Michael Kaply wrote:
> I think the RDF situation is very different than the microformats with
> regards to the whitespace problem.
>
> With microformats, you are adding the microformat classes to existing
> content, so you are
> probably putting them around a lot of various whitespace (carriage
> returns, line feed, etc.)

Hmm... do you mean RDF or RDFa? :)

If you mean RDF, then yes I agree - the two situations are very different. If you mean RDFa, then I don't agree as insertion of RDFa and Microformats into pre-existing XHTML is done in more-or-less the same way. The majority of the RDFa use cases have RDFa added to existing XHTML web pages... so I believe the same whitespace issues exist for RDFa as they do for Microformats.

> I think parsers should definitely remove the whitespace because what we
> are making available should equate to the
> HTML content and the HTML content has whitespace collapsed and removed.

What about PRE tags? Or the use of any CSS 'white-space'[1] style that isn't 'normal'. This is important in poetry and other pre-formatted text on the net.

For example:

<span style="white-space: pre-line">
A crash reduces
Your expensive computer
To a simple stone.
</span>

By stating that uF parsers should remove whitespace, we're unnecessarily invalidating all of those use cases.
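To make the parser/application split argued for in this thread a bit more concrete, here is a rough sketch, under stated assumptions, of a parser that stores the text untouched and a separate post-processing step that normalizes white space and then applies a much simplified N-optimisation. The dictionary layout and function names are invented for illustration, and the rule shown (two words give N, one word gives NICKNAME) leaves out the other cases the hCard spec covers.

```python
import re


def parse_fn(raw_text):
    """Hypothetical parser step: keep the text exactly as found."""
    return {"fn_raw": raw_text}


def normalize(text):
    """Application-side choice: collapse white space runs, then trim."""
    return re.sub(r"\s+", " ", text).strip(" ")


def n_optimize(model):
    """Simplified N-optimisation run *after* parsing (assumed rule set).

    Two words  -> treat as "given family" and emit N as "family;given".
    One word   -> treat the name as a nickname.
    Otherwise  -> emit FN only.
    """
    fn = normalize(model["fn_raw"])
    if not fn:
        return {}
    words = fn.split(" ")
    if len(words) == 2:
        given, family = words
        return {"FN": fn, "N": "%s;%s" % (family, given)}
    if len(words) == 1:
        return {"FN": fn, "NICKNAME": fn}
    return {"FN": fn}


if __name__ == "__main__":
    model = parse_fn("  John   Doe \n")
    print(repr(model["fn_raw"]))   # raw text is still available
    print(n_optimize(model))
```

With this split, pre-formatted content such as the haiku above stays intact in the model, and only consumers that want collapsed text (a vCard export, say) pay for the normalization.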
-- manu [1] http://webdesign.about.com/od/styleproperties/p/blspwhitespace.htm From brian.suda at gmail.com Thu Apr 24 04:17:40 2008 From: brian.suda at gmail.com (Brian Suda) Date: Thu Apr 24 04:17:45 2008 Subject: [uf-dev] Proper use of value In-Reply-To: <OF0C855401.4FD0009F-ON86257433.0067F6D6-86257433.0068694E@us.ibm.com> References: <OF0C855401.4FD0009F-ON86257433.0067F6D6-86257433.0068694E@us.ibm.com> Message-ID: <21e770780804240417jb245e75o5389e7bbdb6614be@mail.gmail.com> 2008/4/22, Michael Kaply <mkaply@us.ibm.com>: > OK, how about this. > So all of these: > > <fn> > <value>John</value> > <value> </value> > <value>Doe</value> > </fn> > <fn> > <value>John</value> > <value> </value> > <value>Doe</value> > </fn> > <fn> > <value> John</value> > <value> </value> > <value>Doe </value> > </fn> > <fn> > <value>John </value> > <value> </value> > <value> Doe</value> > </fn> > <fn> > <value>John </value> > <value> Doe</value> > </fn> > <fn> > <value> John </value> > <value> Doe </value> > </fn> > > become > > |John Doe| > > but this: > > <fn> > <value>John</value> > <value>Doe</value> > </fn> > > becomes > > > |JohnDoe| > > Does that sound right? --- i agree, this is what i personally would expect. It would need to be codified somehow, but (i think) this is what X2V already does. We could make a simple test page and add it to the test suite if you think it would help? -brian -- brian suda http://suda.co.uk From mdagn at spraci.com Mon Apr 28 21:01:12 2008 From: mdagn at spraci.com (Michael MD) Date: Mon Apr 28 21:01:15 2008 Subject: [uf-dev] Proper use of value References: <OFD2BF50DC.6AB2156C-ON86257434.0058AC00-86257434.0058E641@us.ibm.com> <480FA69C.5000308@digitalbazaar.com> Message-ID: <002601c8a9ad$aaa11960$116bacca@COMCEN> > What about PRE tags? Or the use of any CSS 'white-space'[1] style that > isn't 'normal'. This is important in poetry and other pre-formatted text > on the net. 
>
> For example:
>
> <span style="white-space: pre-line">
> A crash reduces
> Your expensive computer
> To a simple stone.
> </span>
>
> By stating that uF parsers should remove whitespace, we're unnecessarily
> invalidating all of those use cases.

It's a tricky one ... I can think of some cases where removing whitespace can be a problem and others where keeping it is a problem...

Perhaps a new line should be treated differently to something like a space or tab?

...or perhaps it's better to preserve them in the parser and let the application handle them in an appropriate way?

From contact at lumieredelune.com Tue Apr 29 12:16:30 2008
From: contact at lumieredelune.com (Lumière de Lune)
Date: Tue Apr 29 12:16:20 2008
Subject: [uf-dev] Problems with importation of a hcard as a vCard in Outlook
Message-ID: <00e101c8aa2d$8914c870$6701a8c0@PARACOU>

Hello,

I'm not sure this is the right place to post, hope I did not make the wrong choice?

I'm currently creating a hCard for my website. The website is in XHTML and utf-8.

And the hCard has an accented character: è

I tried with the three options possible in the source code (è, &egrave; and &#232;) and the two import protocols of Technorati and X2V, and both produce "strange" characters in Outlook. Strange meaning the kind of wrong character you get when you've got the wrong encoding.

I also noticed that on the Wiki, or other sites, I could not find any example of a hCard with accents.

Any idea to solve this problem would be highly appreciated.

--
Marie-Aude
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://microformats.org/discuss/mail/microformats-dev/attachments/20080429/c7ff0712/attachment-0001.html

From contact at lumieredelune.com Tue Apr 29 12:22:28 2008
From: contact at lumieredelune.com (Lumière de Lune)
Date: Tue Apr 29 12:22:15 2008
Subject: [uf-dev] Problems with importation of a hcard as a vCard in Outlook
Message-ID: <00ed01c8aa2e$5cfc3830$6701a8c0@PARACOU>

From jason.karns at gmail.com Tue Apr 29 12:57:33 2008
From: jason.karns at gmail.com (Jason Karns)
Date: Tue Apr 29 12:57:37 2008
Subject: [uf-dev] Include-Pattern Infinite Loop Test Cases
Message-ID: <1005d65f0804291257x35022f49vdf96a4499796bfc7@mail.gmail.com>

I've been working on a simple JavaScript pre-parser of sorts. It is designed to follow all include references (local references only, of course) and produces a DOM with all includes replaced by the referenced subtrees.

This is a call to all current microformat parser implementers to produce infinite loop test cases so that I might fully test my implementation before porting it to other languages.

If successful, I plan to post the algorithm as well as various language implementations in the hope that existing tools may be able to easily add support for the include-pattern, without falling back to arbitrary max-recursion numbers.

Thanks,
Jason Karns

From brian.suda at gmail.com Tue Apr 29 16:38:53 2008
From: brian.suda at gmail.com (Brian Suda)
Date: Tue Apr 29 16:38:57 2008
Subject: [uf-dev] Problems with importation of a hcard as a vCard in Outlook
In-Reply-To: <00e101c8aa2d$8914c870$6701a8c0@PARACOU>
References: <00e101c8aa2d$8914c870$6701a8c0@PARACOU>
Message-ID: <21e770780804291638x415f84b8of832aabae0010fa2@mail.gmail.com>

2008/4/29, Lumière de Lune <contact@lumieredelune.com>:
> Hello,
> I'm currently creating a hCard for my website. The website is in XHTML and
> utf-8

--- do you have a public url we could test against?

> And the hCard has an accented character: è
>
> I tried with the three options possible in the source code (è, &egrave; and
> &#232;) and the two import protocols of Technorati and X2V, and both produce
> "strange" characters in Outlook.

--- once we have a url, we can test to see if this is an issue with the transformation or with Outlook. Which version of Outlook are you using?

There is a list of known issues here:
http://microformats.org/wiki/vcard-implementations

-brian

--
brian suda
http://suda.co.uk

From contact at lumieredelune.com Tue Apr 29 17:42:39 2008
From: contact at lumieredelune.com (Lumiere de Lune)
Date: Tue Apr 29 17:42:48 2008
Subject: [uf-dev] Problems with importation of a hcard as a vCard inOutlook
In-Reply-To: <21e770780804291638x415f84b8of832aabae0010fa2@mail.gmail.com>
References: <00e101c8aa2d$8914c870$6701a8c0@PARACOU> <21e770780804291638x415f84b8of832aabae0010fa2@mail.gmail.com>
Message-ID: <010a01c8aa5b$181ba570$6701a8c0@PARACOU>

2008/4/29, Brian Suda said

>do you have a public url we could test against?

Now yes (it was on localhost):
http://www.lumieredelune.com/res/tpl/vcardTest.php

>once we have a url, we can test to see if this is an issue with
>the transformation or with Outlook. Which version of Outlook are you
>using?

I'm using Outlook 2003 on XP SP2, with a French system. I asked two friends, one with Outlook 2007 and SP and a German system, and one with Outlook 2003, XP and an English system, and both of them experience the same problem.

>There is a list of known issues here:
>http://microformats.org/wiki/vcard-implementations

Is it better to post directly there?

Thank you for your help

--
Marie-Aude
http://www.lumieredelune.com

From gordon at onlinehome.de Wed Apr 30 00:15:33 2008
From: gordon at onlinehome.de (Gordon Oheim)
Date: Wed Apr 30 00:20:53 2008
Subject: [uf-dev] Problems with importation of a hcard as a vCard inOutlook
Message-ID: <48181C95.4070802@onlinehome.de>

I have encountered the same issue just recently.
I don't think it is an issue with Brian's script. If you save the generated vCard to your desktop and open it with Notepad, all characters are fine. If you open the vCard on a Mac, it is fine too. It is only when you open the vCard with Windows Address Book or Outlook that the characters are broken. Probably due to the encoding used within Outlook. This is a common problem when using UTF-8 encoded content in Windows applications. Cheers, Gordon From mail at tobyinkster.co.uk Wed Apr 30 00:30:29 2008 From: mail at tobyinkster.co.uk (Toby A Inkster) Date: Wed Apr 30 00:30:42 2008 Subject: [uf-dev] Problems with importation of a hcard as a vCard inOutlook Message-ID: <BA21E673-A9AA-4F6E-BBD6-4E80F31E2B96@tobyinkster.co.uk> Lumiere de Lune wrote: > http://www.lumieredelune.com/res/tpl/vcardTest.php It does appear to be an Outlook-specific error. I've tried converting to vCard with both Cognition and X2V and adding to Apple Address Book, and the accent in the organisation name is imported perfectly. -- Toby A Inkster <mailto:mail@tobyinkster.co.uk> <http://tobyinkster.co.uk>
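A small sketch of what is probably going on, assuming (as suggested above, and not verified against Outlook itself) that the UTF-8 vCard is being read in the local Windows code page. The first function reproduces the "strange" characters by decoding UTF-8 bytes as Windows-1252; the second is a hypothetical workaround that re-encodes the file for a reader that expects that code page. The ORG value is only sample data, and both function names are made up for this illustration.

```python
def misread_as_cp1252(text):
    """Encode as UTF-8, then decode as Windows-1252 (the suspected mix-up)."""
    return text.encode("utf-8").decode("cp1252", errors="replace")


def reencode_for_legacy_reader(vcard_utf8):
    """Hypothetical workaround: turn UTF-8 vCard bytes into Windows-1252.

    Characters with no Windows-1252 equivalent are replaced with "?",
    so this is lossy for anything outside that code page.
    """
    return vcard_utf8.decode("utf-8").encode("cp1252", errors="replace")


if __name__ == "__main__":
    org_line = "ORG:Lumi\u00e8re de Lune"            # sample line, not a full vCard
    print(misread_as_cp1252(org_line))               # shows the mangled accent
    print(reencode_for_legacy_reader(org_line.encode("utf-8")))
```

If the mangled output matches what Outlook shows, that would point at the import side rather than at the hCard-to-vCard conversion; whether a re-encoded file then imports cleanly would still need testing on the affected systems.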