From andy at pigsonthewing.org.uk Tue Jan 1 18:28:51 2008
From: andy at pigsonthewing.org.uk (Andy Mabbett)
Date: Tue Jan 1 22:00:27 2008
Subject: [uf-dev] abbr-design-pattern within microformat attribute values
In-Reply-To: <21e770780801011625k42e893ffy139fff7f354a0206@mail.gmail.com>
References: <21e770780801011625k42e893ffy139fff7f354a0206@mail.gmail.com>
Message-ID: <13$XkDdjbveHFwCg@pigsonthewing.org.uk>

[Cross-posted, so original quoted in full]

In message
<21e770780801011625k42e893ffy139fff7f354a0206@mail.gmail.com>, Brian
Suda writes

>On 01/01/2008, Andy Mabbett wrote:
>> In message , Andy Mabbett
>> writes
>>
>>I have a page at:
>>
>>
>>
>>which uses the prototype "species" microformat.
>>
>>Operator returns for example:
>>
>>    trinominal=Mergus m. merganser
>>
>>from source code:
>>
>>
>>    Mergus
>>    m.
>>    merganser
>>
>>
>>which I would have expected to return:
>>
>>    trinominal=Mergus merganser merganser
>>
>>since that is the "semantic" meaning of the cited source-code.
>>
>>However, I don't think the abbr-design-pattern specification is clear
>>on this point.
>>
>>Am I correct? If so, should the abbr-design-pattern specification be
>>updated, and/or an appropriate example be included?
>>
>>How do other parsers handle embedded abbr, in other microformats, such
>>as, say:
>>
>>
>>    New John
>>    St.
>>    West
>>
>>
>>(a real example; compare: &
>>).

>--- this is more of a question for the dev-list.

OK; cross-posted and follow-ups set. I've shown my whole post, above,
for that reason.

> I would disagree that this is the correct interpretation.

For what reason? I can't see the logic behind any other interpretation.

Consider, with the "Implied n Optimization" rule in mind, the
"given-name" in each of:

    F. Smith

and

    F. Smith

Surely they should be the same? Or consider:

    F. Smith

and:

    F. Smith

Again, surely these should be equivalent values?

In the "New John Street West" example, where has the valid and
meaningful data "street" gone, if the abbr is not expanded?
The author clearly intends it to be present.

Again, consider:

    Saint Phil's Church

    Saint Phil's Church

Which of those would the rendered "Saint Phil's Church" represent?

>This has come up before on the dev-list when dealing with looking into
>child-elements.

Citations/URLs would be useful, please. There doesn't seem to be any
record or summary on the wiki.

--
Andy Mabbett

From msporny at digitalbazaar.com Thu Jan 3 21:47:35 2008
From: msporny at digitalbazaar.com (Manu Sporny)
Date: Thu Jan 3 21:47:38 2008
Subject: [uf-dev] JSON representation of semantic objects
Message-ID: <477DC877.7090102@digitalbazaar.com>

We've been struggling with writing user action scripts for Operator for
hAudio. The problem comes in when one has to operate on semantic
objects.

Currently, there are two different ways of representing semantic data in
Operator, one for eRDF/RDFa and another for Microformats. This is quite
a headache for anybody trying to develop an action script that operates
on 'vcards', 'vevents' or 'haudio'.

Ideally, the action script shouldn't care where the semantic object came
from (eRDF/RDFa/uF/whatever).

The problem gets more complicated when applications want to share the
semantic objects that they've discovered with each other. In short,
we don't have a serialization format that I know of... and it would be
nice if we gave developers a framework to follow.

We've been attempting to solve this problem using JSON serialization of
Microformats, with hAudio as a testbed; it seems to be a viable
candidate:

http://microformats.org/wiki/haudio-serialization

We would like to see this implemented in Operator as it will make it
easier to develop user scripts. It will also help Operator export
semantic objects that it detects on pages to other Firefox plug-ins that
consume semantic objects.

So what do people think? There is interest from the W3C/RDFa groups in
solving this problem; would this community be interested in contributing
to the discussion?
Is it time for a serialization format for semantic objects?

-- manu

From ryan at theryanking.com Fri Jan 4 10:58:05 2008
From: ryan at theryanking.com (ryan)
Date: Fri Jan 4 11:04:09 2008
Subject: [uf-dev] JSON representation of semantic objects
In-Reply-To: <477DC877.7090102@digitalbazaar.com>
References: <477DC877.7090102@digitalbazaar.com>
Message-ID: <52D78582-CCB7-4567-AB3F-51A33532A9E2@theryanking.com>

We already have JSON for hreview in the test suite. I'd be willing to
create JSON for hCard and hCalendar, too. In fact, I've done it before
several times (not publicly available), so it's just a matter of writing
it all down.

-ryan

On Jan 3, 2008, at 9:47 PM, Manu Sporny wrote:

> We've been struggling with writing user action scripts for Operator
> for hAudio. The problem comes in when one has to operate on semantic
> objects.
>
> Currently, there are two different ways of representing semantic
> data in Operator, one for eRDF/RDFa and another for Microformats.
> This is quite a headache for anybody trying to develop an action
> script that operates on 'vcards', 'vevents' or 'haudio'.
>
> Ideally, the action script shouldn't care where the semantic object
> came from (eRDF/RDFa/uF/whatever).
>
> The problem gets more complicated when applications want to share the
> semantic objects that they've discovered between each other. In short,
> we don't have a serialization format that I know of... and it would be
> nice if we gave developers a framework to follow.
>
> We've been attempting to solve this problem using JSON
> serialization of Microformats using hAudio as a testbed, it seems to
> be a viable candidate:
>
> http://microformats.org/wiki/haudio-serialization
>
> We would like to see this implemented in Operator as it will make it
> easier to develop user scripts. It will also help Operator export
> semantic objects that it detects on pages to other Firefox plug-ins
> that consume semantic objects.
>
> So what do people think?
> There is interest from the W3C/RDFa groups on
> solving this problem, would this community be interested in
> contributing to the discussion? Is it time for a serialization format
> for semantic objects?
>
> -- manu
>
> _______________________________________________
> microformats-dev mailing list
> microformats-dev@microformats.org
> http://microformats.org/mailman/listinfo/microformats-dev

From msporny at digitalbazaar.com Fri Jan 4 13:12:39 2008
From: msporny at digitalbazaar.com (Manu Sporny)
Date: Fri Jan 4 13:58:11 2008
Subject: [uf-dev] JSON representation of semantic objects
In-Reply-To: <52D78582-CCB7-4567-AB3F-51A33532A9E2@theryanking.com>
References: <477DC877.7090102@digitalbazaar.com>
	<52D78582-CCB7-4567-AB3F-51A33532A9E2@theryanking.com>
Message-ID: <477EA147.5050605@digitalbazaar.com>

ryan wrote:
> We already have JSON for hreview in the test suite. I'd be willing to
> create JSON for hCard and hCalendar, too. In fact, I've done it before
> several times (not publicly available), so it's just a matter of
> writing it all down.

That would be really helpful! Could you do it in this style, just so we
can provide similar pages for all Microformats when/if the time comes?

http://microformats.org/wiki/haudio-serialization

I've put a bit of work into mapping hCard to/from vCard RDF, here:

http://wiki.digitalbazaar.com/en/Mapping-ufs-to-rdfa#hCard_uF_and_vCard_RDFa

There is also a section on mapping hCalendar to/from Vevent RDF, here:

http://wiki.digitalbazaar.com/en/Mapping-ufs-to-rdfa#hCalendar_uF_and_vCal_RDFa

I don't know if you mean JSON-RDF or just a pure JSON representation of
a Microformatted object? Both are provided on the haudio-serialization
page, but we really should get behind one as a community. The JSON-RDF
approach seems to be the right solution because it doesn't lose any
data, but saying that now may be putting the cart before the horse...
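
The difference between the two representations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample data is made up for the example (it borrows the Dublin Core property URIs used on the haudio-serialization page), and `flatten` is a hypothetical helper, not part of Operator or any existing library. It shows what a pure-JSON form keeps and what it drops compared with a JSON-RDF form.

```python
import json

# Hypothetical sample in a JSON-RDF-like shape:
# subject -> predicate URI -> list of value objects.
rdf_json = {
    "_:haudio1": {
        "http://purl.org/dc/elements/1.1/title": [
            {"value": "Start Wearing Purple", "type": "literal", "lang": "en"}
        ],
        "http://purl.org/dc/elements/1.1/contributor": [
            {"value": "Gogol Bordello", "type": "literal", "lang": "en"}
        ],
    }
}


def flatten(subject):
    """Collapse one subject to {short name: first value}.

    The vocabulary prefix, the value type and the language tag are all
    discarded, which is exactly the information loss under discussion.
    """
    flat = {}
    for predicate, objects in subject.items():
        # Keep only the part after the last "/" or "#" of the predicate URI.
        short_name = predicate.rsplit("/", 1)[-1].rsplit("#", 1)[-1]
        flat[short_name] = objects[0]["value"]
    return flat


flat = flatten(rdf_json["_:haudio1"])
print(json.dumps(flat, sort_keys=True))
```

Going from the verbose form to the short one is mechanical; going back is not possible without agreeing somewhere else on which vocabulary, type and language each short key implies.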
-- manu

From kevinmarks at gmail.com Fri Jan 4 14:39:34 2008
From: kevinmarks at gmail.com (Kevin Marks)
Date: Fri Jan 4 14:39:36 2008
Subject: [uf-dev] JSON representation of semantic objects
In-Reply-To: <477EA147.5050605@digitalbazaar.com>
References: <477DC877.7090102@digitalbazaar.com>
	<52D78582-CCB7-4567-AB3F-51A33532A9E2@theryanking.com>
	<477EA147.5050605@digitalbazaar.com>
Message-ID: <73766b160801041439i6ca27e7bk85912c0288364a61@mail.gmail.com>

Both those serializations look remarkably verbose. Why doesn't:
Start Wearing Purple by Gogol Bordello found on Underdog World Strike
just map to:

{"fn":"Start Wearing Purple", "contributor":"Gogol Bordello",
"album":"Underdog World Strike"}

instead of the multi-crufted:

{
  "_:haudio1" : {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" : [
      { "value" : "http://xmlns.com/haudio/1.0/Album", "type" : "uri" }
    ],
    "http://purl.org/dc/elements/1.1/title" : [
      { "value" : "Start Wearing Purple", "type" : "literal", "lang" : "en" }
    ],
    "http://purl.org/dc/elements/1.1/contributor" : [
      { "value" : "Gogol Bordello", "type" : "literal", "lang" : "en" }
    ],
    "http://xmlns.com/hmedia/1.0/contains" : [
      { "value" : "_:haudio2", "type" : "bnode" }
    ]
  },
  "_:haudio2" : {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" : [
      { "value" : "http://xmlns.com/haudio/1.0/Recording", "type" : "uri" }
    ],
    "http://purl.org/dc/elements/1.1/title" : [
      { "value" : "Start Wearing Purple", "type" : "literal", "lang" : "en" }
    ],
    "http://purl.org/dc/elements/1.1/contributor" : [
      { "value" : "Gogol Bordello", "type" : "literal", "lang" : "en" }
    ]
  }
}

On Jan 4, 2008 1:12 PM, Manu Sporny wrote:
>
> ryan wrote:
> > We already have JSON for hreview in the test suite. I'd be willing to
> > create JSON for hCard and hCalendar, too. In fact, I've done it before
> > several times (not publicly available), so it's just a matter of
> > writing it all down.
>
> That would be really helpful! Could you do it in this style, just so we
> can provide similar pages for all Microformats when/if the time comes?:
>
> http://microformats.org/wiki/haudio-serialization
>
> I've put a bit of work into mapping hCard to/from Vcard RDF, here:
>
> http://wiki.digitalbazaar.com/en/Mapping-ufs-to-rdfa#hCard_uF_and_vCard_RDFa
>
> There is also a section on mapping hCalendar to/from Vevent RDF, here:
>
> http://wiki.digitalbazaar.com/en/Mapping-ufs-to-rdfa#hCalendar_uF_and_vCal_RDFa
>
> I don't know if you mean JSON-RDF or just a pure JSON representation of
> a Microformatted object?
> Both are provided for the haudio-serialization
> page, but we really should get behind one as a community. The JSON-RDF
> approach seems to be the right solution because it doesn't lose any
> data, but saying that now may be putting the cart before the horse...
>
> -- manu
>
> _______________________________________________
> microformats-dev mailing list
> microformats-dev@microformats.org
> http://microformats.org/mailman/listinfo/microformats-dev
>

From msporny at digitalbazaar.com Fri Jan 4 20:06:48 2008
From: msporny at digitalbazaar.com (Manu Sporny)
Date: Fri Jan 4 20:27:57 2008
Subject: [uf-dev] JSON representation of semantic objects
In-Reply-To: <73766b160801041439i6ca27e7bk85912c0288364a61@mail.gmail.com>
References: <477DC877.7090102@digitalbazaar.com>
	<52D78582-CCB7-4567-AB3F-51A33532A9E2@theryanking.com>
	<477EA147.5050605@digitalbazaar.com>
	<73766b160801041439i6ca27e7bk85912c0288364a61@mail.gmail.com>
Message-ID: <477F0258.8060902@digitalbazaar.com>

Kevin Marks wrote:
> Both those serializations look remarkably verbose. Why doesn't:
>
>
> Start Wearing Purple by > Gogol Bordello > found on > Underdog World Strike >
>
> just map to:
>
> {"fn":"Start Wearing Purple", "contributor":"Gogol Bordello",
> "album":"Underdog World Strike"}

Several reasons:

1. We're going for a format that encapsulates all of the information
   that an RDFa, eRDF or Microformats parser can generate. This format
   is for developers - not for publishers, designers nor everyday folks.
   Developers don't like it when you start chucking away data that
   should be accessible to them (such as the encoding language of a data
   item, or the type of a data item).

2. RDFa and eRDF can have a variety of vocabularies, of which
   Microformats are a semantic subset. We can't boil
   "http://purl.org/dc/elements/1.1/title" down to "title" or "fn" for
   an RDF semantic object because we lose meaning... which is bad.

3. The serialization approach you give is not scalable - or rather, to
   make it scale, we have to come up with another representation that
   the eRDF, RDFa and Microformats communities must agree on. It's
   fairly straightforward to map Microformats to RDF vocabularies - the
   opposite however, is quite difficult.

While the example you give is the simplest representation of a semantic
object, it is not good enough if we want to do an acceptable job of
representing semantic objects generated by the RDFa and eRDF parsers in
Operator (and other applications to follow).

-- manu

From uf-discuss at cilux.org Sun Jan 6 14:52:54 2008
From: uf-discuss at cilux.org (Duncan Cragg)
Date: Sun Jan 6 14:52:59 2008
Subject: [uf-dev] JSON representation of semantic objects
In-Reply-To: <477F0258.8060902@digitalbazaar.com>
References: <477DC877.7090102@digitalbazaar.com>
	<52D78582-CCB7-4567-AB3F-51A33532A9E2@theryanking.com>
	<477EA147.5050605@digitalbazaar.com>
	<73766b160801041439i6ca27e7bk85912c0288364a61@mail.gmail.com>
	<477F0258.8060902@digitalbazaar.com>
Message-ID: <5b5fe14a0801061452m6db91f89p91f0ccc557d4b0c9@mail.gmail.com>

Kevin - you missed off the wrapper - 'haudio' itself.

Would this be simple enough still?
{ "haudio": {"fn":"Start Wearing Purple",
             "contributor":"Gogol Bordello",
             "album":"Underdog World Strike" } }

The wrapping tag provides context that disambiguates the sub-elements ..

.. and that could be the crux of this whole debate!

> Several reasons:
> 1. ..
> 2. ..
> 3. ..
> While the example you give is the simplest representation of a semantic
> object, it is not good enough ..

Any lossy serialisation depends on common understanding and agreement to
fill in the missing information with assumption and convention.

In other words, once triggered by 'haudio', it is possible for the
simple representation to carry as much actual /information/ as the
complex one.

This is related to the 'no namespaces' arguments we all know.

It depends on the volume of prior, mutual and globally shared agreement
you can achieve.

It's going to be a matter of feeling, not logic, I suspect, on which
side you fall - but anyone venturing onto uf-discuss with discussion
around 'competing' technologies needs to know they're not likely to meet
people who feel the same on core issues. =0)

_________________________________
Duncan Cragg
Web Application Architect
The Financial Times Group (UK)
http://www.ft.com
http://duncan-cragg.org/blog/

From andy at pigsonthewing.org.uk Sun Jan 6 14:56:23 2008
From: andy at pigsonthewing.org.uk (Andy Mabbett)
Date: Sun Jan 6 14:57:39 2008
Subject: [uf-dev] Technorati events tool: suggested improvement
Message-ID:

It would be good if the Technorati events tool:

could recognise the IDs of an individual event's hCalendar:

(e.g. )

and provide an .ics file with just that event, rather than all the
events on the page (as does X2V).
--
Andy Mabbett

From ryan at theryanking.com Sun Jan 6 17:23:55 2008
From: ryan at theryanking.com (ryan)
Date: Sun Jan 6 17:23:58 2008
Subject: [uf-dev] Technorati events tool: suggested improvement
In-Reply-To:
References:
Message-ID:

On Jan 6, 2008, at 2:56 PM, Andy Mabbett wrote:

>
> It would be good if the Technorati events tool:
>
>
>
> could recognise the IDs of an individual event's hCalendar:
>
> (e.g. 2008/01.htm#D20080110a>)
>
> and provide an .ics file with just that event, rather than all the
> events on the page (as does X2V).

They both already do this (they run the same code):

http://feeds.technorati.com/events/http%3A//www.westmidlandbirdclub.com/diary/2008/01.htm%23D20080110a

-ryan

From lists at allinthehead.com Mon Jan 7 09:57:14 2008
From: lists at allinthehead.com (Drew McLellan)
Date: Mon Jan 7 12:21:34 2008
Subject: [uf-dev] hkit parser (PHP5) now on Google Code
Message-ID: <24172403-A095-4260-B2DF-7A4BE4F075A8@allinthehead.com>

In order to give more visibility to the progress being made and to
enable easier contributions, hkit is now hosted on Google Code:

http://hkit.googlecode.com/

There's a public svn repository where the very latest updates can be
grabbed for anyone working on the throbbing edge. If anyone's interested
in contributing profiles or patches, there's a wiki with a roadmap and
known issues.

The license is the same as before - basically it's all just a bit more
visible and better organised.

drew.

From andy at pigsonthewing.org.uk Mon Jan 7 13:34:02 2008
From: andy at pigsonthewing.org.uk (Andy Mabbett)
Date: Mon Jan 7 13:35:46 2008
Subject: [uf-dev] Technorati events tool: suggested improvement
In-Reply-To:
References:
Message-ID: <+lL1guOKrpgHFwTF@pigsonthewing.org.uk>

In message , ryan writes

>On Jan 6, 2008, at 2:56 PM, Andy Mabbett wrote:
>>
>> It would be good if the Technorati events tool:
>>
>>
>>
>> could recognise the IDs of an individual event's hCalendar:
>>
>> (e.g.
>2008/01.htm#D20080110a>)
>>
>> and provide an .ics file with just that event, rather than all the
>> events on the page (as does X2V).
>
>They both already do this

I've just tried that again, and, as before, it's returning all the
events on the page.

>(they run the same code)

I was under the impression that Technorati used an earlier release of
the code.

--
Andy Mabbett

From scott at randomchaos.com Mon Jan 7 14:55:39 2008
From: scott at randomchaos.com (Scott Reynen)
Date: Mon Jan 7 14:55:50 2008
Subject: [uf-dev] Technorati events tool: suggested improvement
In-Reply-To:
References:
Message-ID:

On Jan 6, 2008, at 6:23 PM, ryan wrote:

> http://feeds.technorati.com/events/http%3A//www.westmidlandbirdclub.com/diary/2008/01.htm%23D20080110a

On Jan 7, 2008, at 2:34 PM, Andy Mabbett wrote:

> I've just tried that again, and, as before, it's returning all the
> events on the page.

I just tried the link above and it returned a single event. I'm
guessing you're instead entering the URL into Technorati's form, which
wrongly unescapes the # in the submitted URL, so it's never passed to
the server.

Peace,
Scott

From brian.suda at gmail.com Wed Jan 16 02:05:11 2008
From: brian.suda at gmail.com (Brian Suda)
Date: Wed Jan 16 02:05:15 2008
Subject: [uf-dev] ADR Multiple instances of children
Message-ID: <21e770780801160205w45c663cer4dbecb8d8706915f@mail.gmail.com>

I have been writing some code dealing with the ADR and hCard
microformats and ran into something that i thought was documented, but i
can't seem to find a reference. If we can get a quick consensus, then i
can update the wiki as needed.

Specifically, children properties of an ADR (street-address,
extended-address, etc.) Some references say these can be 0-1 instance,
others say 0-N. The vCard spec is not exactly clear, all it says is:

  The text components are separated by the SEMI-COLON character (ASCII
  decimal 59).
  Where it makes semantic sense, individual text components can include
  multiple text values (e.g., a "street" component with multiple lines)
  separated by the COMMA character (ASCII decimal 44).

So in a vCard a multiple extended-address could be something like:
"Building A,Suite 1"
and in HTML it would be
Building A, Suite 1

Basically, any parser that finds multiple instances of children ADR
properties, should just concatenate them together with a ','

I know we talked about this before, but i can't find a reference. We do
have a test case, but it is only for street-address (we need to formally
write what happens for the other terms).

Thanks,
-brian

--
brian suda
http://suda.co.uk

From andy at pigsonthewing.org.uk Wed Jan 16 03:03:05 2008
From: andy at pigsonthewing.org.uk (Andy Mabbett)
Date: Wed Jan 16 03:03:08 2008
Subject: [uf-dev] ADR Multiple instances of children
In-Reply-To: <21e770780801160205w45c663cer4dbecb8d8706915f@mail.gmail.com>
References: <21e770780801160205w45c663cer4dbecb8d8706915f@mail.gmail.com>
Message-ID: <52671.80.249.57.38.1200481385.squirrel@www.gradwell.com>

On Wed, January 16, 2008 10:05, Brian Suda wrote:

> If we can get a quick consensus, then i can update the wiki as needed.

[...]

> Basically, any parser that finds multiple instances of children ADR
> properties, should just concatenate them together with a ','

So long as it's clearly documented (perhaps as "MUST" rather than
"should"?), I would support that, so:

    +1

It could also aid the current "fn + [place] optimisation" proposal:

    Adrian Boult Hall, second floor

    fn = Adrian Boult Hall
    extended-address = Adrian Boult Hall, second floor

> I know we talked about this before, but i can't find a reference. We
> do have a test case, but it is only for street-address (we need formally
> write what happens for the other terms).

Would there be any "adr-children" where this did not work?
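
The concatenation rule proposed above can be sketched in a few lines of Python. This is a minimal illustration, not code from any existing parser: `join_adr_values` is a hypothetical helper, and it assumes that a literal comma, semicolon or backslash inside a single value is backslash-escaped (as vCard text values require) so that it cannot be mistaken for a separator.

```python
def join_adr_values(values):
    """Join repeated instances of one ADR sub-property into one component.

    Each instance becomes one text value; values are separated by a bare
    comma. Characters that would otherwise act as separators are escaped.
    """
    escaped = []
    for value in values:
        text = value.strip()
        text = text.replace("\\", "\\\\")                      # escape backslashes first
        text = text.replace(",", "\\,").replace(";", "\\;")    # then the separators
        escaped.append(text)
    return ",".join(escaped)


# Two extended-address instances become a two-value component.
print(join_adr_values(["Building A", "Suite 1"]))
# A single instance that contains a comma stays a single value.
print(join_adr_values(["Suite, 110"]))
```

The escaping step is what keeps "one value containing a comma" distinguishable from "two values" once the component is written out.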
--
Andy Mabbett
** via webmail **

From spike at tenbus.co.uk Wed Jan 16 11:34:14 2008
From: spike at tenbus.co.uk (Webadmin - Tenbus)
Date: Wed Jan 16 11:34:19 2008
Subject: [uf-dev] ADR Multiple instances of children
In-Reply-To: <21e770780801160205w45c663cer4dbecb8d8706915f@mail.gmail.com>
References: <21e770780801160205w45c663cer4dbecb8d8706915f@mail.gmail.com>
Message-ID: <478E5C36.1040700@tenbus.co.uk>

Brian Suda wrote:
> Specifically, children properties of an ADR (street-address,
> extended-address, etc.) Some references say these can be 0-1 instance,
> others say 0-N. The vCard spec is not exactly clear, all it says is:
>
>   The text components are separated by the SEMI-COLON character (ASCII
>   decimal 59). Where it makes semantic sense, individual text
>   components can include multiple text values (e.g., a "street"
>   component with multiple lines) separated by the COMMA character
>   (ASCII decimal 44).
>
> So in a vCard a multiple extended-address could be something like:
> "Building A,Suite 1"
> and in HTML it would be
> Building A, class="extended-address">Suite 1
>
> Basically, any parser that finds multiple instances of children ADR
> properties, should just concatenate them together with a ','
>

I would certainly support this suggestion, Brian! I would also like this
to apply to the locality child as well.

In the UK we frequently have an address entity that fits somewhere
between the locality and street-address. On my own websites I've
invented a field (not passed to hCard) that I call "extended-locality".
Two comma-separated values in the "locality" would work fine for me.
Regards
Spike

From ryan at theryanking.com Wed Jan 16 15:22:36 2008
From: ryan at theryanking.com (ryan)
Date: Wed Jan 16 15:22:33 2008
Subject: [uf-dev] ADR Multiple instances of children
In-Reply-To: <21e770780801160205w45c663cer4dbecb8d8706915f@mail.gmail.com>
References: <21e770780801160205w45c663cer4dbecb8d8706915f@mail.gmail.com>
Message-ID: <5B277CDD-6233-40AF-B95D-74623C3733DD@theryanking.com>

On Jan 16, 2008, at 2:05 AM, Brian Suda wrote:

> Basically, any parser that finds multiple instances of children ADR
> properties, should just concatenate them together with a ','

As you, Brian, already know, I fully support this.

> I know we talked about this before, but i can't find a reference. We
> do have a test case, but it is only for street-address (we need
> formally write what happens for the other terms).

I say that we should do this on any ADR sub-property that the RFC
doesn't explicitly disallow. And looking at the RFC now, I can't make a
case for disallowing it on any of ADR's sub-properties.

-ryan

From andy at pigsonthewing.org.uk Wed Jan 16 16:33:14 2008
From: andy at pigsonthewing.org.uk (Andy Mabbett)
Date: Wed Jan 16 16:34:56 2008
Subject: [uf-dev] ADR Multiple instances of children
In-Reply-To: <5B277CDD-6233-40AF-B95D-74623C3733DD@theryanking.com>
References: <21e770780801160205w45c663cer4dbecb8d8706915f@mail.gmail.com>
	<5B277CDD-6233-40AF-B95D-74623C3733DD@theryanking.com>
Message-ID:

In message <5B277CDD-6233-40AF-B95D-74623C3733DD@theryanking.com>, ryan
writes

>> Basically, any parser that finds multiple instances of children ADR
>> properties, should just concatenate them together with a ','

>I say that we should do this on any ADR sub-property that the RFC
>doesn't explicitly disallow. And looking at the RFC now, I can't
>make a case for disallowing it on any of ADRs sub-properties.
Those properties are:

    * type [work|home|pref|postal|dom|intl]
    * post-office-box
    * street-address
    * extended-address
    * region
    * locality
    * postal-code
    * country-name

I'm not clear what the cases for having comma-separated, concatenated
values for type, PO Box, postal code or country are.

Would a multi-term type be valid? In the UK at least, PO boxes or postal
codes with commas would be invalid. Can anyone give an example of a
country whose name includes a comma?

--
Andy Mabbett

From ryan at theryanking.com Wed Jan 16 17:49:47 2008
From: ryan at theryanking.com (ryan)
Date: Wed Jan 16 17:49:46 2008
Subject: [uf-dev] ADR Multiple instances of children
In-Reply-To:
References: <21e770780801160205w45c663cer4dbecb8d8706915f@mail.gmail.com>
	<5B277CDD-6233-40AF-B95D-74623C3733DD@theryanking.com>
Message-ID:

On Jan 16, 2008, at 4:33 PM, Andy Mabbett wrote:

> In message <5B277CDD-6233-40AF-B95D-74623C3733DD@theryanking.com>,
> ryan writes
>
>>> Basically, any parser that finds multiple instances of children ADR
>>> properties, should just concatenate them together with a ','
>
>> I say that we should do this on any ADR sub-property that the RFC
>> doesn't explicitly disallow. And looking at the the RFC now, I can't
>> make a case for disallowing it on any of ADRs sub-properties.
>
> Those properties are:
>
>    * type [work|home|pref|postal|dom|intl]
>    * post-office-box
>    * street-address
>    * extended-address
>    * region
>    * locality
>    * postal-code
>    * country-name
>
> I'm not clear what the cases for having comma-separated, concatenated
> values for type, PO Box, postal code or country are.

Type already allows multiple values (and it is a parameter in vCard, not
a sub-property). For the rest, that's probably a good argument that
those don't make "semantic sense".

> Would a multi-term type be valid? In the UK at least, PO boxes or
> postal codes with commas would be invalid. Can anyone give an example
> of a country whose name includes a comma?
You can escape commas with backslash (\).

-ryan

From guillaume at lebleu.org Thu Jan 17 01:01:53 2008
From: guillaume at lebleu.org (Guillaume Lebleu)
Date: Thu Jan 17 01:09:07 2008
Subject: [uf-dev] ADR Multiple instances of children
In-Reply-To: <478E5C36.1040700@tenbus.co.uk>
References: <21e770780801160205w45c663cer4dbecb8d8706915f@mail.gmail.com>
	<478E5C36.1040700@tenbus.co.uk>
Message-ID: <478F1981.9030106@lebleu.org>

Brian Suda wrote:
>
> Basically, any parser that finds multiple instances of children ADR
> properties, should just concatenate them together with a ','
>

I think this technique might find value elsewhere as well.

For instance, I know at least one instance of a calendar where an
event's dtstart and summary are split. (See:
http://biz.yahoo.com/c/e.html)

In this particular instance there would be a lot of value in allowing
the following markup:

Jan 3rd8:30 AM and Retail salesDec,

and having a parser extract dtstart = 2007-01-03T08:00:00-0500 and
summary = Retail SalesDec.

Guillaume

From brian.suda at gmail.com Thu Jan 17 02:11:54 2008
From: brian.suda at gmail.com (Brian Suda)
Date: Thu Jan 17 02:11:56 2008
Subject: [uf-dev] ADR Multiple instances of children
In-Reply-To: <478F1981.9030106@lebleu.org>
References: <21e770780801160205w45c663cer4dbecb8d8706915f@mail.gmail.com>
	<478E5C36.1040700@tenbus.co.uk>
	<478F1981.9030106@lebleu.org>
Message-ID: <21e770780801170211hc1ecc79n3aa49b29eb30e2f6@mail.gmail.com>

2008/1/17, Guillaume Lebleu :
> Brian Suda wrote:
> >
> > Basically, any parser that finds multiple instances of children ADR
> > properties, should just concatenate them together with a ','
> >
>
> I think this technique might find value elsewhere as well.

--- i think we are conflating two different issues.

Firstly, the vCard spec says that street-address (etc) "can include
multiple text values ... separated by the COMMA character"

This means that the comma is NOT escaped and is a list of values.
"Suite\, 110" is different than "Suite, 110".
The first is one value, the second is two values.

> For instance, I know at least one instance of a calendar where event's
> dtstart and summary is split.

--- this COMMA concatenation is only available for certain values
defined in the RFC (Org-unit, adr prop, and some N-props)

What you are describing with multiple DTSTARTs is incorrect. Since there
is a 0-1 for many values (including DTSTART) only the first is used and
all subsequent are ignored. To "have your cake and eat it too" you can
use the class="value" which concatenates (without separators) into a
single value.

-brian

--
brian suda
http://suda.co.uk

From guillaume at lebleu.org Thu Jan 17 02:30:45 2008
From: guillaume at lebleu.org (Guillaume Lebleu)
Date: Thu Jan 17 02:30:48 2008
Subject: [uf-dev] ADR Multiple instances of children
In-Reply-To: <21e770780801170211hc1ecc79n3aa49b29eb30e2f6@mail.gmail.com>
References: <21e770780801160205w45c663cer4dbecb8d8706915f@mail.gmail.com>
	<478E5C36.1040700@tenbus.co.uk>
	<478F1981.9030106@lebleu.org>
	<21e770780801170211hc1ecc79n3aa49b29eb30e2f6@mail.gmail.com>
Message-ID: <478F2E55.4030601@lebleu.org>

Brian Suda wrote:
> --- this COMMA concatenation is only available for certain values
> defined in the RFC (Org-unit, adr prop, and some N-props)
>
> What you are describing with multiple DTSTARTs is incorrect. Since
> there is a 0-1 for many values (including DTSTART) only the first is
> used and all subsequent are ignored. To "have your cake and eat it
> too" you can use the class="value" which concatenates (without
> separators) into a single value.
>

I don't see how this works in the case of my example (the Yahoo economic
calendar http://biz.yahoo.com/c/e.html), which uses an HTML table and
where the date is in a td and the time in another td (same for summary
split in two tds), and where you can't wrap the two tds with another
tag:

Jan 15 8:30 AM Retail Sales ex-auto Dec ...
Guillaume

From brian.suda at gmail.com Thu Jan 17 03:04:06 2008
From: brian.suda at gmail.com (Brian Suda)
Date: Thu Jan 17 03:04:09 2008
Subject: [uf-dev] ADR Multiple instances of children
In-Reply-To: <478F2E55.4030601@lebleu.org>
References: <21e770780801160205w45c663cer4dbecb8d8706915f@mail.gmail.com>
	<478E5C36.1040700@tenbus.co.uk>
	<478F1981.9030106@lebleu.org>
	<21e770780801170211hc1ecc79n3aa49b29eb30e2f6@mail.gmail.com>
	<478F2E55.4030601@lebleu.org>
Message-ID: <21e770780801170304n4bf33665raea4a34e54318f5e@mail.gmail.com>

2008/1/17, Guillaume Lebleu :
> I don't see how this works in the case my example (the Yahoo economic
> calendar http://biz.yahoo.com/c/e.html), which uses an HTML table and
> where the date is in a td and the time in another td (same for summary
> split in two tds), and where you can't wrap the two tds with another tag:

--- in the current example it would not. This is better discussed on the
mf-discuss list. What usually happens is that the JAN 15 has NO dtstart,
but the time gets the full DTSTART. Please add this to the Wiki so we
have a reference in the future.

--
brian suda
http://suda.co.uk

From aconbere at gmail.com Sun Jan 20 19:18:09 2008
From: aconbere at gmail.com (anders conbere)
Date: Sun Jan 20 19:18:12 2008
Subject: [uf-dev] Python Microformats parser
Message-ID: <8ca3fbe80801201918q2e17b0d5of2f28011a7c8d2a3@mail.gmail.com>

So I've spent a little while developing a new python microformats
parser. (code below)

http://microformats.googlecode.com/svn/code/python/microformats-parser/uf/

I ran into quite a few hurdles and I've ended up with an implementation
that uses lxml to parse HTML into an internal XML representation, then
applies an XSL transform to that to arrive at the standard format it
represents, then uses the available Python parsers for that format to
get back to a Python data object.

By and large this actually works pretty well at getting the data out of
microformats.
The largest problem I've actually run into is that the various parsing
libraries I use for things like vCard/vCal and hAtom provide different
interfaces, different bugs and different ways of handling data.

Anyway I would love comments and critiques, and maybe someone has gotten
around all these problems already.

~ Anders
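
The staged pipeline described in this message (HTML in, an intermediate standard format, then an ordinary parser for that format) can be sketched as follows. This is a minimal illustration and not the code in the repository above: it swaps lxml and XSLT for the standard-library `html.parser` so that it runs without dependencies, it handles only three hCard properties, and every name in it (`PROPERTY_MAP`, `HCardToVCard`, `to_vcard`, `parse_vcard`, the sample markup) is hypothetical. It also skips vCard escaping and line folding.

```python
from html.parser import HTMLParser

# Assumed mapping from hCard class names to vCard property names.
PROPERTY_MAP = {"fn": "FN", "org": "ORG", "tel": "TEL"}
# Elements that never get an end tag and so must not be pushed on the stack.
VOID = {"br", "img", "hr", "input", "meta", "link"}


class HCardToVCard(HTMLParser):
    """Stage 1: collect the text of elements carrying known class names."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # per open element: [vcard_name, text_parts] or None
        self.properties = []     # finished (vcard_name, text) pairs, in closing order

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        name = next((PROPERTY_MAP[c] for c in classes if c in PROPERTY_MAP), None)
        self.open_elements.append([name, []] if name else None)

    def handle_endtag(self, tag):
        if tag in VOID or not self.open_elements:
            return
        entry = self.open_elements.pop()
        if entry:
            # Collapse runs of whitespace, as a text serialization would.
            self.properties.append((entry[0], " ".join("".join(entry[1]).split())))

    def handle_data(self, data):
        # Text belongs to the innermost open element that is a property.
        for entry in reversed(self.open_elements):
            if entry:
                entry[1].append(data)
                break


def to_vcard(html):
    """Stage 2: serialize the collected properties as vCard text."""
    parser = HCardToVCard()
    parser.feed(html)
    parser.close()
    lines = ["BEGIN:VCARD", "VERSION:3.0"]
    lines += [f"{name}:{text}" for name, text in parser.properties]
    lines.append("END:VCARD")
    return "\r\n".join(lines)


def parse_vcard(text):
    """Stage 3: read the vCard text back into a plain dict of lists."""
    card = {}
    for line in text.splitlines():
        name, _, value = line.partition(":")
        if name not in ("BEGIN", "END", "VERSION"):
            card.setdefault(name, []).append(value)
    return card


SAMPLE_HTML = (
    '<div class="vcard"><span class="fn">Jane Doe</span> '
    '<span class="org">Example Org</span></div>'
)
print(parse_vcard(to_vcard(SAMPLE_HTML)))
```

Putting the standard format in the middle means the last stage can be any existing vCard library, at the cost of inheriting each library's own interface and quirks.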