From mkaply at us.ibm.com Thu May 1 10:15:17 2008
From: mkaply at us.ibm.com (Michael Kaply)
Date: Thu May 1 10:26:00 2008
Subject: [uf-dev] Include-Pattern Infinite Loop Test Cases
In-Reply-To: <1005d65f0804291257x35022f49vdf96a4499796bfc7@mail.gmail.com>
Message-ID:

Skipped content of type multipart/alternative

From zack.carter at gmail.com Thu May 1 22:08:12 2008
From: zack.carter at gmail.com (Zachary Carter)
Date: Thu May 1 22:08:16 2008
Subject: [uf-dev] Preventing false positives
Message-ID:

I was trying to write a monkeyformat[1] for Facebook, but there are many false positives within profiles. So I have two questions:

1) Is there a way to ignore an entire element and its descendants from being parsed?
2) Is there a way to have the parser ignore all class names on an element? (as if the class names were removed from the element prior to parsing)

Thanks.
[1]http://userscripts.org/scripts/search?q=monkeyformats -- Zach Carter http://zachcarter.info From mail at tobyinkster.co.uk Fri May 2 01:25:10 2008 From: mail at tobyinkster.co.uk (Toby A Inkster) Date: Fri May 2 01:25:18 2008 Subject: [uf-dev] Preventing false positives Message-ID: Zachary Carter wrote: > So I have two questions: 1) is > there a way to ignore an entire element and its descendants from being > parsed? Not that I know of. I suppose that putting the content into an IFRAME instead of on the main page ought to do it, but it's an ugly solution; and because it's not an officially sanctioned method for hiding content from parsers, you have no guarantee that future parsers will not start parsing within IFRAMEs. > 2) Is there a way to have the parser ignore all class names on > an element? (as if the class names were removed from the element prior > to parsing) The MFO effort is an attempt to do something like this. The list of parsers that actually support MFO is pretty short though. Cognition does support MFO. I mention this because the technique it uses is close to what you describe. When it parses a microformat, it takes a *clone* of the element and its children (so as not to damage the original DOM tree), then tries to parse embedded microformats -- e.g. "adr", "geo" and "agent vcard" within a "vcard". I'll break off the parsing procedure here for a little terminology: I make a distinction between "embedded microformats" which are those that imply a special meaning by being nested within each other; and "nested microformats" which are those that are nested within each other by mere co-incidence, or perhaps to convey some kind of undefined relationship between the objects (e.g. 
an hCard could be nested within a geo -- perhaps the author meant to convey that the person represented by the hCard lives at that location, but this type of nesting is not defined in the specs) Anyway, after parsing *embedded* microformats, Cognition searches for *nested* microformats. It uses a list of all known root element classes (e.g. "hatom", "hresume", "hlisting", "vcalendar") -- including the class names for microformats which Cognition does not yet support. It also includes the class name "mfo". Now, if it finds any of these nested microformats, it reaches within them and tampers with every descendent element, setting the "rel", "rev" and "class" attributes to the empty string. Remember, that this is on a clone of the DOM. Thus these elements will be excluded from supplying any unintentional semantics to the outer microformat. Let's look at an example:

Dr. Marvin Candle

Worked for a company called The Hanzo Foundation .

Now, when we come to parse the outer hCard, the clone is reduced to the following using MFO:

Dr. Marvin Candle

Worked for a company called The Hanzo Foundation .
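
The masking step described above could be sketched roughly as follows. This is not Cognition's actual code: it assumes well-formed XHTML parsed with Python's xml.etree.ElementTree, KNOWN_ROOTS is an abbreviated, hypothetical list, mask_nested is a made-up name, and the sample markup is only a guess at the example (the archive stripped the original tags). Embedded microformats such as "adr", "geo" and "agent vcard" are not handled here.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated list of root class names (plus "mfo").
KNOWN_ROOTS = {"vcard", "vcalendar", "hatom", "hresume", "hlisting", "mfo"}


def classes(el):
    """Return the set of class names on an element."""
    return set((el.get("class") or "").split())


def mask_nested(root):
    """Return a clone of root in which every descendant of a nested
    microformat root has its rel, rev and class attributes blanked.
    The original tree is left untouched."""
    clone = copy.deepcopy(root)
    for el in clone.iter():
        if el is clone:
            continue  # the outer microformat itself is never masked
        if classes(el) & KNOWN_ROOTS:
            for desc in el.iter():
                if desc is el:
                    continue  # only descendants are blanked, as in the prose
                for attr in ("rel", "rev", "class"):
                    if attr in desc.attrib:
                        desc.set(attr, "")
    return clone


# Assumed sample markup, loosely modelled on the example in this message.
SAMPLE = (
    '<div class="vcard">'
    '<span class="fn">Dr. Marvin Candle</span>'
    '<p class="note">Worked for a company called '
    '<span class="mfo vcard"><span class="fn org">The Hanzo Foundation</span></span>.'
    '</p></div>'
)

outer = ET.fromstring(SAMPLE)
print(ET.tostring(mask_nested(outer), encoding="unicode"))
```

Because the work is done on a deep copy, the inner hCard in the original tree keeps its classes and can still be parsed in a later pass.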

And the following vCard may be produced:

BEGIN:VCARD
FN:Dr. Marvin Candle
N:Candle;Marvin;;Dr.
NOTE:Worked for a company called The Hanzo Foundation.
END:VCARD

Note that the full text of the note is included, but there is no "ORG" property in the vCard.

As it happens, because "vcard" is included in that big list of known microformats (remember? "hatom", "hresume", "hlisting", "vcalendar"...), the same effect would have happened even if we hadn't included class="mfo" -- but the MFO class is still useful because new microformats could arise at some point in the future which are not on that list.

It is also worth noting that while this MFO step masks the properties of the inner hCard from the outer hCard, the inner hCard will still be parsed as a later step, resulting in a second vCard:

BEGIN:VCARD
FN:The Hanzo Foundation
ORG:The Hanzo Foundation
END:VCARD

--
Toby A Inkster

From zack.carter at gmail.com Fri May 2 15:47:07 2008
From: zack.carter at gmail.com (Zachary Carter)
Date: Fri May 2 15:47:10 2008
Subject: [uf-dev] Preventing false positives
In-Reply-To:
References:
Message-ID:

The mfo approach is interesting, and would probably be the ideal type of approach. The handling by Cognition would be identical to mfo except the classes aren't added back at a later step. For the second situation, where the descendants are still parsed as belonging to the scope of the uF but not the element, it would remove class/rev/rel from just the element it's placed on.

To help elaborate my situation:

Dr. Marvin Candle

Website: http://example.org

Applications

[... third party content ...]

Title and label classes are not being used as hcard properties, so I would want to exclude them. The third party application area I would want to ignore completely (placing it in an iframe would likely break lots of functionality.) Are there any plans (or should there be) to support something like this? Alternatively, is it possible to assign content distributed on the page as belonging to a single microformat?

--
Zach Carter
http://zachcarter.info

From mail at tobyinkster.co.uk Sat May 3 02:15:04 2008
From: mail at tobyinkster.co.uk (Toby A Inkster)
Date: Sat May 3 02:15:14 2008
Subject: [uf-dev] Preventing false positives
Message-ID:

Zachary Carter wrote:
> To help elaborate my situation:
>
>
>

> Dr. > Marvin > Candle >

>

> Website: href="http://example.org" class="url">http://example.org >

>

Applications

>

> [... third party content ...] >

>
> > Title and label classes are not being used as hcard properties, so I > would want to exclude them. Well, TITLE is a singular property, so it should be easy to force microformat parsers to ignore your title class -- simply include a blank title: earlier on in the vCard (before your

element). Parsers should just pick up the first title and ignore the later one. LABEL is a plural property, so this approach will not work for that. MFO as implemented by Cognition (and I emphasise that the MFO effort is still in the brainstorming stage, so the final MFO spec, if any, may be completely different) can be used to provide a solution for both the TITLE and LABEL:

Dr. Marvin Candle

Website: http://example.org

Applications

[... third party content ...]
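
The "first value wins" behaviour that the blank-value trick earlier in this message relies on could be sketched like this. It is only an illustration under stated assumptions: collect is a made-up name, the input is assumed to be (property, value) pairs already extracted in document order, and the SINGULAR set holds just the three singular properties named in the follow-up to this message ('fn', 'class', 'bday'), not the full list from the hCard spec.

```python
# Assumed subset of hCard's singular properties, taken from this thread.
SINGULAR = {"fn", "class", "bday"}


def collect(pairs):
    """Fold (property, value) pairs, given in document order, into a card.

    Singular properties keep only their first value, so an earlier
    (even empty) value masks any later one. Plural properties keep
    every value in a list.
    """
    card = {}
    for name, value in pairs:
        if name in SINGULAR:
            card.setdefault(name, value)  # first occurrence wins
        else:
            card.setdefault(name, []).append(value)
    return card


card = collect([
    ("fn", "Dr. Marvin Candle"),
    ("fn", "Applications"),      # a later false positive, ignored
    ("label", "Website:"),
    ("label", "Applications"),   # plural, so both values are kept
])
print(card)
```

The sketch also shows why the trick cannot help with plural properties: every value is appended, so nothing earlier in the document can mask a later one.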

Of course the most obvious solution is simply:

Dr. Marvin Candle

Website: http://example.org

Applications

[... third party content ...]

Which should work in all present-day parsers.

--
Toby A Inkster

From mail at tobyinkster.co.uk Sat May 3 02:45:23 2008
From: mail at tobyinkster.co.uk (Toby A Inkster)
Date: Sat May 3 02:45:28 2008
Subject: [uf-dev] Preventing false positives
In-Reply-To:
References:
Message-ID: <5128F867-8393-4E50-AF9E-0A3BA5815ECF@tobyinkster.co.uk>

On 3 May 2008, at 10:15, Toby A Inkster wrote:
> Well, TITLE is a singular property

Wrong, I was. TITLE is plural. But singular properties do exist (e.g. 'fn', 'class', 'bday'), so the technique outlined may still be of some use for those.

--
Toby A Inkster

From danbri at danbri.org Sat May 3 07:13:43 2008
From: danbri at danbri.org (Dan Brickley)
Date: Sat May 3 07:13:43 2008
Subject: [uf-dev] Preventing false positives
In-Reply-To: <5128F867-8393-4E50-AF9E-0A3BA5815ECF@tobyinkster.co.uk>
References: <5128F867-8393-4E50-AF9E-0A3BA5815ECF@tobyinkster.co.uk>
Message-ID: <481C7317.7040307@danbri.org>

Toby A Inkster wrote:
> On 3 May 2008, at 10:15, Toby A Inkster wrote:
>
>> Well, TITLE is a singular property
>
> Wrong, I was. TITLE is plural. But singular properties do exist (e.g.
> 'fn', 'class', 'bday'), so the technique outlined may still be of some
> use for those.

Is 'singular property' accepted Microformat-community terminology? (or just an obvious/sensible phrase). Is there any machine-readable representation of which microformat properties are singular?

It seems roughly what RDF/OWL calls a 'functional' property (e.g. in FOAF, 'birthday', 'gender', 'primaryTopic' are functional).

Is there a microformal word for the inverse of this concept: properties that have at most one proper value, for anything they apply to? In FOAF, examples of this (we call it an "Inverse Functional Property") include "homepage", "weblog", "openid", "tipjar", "jabberID", "mbox_sha1sum"...
cheers,

Dan

--
http://danbri.org/

From rff.rff at gmail.com Sat May 3 09:26:21 2008
From: rff.rff at gmail.com (gabriele renzi)
Date: Sat May 3 09:26:23 2008
Subject: [uf-dev] Preventing false positives
In-Reply-To: <481C7317.7040307@danbri.org>
References: <5128F867-8393-4E50-AF9E-0A3BA5815ECF@tobyinkster.co.uk> <481C7317.7040307@danbri.org>
Message-ID: <828083e70805030926v530ce142t2919ff911fe1ecc6@mail.gmail.com>

On Sat, May 3, 2008 at 3:13 PM, Dan Brickley wrote:
> Is 'singular property' accepted Microformat-community terminology? (or just
> an obvious/sensible phrase). Is there any machine-readable representation of
> which microformat properties are singular?

Sorry to hijack the thread, but along the same lines: has anybody thought of a simple, generic, machine-readable description of microformats? A simple mix of CSS/xpath/regex, for example:

hcard : .vcard, *
#creates a namespace many allowed
full_name: .vcard .fn/text(), 1
#add full name to this namespace, exactly one
email: a.email/href or area.email/href or .email/text(), ?
#add email to this ns, checking various choices, zero or one

I'm writing a generic parser and it basically has this kind of structure (i.e. fn = getRequired(root, '.fn', 'text()')). Is there a clear problem with this that I'm not seeing?

It would be a small improvement on the semiformal descriptions on the wiki, where the information is a bit scattered around. For example, there is an hcard test for when .email is to be taken from the text value of a node, but I could not find it explained on the hcard parsing page, and it seems that this happened to other people[1].

Please excuse me if I sound dumb and talk about already discussed things, but I'm still new to uFs.
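
A rough sketch of what such a declarative description might look like in practice. Everything here is an assumption for illustration: the HCARD table, the extract and value_of names, and the cardinality codes ("1" exactly one, "?" zero or one, "*" any number) are made up for this example and are not taken from any spec or existing parser. It uses Python's xml.etree.ElementTree and therefore assumes well-formed XHTML input.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: property name -> (class name, cardinality).
HCARD = {
    "fn": ("fn", "1"),
    "email": ("email", "?"),
    "tel": ("tel", "*"),
}


def has_class(el, name):
    """True if the element carries the given class name."""
    return name in (el.get("class") or "").split()


def value_of(el):
    """Prefer the href attribute (a, area), else fall back to text content.
    The raw href is returned; stripping a 'mailto:' prefix is left out."""
    href = el.get("href")
    if href is not None:
        return href
    return "".join(el.itertext()).strip()


def extract(root, schema):
    """Apply a schema to one microformat root element."""
    out = {}
    for prop, (cls, cardinality) in schema.items():
        values = [value_of(el) for el in root.iter() if has_class(el, cls)]
        if cardinality == "1":
            if not values:
                raise ValueError("missing required property: " + prop)
            out[prop] = values[0]  # extra occurrences are ignored here
        elif cardinality == "?":
            out[prop] = values[0] if values else None
        else:
            out[prop] = values
    return out


root = ET.fromstring(
    '<div class="vcard"><span class="fn">Jane Doe</span> '
    '<a class="email" href="mailto:jane@example.org">mail me</a></div>'
)
print(extract(root, HCARD))
```

A table like this covers cardinality and the href-versus-text choice, but it would still need extending for content models such as the abbr or include patterns.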
[1] http://www.w3.org/2006/vcard/hcard2rdf.xsl seems to miss it, for one

From zack.carter at gmail.com Sat May 3 11:25:41 2008
From: zack.carter at gmail.com (Zachary Carter)
Date: Sat May 3 11:25:43 2008
Subject: [uf-dev] Preventing false positives
In-Reply-To: <828083e70805030926v530ce142t2919ff911fe1ecc6@mail.gmail.com>
References: <5128F867-8393-4E50-AF9E-0A3BA5815ECF@tobyinkster.co.uk> <481C7317.7040307@danbri.org> <828083e70805030926v530ce142t2919ff911fe1ecc6@mail.gmail.com>
Message-ID:

There was discussion about a JSON representation of hCard (http://microformats.org/wiki/jcard). I think that's what you're looking for.

--
Zach Carter
http://zachcarter.info

From mail at tobyinkster.co.uk Sat May 3 13:50:24 2008
From: mail at tobyinkster.co.uk (Toby A Inkster)
Date: Sat May 3 13:50:47 2008
Subject: [uf-dev] Preventing false positives
Message-ID:

Dan Brickley wrote:

> Is 'singular property' accepted Microformat-community terminology? (or
> just an obvious/sensible phrase). Is there any machine-readable
> representation of which microformat properties are singular?

The terms used in the hCard spec are "singular property" and "plural property". A list of the singular properties for hCard can be found at:

http://microformats.org/wiki/hcard#Singular_vs._Plural_Properties

The hCard spec is rather casual about such matters -- it's just described in prose. Some of the newer microformats include property lists marked up as nested unordered HTML lists with Perl-like quantifiers, such as "{1}" = must occur exactly once; "*" = optional, may occur more than once; "?" = optional, may only occur once; etc. These could theoretically be parsed mechanically, but that wouldn't be enough to fully automate supporting new microformats, as there's still the matter of content models (e.g. should something be parsed as a link, or as a string).

> Is there a microformal word for the inverse of this concept:
> properties
> that have at most one proper value, for anything they apply to? In
> FOAF,
> examples of this (we call it an "Inverse Functional Property") include
> "homepage", "weblog", "openid", "tipjar", "jabberID",
> "mbox_sha1sum"...

The UID properties of hCard and hCalendar are in effect inverse functional properties.
(Indeed my parser implements them as such. If two hCalendar events exist which share a UID, they'll be conflated into the same event in the output.)

PS: Dan, did you get my e-mail on 27 April? I sent it to your rdfweb.org address -- not sure if that's still valid?

--
Toby A Inkster

From rff.rff at gmail.com Sun May 4 05:55:01 2008
From: rff.rff at gmail.com (gabriele renzi)
Date: Sun May 4 05:55:04 2008
Subject: [uf-dev] Preventing false positives
In-Reply-To:
References: <5128F867-8393-4E50-AF9E-0A3BA5815ECF@tobyinkster.co.uk> <481C7317.7040307@danbri.org> <828083e70805030926v530ce142t2919ff911fe1ecc6@mail.gmail.com>
Message-ID: <828083e70805040555s1dfb18b1j7f02e726ea1b870@mail.gmail.com>

On Sat, May 3, 2008 at 7:25 PM, Zachary Carter wrote:
> There was discussion about a JSON representation of hCard
> (http://microformats.org/wiki/jcard). I think that's what you're
> looking for.
>

I'm not sure, I think I failed to express my question. This looks like a way to describe a microformatted object, through a JSON serialization.

What I was thinking of is a way to describe the microformat itself, as in a DTD, or OWL ontology or BNF. Nothing too fancy, just a little improvement on the already existing informal "schema" definitions in the wiki to include some parsing details.

Anyway, thanks for the answer.

From julian_bond at voidstar.com Mon May 5 00:12:08 2008
From: julian_bond at voidstar.com (Julian Bond)
Date: Mon May 5 01:16:21 2008
Subject: [uf-dev] Discovery of Microformatted documents
Message-ID:

We've been looking at ways of discovering microformatted documents. The requirement is to be able to say something like "My Profile page is at this URL". We've identified 3 likely candidates (which might all be the same page).

1) A page holding my personal profile. Probably containing hCard. Typically something like an AboutMe page on a blog.
2) A page holding a list of my external profiles. Marked up with XFN rel="me". The "YASN-Roll".
3) A page holding a list of my contacts. Marked up with XFN rel="contact" and the other contact types.

To make this work we need a URI that identifies the page types and/or a Media Type. But since this is all html/xhtml, the Media-Type is going to be the same in each case.

This page http://www.gmpg.org/xfn/join seems to suggest http://gmpg.org/xfn/11 as a relatively permanent URI to use for XFN but it wouldn't distinguish between cases 2 and 3.

This page http://microformats.org/wiki/profile-uris defines some URI candidates as well. But I think my requirement is at a slightly higher level.

Any thoughts on this?

--
Julian Bond E&MSN: julian_bond at voidstar.com M: +44 (0)77 5907 2173
Webmaster: http://www.ecademy.com/ T: +44 (0)192 0412 433
Personal WebLog: http://www.voidstar.com/ skype:julian.bond?chat
Serve At Room Temperature

From mail at tobyinkster.co.uk Mon May 5 02:09:39 2008
From: mail at tobyinkster.co.uk (Toby A Inkster)
Date: Mon May 5 02:09:47 2008
Subject: [uf-dev] Discovery of Microformatted documents
Message-ID:

Julian Bond wrote:

> We've been looking at ways of discovering microformatted documents.
> The
> requirement is to be able to say something like "My Profile page is at
> this URL".

Probably something like:

<a href="..." rel="me meta">Link to my profile</a>

is the best way to link to a page which contains metadata about you. (rel=meta is formally defined in the XHTML 2 drafts, but has been used for years by the Dublin Core and FOAF communities to indicate a page which contains relevant metadata.)

> This page http://www.gmpg.org/xfn/join seems to suggest
> http://gmpg.org/xfn/11 as a relatively permanent URI to use for XFN
> but
> it wouldn't distinguish between cases 2 and 3.
>
> This page http://microformats.org/wiki/profile-uris defines some URI
> candidates as well. But I think my requirement is at a slightly higher
> level.

The term "profile" used on those pages has nothing to do with "personal profiles". It refers to the (rarely used) "profile" attribute of the <head> element, which is used to link to one or more documents that describe the way that you're using HTML. For example:

<head profile="http://gmpg.org/xfn/11">

may be used to indicate that when you write 'rel="contact"', you are using the definition of 'rel="contact"' which can be found in XFN 1.1, and not, say, the entirely different definition of 'rel="contact"' which the current HTML 5 drafts use.

In terms of microformats, the profile attribute can be thought of as a place to list which microformats you use on the page, so that parsers can distinguish between an intentional use of hCard and a coincidental use of 'class="vcard"' by someone who's never even heard of hCard.

--
Toby A Inkster

From julian_bond at voidstar.com Mon May 5 14:22:31 2008
From: julian_bond at voidstar.com (Julian Bond)
Date: Mon May 5 14:23:16 2008
Subject: [uf-dev] Discovery of Microformatted documents
In-Reply-To:
References:
Message-ID:

Toby A Inkster Mon, 5 May 2008 10:09:39
>Probably something like:
>
><a href="..." rel="me meta">Link to my profile</a>
>
>is the best way to link to a page which contains metadata about you.
>(rel=meta is formally defined in the XHTML 2 drafts, but has been used
>for years by the Dublin Core and FOAF communities to indicate a page
>which contains relevant metadata.)

OK. We're trying to do something slightly different. The use case is to find relevant documents (in this case microformat marked up html) during the initial signup to a new site using openid. Openid has a discovery mechanism using XRDS files. These XRDS files seem like a good place to put links to things like:

- The page with my profile on it (hcard)
- The page with a list of the other profiles I have (rel="me")
- The page with a list of all my contacts (rel="contact")

The XRDS format expects a URI to identify the type of service and a media type. If those documents are HTML then the media type is simple. Then it has a URL field for the location of the page described.
But we would need URIs for those cases above. The best URIs to use for Type seem to be http://xmlns.com/foaf/spec/ http://gmpg.org/xfn/11#me http://gmpg.org/xfn/11#contact respectively. I'm suggesting using the anchors to distinguish between a page primarily about other profiles and one primarily about a list of contacts. I don't remember seeing anything before about discovery of microformat documents. As you've described, you could put a For any kind of strange reason, I missed some messages, which I saw on the archive now. Just wanted to thank those who answered, and told me it's a specific Outlook problem. -- Marie-Aude Koiransky www.lumieredelune.com From mail at tobyinkster.co.uk Tue May 6 04:50:17 2008 From: mail at tobyinkster.co.uk (Toby Inkster) Date: Tue May 6 04:50:23 2008 Subject: [uf-dev] hCard label type Message-ID: <60769.81.2.120.180.1210074617.squirrel@goddamn.co.uk> The vCard spec allows types (e.g. "home", "postal", etc) to be specified for the LABEL property, but the hCard spec doesn't seem to allow this. The hCard examples page on the wiki (RFC 2426 examples) does include a label marked up with type+value, but that page is not considered normative. Is this an oversight in the spec, or was a conscious decision made not to allow types to be specified within labels? If the latter, what was the reasoning? Do any current parsers extend hCard and allow a type to be specified for labels? (I'm considering adding this feature to Cognition.) 
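
For what it's worth, if a parser did accept type subproperties inside a label, the vCard side is simple enough. A minimal sketch, assuming the parser has already pulled out the label text and a list of type values; label_line and escape_text are made-up names, the address is invented, and the escaping follows my reading of the text-value rules in RFC 2426 rather than any existing implementation:

```python
def escape_text(value):
    """Escape a vCard 3.0 text value: backslash, comma, semicolon, newline."""
    return (
        value.replace("\\", "\\\\")  # must come first
        .replace(",", "\\,")
        .replace(";", "\\;")
        .replace("\n", "\\n")
    )


def label_line(value, types=()):
    """Build a LABEL content line, adding a TYPE parameter only when
    at least one type was found in the markup."""
    params = ";TYPE=" + ",".join(types) if types else ""
    return "LABEL" + params + ":" + escape_text(value)


print(label_line("1 Example Street\nExampletown, EX", ["home", "postal"]))
print(label_line("PO Box 1"))
```

With no types the output degrades to a plain LABEL line, so a parser extended this way would still produce the same vCard as a strict one for markup that follows the current spec.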
--
Toby Inkster

From zack.carter at gmail.com Thu May 8 12:53:07 2008
From: zack.carter at gmail.com (Zachary Carter)
Date: Thu May 8 12:53:20 2008
Subject: [uf-dev] Preventing false positives
In-Reply-To: <828083e70805040555s1dfb18b1j7f02e726ea1b870@mail.gmail.com>
References: <5128F867-8393-4E50-AF9E-0A3BA5815ECF@tobyinkster.co.uk> <481C7317.7040307@danbri.org> <828083e70805030926v530ce142t2919ff911fe1ecc6@mail.gmail.com> <828083e70805040555s1dfb18b1j7f02e726ea1b870@mail.gmail.com>
Message-ID:

Oops, I think you mean XMDP (http://gmpg.org/xmdp/) then. Toby has a list of uF profiles on his site: http://buzzword.org.uk/profiles/

--
Zach Carter
http://zachcarter.info

From rff.rff at gmail.com Fri May 9 05:11:05 2008
From: rff.rff at gmail.com (gabriele renzi)
Date: Fri May 9 05:11:11 2008
Subject: [uf-dev] Testcase clarification
Message-ID: <828083e70805090511s53dd896cv4302c438d6d506a@mail.gmail.com>

Hi everyone,

I'm perusing the hCard test suite and it is a great source, thanks for it. Yet there is something I'm not clear about in test case 21-tel.

Basically, we have a lot of tel fields that use schemes different from tel, such as call me or call me

It seems that these should be extracted as simple "tel" properties without a type

TEL:+1.415.555.1241

and that the type is only kept when it is made explicit with a "type" subproperty.

Is this correct? Why are we ignoring the explicit information in the URI scheme?

--
goto 10: http://www.goto10.it
blog it: http://riffraff.blogsome.com
blog en: http://www.riffraff.info

From julian.reschke at gmx.de Fri May 9 05:31:33 2008
From: julian.reschke at gmx.de (Julian Reschke)
Date: Fri May 9 05:31:39 2008
Subject: [uf-dev] Test cases for hcard vs profile attribute
In-Reply-To: <828083e70805090511s53dd896cv4302c438d6d506a@mail.gmail.com>
References: <828083e70805090511s53dd896cv4302c438d6d506a@mail.gmail.com>
Message-ID: <48244425.8070503@gmx.de>

Hi,

I just checked, and it seems that the test cases (such as ) do not use the profile attribute, as recommended in .

Bug? Documentation out of date?

BR, Julian

From bjonkman at sobac.com Mon May 12 10:41:54 2008
From: bjonkman at sobac.com (Bob Jonkman)
Date: Mon May 12 10:43:31 2008
Subject: [uf-dev] Testcase clarification
In-Reply-To: <828083e70805090511s53dd896cv4302c438d6d506a@mail.gmail.com>
References: <828083e70805090511s53dd896cv4302c438d6d506a@mail.gmail.com>
Message-ID: <48284922.3848.DD6E7C3@bjonkman.sobac.com>

I'm pretty sure the modem: and fax: URI schemes have been deprecated. tel: is the only telephone URI scheme remaining, and call type (modem, fax) is left to be negotiated in-band by carrier recognition or out-of-band through SIP or similar.[1]

So ignoring the URI scheme of a TEL value in an hCard would be the correct behaviour, but an explicit TYPE indication should be preserved in the vCard.

--Bob.

[1] http://tools.ietf.org/html/rfc3966

-- -- -- --
Bob Jonkman http://sobac.com/sobac/
SOBAC Microcomputer Services Voice: +1-519-669-0388
6 James Street, Elmira ON Canada N3B 1L5 Cel: +1-519-635-9413
Software --- Office & Business Automation --- Consulting

From rff.rff at gmail.com Tue May 13 08:37:51 2008
From: rff.rff at gmail.com (gabriele renzi)
Date: Tue May 13 08:37:57 2008
Subject: [uf-dev] testcase for hcard TD referring to missing TH
Message-ID: <828083e70805130837h61c3f077y295fe625506f5f77@mail.gmail.com>

I don't know if this is interesting for anyone, but I added to my local copy of the uF/hCard test suite an additional test for the header case, namely:

Jane Doe

should produce a vCard like

BEGIN:VCARD
PRODID:$PRODID$
SOURCE:$SOURCE$
NAME:32-header
VERSION:3.0
N;CHARSET=UTF-8:Doe;Jane;;;
FN;CHARSET=UTF-8:Jane Doe
END:VCARD

I think this should not happen in properly formatted pages, but as it revealed a bug in my parser implementation I have included it in my tests. I'm sending this because it may be interesting for others, although it seems that the uF test suite does not usually take care of invalid formatting.

I'm not familiar with hg so I'm not sure if this is the correct way, but I'm attaching the tiny patch formatted as an hg bundle.
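
To make the failure mode concrete, here is a minimal sketch of resolving a TD's headers attribute while tolerating ids that do not exist. It is an illustration under assumptions, not code from any real parser: resolve_headers is a made-up name, the table markup is invented (the archive stripped the markup of the test itself), and it relies on Python's xml.etree.ElementTree, so the input has to be well-formed XHTML.

```python
import xml.etree.ElementTree as ET


def resolve_headers(doc, cell):
    """Return the elements referenced by the cell's headers attribute,
    in the order listed, silently skipping ids that are not in doc."""
    by_id = {el.get("id"): el for el in doc.iter() if el.get("id")}
    found = []
    for ref in (cell.get("headers") or "").split():
        target = by_id.get(ref)
        if target is not None:  # a dangling reference is ignored, not an error
            found.append(target)
    return found


# Invented markup: the second id in headers has no matching TH.
doc = ET.fromstring(
    '<table>'
    '<tr><th id="org-th">Example Org</th></tr>'
    '<tr><td class="vcard" headers="org-th no-such-id">'
    '<span class="fn">Jane Doe</span></td></tr>'
    '</table>'
)
cell = doc.find(".//td")
print([th.get("id") for th in resolve_headers(doc, cell)])
```

Skipping unknown ids rather than raising keeps the rest of the hCard parseable, which is the behaviour the expected vCard above implies.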
-- goto 10: http://www.goto10.it blog it: http://riffraff.blogsome.com blog en: http://www.riffraff.info -------------- next part -------------- A non-text attachment was scrubbed... Name: missing-th.hg Type: application/octet-stream Size: 832 bytes Desc: not available Url : http://microformats.org/discuss/mail/microformats-dev/attachments/20080513/01f5f00b/missing-th.obj From lee.jordan at gmail.com Wed May 14 01:55:21 2008 From: lee.jordan at gmail.com (Lee Jordan) Date: Wed May 14 01:55:24 2008 Subject: [uf-dev] SEO and abbr Message-ID: Hi folks, I've been adding more and more mF's to the sites that I work on professionally and have come across a snag with the abbr-design pattern, just wondering if anyone else has come across the following issue? If so how it was resolved, I presume by not using abbr but using span instead. I've used hcalendar to mark up some dates but in a search engines results page, the title of the abbr tag has been included in the results decription text which makes the date look messy. It seems Google in particular indexes the title tag. Just raising this as I've just noticed it so we can be aware of the issue. Many Thanks Lee -- HTML | CSS | Javascript http://www.leejordan.org.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: http://microformats.org/discuss/mail/microformats-dev/attachments/20080514/5b5b34c4/attachment.html From brian.suda at gmail.com Wed May 14 02:09:45 2008 From: brian.suda at gmail.com (Brian Suda) Date: Wed May 14 02:09:48 2008 Subject: [uf-dev] SEO and abbr In-Reply-To: References: Message-ID: <21e770780805140209l43128311nd75fb0ac6cbdec5d@mail.gmail.com> 2008/5/14, Lee Jordan : > It seems Google in particular indexes the title tag. > Just raising this as I've just noticed it so we can be aware of the issue. thanks for the heads-up, can you start a wiki page and document your findings? 
With the URL and keywords you are searching for and the results that google (and other search engines) are producing? That way over time we can easily confirm or deny that search engines behaviour continues in a consistant way. thanks, -brian -- brian suda http://suda.co.uk From lee.jordan at gmail.com Wed May 14 03:48:03 2008 From: lee.jordan at gmail.com (Lee Jordan) Date: Wed May 14 03:48:08 2008 Subject: [uf-dev] SEO and abbr In-Reply-To: <21e770780805140209l43128311nd75fb0ac6cbdec5d@mail.gmail.com> References: <21e770780805140209l43128311nd75fb0ac6cbdec5d@mail.gmail.com> Message-ID: Cheers Brian, Have opened an issue on the wiki, with a text example, but without a link to the search result as SEO is a sensitive area for my employer and I'm trying to have a positive outlook on mF. Cheers Lee On Wed, May 14, 2008 at 10:09 AM, Brian Suda wrote: > 2008/5/14, Lee Jordan : > > It seems Google in particular indexes the title tag. > > Just raising this as I've just noticed it so we can be aware of the > issue. > > thanks for the heads-up, can you start a wiki page and document your > findings? With the URL and keywords you are searching for and the > results that google (and other search engines) are producing? That way > over time we can easily confirm or deny that search engines behaviour > continues in a consistant way. > > thanks, > -brian > > -- > brian suda > http://suda.co.uk > _______________________________________________ > microformats-dev mailing list > microformats-dev@microformats.org > http://microformats.org/mailman/listinfo/microformats-dev > -- HTML | CSS | Javascript http://www.leejordan.org.uk -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://microformats.org/discuss/mail/microformats-dev/attachments/20080514/8b48a645/attachment.html From brian.suda at gmail.com Wed May 14 04:26:42 2008 From: brian.suda at gmail.com (Brian Suda) Date: Wed May 14 04:26:45 2008 Subject: [uf-dev] SEO and abbr In-Reply-To: References: <21e770780805140209l43128311nd75fb0ac6cbdec5d@mail.gmail.com> Message-ID: <21e770780805140426o374776b2qabdc1c1e89149bd6@mail.gmail.com> 2008/5/14, Lee Jordan : > Have opened an issue on the wiki, with a text example, but without a link to > the search result as SEO is a sensitive area for my employer and I'm trying > to have a positive outlook on mF. i am unable to replicate your finding with any of the microformats on my sites. If you could give a solid example, then we could look into how/why/what mark-up is contributing to this behaviour and how to proceed, but without any confirmation it is difficult to keep this as an open issue. Could you create a test page somewhere, so that you do not have to disclose any sensitive data for your employer? thanks, -brian -- brian suda http://suda.co.uk From csarven at gmail.com Wed May 14 06:46:00 2008 From: csarven at gmail.com (Sarven Capadisli) Date: Wed May 14 06:46:07 2008 Subject: [uf-dev] SEO and abbr In-Reply-To: <21e770780805140426o374776b2qabdc1c1e89149bd6@mail.gmail.com> References: <21e770780805140209l43128311nd75fb0ac6cbdec5d@mail.gmail.com> <21e770780805140426o374776b2qabdc1c1e89149bd6@mail.gmail.com> Message-ID: Here is one example: http://www.google.com/search?hl=en&q=microformats+introduction Look for "sarven". Shows description as: "24 Jan 2008 ... An introduction to microformats: what they are, why we need them and briefly how to use them." It *appears* to be that this happens when the description is less then 150 characters and they fill in the available space with the timestamp if and only if a new sentence doesn't fit. 
Here is another example: http://www.google.com/search?hl=en&q=three+significant+modes Which doesn't include the timestamp. And: http://www.google.com/search?hl=en&q=irc+social+networking+platform Which doesn't include the second sentence but fills it in with the timestamp. Sarven Capadisli http://www.csarven.ca On Wed, May 14, 2008 at 7:26 AM, Brian Suda wrote: > 2008/5/14, Lee Jordan : >> Have opened an issue on the wiki, with a text example, but without a link to >> the search result as SEO is a sensitive area for my employer and I'm trying >> to have a positive outlook on mF. > > i am unable to replicate your finding with any of the microformats on > my sites. If you could give a solid example, then we could look into > how/why/what mark-up is contributing to this behaviour and how to > proceed, but without any confirmation it is difficult to keep this as > an open issue. > > Could you create a test page somewhere, so that you do not have to > disclose any sensitive data for your employer? > > thanks, > -brian > > -- > brian suda > http://suda.co.uk > _______________________________________________ > microformats-dev mailing list > microformats-dev@microformats.org > http://microformats.org/mailman/listinfo/microformats-dev > From brian.suda at gmail.com Wed May 14 07:00:35 2008 From: brian.suda at gmail.com (Brian Suda) Date: Wed May 14 07:00:37 2008 Subject: [uf-dev] SEO and abbr In-Reply-To: References: <21e770780805140209l43128311nd75fb0ac6cbdec5d@mail.gmail.com> <21e770780805140426o374776b2qabdc1c1e89149bd6@mail.gmail.com> Message-ID: <21e770780805140700u1f68d662v351ca16851d24cc2@mail.gmail.com> 2008/5/14, Sarven Capadisli : > Here is one example: > http://www.google.com/search?hl=en&q=microformats+introduction > > Look for "sarven". Shows description as: > "24 Jan 2008 ... An introduction to microformats: what they are, why > we need them and briefly how to use them." 
> > It *appears* to be that this happens when the description is less then > 150 characters and they fill in the available space with the timestamp > if and only if a new sentence doesn't fit. --- thanks for the links and analysis. I agree, the description is coming from the element and the date before that is either the publication or date crawled. This doesn't seem to be in any way connected to the element that Lee Jordan is finding. Maybe we are all just slightly confused and talking about different things and/or Lee Jordan is connecting that displayed date with a date in the HTML, or he has actually finding an issue. Until we can find an example of this behaviour in the wild that is testable, (all other examples are counter to this) i do not believe this issue exists. thanks, -brian -- brian suda http://suda.co.uk From lee.jordan at gmail.com Fri May 16 01:30:21 2008 From: lee.jordan at gmail.com (Lee Jordan) Date: Fri May 16 01:30:24 2008 Subject: [uf-dev] SEO and abbr In-Reply-To: <21e770780805140700u1f68d662v351ca16851d24cc2@mail.gmail.com> References: <21e770780805140209l43128311nd75fb0ac6cbdec5d@mail.gmail.com> <21e770780805140426o374776b2qabdc1c1e89149bd6@mail.gmail.com> <21e770780805140700u1f68d662v351ca16851d24cc2@mail.gmail.com> Message-ID: This is what I found in the google description: "2008-04-2121st April - 2008-05-1211th May 2008" Bit of confusion for me too as I had messed around with that page quite a lot. It is actually an issue with working around abbr, not abbr itself. Looking at it deeper for that page I may have changed abbr to spans somewhere along the line before the google bot came along, to address accessibility with abbr, the lack of whitespace would be my fault then (schoolboy - hehe). In which case this should really be noted as a pitfall of working around abbr with span classes and should be noted as a possible downside to avoiding abbr? I'd say that does seem the more likely situation as it makes sense all span text gets indexed. 
Still be interested in knowing how search engines handle abbr though, will keep an eye on my abbr dates on the search engines as I have a few and will keep watching my own cubs in the wild. Lee On Wed, May 14, 2008 at 3:00 PM, Brian Suda wrote: > 2008/5/14, Sarven Capadisli : > > Here is one example: > > http://www.google.com/search?hl=en&q=microformats+introduction > > > > Look for "sarven". Shows description as: > > "24 Jan 2008 ... An introduction to microformats: what they are, why > > we need them and briefly how to use them." > > > > It *appears* to be that this happens when the description is less then > > 150 characters and they fill in the available space with the timestamp > > if and only if a new sentence doesn't fit. > > --- thanks for the links and analysis. I agree, the description is > coming from the element and the date before that is either the > publication or date crawled. This doesn't seem to be in any way > connected to the element that Lee Jordan is finding. > > Maybe we are all just slightly confused and talking about different > things and/or Lee Jordan is connecting that displayed date with a date > in the HTML, or he has actually finding an issue. > > Until we can find an example of this behaviour in the wild that is > testable, (all other examples are counter to this) i do not believe > this issue exists. > > thanks, > -brian > > -- > brian suda > http://suda.co.uk > _______________________________________________ > microformats-dev mailing list > microformats-dev@microformats.org > http://microformats.org/mailman/listinfo/microformats-dev > -- HTML | CSS | Javascript http://www.leejordan.org.uk -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://microformats.org/discuss/mail/microformats-dev/attachments/20080516/cc957526/attachment.html

From lists at ben-ward.co.uk Sat May 17 11:47:01 2008
From: lists at ben-ward.co.uk (Ben Ward)
Date: Sat May 17 11:47:37 2008
Subject: [uf-dev] Defining and Extending Value Excepting
Message-ID: <93867DB2-12A5-4DBD-938D-2FF35616347E@ben-ward.co.uk>

Hi parser devs!

I've spent a number of hours this weekend documenting cases of microformats requiring particular data formats for parsing (ISO 8601, telephone keywords in hCard, and so on).

Alongside this, I've documented the currently supported means of including said data (class-design-pattern, abbr-design-pattern and value-excerpting), noting how the intention of authors is to hide these specified formats in favour of more flexible human-centric formats. I've also documented where the different means of inclusion are appropriate and inappropriate in different situations.

Finally, I've proposed an extension to the current pattern of value-excerpting, whereby if an element with a class of "value" is also empty, its @title attribute would be parsed in place of inner-text.

I am aware that we need to better specify the behaviour of value-excerpting as a whole, let alone adding extensions. We do, however, have a problem that can be solved; the requirement is to include specific data formats, but hidden in favour of variable, human-consumable (or internationalised) forms of that data, whilst still operating entirely within the HTML layer (not depending on CSS). This is not something that HTML has a native means of handling.

The way I see it, at the same time as properly specifying value-excerpting (possibly just calling it "value-design-pattern"), we can specify a robust means of handling the exceptional requirement to include machine data.

** What I'd like from parser developers is feedback on how feasible this pattern is to parse, please.
** Note that whilst this proposal _does_ resolve the long-running abbr-misuse issue that keeps coming up, my approach here is to solve the root of the problem, not to work around a consequence of that problem. Additionally, in extending the existing value-excerpting behaviour, we avoid adding yet more syntactic vocabulary to microformats and we produce a pattern that does not tie people to particular HTML elements (which is more in line with our microformat goals).

With regard to the separate issues we've had with ABBR, I'm asking some colleagues to test this idea thoroughly with regard to assistive technology before we finalise a spec.

Thanks,

Ben

From mail at tobyinkster.co.uk Sat May 17 15:08:19 2008
From: mail at tobyinkster.co.uk (Toby A Inkster)
Date: Sat May 17 15:08:34 2008
Subject: [uf-dev] Defining and Extending Value Excepting
Message-ID:

Although this sounds like a nice idea, I've previously been informed that requiring empty inline elements is a non-starter, as many HTML processors (including "tidy" with its default settings) strip these out.

Preliminary testing with tidy (version: 1 September 2005) shows this to be true. Some parsers, including X2V IIRC, pre-process non-XHTML HTML by running it through tidy to get it into well-formed XML. Skimming through the tidy documentation, I can't see a way of disabling this empty inline element stripping behaviour.

If people *want* to publish data that uses empty inline elements, then that's fair enough, but with the current state of HTML processors, it's probably unwise to publish a pattern that *requires* the use of empty inline elements.
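
The extension under discussion in this thread (an empty class="value" element whose @title is read in place of its inner text) might be parsed roughly as follows. This is only a minimal Python sketch under stated assumptions, not code from any parser named here: `ValueExcerpter` and `excerpt_value` are hypothetical names, the sample markup in the usage note is invented for illustration, the fragment is assumed to be well-formed, nested value-inside-value is not addressed, and the empty element is assumed to have survived any tidy-style preprocessing.

```python
from html.parser import HTMLParser

# Elements that never have an end tag; they must not touch the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class ValueExcerpter(HTMLParser):
    """Collect the values of class="value" elements in one property fragment."""

    def __init__(self):
        super().__init__()
        self.values = []   # one string per class="value" element, in close order
        self._stack = []   # open elements: a dict for value elements, else None

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "value" in classes:
            self._stack.append({"title": attrs.get("title"), "text": []})
        else:
            self._stack.append(None)

    def handle_data(self, data):
        # Text belongs to the innermost open value element, if there is one.
        for entry in reversed(self._stack):
            if entry is not None:
                entry["text"].append(data)
                break

    def handle_endtag(self, tag):
        if tag in VOID or not self._stack:
            return
        entry = self._stack.pop()
        if entry is None:
            return
        text = "".join(entry["text"]).strip()
        # Proposed extension: an empty value element falls back to its @title.
        if not text and entry["title"]:
            text = entry["title"]
        self.values.append(text)


def excerpt_value(fragment):
    """Concatenate the value pieces found in a property fragment.

    Assumption: pieces are joined without any separator.
    """
    parser = ValueExcerpter()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.values)
```

With invented markup such as `<span class="dtstart"><span class="value" title="2008-05-23T12:00"></span>Tomorrow lunchtime</span>`, `excerpt_value` would return the title string and ignore the human-readable text. If the empty span has been stripped upstream, nothing is found, which is exactly the concern raised above.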
-- Toby A Inkster

From lists at ben-ward.co.uk Sun May 18 05:09:23 2008
From: lists at ben-ward.co.uk (Ben Ward)
Date: Sun May 18 05:09:42 2008
Subject: [uf-dev] Defining and Extending Value Excepting
In-Reply-To:
References:
Message-ID:

Hey Toby,

On 17 May 2008, at 23:08, Toby A Inkster wrote:

> Although this sounds like a nice idea, I've previously been informed
> that requiring empty inline elements is a non-starter, as many HTML
> processors (including "tidy" with its default settings) strip these
> out.
>
> Preliminary testing with tidy (version: 1 September 2005) shows this
> to be true. Some parsers, including X2V IIRC, pre-process non-XHTML
> HTML by running it through tidy to get it into well-formed XML.
> Skimming through the tidy documentation, I can't see a way of
> disabling this empty inline element stripping behaviour.

hKit does this too (via the W3C hosted version, although there was some talk of switching to PHP's native HTML DOM parser instead).

Looking over the HTMLTidy bug tracker, it does seem to be an open issue, but there's one bug (http://is.gd/i8E) proposing that it not drop empty elements with class attributes, and it includes a simple fix; fixing that would resolve this.

> If people *want* to publish data that uses empty inline elements,
> then that's fair enough, but with the current state of HTML
> processors, it's probably unwise to publish a pattern that
> *requires* the use of empty inline elements.

I'm not entirely comfortable with a broken part of the parser stack being a blocker for a mark-up level pattern. Of course, if we can't work out a fix, then you're absolutely right that we can't go requiring something that's too expensive to parse (especially given parsing expense is the whole reason for having specified data formats within microformats in the first place!). But, if it's feasible to fix tidy for microformat parsers, then I'd be in favour of doing so.
B From lists at ben-ward.co.uk Sun May 18 07:44:47 2008 From: lists at ben-ward.co.uk (Ben Ward) Date: Sun May 18 07:45:01 2008 Subject: [uf-dev] Defining and Extending Value Excepting In-Reply-To: References: Message-ID: <72CA99A4-02E1-4666-899D-8DF1B34ACAFF@ben-ward.co.uk> On 17 May 2008, at 23:08, Toby A Inkster wrote: > Although this sounds like a nice idea, I've previously been informed > that requiring empty inline elements is a non-starter, as many HTML > processors (including "tidy" with its default settings) strip these > out. As a second followup to this, I've built a version of HTMLTidy which does not strip empty elements where a class attribute is present. You can download a copy from http://ben-ward.co.uk/files/tidy-microformats.zip It's built on Mac OSX (Intel), and I can't recall what the deal is with OSX binaries running on other forms of UNIX. However, I've included the diff, so it should be trivial to compile other builds on other platforms as required. Cheers, Ben From lists at ben-ward.co.uk Thu May 22 03:22:03 2008 From: lists at ben-ward.co.uk (Ben Ward) Date: Thu May 22 03:22:14 2008 Subject: [uf-dev] Defining and Extending Value Excepting In-Reply-To: References: Message-ID: <25D51B07-201C-48F3-A1F6-8B2909B88B15@ben-ward.co.uk> OK, pushing on a bit: I've got one flaw with my own suggestion here, which is that using class="value" is going to cause a bit of car-crash in hCard, due to the two instances of machine-data identified in the tel property (documented on the wiki page ). The type property works alongside the other specified use of value, whilst it's possible for the value itself to need a hidden data value. In combination with this new value-pattern, we could end up with mark-up like:

Mobile Phone +1-555-FORMATS

That's... messy. Value of Value is especially unpleasant, and parsing the value of tel without parsing the value of type as the value of tel strikes me as complex (although, with value-excerpting itself not fully spec'd, maybe it could be made to work).

So I'm suggesting one quick alteration here, which is to use class=data rather than class=value, so as to avoid the example above. I'm thinking of this from a publisher's point of view as much as anything; I'd like to avoid the above scenario of nesting the same class for different behaviours.

Once again, more feedback on the pattern from a parsing angle would be great. I'd like to be confident that the pattern is robust and parsable before presenting it to µf-discuss; I don't want to lose it in a maelstrom :-)

Thanks,

Ben

On 17 May 2008, at 23:08, Toby A Inkster wrote:

> Although this sounds like a nice idea, I've previously been informed
> that requiring empty inline elements is a non-starter, as many HTML
> processors (including "tidy" with its default settings) strip these
> out.
>
> Preliminary testing with tidy (version: 1 September 2005) shows this
> to be true. Some parsers, including X2V IIRC, pre-process non-XHTML
> HTML by running it through tidy to get it into well-formed XML.
> Skimming through the tidy documentation, I can't see a way of
> disabling this empty inline element stripping behaviour.
>
> If people *want* to publish data that uses empty inline elements,
> then that's fair enough, but with the current state of HTML
> processors, it's probably unwise to publish a pattern that
> *requires* the use of empty inline elements.
> > -- > Toby A Inkster > > > > > > _______________________________________________ > microformats-dev mailing list > microformats-dev@microformats.org > http://microformats.org/mailman/listinfo/microformats-dev From brian.suda at gmail.com Thu May 22 03:34:19 2008 From: brian.suda at gmail.com (Brian Suda) Date: Thu May 22 03:34:23 2008 Subject: [uf-dev] Defining and Extending Value Excepting In-Reply-To: <25D51B07-201C-48F3-A1F6-8B2909B88B15@ben-ward.co.uk> References: <25D51B07-201C-48F3-A1F6-8B2909B88B15@ben-ward.co.uk> Message-ID: <21e770780805220334j7e3055bayc5cbf332f7ce2bd@mail.gmail.com> 2008/5/22 Ben Ward : >

> > Mobile Phone > > > > +1-555-FORMATS > > >

>
> That's... messy. Value of Value is especially unpleasant, parsing the value of
> tel without parsing the value of type as the value of tel strikes me as
> complex (although, with value-excerpting itself not fully spec'd, maybe it
> could be made to work).

--- in your example, if you are only interested in the +15553676177, then there is no need for the outer class="value" around the +1-555-FORMATS

-brian

--
brian suda
http://suda.co.uk

From glenn.jones at madgex.com Thu May 22 07:05:46 2008
From: glenn.jones at madgex.com (Glenn Jones)
Date: Thu May 22 07:05:55 2008
Subject: [uf-dev] Defining and Extending Value Excepting
In-Reply-To: <25D51B07-201C-48F3-A1F6-8B2909B88B15@ben-ward.co.uk>
References: <25D51B07-201C-48F3-A1F6-8B2909B88B15@ben-ward.co.uk>
Message-ID: <36A319113CF910438942741C4727ADFF01E97814@MOBY.Clarence.local>

2008/5/22 Ben Ward :
>

> > Mobile Phone > > > > +1-555-FORMATS > > >

>
> That's... messy. Value of Value is especially unpleasant, parsing the
> value of tel without parsing the value of type as the value of tel
> strikes me as complex (although, with value-excerpting itself not
> fully spec'd, maybe it could be made to work).

It would be really hard for me to add the above to ufXtract. You're right, nested Value of Value is especially unpleasant.

Adding the "Invisible Supplementary Data" idea, as below, should not be a problem:

Tomorrow lunchtime

It looks like Cognition and Optimus are already picking up the invisible supplementary data pattern for the geo class:
http://www.ufxtract.com/testsuite/experimental/experimental1.htm (Press 'Alt X' and run test)

Glenn Jones

From lists at ben-ward.co.uk Thu May 22 07:08:30 2008
From: lists at ben-ward.co.uk (Ben Ward)
Date: Thu May 22 07:08:37 2008
Subject: [uf-dev] Specification of Value-Excerpting
Message-ID: <22689992-ADD3-4541-8FA0-5EF2BD61B7A0@ben-ward.co.uk>

Related to the machine-data documentation and empty-element-value-excerpting (http://microformats.org/wiki/machine-data), I'd like to get proper documentation written on the value-excerpting behaviour first described in hCard. It's currently covered by a single paragraph in the hCard spec, which is massively insufficient. It has also exposed issues lately about pulling values out of nested microformats and even just out of complexly nested properties.

I'm vaguely aware that Operator has implemented safety nets by not parsing with other known microformats, but that seems to be a flawed solution as it depends on every parser knowing about every other microformat. There's the ongoing class=mfo idea, which is a separate solution to that problem, but we should get the parsing behaviour of class=value (sans mfo) tightened up.

I've created an initial wiki page with basic starting points of how it's supposed to work.
However, I don't understand the intricacies of existing implementations enough to fully document parsing, so it's marked as being a "draft, don't publish this yet" page.

One requirement for value-excerpting that seems critical is that it must be implementable without parsers having knowledge of every other microformat. It needs to be possible to write an "hcard parser" that stands alone and does not need to be updated every time a new microformat is developed.

There are some other notes on that page too, and a "parsing to-do" list that I'd encourage you all to add to, so that we can get this fully defined and interoperable.

Thanks and regards,

Ben

From mkaply at us.ibm.com Thu May 22 07:44:43 2008
From: mkaply at us.ibm.com (Michael Kaply)
Date: Thu May 22 07:45:14 2008
Subject: [uf-dev] Specification of Value-Excerpting
In-Reply-To: <22689992-ADD3-4541-8FA0-5EF2BD61B7A0@ben-ward.co.uk>
Message-ID:

There is another page on the wiki about value excerpting somewhere besides this one:

http://microformats.org/wiki/hCard#Value_excerpting

I don't know where it is, but I saw it at one point.

It specifically says that you are only supposed to use child nodes to get the values, NOT descendants.

Michael Kaply
Firefox Advocate
mkaply@us.ibm.com
http://www.kaply.com/weblog/ (External Blog)
http://blogs.tap.ibm.com/weblogs/page/mkaply@us.ibm.com (Internal Blog)

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://microformats.org/discuss/mail/microformats-dev/attachments/20080522/20426b6c/attachment.html From mail at tobyinkster.co.uk Thu May 22 10:13:50 2008 From: mail at tobyinkster.co.uk (Toby A Inkster) Date: Thu May 22 10:14:04 2008 Subject: [uf-dev] Defining and Extending Value Excepting Message-ID: Glenn Jones: > It looks like Cognition and Optimus are already picking up invisible > supplementary data pattern for the geo class > http://www.ufxtract.com/testsuite/experimental/experimental1.htm > (Press > 'Alt X' and run test) Actually, it's just a case of good luck. Cognition implements an extra optimisation for geo, which your example coincidentally triggers. It's documented here: http://buzzword.org.uk/cognition/uf-plus.html#geo -- Toby A Inkster From glenn.jones at madgex.com Fri May 23 02:18:27 2008 From: glenn.jones at madgex.com (Glenn Jones) Date: Fri May 23 02:18:32 2008 Subject: [uf-dev] The correct format of a ISO date Message-ID: <36A319113CF910438942741C4727ADFF01E978EB@MOBY.Clarence.local> I have a questions about the use of the Z char (Zulu time or UTC) and the time zone information together. So these are correct 2007-05-01T11:30:15Z 2007-05-01T11:30:15-08:00 What about 2007-05-01T11:30:15Z-08:00 I cannot seem to find a clear pointer on whether the above is a valid ISO date. Background work http://www.ufxtract.com/testsuite/documentation/iso-date-normalisation.h tm http://www.ufxtract.com/testsuite/hcard/hcard15.htm (Alt X to run test) Examples of date formats I think are OK 2008-01-21 20080121 2007-05-01T11:30 2007-05-01 11:30 20070501 11:30 20070501T1130 2007-05-01T11:30:15 20070501T113015 2007-05-01T11:30Z-08:00 2007-05-01T11:30-08:00 2007-05-01T11:30+08:00 2007-05-01T11:30Z08:00 20070501T1130Z-0800 2007-05-01T11:30Z 2007-05 07-05-01 (equals 2007-05-01) 070501 (equals 2007-05-01) The last one is interesting http://en.wikipedia.org/wiki/ISO_8601 ... 
"Although the standard allows both the YYYY-MM-DD and YYYYMMDD formats for complete calendar date representations, if the day [DD] is omitted then only the YYYY-MM format is allowed. By disallowing dates of the form YYYYMM, the standard avoids confusion with the truncated representation YYMMDD (still often used)." Glenn Jones From norm at cackhanded.net Fri May 23 03:26:41 2008 From: norm at cackhanded.net (Mark Norman Francis) Date: Fri May 23 03:26:46 2008 Subject: [uf-dev] The correct format of a ISO date In-Reply-To: <36A319113CF910438942741C4727ADFF01E978EB@MOBY.Clarence.local> References: <36A319113CF910438942741C4727ADFF01E978EB@MOBY.Clarence.local> Message-ID: <10B5F88E-540A-4739-B9EC-086F21F607B6@cackhanded.net> > 2007-05-01T11:30:15Z-08:00 I think that is incorrect, as it is an either-or. The timezone is: * omitted, therefore local timezone * Z, therefore UTC * +/-HH:MM, therefore an offset from UTC The W3C page on date/time formats () says: > TZD = time zone designator (Z or +hh:mm or -hh:mm) Also see for more notes on the ISO standard. -- Norm. From glenn.jones at madgex.com Fri May 23 04:55:17 2008 From: glenn.jones at madgex.com (Glenn Jones) Date: Fri May 23 04:55:26 2008 Subject: [uf-dev] The correct format of a ISO date In-Reply-To: <10B5F88E-540A-4739-B9EC-086F21F607B6@cackhanded.net> References: <36A319113CF910438942741C4727ADFF01E978EB@MOBY.Clarence.local> <10B5F88E-540A-4739-B9EC-086F21F607B6@cackhanded.net> Message-ID: <36A319113CF910438942741C4727ADFF01E979D4@MOBY.Clarence.local> Thanks Norm >The W3C page on date/time formats () says: >> TZD = time zone designator (Z or +hh:mm or -hh:mm) That pity clear, I will change my code and tests. 
Glenn From mkaply at us.ibm.com Fri May 23 08:05:12 2008 From: mkaply at us.ibm.com (Michael Kaply) Date: Fri May 23 08:08:36 2008 Subject: [uf-dev] The correct format of a ISO date In-Reply-To: <36A319113CF910438942741C4727ADFF01E978EB@MOBY.Clarence.local> Message-ID: microformats-dev-bounces@microformats.org wrote on 05/23/2008 04:18:27 AM: > Examples of date formats I think are OK > > 2008-01-21 > 20080121 > 2007-05-01T11:30 > 2007-05-01 11:30 > 20070501 11:30 I thought the T was required? > 20070501T1130 > 2007-05-01T11:30:15 > 20070501T113015 > 2007-05-01T11:30Z-08:00 Definitely invalid - Z and offset are mutually exclusive > 2007-05-01T11:30-08:00 > 2007-05-01T11:30+08:00 > 2007-05-01T11:30Z08:00 > 20070501T1130Z-0800 > 2007-05-01T11:30Z Definitely invalid - Z and offset are mutually exclusive > 2007-05 > 07-05-01 (equals 2007-05-01) > 070501 (equals 2007-05-01) I sincerely hope noone would ever actually do anything like this. I'm not going to handle it in Operator. I can't believe they even allow this. It's a specification. So they can say "Always have the year" I hate ambiguity in dates and I hate parsing ISO dates. > The last one is interesting > http://en.wikipedia.org/wiki/ISO_8601 ... > "Although the standard allows both the YYYY-MM-DD and YYYYMMDD formats > for complete calendar date representations, if the day [DD] is omitted > then only the YYYY-MM format is allowed. By disallowing dates of the > form YYYYMM, the standard avoids confusion with the truncated > representation YYMMDD (still often used)." Mike Kaply -------------- next part -------------- An HTML attachment was scrubbed... URL: http://microformats.org/discuss/mail/microformats-dev/attachments/20080523/2e98ab99/attachment-0001.html From norm at cackhanded.net Fri May 23 09:06:31 2008 From: norm at cackhanded.net (Mark Norman Francis) Date: Fri May 23 09:36:57 2008 Subject: [uf-dev] The correct format of a ISO date In-Reply-To: References: Message-ID: > I thought the T was required? 
No, it can be omitted. Most sane people do not choose that format for on-the-wire data though. It's also one of our best practices at work to use the T format to remind lazy programmers that they cannot just echo out an ISO date string to end users. -- Norm. From mail at tobyinkster.co.uk Fri May 23 11:27:31 2008 From: mail at tobyinkster.co.uk (Toby A Inkster) Date: Fri May 23 11:27:49 2008 Subject: [uf-dev] The correct format of a ISO date Message-ID: Glenn Jones wrote: > 07-05-01 (equals 2007-05-01) > 070501 (equals 2007-05-01) Thanks for these examples. Although the current version of Cognition parses these dates correctly, it marks the "resolution" of the dates as being "month" (because they only have 6 numeric digits), so when outputting them will only output the year and month even though it knows the day internally. :-( Fix in the next release. If you really want to test full ISO compatibility, then you should include: 2008-W21 2008W21 2008-W21-5 2008W215 2008-144 2008144 Plus "T..." variants (i.e. with times). Cognition supports them all because it uses the Perl DateTime::Format::ISO8601 module, which is fairly comprehensive. But I don't think implementations should be expected to support the entire ISO8601 -- the W3CDTF note subset should be all that's required. -- Toby A Inkster From norm at cackhanded.net Fri May 23 11:51:14 2008 From: norm at cackhanded.net (Mark Norman Francis) Date: Fri May 23 11:51:21 2008 Subject: [uf-dev] The correct format of a ISO date In-Reply-To: References: Message-ID: <86A6A206-396F-4EC1-B170-3201B1F7DF1F@cackhanded.net> >> 07-05-01 (equals 2007-05-01) >> 070501 (equals 2007-05-01) > > Thanks for these examples. Although the current version of Cognition > parses these dates correctly, it marks the "resolution" of the dates > as being "month" (because they only have 6 numeric digits), so when > outputting them will only output the year and month even though it > knows the day internally. :-( Fix in the next release. 
Actually, according to the Wikipedia page on ISO 8601:

> ISO 8601 prescribes, as a minimum, a four-digit year [YYYY] to avoid
> the year 2000 problem.

Not having a personal copy of 8601 to check, I can't verify this, but it seems wise to me. ;)

-- Norm.

From mail at tobyinkster.co.uk Fri May 23 12:20:12 2008
From: mail at tobyinkster.co.uk (Toby A Inkster)
Date: Fri May 23 12:20:31 2008
Subject: [uf-dev] The correct format of a ISO date
Message-ID: <36DE9D7A-DF34-4049-9CFA-B998D815CE36@tobyinkster.co.uk>

Mark Norman Francis wrote:

> Actually, according to the Wikipedia page on ISO 8601:
> > ISO 8601 prescribes, as a minimum, a four-digit year [YYYY] to avoid
> > the year 2000 problem.

The confusion is due to the fact that there are three editions of ISO 8601:

ISO 8601:1988 (E)
ISO 8601:2000 (E)
ISO 8601:2004 (E)

The first two allow two-digit years. The most recent edition disallows them, but IIRC parsers are still expected to accept them, as they may be produced by legacy ISO 8601 code.

Also worth consideration are date formats like:

--05-23 (Day and month; year not specified)
---23 (Day; month and year not specified)
-145 (Ordinal day; year not specified)
-W21-5 (Week and day; year not specified)

Oh, and commas are allowed to be used as decimal points. Oh, and decimals are not just allowed after seconds, but also after minutes and hours.

It is for these reasons that we really must specify a subset of ISO 8601 -- the W3CDTF subset would be ideal.
-- Toby A Inkster

From scott at randomchaos.com Fri May 23 12:54:58 2008
From: scott at randomchaos.com (Scott Reynen)
Date: Fri May 23 12:55:06 2008
Subject: [uf-dev] The correct format of a ISO date
In-Reply-To: <36DE9D7A-DF34-4049-9CFA-B998D815CE36@tobyinkster.co.uk>
References: <36DE9D7A-DF34-4049-9CFA-B998D815CE36@tobyinkster.co.uk>
Message-ID:

On May 23, at 1:20, Toby A Inkster wrote:

> The confusion is due to the fact that there are three editions of
> ISO 8601:

The datetime design pattern page [1] in the wiki says:

"Any microformat using the date-time-design pattern should use a profile of ISO8601. There are currently two widely used profiles which should be reused.
- RFC 3339
- W3C Note on Datetimes"

That seems to clear this up, but then there's more confusing language on the ISO 8601 page [2]:

"Microformats should use RFC 3339."

In addition to being more specific than the previous recommendation, this one applies RFC 2119 "should" to the microformat itself rather than implementors of the microformat, which doesn't make much sense. Further confusing matters, individual microformats make no mention of RFC 3339, referring only to ISO 8601.

We should probably clarify the actual source(s) for date formats before we spend too much time testing them.

[1] http://microformats.org/wiki/datetime-design-pattern
[2] http://microformats.org/wiki/iso-8601

Peace,
Scott

From mail at tobyinkster.co.uk Fri May 23 14:13:31 2008
From: mail at tobyinkster.co.uk (Toby A Inkster)
Date: Fri May 23 14:13:39 2008
Subject: [uf-dev] The correct format of a ISO date
Message-ID:

Scott Reynen wrote:

> In addition to being more specific than the previous recommendation,
> this one applies RFC 2119 "should" to the microformat itself rather
> than implementors of the microformat, which doesn't make much sense.
> Further confusing matters, individual microformats make no mention of
> RFC 3339, referring only to ISO 8601.
I raised this very issue a couple of months ago:

http://microformats.org/discuss/mail/microformats-discuss/2008-March/011712.html

In short the datetime design pattern says that microformats making use of it must define a profile (i.e. subset) of ISO 8601 that is supported. But none do.

I've tried to address this in my experimental hCalendar 1.1 spec:

http://microformats.org/wiki/User:TobyInk/hcalendar-1.1#Dates_and_Times

-- Toby A Inkster

From glenn.jones at madgex.com Sun May 25 09:46:25 2008
From: glenn.jones at madgex.com (Glenn Jones)
Date: Sun May 25 09:46:32 2008
Subject: [uf-dev] The correct format of a ISO date
Message-ID: <36A319113CF910438942741C4727ADFF01EEF3B1@MOBY.Clarence.local>

I think I now have a handle on this date stuff. Thanks for everyone's comments.

I have to say that the documentation clarity for using dates and times in Microformats is not good at the moment. Pointing people at ISO 8601 is not a good idea. Toby's point about specifying the profile for each usage in the wiki is important. Maybe all the language on the wiki should be about the profiles. Also changing it over to examples and use cases rather than pointing people at dry specs?
There are a couple of smaller points still outstanding, like:

Specifying RFC 3339 plus 'T' and 'Z' MUST be caps has been suggested in the past, but then it's not RFC 3339.

So here is my new take:
http://www.ufxtract.com/testsuite/documentation/iso-date.htm

New test pages:
http://www.ufxtract.com/testsuite/hcard/hcard15.htm
http://www.ufxtract.com/testsuite/hcard/hcard16.htm

W3C Note datetime profile - valid structures

2007
2007-05
2007-05-01T11:30
2007-05-01T11:30Z
2007-05-01T11:30:00Z
2007-05-01T11:30+08:00
2007-05-01T11:30:00+08:00
2007-05-01T11:30:00.0135

RFC 3339 profile - valid structures

2007
2007-05
2007-05-01T11:30
2007-05-01T11:30Z
2007-05-01T11:30:00Z
2007-05-01T11:30+08:00
2007-05-01T11:30:00+08:00
2007-05-01T11:30:00.0135
200801
20080121
20070501T1130
20070501T113015
20070501T113015Z
20070501t113025z
2007-05-01T113025
20070501T11:30:25

Valid ISO 8601 date time that SHOULD NOT be used in Microformats

070501
07-05-01
20070501 1130
20070501 113015Z
2007-05-01 11:30:00+08:00
2007-05-01 11:30:00.0135
2007-05-01T11.0150
2007-05-01T11:30.0150
2008-W21
2008W21
2008-W21-5
2008W215
2008-144
2008144
etc...

Glenn Jones
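
As a rough illustration of the "W3C Note datetime profile" list in the message above, here is a minimal Python sketch of a checker for that subset. It is not taken from Cognition, Operator or the ufXtract test suite; the names `is_w3cdtf` and `require_tzd` are hypothetical. One assumption to flag: the W3C note itself requires a time zone designator whenever a time is present, so the sketch makes that optional by default only so that it accepts the zone-less entries in the list, and offers `require_tzd=True` for the stricter reading. It checks shape only and does not validate days per month (2007-02-31 passes).

```python
import re

# Extended-format shapes only: YYYY, YYYY-MM, YYYY-MM-DD, and a full date
# followed by "T" and hh:mm[:ss[.fraction]] with an optional zone.
# The zone is either "Z" or a +hh:mm / -hh:mm offset, never both.
_W3CDTF = re.compile(
    r"""
    (?P<year>\d{4})
    (?:-(?P<month>0[1-9]|1[0-2])
      (?:-(?P<day>0[1-9]|[12]\d|3[01])
        (?:T(?P<hour>[01]\d|2[0-3]):(?P<minute>[0-5]\d)
          (?::(?P<second>[0-5]\d)(?:\.(?P<fraction>\d+))?)?
          (?P<tzd>Z|[+-](?:[01]\d|2[0-3]):[0-5]\d)?
        )?
      )?
    )?
    """,
    re.VERBOSE,
)


def is_w3cdtf(value, require_tzd=False):
    """Return True if value has the shape of a W3C-note date or datetime.

    require_tzd=True follows the stricter reading in which any value
    carrying a time must also carry "Z" or a numeric offset.
    """
    match = _W3CDTF.fullmatch(value)
    if match is None:
        return False
    has_time = match.group("hour") is not None
    if require_tzd and has_time and match.group("tzd") is None:
        return False
    return True


if __name__ == "__main__":
    samples = [
        "2007-05-01T11:30:00+08:00",
        "2007-05-01T11:30Z-08:00",
        "20070501T1130",
        "07-05-01",
        "2008-W21",
    ]
    for sample in samples:
        print(sample, is_w3cdtf(sample))
```

Keeping the pattern this narrow is the point of choosing a profile: basic (unpunctuated) forms, two-digit years, week dates, ordinal dates and space-separated times all fail the match, so a parser never has to guess between readings such as YYMMDD and YYYYMM.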