From glenn.jones at madgex.com Tue Jul 1 01:28:24 2008 From: glenn.jones at madgex.com (Glenn Jones) Date: Tue Jul 1 01:28:33 2008 Subject: [uf-discuss] Human and machine readable data format In-Reply-To: References: <36A319113CF910438942741C4727ADFF02132B0D@MOBY.Clarence.local><1214820941.3171.25.camel@localhost.localdomain><61CCA888-066E-4F3E-8023-85F0EFBB37B5@adactio.com><28AAF834-9517-42AF-9FD0-982891C9AA50@ben-ward.co.uk><36A13CC9-03D2-46B6-AA3E-5DBDAFB7940A@adactio.com><3BCE3C9D-2E84-4DFF-AD18-891C8CB492FB@randomchaos.com> Message-ID: <36A319113CF910438942741C4727ADFF02132F1F@MOBY.Clarence.local> As the exchange between Ben and Jeremy has shown, what is human readable is up for debate. Having spent far too much time looking at the ISO date formats, they are all readable to me, but I know that's not the case for everyone else. We need to expand the discussion and ask those involved in the accessibility area what is an acceptable human readable format. The format 2008-01-25 is a compromise and as such we need to ask the other party if it's an acceptable middle ground. For example, would the BBC accept 2008-01-25 in the title of an abbr? For me a good rule of thumb is: as an HTML author, would you be happy writing out the format in the text of a page for your users to read? I personally would never write 2008-01-05 in a public document. My main issue with the "value excerption optimization rule" approach that Jeremy has been talking about is that it may not work with other data types, for example: a 2 day event, Northern California, EST, 4 out of 5, etc. The only way to escape the internationalisation issues is not to use anything other than numerical and separator chars. Expressing a duration of "2 weeks and 3 days" in numbers while still making it human readable is a challenge! Could we also say the rate title attribute with a value "4" is "provide the full or expanded form of the expression" 4 out of 5?
We do need to resolve this issue globally across all content which requires machine readability. Although this option looks attractive at first sight, it is still problematic. Glenn Jones From xbadosa at gmail.com Tue Jul 1 05:19:15 2008 From: xbadosa at gmail.com (Xavier Badosa) Date: Tue Jul 1 05:26:36 2008 Subject: [uf-discuss] Microformat for statistical (tabular) data Message-ID: <73b889410807010519m48e0853o3dbdf0a31e14d75b@mail.gmail.com> Is there anyone working on a microformat for statistical information? Such a microformat could be used to add more semantics to <table>s. For example, the unit of the data, the time of reference, update time, etc. Some existing standards in the field to consider: SDMX http://www.sdmx.org COSSI http://www.stat.fi/org/tut/dthemes/drafts/cossi_en.html Also: DDI http://www.ddialliance.org XBRL http://www.xbrl.org X. From scott at randomchaos.com Tue Jul 1 05:28:03 2008 From: scott at randomchaos.com (Scott Reynen) Date: Tue Jul 1 05:28:13 2008 Subject: [uf-discuss] Human and machine readable data format In-Reply-To: References: <36A319113CF910438942741C4727ADFF02132B0D@MOBY.Clarence.local> <1214820941.3171.25.camel@localhost.localdomain> <61CCA888-066E-4F3E-8023-85F0EFBB37B5@adactio.com> <28AAF834-9517-42AF-9FD0-982891C9AA50@ben-ward.co.uk> <36A13CC9-03D2-46B6-AA3E-5DBDAFB7940A@adactio.com> <3BCE3C9D-2E84-4DFF-AD18-891C8CB492FB@randomchaos.com> Message-ID: On [Jun 30], at [ Jun 30] 11:12 , Breton Slivka wrote: > I think you'll find that metadata of any kind is a comprimise of the > "microformats core principles" What I mean by "metadata" is information about content, which already makes up the bulk of microformats, e.g. class names, rel values, tag names, none of which is readily visible to humans. Making content visible is a principle; making such metadata visible is not.
The difference with ISO dates is we've previously defined them as content; I'm suggesting that's a mistaken definition, as these dates don't function as content in our reference standard iCalendar. Peace, Scott From lists at ben-ward.co.uk Tue Jul 1 06:09:12 2008 From: lists at ben-ward.co.uk (Ben Ward) Date: Tue Jul 1 06:09:18 2008 Subject: [uf-discuss] Human and machine readable data format In-Reply-To: References: <36A319113CF910438942741C4727ADFF02132B0D@MOBY.Clarence.local> <1214820941.3171.25.camel@localhost.localdomain> <61CCA888-066E-4F3E-8023-85F0EFBB37B5@adactio.com> <28AAF834-9517-42AF-9FD0-982891C9AA50@ben-ward.co.uk> <36A13CC9-03D2-46B6-AA3E-5DBDAFB7940A@adactio.com> <3BCE3C9D-2E84-4DFF-AD18-891C8CB492FB@randomchaos.com> Message-ID: <18E0B8A8-096D-479C-AC3D-445A407B48DD@ben-ward.co.uk> On 1 Jul 2008, at 13:28, Scott Reynen wrote: > The difference with ISO dates is we've previously defined them as > content; I'm suggesting that's a mistaken definition, as these dates > don't function as content in our reference standard iCalendar. In my view, it's not so much that an ISO date isn't content per se, it's that it's not content for humans, and in this case, the date content for humans is being published in a different form. In HTML, visible content is for humans, content for machines is hidden. This makes for a violation of the DRY principle, but it's the same violation we're already making, and it applies not just to datetimes, but also to durations (which have only just been mentioned in this discussion, and are important not to ignore), hCard telephone types, geo co-ordinates, and everything else documented on http://microformats.org/wiki/machine-data . As an aside, this is why I favoured and have done some initial work on the empty-element-with-title extension to the value-excerption-pattern (which I'm also leading the effort to get properly specified, since it previously has not been).
It keeps the machine content in the HTML, can be specified to keep it in physical proximity to the human form, but due to the way empty elements are treated, does not expose that content to humans. It does not violate DRY any more than we already do, and in relation to the "hidden data" principle, I argue these are exceptional cases _because_ they are DRY violations. We are not hiding information, we're hiding an alternate representation of visible information. (issues page: http://microformats.org/wiki/value-excerption-pattern-issues) . Much of this same line of discussion applies to the class-name data embedding that Jake and Frances have discussed. If there's a semantically acceptable solution to this, which doesn't violate any principles, or DRY, or the semantics of HTML, doesn't compromise accessibility or internationalisation, and meets publishers' demands for flexibility and doesn't compromise user experience, then that would be fantastic. None of the discussions so far seem to match that. B From michael.hausenblas at joanneum.at Tue Jul 1 07:37:05 2008 From: michael.hausenblas at joanneum.at (Hausenblas, Michael) Date: Tue Jul 1 08:36:04 2008 Subject: [uf-discuss] Microformat for statistical (tabular) data In-Reply-To: <73b889410807010519m48e0853o3dbdf0a31e14d75b@mail.gmail.com> Message-ID: <768DACDC356ED04EA1F1130F97D29852017A16A9@RZJC2EX.jr1.local> We are not working on a microformat exactly, but you may also want to look at http://purl.org/NET/scovo (the statistical core vocabulary). Cheers, Michael ---------------------------------------------------------- Michael Hausenblas, MSc.
Institute of Information Systems & Information Management JOANNEUM RESEARCH Forschungsgesellschaft mbH http://www.joanneum.at/iis/ ---------------------------------------------------------- >-----Original Message----- >From: microformats-discuss-bounces@microformats.org >[mailto:microformats-discuss-bounces@microformats.org] On >Behalf Of Xavier Badosa >Sent: Tuesday, July 01, 2008 2:19 PM >To: microformats-discuss@microformats.org >Subject: [uf-discuss] Microformat for statistical (tabular) data > >Is there anyone working on a microformat for statistical information? >Such a microformat could be used to add more semantics to <table>s. >For example, the unit of the data, the time of reference, update time, >etc. > >Some existing standards in the field to consider: > >SDMX >http://www.sdmx.org > >COSSI >http://www.stat.fi/org/tut/dthemes/drafts/cossi_en.html > >Also: > >DDI >http://www.ddialliance.org > >XBRL >http://www.xbrl.org > >X. >_______________________________________________ >microformats-discuss mailing list >microformats-discuss@microformats.org >http://microformats.org/mailman/listinfo/microformats-discuss > From guillaume at lebleu.org Tue Jul 1 09:01:33 2008 From: guillaume at lebleu.org (Guillaume Lebleu) Date: Tue Jul 1 09:01:59 2008 Subject: [uf-discuss] Human and machine readable data format In-Reply-To: <36A319113CF910438942741C4727ADFF02132F1F@MOBY.Clarence.local> References: <36A319113CF910438942741C4727ADFF02132B0D@MOBY.Clarence.local><1214820941.3171.25.camel@localhost.localdomain><61CCA888-066E-4F3E-8023-85F0EFBB37B5@adactio.com><28AAF834-9517-42AF-9FD0-982891C9AA50@ben-ward.co.uk><36A13CC9-03D2-46B6-AA3E-5DBDAFB7940A@adactio.com><3BCE3C9D-2E84-4DFF-AD18-891C8CB492FB@randomchaos.com> <36A319113CF910438942741C4727ADFF02132F1F@MOBY.Clarence.local> Message-ID: <486A54DD.2030105@lebleu.org> Glenn Jones wrote: > As the exchange between Ben and Jeremy has shown what is human readable > is up for debate. Having spent far too much time looking at the ISO date > formats they are all readable to me, but I know that's not the case for > everyone else. > > We need to expand the discussion and ask those involved in the > accessibility area what is an acceptable human readable format. The > format 2008-01-25 is a compromise and as such we need to ask the other > party is it's an acceptable middle ground. For example would the BBC > accept 2008-01-25 in the title of a abbr. > Since the BBC's request was specifically related to screen readers, we may want to distinguish "machine-readable", "human-readable" and "human-hearable".
I think there is less debate re: what is "human-hearable" than there is debate re: what is "human-readable". IMO, "2008-01-25" is indeed more human-readable than "2008-01-25T12:00:11", but it is still less "human-hearable" than the plain old English "January 25th, 2008", which is human-readable and machine-readable as long as it is written precisely following US English conventions and the locale can be deduced from a lang attribute (either global to the HTML document or local to the date). Moreover, "January 25th, 2008" is indeed an expanded form of, say, "1/25", so the following is correct HTML: <abbr title="January 25th, 2008">1/25</abbr> Guillaume From xbadosa at gmail.com Tue Jul 1 09:08:10 2008 From: xbadosa at gmail.com (Xavier Badosa) Date: Tue Jul 1 09:08:15 2008 Subject: [uf-discuss] Current state of grouping proposal Message-ID: <73b889410807010908m162c2117v907d9d56b9d228d8@mail.gmail.com> I'm a little confused about the current state of the grouping proposal. I'm not even sure whether the uf-community is working on a general solution (a microformat for grouping any sort of items) (+1 vote) or a particular solution for some of the existing microformats (0 votes). I think some sort of grouping is needed in hReview if we want to follow the principle of adapting to current behaviors and usage patterns. Usually, webpages include more than one review for a single item. To solve this, hReview forces us: 1) to repeat an unnecessary hidden item for every hreview (this somehow violates the hidden (meta)data principle); or 2) to use the include-pattern (empty anchor, accessibility issues). A grouping mechanism could come to the rescue. Something like:

The Godfather II

The best!

Some guy

Soooooo good!

Enthusiastic girl

could be interpreted by a parser to mean that the same item should be associated with every hreview. In fact, a grouping microformat would be an alternative (easy to parse) include-pattern mechanism. X. From guillaume at lebleu.org Tue Jul 1 09:27:19 2008 From: guillaume at lebleu.org (Guillaume Lebleu) Date: Tue Jul 1 09:27:23 2008 Subject: [uf-discuss] Plain Old English/French/..., human-readable/hearable alternative to ISO date Message-ID: <486A5AE7.3000705@lebleu.org> FYI. I've summarized/combined some of the ideas suggested by Glenn Jones, myself and others here [1]. I will elaborate on some of the details (ex. time) later. Guillaume [1] http://microformats.org/wiki/datetime-design-pattern#Plain_Old_English_alternative_to_ISO_date From lists at ben-ward.co.uk Tue Jul 1 09:42:48 2008 From: lists at ben-ward.co.uk (Ben Ward) Date: Tue Jul 1 10:04:12 2008 Subject: [uf-discuss] Human and machine readable data format In-Reply-To: <486A54DD.2030105@lebleu.org> References: <36A319113CF910438942741C4727ADFF02132B0D@MOBY.Clarence.local><1214820941.3171.25.camel@localhost.localdomain><61CCA888-066E-4F3E-8023-85F0EFBB37B5@adactio.com><28AAF834-9517-42AF-9FD0-982891C9AA50@ben-ward.co.uk><36A13CC9-03D2-46B6-AA3E-5DBDAFB7940A@adactio.com><3BCE3C9D-2E84-4DFF-AD18-891C8CB492FB@randomchaos.com> <36A319113CF910438942741C4727ADFF02132F1F@MOBY.Clarence.local> <486A54DD.2030105@lebleu.org> Message-ID: <61F0DF46-CD31-43A3-AF97-1D357E74B431@ben-ward.co.uk> On 1 Jul 2008, at 17:01, Guillaume Lebleu wrote: > Since the BBC's request was specifically related to screen readers, > we may want to distinguish "machine-readable", "human-readable" and > "human-hearable". I think there is less debate re: what is "human- > hearable" than there is debate re: what is "human-readable" The BBC complaint directly refers to both screen readers and the display of unexpected text in tool-tips. It's not just about aural output. At the core, in breaking with the semantics of an HTML element, we've broken the behaviour of technologies using the element correctly and intelligently (hence my strong opposition to continuing to stretch ABBR outside of textual abbreviations as commonly described by dictionaries: "An abbreviation is a shortened form of a word or phrase." - Wikipedia, Apple OSX Dictionary, Dictionary.com) B From danbri at danbri.org Tue Jul 1 11:16:00 2008 From: danbri at danbri.org (Dan Brickley) Date: Tue Jul 1 11:16:05 2008 Subject: [uf-discuss] Human and machine readable data format In-Reply-To: <486A54DD.2030105@lebleu.org> References: <36A319113CF910438942741C4727ADFF02132B0D@MOBY.Clarence.local><1214820941.3171.25.camel@localhost.localdomain><61CCA888-066E-4F3E-8023-85F0EFBB37B5@adactio.com><28AAF834-9517-42AF-9FD0-982891C9AA50@ben-ward.co.uk><36A13CC9-03D2-46B6-AA3E-5DBDAFB7940A@adactio.com><3BCE3C9D-2E84-4DFF-AD18-891C8CB492FB@randomchaos.com> <36A319113CF910438942741C4727ADFF02132F1F@MOBY.Clarence.local> <486A54DD.2030105@lebleu.org> Message-ID: <486A7460.1060704@danbri.org> Guillaume Lebleu wrote: > Glenn Jones wrote: >> As the exchange between Ben and Jeremy has shown what is human readable >> is up for debate. Having spent far too much time looking at the ISO date >> formats they are all readable to me, but I know that's not the case for >> everyone else. >> >> We need to expand the discussion and ask those involved in the >> accessibility area what is an acceptable human readable format. The >> format 2008-01-25 is a compromise and as such we need to ask the other >> party is it's an acceptable middle ground. For example would the BBC >> accept 2008-01-25 in the title of a abbr. >> > Since the BBC's request was specifically related to screen readers, we > may want to distinguish "machine-readable", "human-readable" and > "human-hearable". I think there is less debate re: what is > "human-hearable" than there is debate re: what is "human-readable" This reading is a little narrow: screen readers can also have Braille output; eg.
see http://www.yourdolphin.com/productdetail.asp?id=5&z=1 http://en.wikipedia.org/wiki/Refreshable_Braille_display cheers, Dan -- http://danbri.org/ From uf-discuss at cilux.org Tue Jul 1 13:49:33 2008 From: uf-discuss at cilux.org (Duncan Cragg) Date: Tue Jul 1 13:49:39 2008 Subject: [uf-discuss] class="tag" In-Reply-To: References: <04DA6562-7A56-4E11-B05C-D2D2994E9709@tobyinkster.co.uk> <48679704.6050708@cilux.org> Message-ID: <486A985D.2030403@cilux.org> Ciaran McNulty wrote: > On Sun, Jun 29, 2008 at 3:07 PM, Duncan Cragg wrote: > >> Those of us who favour opaque URLs (actually for practical reasons such as >> clean separation of concerns, maintainability, etc.) are unhappy with being >> forced into a semantic URL schema when using rel-tag. >> > Can you go into a bit more detail, or point to a resource explaining > the benefits of opaque URLs? It's something I've not come across > before and I'd be intrigued to see the reasons behind it. > I'll do both. Here's a resource explaining it - I addressed the subject in this blog post: http://duncan-cragg.org/blog/post/content-types-and-uris-rest-dialogues/ That is a very transparent URL (see: I'm not obsessive about it!). The trouble with my URL is that it mixes three concerns: 1. making a connection to my server and kicking off HTTP 2. identifying a resource (with a completely opaque string) within HTTP 3. kicking off some Python code with an argument string It's 1. and 3. I'm talking about. URLs are already opaque to HTTP. As soon as you allow in syntax or schema in URLs - as soon as you start using anything other than long random numbers - you've got a problem of namespace allocation and schema standardisation. I refer to "Zooko's Triangle" on my blog's right rail which discusses the trade-off between global uniqueness, security and memorability. 
_________________________________________ On 1.: Unless you're running fancy P2P algorithms, it's hard to argue against putting a big hint in the URL to say where to go to find the resource. But don't forget that you needn't go to that server - you could ask an intermediary proxy - which is kind of a simplistic P2P algorithm... However, there is a case for arguing that DNS has been a failure: it isn't any more easy to type a URL when you know you have to be so precise to avoid scam sites. And it isn't any easier to use it to identify a site when you have to avoid the likes of www.yahoo.com.baddies.com or www.google.randomtld . You may as well only use IP addresses; as hard to type and as useless to read. Most programs come with a copy-paste function to save some typing... Add to this lack of security (and other security holes) the absurd scramble for domain name real estate and such bad behaviour as domain squatting, etc., and it's looking like a system that only system admins and crooks benefit from. Most people (including myself) would type 'acme' into Google instead of 'acme.com' into the URL bar, to give an extra level of intelligence, familiarity, trust and user interface consistency. _________________________________________ But really it's 3. that bothers me most. Using URLs to pass human-readable strings to an application 'above' HTTP. A transparent URL string is always a query string (whether it has a '?' or not) - in other words, it could potentially be ambiguous and return, not definitely one, but zero or many possible results. We probably get zero results when we 'hack' a URL or when the site gets reorganised. We gloss over the many-results case by returning a single page that we call 'query results'. But by allowing in zero or many resources so easily, we've loosened the Web by removing the definite 1-1 mapping of URL to resource. Hackable URLs should not be part of a self-respecting website's user interface. 
We would give a better user experience if we took the URL bar away and replaced it with a 'jump to first clipboard web link' button, for those copy-paste situations. Such a button would intelligently parse the text on the clipboard for URLs and jump to the first location discovered. A good information architecture and user interaction design makes hackable URLs irrelevant. Another problem is when people start using their knowledge of the URL structure to generate new URLs - it may be acceptable or encouraged (even prescribed in an HTML GET form), but each time it happens, we're creating a unique mini-contract - another non-standard schema. The Web thrives on URL proliferation, not on schema proliferation! The need for URLs to be reliable - to always return what they are expected to return each time they're used - means that whatever URL schema or namespace you come up with is something you're stuck with - people or even programs may depend on it. But there's no standards body or namespace body looking after the bigger picture for you. Your mistakes may haunt you for a long time. Also, query URLs are inherently /not/ reliable - the resource they return is /expected/ to change, which again makes their (re)-use less desirable. Clearly, the W3C's unfortunate 'httpRange-14' issue would never have occurred with opaque URLs. In other words, opaque, semantics-free HTTP URIs are /always/ dereferencable to 'information resources' and /never/ refer to cars! Strings that are part of a car domain model belong inside /content/ not in links to content - they belong above HTTP. I'm not fully conversant in the Semantic Web domain, but I suspect that there are issues in there that are caused by mixing up globally unique identifier strings used to build information structures with strings that are semantically-meaningful over those structures, and that can dereference to sets. 
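To make the opaque-URL idea above concrete, here is a minimal sketch, not anything taken from a real CMS or from the rel-tag spec. It assumes a hypothetical base address (http://example.org/) and a hypothetical OpaqueStore class: each resource gets a long random identifier, and an identifier either maps to exactly one resource or to nothing, so there is no schema to standardise and nothing to "hack".

```python
import uuid

# Hypothetical base address, used for illustration only.
BASE = "http://example.org/"


class OpaqueStore:
    """Maps opaque URLs (long random identifiers) 1-1 onto resources."""

    def __init__(self, base=BASE):
        self.base = base
        self._resources = {}  # opaque key -> resource content

    def mint(self, content):
        """Store content and return a new opaque URL for it."""
        key = uuid.uuid4().hex  # 32 hex chars, random, no semantics
        self._resources[key] = content
        return self.base + key

    def dereference(self, url):
        """Return the single resource behind url, or raise KeyError.

        There is no query behaviour: a URL never matches "many" results,
        and a guessed or edited URL simply does not exist.
        """
        if not url.startswith(self.base):
            raise KeyError(url)
        return self._resources[url[len(self.base):]]


store = OpaqueStore()
link = store.mint("latest documents tagged 'semweb'")
print(link)                     # a base address followed by 32 random hex chars
print(store.dereference(link))  # the one resource behind that link
```

The human-readable part ("semweb") lives in the content, not in the link, which is the separation argued for in this message.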
So my main objection to transparent URLs is the way they mix up the mechanism for linking up the Web with a mechanism for querying it. The Web works fine using HTTP and opaque URLs. We have POST and Content-Type and OpenSearch schemas to query the Web. _________________________________________ Practical examples.. You can return opaque links to time-ordered collections listing the latest documents to be tagged 'semweb': semweb Keep your URLs opaque (like GUIDs in databases) and put your application data and queries in the content (like SQL queries and result sets in databases). Give your query content resources a first-class schema - see OpenSearch - and even their own URLs. POST these queries to opaque collection URLs. Make your result sets transient (returned in the POST response, thus no-cache by default). Result sets should only be 'grounded' (thus linkable and cacheable) if explicitly asked for in the query, when you should redirect to a new resource in the POST response. Of course, you can still surround the UUID/GUID part of your opaque URLs with human-readable string decorations, as long as they're never used to dereference the resource but just for mnemonic purpose, or for search engine optimisation. _________________________________________ I've gone on at length (again!), but hope you have had the patience to get my point of view. =0) Cheers! Duncan Cragg PS I work at the Financial Times over the river from you - but I was a URL opacitist /before/ having to wrangle with the FT CMS...! From brian.suda at gmail.com Tue Jul 1 14:05:10 2008 From: brian.suda at gmail.com (Brian Suda) Date: Tue Jul 1 14:05:22 2008 Subject: [uf-discuss] class="tag" In-Reply-To: <486A985D.2030403@cilux.org> References: <04DA6562-7A56-4E11-B05C-D2D2994E9709@tobyinkster.co.uk> <48679704.6050708@cilux.org> <486A985D.2030403@cilux.org> Message-ID: <21e770780807011405o601dbab4s54156e6a34ab6431@mail.gmail.com> On Tue, Jul 1, 2008 at 8:49 PM, Duncan Cragg wrote: > Practical examples.. 
> > You can return opaque links to time-ordered collections listing the latest > documents to be tagged 'semweb': > > semweb --- i think we are trying to re-invent: semweb Instead of trying to create "tag" as a class value which does the exact same thing as "category" we should approach the various microformats and see if they can/should simply include 'category' as one of the values they recognize rather than trying to re-invent rel-tag as class-tag. -brian -- brian suda http://suda.co.uk From mdagn at spraci.com Wed Jul 2 00:49:45 2008 From: mdagn at spraci.com (Michael MD) Date: Wed Jul 2 00:49:49 2008 Subject: [uf-discuss] Human and machine readable data format References: <36A319113CF910438942741C4727ADFF02132B0D@MOBY.Clarence.local><1214820941.3171.25.camel@localhost.localdomain><61CCA888-066E-4F3E-8023-85F0EFBB37B5@adactio.com><28AAF834-9517-42AF-9FD0-982891C9AA50@ben-ward.co.uk><36A13CC9-03D2-46B6-AA3E-5DBDAFB7940A@adactio.com><3BCE3C9D-2E84-4DFF-AD18-891C8CB492FB@randomchaos.com> <36A319113CF910438942741C4727ADFF02132F1F@MOBY.Clarence.local> <486A54DD.2030105@lebleu.org> Message-ID: <001401c8dc18$327bf750$116bacca@COMCEN> > IMO, "2008-01-25" is indeed more human-readable than > "2008-01-25T12:00:11", but it is still less "human-hearable" than the > plain old English "January 25th, 2008", which is human-readable and > machine-readable as long as it is written following precisely English US > conventions and the locale can be deduced from a lang attribute (either > global to the HTML document or local to the date). Allowing language conventions for date parsing to be determined by anything "global" sounds a bit dangerous to me. Someone might post on a shared blog/forum site in a different country and mark it up in a way that does not match a lang attribute somewhere else on the page! 
also - who is going to say that all replies to the post or comments that might also appear on that same page are going to follow the same language rules From guillaume at lebleu.org Wed Jul 2 09:36:30 2008 From: guillaume at lebleu.org (Guillaume Lebleu) Date: Wed Jul 2 09:36:39 2008 Subject: [uf-discuss] Human and machine readable data format In-Reply-To: <001401c8dc18$327bf750$116bacca@COMCEN> References: <36A319113CF910438942741C4727ADFF02132B0D@MOBY.Clarence.local><1214820941.3171.25.camel@localhost.localdomain><61CCA888-066E-4F3E-8023-85F0EFBB37B5@adactio.com><28AAF834-9517-42AF-9FD0-982891C9AA50@ben-ward.co.uk><36A13CC9-03D2-46B6-AA3E-5DBDAFB7940A@adactio.com><3BCE3C9D-2E84-4DFF-AD18-891C8CB492FB@randomchaos.com> <36A319113CF910438942741C4727ADFF02132F1F@MOBY.Clarence.local> <486A54DD.2030105@lebleu.org> <001401c8dc18$327bf750$116bacca@COMCEN> Message-ID: <486BAE8E.90401@lebleu.org> Michael MD wrote: > Allowing language conventions for date parsing to be determined by > anything "global" sounds a bit dangerous to me. > > Someone might post on a shared blog/forum site in a different country > and mark it up in a way that does not match a lang attribute somewhere > else on the page! > > also - who is going to say that all replies to the post or comments > that might also appear on that same page are going to follow the same > language rules Sorry if I didn't express myself clearly. What I meant here was that a lang="..." attribute on the element of class vevent or dtstart is recommended at all times (to deal with the very ambiguity you are referring to), but is optional to comply with DRY. If not present, its value may be inferred from the closest containing/ancestor element with a lang attribute, for instance a lang attribute value at the level of the html element.
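The inheritance rule just described (use the element's own lang, otherwise the nearest ancestor's) could be sketched roughly as follows. This is only an illustration under stated assumptions: the markup is well-formed, the date sits in an element with class "dtstart", and the names DateLangResolver and resolve_date_langs are hypothetical, not part of any existing microformats parser.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they are never pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class DateLangResolver(HTMLParser):
    """Collect (text, effective_lang) for elements carrying target_class.

    Assumes well-formed nesting: every non-void start tag has a matching
    end tag in the right order.
    """

    def __init__(self, target_class="dtstart"):
        super().__init__()
        self.target_class = target_class
        self.stack = []     # open elements: (tag, effective_lang, is_target)
        self.results = []   # (text, lang) pairs in document order of closing
        self._buffers = []  # text buffers for open target elements

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        attrs = dict(attrs)
        inherited = self.stack[-1][1] if self.stack else None
        # Own lang wins; otherwise fall back to the closest ancestor's lang.
        lang = attrs.get("lang") or inherited
        is_target = self.target_class in (attrs.get("class") or "").split()
        self.stack.append((tag, lang, is_target))
        if is_target:
            self._buffers.append([])

    def handle_data(self, data):
        for buf in self._buffers:
            buf.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self.stack:
            return
        _, lang, is_target = self.stack.pop()
        if is_target:
            text = "".join(self._buffers.pop()).strip()
            self.results.append((text, lang))


def resolve_date_langs(html):
    """Return a list of (date text, language to parse it with)."""
    resolver = DateLangResolver()
    resolver.feed(html)
    resolver.close()
    return resolver.results


page = (
    '<html lang="en-us"><body>'
    '<div class="vevent"><span class="dtstart">January 25th, 2008</span></div>'
    '<div class="vevent" lang="fr"><span class="dtstart">25 janvier 2008</span></div>'
    '</body></html>'
)
for text, lang in resolve_date_langs(page):
    print(lang, text)
```

A date with no lang anywhere above it comes back with None, which a consumer would presumably have to treat as "do not guess".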
In other words, if I want to write my date in French in an en-us html document, I'd have to attach lang="fr" to my date or its containing content, but if I want to write my date in American English in the same document, I don't have to attach lang="en-us", although it wouldn't hurt to. Do you still see this as dangerous practice? G From ameer1234567890 at gmail.com Wed Jul 2 13:04:00 2008 From: ameer1234567890 at gmail.com (Ameer Dawood) Date: Wed Jul 2 13:04:10 2008 Subject: [uf-discuss] Human and machine readable data format In-Reply-To: GmailId11ae4a7a3727c1b1 Message-ID: <17993138.377781215029040996.JavaMail.flurry@web1> Hi G, Internationalization of metadata is a bad and ineffective concept. It would result not only in bloated parsers, but also in bloat in the data format itself. You are proposing to internationalize the machine readable date, which is metadata (and not content, in this case). A very clear example would be: if we were to internationalize CSS, then "border-color" would become "border-colour" in en-GB. It's like proposing this change with a lang attribute/element. Ameer --- Original Message --- Date: Wed Jul 02 09:43:00 PDT 2008 From: Guillaume Lebleu To: Microformats Discuss Subject: Re: [uf-discuss] Human and machine readable data format --- Michael MD wrote: > Allowing language conventions for date parsing to be determined by anything "global" sounds a bit dangerous to me. > > Someone might post on a shared blog/forum site in a different country and mark it up in a way that does not match a lang attribute somewhere else on the page! > > also - who is going to say that all replies to the post or comments that might also appear on that same page are going to follow the same language rules Sorry if I didn't express myself clearly. What I meant here was that a lang="..."
attribute on the element of class vevent or dtstart is recommended at all times (to deal with the very ambiguity you are referring to), but is optional to comply with DRY. If not present, its value may be inferred from the closest containing/ancestor element with a lang attribute, for instance a lang attribute value at the level of the html element. In other words, if I want to write my date in French in an en-us html document, I'd have to attach lang="fr" to my date or its containing content, but if I want to write my date in American English in the same document, I don't have to attach lang="en-us", although it wouldn't hurt to. Do you still see this as dangerous practice? G From bjonkman at sobac.com Wed Jul 2 13:24:27 2008 From: bjonkman at sobac.com (Bob Jonkman) Date: Wed Jul 2 13:25:14 2008 Subject: [uf-discuss] Current state of grouping proposal In-Reply-To: <73b889410807010908m162c2117v907d9d56b9d228d8@mail.gmail.com> References: <73b889410807010908m162c2117v907d9d56b9d228d8@mail.gmail.com> Message-ID: <486BABBB.11260.1234965@bjonkman.sobac.com> I think the grouping mechanism can be accomplished with XOXO, http://microformats.org/wiki/xoxo --Bob. >>> 1 Jul 2008 18:08 Xavier Badosa >>> > I'm a little confused about the current state of the grouping > proposal. I'm not sure even if the uf-community is working on a > general solution (a microformat for grouping any sort of items) (+1 > vote) or a particular solution for some of the existing microformats > (0 votes). > > I think some sort of grouping is needed in hReview if we want to > follow the principle of adapting to current behaviors and usage > patterns. Usually, webpages include more than one review for a single > item.
To solve this, hReview forces us: > > 1) to repeat an unnecessary hidden item for every hreview (this > somehow violates the hidden (meta)data principle); > > or > > 2) to use the include-pattern (empty anchor, accessibility issues). > > A grouping mechanism could come to the rescue. Something like: > >
>

The Godfather > II

>
The best!
>

Some guy

>
>
>
Soooooo good!
>

Enthusiastic > girl

>
>
> > could be interpreted by a parser that the same item should be > associated with every hreview. In fact, a grouping microformat would > be an alternative (easy to parse) include-pattern mechanism. > > X. -- -- -- -- Bob Jonkman http://sobac.com/sobac/ SOBAC Microcomputer Services Voice: +1-519-669-0388 6 James Street, Elmira ON Canada N3B 1L5 Cel: +1-519-635-9413 Software --- Office & Business Automation --- Consulting From bjonkman at sobac.com Wed Jul 2 15:37:05 2008 From: bjonkman at sobac.com (Bob Jonkman) Date: Wed Jul 2 16:09:20 2008 Subject: [uf-discuss] Human and machine readable data format In-Reply-To: References: <36A319113CF910438942741C4727ADFF02132B0D@MOBY.Clarence.local>, , Message-ID: <486BCAD1.20690.19CB840@bjonkman.sobac.com> >>> 1 Jul 2008 6:28 Scott Reynen >>> > The difference with ISO dates is we've previously defined them as > content; I'm suggesting that's a mistaken definition, as these dates > don't function as content in our reference standard iCalendar. I disagree. In an appointment, the date IS the content. The metadata is the markup that identifies the date and its purpose, eg. class="dtstart". With an the date content is represented in two different ways, one as prose ("tomorrow at noon"), and once as an expansion. In prosaic HTML it is valid (and appropriate) to write tomorrow at noon but that's not a suitable machine readable format. Microformats have properly used to expand prosaic dates, but the syntax has been friendly to neither screen readers nor title popups. So, the compromise is to have an expansion that's friendly to both screen readers and title popups, and is also machine readable. Splitting dates and time into separate chunks accomplishes most of that. tomorrow at noon For those who think this violates the semantic intent of I'm all in favour of a element. This can be combined nicely with for the screen reader and popup crowd:
Big blowout lunch party tomorrow at noon
(using the newly proposed date and time value excerpts) I've put inside to speak/display the innermost title (this needs testing!) --Bob. -- -- -- -- Bob Jonkman http://sobac.com/sobac/ SOBAC Microcomputer Services Voice: +1-519-669-0388 6 James Street, Elmira ON Canada N3B 1L5 Cel: +1-519-635-9413 Software --- Office & Business Automation --- Consulting From karl at w3.org Wed Jul 2 17:02:29 2008 From: karl at w3.org (Karl Dubost) Date: Wed Jul 2 17:02:37 2008 Subject: [uf-discuss] Human and machine readable data format In-Reply-To: <486BAE8E.90401@lebleu.org> References: <36A319113CF910438942741C4727ADFF02132B0D@MOBY.Clarence.local><1214820941.3171.25.camel@localhost.localdomain><61CCA888-066E-4F3E-8023-85F0EFBB37B5@adactio.com><28AAF834-9517-42AF-9FD0-982891C9AA50@ben-ward.co.uk><36A13CC9-03D2-46B6-AA3E-5DBDAFB7940A@adactio.com><3BCE3C9D-2E84-4DFF-AD18-891C8CB492FB@randomchaos.com> <36A319113CF910438942741C4727ADFF02132F1F@MOBY.Clarence.local> <486A54DD.2030105@lebleu.org> <001401c8dc18$327bf750$116bacca@COMCEN> <486BAE8E.90401@lebleu.org> Message-ID: On 3 Jul 2008, at 01:36, Guillaume Lebleu wrote: > In other words, if I want to write my date in French in an en-us > html document, I'd have to attach lang="fr" to my date or its > containing content, […] > Do you still see this as dangerous practice? Not dangerous, but impractical when content is edited through web forms. Because of the state of the art of browser implementations, there is no real, interoperable editing tool in the browser context. I guess that is one of the major blows to interesting authoring on the Web right now.
-- Karl Dubost - W3C http://www.w3.org/QA/ Be Strict To Be Cool From zen at zenpsycho.com Wed Jul 2 19:04:44 2008 From: zen at zenpsycho.com (Breton Slivka) Date: Wed Jul 2 19:04:48 2008 Subject: [uf-discuss] Human and machine readable data format In-Reply-To: <17993138.377781215029040996.JavaMail.flurry@web1> References: <17993138.377781215029040996.JavaMail.flurry@web1> Message-ID: This is not internationalization of metadata, this is internationalization of Data, as in Content. We are talking about a date format that will be read out to users not just in the US, but potentially anywhere in the world. I honestly believe the "bloat" to parsers would be significant. Particularly if our precious parser authors are not incompetent, and we hope they are not. Please note that many programs (Excel is one example off the top of my head) provide exactly this type of date parsing, dependent on locale. Many more programs and operating systems provide services for the generation of locale-specific dates. The ECMAScript standard includes such a facility for both generation and parsing of locale-specific dates. ECMAScript parsers must be light enough to work on a mobile device with a browser. I hope I have been persuasive in demonstrating that more sophisticated parsers will be necessary if we are to satisfy the "No Information Hiding" and "Humans First, Machines Second" principles of the microformat community. I find it frustrating that we still have people being sensitive about the bloat. I offer the challenge to those developers: If you sincerely believe that simple internationalized date parsing is an unsolvable or difficult problem (which, as I have pointed out, has been solved numerous times already, with two examples), please present your evidence. Why is avoiding this work more important than Accessibility? Why is avoiding this work more important than avoiding hidden metadata?
On Thu, Jul 3, 2008 at 6:04 AM, Ameer Dawood wrote: > Hi G, > Internationalization of metadata is a bad and ineffective concept. It would result not only in bloat in parsers, but also in bloat in the data format itself. You are proposing to internationalize the machine readable date, which is metadata (and not content, in this case). A very clear example: if we were to internationalize CSS, then "border-color" would become "border-colour" in en-uk. It's like proposing this change with a lang attribute/element. > > Ameer > > --- Original Message --- > Date: Wed Jul 02 09:43:00 PDT 2008 > From: Guillaume Lebleu > To: Microformats Discuss > Subject: Re: [uf-discuss] Human and machine readable data format > --- > > Michael MD wrote: >> Allowing language conventions for date parsing to be determined by anything "global" sounds a bit dangerous to me. >> >> Someone might post on a shared blog/forum site in a different country and mark it up in a way that does not match a lang attribute somewhere else on the page! >> >> also - who is going to say that all replies to the post or comments that might also appear on that same page are going to follow the same language rules > Sorry if I didn't express myself clearly. What I meant here was that a lang="..." attribute on the element of class vevent or dtstart is recommended at all times (to deal with the very ambiguity you are referring to), but is optional to comply with DRY. If not present, its value may be inferred from the closest containing/ancestor element with a lang attribute, for instance a lang attribute value at the level of the html element.
> > In other words, if I want to write my date in French in an en-us html document, I'd have to attach lang="fr" to my date or its containing content, but if I want to write my date in American English in the same document, I don't have to attach lang="en-us", although it wouldn't hurt to. > > Do you still see this as dangerous practice? > > G > From zen at zenpsycho.com Wed Jul 2 19:06:24 2008 From: zen at zenpsycho.com (Breton Slivka) Date: Wed Jul 2 19:06:28 2008 Subject: [uf-discuss] Human and machine readable data format In-Reply-To: References: <17993138.377781215029040996.JavaMail.flurry@web1> Message-ID: "I honestly believe the "bloat" to > parsers would be significant" Sorry, I meant "I honestly believe the 'bloat' to parsers would _not_ be significant". From xbadosa at gmail.com Thu Jul 3 01:09:31 2008 From: xbadosa at gmail.com (Xavier Badosa) Date: Thu Jul 3 01:09:36 2008 Subject: [uf-discuss] Current state of grouping proposal In-Reply-To: <486BABBB.11260.1234965@bjonkman.sobac.com> References: <73b889410807010908m162c2117v907d9d56b9d228d8@mail.gmail.com> <486BABBB.11260.1234965@bjonkman.sobac.com> Message-ID: <73b889410807030109g8608cd8k7070773ba2a656c1@mail.gmail.com> > I think the grouping mechanism can be accomplished with XOXO, XOXO as it is now (or as I understand it) is based on list elements (ol ul dl), and these are not suited for "grouping" purposes, not in my sense at least: maybe I should say "classing" instead of "grouping", because in my meaning the idea of inheritance is important. List elements don't allow data to be associated with the group itself.
In my previous example, >>
>>

The Godfather >> II

>>
The best!
>>

Some guy

>>
>>
>>
Soooooo good!
>>

Enthusiastic >> girl

>>
"item" is associated with the group ("hset"), telling implicitly the machine that it must be replicated for every member ("hreview") of the "group" (or "class"). It's a sort of include-pattern mechanism that happens to follow the principle of adapting to current behaviors in the publication of reviews on webpages. I think you can't do that with XOXO. X. On Wed, Jul 2, 2008 at 10:24 PM, Bob Jonkman wrote: > I think the grouping mechanism can be accomplished with XOXO, > http://microformats.org/wiki/xoxo > > --Bob. > >>>> 1 Jul 2008 18:08 Xavier Badosa discuss@microformats.org> >>> > >> I'm a little confused about the current state of the grouping >> proposal. I'm not sure even if the uf-community is working on a >> general solution (a microformat for grouping any sort of items) (+1 >> vote) or a particular solution for some of the existing microformats >> (0 votes). >> >> I think some sort of grouping is needed in hReview if we want to >> follow the principle of adapting to current behaviors and usage >> patterns. Usually, webpages include more than one review for a single >> item. To solve this, hReview forces us: >> >> 1) to repeat an unnecessary hidden item for every hreview (this >> somehow violates the hidden (meta)data principle); >> >> or >> >> 2) to use the include-pattern (empty anchor, accessibility issues). >> >> A grouping mechanism could come to the rescue. Something like: >> >>
>>

The Godfather >> II

>>
The best!
>>

Some guy

>>
>>
>>
Soooooo good!
>>

Enthusiastic >> girl

>>
>>
>> >> could be interpreted by a parser that the same item should be >> associated with every hreview. In fact, a grouping microformat would >> be an alternative (easy to parse) include-pattern mechanism. >> >> X. > > > -- -- -- -- > Bob Jonkman http://sobac.com/sobac/ > SOBAC Microcomputer Services Voice: +1-519-669-0388 > 6 James Street, Elmira ON Canada N3B 1L5 Cel: +1-519-635-9413 > Software --- Office & Business Automation --- Consulting > From danbri at danbri.org Thu Jul 3 02:04:10 2008 From: danbri at danbri.org (Dan Brickley) Date: Thu Jul 3 02:04:17 2008 Subject: [uf-discuss] Human and machine readable data format In-Reply-To: References: <17993138.377781215029040996.JavaMail.flurry@web1> Message-ID: <486C960A.5080305@danbri.org> Breton Slivka wrote: > I offer the challenge to those developers: If you sincerely believe > that simple internationalized date parsing is an unsolvable or > difficult problem (which, as I have pointed out has been solved > numerous times already, with two examples), please present your > evidence. Why is avoiding this work more important than Accessibility? > Why is avoiding this work more important than avoiding hidden > metadata? The examples you gave (ecmascript, spreadsheets) relate to the interpretation of a single simple date string. Much of the discussion here has instead been about the interpretation of marked up paragraphs of natural language prose where dates are mentioned. The former is a big enough job, as you point out. But the latter is a substantially larger undertaking. Imagine the English language permutations of "Tuesday the fourteenth of July, next year" in terms of word order. Then allow for all natural languages (in all written scripts). And don't forget we use a variety of calendars. Big job.
In theory it could be attempted; but the culture around here is averse to 'theoretical' solutions. While there is value in minimising "hidden metadata", this isn't an all or nothing decision. Having the data within the HTML document itself is already a big win in many cases, compared to putting it in a separate XML file. Having the data local to the paragraph within the HTML document (rather than in the head section) is also a major achievement w.r.t. maintainability. Both of these factors reduce the hiddenness of data; putting info in an attribute is not the end of the world. Given the other tradeoffs, I think it should be seriously considered. cheers, Dan -- http://danbri.org/ From zen at zenpsycho.com Thu Jul 3 05:39:32 2008 From: zen at zenpsycho.com (Breton Slivka) Date: Thu Jul 3 05:39:36 2008 Subject: [uf-discuss] Human and machine readable data format In-Reply-To: <486C960A.5080305@danbri.org> References: <17993138.377781215029040996.JavaMail.flurry@web1> <486C960A.5080305@danbri.org> Message-ID: On Thu, Jul 3, 2008 at 7:04 PM, Dan Brickley wrote: > Breton Slivka wrote: > >> I offer the challenge to those developers: If you sincerely believe >> that simple internationalized date parsing is an unsolvable or >> difficult problem (which, as I have pointed out has been solved >> numerous times already, with two examples), please present your >> evidence. Why is avoiding this work more important than Accessibility? >> Why is avoiding this work more important than avoiding hidden >> metadata? > Imagine the English language permutations of "Tuesday the fourteenth of July, > next year" in terms of word order. Then allow for all natural languages (in > all written scripts). And don't forget we use a variety of calendars. Big > job. In theory it could be attempted; but the culture around here is averse > to 'theoretical' solutions. > Once again this straw man is trotted out.
Who is discussing this type of solution other than to specifically discredit the approach as too hard? I certainly am not suggesting this kind of wide-ranging natural language parser. I haven't seen anyone else seriously suggesting it. It's a foolish undertaking, and it's obviously a foolish undertaking. Then WHY OH WHY does this keep being brought up as though it were being seriously discussed? Where does this idea keep popping out from? Let me give an example (sketched here as JavaScript) of a parser that would work, and would be simple to write, and whose format could be read by a screen reader.

    function parseDate(dateString, locale) {
      // English month names; the array index + 1 is the month number.
      var enMonths = ["January", "February", "March", "April", "May", "June",
                      "July", "August", "September", "October", "November", "December"];
      var m, day, monthName, year;
      if (locale === "en-us") {
        // One fixed format, e.g. "July 3rd, 2008"
        m = /^([A-Za-z]+) ([1-3]?[0-9])(?:st|nd|rd|th)?, ([0-9]{1,4})$/.exec(dateString);
        if (m) { monthName = m[1]; day = m[2]; year = m[3]; }
      } else if (locale === "en-au" || locale === "en-uk") {
        // One fixed format, e.g. "3rd July, 2008"
        m = /^([1-3]?[0-9])(?:st|nd|rd|th)? ([A-Za-z]+), ([0-9]{1,4})$/.exec(dateString);
        if (m) { day = m[1]; monthName = m[2]; year = m[3]; }
      }
      if (!m) { return null; }  // unsupported locale or unrecognised format
      var month = enMonths.indexOf(monthName) + 1;  // 1-based month number
      if (month === 0) { return null; }  // not an English month name
      return [Number(year), month, Number(day)];
    }

This is a simple example. There are likely better techniques for doing this than regexes (or not), but the point is that you can make a human READABLE format without having to cover the whole spectrum of human expression. Instead, you have ONE precise format for US dates, ONE precise format for UK dates, ONE precise format for Japanese dates, etc, etc. You stick this format of date in the title of an ABBR, and you can say whatever you want about the date in whatever language you like in the contents of the ABBR. The parser shouldn't care about the contents. It's just looking at the title. It already is. The only change from the current pattern is that we'd be using a less geeky and obscure format than ISO-8601.
The lang attribute of the ABBR element provides the format in use. Honestly, how difficult is it for a parser author to collect one format for each locale? I've seen far more heroic efforts on simpler things. How difficult is it for content publishers to learn ONE format (the one for their own locale)? How difficult is it to ask content authors to learn a format like this? We're already asking them to learn a more difficult format! Yes, it's more complicated than parsing ISO 8601. But it's not boiling the ocean. This isn't a binary decision we're facing. It's not a choice between "I could implement it in an hour" level of simplicity and "Human level" AI. Compromise has to be made if we are to make any progress. From qidydl at gmail.com Thu Jul 3 07:04:51 2008 From: qidydl at gmail.com (David O) Date: Thu Jul 3 07:04:55 2008 Subject: [uf-discuss] Human and machine readable data format In-Reply-To: References: <17993138.377781215029040996.JavaMail.flurry@web1> <486C960A.5080305@danbri.org> Message-ID: On Thu, Jul 3, 2008 at 8:39 AM, Breton Slivka wrote: > On Thu, Jul 3, 2008 at 7:04 PM, Dan Brickley wrote: >> Breton Slivka wrote: >> >>> I offer the challenge to those developers: If you sincerely believe >>> that simple internationalized date parsing is an unsolvable or >>> difficult problem (which, as I have pointed out has been solved >>> numerous times already, with two examples), please present your >>> evidence. Why is avoiding this work more important than Accessibility? >>> Why is avoiding this work more important than avoiding hidden >>> metadata? > This is a simple example. There are likely better techniques for doing > this than regexes, (or not) but the point is, that you can make a > human READABLE format without having to cover the whole spectrum of > human expression. Instead, you have ONE precise format for US dates, > ONE precise format for UK dates, ONE precise format for japanese > dates, etc, etc.
You stick this format of date in the title of an > ABBR, and you can say whatever you want about the date in whatever > language you like in the contents of the ABBR. The parser shouldn't > care about the contents. IT's just looking at the title. IT already > is. The only change from the current pattern is that we'd be using a > less geeky and obscure format than ISO-8601. The lang attribute of the > ABBR element provides the format in use. http://en.wikipedia.org/wiki/List_of_languages_by_name http://en.wikipedia.org/wiki/List_of_ISO_639-2_codes http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2 Feel free to get started. I'm sure you can start a wiki page with a listing of language/region codes and the suggested date format for each. Since the current system handles every one of those languages and countries/regions, it would only be logical to expect the same of a suggested replacement. From Scott at randomchaos.com Thu Jul 3 09:03:08 2008 From: Scott at randomchaos.com (Scott Reynen) Date: Thu Jul 3 09:03:14 2008 Subject: [uf-discuss] Human and machine readable data format In-Reply-To: <486BCAD1.20690.19CB840@bjonkman.sobac.com> References: <36A319113CF910438942741C4727ADFF02132B0D@MOBY.Clarence.local>, , <486BCAD1.20690.19CB840@bjonkman.sobac.com> Message-ID: <2708D12A-BE7C-4E75-9028-AA23E2A0B19D@randomchaos.com> On [Jul 2], at [ Jul 2] 4:37 , Bob Jonkman wrote: >> The difference with ISO dates is we've previously defined them as >> content; I'm suggesting that's a mistaken definition, as these dates >> don't function as content in our reference standard iCalendar. > > I disagree. In an appointment, the date IS the content. *A* date is, but not the ISO date. I think that's a subtle but important distinction we've overlooked too often. You never see ISO dates presented to (nor entered by) people in applications that work with iCalendar. They're only used to *produce* content. I think HTML entities are probably the closest analogy. 
The entities themselves are not the content; they're merely used to produce the content in various contexts (i.e. character sets). We don't display entities; we only display the content they're used (by machines) to produce. If we recognize that ISO dates are the same type of information ("metadata" or whatever you want to call it), then not displaying them isn't a compromise; it's just the obvious way to treat that type of information, the same way it's treated everywhere else. Peace, Scott From guillaume at lebleu.org Thu Jul 3 09:54:51 2008 From: guillaume at lebleu.org (Guillaume Lebleu) Date: Thu Jul 3 09:54:55 2008 Subject: [uf-discuss] Human and machine readable data format In-Reply-To: <486BCAD1.20690.19CB840@bjonkman.sobac.com> References: <36A319113CF910438942741C4727ADFF02132B0D@MOBY.Clarence.local>, , <486BCAD1.20690.19CB840@bjonkman.sobac.com> Message-ID: <486D045B.60009@lebleu.org> Bob Jonkman wrote: > > > > tomorrow > > Bob, assuming that screen readers only read out the content of abbr's @title, this solution looks promising (I've tried with VoiceOver, but the title content is ignored.) The only problem of course is for human content authors who are effectively asked to write the same information 3 times in 3 different formats (not very DRY)! I just don't see myself doing that manually. For this to work, I'd expect at least an extra button in my HTML editor to tag "tomorrow" as 2008-07-23 by selecting a date in a calendar widget, or better, for my HTML editor to detect some of these date shortcuts automatically for me, and suggest machine data for them, which I can confirm before publishing, something similar to [1]. It seems to me that it would be a practical way to distribute the CPU-intensive task of semantically tagging Web content [3]. BTW, on the use of abbr for dates, I've researched a number of style guides such as [2]. It seems that "2/03/2005" is legitimate as an abbreviated form of the inline format "February 3, 2005". 
So, 2/03/2005 seems correct, but February 3, 2005 isn't (at least according to the style guide below). Guillaume [1] http://wordpress.org/extend/plugins/yahoo-shortcuts/ [2] http://web.mit.edu/comdor/editguide/style-matters/date_time.html#dates [3] http://gigaom.com/2008/07/02/the-real-reason-powerset-sold-out/ From jim at eatyourgreens.org.uk Thu Jul 3 15:03:35 2008 From: jim at eatyourgreens.org.uk (Jim O'Donnell) Date: Thu Jul 3 15:03:42 2008 Subject: [uf-discuss] hoard.it Message-ID: <07843653-3749-4C33-97CF-95A4BAC93710@eatyourgreens.org.uk> Hello, This might be of interest to members of this group, as it deals with extracting data from semantic HTML. Prior to this year's Mashed Museum event at the University of Leicester, Dan Zambonini put together a prototype which aggregates data by spidering online museum catalogues: http://hoardit.pbwiki.com/ It's a pretty fantastic demo of how information can be extracted from well-structured HTML, even before you think of putting microformats etc. on top. In particular, it does a pretty good job of figuring out when an object was made: http://feeds.boxuk.com/museums/object_100yrs.php The date parser is based on some code Dan & I knocked together at Mashed Museum 2007, which looks at strings like 'late Victorian', 'early 20th Century', '4th January 1853' and so on, and converts them to machine-readable ISO dates. Our original idea, which we never got round to actually implementing, was that this would be useful as a web service - you give it a string, it gives you a machine-parsable representation of that string. The recent discussion here about dates has made me wonder if such a web service would be useful for microformats parsers. What do others think?
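To make the idea concrete, here is a minimal sketch in Python of that kind of string-to-ISO conversion. It is not the hoard.it or Mashed Museum code: the function name to_iso, the single-entry ERAS table, the rule that "early", "mid" and "late" mean thirds of a period, and the choice to write a period as a start/end pair of years are all assumptions made for illustration.

```python
import re
from datetime import date

MONTHS = ["january", "february", "march", "april", "may", "june",
          "july", "august", "september", "october", "november", "december"]

# Hypothetical era table as (first year, last year); one entry for illustration.
ERAS = {"victorian": (1837, 1901)}


def _third(start, end, part):
    """Narrow a year range to its early, mid or late third (an assumed convention)."""
    step = (end - start + 1) // 3
    if part == "early":
        return start, start + step - 1
    if part == "mid":
        return start + step, end - step
    if part == "late":
        return end - step + 1, end
    return start, end


def to_iso(text):
    """Return 'YYYY-MM-DD' for an exact date, 'YYYY/YYYY' for a period, else None."""
    s = text.strip().lower()

    # Exact dates such as "4th January 1853".
    m = re.fullmatch(r"(\d{1,2})(?:st|nd|rd|th)? ([a-z]+) (\d{4})", s)
    if m and m.group(2) in MONTHS:
        month = MONTHS.index(m.group(2)) + 1
        try:
            return date(int(m.group(3)), month, int(m.group(1))).isoformat()
        except ValueError:
            return None  # e.g. "31st February 1853"

    # Fuzzy periods such as "early 20th century" or "late Victorian".
    m = re.fullmatch(
        r"(?:(early|mid|late) )?(?:(\d{1,2})(?:st|nd|rd|th) century|([a-z]+))", s)
    if not m:
        return None
    part, century, era = m.groups()
    if century:
        # Strict convention: the 20th century runs from 1901 to 2000.
        start, end = (int(century) - 1) * 100 + 1, int(century) * 100
    elif era in ERAS:
        start, end = ERAS[era]
    else:
        return None
    start, end = _third(start, end, part)
    return f"{start:04d}/{end:04d}"
```

Under these assumptions, to_iso("4th January 1853") gives "1853-01-04", and anything the sketch does not recognise comes back as None rather than a guess, which is probably the behaviour a microformats parser would want from such a service.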
Cheers Jim Jim O'Donnell jim@eatyourgreens.org.uk http://eatyourgreens.org.uk http://flickr.com/photos/eatyourgreens From bjonkman at sobac.com Sat Jul 5 12:07:30 2008 From: bjonkman at sobac.com (Bob Jonkman) Date: Sat Jul 5 12:09:58 2008 Subject: [uf-discuss] Human and machine readable data format In-Reply-To: <2708D12A-BE7C-4E75-9028-AA23E2A0B19D@randomchaos.com> References: <36A319113CF910438942741C4727ADFF02132B0D@MOBY.Clarence.local>, <486BCAD1.20690.19CB840@bjonkman.sobac.com>, <2708D12A-BE7C-4E75-9028-AA23E2A0B19D@randomchaos.com> Message-ID: <486F8E32.14417.482D3F8@bjonkman.sobac.com> On 3 Jul 2008 at 10:03, Scott Reynen wrote: > On [Jul 2], at [ Jul 2] 4:37 , Bob Jonkman wrote: > > > In an appointment, the date IS the content. > > *A* date is, but not the ISO date. I think that's a subtle but > important distinction we've overlooked too often. You never see ISO > dates presented to (nor entered by) people in applications that work > with iCalendar. They're only used to *produce* content. I think HTML > entities are probably the closest analogy. The entities themselves > are not the content; they're merely used to produce the content in > various contexts (i.e. character sets). We don't display entities; we > only display the content they're used (by machines) to produce. If > we recognize that ISO dates are the same type of information > ("metadata" or whatever you want to call it), then not displaying > them isn't a compromise; it's just the obvious way to treat that type > of information, the same way it's treated everywhere else. In that case it should be acceptable to avoid the use of <abbr> altogether, so that neither sighted nor hearing people have to put up with seeing or hearing the metadata. tomorrow The title text still shows a popup in my browser (FF3), but I don't believe screen readers speak it. It also doesn't distract sighted users since a element is by default undecorated, while <abbr> shows with a dotted underline in FF3.
However, styling is dependent on the browser implementation and can always be specified with CSS anyway. I believe that an ISO date is a valid expansion of prosaic dates, so that is less semantic than using tomorrow but that debate appears to have no resolution and I'm willing to cede just to move along. --Bob. -- -- -- -- Bob Jonkman http://sobac.com/sobac/ SOBAC Microcomputer Services Voice: +1-519-669-0388 6 James Street, Elmira ON Canada N3B 1L5 Cel: +1-519-635-9413 Software --- Office & Business Automation --- Consulting From bjonkman at sobac.com Sat Jul 5 13:15:44 2008 From: bjonkman at sobac.com (Bob Jonkman) Date: Sat Jul 5 13:17:09 2008 Subject: [uf-discuss] Human and machine readable data format In-Reply-To: <486D045B.60009@lebleu.org> References: <36A319113CF910438942741C4727ADFF02132B0D@MOBY.Clarence.local>, <486BCAD1.20690.19CB840@bjonkman.sobac.com>, <486D045B.60009@lebleu.org> Message-ID: <486F9E30.15175.4C14ADD@bjonkman.sobac.com> On 3 Jul 2008 at 9:54, Guillaume Lebleu wrote: > Bob, assuming that screen readers only read out the content of abbr's > @title, this solution looks promising (I've tried with VoiceOver, but > the title content is ignored.) > > The only problem of course is for human content authors who are > effectively asked to write the same information 3 times in 3 different > formats (not very DRY)! Agreed. So, based on Scott Reynen's observation that these 'date entities' don't need to be displayed (either visually or aurally), I propose that we dispense with the <abbr> tag altogether (and, IMHO, the semantic value of the date expansion). We move on, the BBC publishes hCalendar again, and someone gets around to developing a genealogy microformat now that the date issue is settled. > BTW, on the use of abbr for dates, I've researched a number of style > guides such as [2]. It seems that "2/03/2005" is legitimate as an > abbreviated form of the inline format "February 3, 2005".
> [2] > http://web.mit.edu/comdor/editguide/style-matters/date_time.html#dates I'm not sure that any particular style guide is authoritative. I had a look around some other sources, and while they mostly agree there's enough variation to make any date-parser author shudder in fear. A most disturbing trend is the use of spelled out dates, eg. "the sixth of July 2008" [1]. A humorous aside: I create computer systems validation documentation for a European consulting firm. Oddly enough, they've decided on the American date format MM/DD/YY for all their systems documentation, not the ISO date standard. My documents are constantly being returned to me for invalid dates -- my first inclination is to always write the date as YYYY-MM-DD, and DD/MM/YYYY as a second inclination. Even MM/DD/YYYY gets returned as an invalid date. Participation in the Microformats community hasn't helped my professional career :-) --Bob. [1] National Geographic Style Manual: DATES http://stylemanual.ngs.org/Intranet/styleman.nsf/024cc3c609acdb0285256648004af446/f0d90cec94e539c78525668a006dacd0?OpenDocument or http://natgeodatestyle.notlong.com for the word-wrap challenged. -- -- -- -- Bob Jonkman http://sobac.com/sobac/ SOBAC Microcomputer Services Voice: +1-519-669-0388 6 James Street, Elmira ON Canada N3B 1L5 Cel: +1-519-635-9413 Software --- Office & Business Automation --- Consulting From lists at ben-ward.co.uk Sun Jul 6 06:22:08 2008 From: lists at ben-ward.co.uk (Ben Ward) Date: Sun Jul 6 06:22:36 2008 Subject: [uf-discuss] Wiki Documentation of recent date-time discussion Message-ID: <61AB090A-75B8-45C4-B582-AE461389B50E@ben-ward.co.uk> Hi all, Recent discussion of solutions to the datetime issues has been massive, and it has become difficult to track the current state of issues and counterpoints as threads have become interleaved.
I have *attempted* to document the most recent points on the wiki, under the following pages: * http://microformats.org/wiki/datetime-design-pattern (most stuff) * http://microformats.org/wiki/hcalendar-issues#2008 (for the HTML5