From lists at ben-ward.co.uk  Fri Jun  6 11:37:20 2008
From: lists at ben-ward.co.uk (Ben Ward)
Date: Fri Jun  6 16:08:08 2008
Subject: [uf-dev] Defining and Extending Value Excepting
In-Reply-To: <36A319113CF910438942741C4727ADFF01E97814@MOBY.Clarence.local>
References: <FC7E8DA5-2AB8-4D41-AFFF-1D4B0049D389@tobyinkster.co.uk>
	<25D51B07-201C-48F3-A1F6-8B2909B88B15@ben-ward.co.uk>
	<36A319113CF910438942741C4727ADFF01E97814@MOBY.Clarence.local>
Message-ID: <8AA2B099-6FAA-43FC-B33E-C81326A95DE5@ben-ward.co.uk>

Hey guys,

I've tried to move this on a bit.

I've clarified the "no nesting value inside value" discussion under
the parsing bullet points, see http://microformats.org/wiki/value-excerption-pattern

I've also moved the "parsing to-do" section off that page and pushed
it onto a proper -issues page for the pattern. I've restructured that
from the discussions, so we've something to focus on there now.

Since it's better organised now, and by extension better organised for  
wider feedback, I'm going to publicise the existence of these pages on  
uf-discuss and invite the wider community to raise other issues.

Concerning the current open issues, I'd like to draw your attention
to my most recent notes on them; see what you think.

* Excluded Fields (http://microformats.org/wiki/value-excerption-pattern-issues#Excluded_Fields)

We've been thinking in terms of excluding particular fields from being
used with value excerpting. What if we flipped it and made it opt-in for
particular fields? Each spec would state "this field _may_ be used
with value excerpting". That way, large fields like hAtom's
entry-title, where value excerpting has no (ahem) "value", won't be affected
by it, _and_ this would actually allow many of the problems with
nesting microformats to be avoided without need for an "mfo"-like
class processing instruction.

* Depth of Parsing (http://microformats.org/wiki/value-excerption-pattern-issues#Depth_of_Parsing)

Currently, parsing all descendants can cause the
nested-microformat-value-overwriting-potential-world-of-pain issue, and
MKaply seemed to think he'd seen documentation restricting value
excerpting to children only.

There are three options. The first is the mfo processing instruction
proposal, which I dislike because it adds a processing instruction to an
element which should be about the semantics of the content, not "how to
parse this". The second is to restrict it to children only, which I
suspect could break lots of hCard TEL fields. The third, which I've added
today, is to specify parsing children only as the default behaviour, but
allow individual properties to override this to all descendants.
Properties like TEL, where it's reasonable that it will be at the
outer edge of a DOM tree, could permit all descendants to be parsed as
value.


Both excluding fields and parsing grandchildren add optionality to
particular fields. From a parsing POV I see it as a set of switches
when parsing each field, which can at least be clearly defined. Does
that seem reasonable?
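
The "set of switches" idea can be sketched as a small per-property table. This is only an illustration: the class name, the field names and the example entries are assumptions, since which properties opt in is still an open issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueExcerptRule:
    """Per-property switches for value excerpting (hypothetical names)."""
    enabled: bool = False          # opt-in: off unless a spec allows it
    all_descendants: bool = False  # default depth: direct children only


# Illustrative entries only; the real list would come from each spec.
RULES = {
    "tel": ValueExcerptRule(enabled=True, all_descendants=True),
    "dtstart": ValueExcerptRule(enabled=True),
    "entry-title": ValueExcerptRule(),  # large field: excerpting stays off
}


def rule_for(prop: str) -> ValueExcerptRule:
    # Unknown properties fall back to the conservative defaults.
    return RULES.get(prop, ValueExcerptRule())
```

A parser would look up `rule_for(name)` once per property and branch on the two flags, so the optional behaviour stays in one documented place.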

As always, feedback most welcome and requested!

B
From lists at ben-ward.co.uk  Wed Jun 11 13:26:26 2008
From: lists at ben-ward.co.uk (Ben Ward)
Date: Wed Jun 11 14:10:46 2008
Subject: [uf-dev] How do we (want to) document parsing?
Message-ID: <BFDB42F7-8822-4B95-A126-9BBCEBC03BF0@ben-ward.co.uk>

Parser devs,

I've been carrying on work on speccing value-excerpting, I'm keen that  
we set a good example of specifying parsing rules with this, with a  
view to requiring a higher standard in future and also going back to  
better spec the other patterns and microformats.

To be honest, I'm underqualified for this. Actually, wait, that's not  
true, I'm amply qualified but haven't applied any of my knowledge of  
representing processes and so forth in the real world. Anyway,  
digression.

I have, for better, worse or more likely embarrassment, put together a  
shoddy flow chart of how parsing of the value-excerption-pattern could  
work, factoring in the open issue of parsing @titles from empty  
elements (I'm working on the issues one at a time).

We don't have uploading enabled on the wiki, so it's here: -ward.co.uk/ 
microformats/value-excerption-pattern/ValueExcerptionParseFlowChart.png

My question is simple, in creating it I came across one open issue  
with the parsing flow, so it's been useful to do, but I need to know  
is it actually useful documentation  in itself? Would you refer to  
something diagrammatic when implementing a parser? Or is there some  
other, better (perhaps more Wiki compatible) means of representing  
parsing rules and method branching that we should adopt? Would pseudo- 
code be sufficient?

I know test cases are also a big thing, and I'll produce some of those  
as well as I work through the issue log.

Thanks,

Ben
From aconbere at gmail.com  Wed Jun 11 14:31:29 2008
From: aconbere at gmail.com (anders conbere)
Date: Wed Jun 11 15:28:11 2008
Subject: [uf-dev] How do we (want to) document parsing?
In-Reply-To: <BFDB42F7-8822-4B95-A126-9BBCEBC03BF0@ben-ward.co.uk>
References: <BFDB42F7-8822-4B95-A126-9BBCEBC03BF0@ben-ward.co.uk>
Message-ID: <8ca3fbe80806111431j53544bdek5620f7ad87f3394a@mail.gmail.com>

On Wed, Jun 11, 2008 at 1:26 PM, Ben Ward <lists@ben-ward.co.uk> wrote:
> Parser devs,
>
> I've been carrying on work on speccing value-excerpting, I'm keen that we
> set a good example of specifying parsing rules with this, with a view to
> requiring a higher standard in future and also going back to better spec the
> other patterns and microformats.
>
> To be honest, I'm underqualified for this. Actually, wait, that's not true,
> I'm amply qualified but haven't applied any of my knowledge of representing
> processes and so forth in the real world. Anyway, digression.
>
> I have, for better, worse or more likely embarrassment, put together a
> shoddy flow chart of how parsing of the value-excerption-pattern could work,
> factoring in the open issue of parsing @titles from empty elements (I'm
> working on the issues one at a time).
>
> We don't have uploading enabled on the wiki, so it's here:
> -ward.co.uk/microformats/value-excerption-pattern/ValueExcerptionParseFlowChart.png

I'm not getting an image back here.

>
> My question is simple, in creating it I came across one open issue with the
> parsing flow, so it's been useful to do, but I need to know is it actually
> useful documentation  in itself? Would you refer to something diagrammatic
> when implementing a parser? Or is there some other, better (perhaps more
> Wiki compatible) means of representing parsing rules and method branching
> that we should adopt? Would pseudo-code be sufficient?

I've been a big fan of representing the parsing rules in terms of
claims or triples. This is how RDF describes its parsing rules, and
it allows for easily codified tests.
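
As a rough sketch of what a triple-based test could look like: the expected result for a page is a set of (subject, property, value) tuples, and a test compares it with what a parser produced. The subject identifier, the property values and the `diff_triples` helper are all made up for illustration.

```python
# Hypothetical expected output for one test page, written as
# (subject, property, value) triples. All names here are illustrative.
EXPECTED = {
    ("card-1", "fn", "Example Name"),
    ("card-1", "tel", "1235678"),
}


def diff_triples(actual, expected):
    # Set difference shows what the parser missed and what it invented.
    return {"missing": expected - actual, "unexpected": actual - expected}


# Pretend parser output that wrongly inserted a space into the tel value.
parsed = {("card-1", "fn", "Example Name"), ("card-1", "tel", "123 5678")}
print(diff_triples(parsed, EXPECTED))
```

This covers the result a parser should reach, not the steps it takes to get there.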

~ Anders

>
> I know test cases are also a big thing, and I'll produce some of those as
> well as I work through the issue log.
>
> Thanks,
>
> Ben
From scott at randomchaos.com  Wed Jun 11 17:58:53 2008
From: scott at randomchaos.com (Scott Reynen)
Date: Wed Jun 11 17:59:04 2008
Subject: [uf-dev] How do we (want to) document parsing?
In-Reply-To: <8ca3fbe80806111431j53544bdek5620f7ad87f3394a@mail.gmail.com>
References: <BFDB42F7-8822-4B95-A126-9BBCEBC03BF0@ben-ward.co.uk>
	<8ca3fbe80806111431j53544bdek5620f7ad87f3394a@mail.gmail.com>
Message-ID: <701EEC62-84BD-4F4B-9559-2C40504A599D@randomchaos.com>

On Jun 11, at 3:31, anders conbere wrote:

>> We don't have uploading enabled on the wiki, so it's here:
>> -ward.co.uk/microformats/value-excerption-pattern/ 
>> ValueExcerptionParseFlowChart.png
>
> I'm not getting an image back here.

I believe it's here:

http://ben-ward.co.uk/microformats/value-excerption-pattern/ValueExcerptionParseFlowChart.png

> I've been a big fan of representing the parsing rules in terms of
> claims or triples. This is how RDF describes its parsing rules, and
> it allows for easily codified tests.

I'm not clear on how that would work with microformats.  I can see how  
triples could be used for testing, as the result a parser should get,  
but I'm not clear on how they could be used for describing the process  
by which a parser should arrive at that result, which seems to be what  
Ben is seeking.  If you still think that would work after looking at  
Ben's flow chart, could you maybe translate it into triples as a  
demonstration?

Peace,
Scott

From brian.suda at gmail.com  Thu Jun 12 01:21:08 2008
From: brian.suda at gmail.com (Brian Suda)
Date: Thu Jun 12 01:21:15 2008
Subject: [uf-dev] How do we (want to) document parsing?
In-Reply-To: <701EEC62-84BD-4F4B-9559-2C40504A599D@randomchaos.com>
References: <BFDB42F7-8822-4B95-A126-9BBCEBC03BF0@ben-ward.co.uk>
	<8ca3fbe80806111431j53544bdek5620f7ad87f3394a@mail.gmail.com>
	<701EEC62-84BD-4F4B-9559-2C40504A599D@randomchaos.com>
Message-ID: <21e770780806120121l57de8fcagf190d68cf31be257@mail.gmail.com>

On Thu, Jun 12, 2008 at 12:58 AM, Scott Reynen <scott@randomchaos.com> wrote:
> http://ben-ward.co.uk/microformats/value-excerption-pattern/ValueExcerptionParseFlowChart.png

--- i had a look at the flow chart and found a few things that i think
should be fixed and a few that i disagree with.

(maybe we should number these nodes so it is easier to reference?)

1) I don't think values should be concatenated with a unicode char
0020 (a space). If there was intention to add white-space then those
should be part of the value. We should not introduce additional
information that was not explicitly marked-up.

2) If the value contains no inner-text, then use the @title. I think
this was a proposal, but until we get more feedback it probably should
not be part of our parsing rules. What would be the semantics in that?
I know this is an attempt at a workaround, but i don't think it
should be included in these parsing rules until we discuss it further.
TIDY still has bugs (or maybe it is a feature) with empty nodes.

Also, i don't know if this chart can handle or should handle nested
values. Did we make a decision that nested value properties were to be
ignored?

Great work Ben, this is much easier for people to understand than a
series of bullet points.

Thanks,
-brian

-- 
brian suda
http://suda.co.uk

From lists at ben-ward.co.uk  Thu Jun 12 02:36:52 2008
From: lists at ben-ward.co.uk (Ben Ward)
Date: Thu Jun 12 02:36:59 2008
Subject: Value Excerption Pattern Parsing (was: [uf-dev] How do we (want to)
	document parsing?)
In-Reply-To: <21e770780806120121l57de8fcagf190d68cf31be257@mail.gmail.com>
References: <BFDB42F7-8822-4B95-A126-9BBCEBC03BF0@ben-ward.co.uk>
	<8ca3fbe80806111431j53544bdek5620f7ad87f3394a@mail.gmail.com>
	<701EEC62-84BD-4F4B-9559-2C40504A599D@randomchaos.com>
	<21e770780806120121l57de8fcagf190d68cf31be257@mail.gmail.com>
Message-ID: <D8F86ADD-77B2-4EA6-965D-587A3BFAD812@ben-ward.co.uk>

On 12 Jun 2008, at 09:21, Brian Suda wrote:

> On Thu, Jun 12, 2008 at 12:58 AM, Scott Reynen  
> <scott@randomchaos.com> wrote:
>> http://ben-ward.co.uk/microformats/value-excerption-pattern/ValueExcerptionParseFlowChart.png
>
> --- i had a look at the flow chart and found a few things that i think
> should be fixed and a few that i disagree with.

Disagreement is fine and very welcome. This is all draft, in progress  
work :-) Fundamentally, I'm keen to establish _how_ we represent this  
sort of process going forward, with the complete understanding that  
the detail of this current diagram can and will change.

> (maybe we should number these nodes so it is easier to reference?)

Could do, although http://microformats.org/wiki/value-excerption-pattern-issues 
  provides numbering of sorts so perhaps refer to those for now?

> 1) I don't think values should be concatenated with a unicode char
> 0020 (a space). If there was intention to add white-space then those
> should be part of the value. We should not introduce additional
> information that was not explicitly marked-up.

The open issue is: http://microformats.org/wiki/value-excerption-pattern-issues#White-space_behaviour_when_concatenating_value_nodes 
.

Seems reasonable. The default case I was thinking of at the time was  
actually somewhat muddled with concatenating repeat properties: e.g.  
additional-name properties in hCard, which would want to be space- 
separated.

For value, I now lean toward agreeing with you, in so far as
regardless of number of segments, we're still marking up a single µf
property, rather than multiple occurrences of the same µf property.

> 2) If the value contains no inner-text, then use the @title. I think
> this was a proposal, but until we get more feedback it probably should
> not be part of our parsing rules. What would be the semantics in that?
> I know this is an attempt at a workaround, but i don't think it
> should be included in these parsing rules until we discuss it further.

This (http://microformats.org/wiki/value-excerption-pattern-issues#Parsing_title_from_Empty_value_Elements)
is the open issue I'm currently working on, and building the diagram
was a development exercise to clarify how it could be parsed.

The semantics are a little tricky, because we're working with the fact  
that HTML does not have a native means of doing this. I think it's  
definable, though, so will have a go later.

> TIDY still has bugs (or maybe it is a feature) with empty nodes.

It does, and for some reason dropping empty elements is not a feature  
that can be switched off at the command line like other behaviours.

However, I've found it's trivial to compile a version of tidy with
"don't drop empty elements with class names" behaviour added, and will
submit it as a patch when I get time. That said, even without the
patch making it back into the Tidy trunk any time soon, the fact is
that Tidy can be made to work with an empty element technique.

I've documented that on the -issues page (http://microformats.org/wiki/value-excerption-pattern-issues#Parsing_title_from_Empty_value_Elements 
).

> Also, i don't know if this chart can handle or should handle nested
> values. Did we make a decision that nested value properties were to be
> ignored?

The reaction was negative, and you pointed out that from a publisher
point-of-view nesting value in value was unnecessary; there seems to
be no reason to do it. So I closed that issue
(http://microformats.org/wiki/value-excerption-pattern-issues#Nested_value)
and intend that we spec the pattern not to act recursively.

> Great work Ben, this is much easier for people to understand than a
> series of bullet points.

That's my intention. I think there's a lot of potential to explore  
better ways of documenting parsing rules.

B
From mkaply at us.ibm.com  Thu Jun 12 07:30:50 2008
From: mkaply at us.ibm.com (Michael Kaply)
Date: Thu Jun 12 07:55:12 2008
Subject: Value Excerption Pattern Parsing (was: [uf-dev] How do we (want
	to) document parsing?)
In-Reply-To: <D8F86ADD-77B2-4EA6-965D-587A3BFAD812@ben-ward.co.uk>
Message-ID: <OF159D155C.BBC224BC-ON86257466.004F85F2-86257466.004FBA72@us.ibm.com>

Do we want to do a final collapsing of whitespace after everything is
concatenated?

Like if the values are:

<value>                       Michael                      </value><value>
Kaply


</value>

It would end up as:

<>Michael Kaply<>

Note that

<value>                       Michael</value><value>Kaply       </value>

would be

<>MichaelKaply<>

Because there is no whitespace between Michael and Kaply

This would be similar to how we clean up whitespace in other properties.

Michael Kaply
Firefox Advocate
mkaply@us.ibm.com
http://www.kaply.com/weblog/ (External Blog)
http://blogs.tap.ibm.com/weblogs/page/mkaply@us.ibm.com (Internal Blog)
From brian.suda at gmail.com  Fri Jun 13 01:14:31 2008
From: brian.suda at gmail.com (Brian Suda)
Date: Fri Jun 13 01:14:42 2008
Subject: Value Excerption Pattern Parsing (was: [uf-dev] How do we (want
	to) document parsing?)
In-Reply-To: <OF159D155C.BBC224BC-ON86257466.004F85F2-86257466.004FBA72@us.ibm.com>
References: <D8F86ADD-77B2-4EA6-965D-587A3BFAD812@ben-ward.co.uk>
	<OF159D155C.BBC224BC-ON86257466.004F85F2-86257466.004FBA72@us.ibm.com>
Message-ID: <21e770780806130114s51a11e5sbc3f716018a74d06@mail.gmail.com>

On 6/12/08, Michael Kaply <mkaply@us.ibm.com> wrote:
> Do we want to do a final collapsing of whitespace after everything is
> concatenated?

--- that's a good question. Right now this is what i do (not to say it
is the best way)

>  Like if the values are:
>
>  <value> Michael </value><value> Kaply
>
>
>
>  </value>
>
>  It would end up as:
>
>  <>Michael Kaply<>

--- in my case i would take the full first value " Michael " and
concatenate that with " Kaply


" then do a trim on the result. So i would drop the leading and
trailing white-space, but preserve the double space in the middle of
the name. My result would be "Michael  Kaply"; it is debatable if that
is correct or not.

>  Note that
>
>  <value> Michael</value><value>Kaply </value>
>
>  would be
>
>  <>MichaelKaply<>

--- correct. I don't personally like the idea of adding a space by
default. I tend to use value for things like phone numbers or email

<value>123</value>-ABCD(<value>5678</value>)
<value>brian</value>do_not_spam_me<value>@suda.co.uk</value>

adding a space by default would be incorrect in these instances.

>  This would be similar to how we clean up whitespace in other properties.

--- most of my value output is concatenated, then something like
trim() is applied, which removes only the leading and trailing
white-space but preserves any internal white-space. If the person is
explicitly adding the spacing into the values, then we should probably
honor that.
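
The two white-space behaviours discussed in this thread can be put side by side in a short sketch. Both function names are made up for illustration; neither is taken from an existing parser, and neither inserts a separator between segments.

```python
import re


def join_and_trim(values):
    # Concatenate with no separator, then strip only the outer white-space.
    return "".join(values).strip()


def join_and_collapse(values):
    # Concatenate with no separator, then collapse every white-space run
    # to a single space and strip the ends.
    return re.sub(r"\s+", " ", "".join(values)).strip()


name = [" Michael ", " Kaply\n\n\n"]
print(repr(join_and_trim(name)))      # 'Michael  Kaply' (inner double space kept)
print(repr(join_and_collapse(name)))  # 'Michael Kaply'

tight = [" Michael", "Kaply "]
print(join_and_trim(tight))           # MichaelKaply

phone = ["123", "5678"]
print(join_and_trim(phone))           # 1235678
```

The two approaches only differ when white-space inside or between the segments is more than a single space; in the phone number case they give the same result.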

-brian

-- 
brian suda
http://suda.co.uk
From rff.rff at gmail.com  Mon Jun 16 06:27:51 2008
From: rff.rff at gmail.com (gabriele renzi)
Date: Mon Jun 16 06:27:55 2008
Subject: [uf-dev] badly formatted hCard? (elements in children of class)
Message-ID: <828083e70806160627n464bb71l3338373854e58e94@mail.gmail.com>

Hi everyone,

given this code

<div class="vcard">
  <em>
    <span class="url fn">
      <a href="http://privpages.de">Melanie Kl??</a>
    </span>
    <span class="adr">
      <span class="type" style="display:none;">home</span>
      <span class="street-address">Ippendorfer Weg. 24</span><br />
      <span class="postal-code">53127</span> <span
class="locality">Bonn</span><br />
      <span class="country-name" style="display:none;">Germany</span>
  </span>
 </em>
</div>


found online[1], am I correct in assuming that it is badly formatted?
Specifically, the "url" property is in a SPAN element, so I would use
"Melanie Kl??" as its value, but I take it that the author wanted me
to use the child element, so the href value of the A tag.

Other uF parsers seem to handle this the same way that I do: look at
the element with that class, not at its children, thus extracting a
name as a url value.
Is this behaviour correct, or shall I do it differently (and report
bugs to other uf-parser authors) ?

Thanks in advance.


[1] http://weblog.netzgeschaedigt.de/?p=763

-- 
goto 10: http://www.goto10.it
blog it: http://riffraff.blogsome.com
blog en: http://www.riffraff.info

From brian.suda at gmail.com  Mon Jun 16 08:47:40 2008
From: brian.suda at gmail.com (Brian Suda)
Date: Mon Jun 16 08:47:46 2008
Subject: [uf-dev] badly formatted hCard? (elements in children of class)
In-Reply-To: <828083e70806160627n464bb71l3338373854e58e94@mail.gmail.com>
References: <828083e70806160627n464bb71l3338373854e58e94@mail.gmail.com>
Message-ID: <21e770780806160847h374e18cblbff2c882a280b381@mail.gmail.com>

On 6/16/08, gabriele renzi <rff.rff@gmail.com> wrote:
>  given this code [..]  found online[1], am I correct in assuming that it is badly formatted?
>  Specifically, the "url" property is in a SPAN element, so I would use
>  "Melanie Kl??" as its value, but I take it that the author wanted me
>  to use the child element, so the href value of the A tag.

--- you are correct in your parsing rules: if the class="url" is on the
span, then it would use "Melanie Kl??", and if you did want the HTTP
value, you would need to move the class="url" onto the 'a' element.

>  Is this behaviour correct, or shall I do it differently (and report
>  bugs to other uf-parser authors) ?

--- your parsing and the parsing of other parsers is correct; it seems
to be an issue on the website[1]. It would be best to report the issue
to them and help get it corrected.
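
The rule described here can be sketched in a few lines. This is a simplified illustration, not any parser's actual code: it only covers the a/href case from this thread, the helper names are invented, and the markup uses placeholder names and URLs rather than the page in question.

```python
import xml.etree.ElementTree as ET


def url_value(el):
    # Hypothetical helper: an <a> carrying class="url" yields its href;
    # any other element yields its text content, white-space trimmed.
    if el.tag == "a" and el.get("href") is not None:
        return el.get("href")
    return "".join(el.itertext()).strip()


def find_by_class(root, name):
    # First element in document order whose class attribute has token `name`.
    for el in root.iter():
        if name in el.get("class", "").split():
            return el
    return None


as_published = ET.fromstring(
    '<div class="vcard"><span class="url fn">'
    '<a href="http://example.org/">Example Name</a></span></div>'
)
corrected = ET.fromstring(
    '<div class="vcard"><span class="fn">'
    '<a class="url" href="http://example.org/">Example Name</a></span></div>'
)
print(url_value(find_by_class(as_published, "url")))  # Example Name
print(url_value(find_by_class(corrected, "url")))     # http://example.org/
```

The parser only looks at the element that carries the class name, so the fix belongs in the markup, not in the parser.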

Thanks for spotting that one,
-brian

>  [1] http://weblog.netzgeschaedigt.de/?p=763


-- 
brian suda
http://suda.co.uk

From rff.rff at gmail.com  Mon Jun 16 09:05:43 2008
From: rff.rff at gmail.com (gabriele renzi)
Date: Mon Jun 16 09:05:50 2008
Subject: [uf-dev] badly formatted hCard? (elements in children of class)
In-Reply-To: <21e770780806160847h374e18cblbff2c882a280b381@mail.gmail.com>
References: <828083e70806160627n464bb71l3338373854e58e94@mail.gmail.com>
	<21e770780806160847h374e18cblbff2c882a280b381@mail.gmail.com>
Message-ID: <828083e70806160905w2d399908gdac11d7b01f03681@mail.gmail.com>

On Mon, Jun 16, 2008 at 4:47 PM, Brian Suda <brian.suda@gmail.com> wrote:
> On 6/16/08, gabriele renzi <rff.rff@gmail.com> wrote:
>>  given this code [..]  found online[1], am I correct in assuming that it is badly formatted?
>>  Specifically, the "url" property is in a SPAN element, so I would use
>>  "Melanie Kl??" as its value, but I take it that the author wanted me
>>  to use the child element, so the href value of the A tag.
>
> --- you are correct in your parsing rules: if the class="url" is on the
> span, then it would use "Melanie Kl??", and if you did want the HTTP
> value, you would need to move the class="url" onto the 'a' element.


yay :D

would it make sense to add something on the lines of this to the test
suite? I can provide a patch if needed, since I already have it in my
test suite.


>>  Is this behaviour correct, or shall I do it differently (and report
>>  bugs to other uf-parser authors) ?
>
> --- your parsing and the parsing of other parsers is correct; it seems
> to be an issue on the website[1]. It would be best to report the issue
> to them and help get it corrected.
>


I will, thanks again.


-- 
goto 10: http://www.goto10.it
blog it: http://riffraff.blogsome.com
blog en: http://www.riffraff.info

From dangiankit at gmail.com  Thu Jun 19 02:30:37 2008
From: dangiankit at gmail.com (Ankit Dangi)
Date: Thu Jun 19 02:30:59 2008
Subject: [uf-dev] XFN needs a tool for recommending users to help link to
	blog posts
Message-ID: <fb8a61a70806190230t2b359fa8ga6862e14e89d5470@mail.gmail.com>

Hi XFN Mates,

As per my understanding of XFN, it allows the owner of a blog to
link to his/her friend's blog. Probably, that's what XFN stands for too. It
seems to be a manual task, and also not foolproof. My reasons are
mentioned below.

There are three things which I see, in concern to XFN, they are - a blog,
blog posts, and a blog roll. As far as my understanding has developed over
years, a blog is a container for blog posts, and blog roll. On the blog, the
user links to his/her friend's blog via blog roll, but might also link to
his/her friend's specific blog post, in his/her own blog post.

We add XFN to the blog roll and NOT to blog posts, although of course we
can (refer: http://gmpg.org/xfn/faq). But if we seriously want the Friends
Network to be strong enough, then I see a need for a tool which
detects the user's friends' URLs in the user's blog posts (matching against
the ones in the blog roll) and recommends that the user update those
links and add XFN accordingly. The user then has the choice to add XFN to
those links too; otherwise there is a high chance that those links go
unnoticed.

*The key idea is to recommend XFN to the user (as applicable and useful)
for any link that he/she makes to a friend's blog post.*
Using this, similar blog posts could probably be identified from a
friend's group, and they might be able to collaborate in a better way,
adding the true sense to the XHTML Friends Network (XFN).

Cross-posted to the microformats-discuss mailing list
<http://microformats.org/mailman/listinfo/microformats-discuss>.

-- 
Ankit Dangi
From fberriman at gmail.com  Fri Jun 20 03:40:40 2008
From: fberriman at gmail.com (Frances Berriman)
Date: Fri Jun 20 03:40:42 2008
Subject: [uf-dev] Using class for non-human data
Message-ID: <e86992a40806200340p3056c616h46db9a5cc0011e63@mail.gmail.com>

Hey all,

Firstly, with my BBC hat on, I wanted to point out that our Standards
and Guidelines group have recently added a few additional clauses to
our semantic markup standards.  They are as follows (I don't think the
most recent document is available yet, but I'll certainly link through
to it when it's available):

--
5.1. Title attributes MUST contain human-readable data.
--
8.1. You MAY use microformats on your site where there are agreed
specifications (refer to the Microformats community wiki site for
details) with the exception of those that use the title attribute of
HTML's abbr element.
8.1.1. Some microformats use the abbr element to conceal
machine-readable data; for example, date-times and geographical
coordinates. For screen-reader users that expand abbreviations they
will hear the full date-time or coordinate; for example
2008-05-15T19:30:00+01:00 instead of 19:30.
8.1.2. If you want to use microformats in the abbr element you MUST
first discuss this with the Editor, Standards and Guidelines.
8.2. If you do use microformats, you MUST ensure that the title
attribute contains human-readable data. See also Title attributes
above.
--

Consequently, we've been looking at the machine-data proposals in the
hope that we'll be able to keep using things like hCalendar.  After
having a chat with Ben about that document, we (myself and colleagues)
have these additional concerns with the proposed solution:
* The empty tag causes potential problems in CMS implementations (e.g.
some of our tools will publish <span title="foo" />
instead of the desired empty element).
* Using two elements for one job.
* The data is not discretely associated with what it *should* be surrounding.
* Future proof?  What if screen readers did start to implement always
expanding title attributes, even on empty elements?

Additionally, we felt a concern that using empty elements could
encourage bad practices, and with our new (but not necessarily
irreversible) guidelines about the contents of title attributes, we're
a little stuck.

Being that our main concerns centre around the questionable use of
"title", we've been looking at the idea of using "class" instead.
Something along the lines of:

<span class="dtstart data-20051010T10:10:10-0100">10 o'clock on the 10th</span>

The pros to this would be that it's non-harmful and the HTML spec does
suggest that user-agent data may be stored in class.  On the downside,
the semantics again could be questionable (but arguably less so than
the semantics of title).

What I'm interested in talking about is what other problems arise from
using "class". Can it be used, should it be used, and what problems
could there be from the parser's point of view?  Have we missed
something fundamental about why we don't already use the class
attribute more often?
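
From the parser's side, reading machine data out of class could look roughly like the sketch below. The `data-` prefix comes from the example above; the function name and the "first match wins" choice are assumptions made for illustration.

```python
def machine_data(class_attr, prefix="data-"):
    # Class tokens are white-space separated, so the value itself cannot
    # contain spaces. Returns the first prefixed token's payload, or None.
    for token in class_attr.split():
        if token.startswith(prefix) and len(token) > len(prefix):
            return token[len(prefix):]
    return None


print(machine_data("dtstart data-20051010T10:10:10-0100"))  # 20051010T10:10:10-0100
print(machine_data("dtstart"))                              # None
```

One consequence a parser would have to live with: any value containing white-space cannot be carried this way without some escaping scheme.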

I'll create a wiki page for this shortly (any preference where you'd
like this to live, anyone?).  Just wanted to get this out there.

Cheers :)
F


-- 
Frances Berriman
http://fberriman.com
From brian.suda at gmail.com  Fri Jun 20 04:23:06 2008
From: brian.suda at gmail.com (Brian Suda)
Date: Fri Jun 20 04:23:09 2008
Subject: [uf-dev] Using class for non-human data
In-Reply-To: <e86992a40806200340p3056c616h46db9a5cc0011e63@mail.gmail.com>
References: <e86992a40806200340p3056c616h46db9a5cc0011e63@mail.gmail.com>
Message-ID: <21e770780806200423v7104791sdc476037c3a260a3@mail.gmail.com>

On 6/20/08, Frances Berriman <fberriman@gmail.com> wrote:
>  Being that our main concerns centre around the questionable use of
>  "title", we've been looking at the idea of using "class" instead.
>  Something along the lines of:
>
>  <span class="dtstart data-20051010T10:10:10-0100">10 o'clock on the 10th</span>
>
>  I'll create a wiki page for this shortly (any preference where you'd
>  like this to live, anyone?).  Just wanted to get this out there.

---- much of this discussion has already happened and is documented here:
http://microformats.org/wiki/datetime-design-pattern
We can add, rebut, expand on what is there.

-brian

-- 
brian suda
http://suda.co.uk
From fberriman at gmail.com  Sat Jun 21 05:56:19 2008
From: fberriman at gmail.com (Frances Berriman)
Date: Sat Jun 21 05:56:22 2008
Subject: [uf-dev] Using class for non-human data
In-Reply-To: <21e770780806200423v7104791sdc476037c3a260a3@mail.gmail.com>
References: <e86992a40806200340p3056c616h46db9a5cc0011e63@mail.gmail.com>
	<21e770780806200423v7104791sdc476037c3a260a3@mail.gmail.com>
Message-ID: <e86992a40806210556o789edc1dm9b18353421fcdeb9@mail.gmail.com>

2008/6/20 Brian Suda <brian.suda@gmail.com>:
> On 6/20/08, Frances Berriman <fberriman@gmail.com> wrote:
>>  Being that our main concerns centre around the questionable use of
>>  "title", we've been looking at the idea of using "class" instead.
>>  Something along the lines of:
>>
>>  <span class="dtstart data-20051010T10:10:10-0100">10 o'clock on the 10th</span>
>>
>>  I'll create a wiki page for this shortly (any preference where you'd
>>  like this to live, anyone?).  Just wanted to get this out there.
>
> ---- much of this discussion has already happened and is documented here:
> http://microformats.org/wiki/datetime-design-pattern
> We can add, rebut, expand on what is there.
>
> -brian

Cool - started a new section.


http://microformats.org/wiki/datetime-design-pattern#Machine-data_in_class


-- 
Frances Berriman
http://fberriman.com
From lists at ben-ward.co.uk  Sat Jun 21 13:43:04 2008
From: lists at ben-ward.co.uk (Ben Ward)
Date: Sat Jun 21 13:43:36 2008
Subject: [uf-dev] [value-excerption-pattern] Resolve Depth of Parsing
Message-ID: <90CF2E70-3192-4166-832F-30ADDF86FCC6@ben-ward.co.uk>

Hi devs,

I'm back on the value-excerption-pattern issues list, and working on  
the open Depth of Parsing issue. I'll quote it here for your  
convenience, but it's live on the wiki here: <http://microformats.org/wiki/value-excerption-pattern-issues#Depth_of_Parsing 
 >

> <div class="hentry vevent">
>     <h1 class="entry-title summary">Party on Sunday!</h1>
>     <div class="updated published">Tuesday <span  
> class="value">2008-06-17</span></div>
>     <p class="entry-content description">We're having a party on
>         <span class="dtstart">Sunday, at 7pm!
>             <span class="value">2008-06-22T19:00:00+0100</span>
>         </span>.
>         Please bring your friends!
>     </p>
> </div>


Where the parsing rules for value-excerption-pattern parse all
descendants by default, that results in the following hAtom structure:

> ENTRY
>     ENTRY-TITLE=Party on Sunday!
>     UPDATED=2008-06-17
>     PUBLISHED=2008-06-17
>     ENTRY-CONTENT=2008-06-22T19:00:00+0100

Note, this is not a case of one microformat embedded within another,
which alone could be resolved by including the "mfo" pattern in this
spec (assuming it were seen as a good idea, which would be debated in
itself). Instead, I propose the following parsing behaviour. It would
solve this issue, and would not introduce additional processing
instructions to the class attribute for properties (or root nodes).

So:

   * Specify that by default, parsers only parse *children* of the
parent element and not all descendants
   * Ideally that would be it, but Toby I. expressed that children-only
is too restrictive, so also provision for individual properties to
override the child-only default, and instead parse *all* descendants
(where we feel a child will not contain other properties)

This would result in a parse-depth flag on all fields, with some
getting overridden to parse all descendants, which can be a
well-structured and documented solution. Property name dictionaries in
parsers would have to include the depth flag with the affected
properties.
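
To make the difference concrete, here is a minimal sketch of the two
parse depths run against the example above. It is an illustration
only, not an existing parser: the helper names (has_class, text_of,
excerpt, parse_property) and the all_descendants flag are assumptions
made up for this sketch, and it relies on the example being
well-formed XML so that Python's standard ElementTree can read it.

```python
import xml.etree.ElementTree as ET

# The example from the issue, with the wrapped <span> rejoined.
MARKUP = """<div class="hentry vevent">
    <h1 class="entry-title summary">Party on Sunday!</h1>
    <div class="updated published">Tuesday <span class="value">2008-06-17</span></div>
    <p class="entry-content description">We're having a party on
        <span class="dtstart">Sunday, at 7pm!
            <span class="value">2008-06-22T19:00:00+0100</span>
        </span>.
        Please bring your friends!
    </p>
</div>"""


def has_class(el, name):
    """True if `name` is one of the space-separated class tokens."""
    return name in el.get("class", "").split()


def text_of(el):
    """Text content of an element with whitespace collapsed."""
    return " ".join("".join(el.itertext()).split())


def excerpt(prop_el, all_descendants):
    """Value excerption for one property element.

    all_descendants is the hypothetical per-property depth flag:
    False looks only at direct children, True at every descendant.
    """
    pool = prop_el.iter() if all_descendants else iter(prop_el)
    values = [text_of(c) for c in pool
              if c is not prop_el and has_class(c, "value")]
    # With no class="value" found, fall back to the element's own text.
    return "".join(values) if values else text_of(prop_el)


def parse_property(root, name, all_descendants=False):
    """Return the value of the first element carrying class `name`."""
    for el in root.iter():
        if has_class(el, name):
            return excerpt(el, all_descendants)
    return None


root = ET.fromstring(MARKUP)
print(parse_property(root, "entry-content", all_descendants=True))
print(parse_property(root, "entry-content"))
```

With all descendants parsed, entry-content collapses to the dtstart
value, which is the problem described above. With children only,
entry-content keeps its full text while updated and dtstart still
pick up their own class="value" children.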

I think this is a better solution than adding mfo and an equivalent
property-level processing flag, which is a lot of publishing complexity,
and I think it makes most sense to default to the more conservative
model (children only) with overriding to a liberal descendants parse
for properties where it is required.

Feedback on this behaviour would be greatly appreciated,

Thanks,

Ben
From fberriman at gmail.com  Mon Jun 23 03:12:53 2008
From: fberriman at gmail.com (Frances Berriman)
Date: Mon Jun 23 03:12:56 2008
Subject: [uf-dev] Using class for non-human data
In-Reply-To: <e86992a40806210556o789edc1dm9b18353421fcdeb9@mail.gmail.com>
References: <e86992a40806200340p3056c616h46db9a5cc0011e63@mail.gmail.com>
	<21e770780806200423v7104791sdc476037c3a260a3@mail.gmail.com>
	<e86992a40806210556o789edc1dm9b18353421fcdeb9@mail.gmail.com>
Message-ID: <e86992a40806230312t10eb3ca9w5ce141e45cb45a48@mail.gmail.com>

On 21/06/2008, Frances Berriman <fberriman@gmail.com> wrote:
> 2008/6/20 Brian Suda <brian.suda@gmail.com>:
>
> > On 6/20/08, Frances Berriman <fberriman@gmail.com> wrote:
>  >>  Being that our main concerns centre around the questionable use of
>  >>  "title", we've been looking at the idea of using "class" instead.
>  >>  Something along the lines of:
>  >>
>  >>  <span class="dtstart data-20051010T10:10:10-0100">10 o'clock on the 10th</span>
>  >>
>  >>  I'll create a wiki page for this shortly (any preference where you'd
>  >>  like this to live, anyone?).  Just wanted to get this out there.
>  >
>  > ---- much of this discussion has already happened and is documented here:
>  > http://microformats.org/wiki/datetime-design-pattern
>  > We can add, rebut, expand on what is there.
>  >
>  > -brian
>
>
> Cool - started a new section.
>
>
>  http://microformats.org/wiki/datetime-design-pattern#Machine-data_in_class
>


Again, more information pertaining to this.  The Programmes team have
just announced their upcoming removal of hCal from /programmes to
backstage.bbc.

http://www.bbc.co.uk/blogs/radiolabs/2008/06/removing_microformats_from_bbc.shtml

-- 
Frances Berriman
http://fberriman.com
From glenn.jones at madgex.com  Mon Jun 23 08:12:24 2008
From: glenn.jones at madgex.com (Glenn Jones)
Date: Mon Jun 23 08:12:40 2008
Subject: [uf-dev] Using class for non-human data
Message-ID: <36A319113CF910438942741C4727ADFF020DF7E0@MOBY.Clarence.local>

>> Again, more information pertaining to this.  The Programmes team have
>> just announced their upcoming removal of hCal from /programmes to
>> backstage.bbc.
>>
>> http://www.bbc.co.uk/blogs/radiolabs/2008/06/removing_microformats_from_bbc.shtml

I must say that although I am equally frustrated that there has not been
a resolution to the abbreviation design pattern accessibility issue, the
BBC response seems like a heavy-handed ploy to force things.

I sort of like the suggestion that Frances put forward; as Toby said on
the wiki, "least harmful solution proposed so far". It should not take
too much to add this to UfXtract.

I would have only used this pattern for data types meant as machine
alternatives which remove human ambiguity.

Datetime  
Durations
Timezones
Geo

These formats do not use spaces, which resolves some of the parsing
issues Toby raised on the wiki page.


Might the following have been easier for authors to understand?

<span class="dtstart{2005-10-10T10:10:10-0100}">10 o'clock</span>

The use of {} for data is becoming more popular with OpenSearch etc. It
directly links the property and value.


Glenn Jones


From fberriman at gmail.com  Mon Jun 23 08:42:59 2008
From: fberriman at gmail.com (Frances Berriman)
Date: Mon Jun 23 08:43:03 2008
Subject: [uf-dev] Using class for non-human data
In-Reply-To: <36A319113CF910438942741C4727ADFF020DF7E0@MOBY.Clarence.local>
References: <36A319113CF910438942741C4727ADFF020DF7E0@MOBY.Clarence.local>
Message-ID: <e86992a40806230842i2b74a2c2ib301ca6a256b5360@mail.gmail.com>

On 23/06/2008, Glenn Jones <glenn.jones@madgex.com> wrote:

>  The following may of been easier for authors to understand?
>
>  <span class="dtstart{2005-10-10T10:10:10-0100}">10 o'clock</span>
>
>  The use of {} for data is becoming more popular with OpenSearch etc. It
>  directly links the property and value.

Merging it like that wouldn't be ideal for styling (we did toy with
the idea of dtstart-2005-10-10T10:10:10-0100, for example).

The data- prefix would make the same data available to more than one
attribute too - rather than having to repeat the same data more than
once if it happens to be in the same element.

I'm not sure if it's really easier to understand, to be honest.


-- 
Frances Berriman
http://fberriman.com
From csarven at gmail.com  Mon Jun 23 08:57:33 2008
From: csarven at gmail.com (Sarven Capadisli)
Date: Mon Jun 23 08:57:38 2008
Subject: [uf-dev] Using class for non-human data
In-Reply-To: <e86992a40806230842i2b74a2c2ib301ca6a256b5360@mail.gmail.com>
References: <36A319113CF910438942741C4727ADFF020DF7E0@MOBY.Clarence.local>
	<e86992a40806230842i2b74a2c2ib301ca6a256b5360@mail.gmail.com>
Message-ID: <d4154bcf0806230857x5bd8917clf138ee0ec34e8b65@mail.gmail.com>

Earlier this year, Andy Mabbett proposed a clear use of the "data" prefix here:

http://microformats.org/discuss/mail/microformats-discuss/2008-February/011583.html

Doesn't conflict with styling.

-Sarven


On Mon, Jun 23, 2008 at 10:42 AM, Frances Berriman <fberriman@gmail.com> wrote:
> On 23/06/2008, Glenn Jones <glenn.jones@madgex.com> wrote:
>
>>  The following may of been easier for authors to understand?
>>
>>  <span class="dtstart{2005-10-10T10:10:10-0100}">10 o'clock</span>
>>
>>  The use of {} for data is becoming more popular with OpenSearch etc. It
>>  directly links the property and value.
>
> Merging it like that wouldn't be ideal for styling (we did toy with
> the idea of dstart-2005-10-10T10:10:10-0100, for example).
>
> The data- prefix would make the same data available to more than one
> attribute too - rather than having to repeat the same data more than
> once if it happens to be in the same element.
>
> I'm not sure if it's really easier to understand, to be honest.
>
>
>
> --
> Frances Berriman
> http://fberriman.com
> _______________________________________________
> microformats-dev mailing list
> microformats-dev@microformats.org
> http://microformats.org/mailman/listinfo/microformats-dev
>
From mkaply at us.ibm.com  Mon Jun 23 08:57:57 2008
From: mkaply at us.ibm.com (Michael Kaply)
Date: Mon Jun 23 08:58:23 2008
Subject: [uf-dev] Using class for non-human data
In-Reply-To: <36A319113CF910438942741C4727ADFF020DF7E0@MOBY.Clarence.local>
Message-ID: <OFE327A670.69C3153F-ON86257471.00578C00-86257471.0057B454@us.ibm.com>

>
> The following may of been easier for authors to understand?
>
> <span class="dtstart{2005-10-10T10:10:10-0100}">10 o'clock</span>
>
> The use of {} for data is becoming more popular with OpenSearch etc. It
> directly links the property and value.

But how would you detect this in a parser? Currently we look for a class of
dtstart. How would you do a getElementsByClassName?

I personally don't like the BBC suggestion at all. Hiding data in the class
attribute just seems like a hack. Especially since I have to look at every class
attribute to decide if it is data for the microformat.
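
A rough sketch of what that detection would involve for the merged
dtstart{...} form: since no class token equals "dtstart" any more, a
lookup by exact class name cannot find it, and each token of each
class attribute has to be pattern-matched instead. The function name
and the regular expression below are assumptions for illustration,
not part of any existing parser.

```python
import re

# A class token of the form  property{value}  (hypothetical syntax
# taken from the proposal quoted above).
BRACED = re.compile(r"^(?P<prop>[a-z][a-z0-9-]*)\{(?P<value>[^}]*)\}$")


def braced_properties(class_attr):
    """Map property name -> value for every property{value} token."""
    found = {}
    for token in class_attr.split():
        match = BRACED.match(token)
        if match:
            found[match.group("prop")] = match.group("value")
    return found


print(braced_properties("fancy dtstart{2005-10-10T10:10:10-0100}"))
```

A parser would have to run something like this over every element
that has a class attribute, rather than asking the DOM for elements
with a known class name.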

I'd almost rather use a non standard attribute.

Michael Kaply
Firefox Advocate
mkaply@us.ibm.com
http://www.kaply.com/weblog/ (External Blog)
http://blogs.tap.ibm.com/weblogs/page/mkaply@us.ibm.com (Internal Blog)
From mail at tobyinkster.co.uk  Mon Jun 23 09:01:10 2008
From: mail at tobyinkster.co.uk (Toby A Inkster)
Date: Mon Jun 23 09:01:37 2008
Subject: [uf-dev] Using class for non-human data
Message-ID: <37129E32-A67B-44A3-BB57-4C3C1FE456BE@tobyinkster.co.uk>

Of course the other approach is to say "to hell with validity" and  
embrace RDFa's "content" attribute, which can be introduced in a very  
easy and straight-forward manner without using the rest of RDFa:

<span class="dtstart" content="2008-06-23">Today</span>

(Of course, this *can* be made to be valid by using a custom DTD, or  
indeed the RDFa DTD.)

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>


From jaffathecake at gmail.com  Mon Jun 23 09:43:15 2008
From: jaffathecake at gmail.com (Jake Archibald)
Date: Mon Jun 23 09:43:20 2008
Subject: [uf-dev] Using class for non-human data
In-Reply-To: <OFE327A670.69C3153F-ON86257471.00578C00-86257471.0057B454@us.ibm.com>
References: <36A319113CF910438942741C4727ADFF020DF7E0@MOBY.Clarence.local>
	<OFE327A670.69C3153F-ON86257471.00578C00-86257471.0057B454@us.ibm.com>
Message-ID: <3be0bf100806230943i69524869w654648f77633a750@mail.gmail.com>

2008/6/23 Michael Kaply <mkaply@us.ibm.com>:

>
>
> But how would you detect this in a parser? Currently we look for a class of
> dtstart. how would you do a getElementsByClassName?
>
>

> I personally don't like the BBC suggestion at all. Hiding data in the class
> tag just seems like a hack. Especially since I have to look at every class
> attribute to decide if it is data for the microformat.
>
> I'd almost rather use a non standard attribute.
>

It is a hack, but so is using title. I find using class less hacky because
the data doesn't end up in a human readable space (as title does). "For
general purpose processing by user agents" is what the HTML spec says of the
class attribute.

But yes, the dtstart class should remain, followed by a separate data class.

In implementations and standards, the class attribute has always been for
machine data. This is not true of title.

Jake.
From guillaume at lebleu.org  Mon Jun 23 14:09:51 2008
From: guillaume at lebleu.org (Guillaume Lebleu)
Date: Mon Jun 23 14:10:13 2008
Subject: [uf-dev] Using class for non-human data
In-Reply-To: <37129E32-A67B-44A3-BB57-4C3C1FE456BE@tobyinkster.co.uk>
References: <37129E32-A67B-44A3-BB57-4C3C1FE456BE@tobyinkster.co.uk>
Message-ID: <4860111F.5010104@lebleu.org>

Toby A Inkster wrote:
> Of course the other approach is to say "to hell with validity" and 
> embrace RDFa's "content" attribute, which can be introduced in a very 
> easy and straight-forward manner without using the rest of RDFa
Having followed the discussions on this matter for some time, it seems
to me that we are indeed reaching a limit here, in terms of both staying
compliant with XHTML semantics and adhering to an (unwritten?) principle
that microformats should not influence how the human-readable content is
written in the first place.

For those implementations not willing to say "to hell with validity",
could they get away with machine-readable content for dates that gets
formatted in a human-friendly way in JavaScript for display to humans?

For instance, the HTML would be
<span class="dtstart date">2005-10-10T10:10:10-0100</span>, but by way
of a "date pretty printer" (something like
http://ejohn.org/blog/javascript-pretty-date/),
it would be displayed as "10:10am on October 10th 2005".
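
For illustration only, the formatting step itself is small. The sketch
below does it in Python rather than in browser JavaScript as proposed,
and the function names are made up for this example; it only shows how
the machine-readable stamp maps to the human-friendly text quoted
above.

```python
from datetime import datetime

# Spelled out so the result does not depend on the system locale.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def ordinal(n):
    """1 -> '1st', 2 -> '2nd', 11 -> '11th', 23 -> '23rd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def pretty(stamp):
    """Format an ISO 8601 stamp such as 2005-10-10T10:10:10-0100."""
    dt = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S%z")
    hour12 = dt.hour % 12 or 12
    meridiem = "am" if dt.hour < 12 else "pm"
    return (f"{hour12}:{dt.minute:02d}{meridiem} on "
            f"{MONTHS[dt.month - 1]} {ordinal(dt.day)} {dt.year}")


print(pretty("2005-10-10T10:10:10-0100"))  # 10:10am on October 10th 2005
```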

Is this a heresy? What do you think?

Guillaume


From norm at cackhanded.net  Mon Jun 23 14:21:46 2008
From: norm at cackhanded.net (Mark Norman Francis)
Date: Mon Jun 23 14:21:51 2008
Subject: [uf-dev] Using class for non-human data
In-Reply-To: <36A319113CF910438942741C4727ADFF020DF7E0@MOBY.Clarence.local>
References: <36A319113CF910438942741C4727ADFF020DF7E0@MOBY.Clarence.local>
Message-ID: <79025A92-1F49-4863-AB58-D4792C8DA6BC@cackhanded.net>

> I must say that although I am equal frustrate that there has not  
> been a
> resolve the abbreviation design pattern accessible issue, the BBC
> response seems like a heavy handed ploy to force things.


I just want to say as a little aside, I didn't take it as a ploy nor
as heavy-handed myself. Although this could be a side-effect of my
having made the same decision for what are probably very similar
reasons in my job at Y!. I just didn't blog about it openly, whereas
the BBC did.

It's a matter of priorities -- and it would seem that, for the people  
who set the semantic standards at the BBC, accessibility and clarity  
of content for humans takes priority over encoding data to be machine  
readable.

-- Norm.

From brady.k at gmail.com  Mon Jun 23 14:31:17 2008
From: brady.k at gmail.com (Kyle Brady)
Date: Mon Jun 23 14:31:21 2008
Subject: [uf-dev] Implementation Question
Message-ID: <ad68fd140806231431q409c27b8jbf5ba5cc5fdbb288@mail.gmail.com>

Hi,

I've recently started working on a project that I've dubbed "mySocialBlog" (
http://code.google.com/p/my-social-blog), and was wondering if anyone would
be interested in being the "microformats expert" on this project?

Basically, I'm trying to create a way for people to implement the same
social information they might put on Facebook on their blog, using CSV
[exported spreadsheets, for now], and microformat it.  In essence, an "all
about me" network... not really social, but the profile aspect of social
networks.

Anyways, I want to make sure I'm doing it right, and could use some help.
If you want to check out the progress so far, see it on my blog (
http://www.kyle-brady.com/my-library is a good example)... the code release
is coming soon.

Thanks, and hope to hear from some of you!

--
  Kyle Brady
  750 Miller St., Apt. 404
  San Jose, California 95110
  408-828-3861

   My Business:  http://www.int-ind.com
   My OneSwirl:  http://www.oneswirl.com/KyleBrady

  [all contact methods available at OneSwirl]
From aconbere at gmail.com  Mon Jun 23 14:58:10 2008
From: aconbere at gmail.com (anders conbere)
Date: Mon Jun 23 14:58:12 2008
Subject: [uf-dev] Implementation Question
In-Reply-To: <ad68fd140806231431q409c27b8jbf5ba5cc5fdbb288@mail.gmail.com>
References: <ad68fd140806231431q409c27b8jbf5ba5cc5fdbb288@mail.gmail.com>
Message-ID: <8ca3fbe80806231458s60273f49i21856ad0d3975dea@mail.gmail.com>

On Mon, Jun 23, 2008 at 2:31 PM, Kyle Brady <brady.k@gmail.com> wrote:
> Hi,
>
> I've recently started working on a project that I've dubbed "mySocialBlog"
> (http://code.google.com/p/my-social-blog), and was wondering if anyone would
> be interested in being the "microformats expert" on this project?
>
> Basically, I'm trying to create a way for people to implement the same
> social information they might put on Facebook on their blog, using CSV
> [exported spreadsheets, for now], and microformat it.  In essence, an "all
> about me" network... not really social, but the profile aspect of social
> networks.
>
> Anyways, I want to make sure I'm doing it right, and could use some help.
> If you want to check out the progress so far, see it on my blog
> (http://www.kyle-brady.com/my-library is a good example)... the code release
> is coming soon.

Not sure if you've talked to them, but it might be interesting for you
to talk with the Diso project.

~ Anders

>
> Thanks, and hope to hear from some of you!
>
> --
>   Kyle Brady
>   750 Miller St., Apt. 404
>   San Jose, California 95110
>   408-828-3861
>
>    My Business:  http://www.int-ind.com
>    My OneSwirl:  http://www.oneswirl.com/KyleBrady
>
>   [all contact methods available at OneSwirl]
From danbri at danbri.org  Mon Jun 23 14:58:45 2008
From: danbri at danbri.org (Dan Brickley)
Date: Mon Jun 23 14:58:50 2008
Subject: [uf-dev] Using class for non-human data
In-Reply-To: <3be0bf100806230943i69524869w654648f77633a750@mail.gmail.com>
References: <36A319113CF910438942741C4727ADFF020DF7E0@MOBY.Clarence.local>	<OFE327A670.69C3153F-ON86257471.00578C00-86257471.0057B454@us.ibm.com>
	<3be0bf100806230943i69524869w654648f77633a750@mail.gmail.com>
Message-ID: <48601C95.2080205@danbri.org>

Jake Archibald wrote:
> 2008/6/23 Michael Kaply <mkaply@us.ibm.com>:
> 
> 
> 
>     But how would you detect this in a parser? Currently we look for a
>     class of dtstart. how would you do a getElementsByClassName?
>      
> 
> 
>     I personally don't like the BBC suggestion at all. Hiding data in
>     the class tag just seems like a hack. Especially since I have to
>     look at every class attribute to decide if it is data for the
>     microformat.
> 
>     I'd almost rather use a non standard attribute.
> 
> 
> It is a hack, but so is using title. I find using class less hacky 
> because the data doesn't end up in a human readable space (as title 
> does). "For general purpose processing by user agents" is what the HTML 
> spec says of the class attribute.
> 
> But yes, the dtstart class should remain, followed by a separate data class.
> 
> In implementations and standards, the class attribute has always been 
> for machine data. This is not true of title.

That's my reading too; 'class' seems a home worth investigating for this 
data...

Dan

--
http://danbri.org/
From brady.k at gmail.com  Mon Jun 23 15:14:33 2008
From: brady.k at gmail.com (Kyle Brady)
Date: Mon Jun 23 15:14:37 2008
Subject: [uf-dev] Implementation Question
In-Reply-To: <8ca3fbe80806231458s60273f49i21856ad0d3975dea@mail.gmail.com>
References: <ad68fd140806231431q409c27b8jbf5ba5cc5fdbb288@mail.gmail.com>
	<8ca3fbe80806231458s60273f49i21856ad0d3975dea@mail.gmail.com>
Message-ID: <ad68fd140806231514t6a1c8910i2cf959f62bdd8aaa@mail.gmail.com>

I've heard of them, only just recently actually, but are you saying they
would be interested in helping me with the microformats check?  Or that they
are doing something like I am?

Thanks

--
Kyle Brady
750 Miller St., Apt. 404
San Jose, California 95110
408-828-3861

My Business: http://www.int-ind.com
My Blog: http://www.kyle-brady.com
From jaffathecake at gmail.com  Tue Jun 24 00:00:38 2008
From: jaffathecake at gmail.com (Jake Archibald)
Date: Tue Jun 24 00:00:42 2008
Subject: [uf-dev] Using class for non-human data
In-Reply-To: <4860111F.5010104@lebleu.org>
References: <37129E32-A67B-44A3-BB57-4C3C1FE456BE@tobyinkster.co.uk>
	<4860111F.5010104@lebleu.org>
Message-ID: <3be0bf100806240000q2b34da95lc895c1016f7707c8@mail.gmail.com>

Any solution which requires CSS, JavaScript, prevents HTML4 / XHTML
validation, or puts machine data in a human readable place isn't
really an option for the BBC (and sites with a similar range of
users).

For me, the great thing about microformats is they don't break
validation and *shouldn't* impact on usability & accessibility.

On 6/23/08, Guillaume Lebleu <guillaume@lebleu.org> wrote:
> Toby A Inkster wrote:
>> Of course the other approach is to say "to hell with validity" and
>> embrace RDFa's "content" attribute, which can be introduced in a very
>> easy and straight-forward manner without using the rest of RDFa
> Having followed the discussions on this matter for some time, it seems
> to me that we are indeed reaching a limit here, in terms of keeping both
> compliant with XHTML semantics and adhering to a (unwritten?) principle
> that microformats should not influence how the human-readable content is
> written in the first place.
>
> For those implementations not willing to say "to hell with validity",
> could they get away with a machine-readable content for dates that gets
> formatted in a human friendly way in JavaScript for display to humans?
>
> For instance, the HTML would be <span class="dtstart
> date">2005-10-10T10:10:10-0100</span>, but by way of a "date pretty
> printer" (something like http://ejohn.org/blog/javascript-pretty-date/),
> it would be displayed as "10:10am on October 10th 2005".
>
> Is this a heresy? What do you think?
>
> Guillaume
>
>

From andr3.pt at gmail.com  Tue Jun 24 04:52:23 2008
From: andr3.pt at gmail.com (André Luís)
Date: Tue Jun 24 04:52:27 2008
Subject: [uf-dev] Using class for non-human data
In-Reply-To: <48601C95.2080205@danbri.org>
References: <36A319113CF910438942741C4727ADFF020DF7E0@MOBY.Clarence.local>
	<OFE327A670.69C3153F-ON86257471.00578C00-86257471.0057B454@us.ibm.com>
	<3be0bf100806230943i69524869w654648f77633a750@mail.gmail.com>
	<48601C95.2080205@danbri.org>
Message-ID: <dc1a17860806240452y98f7d23pd212cdd3f4c1d85a@mail.gmail.com>

Since the problems are arising from machine-data ending up on
human-readable attributes, why can't we "compromise" and accept to
have machine-data-values on non-human-readable attributes?

Also, extending the document with namespaces limits the usage to
xhtml, and according to POSH principles, we don't want that.

Like you guys mentioned, leaving the dtstart but adding an extra
value... would it be too much of a hassle for parsers?

<abbr class="dtstart data{2008-06-23}" title="June 23rd, 2008">Today</abbr>

1. grab elementByClassName( dtstart )
2. get classnames as array
3. grab classname after dtstart(ie, i+1, i being the index of
dtstart), does it match /data{[^}]*}/ ?
4. if yes, use it as value.

What's so wrong with this approach?
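
The four steps above can be sketched roughly as follows. This is only
an illustration of the ordered lookup as described; the function name
value_after and the exact regular expression are assumptions, and the
input is the class attribute string of an element that has already
been found by its property class.

```python
import re

# A class token of the form data{...}
DATA_TOKEN = re.compile(r"^data\{([^}]*)\}$")


def value_after(class_attr, prop):
    """Return the data{...} value that immediately follows `prop`."""
    tokens = class_attr.split()          # step 2: classnames as array
    if prop not in tokens:
        return None
    i = tokens.index(prop)
    if i + 1 < len(tokens):              # step 3: look at token i+1
        match = DATA_TOKEN.match(tokens[i + 1])
        if match:
            return match.group(1)        # step 4: use it as the value
    return None


print(value_after("dtstart data{2008-06-23}", "dtstart"))  # 2008-06-23
```

Because the lookup is positional, it returns nothing when the data
token does not sit directly after the property name.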

Isn't it widely accepted that this is the Achilles' heel of all design
patterns used by microformats? We must start accepting the fact that
without extending HTML we don't have many attributes to choose from...

--
André Luís


On Mon, Jun 23, 2008 at 10:58 PM, Dan Brickley <danbri@danbri.org> wrote:
> Jake Archibald wrote:
>>
>> 2008/6/23 Michael Kaply <mkaply@us.ibm.com>:
>>
>>
>>
>>    But how would you detect this in a parser? Currently we look for a
>>    class of dtstart. how would you do a getElementsByClassName?
>>
>>
>>    I personally don't like the BBC suggestion at all. Hiding data in
>>    the class tag just seems like a hack. Especially since I have to
>>    look at every class attribute to decide if it is data for the
>>    microformat.
>>
>>    I'd almost rather use a non standard attribute.
>>
>>
>> It is a hack, but so is using title. I find using class less hacky because
>> the data doesn't end up in a human readable space (as title does). "For
>> general purpose processing by user agents" is what the HTML spec says of the
>> class attribute.
>>
>> But yes, the dtstart class should remain, followed by a separate data
>> class.
>>
>> In implementations and standards, the class attribute has always been for
>> machine data. This is not true of title.
>
> That's my reading too; 'class' seems a home worth investigating for this
> data...
>
> Dan
>
> --
> http://danbri.org/

From eivindu at ifi.uio.no  Tue Jun 24 05:13:57 2008
From: eivindu at ifi.uio.no (Eivind Uggedal)
Date: Tue Jun 24 05:14:01 2008
Subject: [uf-dev] Using class for non-human data
In-Reply-To: <dc1a17860806240452y98f7d23pd212cdd3f4c1d85a@mail.gmail.com>
References: <36A319113CF910438942741C4727ADFF020DF7E0@MOBY.Clarence.local>
	<OFE327A670.69C3153F-ON86257471.00578C00-86257471.0057B454@us.ibm.com>
	<3be0bf100806230943i69524869w654648f77633a750@mail.gmail.com>
	<48601C95.2080205@danbri.org>
	<dc1a17860806240452y98f7d23pd212cdd3f4c1d85a@mail.gmail.com>
Message-ID: <824b51d00806240513x36e9009cof64436ff7550c811@mail.gmail.com>

> <abbr class="dtstart data{2008-06-23}" title="June 23rd, 2008">Today</abbr>
>
> 1. grab elementByClassName( dtstart )
> 2. get classnames as array
> 3. grab classname after dtstart(ie, i+1, i being the index of
> dtstart), does it match /data{[^}]*}/ ?

It would be potentially dangerous to have assumptions of the ordering
of class names. Another level of unneeded complexity. This snippet
should also be parseable:

<abbr class="fancy data{2008-06-23} dtstart" title="June 23rd, 2008">Today</abbr>
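
A sketch of an order-independent lookup, for comparison. The function
name and regular expression are assumptions for illustration. It takes
the first data{...} token wherever it appears, so it handles the
snippet above, but it can only associate one data value with an
element.

```python
import re

# A class token of the form data{...}
DATA_TOKEN = re.compile(r"^data\{([^}]*)\}$")


def value_any_order(class_attr, prop):
    """Return the first data{...} value on an element carrying `prop`."""
    tokens = class_attr.split()
    if prop not in tokens:
        return None
    for token in tokens:
        match = DATA_TOKEN.match(token)
        if match:
            return match.group(1)
    return None


print(value_any_order("fancy data{2008-06-23} dtstart", "dtstart"))
```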

-- 
Cheers,
Eivind Uggedal
Engineer,
Faculty of Social Science,
MSc Computer Science,
University of Oslo
From andr3.pt at gmail.com  Tue Jun 24 05:24:07 2008
From: andr3.pt at gmail.com (André Luís)
Date: Tue Jun 24 05:24:10 2008
Subject: [uf-dev] Using class for non-human data
In-Reply-To: <824b51d00806240513x36e9009cof64436ff7550c811@mail.gmail.com>
References: <36A319113CF910438942741C4727ADFF020DF7E0@MOBY.Clarence.local>
	<OFE327A670.69C3153F-ON86257471.00578C00-86257471.0057B454@us.ibm.com>
	<3be0bf100806230943i69524869w654648f77633a750@mail.gmail.com>
	<48601C95.2080205@danbri.org>
	<dc1a17860806240452y98f7d23pd212cdd3f4c1d85a@mail.gmail.com>
	<824b51d00806240513x36e9009cof64436ff7550c811@mail.gmail.com>
Message-ID: <dc1a17860806240524w1989f1e2uc07f14055be996a6@mail.gmail.com>

On Tue, Jun 24, 2008 at 1:13 PM, Eivind Uggedal <eivindu@ifi.uio.no> wrote:
>> <abbr class="dtstart data{2008-06-23}" title="June 23rd, 2008">Today</abbr>
>>
>> 1. grab elementByClassName( dtstart )
>> 2. get classnames as array
>> 3. grab classname after dtstart(ie, i+1, i being the index of
>> dtstart), does it match /data{[^}]*}/ ?
>
> It would be potentially dangerous to have assumptions of the ordering
> of class names. Another level of unneeded complexity. This snippet
> should also be parseable:
>
> <abbr class="fancy data{2008-06-23} dtstart" title="June 23rd, 2008">Today</abbr>
>
> --

Eivind,

I understand that. I was trying to provide an example that allowed
multiple classnames + associated values within the same element.

I agree it's added complexity... if you never really need to add extra
classnames to the same element and specify their data values, it's
perfectly fine using whatever data{.*} you find (first?). :)

Oh, one thing I haven't seen mentioned is... this doesn't have to
_replace_ the abbr design pattern, does it? If parsers added this way to
parse values, authors with accessibility concerns could use this
instead, while avoiding breaking current deployments of the abbr DP.

--
André

From scott at randomchaos.com  Tue Jun 24 05:37:59 2008
From: scott at randomchaos.com (Scott Reynen)
Date: Tue Jun 24 05:38:11 2008
Subject: [uf-dev] Using class for non-human data
In-Reply-To: <dc1a17860806240452y98f7d23pd212cdd3f4c1d85a@mail.gmail.com>
References: <36A319113CF910438942741C4727ADFF020DF7E0@MOBY.Clarence.local>
	<OFE327A670.69C3153F-ON86257471.00578C00-86257471.0057B454@us.ibm.com>
	<3be0bf100806230943i69524869w654648f77633a750@mail.gmail.com>
	<48601C95.2080205@danbri.org>
	<dc1a17860806240452y98f7d23pd212cdd3f4c1d85a@mail.gmail.com>
Message-ID: <85F3290F-AA48-4A70-9FEB-00046724A2CB@randomchaos.com>

On Jun 24, at 5:52, André Luís wrote:

> <abbr class="dtstart data{2008-06-23}" title="June 23rd, 2008">Today</abbr>
>
> 1. grab elementByClassName( dtstart )
> 2. get classnames as array
> 3. grab classname after dtstart(ie, i+1, i being the index of
> dtstart), does it match /data{[^}]*}/ ?
> 4. if yes, use it as value.
>
> What's so wrong with this approach?


I'd say there's nothing "so" wrong about it, but there are problems.   
Specifically, "data{2008-06-23}" doesn't seem to be an actual  
classification of "Today."  It doesn't make much sense to say "Today  
belongs to the class data{2008-06-23}."  But the HTML spec says " the  
element may be said to belong to these classes."  Unfortunately we may  
not see the practical implications of such a seemingly insignificant  
deviation from the spec until after a decision is made, as happened  
with the abbr pattern.  Another seemingly small issue: this solution  
binds us to machine-readable data formats that have no spaces.  These  
may not be reasons to discard this solution, but I hope they're at  
least reasons to more thoroughly research potential problems so we  
don't make the same type of mistake again.

Peace,
Scott


From fberriman at gmail.com  Tue Jun 24 05:56:06 2008
From: fberriman at gmail.com (Frances Berriman)
Date: Tue Jun 24 05:56:21 2008
Subject: [uf-dev] Using class for non-human data
In-Reply-To: <dc1a17860806240524w1989f1e2uc07f14055be996a6@mail.gmail.com>
References: <36A319113CF910438942741C4727ADFF020DF7E0@MOBY.Clarence.local>
	<OFE327A670.69C3153F-ON86257471.00578C00-86257471.0057B454@us.ibm.com>
	<3be0bf100806230943i69524869w654648f77633a750@mail.gmail.com>
	<48601C95.2080205@danbri.org>
	<dc1a17860806240452y98f7d23pd212cdd3f4c1d85a@mail.gmail.com>
	<824b51d00806240513x36e9009cof64436ff7550c811@mail.gmail.com>
	<dc1a17860806240524w1989f1e2uc07f14055be996a6@mail.gmail.com>
Message-ID: <e86992a40806240556od212874i5728de350783ef14@mail.gmail.com>

On 24/06/2008, André Luís <andr3.pt@gmail.com> wrote:

>  Oh one thing I haven't seen mentioned is... this doesn't have to
>  _replace_ abbr design pattern, does it? If parsers added this way to
>  parse values, authors with accessibility concerns could use this
>  instead, while avoiding breaking current deployments of abbr DP.
>

No - I don't think it should replace it.  If an author feels the abbr
is the correct option, they should still be able to use that.


-- 
Frances Berriman
http://fberriman.com

From mail at tobyinkster.co.uk  Tue Jun 24 06:40:58 2008
From: mail at tobyinkster.co.uk (Toby A Inkster)
Date: Tue Jun 24 06:41:29 2008
Subject: [uf-dev] Using class for non-human data
Message-ID: <1EB717A2-C738-4D1C-A166-DFBEEF55CA61@tobyinkster.co.uk>

Scott Reynen wrote:

> Another seemingly small issue: this solution
> binds us to machine-readable data formats that have no spaces.

If you take a look at the Wiki section for this proposal, you'll see  
details of my experimental implementation of this pattern. It allows  
publishers to percent-encode characters such as spaces, which can't  
occur in class names.

For example:
<span class="country-name data-United%20Kingdom">UK</span>

(Though of course, in the example above, the <abbr> design pattern is  
perfectly accessible.)
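
As a rough illustration of the decoding side (this is not the
experimental implementation mentioned above, just a sketch with a
made-up function name), a parser could take every class token that
starts with "data-" and percent-decode the remainder using Python's
standard urllib.parse.unquote:

```python
from urllib.parse import unquote


def data_values(class_attr, prefix="data-"):
    """Percent-decode every class token that starts with `prefix`."""
    return [unquote(token[len(prefix):])
            for token in class_attr.split()
            if token.startswith(prefix)]


print(data_values("country-name data-United%20Kingdom"))
```

Values without any percent-escapes, such as the datetime form
data-20051010T10:10:10-0100, pass through unchanged.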

http://microformats.org/wiki/datetime-design-pattern#Machine- 
data_in_class

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>


From jaffathecake at gmail.com  Tue Jun 24 08:54:42 2008
From: jaffathecake at gmail.com (Jake Archibald)
Date: Tue Jun 24 08:54:46 2008
Subject: [uf-dev] Using class for non-human data
In-Reply-To: <85F3290F-AA48-4A70-9FEB-00046724A2CB@randomchaos.com>
References: <36A319113CF910438942741C4727ADFF020DF7E0@MOBY.Clarence.local>
	<OFE327A670.69C3153F-ON86257471.00578C00-86257471.0057B454@us.ibm.com>
	<3be0bf100806230943i69524869w654648f77633a750@mail.gmail.com>
	<48601C95.2080205@danbri.org>
	<dc1a17860806240452y98f7d23pd212cdd3f4c1d85a@mail.gmail.com>
	<85F3290F-AA48-4A70-9FEB-00046724A2CB@randomchaos.com>
Message-ID: <3be0bf100806240854x4eacb35epec1590b4771d8b2e@mail.gmail.com>

One possible issue with the data{blah} pattern, if you were to point
at that with a css selector, you'd need to escape the curly braces.

span.dtstart.data\{20080101\} { color:red; }

obviously the above wouldn't work at all in IE6, but you see what I'm
getting at. This wouldn't be an issue with data-blah.

On 6/24/08, Scott Reynen <scott@randomchaos.com> wrote:
> On Jun 24, at 5:52, André Luís wrote:
>
>> <abbr class="dtstart data{2008-06-23}" title="June 23rd, 2008">Today</abbr>
>>
>> 1. grab elementByClassName( dtstart )
>> 2. get classnames as array
>> 3. grab classname after dtstart(ie, i+1, i being the index of
>> dtstart), does it match /data{[^}]*}/ ?
>> 4. if yes, use it as value.
>>
>> What's so wrong with this approach?
>
>
> I'd say there's nothing "so" wrong about it, but there are problems.
> Specifically, "data{2008-06-23}" doesn't seem to be an actual
> classification of "Today."  It doesn't make much sense to say "Today
> belongs to the class data{2008-06-23}."  But the HTML spec says " the
> element may be said to belong to these classes."  Unfortunately we may
> not see the practical implications of such a seemingly insignificant
> deviation from the spec until after a decision is made, as happened
> with the abbr pattern.  Another seemingly small issue: this solution
> binds us to machine-readable data formats that have no spaces.  These
> may not be reasons to discard this solution, but I hope they're at
> least reasons to more thoroughly research potential problems so we
> don't make the same type of mistake again.
>
> Peace,
> Scott
>
>
> _______________________________________________
> microformats-dev mailing list
> microformats-dev@microformats.org
> http://microformats.org/mailman/listinfo/microformats-dev
>
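
For reference, the four quoted steps can be sketched roughly as below.
This is a minimal sketch that assumes the class attribute has already
been read from the element (step 1); the names DATA_CLASS and
value_after are hypothetical. Because class names are split on
whitespace, it also shows why a value containing spaces cannot be
carried this way.

```python
import re

# Step 3: the class name directly after the property must look like data{...}
DATA_CLASS = re.compile(r"data\{([^}]*)\}$")


def value_after(class_attr, prop):
    """Return the data{...} value that directly follows prop, or None."""
    names = class_attr.split()          # step 2: class names as a list
    if prop not in names:
        return None
    i = names.index(prop)
    if i + 1 >= len(names):             # nothing follows the property
        return None
    match = DATA_CLASS.match(names[i + 1])
    return match.group(1) if match else None   # step 4


print(value_after("dtstart data{2008-06-23}", "dtstart"))  # 2008-06-23
```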


From guillaume at lebleu.org  Wed Jun 25 13:13:47 2008
From: guillaume at lebleu.org (Guillaume Lebleu)
Date: Wed Jun 25 13:13:54 2008
Subject: [uf-dev] impact of new vCard on hCard
Message-ID: <4862A6FB.8060901@lebleu.org>

I noticed that the latest vCard draft specification [1] requires the 
content of the TEL property to be of type URI with tel scheme.

Operator supports this and hCard allows it, but if I understand 
correctly, that would mean that an hCard compliant with this new spec 
would require the phone number to always be represented with an HTML anchor:

<a class="tel" href="tel:+14154075856">+1 (415) 407 5856</a>

Thoughts? should I add this to hCard issues on the wiki?

Guillaume

---

[1] http://www.ietf.org/internet-drafts/draft-ietf-vcarddav-vcardrev-02.txt

Excerpt:

7.4.1.  TEL

  Purpose:  To specify the telephone number for telephony communication
     with the object the vCard represents.

  Value type:  A single URI value.  It is expected that the URI scheme
     will be "tel", as specified in [RFC3966], but other schemes MAY be
     used.
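
The relationship between the displayed number and the href in the anchor
example above can be sketched as follows. This is an illustration only:
it assumes the number is already written in international form with a
leading "+", and the function name to_tel_uri is hypothetical.

```python
import re


def to_tel_uri(display_number):
    """Turn a displayed international number into a tel: URI, or None."""
    # Drop visual separators such as spaces, brackets and hyphens.
    compact = re.sub(r"[^\d+]", "", display_number.strip())
    # Assumption: only global numbers ("+" followed by digits) are handled.
    if not re.fullmatch(r"\+\d+", compact):
        return None
    return "tel:" + compact


print(to_tel_uri("+1 (415) 407 5856"))  # tel:+14154075856
```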
From mail at tobyinkster.co.uk  Wed Jun 25 14:09:20 2008
From: mail at tobyinkster.co.uk (Toby A Inkster)
Date: Wed Jun 25 14:09:59 2008
Subject: [uf-dev] impact of new vCard on hCard
Message-ID: <9D659279-6A24-4A47-BA63-046A101A0D57@tobyinkster.co.uk>

Guillaume Lebleu wrote:

> I noticed that the latest vCard draft specification [1] requires the
> content of the TEL property to be of type URI with tel scheme.
>
> Operator supports this and hCard allows it, but if I understand
> correctly, that would mean that an hCard compliant with this new spec
> would require the phone number to always be represented with an  
> HTML anchor:
>
> <a class="tel" href="tel:+14154075856">+1 (415) 407 5856</a>
>
> Thoughts? should I add this to hCard issues on the wiki?

I'm not sure why this would be an issue.

Firstly, hCard normatively references version *3.0* of vCard. The  
draft spec is for vCard 4.0.

Secondly, just because hCard re-uses vCard's terms and ideas, it does  
not follow that hCard re-uses vCard's syntax. For example, address  
components in vCard need to be separated by semicolons -- but they do  
not need to be separated by semicolons in hCard. As another example,  
the "N" property in vCard is always presented in family name, given  
name, additional name, honorific prefix, honorific suffix order, but  
the sub-properties of "n" in hCard may be given in any order. A  
change in vCard syntax does not need to carry over to hCard, as they  
already have entirely different syntaxes.

One of the more interesting implications of a new version of vCard is
the new properties available. For example, people using hCard for
genealogy purposes may have been frustrated that although hCard offers
"bday" for marking up a date of birth, it does not offer a
corresponding property for date of death. Now that vCard has added a
DDAY property for marking up a contact's date of death, it is fairly
safe to say that if hCard ever does include a property for marking up
dates of death, it will almost certainly be called "dday".
Genealogists can start replacing their own custom
class="date-of-death", class="died", etc. markup with class="dday".

Since April, Cognition has included additional support for the  
following vCard 4.0 properties:

	- kind (e.g. "individual", "org", etc)
	- gender
	- birth (place of birth)
	- dday
	- death (place of death)
	- impp (like "url", but for instant messaging)
	- lang (preferred spoken/written languages)

This support is documented here:
http://buzzword.org.uk/cognition/uf-plus.html#hcard

I see that as of today, they've also added "related" and "member".  
The former will be especially interesting to fans of XFN. I'll look  
into implementing them in Cognition too.

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>


From fberriman at gmail.com  Thu Jun 26 09:17:50 2008
From: fberriman at gmail.com (Frances Berriman)
Date: Thu Jun 26 09:17:52 2008
Subject: [uf-dev] Re: Using class for non-human data
In-Reply-To: <e86992a40806200340p3056c616h46db9a5cc0011e63@mail.gmail.com>
References: <e86992a40806200340p3056c616h46db9a5cc0011e63@mail.gmail.com>
Message-ID: <e86992a40806260917y653d919fy1219658986125d95@mail.gmail.com>

On 20/06/2008, Frances Berriman <fberriman@gmail.com> wrote:
> Hey all,
>
>  Firstly, with my BBC hat on, I wanted to point out that our Standards
>  and Guidelines group have recently added a few additional clauses to
>  our semantic markup standards.  They are as follows (I don't think the
>  most recent document is available yet, but I'll certainly link through
>  to it when it's available):


As promised:

http://www.bbc.co.uk/guidelines/newmedia/technical/semantic_markup.shtml#microformats

-- 
Frances Berriman
http://fberriman.com
From glenn.jones at madgex.com  Sun Jun 29 07:17:17 2008
From: glenn.jones at madgex.com (Glenn Jones)
Date: Sun Jun 29 07:17:24 2008
Subject: [uf-dev] Human and machine readable data format
Message-ID: <36A319113CF910438942741C4727ADFF02132AFF@MOBY.Clarence.local>

As we turn around on the spot over the machine data issue, the question
of Natural Language Processing (NLP) has come up again. The main problem
with any form of NLP is that there are too many ambiguities in reading
dates or any other form of freeform human-written text. I don't want us
to go down this path; it is unworkable with currently available
technologies.

Against this we have statements like Tantek's. "I'm vehemently opposed
to putting data in the class attribute. We must find better
alternatives. We must not go down the path of invisible (dark)
(meta)data - IMHO that principle is inviolable for microformats."

So I have tried to look at this again and reconcile the two opposing
drivers above. Each time it makes me think of a mixed-mode human- and
machine-readable format: a date format which is human readable but has
a very strict structure which can be parsed. So rather than talk about
it, I have built a little prototype which demos the idea.

http://ufxtract.com/experimental/hm-readable-date.htm
    
This approach is not without its own problems, but it would provide a
semantic use of the abbr pattern which does not raise any accessibility
concerns. 

<abbr class="dtstart" title="Date: 25 January 2008 at 15:30, Time zone
+1:00">Jan 25 08</abbr>

On the downside, we would have to re-invent the wheel with yet another
date format. This approach would make parsers a lot heavier, and authors
would have to understand the strict nature of the extended format used
in the abbr title.

I thought I would put this forward - to get shot down ;-)


This concept could be extended to the other data formats:

Date: 25 January 2008
Date: 25 January 2008 at 15:30 
Date: 25 January 2008 at 15:30, Time zone  +1:30
Duration: 3 minutes, 47 seconds
Location:  latitude 37.77, longitude -122.41
Time zone: +1:30
Rated 1 out of 5 


Glenn Jones 


From danny.ayers at gmail.com  Sun Jun 29 10:35:56 2008
From: danny.ayers at gmail.com (Danny Ayers)
Date: Sun Jun 29 10:36:00 2008
Subject: [uf-dev] Human and machine readable data format
In-Reply-To: <36A319113CF910438942741C4727ADFF02132AFF@MOBY.Clarence.local>
References: <36A319113CF910438942741C4727ADFF02132AFF@MOBY.Clarence.local>
Message-ID: <1f2ed5cd0806291035t556ec1datcbbe2340d3244a97@mail.gmail.com>

2008/6/29 Glenn Jones <glenn.jones@madgex.com>:

> As we turn around on the spot over the machine data issue, the question
> of Natural Language Processing (NLP) has come up again. The main problem
> with any form of NLP is that there are too many ambiguities in reading
> dates or any other form of freeform human-written text. I don't want us
> to go down this path; it is unworkable with currently available
> technologies.


I'm sure others are more capable than I of giving good responses to your
date format suggestions.  But I find it interesting you should bring NLP up
over here. I'm afraid  I can't resist chipping in on that ;-)

So the basic scenario is presumably the producer(s) wish to convey
information to the consumer(s). [Either of which may be human or largely
automated systems]

* With an isolated Plain Old Semantic HTML document, the majority of the
information is encoded in human-readable text, enhanced with markup elements
(e.g. for emphasis).
* With HTML+HTTP, we get extra semantics through linking - even if it's just
pageA is somehow related to pageB
* With microformats there can be communication of machine-readable data
embedded in the HTML
- caveat: as generally found in the wild, interpretation of the message from
producer to consumer relies on them both having prior knowledge of the
conventions of microformats.org - effectively a registry of keywords (though
only discoverable with manual intervention - Google etc)
- however where @profile URIs are provided, the consumer can "follow their
nose" to these other resources to discover the semantics intended by the
producer,
* Other languages are available (notably RDF, in this context especially
RDFa and microformats used in concert with GRDDL) where there is, thanks to
the 'follow your nose' discovery of URIs/HTTP, a more direct route to
machine-interpretability

In all these cases, at the end of the chain (of authority) there will be a
human element - the folks that designed the super-duper furniture ontology
may have their own world view that differs from those of others in the
furniture trade. They may simply have got stuff wrong. Fortunately use of
URIs allows potentially conflicting statements (in data, as in Web
documents) to coexist, and it's up to the consumer to apply their own
judgement on what to trust (based on provenance etc).

Now in the case of NLP, consumer-side heuristics will be applied to extract
something from text which *may* correspond to the producer's intended
message. So now not only do you have issues of provenance/trust, there's
also the margin of error of the heuristics to be factored in.

Overall, this seems to be a situation with a range of communication
possibilities - from lo-fidelity tag soup markup up to generally unambiguous
hi-fidelity communication thanks to data expressed as microformats with
@profile URIs, or (more or less equivalently) using web data oriented
languages such as RDF.

Going back to the "extra semantics through linking" remark above, in
whichever of the above approaches the data is expressed and/or interpreted,
the value of that data can be significantly increased through using linked
data techniques. Yeah, I had to get that in.
http://en.wikipedia.org/wiki/Linked_Data

Bottom line is that the Web is a vastly broad church, and ideally we should
be maximising the benefit from all these approaches in as interoperable a
fashion as possible - something like the old "think global act local" slogan.

Cheers,
Danny.


-- 
http://dannyayers.com
~
http://blogs.talis.com/nodalities/this_weeks_semantic_web/
From norm at cackhanded.net  Mon Jun 30 01:58:09 2008
From: norm at cackhanded.net (Mark Norman Francis)
Date: Mon Jun 30 01:58:14 2008
Subject: [uf-dev] Human and machine readable data format
In-Reply-To: <36A319113CF910438942741C4727ADFF02132AFF@MOBY.Clarence.local>
References: <36A319113CF910438942741C4727ADFF02132AFF@MOBY.Clarence.local>
Message-ID: <DC4B02B4-7DE9-4B61-9312-2F3ED7A2ADAA@cackhanded.net>

On 29 Jun 2008, at 15:17, Glenn Jones wrote:
> <http://ufxtract.com/experimental/hm-readable-date.htm>


Glenn, in this page you state:

> MUST format must follow the pattern order i.e.in English  date,  
> month year, time, timezone

That is an internationalisation no-no. It's no longer "human
readable" if the language in question demands a different order and
you break it for the sake of easier machine parsing.

-- Norm.

From glenn.jones at madgex.com  Mon Jun 30 02:37:24 2008
From: glenn.jones at madgex.com (Glenn Jones)
Date: Mon Jun 30 02:37:29 2008
Subject: [uf-dev] Human and machine readable data format
In-Reply-To: <DC4B02B4-7DE9-4B61-9312-2F3ED7A2ADAA@cackhanded.net>
References: <36A319113CF910438942741C4727ADFF02132AFF@MOBY.Clarence.local>
	<DC4B02B4-7DE9-4B61-9312-2F3ED7A2ADAA@cackhanded.net>
Message-ID: <36A319113CF910438942741C4727ADFF02132BD8@MOBY.Clarence.local>

On 29 Jun 2008, Norm wrote:
>That is an internationalisation no-no. It's no longer "human
>readable" if the language in question demands a different order and
>you break it for the sake of easier machine parsing.

That's only an example for English. At the bottom of the page

http://ufxtract.com/experimental/hm-readable-date.htm

you will find the language descriptions which will be needed to
configure parsers. There is a pattern property which allows for
different orders. The Simplified Chinese date format has a different
order to the others.

The idea is that the parser reads the lang attribute on the abbr and
applies the correct language description. It will be a pain to build up
all the international descriptions needed, but it's the only way if we
wish to have human-readable dates that can be parsed by machines.

Glenn  



From danbri at danbri.org  Mon Jun 30 02:52:29 2008
From: danbri at danbri.org (Dan Brickley)
Date: Mon Jun 30 02:52:34 2008
Subject: [uf-dev] Human and machine readable data format
In-Reply-To: <36A319113CF910438942741C4727ADFF02132BD8@MOBY.Clarence.local>
References: <36A319113CF910438942741C4727ADFF02132AFF@MOBY.Clarence.local>	<DC4B02B4-7DE9-4B61-9312-2F3ED7A2ADAA@cackhanded.net>
	<36A319113CF910438942741C4727ADFF02132BD8@MOBY.Clarence.local>
Message-ID: <4868ACDD.6090102@danbri.org>

Glenn Jones wrote:
> On 29 Jun 2008, Norm wrote:
>> That is an internationalisation no-no. It's no longer "human
>> readable" if the language in question demands a different order and
>> you break it for the sake of easier machine parsing.
> 
> That's only an example for English. At the bottom of the page
> 
> http://ufxtract.com/experimental/hm-readable-date.htm
> 
> you will find the language descriptions which will be needed to
> configure parsers. There is a pattern property which allows for
> different orders. The Simplified Chinese date format has a different
> order to the others.
>
> The idea is that the parser reads the lang attribute on the abbr and
> applies the correct language description. It will be a pain to build up
> all the international descriptions needed, but it's the only way if we
> wish to have human-readable dates that can be parsed by machines.

That's a very interesting approach. But do you think this can reasonably 
extend to use of other calendars, rather than just other scripts / 
natural languages?

cheers,

Dan

--
http://danbri.org/
From norm at cackhanded.net  Mon Jun 30 03:04:32 2008
From: norm at cackhanded.net (Mark Norman Francis)
Date: Mon Jun 30 03:04:36 2008
Subject: [uf-dev] Human and machine readable data format
In-Reply-To: <36A319113CF910438942741C4727ADFF02132BD8@MOBY.Clarence.local>
References: <36A319113CF910438942741C4727ADFF02132AFF@MOBY.Clarence.local>
	<DC4B02B4-7DE9-4B61-9312-2F3ED7A2ADAA@cackhanded.net>
	<36A319113CF910438942741C4727ADFF02132BD8@MOBY.Clarence.local>
Message-ID: <5C71CA7E-7E22-493E-8C3D-AC0E23AA10A3@cackhanded.net>

> The idea is that the parser reads the lang attribute on the abbr and
> applies the correct language description. It will be a pain to build up
> all the international descriptions needed, but it's the only way if we
> wish to have human-readable dates that can be parsed by machines.


Ah, missed that. My apologies, then.

-- Norm.

From mdagn at spraci.com  Mon Jun 30 03:12:25 2008
From: mdagn at spraci.com (Michael MD)
Date: Mon Jun 30 03:12:28 2008
Subject: [uf-dev] Human and machine readable data format
References: <36A319113CF910438942741C4727ADFF02132AFF@MOBY.Clarence.local><DC4B02B4-7DE9-4B61-9312-2F3ED7A2ADAA@cackhanded.net>
	<36A319113CF910438942741C4727ADFF02132BD8@MOBY.Clarence.local>
Message-ID: <004801c8da99$cb631960$116bacca@COMCEN>

> The idea is that the parser reads the lang attribute on the abbr and
> applies the correct language description. It will be a pain to build up
> all the international descriptions needed, but it's the only way if we
> wish to have human-readable dates that can be parsed by machines.
>

and what do we do about people who write something like "25th January" when 
they really mean "25th January 2008" ?

I think we have opened a nasty can of worms here!

Some libraries for parsing dates will assume that it is this year.... which 
is VERY bad
... it should be rejected as being ambiguous.


From danbri at danbri.org  Mon Jun 30 03:24:04 2008
From: danbri at danbri.org (Dan Brickley)
Date: Mon Jun 30 03:24:08 2008
Subject: [uf-dev] Human and machine readable data format
In-Reply-To: <004801c8da99$cb631960$116bacca@COMCEN>
References: <36A319113CF910438942741C4727ADFF02132AFF@MOBY.Clarence.local><DC4B02B4-7DE9-4B61-9312-2F3ED7A2ADAA@cackhanded.net>	<36A319113CF910438942741C4727ADFF02132BD8@MOBY.Clarence.local>
	<004801c8da99$cb631960$116bacca@COMCEN>
Message-ID: <4868B444.8030102@danbri.org>

Michael MD wrote:
>> The idea is that the parser reads the lang attribute on the abbr and
>> applies the correct language description. It will be a pain to build up
>> all the international descriptions needed, but it's the only way if we
>> wish to have human-readable dates that can be parsed by machines.
>>
> 
> and what do we do about people who write something like "25th January" 
> when they really mean "25th January 2008" ?
> 
> I think we have opened a nasty can of worms here!
> 
> Some libraries for parsing dates will assume that it is this year.... 
> which is VERY bad
> ... it should be rejected as being ambiguous.

Similarly, times of day without specifying a reference timezone...

cheers,

Dan

--
http://danbri.org/
From glenn.jones at madgex.com  Mon Jun 30 03:47:51 2008
From: glenn.jones at madgex.com (Glenn Jones)
Date: Mon Jun 30 03:47:57 2008
Subject: [uf-dev] Human and machine readable data format
In-Reply-To: <004801c8da99$cb631960$116bacca@COMCEN>
References: <36A319113CF910438942741C4727ADFF02132AFF@MOBY.Clarence.local><DC4B02B4-7DE9-4B61-9312-2F3ED7A2ADAA@cackhanded.net><36A319113CF910438942741C4727ADFF02132BD8@MOBY.Clarence.local>
	<004801c8da99$cb631960$116bacca@COMCEN>
Message-ID: <36A319113CF910438942741C4727ADFF02132C71@MOBY.Clarence.local>

What I was suggesting is that the date in the title of the abbr be in a
fixed format, but also human readable. The text of the abbr tag could
be in any format the author wanted.

If a date in the title did not comply with the fixed format in any way,
it would be completely rejected by the parser.

The format I suggested, has enough data to not be ambiguous. 
Date: 25 January 2008
Date: 25 January 2008 at 15:30 
Date: 25 January 2008 at 15:30, Time zone +1:30

Under this pattern if someone created the following 
<abbr class="dtstart" title="Date: 25 January 2008">Jan 25 08</abbr>
<abbr class="dtstart" title="Date: 25 January 2008">Two weeks
Monday</abbr>
they would be converted into 2008-01-25

If they got the format wrong in the title attribute
<abbr class="dtstart" title="Date: 25th January 2008">Jan 25 08</abbr>
It would be rejected.

We would have to internationalise the scheme so that
<abbr lang="fr" class="dtstart" title="Date: 25 janvier 2008">Jan 25 08</abbr>
would also parse correctly.

What I am suggesting is exchanging the title attribute from ISO format
to a human readable format, not freeform text.
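
A strict parser for the title format described above might look like the
following minimal sketch. It covers only the English form shown in this
message and is not the code of the prototype; the names MONTHS,
TITLE_PATTERN and parse_title are hypothetical. Anything that deviates
from the fixed format, including "25th", is rejected rather than guessed
at.

```python
import re
from datetime import datetime

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Date is required; the time and then the time zone are optional.
TITLE_PATTERN = re.compile(
    r"^Date: (\d{1,2}) ([A-Za-z]+) (\d{4})"
    r"(?: at (\d{1,2}):(\d{2})"
    r"(?:, Time zone ([+-])(\d{1,2}):(\d{2}))?)?$"
)


def parse_title(title):
    """Convert a fixed-format title into an ISO 8601 string, or None."""
    # Collapse line wraps and repeated spaces inside the attribute value.
    match = TITLE_PATTERN.match(" ".join(title.split()))
    if not match:
        return None
    day, month, year, hh, mm, sign, tz_h, tz_m = match.groups()
    try:
        # Raises ValueError for unknown month names and impossible values.
        stamp = datetime(int(year), MONTHS.index(month) + 1, int(day),
                         int(hh or 0), int(mm or 0))
    except ValueError:
        return None
    iso = "%04d-%02d-%02d" % (stamp.year, stamp.month, stamp.day)
    if hh is not None:
        iso += "T%02d:%02d" % (stamp.hour, stamp.minute)
        if sign is not None:
            iso += "%s%02d:%s" % (sign, int(tz_h), tz_m)
    return iso


print(parse_title("Date: 25 January 2008"))    # 2008-01-25
print(parse_title("Date: 25th January 2008"))  # None
```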


Glenn



From gulopine at gamemusic.org  Mon Jun 30 07:28:27 2008
From: gulopine at gamemusic.org (Marty Alchin)
Date: Mon Jun 30 07:28:30 2008
Subject: [uf-dev] A sensible alternative for representing dates
Message-ID: <7e8d40920806300728k3456aaa1ubd86b8ffae7569d5@mail.gmail.com>

This is my first foray into the microformats community, so I apologize
if I'm missing some necessary past history on this topic. I'm sure
it's been discussed before, I know it's being discussed now, and I'd
just like to add another option to the discussion.

Also, yes I realize that by using the word "sensible" in the subject
of this email, I'm introducing a likelihood of wild tangents regarding
the subjectivity of such a word. I'll just try to stem it by saying
that yes, I do realize it's subjective, and it's my opinion that what
I'm proposing is sensible. Enough said.

Since the BBC announcement, I keep seeing discussions about how to
make the abbr's title attribute more accessible, and I keep wondering,
why are we so stuck on using abbr at all? I read the justification for
it, and it makes sense, but it's hardly the only way to go, so I'd
like to take a different approach and see what you make of it.

Many sites I've seen include daily archives, whether they be of
events, blog posts, new links, whatever. Pages including lists of such
events, or just the detail of a single event, will usually link to
that daily archive. The key is that that URL for the daily archive is
typically in just one of the two following formats:

* /2008/06/30/
* /2008/jun/30/

Call me crazy, but that looks as much like a machine-readable date
format as any I've ever seen. Better yet, the first form is completely
internationalized already, so it doesn't rely on NLP or anything. The
second form is also common, though, so it seems like an allowable
alternative (but maybe that's just because I use that form myself).
Links of this form could either use a class, as is currently done for
the abbr tag, or use something like rel="date", since it's fairly
similar to the rel-tag format. Also like rel-tag, it would look at the
*end* of the URL only. If someone had a link like
/weblog/2008/jun/30/, that would work just the same as
/corporate/public/events/2008/06/30/.
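
To make the "end of the URL only" rule concrete, here is a rough sketch
of how a parser might read the date. It is only an illustration under my
own assumptions: English three-letter month abbreviations, no query
string or fragment on the URL, and hypothetical names (MONTH_ABBR,
URL_TAIL, date_from_url).

```python
import re
from datetime import date

MONTH_ABBR = ["jan", "feb", "mar", "apr", "may", "jun",
              "jul", "aug", "sep", "oct", "nov", "dec"]

# Only the last three path segments matter; a trailing slash is optional.
URL_TAIL = re.compile(r"/(\d{4})/(\d{2}|[a-z]{3})/(\d{1,2})/?$")


def date_from_url(href):
    """Return an ISO date taken from the end of href, or None."""
    match = URL_TAIL.search(href.lower())
    if not match:
        return None
    year, month, day = match.groups()
    if month.isdigit():
        month_no = int(month)
    elif month in MONTH_ABBR:
        month_no = MONTH_ABBR.index(month) + 1
    else:
        return None
    try:
        return date(int(year), month_no, int(day)).isoformat()
    except ValueError:      # e.g. month 13 or day 31 in a 30-day month
        return None


print(date_from_url("/weblog/2008/jun/30/"))                  # 2008-06-30
print(date_from_url("/corporate/public/events/2008/06/30/"))  # 2008-06-30
```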

If microformat dates used URLs in links, rather than the titles of
abbreviations, the data would be just as visible as in the rel-tag
pattern, "machine data" wouldn't be presented to users as standard
content, the human-readable presentation could stay flexible (since
parsers could simply ignore it), and there would be the added benefit of
encouraging daily archives on those sites that might not currently
implement them. (And yes, I realize that the "benefit" of daily archives
is debatable. Please don't bother, since that debate is irrelevant to
this discussion.)

Of course, that still leaves the issue of time, but times are much
easier to parse automatically than dates, as data formats are far
fewer and much more recognizable. This is the one area of the link's
content that I'd suggest to be parsed, so that times become part of
the link. Essentially, there are two dominant formats for time:

* 13:23
* 1:23 pm

Given the need for internationalization, I'd suggest that a 24-hour
time be assumed, if no suffix is given. The "pm" we use in English is
common, but isn't necessarily so throughout the world, so it should be
an available alternative, but not the assumed standard. It should be
acceptable with or without periods, and perhaps it could even look
only at the first letter, so even just "a" or "p" could be allowable.
Of course, that means that anyone who publishes a 12-hour time without
a suffix will cause all of their events between noon and midnight to
be misinterpreted, but that could happen even without microformats.
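
Here is a rough sketch of those time rules, to show they are simple to
implement. The names TIME_PATTERN and parse_time are hypothetical, and
the exact set of accepted suffix spellings is my own assumption.

```python
import re

# Accepts "13:23", "1:23 pm", "1:23 p.m." and "1:00p"; only the first
# letter of the suffix is significant.
TIME_PATTERN = re.compile(
    r"^(\d{1,2}):(\d{2})\s*(?:([ap])\.?(?:m\.?)?)?$", re.IGNORECASE)


def parse_time(text):
    """Return the time as 24-hour HH:MM, or None if it does not parse."""
    match = TIME_PATTERN.match(text.strip())
    if not match:
        return None
    hour, minute = int(match.group(1)), int(match.group(2))
    suffix = (match.group(3) or "").lower()
    if minute > 59:
        return None
    if suffix:
        if not 1 <= hour <= 12:
            return None
        # 12 am is 00:xx and 12 pm is 12:xx.
        hour = hour % 12 + (12 if suffix == "p" else 0)
    elif hour > 23:
        return None
    return "%02d:%02d" % (hour, minute)


print(parse_time("1:23 pm"))  # 13:23
print(parse_time("2:00"))     # 02:00 (no suffix, so 24-hour is assumed)
```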

In summary, I'd like to offer a couple options for displaying dates,
though I'm not sure which one is "best". Feel free to discuss, comment
on, improve, or outright deny them. Consider some possible
representations of the following: June 30, 2008 at 1:00 pm.

* <a class="dtstart" href="/2008/06/30/">June 30, 2008 at 1:00 pm</a>
* <a class="dtend" href="/events/2008/jun/30">13:00</a>

To add a bit more flexibility, I think it might be better still to
include a new rel-date pattern, which would specify the date as a URL,
as well as a class which defines the entire date/time combination.
Such an approach would allow for a more usable markup structure,
such as:

<span class="dtstart"><a rel="date" href="/2008/jun/30/">June 30</a>,
from 1:00p</span> to <span class="dtend">2:00p</span>

If nested formats aren't allowed, I'll concede that, but I think
there's value in allowing the time to be separate from the link, since
the destination won't be tied to a particular time, but rather just
the day. Also, note that in the event that a dtend doesn't specify a
date, it should be assumed to be the same as dtstart, which must
always specify a date.

Also, before I get accused of not thinking about it, I don't know yet
how best to deal with time zones. On one hand, I think they
could be parsed as part of the time format above, but I don't know if
it's any more accessible to include "-0500" in the content, or if the
names or abbreviations of time zones are standardized enough
(especially internationally) to work well. It's also possible to store
just the time zone as an offset in some meta tag on the page itself,
providing a hint for microformat parsers on how to process times
within that page. I have no suggestion on that issue, but I do
acknowledge that it's left undefined at the moment. Suggestions are
welcome.

-Gul
From guillaume at lebleu.org  Mon Jun 30 07:37:37 2008
From: guillaume at lebleu.org (Guillaume Lebleu)
Date: Mon Jun 30 07:37:41 2008
Subject: [uf-dev] Human and machine readable data format
In-Reply-To: <36A319113CF910438942741C4727ADFF02132C71@MOBY.Clarence.local>
References: <36A319113CF910438942741C4727ADFF02132AFF@MOBY.Clarence.local><DC4B02B4-7DE9-4B61-9312-2F3ED7A2ADAA@cackhanded.net><36A319113CF910438942741C4727ADFF02132BD8@MOBY.Clarence.local>	<004801c8da99$cb631960$116bacca@COMCEN>
	<36A319113CF910438942741C4727ADFF02132C71@MOBY.Clarence.local>
Message-ID: <4868EFB1.7010807@lebleu.org>

Glenn Jones wrote:
> What I was suggesting is that the date in the title of the abbr be in a
> fixed format, but also human readable. The text of the abbr tag could
> be in any format the author wanted.
>
> If a date in the title did not comply with the fixed format in any way,
> it would be completely rejected by the parser.
>
> The format I suggested, has enough data to not be ambiguous. 
> Date: 25 January 2008
> Date: 25 January 2008 at 15:30 
> Date: 25 January 2008 at 15:30, Time zone +1:30
That looks to me like a possible good compromise. A couple questions:

    * What is the purpose of "Date:"? Couldn't this be moved to the
      class attribute, or in the hCalendar context be inferred from
      class="dtstart"?
    * What do you think of my earlier suggestion to base the human and
      machine-readable on official writing practices in each locale (ex.
      in en-us: "January 25, 2008")
    * What do you think of the idea of making title optional if the
      date/time is already written in official writing practices in each
      locale.

Guillaume
From jaffathecake at gmail.com  Mon Jun 30 09:29:57 2008
From: jaffathecake at gmail.com (Jake Archibald)
Date: Mon Jun 30 09:30:00 2008
Subject: [uf-dev] A sensible alternative for representing dates
In-Reply-To: <7e8d40920806300728k3456aaa1ubd86b8ffae7569d5@mail.gmail.com>
References: <7e8d40920806300728k3456aaa1ubd86b8ffae7569d5@mail.gmail.com>
Message-ID: <3be0bf100806300929y57b8bd5cy4c9dcfd9a64a663c@mail.gmail.com>

2008/6/30 Marty Alchin <gulopine@gamemusic.org>:


> If microformat dates used URLs in links, rather than the titles of
> abbreviations, the data would be just as visible as the rel-tag
> pattern, wouldn't have "machine data" presented to users as standard
> content, allows for flexible human-readable presentation (since it
> could just be ignored)
> * <a class="dtstart" href="/2008/06/30/">June 30, 2008 at 1:00 pm</a>
> * <a class="dtend" href="/events/2008/jun/30">13:00</a>
>

The problem with this is it requires an anchor, and requires the author to
build a meaningful page at that address.

Jake.
From glenn.jones at madgex.com  Mon Jun 30 21:59:13 2008
From: glenn.jones at madgex.com (Glenn Jones)
Date: Mon Jun 30 21:59:21 2008
Subject: [uf-dev] Human and machine readable data format
In-Reply-To: <4868EFB1.7010807@lebleu.org>
References: <36A319113CF910438942741C4727ADFF02132AFF@MOBY.Clarence.local><DC4B02B4-7DE9-4B61-9312-2F3ED7A2ADAA@cackhanded.net><36A319113CF910438942741C4727ADFF02132BD8@MOBY.Clarence.local>	<004801c8da99$cb631960$116bacca@COMCEN><36A319113CF910438942741C4727ADFF02132C71@MOBY.Clarence.local>
	<4868EFB1.7010807@lebleu.org>
Message-ID: <36A319113CF910438942741C4727ADFF02132EAD@MOBY.Clarence.local>

Guillaume Lebleu wrote

>    * What is the purpose of "Date:"? Couldn't this be moved to the
>      class attribute, or in the hCalendar context be inferred from
>      class="dtstart"?

I added a prefix that describes the data type, in this case "Date:", to
help parser developers test for the format using a string startsWith
function. Whatever solution we come up with, ISO dates will most likely
be kept for backwards compatibility. It may be possible to drop the
prefix. The ISO duration is a hard format to test for without a prefix.
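
As a rough illustration of the prefix test described above, here is a
minimal Python sketch. The helper names has_type_prefix and
strip_type_prefix are hypothetical and not taken from any existing
parser; only the "Date:" prefix comes from this thread.

```python
# Hypothetical helpers: detect and remove a data-type prefix on a title value.
DATE_PREFIX = "Date:"  # the prefix proposed in this thread


def has_type_prefix(title: str, prefix: str = DATE_PREFIX) -> bool:
    """Return True if the title value starts with the data-type prefix."""
    return title.lstrip().startswith(prefix)


def strip_type_prefix(title: str, prefix: str = DATE_PREFIX) -> str:
    """Return the title value without the prefix (unchanged if absent)."""
    text = title.strip()
    if text.startswith(prefix):
        return text[len(prefix):].strip()
    return text
```

A parser could call has_type_prefix first and fall back to its existing
ISO date handling when it returns False.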

>    * What do you think of my earlier suggestion to base the human and
>      machine-readable on official writing practices in each locale
>      (ex. in en-us: "January 25, 2008")

The language description format that I suggested is flexible enough to
allow for language and culture/locale differences.

For example, we could use a British format, "25 January 2008":
{
"language-name": "English",
"language-codes": ["en-gb"],
"pattern": "date,month,year,time,timezone",
"scrub-terms": ["Date:", "at", ",", "Time zone"],
"month-names": ["January", "February", "March", "April", "May", "June",
"July", "August", "September", "October", "November", "December"]
}

Or we could use a US format, "January 25, 2008":
{
"language-name": "English",
"language-codes": ["en-us"],
"pattern": "month,date,year,time,timezone",
"scrub-terms": ["Date:", "at", ",", "Time zone"],
"month-names": ["January", "February", "March", "April", "May", "June",
"July", "August", "September", "October", "November", "December"]
}

The pattern property allows for a reordering of the elements. Working
out the fallback to just the language code "en" would be fun.
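
To make the idea concrete, here is a minimal Python sketch of how a
parser might apply such a language description. It is an assumption
about how the pieces could fit together, not an existing
implementation: the function names, the whole-word scrubbing rule, the
requirement that the first three pattern fields are always present, and
the ISO-style output string are all choices made for illustration.

```python
import re
from datetime import date, time

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]
SCRUB = ["Date:", "at", ",", "Time zone"]

# The two descriptions from this message, trimmed to the keys used below.
EN_GB = {"pattern": "date,month,year,time,timezone",
         "scrub-terms": SCRUB, "month-names": MONTHS}
EN_US = {"pattern": "month,date,year,time,timezone",
         "scrub-terms": SCRUB, "month-names": MONTHS}


def scrub(text, terms):
    """Replace each scrub term with a space, longest term first."""
    for term in sorted(terms, key=len, reverse=True):
        # Guard word-like ends so "at" never matches inside another word.
        left = r"(?<!\w)" if term[0].isalnum() else ""
        right = r"(?!\w)" if term[-1].isalnum() else ""
        text = re.sub(left + re.escape(term) + right, " ", text)
    return text


def parse_human_date(text, desc):
    """Return an ISO-style string, or None if the text does not comply."""
    tokens = scrub(text, desc["scrub-terms"]).split()
    fields = desc["pattern"].split(",")
    # Assumption: date, month and year are required; time and timezone optional.
    if not 3 <= len(tokens) <= len(fields):
        return None
    parts = dict(zip(fields, tokens))
    try:
        day = int(parts["date"])
        month = desc["month-names"].index(parts["month"]) + 1
        result = date(int(parts["year"]), month, day).isoformat()
        if "time" in parts:
            hour, minute = (int(p) for p in parts["time"].split(":"))
            result += "T" + time(hour, minute).strftime("%H:%M")
        if "timezone" in parts:
            sign, offset = parts["timezone"][0], parts["timezone"][1:]
            if sign not in "+-":
                raise ValueError("timezone needs a leading sign")
            tz_hour, tz_minute = (int(p) for p in offset.split(":"))
            result += f"{sign}{tz_hour:02d}:{tz_minute:02d}"
    except (KeyError, ValueError):
        return None  # reject anything that does not fit the fixed format
    return result
```

With these descriptions, "Date: 25 January 2008" parsed against EN_GB
and "January 25, 2008" parsed against EN_US both give "2008-01-25",
while the US ordering parsed against EN_GB is rejected outright.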

>    * What do you think of the idea of making title optional if the
>      date/time is already written in official writing practices in
>      each locale.

That rule already exists as part of how the parsers work today.

<span class="dtstart">2008-01-25</span>

The above is valid; this would naturally be extended to any new format.

Glenn