From mkaply at us.ibm.com  Thu May  1 10:15:17 2008
From: mkaply at us.ibm.com (Michael Kaply)
Date: Thu May  1 10:26:00 2008
Subject: [uf-dev] Include-Pattern Infinite Loop Test Cases
In-Reply-To: <1005d65f0804291257x35022f49vdf96a4499796bfc7@mail.gmail.com>
Message-ID: <OF2FCCB0C2.ADBDC102-ON8625743C.005E99CF-8625743C.005EC889@us.ibm.com>

Skipped content of type multipart/alternative
From zack.carter at gmail.com  Thu May  1 22:08:12 2008
From: zack.carter at gmail.com (Zachary Carter)
Date: Thu May  1 22:08:16 2008
Subject: [uf-dev] Preventing false positives
Message-ID: <f56dbfbd0805012208j2f921b42kdc300ff074f490a@mail.gmail.com>

I was trying to write a monkeyformat[1] for Facebook, but there are
many false positives within profiles. So I have two questions: 1) Is
there a way to exclude an entire element and its descendants from
being parsed? 2) Is there a way to have the parser ignore all class
names on an element (as if the class names were removed from the
element prior to parsing)?

Thanks.

[1]http://userscripts.org/scripts/search?q=monkeyformats

-- 
Zach Carter
http://zachcarter.info
From mail at tobyinkster.co.uk  Fri May  2 01:25:10 2008
From: mail at tobyinkster.co.uk (Toby A Inkster)
Date: Fri May  2 01:25:18 2008
Subject: [uf-dev] Preventing false positives
Message-ID: <AE8C8FB2-0A57-49CD-AFC6-9348E22E0D07@tobyinkster.co.uk>

Zachary Carter wrote:

> So I have two questions: 1) is
> there a way to ignore an entire element and its descendants from being
> parsed?

Not that I know of. I suppose that putting the content into an IFRAME  
instead of on the main page ought to do it, but it's an ugly  
solution; and because it's not an officially sanctioned method for  
hiding content from parsers, you have no guarantee that future  
parsers will not start parsing within IFRAMEs.

> 2) Is there a way to have the parser ignore all class names on
> an element? (as if the class names were removed from the element prior
> to parsing)

The MFO effort <http://microformats.org/wiki/mfo> is an attempt to do  
something like this. The list of parsers that actually support MFO is  
pretty short though.

Cognition <http://buzzword.org.uk/cognition/> does support MFO. I  
mention this because the technique it uses is close to what you  
describe. When it parses a microformat, it takes a *clone* of the  
element and its children (so as not to damage the original DOM tree),  
then tries to parse embedded microformats -- e.g. "adr", "geo" and  
"agent vcard" within a "vcard".

I'll break off the parsing procedure here for a little terminology: I
make a distinction between "embedded microformats", which are those
that imply a special meaning by being nested within each other, and
"nested microformats", which are those that are nested within each
other by mere coincidence, or perhaps to convey some kind of
undefined relationship between the objects (e.g. an hCard could be
nested within a geo -- perhaps the author meant to convey that the
person represented by the hCard lives at that location, but this type
of nesting is not defined in the specs).

Anyway, after parsing *embedded* microformats, Cognition searches for  
*nested* microformats. It uses a list of all known root element  
classes (e.g. "hatom", "hresume", "hlisting", "vcalendar") --  
including the class names for microformats which Cognition does not  
yet support. It also includes the class name "mfo".

Now, if it finds any of these nested microformats, it reaches within
them and tampers with every descendant element, setting the "rel",
"rev" and "class" attributes to the empty string. Remember that this
is on a clone of the DOM. Thus these elements will be excluded from
supplying any unintentional semantics to the outer microformat.

Let's look at an example:

	<div class="vcard">
	  <h1 class="fn n">
	    <span class="honorific-prefix">Dr.</span>
	    <span class="given-name">Marvin</span>
	    <span class="family-name">Candle</span>
	  </h1>
	  <p class="note">
	    <span class="mfo">
	      Worked for a company called
	      <b class="vcard">
	        <span class="fn org">The Hanzo Foundation</span>
	      </b>.
	    </span>
	  </p>
	</div>

Now, when we come to parse the outer hCard, the clone is reduced to  
the following using MFO:

	<div class="vcard">
	  <h1 class="fn n">
	    <span class="honorific-prefix">Dr.</span>
	    <span class="given-name">Marvin</span>
	    <span class="family-name">Candle</span>
	  </h1>
	  <p class="note">
	    <span class="mfo">
	      Worked for a company called
	      <b>
	        <span>The Hanzo Foundation</span>
	      </b>.
	    </span>
	  </p>
	</div>

And the following vCard may be produced:

BEGIN:VCARD
FN:Dr. Marvin Candle
N:Candle;Marvin;;Dr.
NOTE:Worked for a company called The Hanzo Foundation.
END:VCARD

Note that the full text of the note is included, but there is no  
"ORG" property in the vCard.

As it happens, because "vcard" is included in that big list of known  
microformats (remember? "hatom", "hresume", "hlisting",  
"vcalendar"...), the same effect would have happened even if we  
hadn't included <span class="mfo"> -- but the MFO class is still  
useful because new microformats could arise at some point in the  
future which are not on that list.

It is also worth noting that while this MFO step masks the properties  
of the inner hCard from the outer hCard, the inner hCard will still  
be parsed as a later step, resulting in a second vCard:

BEGIN:VCARD
FN:The Hanzo Foundation
ORG:The Hanzo Foundation
END:VCARD
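
The masking step described in this message can be sketched in Python.
This is only an illustration under stated assumptions, not Cognition's
actual code: the function name masked_clone and the shortened
KNOWN_ROOTS list are hypothetical, the input is assumed to be
well-formed XHTML, and the earlier pass that handles embedded
microformats ("adr", "geo", "agent vcard") is left out.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical, shortened list of root class names (plus "mfo").
KNOWN_ROOTS = {"vcard", "vcalendar", "hatom", "hresume", "hlisting", "mfo"}
MASKED_ATTRS = ("class", "rel", "rev")


def masked_clone(root):
    """Return a copy of `root` with nested microformats masked out."""
    clone = copy.deepcopy(root)  # the original tree is never modified
    # iter() yields the element itself first, so [1:] is its descendants.
    for element in list(clone.iter())[1:]:
        if set(element.get("class", "").split()) & KNOWN_ROOTS:
            for descendant in list(element.iter())[1:]:
                for attr in MASKED_ATTRS:
                    if attr in descendant.attrib:
                        descendant.set(attr, "")
    return clone


SOURCE = """<div class="vcard">
  <h1 class="fn n">Dr. Marvin Candle</h1>
  <p class="note"><span class="mfo">Worked for a company called
    <b class="vcard"><span class="fn org">The Hanzo Foundation</span></b>.
  </span></p>
</div>"""

original = ET.fromstring(SOURCE)
clone = masked_clone(original)
print(ET.tostring(clone, encoding="unicode"))
```

In this sketch the masked attributes are left in place with an empty
value (class="") rather than removed, which matches the "set to the
empty string" wording above. A parser would read properties for the
outer hCard from the clone and then parse the inner hCard separately
from the untouched original.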

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>


From zack.carter at gmail.com  Fri May  2 15:47:07 2008
From: zack.carter at gmail.com (Zachary Carter)
Date: Fri May  2 15:47:10 2008
Subject: [uf-dev] Preventing false positives
In-Reply-To: <AE8C8FB2-0A57-49CD-AFC6-9348E22E0D07@tobyinkster.co.uk>
References: <AE8C8FB2-0A57-49CD-AFC6-9348E22E0D07@tobyinkster.co.uk>
Message-ID: <f56dbfbd0805021547x4d6674f1pab4976f4dd12f33f@mail.gmail.com>

The mfo approach is interesting, and is probably the ideal type of
approach. For my first question, the handling would be identical to
Cognition's mfo handling, except that the classes would not be added
back at a later step. For the second question, where the descendants
are still parsed as belonging to the scope of the uF but the element
itself is not, it would remove class/rev/rel from just the element it
is placed on.

To help elaborate my situation:

       <div class="vcard">
         <h1 class="fn n">
           <span class="honorific-prefix">Dr.</span>
           <span class="given-name">Marvin</span>
           <span class="family-name">Candle</span>
         </h1>
         <p>
           <span class="label">Website:</span>
           <a href="http://example.org" class="url">http://example.org</a>
         </p>
         <h2 class="title">Applications</h2>
         <p class="applications">
           [... third party content ...]
         </p>
       </div>

Title and label classes are not being used as hcard properties, so I
would want to exclude them. The third party application area I would
want to ignore completely (placing it in an iframe would likely break
lots of functionality.) Are there any plans (or should there be) to
support something like this?

Alternatively, is it possible to assign content distributed on the
page as belonging to a single microformat?


-- 
Zach Carter
http://zachcarter.info
From mail at tobyinkster.co.uk  Sat May  3 02:15:04 2008
From: mail at tobyinkster.co.uk (Toby A Inkster)
Date: Sat May  3 02:15:14 2008
Subject: [uf-dev] Preventing false positives
Message-ID: <C28BF658-935C-4683-A5BD-ED6413FB759C@tobyinkster.co.uk>

Zachary Carter wrote:

> To help elaborate my situation:
>
>        <div class="vcard">
>          <h1 class="fn n">
>            <span class="honorific-prefix">Dr.</span>
>            <span class="given-name">Marvin</span>
>            <span class="family-name">Candle</span>
>          </h1>
>          <p>
>            <span class="label">Website:</span> <a
> href="http://example.org" class="url">http://example.org</a>
>          </p>
>          <h2 class="title">Applications</h2>
>          <p class="applications">
>            [... third party content ...]
>          </p>
>        </div>
>
> Title and label classes are not being used as hcard properties, so I
> would want to exclude them.

Well, TITLE is a singular property, so it should be easy to force  
microformat parsers to ignore your title class -- simply include a  
blank title:

	<span class="title" style="display:none"></span>

earlier on in the vCard (before your <h2> element). Parsers should  
just pick up the first title and ignore the later one. LABEL is a  
plural property, so this approach will not work for that.

MFO as implemented by Cognition (and I emphasise that the MFO effort  
is still in the brainstorming stage, so the final MFO spec, if any,  
may be completely different) can be used to provide a solution for  
both the TITLE and LABEL:

    <div class="vcard">
      <h1 class="fn n">
        <span class="honorific-prefix">Dr.</span>
        <span class="given-name">Marvin</span>
        <span class="family-name">Candle</span>
      </h1>
      <p>
        <span class="mfo"><span class="label">Website:</span></span>
        <a href="http://example.org" class="url">http://example.org</a>
      </p>
      <div class="mfo">
        <h2 class="title">Applications</h2>
        <p class="applications">
          [... third party content ...]
        </p>
      </div>
    </div>

Of course the most obvious solution is simply:

    <div class="vcard">
      <h1 class="fn n">
        <span class="honorific-prefix">Dr.</span>
        <span class="given-name">Marvin</span>
        <span class="family-name">Candle</span>
      </h1>
      <a href="http://example.org" class="url" style="display:none"></a>
    </div>
    <p>
      <span class="label">Website:</span>
      <a href="http://example.org">http://example.org</a>
    </p>
    <h2 class="title">Applications</h2>
    <p class="applications">
      [... third party content ...]
    </p>

Which should work in all present-day parsers.

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>


From mail at tobyinkster.co.uk  Sat May  3 02:45:23 2008
From: mail at tobyinkster.co.uk (Toby A Inkster)
Date: Sat May  3 02:45:28 2008
Subject: [uf-dev] Preventing false positives
In-Reply-To: <C28BF658-935C-4683-A5BD-ED6413FB759C@tobyinkster.co.uk>
References: <C28BF658-935C-4683-A5BD-ED6413FB759C@tobyinkster.co.uk>
Message-ID: <5128F867-8393-4E50-AF9E-0A3BA5815ECF@tobyinkster.co.uk>

On 3 May 2008, at 10:15, Toby A Inkster wrote:

> Well, TITLE is a singular property


Wrong, I was. TITLE is plural. But singular properties do exist (e.g.
'fn', 'class', 'bday'), so the technique outlined may still be of some
use for those.
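
Why the blank-first-occurrence trick can only work for singular
properties may be easier to see in code. The following Python sketch is
an assumption about how a parser might collect values, not any real
parser's implementation: the function name collect_properties is
hypothetical, and SINGULAR holds only the three properties named above.

```python
# Only the singular properties named in this thread; the hCard spec lists more.
SINGULAR = {"fn", "class", "bday"}


def collect_properties(pairs):
    """Collect (name, value) pairs given in document order.

    Singular properties keep the first value seen and ignore the rest;
    plural properties accumulate every value in a list.
    """
    result = {}
    for name, value in pairs:
        if name in SINGULAR:
            result.setdefault(name, value)  # later occurrences are dropped
        else:
            result.setdefault(name, []).append(value)
    return result


found = collect_properties([
    ("bday", ""),              # hidden blank element placed first
    ("bday", "not a date"),    # unintended use of the class name
    ("title", ""),
    ("title", "Applications"),
])
print(found)
```

The blank "bday" shadows the unintended one, while the blank "title"
does not stop "Applications" from being collected, because plural
properties keep every value.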

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>


From danbri at danbri.org  Sat May  3 07:13:43 2008
From: danbri at danbri.org (Dan Brickley)
Date: Sat May  3 07:13:43 2008
Subject: [uf-dev] Preventing false positives
In-Reply-To: <5128F867-8393-4E50-AF9E-0A3BA5815ECF@tobyinkster.co.uk>
References: <C28BF658-935C-4683-A5BD-ED6413FB759C@tobyinkster.co.uk>
	<5128F867-8393-4E50-AF9E-0A3BA5815ECF@tobyinkster.co.uk>
Message-ID: <481C7317.7040307@danbri.org>

Toby A Inkster wrote:
> On 3 May 2008, at 10:15, Toby A Inkster wrote:
> 
>> Well, TITLE is a singular property
> 
> 
> Wrong, I was. TITLE is plural. But singular properties do exist (e.g. 
> 'fn', 'class', 'bday'), so the technique outlined may still be of some 
> use for those.

Is 'singular property' accepted Microformat-community terminology? (or 
just an obvious/sensible phrase). Is there any machine-readable 
representation of which microformat properties are singular?

It seems roughly what RDF/OWL calls 'functional' property (eg. in FOAF, 
'birthday','gender', 'primaryTopic' are functional).

Is there a microformat word for the inverse of this concept: properties 
that have at most one proper value, for anything they apply to? In FOAF, 
examples of this (we call it an "Inverse Functional Property") include 
"homepage", "weblog", "openid", "tipjar", "jabberID", "mbox_sha1sum"...

cheers,

Dan

--
http://danbri.org/

From rff.rff at gmail.com  Sat May  3 09:26:21 2008
From: rff.rff at gmail.com (gabriele renzi)
Date: Sat May  3 09:26:23 2008
Subject: [uf-dev] Preventing false positives
In-Reply-To: <481C7317.7040307@danbri.org>
References: <C28BF658-935C-4683-A5BD-ED6413FB759C@tobyinkster.co.uk>
	<5128F867-8393-4E50-AF9E-0A3BA5815ECF@tobyinkster.co.uk>
	<481C7317.7040307@danbri.org>
Message-ID: <828083e70805030926v530ce142t2919ff911fe1ecc6@mail.gmail.com>

On Sat, May 3, 2008 at 3:13 PM, Dan Brickley <danbri@danbri.org> wrote:

>  Is 'singular property' accepted Microformat-community terminology? (or just
> an obvious/sensible phrase). Is there any machine-readable representation of
> which microformat properties are singular?

Sorry to hijack the thread, but along the same lines: has anybody
thought of a simple, generic machine-readable description of
microformats? A simple mix of CSS/XPath/regex, for example:

hcard:       .vcard, *
             # creates a namespace; many allowed
  full_name: .vcard .fn/text(), 1
             # add full name to this namespace; exactly one
  email:     a.email/href or area.email/href or .email/text(), ?
             # add email to this namespace, checking various choices;
             # zero or one


I'm writing a generic parser and it basically has this kind of
structure (i.e. fn = getRequired(root, '.fn', 'text()')). Is there a
clear problem with this that I'm not seeing?
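
A schema-driven parser of the kind described here could look roughly
like the Python sketch below. Everything in it is an assumption made
for illustration: the HCARD dictionary layout, the cardinality symbols
("1", "?", "*") and the names parse_item and extract are hypothetical,
the input must be well-formed XHTML, and nested microformats and
"mailto:" stripping are ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: class name, ordered (tag, source) choices, cardinality.
# Cardinality: "1" = exactly one, "?" = zero or one, "*" = any number.
HCARD = {
    "root": "vcard",
    "properties": {
        "full_name": {"class": "fn", "sources": [("*", "text")], "count": "1"},
        "email": {
            "class": "email",
            "sources": [("a", "href"), ("area", "href"), ("*", "text")],
            "count": "?",
        },
    },
}


def has_class(element, name):
    return name in element.get("class", "").split()


def extract(element, sources):
    """Try each (tag, source) choice in order; return the first value found."""
    for tag, source in sources:
        if tag not in ("*", element.tag):
            continue
        if source == "text":
            return "".join(element.itertext()).strip()
        value = element.get(source)
        if value is not None:
            return value
    return None


def parse_item(root, schema):
    """Apply every property rule of `schema` to one root element."""
    if not has_class(root, schema["root"]):
        raise ValueError("element is not a %s root" % schema["root"])
    result = {}
    for name, rule in schema["properties"].items():
        values = []
        for element in root.iter():
            if has_class(element, rule["class"]):
                value = extract(element, rule["sources"])
                if value is not None:
                    values.append(value)
        if rule["count"] == "*":
            result[name] = values
        elif rule["count"] == "1" and len(values) != 1:
            raise ValueError("%s must occur exactly once" % name)
        else:
            # "1" with one value, or "?": keep the first value, if any.
            result[name] = values[0] if values else None
    return result


card = ET.fromstring(
    '<div class="vcard"><span class="fn">Marvin Candle</span> '
    '<a class="email" href="mailto:marvin@example.org">mail me</a></div>'
)
print(parse_item(card, HCARD))
```

The ordered "sources" list is what captures the case where .email
falls back to the text value of a node: the text choice is tried only
when the element is not an a or area with an href.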

It would be a small improvement on the semiformal descriptions on the
wiki, where information is a bit scattered. For example, there is an
hCard test for when .email is to be taken from the text value of a
node, but I could not find it explained on the hCard parsing page, and
it seems that this happened to other people[1].

Please excuse me if I sound dumb and talk about already discussed
things, but I'm still new to uFs.


[1]
http://www.w3.org/2006/vcard/hcard2rdf.xsl seems to miss it, for one
From zack.carter at gmail.com  Sat May  3 11:25:41 2008
From: zack.carter at gmail.com (Zachary Carter)
Date: Sat May  3 11:25:43 2008
Subject: [uf-dev] Preventing false positives
In-Reply-To: <828083e70805030926v530ce142t2919ff911fe1ecc6@mail.gmail.com>
References: <C28BF658-935C-4683-A5BD-ED6413FB759C@tobyinkster.co.uk>
	<5128F867-8393-4E50-AF9E-0A3BA5815ECF@tobyinkster.co.uk>
	<481C7317.7040307@danbri.org>
	<828083e70805030926v530ce142t2919ff911fe1ecc6@mail.gmail.com>
Message-ID: <f56dbfbd0805031125p382f8831w6be5cb38d403c579@mail.gmail.com>

There was discussion about a JSON representation of hCard
(http://microformats.org/wiki/jcard). I think that's what you're
looking for.


-- 
Zach Carter
http://zachcarter.info
From mail at tobyinkster.co.uk  Sat May  3 13:50:24 2008
From: mail at tobyinkster.co.uk (Toby A Inkster)
Date: Sat May  3 13:50:47 2008
Subject: [uf-dev] Preventing false positives
Message-ID: <E14D25EA-E9B1-4039-8A95-D2B918AB67B4@tobyinkster.co.uk>

Dan Brickley wrote:

> Is 'singular property' accepted Microformat-community terminology? (or
> just an obvious/sensible phrase). Is there any machine-readable
> representation of which microformat properties are singular?

The terms used in the hCard spec are "singular property" and "plural  
property". A list of the singular properties for hCard can be found at:

http://microformats.org/wiki/hcard#Singular_vs._Plural_Properties

The hCard spec is rather casual about such matters -- it's just
described in prose. Some of the newer microformats include property
lists marked up as nested unordered HTML lists with Perl-like
quantifiers, such as "{1}" = must occur exactly once; "*" = optional,
may occur more than once; "?" = optional, may only occur once; etc.
These could theoretically be parsed mechanically, but that wouldn't
be enough to fully automate supporting new microformats, as there's
still the matter of content models (e.g. should something be parsed
as a link, or as a string).

> Is there a microformat word for the inverse of this concept: properties
> that have at most one proper value, for anything they apply to? In FOAF,
> examples of this (we call it an "Inverse Functional Property") include
> "homepage", "weblog", "openid", "tipjar", "jabberID", "mbox_sha1sum"...


The UID properties of hCard and hCalendar are in effect inverse
functional properties. (Indeed, my parser implements them as such: if
two hCalendar events exist which share a UID, they'll be conflated
into the same event in the output.)
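
Treating UID as an inverse functional property can be sketched as a
merge keyed on the UID. This Python example is an assumption about one
way to do it, not the code of the parser mentioned above: the function
name conflate_by_uid is hypothetical, events are plain dictionaries,
and "first value wins" on conflicting properties is an arbitrary
choice made for the sketch.

```python
def conflate_by_uid(events):
    """Merge events that share a "uid"; events without one are kept apart."""
    merged = {}        # uid -> combined event, in first-seen order
    without_uid = []
    for event in events:
        uid = event.get("uid")
        if uid is None:
            without_uid.append(dict(event))
            continue
        target = merged.setdefault(uid, {})
        for key, value in event.items():
            # Assumption: on conflict, the first value seen is kept.
            target.setdefault(key, value)
    return list(merged.values()) + without_uid


events = [
    {"uid": "event-1", "summary": "Launch party"},
    {"uid": "event-1", "location": "London"},
    {"summary": "Untitled meeting"},
]
print(conflate_by_uid(events))
```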

PS: Dan, did you get my e-mail on 27 April? I sent it to your  
rdfweb.org address -- not sure if that's still valid?

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>


From rff.rff at gmail.com  Sun May  4 05:55:01 2008
From: rff.rff at gmail.com (gabriele renzi)
Date: Sun May  4 05:55:04 2008
Subject: [uf-dev] Preventing false positives
In-Reply-To: <f56dbfbd0805031125p382f8831w6be5cb38d403c579@mail.gmail.com>
References: <C28BF658-935C-4683-A5BD-ED6413FB759C@tobyinkster.co.uk>
	<5128F867-8393-4E50-AF9E-0A3BA5815ECF@tobyinkster.co.uk>
	<481C7317.7040307@danbri.org>
	<828083e70805030926v530ce142t2919ff911fe1ecc6@mail.gmail.com>
	<f56dbfbd0805031125p382f8831w6be5cb38d403c579@mail.gmail.com>
Message-ID: <828083e70805040555s1dfb18b1j7f02e726ea1b870@mail.gmail.com>

On Sat, May 3, 2008 at 7:25 PM, Zachary Carter <zack.carter@gmail.com> wrote:
> There was discussion about a JSON representation of hCard
>  (http://microformats.org/wiki/jcard). I think that's what you're
>  looking for.
>

I'm not sure, I think I failed to express my question.
This looks like a way to describe a microformatted object, through a
JSON serialization.

What I was thinking is a way to describe the microformat itself, as in
a DTD, or OWL ontology or BNF.
Nothing too fancy, just a little improvement on the already existing
informal "schema" definitions in the wiki to include some parsing
details.


Anyway, thanks for the answer.
From julian_bond at voidstar.com  Mon May  5 00:12:08 2008
From: julian_bond at voidstar.com (Julian Bond)
Date: Mon May  5 01:16:21 2008
Subject: [uf-dev] Discovery of Microformatted documents
Message-ID: <j4uQyhjINrHIFAEW@jblaptop.voidstar.com>

We've been looking at ways of discovering microformatted documents. The 
requirement is to be able to say something like "My Profile page is at 
this URL". We've identified 3 likely candidates (which might all be the 
same page).

1) A page holding my personal profile. Probably containing hCard. 
Typically something like an AboutMe page on a blog.

2) A page holding a list of my external profiles. Marked up with XFN 
rel="me": the "YASN-Roll".

3) A page holding a list of my contacts. Marked up with XFN 
rel="contact" and the other contact types.

To make this work we need a URI that identifies the page types and/or a 
Media Type. But since this is all html/xhtml, the Media-Type is going to 
be the same in each case.

This page http://www.gmpg.org/xfn/join seems to suggest 
http://gmpg.org/xfn/11 as a relatively permanent URI to use for XFN but 
it wouldn't distinguish between cases 2 and 3.

This page http://microformats.org/wiki/profile-uris defines some URI 
candidates as well. But I think my requirement is at a slightly higher 
level.

Any thoughts on this?

-- 
Julian Bond  E&MSN: julian_bond at voidstar.com  M: +44 (0)77 5907 2173
Webmaster:          http://www.ecademy.com/      T: +44 (0)192 0412 433
Personal WebLog:    http://www.voidstar.com/     skype:julian.bond?chat
                       Serve At Room Temperature
From mail at tobyinkster.co.uk  Mon May  5 02:09:39 2008
From: mail at tobyinkster.co.uk (Toby A Inkster)
Date: Mon May  5 02:09:47 2008
Subject: [uf-dev] Discovery of Microformatted documents
Message-ID: <EF93943E-3896-4CE8-BE0F-32F01A5E038B@tobyinkster.co.uk>

Julian Bond wrote:

> We've been looking at ways of discovering microformatted documents. The
> requirement is to be able to say something like "My Profile page is at
> this URL".

Probably something like:

	<a href="http://link.to.profile.invalid/me" type="text/html"
	rel="me meta">Link to my profile</a>

is the best way to link to a page which contains metadata about you.  
(rel=meta is formally defined in the XHTML 2 drafts, but has been  
used for years by the Dublin Core and FOAF communities to indicate a  
page which contains relevant metadata.)

> This page http://www.gmpg.org/xfn/join seems to suggest
> http://gmpg.org/xfn/11 as a relatively permanent URI to use for XFN but
> it wouldn't distinguish between cases 2 and 3.
>
> This page http://microformats.org/wiki/profile-uris defines some URI
> candidates as well. But I think my requirement is at a slightly higher
> level.

The term "profile" used on those pages has nothing to do with  
"personal profiles". It refers to the (rarely used) "profile"  
attribute of the <head> element which is used to link to one or more  
documents that describe the way that you're using HTML. For example:

	<head profile="http://gmpg.org/xfn/11">

may be used to indicate that when you write 'rel="contact"', you are  
using the definition of 'rel="contact"' which can be found in XFN  
1.1, and not, say, the entirely different definition of  
'rel="contact"' which the current HTML 5 drafts use.

In terms of microformats, the <head profile> attribute can be thought
of as a place to list which microformats you use on the page, so that
parsers can distinguish between an intentional use of hCard and a
coincidental use of 'class="vcard"' by someone who's never even heard
of hCard.
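
A parser that wanted to honour the profile attribute could check it
before looking for class names. The Python sketch below is a minimal
illustration under assumptions: declared_profiles is a hypothetical
helper, the page is assumed to be well-formed XHTML without an XML
namespace, and the attribute is read as a whitespace-separated list of
URIs. Only the XFN 1.1 URI from this thread is used.

```python
import xml.etree.ElementTree as ET

XFN_PROFILE = "http://gmpg.org/xfn/11"


def declared_profiles(xhtml):
    """Return the set of profile URIs listed on the <head> element."""
    root = ET.fromstring(xhtml)
    head = root.find("head")  # direct child of <html>
    if head is None:
        return set()
    return set(head.get("profile", "").split())


PAGE = """<html>
  <head profile="http://gmpg.org/xfn/11">
    <title>Contacts</title>
  </head>
  <body><a href="http://example.org/" rel="contact">Someone</a></body>
</html>"""

if XFN_PROFILE in declared_profiles(PAGE):
    print('rel="contact" is meant in the XFN 1.1 sense')
```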

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>


From julian_bond at voidstar.com  Mon May  5 14:22:31 2008
From: julian_bond at voidstar.com (Julian Bond)
Date: Mon May  5 14:23:16 2008
Subject: [uf-dev] Discovery of Microformatted documents
In-Reply-To: <EF93943E-3896-4CE8-BE0F-32F01A5E038B@tobyinkster.co.uk>
References: <EF93943E-3896-4CE8-BE0F-32F01A5E038B@tobyinkster.co.uk>
Message-ID: <PWp7JMrXq3HIFAjx@jblaptop.voidstar.com>

Toby A Inkster <mail@tobyinkster.co.uk> Mon, 5 May 2008 10:09:39
>Probably something like:
>
>       <a href="http://link.to.profile.invalid/me" type="text/html"
>       rel="me meta">Link to my profile</a>
>
>is the best way to link to a page which contains metadata about you. 
>(rel=meta is formally defined in the XHTML 2 drafts, but has been  used 
>for years by the Dublin Core and FOAF communities to indicate a  page 
>which contains relevant metadata.)

OK. We're trying to do something slightly different. The use case is to 
find relevant documents (in this case microformat marked up html) during 
the initial signup to a new site using openid. Openid has a discovery 
mechanism using XRDS files. These XRDS files seem like a good place to 
put links to things like:-
- The page with my profile on it (hcard)
- The page with a list of the other profiles I have (rel="me")
- The page with a list of all my contacts (rel="contact")

The XRDS format expects a URI to identify the type of service and a 
media type. If those documents are HTML then the media type is simple. 
Then it has a URL field for the location of the page described. But we 
would need URIs for those cases above.

The best URIs to use for Type seem to be
http://xmlns.com/foaf/spec/
http://gmpg.org/xfn/11#me
http://gmpg.org/xfn/11#contact
respectively. I'm suggesting using the anchors to distinguish between a 
page primarily about other profiles and one primarily about a list of 
contacts.
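
To make the proposal concrete, here is a hedged Python sketch that
builds an XRDS document with one Service entry per page, each carrying
a Type URI, a media type and a URL. The three Type URIs are the ones
suggested above; the page URLs are hypothetical placeholders, the
function name build_xrds is made up, and the element layout follows
the XRD 2.0 conventions as commonly used for OpenID discovery, so it
should be checked against the XRDS specification before use.

```python
import xml.etree.ElementTree as ET

XRDS_NS = "xri://$xrds"
XRD_NS = "xri://$xrd*($v*2.0)"
ET.register_namespace("xrds", XRDS_NS)
ET.register_namespace("xrd", XRD_NS)


def build_xrds(services):
    """Serialise (type_uri, page_url) pairs as Service entries in one XRD."""
    root = ET.Element(f"{{{XRDS_NS}}}XRDS")
    xrd = ET.SubElement(root, f"{{{XRD_NS}}}XRD")
    for type_uri, page_url in services:
        service = ET.SubElement(xrd, f"{{{XRD_NS}}}Service")
        ET.SubElement(service, f"{{{XRD_NS}}}Type").text = type_uri
        # All three pages are HTML, so the media type is the same.
        ET.SubElement(service, f"{{{XRD_NS}}}MediaType").text = "text/html"
        ET.SubElement(service, f"{{{XRD_NS}}}URI").text = page_url
    return ET.tostring(root, encoding="unicode")


# Page URLs below are placeholders, not real locations.
SERVICES = [
    ("http://xmlns.com/foaf/spec/", "http://example.org/about"),
    ("http://gmpg.org/xfn/11#me", "http://example.org/profiles"),
    ("http://gmpg.org/xfn/11#contact", "http://example.org/contacts"),
]

document = build_xrds(SERVICES)
print(document)
```

A consumer reading the XRDS file would then select the Service whose
Type matches what it is looking for and fetch the page named in URI.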

I don't remember seeing anything before about discovery of microformat 
documents. As you've described, you could put a rel="me meta" link 
into the page at the location of the human-entered OpenID. We're not 
trying to prevent that, but also to use the XRDS file that is already 
being read.

The key point here is that this is about discovery. We're not trying to 
identify a type of a document already being read. We're talking about a 
Type identifier on a document to be collected.

-- 
Julian Bond  E&MSN: julian_bond at voidstar.com  M: +44 (0)77 5907 2173
Webmaster:          http://www.ecademy.com/      T: +44 (0)192 0412 433
Personal WebLog:    http://www.voidstar.com/     skype:julian.bond?chat
                   50% Less Saturated Fat Than Butter
From contact at lumieredelune.com  Mon May  5 16:32:34 2008
From: contact at lumieredelune.com (Lumiere de Lune)
Date: Mon May  5 16:32:29 2008
Subject: [uf-dev] Problems with importation of a hcard as a vCard inOutlook
Message-ID: <034501c8af08$4c828340$6701a8c0@PARACOU>

For some strange reason, I missed some messages, which I have now seen in the
archive. 
I just wanted to thank those who answered and told me it's an Outlook-specific
problem. 
--
Marie-Aude Koiransky

www.lumieredelune.com 


From mail at tobyinkster.co.uk  Tue May  6 04:50:17 2008
From: mail at tobyinkster.co.uk (Toby Inkster)
Date: Tue May  6 04:50:23 2008
Subject: [uf-dev] hCard label type
Message-ID: <60769.81.2.120.180.1210074617.squirrel@goddamn.co.uk>

The vCard spec allows types (e.g. "home", "postal", etc) to be specified
for the LABEL property, but the hCard spec doesn't seem to allow this. The
hCard examples page on the wiki (RFC 2426 examples) does include a label
marked up with type+value, but that page is not considered normative.

Is this an oversight in the spec, or was a conscious decision made not to
allow types to be specified within labels? If the latter, what was the
reasoning?

Do any current parsers extend hCard and allow a type to be specified for
labels? (I'm considering adding this feature to Cognition.)

-- 
Toby Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>
From zack.carter at gmail.com  Thu May  8 12:53:07 2008
From: zack.carter at gmail.com (Zachary Carter)
Date: Thu May  8 12:53:20 2008
Subject: [uf-dev] Preventing false positives
In-Reply-To: <828083e70805040555s1dfb18b1j7f02e726ea1b870@mail.gmail.com>
References: <C28BF658-935C-4683-A5BD-ED6413FB759C@tobyinkster.co.uk>
	<5128F867-8393-4E50-AF9E-0A3BA5815ECF@tobyinkster.co.uk>
	<481C7317.7040307@danbri.org>
	<828083e70805030926v530ce142t2919ff911fe1ecc6@mail.gmail.com>
	<f56dbfbd0805031125p382f8831w6be5cb38d403c579@mail.gmail.com>
	<828083e70805040555s1dfb18b1j7f02e726ea1b870@mail.gmail.com>
Message-ID: <f56dbfbd0805081253k3628dea9y8417047f1a208c2f@mail.gmail.com>

Oops, I think you mean XMDP (http://gmpg.org/xmdp/) then. Toby has a
list of uF profiles on his site: http://buzzword.org.uk/profiles/

On Sun, May 4, 2008 at 8:55 AM, gabriele renzi <rff.rff@gmail.com> wrote:
> On Sat, May 3, 2008 at 7:25 PM, Zachary Carter <zack.carter@gmail.com> wrote:
>> There was discussion about a JSON representation of hCard
>>  (http://microformats.org/wiki/jcard). I think that's what you're
>>  looking for.
>>
>
> I'm not sure, I think I failed to express my question.
> This looks like a way to describe a microformatted object, through a
> JSON serialization.
>
> What I was thinking is a way to describe the microformat itself, as in
> a DTD, or OWL ontology or BNF.
> Nothing too fancy, just a little improvement on the already existing
> informal "schema" definitions in the wiki to include some parsing
> details.
>
>
> Anyway, thanks for the answer.
> _______________________________________________
> microformats-dev mailing list
> microformats-dev@microformats.org
> http://microformats.org/mailman/listinfo/microformats-dev
>


-- 
Zach Carter
http://zachcarter.info
From rff.rff at gmail.com  Fri May  9 05:11:05 2008
From: rff.rff at gmail.com (gabriele renzi)
Date: Fri May  9 05:11:11 2008
Subject: [uf-dev] Testcase clarification
Message-ID: <828083e70805090511s53dd896cv4302c438d6d506a@mail.gmail.com>

Hi everyone,

I'm perusing the hCard test suite and it is a great resource, thanks for it.
Yet there is something I'm not clear about in test case 21-tel.

Basically, we have a lot of tel fields that use URI schemes other than
tel, such as
   <object class="tel" data="fax:+1.415.555.1239">call me</object>
or
   <a class="tel" href="modem:+1.415.555.1241">call me</a>

It seems that these should be extracted as simple "tel" properties without a type
  TEL:+1.415.555.1241
and that the type is only kept when it is made explicit with a "type"
subproperty.

Is this correct?
Why are we ignoring the explicit information in the uri scheme?

-- 
goto 10: http://www.goto10.it
blog it: http://riffraff.blogsome.com
blog en: http://www.riffraff.info
From julian.reschke at gmx.de  Fri May  9 05:31:33 2008
From: julian.reschke at gmx.de (Julian Reschke)
Date: Fri May  9 05:31:39 2008
Subject: [uf-dev] Test cases for hcard vs profile attribute
In-Reply-To: <828083e70805090511s53dd896cv4302c438d6d506a@mail.gmail.com>
References: <828083e70805090511s53dd896cv4302c438d6d506a@mail.gmail.com>
Message-ID: <48244425.8070503@gmx.de>

Hi,

I just checked, and it seems that the test cases (such as 
<http://microformats.org/tests/hcard/01-tantek-basic.html>) do not use 
the profile attribute, as recommended in 
<http://microformats.org/wiki/profile-uris>.

Bug? Documentation out of date?

BR, Julian
From bjonkman at sobac.com  Mon May 12 10:41:54 2008
From: bjonkman at sobac.com (Bob Jonkman)
Date: Mon May 12 10:43:31 2008
Subject: [uf-dev] Testcase clarification
In-Reply-To: <828083e70805090511s53dd896cv4302c438d6d506a@mail.gmail.com>
References: <828083e70805090511s53dd896cv4302c438d6d506a@mail.gmail.com>
Message-ID: <48284922.3848.DD6E7C3@bjonkman.sobac.com>

I'm pretty sure the modem: and fax: URI schemes have been 
deprecated.  tel: is the only telephone URI scheme remaining, and 
call type (modem, fax) is left to be negotiated in-band by carrier 
recognition or out-of-band through SIP or similar.[1]

So, ignoring the URI scheme of a tel value in an hCard would be the correct 
behaviour, but an explicit TYPE indication should be preserved in the vCard.
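
As a rough illustration of that behaviour, here is a minimal Python sketch: the URI scheme is dropped from the value, and a TYPE parameter is emitted only when a type was given explicitly. The function names and the list of schemes are assumptions made for this example; they are not taken from the test suite or from any existing parser.

```python
# Schemes treated as "telephone-like" in this sketch (an assumption for the example).
TEL_SCHEMES = {"tel", "fax", "modem"}


def tel_value(uri_or_text):
    """Return the number with a leading tel:/fax:/modem: scheme removed, if present."""
    scheme, sep, rest = uri_or_text.partition(":")
    if sep and scheme.lower() in TEL_SCHEMES:
        return rest
    return uri_or_text


def vcard_tel(uri_or_text, types=()):
    """Build a vCard TEL line; TYPE appears only for explicitly supplied types."""
    params = "".join(f";TYPE={t}" for t in types)
    return f"TEL{params}:{tel_value(uri_or_text)}"


if __name__ == "__main__":
    print(vcard_tel("modem:+1.415.555.1241"))
    print(vcard_tel("fax:+1.415.555.1239", types=["fax"]))
```

Note that the scheme is deliberately not mapped to a TYPE here; only a "type" subproperty in the markup would supply one.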

--Bob.

[1] http://tools.ietf.org/html/rfc3966


>>> 9 May 2008 13:11  gabriele renzi <microformats-dev@microformats.org>  >>>

> Hi everyone,
> 
> I'm perusing the hCard test suite and it is a great source, thanks for
> it. Yet there is something I'm not clear in the testcase 21-tel.
> 
> Basically, we have a lot of tel fields that use schemes different from
> tel, such as
>    <object class="tel" data="fax:+1.415.555.1239">call me</object> or
>    <a class="tel" href="modem:+1.415.555.1241">call me</a>
> 
> It seems that these should be extracted as simple "tel" properties
> without a type
>   TEL:+1.415.555.1241
> and that the type is only kept when it is made explicit with a "type"
> subproperty.
> 
> Is this correct?
> Why are we ignoring the explicit information in the uri scheme?
> 
> -- 
> goto 10: http://www.goto10.it
> blog it: http://riffraff.blogsome.com
> blog en: http://www.riffraff.info


-- -- -- --
Bob Jonkman <bjonkman@sobac.com>         http://sobac.com/sobac/    
SOBAC Microcomputer Services              Voice: +1-519-669-0388       
6 James Street, Elmira ON  Canada  N3B 1L5  Cel: +1-519-635-9413
Software   ---   Office & Business Automation   ---   Consulting


From rff.rff at gmail.com  Tue May 13 08:37:51 2008
From: rff.rff at gmail.com (gabriele renzi)
Date: Tue May 13 08:37:57 2008
Subject: [uf-dev] testcase for hcard TD referring to missing TH
Message-ID: <828083e70805130837h61c3f077y295fe625506f5f77@mail.gmail.com>

I don't know if this is interesting for anyone, but I added to my
local copy of the uF/hCard test suite an additional test for the
header case, namely:

  <td class="vcard" headers="not-a-real-header"><span class="fn">Jane
Doe</span></td>

should produce a vCard like

BEGIN:VCARD
PRODID:$PRODID$
SOURCE:$SOURCE$
NAME:32-header
VERSION:3.0
N;CHARSET=UTF-8:Doe;Jane;;;
FN;CHARSET=UTF-8:Jane Doe
END:VCARD


I think this should not happen in properly formatted pages, but as it
revealed a bug in my parser implementation I have
included it in my tests.
I'm sending this because it may be interesting for others, although
it seems that the uF test suite does not usually cover invalid
formatting.
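
For parser authors who want to guard against the same bug, here is a minimal Python sketch of the defensive check: collect the ids present in the document, then keep only those headers references that actually resolve. The class and function names are made up for this example and do not come from any existing parser.

```python
from html.parser import HTMLParser


class HeaderIndex(HTMLParser):
    """Collect all element ids, plus the headers references of td.vcard cells."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.cell_headers = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            self.ids.add(attrs["id"])
        classes = (attrs.get("class") or "").split()
        if tag == "td" and "vcard" in classes:
            self.cell_headers.append((attrs.get("headers") or "").split())


def resolvable_headers(html):
    """For each td.vcard cell, return the headers ids that exist in the document."""
    index = HeaderIndex()
    index.feed(html)
    index.close()
    return [[ref for ref in refs if ref in index.ids] for refs in index.cell_headers]


if __name__ == "__main__":
    cell = ('<td class="vcard" headers="not-a-real-header">'
            '<span class="fn">Jane Doe</span></td>')
    print(resolvable_headers(cell))
```

With the test markup above, the dangling reference is simply dropped and the cell is parsed as an ordinary vcard.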

I'm not familiar with hg so I'm not sure if this is the correct way,
but I'm attaching the tiny patch formatted as an hg bundle.


-- 
goto 10: http://www.goto10.it
blog it: http://riffraff.blogsome.com
blog en: http://www.riffraff.info
-------------- next part --------------
A non-text attachment was scrubbed...
Name: missing-th.hg
Type: application/octet-stream
Size: 832 bytes
Desc: not available
Url : http://microformats.org/discuss/mail/microformats-dev/attachments/20080513/01f5f00b/missing-th.obj
From lee.jordan at gmail.com  Wed May 14 01:55:21 2008
From: lee.jordan at gmail.com (Lee Jordan)
Date: Wed May 14 01:55:24 2008
Subject: [uf-dev] SEO and abbr
Message-ID: <a2d985370805140155s4ec5ffe7t17b2512651f5e68f@mail.gmail.com>

Hi folks,

I've been adding more and more mFs to the sites that I work on
professionally and have come across a snag with the abbr-design-pattern.
I'm wondering if anyone else has come across the following issue and, if so, how
it was resolved; I presume by not using abbr but using span instead. I've
used hCalendar to mark up some dates, but in a search engine's results page
the title of the abbr tag has been included in the result's description text,
which makes the date look messy.

It seems Google in particular indexes the title tag.
Just raising this as I've just noticed it so we can be aware of the issue.

Many Thanks
Lee

-- 
HTML | CSS | Javascript
http://www.leejordan.org.uk
From brian.suda at gmail.com  Wed May 14 02:09:45 2008
From: brian.suda at gmail.com (Brian Suda)
Date: Wed May 14 02:09:48 2008
Subject: [uf-dev] SEO and abbr
In-Reply-To: <a2d985370805140155s4ec5ffe7t17b2512651f5e68f@mail.gmail.com>
References: <a2d985370805140155s4ec5ffe7t17b2512651f5e68f@mail.gmail.com>
Message-ID: <21e770780805140209l43128311nd75fb0ac6cbdec5d@mail.gmail.com>

2008/5/14, Lee Jordan <lee.jordan@gmail.com>:
> It seems Google in particular indexes the title tag.
> Just raising this as I've just noticed it so we can be aware of the issue.

thanks for the heads-up, can you start a wiki page and document your
findings, with the URL and keywords you are searching for and the
results that Google (and other search engines) are producing? That way
over time we can easily confirm or deny that search engines' behaviour
continues in a consistent way.

thanks,
-brian

-- 
brian suda
http://suda.co.uk
From lee.jordan at gmail.com  Wed May 14 03:48:03 2008
From: lee.jordan at gmail.com (Lee Jordan)
Date: Wed May 14 03:48:08 2008
Subject: [uf-dev] SEO and abbr
In-Reply-To: <21e770780805140209l43128311nd75fb0ac6cbdec5d@mail.gmail.com>
References: <a2d985370805140155s4ec5ffe7t17b2512651f5e68f@mail.gmail.com>
	<21e770780805140209l43128311nd75fb0ac6cbdec5d@mail.gmail.com>
Message-ID: <a2d985370805140348k48d5b5b6w3974d365d6cd3691@mail.gmail.com>

Cheers Brian,

Have opened an issue on the wiki, with a text example, but without a link to
the search result as SEO is a sensitive area for my employer and I'm trying
to have a positive outlook on mF.

Cheers
Lee

On Wed, May 14, 2008 at 10:09 AM, Brian Suda <brian.suda@gmail.com> wrote:

> 2008/5/14, Lee Jordan <lee.jordan@gmail.com>:
> > It seems Google in particular indexes the title tag.
> > Just raising this as I've just noticed it so we can be aware of the
> issue.
>
> thanks for the heads-up, can you start a wiki page and document your
> findings, with the URL and keywords you are searching for and the
> results that Google (and other search engines) are producing? That way
> over time we can easily confirm or deny that search engines' behaviour
> continues in a consistent way.
>
> thanks,
> -brian
>
> --
> brian suda
> http://suda.co.uk


-- 
HTML | CSS | Javascript
http://www.leejordan.org.uk
From brian.suda at gmail.com  Wed May 14 04:26:42 2008
From: brian.suda at gmail.com (Brian Suda)
Date: Wed May 14 04:26:45 2008
Subject: [uf-dev] SEO and abbr
In-Reply-To: <a2d985370805140348k48d5b5b6w3974d365d6cd3691@mail.gmail.com>
References: <a2d985370805140155s4ec5ffe7t17b2512651f5e68f@mail.gmail.com>
	<21e770780805140209l43128311nd75fb0ac6cbdec5d@mail.gmail.com>
	<a2d985370805140348k48d5b5b6w3974d365d6cd3691@mail.gmail.com>
Message-ID: <21e770780805140426o374776b2qabdc1c1e89149bd6@mail.gmail.com>

2008/5/14, Lee Jordan <lee.jordan@gmail.com>:
> Have opened an issue on the wiki, with a text example, but without a link to
> the search result as SEO is a sensitive area for my employer and I'm trying
> to have a positive outlook on mF.

i am unable to replicate your finding with any of the microformats on
my sites. If you could give a solid example, then we could look into
how/why/what mark-up is contributing to this behaviour and how to
proceed, but without any confirmation it is difficult to keep this as
an open issue.

Could you create a test page somewhere, so that you do not have to
disclose any sensitive data for your employer?

thanks,
-brian

-- 
brian suda
http://suda.co.uk
From csarven at gmail.com  Wed May 14 06:46:00 2008
From: csarven at gmail.com (Sarven Capadisli)
Date: Wed May 14 06:46:07 2008
Subject: [uf-dev] SEO and abbr
In-Reply-To: <21e770780805140426o374776b2qabdc1c1e89149bd6@mail.gmail.com>
References: <a2d985370805140155s4ec5ffe7t17b2512651f5e68f@mail.gmail.com>
	<21e770780805140209l43128311nd75fb0ac6cbdec5d@mail.gmail.com>
	<a2d985370805140348k48d5b5b6w3974d365d6cd3691@mail.gmail.com>
	<21e770780805140426o374776b2qabdc1c1e89149bd6@mail.gmail.com>
Message-ID: <d4154bcf0805140646x7a6ea98ag83fae28747117fc1@mail.gmail.com>

Here is one example:
http://www.google.com/search?hl=en&q=microformats+introduction

Look for "sarven". Shows description as:
"24 Jan 2008 ... An introduction to microformats: what they are, why
we need them and briefly how to use them."

It *appears* that this happens when the description is less than
150 characters, and they fill in the available space with the timestamp
if and only if a new sentence doesn't fit.

Here is another example:
http://www.google.com/search?hl=en&q=three+significant+modes

Which doesn't include the timestamp.

And:
http://www.google.com/search?hl=en&q=irc+social+networking+platform

Which doesn't include the second sentence but fills it in with the timestamp.


Sarven Capadisli
http://www.csarven.ca


On Wed, May 14, 2008 at 7:26 AM, Brian Suda <brian.suda@gmail.com> wrote:
> 2008/5/14, Lee Jordan <lee.jordan@gmail.com>:
>> Have opened an issue on the wiki, with a text example, but without a link to
>> the search result as SEO is a sensitive area for my employer and I'm trying
>> to have a positive outlook on mF.
>
> i am unable to replicate your finding with any of the microformats on
> my sites. If you could give a solid example, then we could look into
> how/why/what mark-up is contributing to this behaviour and how to
> proceed, but without any confirmation it is difficult to keep this as
> an open issue.
>
> Could you create a test page somewhere, so that you do not have to
> disclose any sensitive data for your employer?
>
> thanks,
> -brian
>
> --
> brian suda
> http://suda.co.uk
From brian.suda at gmail.com  Wed May 14 07:00:35 2008
From: brian.suda at gmail.com (Brian Suda)
Date: Wed May 14 07:00:37 2008
Subject: [uf-dev] SEO and abbr
In-Reply-To: <d4154bcf0805140646x7a6ea98ag83fae28747117fc1@mail.gmail.com>
References: <a2d985370805140155s4ec5ffe7t17b2512651f5e68f@mail.gmail.com>
	<21e770780805140209l43128311nd75fb0ac6cbdec5d@mail.gmail.com>
	<a2d985370805140348k48d5b5b6w3974d365d6cd3691@mail.gmail.com>
	<21e770780805140426o374776b2qabdc1c1e89149bd6@mail.gmail.com>
	<d4154bcf0805140646x7a6ea98ag83fae28747117fc1@mail.gmail.com>
Message-ID: <21e770780805140700u1f68d662v351ca16851d24cc2@mail.gmail.com>

2008/5/14, Sarven Capadisli <csarven@gmail.com>:
> Here is one example:
>  http://www.google.com/search?hl=en&q=microformats+introduction
>
>  Look for "sarven". Shows description as:
>  "24 Jan 2008 ... An introduction to microformats: what they are, why
>  we need them and briefly how to use them."
>
>  It *appears* that this happens when the description is less than
>  150 characters, and they fill in the available space with the timestamp
>  if and only if a new sentence doesn't fit.

--- thanks for the links and analysis. I agree, the description is
coming from the <meta> element and the date before that is either the
publication date or the date crawled. This doesn't seem to be in any way
connected to the <abbr> behaviour that Lee Jordan is reporting.

Maybe we are all just slightly confused and talking about different
things, and/or Lee Jordan is connecting that displayed date with a date
in the HTML, or he is actually finding an issue.

Until we can find a testable example of this behaviour in the wild
(all other examples are counter to this), i do not believe
this issue exists.

thanks,
-brian

-- 
brian suda
http://suda.co.uk
From lee.jordan at gmail.com  Fri May 16 01:30:21 2008
From: lee.jordan at gmail.com (Lee Jordan)
Date: Fri May 16 01:30:24 2008
Subject: [uf-dev] SEO and abbr
In-Reply-To: <21e770780805140700u1f68d662v351ca16851d24cc2@mail.gmail.com>
References: <a2d985370805140155s4ec5ffe7t17b2512651f5e68f@mail.gmail.com>
	<21e770780805140209l43128311nd75fb0ac6cbdec5d@mail.gmail.com>
	<a2d985370805140348k48d5b5b6w3974d365d6cd3691@mail.gmail.com>
	<21e770780805140426o374776b2qabdc1c1e89149bd6@mail.gmail.com>
	<d4154bcf0805140646x7a6ea98ag83fae28747117fc1@mail.gmail.com>
	<21e770780805140700u1f68d662v351ca16851d24cc2@mail.gmail.com>
Message-ID: <a2d985370805160130s441eec9o2e202ba890bec4ee@mail.gmail.com>

This is what I found in the Google description:
"2008-04-2121st April - 2008-05-1211th May 2008"

Bit of confusion for me too, as I had messed around with that page quite a
lot. It is actually an issue with working around abbr, not abbr itself.
Looking at it more deeply, for that page I may have changed the abbrs to spans
somewhere along the line before the Googlebot came along, to address
accessibility with abbr; the lack of whitespace would be my fault then
(schoolboy - hehe). In that case, this should really be noted as a pitfall
of working around abbr with span classes, and as a possible
downside to avoiding abbr.

I'd say that does seem the more likely situation, as it makes sense that all span
text gets indexed.
I'd still be interested in knowing how search engines handle abbr though, so I will
keep an eye on my abbr dates on the search engines as I have a few, and will
keep watching my own cubs in the wild.

Lee


On Wed, May 14, 2008 at 3:00 PM, Brian Suda <brian.suda@gmail.com> wrote:

> 2008/5/14, Sarven Capadisli <csarven@gmail.com>:
> > Here is one example:
> >  http://www.google.com/search?hl=en&q=microformats+introduction
> >
> >  Look for "sarven". Shows description as:
> >  "24 Jan 2008 ... An introduction to microformats: what they are, why
> >  we need them and briefly how to use them."
> >
> >  It *appears* that this happens when the description is less than
> >  150 characters, and they fill in the available space with the timestamp
> >  if and only if a new sentence doesn't fit.
>
> --- thanks for the links and analysis. I agree, the description is
> coming from the <meta> element and the date before that is either the
> publication date or the date crawled. This doesn't seem to be in any way
> connected to the <abbr> behaviour that Lee Jordan is reporting.
>
> Maybe we are all just slightly confused and talking about different
> things, and/or Lee Jordan is connecting that displayed date with a date
> in the HTML, or he is actually finding an issue.
>
> Until we can find a testable example of this behaviour in the wild
> (all other examples are counter to this), i do not believe
> this issue exists.
>
> thanks,
> -brian
>
> --
> brian suda
> http://suda.co.uk


-- 
HTML | CSS | Javascript
http://www.leejordan.org.uk
From lists at ben-ward.co.uk  Sat May 17 11:47:01 2008
From: lists at ben-ward.co.uk (Ben Ward)
Date: Sat May 17 11:47:37 2008
Subject: [uf-dev] Defining and Extending Value Excepting
Message-ID: <93867DB2-12A5-4DBD-938D-2FF35616347E@ben-ward.co.uk>

Hi parser devs!

I've spent a number of hours this weekend documenting cases of  
microformats requiring particular data formats for parsing (ISO 8601,  
telephone keywords in hCard, and so on).

Alongside this, I've documented the current supported means of  
including said data (class-design-pattern, abbr-design-pattern and  
value-excerpting), noting how the intention of authors is to hide  
these specified formats in favour of more flexible human-centric  
formats. Alongside that, I've documented where different means of  
inclusion are appropriate and inappropriate in different situations.

Finally, I've proposed an extension to the current pattern of
value-excerpting, whereby in cases where an element with a class of "value" is
also empty, it would have the @title attribute parsed in place of
inner-text.
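
To give parser developers something concrete to react to, here is a minimal Python sketch of that rule: for every element with class "value", use its inner text, and fall back to the title attribute only when the element is empty. The names are made up for this example, the sketch assumes well-formed markup, and it is not taken from any existing parser.

```python
from html.parser import HTMLParser

# Elements that never have an end tag; they are skipped in this sketch.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ValueCollector(HTMLParser):
    """Collect the value of each class="value" element, using title when it is empty."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # one record per currently open element
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        self.open_elements.append(
            {"is_value": "value" in classes, "title": attrs.get("title"), "text": []}
        )

    def handle_data(self, data):
        # Text counts towards the inner text of every open ancestor.
        for element in self.open_elements:
            element["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.open_elements:
            return
        element = self.open_elements.pop()
        if element["is_value"]:
            text = "".join(element["text"]).strip()
            # Proposed extension: an empty value element contributes its title.
            self.values.append(text or element["title"] or "")


def value_parts(fragment):
    """Return the values found in a property fragment, in closing-tag order."""
    collector = ValueCollector()
    collector.feed(fragment)
    collector.close()
    return collector.values
```

For example, a dtstart property whose visible text is "Tomorrow lunchtime" and which contains an empty value element with title="2008-05-17T12:00:00+0100" would yield just the ISO 8601 string.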

I am aware that we need to better specify the behaviour of
value-excerpting as a whole, let alone adding extensions. We do, however,
have a problem that can be solved: the requirement is to include
specific data formats, but hidden in favour of variable, human-consumable
(or internationalised) forms of that data, whilst still
operating entirely within the HTML layer (not depending on CSS). This
is not something that HTML has a native means of handling.

The way I see it, at the same time as properly specifying
value-excerpting (possibly just calling it "value-design-pattern"), we can
specify a robust means of handling the exceptional requirement to
include machine data.

** What I'd like from parser developers is feedback on how feasible  
this pattern is to parse, please. **

Note that whilst this proposal _does_ resolve the long-running
abbr-misuse issue that keeps coming up, my approach here is to solve the
root of the problem, not to work around a consequence of that
problem. Additionally, in extending the existing value-excerpting
behaviour, we avoid adding yet more syntactic vocabulary to
microformats and we produce a pattern that does not tie people to
particular HTML elements (which is more in line with our microformat
goals).

With regard to the separate issues we've had with ABBR, I'm asking  
some colleagues to test this idea thoroughly with regard to assistive  
technology before we finalise a spec.

Thanks,

Ben
From mail at tobyinkster.co.uk  Sat May 17 15:08:19 2008
From: mail at tobyinkster.co.uk (Toby A Inkster)
Date: Sat May 17 15:08:34 2008
Subject: [uf-dev] Defining and Extending Value Excepting
Message-ID: <FC7E8DA5-2AB8-4D41-AFFF-1D4B0049D389@tobyinkster.co.uk>

Although this sounds like a nice idea, I've previously been informed  
that requiring empty inline elements is a non-starter, as many HTML  
processors (including "tidy" with its default settings) strip these out.

Preliminary testing with tidy (version: 1 September 2005) shows this  
to be true. Some parsers, including X2V IIRC, pre-process non-XHTML  
HTML by running it through tidy to get it into well-formed XML.  
Skimming through the tidy documentation, I can't see a way of  
disabling this empty inline element stripping behaviour.

If people *want* to publish data that uses empty inline elements,  
then that's fair enough, but with the current state of HTML  
processors, it's probably unwise to publish a pattern that *requires*  
the use of empty inline elements.

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>


From lists at ben-ward.co.uk  Sun May 18 05:09:23 2008
From: lists at ben-ward.co.uk (Ben Ward)
Date: Sun May 18 05:09:42 2008
Subject: [uf-dev] Defining and Extending Value Excepting
In-Reply-To: <FC7E8DA5-2AB8-4D41-AFFF-1D4B0049D389@tobyinkster.co.uk>
References: <FC7E8DA5-2AB8-4D41-AFFF-1D4B0049D389@tobyinkster.co.uk>
Message-ID: <AD1C0899-94D4-49CD-A3A3-17CEF8C09F6F@ben-ward.co.uk>

Hey Toby,

On 17 May 2008, at 23:08, Toby A Inkster wrote:
> Although this sounds like a nice idea, I've previously been informed  
> that requiring empty inline elements is a non-starter, as many HTML  
> processors (including "tidy" with its default settings) strip these  
> out.
>
> Preliminary testing with tidy (version: 1 September 2005) shows this  
> to be true. Some parsers, including X2V IIRC, pre-process non-XHTML  
> HTML by running it through tidy to get it into well-formed XML.  
> Skimming through the tidy documentation, I can't see a way of  
> disabling this empty inline element stripping behaviour.

hKit does this too (via the W3C-hosted version, although there was
some talk of switching to PHP's native HTML DOM parser instead).
Looking over the HTMLTidy bug tracker, it does seem to be an open
issue, but there's one bug (http://is.gd/i8E) proposing that it not
drop empty elements with class attributes, and it includes a simple fix.
Fixing that would resolve this.

> If people *want* to publish data that uses empty inline elements,  
> then that's fair enough, but with the current state of HTML  
> processors, it's probably unwise to publish a pattern that  
> *requires* the use of empty inline elements.

I'm not entirely comfortable with a broken part of the parser stack  
being a blocker for a mark-up level pattern.

Of course, if we can't work out a fix, then you're absolutely right  
that we can't go requiring something that's too expensive to parse  
(especially given parsing expense is the whole reason for having  
specified data formats within microformats in the first place!). But,  
if it's feasible to fix tidy for microformat parsers, then I'd be in  
favour of doing so.

B
From lists at ben-ward.co.uk  Sun May 18 07:44:47 2008
From: lists at ben-ward.co.uk (Ben Ward)
Date: Sun May 18 07:45:01 2008
Subject: [uf-dev] Defining and Extending Value Excepting
In-Reply-To: <FC7E8DA5-2AB8-4D41-AFFF-1D4B0049D389@tobyinkster.co.uk>
References: <FC7E8DA5-2AB8-4D41-AFFF-1D4B0049D389@tobyinkster.co.uk>
Message-ID: <72CA99A4-02E1-4666-899D-8DF1B34ACAFF@ben-ward.co.uk>

On 17 May 2008, at 23:08, Toby A Inkster wrote:
> Although this sounds like a nice idea, I've previously been informed  
> that requiring empty inline elements is a non-starter, as many HTML  
> processors (including "tidy" with its default settings) strip these  
> out.

As a second followup to this, I've built a version of HTMLTidy which  
does not strip empty elements where a class attribute is present. You  
can download a copy from http://ben-ward.co.uk/files/tidy-microformats.zip

It's built on Mac OSX (Intel), and I can't recall what the deal is  
with OSX binaries running on other forms of UNIX. However, I've  
included the diff, so it should be trivial to compile other builds on  
other platforms as required.

Cheers,

Ben
From lists at ben-ward.co.uk  Thu May 22 03:22:03 2008
From: lists at ben-ward.co.uk (Ben Ward)
Date: Thu May 22 03:22:14 2008
Subject: [uf-dev] Defining and Extending Value Excepting
In-Reply-To: <FC7E8DA5-2AB8-4D41-AFFF-1D4B0049D389@tobyinkster.co.uk>
References: <FC7E8DA5-2AB8-4D41-AFFF-1D4B0049D389@tobyinkster.co.uk>
Message-ID: <25D51B07-201C-48F3-A1F6-8B2909B88B15@ben-ward.co.uk>

OK, pushing on a bit:

I've got one flaw with my own suggestion here, which is that using
class="value" is going to cause a bit of a car crash in hCard, due to
the two instances of machine-data identified in the tel property
(documented on the wiki page <http://microformats.org/wiki/machine-data>).
The type property works alongside the other specified use of
value, whilst it's possible for the value itself to need a hidden data
value. In combination with this new value-pattern, we could end up
with mark-up like:

<p class="tel">
     <span class="type">
         Mobile Phone
         <span class="value" title="cell"></span>
     </span>
     <span class="value">
          +1-555-FORMATS
          <span class="value" title="+15553676177"></span>
     </span>
</p>

That's... messy. Value of Value is especially unpleasant; parsing the
value of tel without parsing the value of type as the value of tel
strikes me as complex (although, with value-excerpting itself not
fully spec'd, maybe it could be made to work).

So I'm suggesting one quick alteration here, which is to use
class=data rather than class=value, so as to avoid the example above.
I'm thinking of this from a publisher's point of view as much as anything;
I'd like to avoid the above scenario of nesting the same class for
different behaviours.
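
As a sketch of how the class=data variant might be parsed, here is a short Python example. It assumes the tel fragment is well-formed XML, that an empty class="data" descendant overrides the visible text of its parent property, and that type and value are not nested inside each other. The function names are made up for this example.

```python
import xml.etree.ElementTree as ET


def has_class(element, name):
    """True if the element's class attribute contains the given class name."""
    return name in (element.get("class") or "").split()


def machine_or_text(element):
    """Prefer the title of an empty class="data" descendant, else the visible text."""
    for descendant in element.iter():
        if descendant is element or not has_class(descendant, "data"):
            continue
        is_empty = not "".join(descendant.itertext()).strip()
        if is_empty and descendant.get("title"):
            return descendant.get("title")
    # Collapse runs of whitespace in the visible text.
    return " ".join("".join(element.itertext()).split())


def parse_tel(fragment):
    """Return the type and value subproperties of a single tel fragment."""
    root = ET.fromstring(fragment)
    result = {}
    for element in root.iter():
        if has_class(element, "type"):
            result["type"] = machine_or_text(element)
        elif has_class(element, "value"):
            result["value"] = machine_or_text(element)
    return result
```

Because "data" is a separate class, the parser never has to decide whether a nested "value" belongs to tel or to type, which is the ambiguity in the mark-up above.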

Once again, more feedback on the pattern from a parsing angle would be
great. I'd like to be confident that the pattern is robust and
parsable before presenting it to µf-discuss; I don't want to lose it
in a maelstrom :-)

Thanks,

Ben

On 17 May 2008, at 23:08, Toby A Inkster wrote:
> Although this sounds like a nice idea, I've previously been informed  
> that requiring empty inline elements is a non-starter, as many HTML  
> processors (including "tidy" with its default settings) strip these  
> out.
>
> Preliminary testing with tidy (version: 1 September 2005) shows this  
> to be true. Some parsers, including X2V IIRC, pre-process non-XHTML  
> HTML by running it through tidy to get it into well-formed XML.  
> Skimming through the tidy documentation, I can't see a way of  
> disabling this empty inline element stripping behaviour.
>
> If people *want* to publish data that uses empty inline elements,  
> then that's fair enough, but with the current state of HTML  
> processors, it's probably unwise to publish a pattern that  
> *requires* the use of empty inline elements.
>
> -- 
> Toby A Inkster
> <mailto:mail@tobyinkster.co.uk>
> <http://tobyinkster.co.uk>
>
>
>


From brian.suda at gmail.com  Thu May 22 03:34:19 2008
From: brian.suda at gmail.com (Brian Suda)
Date: Thu May 22 03:34:23 2008
Subject: [uf-dev] Defining and Extending Value Excepting
In-Reply-To: <25D51B07-201C-48F3-A1F6-8B2909B88B15@ben-ward.co.uk>
References: <FC7E8DA5-2AB8-4D41-AFFF-1D4B0049D389@tobyinkster.co.uk>
	<25D51B07-201C-48F3-A1F6-8B2909B88B15@ben-ward.co.uk>
Message-ID: <21e770780805220334j7e3055bayc5cbf332f7ce2bd@mail.gmail.com>

2008/5/22 Ben Ward <lists@ben-ward.co.uk>:
> <p class="tel">
>    <span class="type">
>        Mobile Phone
>        <span class="value" title="cell"></span>
>    </span>
>    <span class="value">
>         +1-555-FORMATS
>         <span class="value" title="+15553676177"></span>
>    </span>
> </p>
>
> That's... messy. Value of Value is especially unpleasant, parsing the value of
> tel without parsing the value of type as the value of tel strikes me as
> complex (although, with value-excerpting itself not fully spec'd, maybe it
> could be made to work).

--- in your example, if you are only interested in the +15553676177,
then there is no need for the outer class="value" around the
+1-555-FORMATS

-brian

-- 
brian suda
http://suda.co.uk

From glenn.jones at madgex.com  Thu May 22 07:05:46 2008
From: glenn.jones at madgex.com (Glenn Jones)
Date: Thu May 22 07:05:55 2008
Subject: [uf-dev] Defining and Extending Value Excepting
In-Reply-To: <25D51B07-201C-48F3-A1F6-8B2909B88B15@ben-ward.co.uk>
References: <FC7E8DA5-2AB8-4D41-AFFF-1D4B0049D389@tobyinkster.co.uk>
	<25D51B07-201C-48F3-A1F6-8B2909B88B15@ben-ward.co.uk>
Message-ID: <36A319113CF910438942741C4727ADFF01E97814@MOBY.Clarence.local>

2008/5/22 Ben Ward <lists@ben-ward.co.uk>:
> <p class="tel">
>    <span class="type">
>        Mobile Phone
>        <span class="value" title="cell"></span>
>    </span>
>    <span class="value">
>         +1-555-FORMATS
>         <span class="value" title="+15553676177"></span>
>    </span>
> </p>
>
> That's... messy. Value of Value is especially unpleasant, parsing the 
> value of tel without parsing the value of type as the value of tel 
> strikes me as complex (although, with value-excerpting itself not 
> fully spec'd, maybe it could be made to work).


It would be really hard for me to add the above to ufXtract. You're
right, nested Value of Value is especially unpleasant.

Adding the "Invisible Supplementary Data" idea, as below, should not be a
problem:
<span class="dtstart">Tomorrow lunchtime <span class="value"
title="2008-05-17T12:00:00+0100"></span></span> 
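
As a rough illustration only (this is not ufXtract's code), a parser could
read the pattern above along these lines. The function name
`supplementary_value` and the rule "prefer the title of an empty child with
class value" are assumptions made for this sketch. It uses Python's
xml.etree.ElementTree, so it expects well-formed markup:

```python
import xml.etree.ElementTree as ET


def supplementary_value(prop):
    """Return the machine value of a property element.

    Hypothetical rule for this sketch: an empty direct child with
    class "value" and a title attribute supplies the value; otherwise
    the visible text of the property is used.
    """
    for child in prop:
        classes = child.get("class", "").split()
        if "value" in classes:
            visible = "".join(child.itertext()).strip()
            if not visible and child.get("title"):
                return child.get("title")
    # No empty value element found: fall back to the visible text.
    return "".join(prop.itertext()).strip()


dtstart = ET.fromstring(
    '<span class="dtstart">Tomorrow lunchtime '
    '<span class="value" title="2008-05-17T12:00:00+0100"></span></span>'
)
print(supplementary_value(dtstart))
```

For the dtstart example the sketch returns the title of the empty span, and
for a property with no value child it returns the visible text.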

It looks like Cognition and Optimus are already picking up the invisible
supplementary data pattern for the geo class:
http://www.ufxtract.com/testsuite/experimental/experimental1.htm (Press
'Alt X' and run test)


Glenn Jones


From lists at ben-ward.co.uk  Thu May 22 07:08:30 2008
From: lists at ben-ward.co.uk (Ben Ward)
Date: Thu May 22 07:08:37 2008
Subject: [uf-dev] Specification of Value-Excerpting
Message-ID: <22689992-ADD3-4541-8FA0-5EF2BD61B7A0@ben-ward.co.uk>

Related to the machine-data documentation and
empty-element-value-excerpting <http://microformats.org/wiki/machine-data>,
I'd like to get proper documentation written on the value-excerpting
behaviour first described in hCard.

It's currently covered by a single paragraph in the hCard spec, which
is massively insufficient. It has also exposed issues lately about
pulling values out of nested microformats and even just out of
complexly nested properties.

I'm vaguely aware that Operator has implemented safety nets by not
parsing within other known microformats, but that seems to be a flawed
solution, as it depends on every parser knowing about every other
microformat. There's the ongoing class=mfo idea, which is a separate
solution to that problem, but we should get the parsing behaviour of
class=value (sans mfo) tightened up.

I've created an initial wiki page at
<http://microformats.org/wiki/value-excerption-pattern>, with basic
starting points of how it's supposed to work. However, I don't
understand the intricacies of existing implementations enough to fully
document parsing, so it's marked as being a "draft, don't publish this
yet" page.

One requirement for value-excerption that seems critical is that it
must be implementable without parsers having knowledge of every other
microformat. It needs to be possible to write an "hcard parser" that
stands alone and does not need to be updated every time a new
microformat is developed.

There are some other notes on that page too, and a "parsing to-do"
list that I'd encourage you all to add to, so that we can get this
fully defined and interoperable.

Thanks and regards,

Ben
From mkaply at us.ibm.com  Thu May 22 07:44:43 2008
From: mkaply at us.ibm.com (Michael Kaply)
Date: Thu May 22 07:45:14 2008
Subject: [uf-dev] Specification of Value-Excerpting
In-Reply-To: <22689992-ADD3-4541-8FA0-5EF2BD61B7A0@ben-ward.co.uk>
Message-ID: <OF4D16BB1D.994160B8-ON86257451.0050CCDB-86257451.0050FF95@us.ibm.com>

There is another page on the wiki about value excerpting somewhere besides
this one:

http://microformats.org/wiki/hCard#Value_excerpting

I don't know where it is, but I saw it at one point.

It specifically says that you are only supposed to use child nodes to get
the values, NOT descendants.
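
A minimal sketch of the "children only" reading, for illustration. The
function name `excerpt_value` is made up, the concatenation of multiple
value children is an assumption, and this is not Operator's implementation.
It only looks at direct children of the property element, so a
class="value" nested inside another property is never picked up:

```python
import xml.etree.ElementTree as ET


def excerpt_value(prop):
    """Value excerpting that considers direct children only."""
    values = [
        child for child in prop
        if "value" in child.get("class", "").split()
    ]
    if not values:
        # No value children: the whole text content is the value.
        return "".join(prop.itertext()).strip()
    # Assumption: several value children are concatenated in order.
    return "".join("".join(v.itertext()).strip() for v in values)


# Ben Ward's example from the "Value Excepting" thread.
tel = ET.fromstring("""
<p class="tel">
   <span class="type">
       Mobile Phone
       <span class="value" title="cell"></span>
   </span>
   <span class="value">
        +1-555-FORMATS
        <span class="value" title="+15553676177"></span>
   </span>
</p>
""")
print(excerpt_value(tel))
```

With this rule the value of tel is the text of its own value child, and
the value span nested inside type is ignored.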

Michael Kaply
Firefox Advocate
mkaply@us.ibm.com
http://www.kaply.com/weblog/ (External Blog)
http://blogs.tap.ibm.com/weblogs/page/mkaply@us.ibm.com (Internal Blog)
From mail at tobyinkster.co.uk  Thu May 22 10:13:50 2008
From: mail at tobyinkster.co.uk (Toby A Inkster)
Date: Thu May 22 10:14:04 2008
Subject: [uf-dev] Defining and Extending Value Excepting
Message-ID: <F6E0C315-C235-4944-A9A8-B9D8EDA9EFE5@tobyinkster.co.uk>

Glenn Jones:

> It looks like Cognition and Optimus are already picking up invisible
> supplementary data pattern for the geo class
> http://www.ufxtract.com/testsuite/experimental/experimental1.htm  
> (Press
> 'Alt X' and run test)

Actually, it's just a case of good luck. Cognition implements an  
extra optimisation for geo, which your example coincidentally  
triggers. It's documented here:

http://buzzword.org.uk/cognition/uf-plus.html#geo

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>


From glenn.jones at madgex.com  Fri May 23 02:18:27 2008
From: glenn.jones at madgex.com (Glenn Jones)
Date: Fri May 23 02:18:32 2008
Subject: [uf-dev] The correct format of a ISO date
Message-ID: <36A319113CF910438942741C4727ADFF01E978EB@MOBY.Clarence.local>

I have a question about using the Z character (Zulu time, i.e. UTC) and
a time zone offset together.

So these are correct  
2007-05-01T11:30:15Z
2007-05-01T11:30:15-08:00

What about 
2007-05-01T11:30:15Z-08:00

I cannot seem to find a clear pointer on whether the above is a valid
ISO date.

Background work
http://www.ufxtract.com/testsuite/documentation/iso-date-normalisation.htm
http://www.ufxtract.com/testsuite/hcard/hcard15.htm (Alt X to run test)


Examples of date formats I think are OK

2008-01-21
20080121
2007-05-01T11:30
2007-05-01 11:30
20070501 11:30
20070501T1130
2007-05-01T11:30:15
20070501T113015
2007-05-01T11:30Z-08:00
2007-05-01T11:30-08:00
2007-05-01T11:30+08:00
2007-05-01T11:30Z08:00
20070501T1130Z-0800
2007-05-01T11:30Z
2007-05
07-05-01 (equals 2007-05-01)
070501  (equals 2007-05-01)

The last one is interesting 
http://en.wikipedia.org/wiki/ISO_8601 ...
"Although the standard allows both the YYYY-MM-DD and YYYYMMDD formats
for complete calendar date representations, if the day [DD] is omitted
then only the YYYY-MM format is allowed. By disallowing dates of the
form YYYYMM, the standard avoids confusion with the truncated
representation YYMMDD (still often used)."


Glenn Jones 


From norm at cackhanded.net  Fri May 23 03:26:41 2008
From: norm at cackhanded.net (Mark Norman Francis)
Date: Fri May 23 03:26:46 2008
Subject: [uf-dev] The correct format of a ISO date
In-Reply-To: <36A319113CF910438942741C4727ADFF01E978EB@MOBY.Clarence.local>
References: <36A319113CF910438942741C4727ADFF01E978EB@MOBY.Clarence.local>
Message-ID: <10B5F88E-540A-4739-B9EC-086F21F607B6@cackhanded.net>

> 2007-05-01T11:30:15Z-08:00


I think that is incorrect, as it is an either-or. The timezone is:

     *   omitted, therefore local timezone
     *   Z, therefore UTC
     *   +/-HH:MM, therefore an offset from UTC

The W3C page on date/time formats
(<http://www.w3.org/TR/NOTE-datetime>) says:
> TZD = time zone designator (Z or +hh:mm or -hh:mm)

Also see <http://www.cl.cam.ac.uk/~mgk25/iso-time.html#zone> for more  
notes on the ISO standard.
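
The either-or rule can be sketched as a small check. This is only an
illustration: the name `timezone_kind` is made up, and the pattern covers
just the extended YYYY-MM-DDThh:mm[:ss] form with an optional designator,
not all of ISO 8601:

```python
import re

# Date and time, followed by at most ONE time zone designator:
# nothing, "Z", or a +hh:mm / -hh:mm offset.
_STAMP = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2})?"
    r"(?P<tzd>Z|[+-]\d{2}:\d{2})?$"
)


def timezone_kind(stamp):
    """Classify the time zone part as 'local', 'utc', 'offset' or 'invalid'."""
    match = _STAMP.match(stamp)
    if match is None:
        return "invalid"
    tzd = match.group("tzd")
    if tzd is None:
        return "local"
    return "utc" if tzd == "Z" else "offset"


for example in (
    "2007-05-01T11:30:15",
    "2007-05-01T11:30:15Z",
    "2007-05-01T11:30:15-08:00",
    "2007-05-01T11:30:15Z-08:00",
):
    print(example, timezone_kind(example))
```

Because the designator group can match only once, a string carrying both
Z and an offset fails the pattern and is reported as invalid.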

-- Norm.

From glenn.jones at madgex.com  Fri May 23 04:55:17 2008
From: glenn.jones at madgex.com (Glenn Jones)
Date: Fri May 23 04:55:26 2008
Subject: [uf-dev] The correct format of a ISO date
In-Reply-To: <10B5F88E-540A-4739-B9EC-086F21F607B6@cackhanded.net>
References: <36A319113CF910438942741C4727ADFF01E978EB@MOBY.Clarence.local>
	<10B5F88E-540A-4739-B9EC-086F21F607B6@cackhanded.net>
Message-ID: <36A319113CF910438942741C4727ADFF01E979D4@MOBY.Clarence.local>

Thanks Norm

> The W3C page on date/time formats
> (<http://www.w3.org/TR/NOTE-datetime>) says:
>> TZD = time zone designator (Z or +hh:mm or -hh:mm)

That's pretty clear; I will change my code and tests.

Glenn


From mkaply at us.ibm.com  Fri May 23 08:05:12 2008
From: mkaply at us.ibm.com (Michael Kaply)
Date: Fri May 23 08:08:36 2008
Subject: [uf-dev] The correct format of a ISO date
In-Reply-To: <36A319113CF910438942741C4727ADFF01E978EB@MOBY.Clarence.local>
Message-ID: <OF05662515.61167D7F-ON86257452.00528D9E-86257452.0052DFD9@us.ibm.com>

microformats-dev-bounces@microformats.org wrote on 05/23/2008 04:18:27 AM:

> Examples of date formats I think are OK
>
> 2008-01-21
> 20080121
> 2007-05-01T11:30
> 2007-05-01 11:30
> 20070501 11:30

I thought the T was required?

> 20070501T1130
> 2007-05-01T11:30:15
> 20070501T113015
> 2007-05-01T11:30Z-08:00

Definitely invalid - Z and offset are mutually exclusive

> 2007-05-01T11:30-08:00
> 2007-05-01T11:30+08:00
> 2007-05-01T11:30Z08:00
> 20070501T1130Z-0800

Definitely invalid - Z and offset are mutually exclusive

> 2007-05-01T11:30Z
> 2007-05
> 07-05-01 (equals 2007-05-01)
> 070501  (equals 2007-05-01)

I sincerely hope no one would ever actually do anything like this. I'm not
going to handle it in Operator.
I can't believe they even allow this. It's a specification, so they can say
"Always have the year".

I hate ambiguity in dates and I hate parsing ISO dates.

> The last one is interesting
> http://en.wikipedia.org/wiki/ISO_8601 ...
> "Although the standard allows both the YYYY-MM-DD and YYYYMMDD formats
> for complete calendar date representations, if the day [DD] is omitted
> then only the YYYY-MM format is allowed. By disallowing dates of the
> form YYYYMM, the standard avoids confusion with the truncated
> representation YYMMDD (still often used)."

Mike Kaply
From norm at cackhanded.net  Fri May 23 09:06:31 2008
From: norm at cackhanded.net (Mark Norman Francis)
Date: Fri May 23 09:36:57 2008
Subject: [uf-dev] The correct format of a ISO date
In-Reply-To: <OF05662515.61167D7F-ON86257452.00528D9E-86257452.0052DFD9@us.ibm.com>
References: <OF05662515.61167D7F-ON86257452.00528D9E-86257452.0052DFD9@us.ibm.com>
Message-ID: <E702CF90-DE1A-4130-A9A3-B777299EC4AB@cackhanded.net>

> I thought the T was required?


No, it can be omitted. Most sane people do not choose that format for  
on-the-wire data though. It's also one of our best practices at work  
to use the T format to remind lazy programmers that they cannot just  
echo out an ISO date string to end users.

-- Norm.

From mail at tobyinkster.co.uk  Fri May 23 11:27:31 2008
From: mail at tobyinkster.co.uk (Toby A Inkster)
Date: Fri May 23 11:27:49 2008
Subject: [uf-dev] The correct format of a ISO date
Message-ID: <B409FE0E-A846-44AD-8E53-53F884ACE67C@tobyinkster.co.uk>

Glenn Jones wrote:

> 07-05-01 (equals 2007-05-01)
> 070501  (equals 2007-05-01)

Thanks for these examples. Although the current version of Cognition  
parses these dates correctly, it marks the "resolution" of the dates  
as being "month" (because they only have 6 numeric digits), so when  
outputting them it will only output the year and month even though it  
knows the day internally. :-( Fix in the next release.
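
To illustrate the distinction (this is a sketch, not Cognition's code, and
`date_resolution` is a made-up name), the resolution can be decided from
the shape of the string rather than from the digit count alone, so that
six digits are treated as truncated YYMMDD and only YYYY-MM counts as a
month:

```python
import re


def date_resolution(text):
    """Return 'year', 'month' or 'day' for a date-only string."""
    if re.fullmatch(r"\d{4}", text):
        return "year"
    if re.fullmatch(r"\d{4}-\d{2}", text):
        return "month"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}|\d{8}", text):
        return "day"
    # Truncated two-digit-year forms: YY-MM-DD or YYMMDD.
    if re.fullmatch(r"\d{2}-\d{2}-\d{2}|\d{6}", text):
        return "day"
    raise ValueError("unrecognised date shape: %r" % text)


for example in ("2007", "2007-05", "2007-05-01", "07-05-01", "070501"):
    print(example, date_resolution(example))
```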

If you really want to test full ISO compatibility, then you should  
include:

	2008-W21
	2008W21
	2008-W21-5
	2008W215
	2008-144
	2008144

Plus "T..." variants (i.e. with times). Cognition supports them all  
because it uses the Perl DateTime::Format::ISO8601 module, which is  
fairly comprehensive.

But I don't think implementations should be expected to support the  
entire ISO8601 -- the W3CDTF note subset should be all that's required.
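
For anyone who does want to handle the week and ordinal forms listed above,
here is a minimal Python sketch. The function name
`parse_week_or_ordinal` is hypothetical, a week with no day is assumed to
mean its Monday, and `date.fromisocalendar` needs Python 3.8 or later:

```python
import re
from datetime import date, timedelta


def parse_week_or_ordinal(text):
    """Convert an ISO 8601 week date or ordinal date to a calendar date."""
    week = re.fullmatch(r"(\d{4})-?W(\d{2})(?:-?(\d))?", text)
    if week:
        year, week_no = int(week.group(1)), int(week.group(2))
        # Assumption: a week without a day resolves to its Monday.
        day = int(week.group(3)) if week.group(3) else 1
        return date.fromisocalendar(year, week_no, day)
    ordinal = re.fullmatch(r"(\d{4})-?(\d{3})", text)
    if ordinal:
        year, day_of_year = int(ordinal.group(1)), int(ordinal.group(2))
        return date(year, 1, 1) + timedelta(days=day_of_year - 1)
    raise ValueError("not a week or ordinal date: %r" % text)


for example in ("2008-W21", "2008W21", "2008-W21-5",
                "2008W215", "2008-144", "2008144"):
    print(example, parse_week_or_ordinal(example).isoformat())
```

The four forms that name a day all resolve to the same calendar date,
2008-05-23.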

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>


From norm at cackhanded.net  Fri May 23 11:51:14 2008
From: norm at cackhanded.net (Mark Norman Francis)
Date: Fri May 23 11:51:21 2008
Subject: [uf-dev] The correct format of a ISO date
In-Reply-To: <B409FE0E-A846-44AD-8E53-53F884ACE67C@tobyinkster.co.uk>
References: <B409FE0E-A846-44AD-8E53-53F884ACE67C@tobyinkster.co.uk>
Message-ID: <86A6A206-396F-4EC1-B170-3201B1F7DF1F@cackhanded.net>

>> 07-05-01 (equals 2007-05-01)
>> 070501  (equals 2007-05-01)
>
> Thanks for these examples. Although the current version of Cognition  
> parses these dates correctly, it marks the "resolution" of the dates  
> as being "month" (because they only have 6 numeric digits), so when  
> outputting them will only output the year and month even though it  
> knows the day internally. :-( Fix in the next release.

Actually, according to the Wikipedia page on ISO 8601:
> ISO 8601 prescribes, as a minimum, a four-digit year [YYYY] to avoid  
> the year 2000 problem.

Not having a personal copy of 8601 to check, I can't verify this, but  
it seems wise to me. ;)

-- Norm.

From mail at tobyinkster.co.uk  Fri May 23 12:20:12 2008
From: mail at tobyinkster.co.uk (Toby A Inkster)
Date: Fri May 23 12:20:31 2008
Subject: [uf-dev] The correct format of a ISO date
Message-ID: <36DE9D7A-DF34-4049-9CFA-B998D815CE36@tobyinkster.co.uk>

Mark Norman Francis wrote:

> Actually, according to the Wikipedia page on ISO 8601:
> > ISO 8601 prescribes, as a minimum, a four-digit year [YYYY] to avoid
> > the year 2000 problem.

The confusion is due to the fact that there are three editions of ISO  
8601:

	ISO 8601:1988 (E)
	ISO 8601:2000 (E)
	ISO 8601:2004 (E)

The first two allow two-digit years. The most recent edition  
disallows them, but IIRC parsers are still expected to accept them,  
as they may be produced by legacy ISO 8601 code.

Also worth consideration are date formats like:

	--05-23	(Day and month; year not specified)
	---23 (Day; month and year not specified)
	-145 (Ordinal day; year not specified)
	-W21-5 (Week and day; year not specified)

Oh, and commas are allowed to be used as decimal points. Oh, and  
decimals are not just allowed after seconds, but also after minutes  
and hours. It is for these reasons that we really must specify a  
subset of ISO 8601 -- the W3CDTF subset would be ideal.

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>


From scott at randomchaos.com  Fri May 23 12:54:58 2008
From: scott at randomchaos.com (Scott Reynen)
Date: Fri May 23 12:55:06 2008
Subject: [uf-dev] The correct format of a ISO date
In-Reply-To: <36DE9D7A-DF34-4049-9CFA-B998D815CE36@tobyinkster.co.uk>
References: <36DE9D7A-DF34-4049-9CFA-B998D815CE36@tobyinkster.co.uk>
Message-ID: <DAAB3293-FD49-44CF-B9E0-F48BBAD2014F@randomchaos.com>

On May 23, at 1:20, Toby A Inkster wrote:

> The confusion is due to the fact that there are three editions of  
> ISO 8601:


The datetime design pattern page [1] in the wiki says:

"Any microformat using the date-time-design pattern should use a  
profile of ISO8601. There are currently two widely used profiles which  
should be reused.
- RFC 3339
- W3C Note on Datetimes"

That seems to clear this up, but then there's more confusing language  
on the ISO 8601 page [2]:

"Microformats should use RFC 3339."

In addition to being more specific than the previous recommendation,  
this one applies RFC 2119 "should" to the microformat itself rather  
than implementors of the microformat, which doesn't make much sense.   
Further confusing matters, individual microformats make no mention of  
RFC 3339, referring only to ISO 8601.  We should probably clarify the  
actual source(s) for date formats before we spend too much time  
testing them.

[1] http://microformats.org/wiki/datetime-design-pattern
[2] http://microformats.org/wiki/iso-8601

Peace,
Scott

From mail at tobyinkster.co.uk  Fri May 23 14:13:31 2008
From: mail at tobyinkster.co.uk (Toby A Inkster)
Date: Fri May 23 14:13:39 2008
Subject: [uf-dev] The correct format of a ISO date
Message-ID: <E5C7EA90-7098-478A-A79E-CE0742D20FC4@tobyinkster.co.uk>

Scott Reynen wrote:

> In addition to being more specific than the previous recommendation,
> this one applies RFC 2119 "should" to the microformat itself rather
> than implementors of the microformat, which doesn't make much sense.
> Further confusing matters, individual microformats make no mention of
> RFC 3339, referring only to ISO 8601.

I raised this very issue a couple of months ago:
http://microformats.org/discuss/mail/microformats-discuss/2008-March/011712.html

In short the datetime design pattern says that microformats making  
use of it must define a profile (i.e. subset) of ISO 8601 that is  
supported. But none do.

I've tried to address this in my experimental hCalendar 1.1 spec:
http://microformats.org/wiki/User:TobyInk/hcalendar-1.1#Dates_and_Times

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>


From glenn.jones at madgex.com  Sun May 25 09:46:25 2008
From: glenn.jones at madgex.com (Glenn Jones)
Date: Sun May 25 09:46:32 2008
Subject: [uf-dev] The correct format of a ISO date
Message-ID: <36A319113CF910438942741C4727ADFF01EEF3B1@MOBY.Clarence.local>

I think I now have a handle on this date stuff. Thanks for everyone's
comments. I have to say that the documentation on using dates and times
in Microformats is not clear at the moment. Pointing people at
ISO 8601 is not a good idea.

Toby's point about specifying the profile for each usage in the wiki
is important. Maybe all the language on the wiki should be about the
profiles. How about also changing it over to examples and use cases
rather than pointing people at dry specs?

A couple of smaller points are still outstanding. For example,
specifying RFC 3339 plus a rule that 'T' and 'Z' MUST be capitals has
been suggested in the past, but then it's no longer RFC 3339.


So here's my new take:

http://www.ufxtract.com/testsuite/documentation/iso-date.htm
New test pages
http://www.ufxtract.com/testsuite/hcard/hcard15.htm
http://www.ufxtract.com/testsuite/hcard/hcard16.htm


W3C Note datetime profile - valid structures 
2007
2007-05
2007-05-01T11:30
2007-05-01T11:30Z
2007-05-01T11:30:00Z
2007-05-01T11:30+08:00
2007-05-01T11:30:00+08:00
2007-05-01T11:30:00.0135


RFC 3339 profile - valid structures 	
2007
2007-05
2007-05-01T11:30
2007-05-01T11:30Z
2007-05-01T11:30:00Z
2007-05-01T11:30+08:00
2007-05-01T11:30:00+08:00
2007-05-01T11:30:00.0135
200801
20080121
20070501T1130
20070501T113015
20070501T113015Z
20070501t113025z
2007-05-01T113025
20070501T11:30:25  

	
Valid ISO 8601 date time that SHOULD NOT be used in Microformats
070501	
07-05-01
20070501 1130
20070501 113015Z
2007-05-01 11:30:00+08:00
2007-05-01 11:30:00.0135
2007-05-01T11.0150
2007-05-01T11:30.0150
2008-W21
2008W21
2008-W21-5
2008W215
2008-144
2008144
etc...


Glenn Jones