From lists at ben-ward.co.uk  Fri Apr  4 03:08:48 2008
From: lists at ben-ward.co.uk (Ben Ward)
Date: Fri Apr  4 03:08:52 2008
Subject: [uf-dev] Re: [uf-discuss] jCard draft
In-Reply-To: <47F5750D.8030708@onlinehome.de>
References: <47F5750D.8030708@onlinehome.de>
Message-ID: <F4BD1A7F-9EB5-4513-A909-21BF9C41FBF3@ben-ward.co.uk>

On 4 Apr 2008, at 01:23, Gordon Oheim wrote:

> I have added a preliminary draft for a possible jCard specification  
> to the wiki at http://microformats.org/wiki/jcard.
> The content is based on what I read from the discussion list so far.  
> The intention was to have a reference for further discussion and for  
> solidifying a candidate for a jCard standard.

Hi,

This is great work, and it's something that I found a number of  
developers asking about during South By South West. I think it was  
Glenn Jones suggesting that we're now at a point with parser maturity  
that some thought needs to be given to having interoperable JSON  
structures.

I have two points of initial followup, one with my admin hat on, the  
other without.

1. ADMIN: This discussion should probably take place on the  
microformats-dev mailing list, rather than -discuss. It should come to  
the attention of all parser developers that way, and hopefully stay  
focused on this very parser-centric work. I've cross posted this  
thread to microformats-dev@microformats.org; please continue the  
development discussion there.

2. In my view: While I'm totally supportive and in favour of this work,
I think 'jCard' is a bad name for it. I think this work would be better
presented connected to the hCard specification itself (and future
equivalents for the other microformats too), whether that ends up as an
'Object Model' section of the relevant specs or as new documents (e.g.
hcard-object-model). It doesn't need its own, separate format name;
it's really further specifying hCard itself.

What's more, whilst JSON is the obvious driver technology for this
work, I think it would make more sense to produce an
implementation-agnostic Object Model that would work in JSON, XML, YAML
or whatever other transport people might want to implement for. I think
it's unlikely we'd want to specify 'jCard', 'xCard', 'yCard' and so on.

> Please forgive my poor wiki editing skills and feel free to add to  
> the page.

The page is off to a great start! Keep it up.

Thanks,

Ben
From donohoe at nytimes.com  Wed Apr  9 13:38:43 2008
From: donohoe at nytimes.com (michael)
Date: Wed Apr  9 13:38:47 2008
Subject: [uf-dev] Feedback on XFN implementation
Message-ID: <8ebe8ca30804091338y71222a91md85e6c5a4c76504c@mail.gmail.com>

Hello,

I'm trying to get some initial feedback on XFN support for a project I
am working on.

I've included some sample markup from a user's page. Essentially there are
three components:

1. User info (name and basic summary)
2. A list of actions/activities from the user and other people in their network
3. A list of people within the user's network (this can also include the user)

There really aren't any levels of friend designation, and we expect
that the user will not really know in RL many of the people in their
network. From that perspective I use the designation "acquaintance"
only.

With that in mind, does the following seem appropriate (ignore href
values as they're all bogus):

<div id="person">
	<table width="100%">
		<tr>
			<td width="70">
				<a href="/profileid/1/index.html"><img src="path_to_image.jpg"
width="64" height="64" /></a>
			</td>
			<td>
				<h4><a rel="me" href="blah/index.html">Michael</a></h4>
				<p>NYC&nbsp;</p>
			</td>
		</tr>
	</table>
</div>
...
<!-- this is a table with a list of users in your list -->
<table>
	<tr>
	<td class="picon"><a href="blah/index.html"><img class="iconMedium"
src="path_to_image.jpg" width="30" height="30" title="John Coleman"
/></a></td>
	<td class="bold"><a rel="acquaintance"
href="/view/user/3456266/1/index.html">John Coleman</a></td>
	<td class="plocation">Earth</td>
	</tr>
	<tr>
	<td class="picon"><a href="blah/index.html"><img class="iconMedium"
src="path_to_image" width="30" height="30" title="Shane Sweeney"
/></a></td>
	<td class="bold"><a rel="acquaintance"
href="/view/user/237187/1/index.html">Shane Sweeney</a></td>
	<td class="plocation">New York</td>
	</tr>
	<tr>
	<td class="picon"><a href="blah/index.html"><img class="iconMedium"
src="path_to_image" width="30" height="30" title="Nick Burke"
/></a></td>
	<td class="bold"><a rel="acquaintance"
href="/view/user/50640219/1/index.html">Nick Burke</a></td>
	<td class="plocation">Austin, TX</td>
	</tr>
<tr>
...
<table id="list">
	<tr>
		<td width='18'><a href="blah/index.html"><img src="someimage.jpg"
width="16" height="16" /></a></td>
		<td>
			<div><a rel="acquaintance" href="blah/unique/index.html">John
Coleman</a> recommended something: <a
href="http://www.yahoo.com">Yahoo</a></div>
			<div id="52396054" class="summary" style=""><span>This is a web
site </span></div>
	
		</td>
		<td class="toggle"><span class="timestamp">Apr, 1 2008</span></td>
	</tr>
	<tr>
		<td width='18'><a href="blah/index.html"><img src="someimage.jpg"
width="16" height="16" /></a></td>
		<td>
			<div><a rel="me" href="blah/index.html">Michael</a> recommended
another thing: <a href="http://www.someurl.com/coffee.html">Something
about Coffee</a></div>
			<div id="52396054" class="summary" style=""><span>This is a summary
with description information.</span></div>
	
		</td>
		<td class="toggle"><span class="timestamp">Apr, 1 2008</span></td>
	</tr>
</table>
...

Thoughts and feedback appreciated!

-Michael
From julian_bond at voidstar.com  Fri Apr 11 04:12:46 2008
From: julian_bond at voidstar.com (Julian Bond)
Date: Fri Apr 11 04:13:40 2008
Subject: [uf-dev] Parsing XFN in PHP
Message-ID: <2P$lv5Due0$HFApB@jblaptop.voidstar.com>

Continuing a thread that started on the Discuss list.

My experiments have led me to 2 approaches depending on PHP release.
First, PHP 5, with error handling left as an exercise for the reader:

$url = 'http://ciaranmcnulty.com/';
if($html = @file_get_contents($url)){
  $dom = new DomDocument();
  if(@$dom->loadHtml($html)){
    if ($nodes = $dom->getElementsByTagName('a')) {
      foreach($nodes as $node){
        if ($node->getAttribute('rel')=='me') {
          echo $node->getAttribute('href');
        }
      }
    }
  }
}

Pretty easy, huh? Clearly this same approach could be used for other
values of rel. It's probably not too hard to extend this approach to
find hCard and other uFs.
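
One way the extension to other rel values might look, as a rough sketch. The helper name find_rel_links() and the sample markup are made up for illustration, and the HTML is inlined so nothing is fetched. It also treats rel as a space-separated list of values (as in rel="friend met"), which a plain == 'me' comparison would miss:

```php
<?php
// Hypothetical helper (not from the thread): collect the href of every
// <a> whose rel attribute contains $wanted as one of its
// whitespace-separated tokens.
function find_rel_links($html, $wanted)
{
    $links = array();
    $dom = new DOMDocument();
    // Suppress parser warnings about sloppy markup, as in the code above.
    if (!@$dom->loadHTML($html)) {
        return $links;
    }
    foreach ($dom->getElementsByTagName('a') as $node) {
        // rel may hold several values, e.g. rel="friend met colleague".
        $rel = strtolower(trim($node->getAttribute('rel')));
        $tokens = preg_split('/\s+/', $rel, -1, PREG_SPLIT_NO_EMPTY);
        if (in_array($wanted, $tokens, true)) {
            $links[] = $node->getAttribute('href');
        }
    }
    return $links;
}

// Inline sample markup so the sketch runs without a network fetch.
$sample = '<p><a rel="me" href="http://example.com/me">Me</a> '
        . '<a rel="friend met" href="http://example.com/bob">Bob</a> '
        . '<a href="http://example.com/other">Other</a></p>';

print_r(find_rel_links($sample, 'me'));
print_r(find_rel_links($sample, 'met'));
```

Matching on tokens rather than the whole attribute is a design choice for this sketch, not something the original code does.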

loadHtml() doesn't exist in php4 dom-xml. In theory it should be
possible to use HTML-Tidy tidy_repair_string to clean the html first and
then feed it to domxml_open_mem. In practice, I'm having real trouble
getting the right collection of tidy_repair_string configuration
parameters to generate clean enough XML for dom to accept it. If that
can be done, then this should work.

$url = 'http://ciaranmcnulty.com/';
if($html = @file_get_contents($url)){
  $html = @tidy_repair_string($html);
  if ($dom = @domxml_open_mem($html)) {
    if ($nodes = $dom->get_elements_by_tagname('a')) {
      foreach($nodes as $node){
        if ($node->get_attribute('rel')=='me') {
          echo $node->get_attribute('href');
        }
      }
    }
  }
}

Typical errors are things like:-
- Space required after the Public Identifier
- SystemLiteral " or ' expected
- xmlParseExternalID: PUBLIC, no URI in
- invalid entity nbsp
Maybe, it's possible to get Tidy's output to avoid all these but I
haven't managed it yet. I had a look at hkit but that makes no attempt
to configure the Tidy module so I'd expect lots of problems when trying
to parse arbitrary web pages.

-- 
Julian Bond  E&MSN: julian_bond at voidstar.com  M: +44 (0)77 5907 2173
Webmaster:          http://www.ecademy.com/      T: +44 (0)192 0412 433
Personal WebLog:    http://www.voidstar.com/     skype:julian.bond?chat
                           Tastes Like Milk
From mark at markng.me.uk  Fri Apr 11 04:36:03 2008
From: mark at markng.me.uk (Mark Ng)
Date: Fri Apr 11 04:36:12 2008
Subject: [uf-dev] Parsing XFN in PHP
In-Reply-To: <2P$lv5Due0$HFApB@jblaptop.voidstar.com>
References: <2P$lv5Due0$HFApB@jblaptop.voidstar.com>
Message-ID: <d6fe3b060804110436h78192a6eod774a24fd21da64d@mail.gmail.com>

$html = tidy_repair_string($html, array('output-xhtml' => true,
'numeric-entities' => 'true')); was what I was using - does it work
for you?
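
Spelled out a little more, that call might be used like this. This is only a sketch: it assumes the tidy extension is installed, the sample markup is invented, and it feeds the result to PHP 5's DOMDocument::loadXML() rather than the PHP 4 domxml_open_mem(), since domxml is not available in later releases.

```php
<?php
// Sketch only: needs the tidy extension; the sample markup is made up.
$html = '<p>NYC&nbsp;<a rel="me" href="/me">Michael</a>';
$hrefs = array();

if (function_exists('tidy_repair_string')) {
    // output-xhtml asks Tidy for well-formed XHTML; numeric-entities
    // avoids named entities such as &nbsp;, which an XML parser
    // rejects unless the DTD that defines them has been loaded.
    $xhtml = tidy_repair_string($html, array(
        'output-xhtml'     => true,
        'numeric-entities' => true,
    ));

    $dom = new DOMDocument();
    if (@$dom->loadXML($xhtml)) {
        foreach ($dom->getElementsByTagName('a') as $node) {
            $hrefs[] = $node->getAttribute('href');
        }
    }
    print_r($hrefs);
} else {
    echo "tidy extension not available\n";
}
```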

Mark

On 11/04/2008, Julian Bond <julian_bond@voidstar.com> wrote:
> Continuing a thread that started on the Discuss list.
>
>  My experiments have led me to 2 approaches depending on PHP release.
>  First php5. With error handling left as an exercise for the reader
>
>
>  $url = 'http://ciaranmcnulty.com/';
>  if($html = @file_get_contents($url)){
>   $dom = new DomDocument();
>   if(@$dom->loadHtml($html)){
>
>     if ($nodes = $dom->getElementsByTagName('a')) {
>       foreach($nodes as $node){
>         if ($node->getAttribute('rel')=='me') {
>           echo $node->getAttribute('href');
>         }
>       }
>     }
>   }
>  }
>
>  Pretty easy, huh? Clearly this same approach could be used for other
>  values of rel= It's probably not too hard to extend this approach to
>  find hCard and other uFs.
>
>  loadHtml() doesn't exist in php4 dom-xml. In theory it should be
>  possible to use HTML-Tidy tidy_repair_string to clean the html first and
>  then feed it to domxml_open_mem. In practice, I'm having real trouble
>  getting the right collection of tidy_repair_string configuration
>  parameters to generate clean enough XML for dom to accept it. If that
>  can be done, then this should work.
>
>
>  $url = 'http://ciaranmcnulty.com/';
>  if($html = @file_get_contents($url)){
>
>   $html = @tidy_repair_string($html);
>   if ($dom = @domxml_open_mem($html)) {
>     if ($nodes = $dom->get_elements_by_tagname('a')) {
>       foreach($nodes as $node){
>         if ($node->get_attribute('rel')=='me') {
>           echo $node->get_attribute('href');
>         }
>       }
>     }
>   }
>  }
>
>  Typical errors are things like:-
>  - Space required after the Public Identifier
>  - SystemLiteral " or ' expected
>  - xmlParseExternalID: PUBLIC, no URI in
>  - invalid entity nbsp
>  Maybe, it's possible to get Tidy's output to avoid all these but I
>  haven't managed it yet. I had a look at hkit but that makes no attempt
>  to configure the Tidy module so I'd expect lots of problems when trying
>  to parse arbitrary web pages.
From foolistbar at googlemail.com  Fri Apr 11 04:45:03 2008
From: foolistbar at googlemail.com (Geoffrey Sneddon)
Date: Fri Apr 11 04:52:51 2008
Subject: [uf-dev] Parsing XFN in PHP
In-Reply-To: <slv1d5-4a1.ln1@ophelia.g5n.co.uk>
References: <rdrCr8D7C2+HFADe@jblaptop.voidstar.com>
	<cdc278e10804080638l2f21f69cq8080eb2b80f32fae@mail.gmail.com>
	<s8ZTDcAx93+HFAwd@jblaptop.voidstar.com>
	<d9FPD$BGUG$HFAO8@jblaptop.voidstar.com>
	<73766b160804091118t1c5ad3bbof0bc5456898c2d1a@mail.gmail.com>
	<xHGk9$CIIc$HFAoc@jblaptop.voidstar.com>
	<cdc278e10804100203j44018259xcc9e03c1b4747932@mail.gmail.com>
	<006f01c89afa$b5afadb0$116bacca@COMCEN>
	<DrooWwBxJg$HFANI@jblaptop.voidstar.com>
	<d6fe3b060804100540i439baf44hb467c728bd2a6f8c@mail.gmail.com>
	<cdc278e10804100601u4a383962y7fbc4d53b1f923a2@mail.gmail.com>
	<D98F769B-8848-483C-AED5-4E5C38048584@gmail.com>
	<slv1d5-4a1.ln1@ophelia.g5n.co.uk>
Message-ID: <11156B01-48DD-435A-BFE4-F41F1CE661CE@googlemail.com>


On 10 Apr 2008, at 18:34, Toby A Inkster wrote:
> Ryan Parman wrote:
>
>> "But we can do it in web browsers!" What do web browsers have that  
>> PHP
>> developers don't? An HTML parser. As far as I know there are no HTML
>> parsers written for PHP (or any other language that I'm aware of).
>
> http://www.php.net/manual/en/function.dom-domdocument-loadhtml.php

That doesn't really work. libxml2's HTML parsing is nothing like what  
is actually needed for real world compatibility. Just take a look at  
things like <b><i>foo</b>bar</i>, or <plaintext>foo</plaintext><b>bar.
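
For anyone who wants to see this for themselves, a small sketch that dumps what DOMDocument (libxml2) makes of those two fragments. The resulting tree depends on the libxml2 version in use, so no expected output is given here; compare it with what a browser's DOM inspector shows for the same markup.

```php
<?php
// Illustrative only: serialise the tree libxml2 builds for the two
// fragments mentioned above.
libxml_use_internal_errors(true); // collect parse warnings quietly

$samples = array(
    '<b><i>foo</b>bar</i>',
    '<plaintext>foo</plaintext><b>bar',
);

$results = array();
foreach ($samples as $fragment) {
    $dom = new DOMDocument();
    $dom->loadHTML($fragment);
    $results[$fragment] = $dom->saveHTML();
    echo $fragment, "\n  => ", trim($results[$fragment]), "\n";
    libxml_clear_errors();
}
```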


On 11 Apr 2008, at 08:33, Toby A Inkster wrote:
> Another option is XML_HTMLSax3 from PEAR:
> http://pear.php.net/package/XML_HTMLSax3

This really seems like nothing more than a subset of SGML similar to  
XML, and is therefore equally useless at parsing HTML. See the above  
two examples again, as well as things like <b<i>hi</i></b> (note the  
omitted >).

Real world HTML content really does rely on specific parsing rules,  
and attempting to deviate from them will just result in issues. In  
terms of anything useful, you'd really need to implement your own HTML  
parser, likely starting from HTML 5. Then you can run into issues with  
DOM requiring XML well-formedness, so you can't have as a localName  
"a@" (to reuse the example on public-html a few days ago, you need to  
parse <a@> <a#> </a@> correctly, despite all those tags having  
characters that you can't legally store in the DOM)


--
Geoffrey Sneddon
<http://gsnedders.com/>

From julian_bond at voidstar.com  Fri Apr 11 05:09:15 2008
From: julian_bond at voidstar.com (Julian Bond)
Date: Fri Apr 11 05:10:21 2008
Subject: [uf-dev] Parsing XFN in PHP
In-Reply-To: <d6fe3b060804110436h78192a6eod774a24fd21da64d@mail.gmail.com>
References: <2P$lv5Due0$HFApB@jblaptop.voidstar.com>
	<d6fe3b060804110436h78192a6eod774a24fd21da64d@mail.gmail.com>
Message-ID: <vfuj7pGrT1$HFAbo@jblaptop.voidstar.com>

Mark Ng <mark@markng.me.uk> Fri, 11 Apr 2008 12:36:03
>$html = tidy_repair_string($html,array('output-xhtml' => true,
>'numeric-entities' => 'true', )); was what I was using - does it work
>for you ?

I must have been getting tired last night. I'm sure I tried that. But 
today it's handling everything I can throw at it.

My test rig is here
http://www.voidstar.com/xfnexplorer

-- 
Julian Bond  E&MSN: julian_bond at voidstar.com  M: +44 (0)77 5907 2173
Webmaster:          http://www.ecademy.com/      T: +44 (0)192 0412 433
Personal WebLog:    http://www.voidstar.com/     skype:julian.bond?chat
                    No Wife, No Horse, No Moustache
From ryan.lists.warpshare at gmail.com  Fri Apr 11 09:38:54 2008
From: ryan.lists.warpshare at gmail.com (Ryan Parman)
Date: Fri Apr 11 09:39:02 2008
Subject: [uf-dev] Fwd: (Off-list) Parsing XFN in PHP
References: <11156B01-48DD-435A-BFE4-F41F1CE661CE@googlemail.com>
Message-ID: <7EE313F0-F45C-420F-BBFB-31C6AECED526@gmail.com>

Forwarding Geoffrey's off-list message sent to the original thread:


Begin forwarded message:

> From: Geoffrey Sneddon <foolistbar@googlemail.com>
> Date: April 11, 2008 4:45:03 AM PDT
> To: Toby A Inkster <mail@tobyinkster.co.uk>, Ryan Parman <ryan.lists.warpshare@gmail.com>
> Subject: Re: (Off-list) Parsing XFN in PHP
>
>
> On 10 Apr 2008, at 18:34, Toby A Inkster wrote:
>> Ryan Parman wrote:
>>
>>> "But we can do it in web browsers!" What do web browsers have that  
>>> PHP
>>> developers don't? An HTML parser. As far as I know there are no HTML
>>> parsers written for PHP (or any other language that I'm aware of).
>>
>> http://www.php.net/manual/en/function.dom-domdocument-loadhtml.php
>
> That doesn't really work. libxml2's HTML parsing is nothing like
> what is actually needed for real world compatibility. Just take a
> look at things like <b><i>foo</b>bar</i>, or
> <plaintext>foo</plaintext><b>bar.
>
>
> On 11 Apr 2008, at 08:33, Toby A Inkster wrote:
>> Another option is XML_HTMLSax3 from PEAR:
>> http://pear.php.net/package/XML_HTMLSax3
>
> This really seems like nothing more than a subset of SGML similar to  
> XML, and is therefore equally useless at parsing HTML. See the above  
> two examples again, as well as things like <b<i>hi</i></b> (note the  
> omitted >).
>
> Real world HTML content really does rely on specific parsing rules,  
> and attempting to deviate from them will just result in issues. In  
> terms of anything useful, you'd really need to implement your own  
> HTML parser, likely starting from HTML 5. Then you can run into  
> issues with DOM requiring XML well-formedness, so you can't have as  
> a localName "a@" (to reuse the example on public-html a few days  
> ago, you need to parse <a@> <a#> </a@> correctly, despite all those  
> tags having characters that you can't legally store in the DOM)
>
>
> --
> Geoffrey Sneddon
> <http://gsnedders.com/>
>

From gordon at onlinehome.de  Sat Apr 12 02:54:02 2008
From: gordon at onlinehome.de (Gordon Oheim)
Date: Sat Apr 12 02:54:06 2008
Subject: [uf-dev] Finalizing jCard
Message-ID: <480086BA.3000108@onlinehome.de>

Hi all,

the discussion about a standardized jCard output format seems to have 
slept in a bit - so I am here to revive it.

I'd say we are pretty much done with the specs, but there is one major
point missing (see Section 2.2 in the wiki).
Can we vote on whether Arrays or Objects may be reduced when they
contain only a single property?

Though reducing Objects and Arrays would benefit a more compact JSON 
format, it would also require a little bit more business logic in the 
receiving system.
+1 Enclosing Arrays or Objects must NOT be reduced.
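
To make the trade-off concrete, a small sketch. The data is hypothetical: the property names follow hCard, but the actual structure is whatever the wiki draft settles on, and reduce_single() and as_list() are made-up helper names.

```php
<?php
// Hypothetical parser output: tel has one value but stays an array.
$card = array(
    'fn'  => 'John Coleman',
    'tel' => array('+1 555 0100'),
);

// Not reduced: a consumer can always iterate over tel.
echo json_encode($card), "\n";
// prints {"fn":"John Coleman","tel":["+1 555 0100"]}

// Reduced: a single-item list collapses to its only value ...
function reduce_single($value)
{
    $single = is_array($value) && count($value) === 1 && isset($value[0]);
    return $single ? $value[0] : $value;
}
$reduced = array_map('reduce_single', $card);
echo json_encode($reduced), "\n";
// prints {"fn":"John Coleman","tel":"+1 555 0100"}

// ... so every consumer needs a guard like this before looping.
// (Simplified: it does not distinguish lists from nested objects.)
function as_list($value)
{
    return is_array($value) ? $value : array($value);
}
foreach (as_list($reduced['tel']) as $tel) {
    echo $tel, "\n";
}
```

The guard in as_list() is the "little bit more business logic" that every receiving system would have to carry if reduction were allowed.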

Cheers, Gordon

Wiki Page: http://microformats.org/wiki/jcard
From lists at ben-ward.co.uk  Mon Apr 14 07:12:55 2008
From: lists at ben-ward.co.uk (Ben Ward)
Date: Mon Apr 14 07:13:08 2008
Subject: [uf-dev] Finalizing jCard
In-Reply-To: <480086BA.3000108@onlinehome.de>
References: <480086BA.3000108@onlinehome.de>
Message-ID: <B2CAFE18-7E81-4D1D-B9C1-B60EDEAB40B5@ben-ward.co.uk>

On 12 Apr 2008, at 10:54, Gordon Oheim wrote:
> the discussion about a standardized jCard output format seems to  
> have slept in a bit - so I am here to revive it.
>
> I'd say we are pretty much done with the specs, but there is one  
> major point missing (see Section 2.2 in the wiki).
> Can we do a vote on whether Arrays or Objects may be reduced in case  
> they only contain a single property.

I think it's somewhat premature to suggest that we're 'pretty much
done with' the specs. I'd like to see input from Mike Kaply, Glenn
Jones, Brian Suda, Drew McLellan and David Janes (if he has time!)
since they all work on parsers too.

Any attempt to standardise the object model of microformats is going
to need their assistance, and they're also amongst the most
experienced working with parsing. It's important they're given an
opportunity to raise their own issues before this work gets pushed
into finalisation.

Ben
From dmitry at baranovskiy.com  Mon Apr 14 15:19:24 2008
From: dmitry at baranovskiy.com (Dmitry Baranovskiy)
Date: Mon Apr 14 15:19:27 2008
Subject: [uf-dev] Finalizing jCard
In-Reply-To: <B2CAFE18-7E81-4D1D-B9C1-B60EDEAB40B5@ben-ward.co.uk>
References: <480086BA.3000108@onlinehome.de>
	<B2CAFE18-7E81-4D1D-B9C1-B60EDEAB40B5@ben-ward.co.uk>
Message-ID: <8a52ddad0804141519l40833892sc0efb6b925832d7d@mail.gmail.com>

Just some input from me: +1 Enclosing Arrays or Objects must NOT be reduced.
I implemented it the opposite way in Optimus, but I am pretty sure it is
time to change it.
From brian.suda at gmail.com  Tue Apr 15 01:25:34 2008
From: brian.suda at gmail.com (Brian Suda)
Date: Tue Apr 15 01:25:37 2008
Subject: [uf-dev] Finalizing jCard
In-Reply-To: <B2CAFE18-7E81-4D1D-B9C1-B60EDEAB40B5@ben-ward.co.uk>
References: <480086BA.3000108@onlinehome.de>
	<B2CAFE18-7E81-4D1D-B9C1-B60EDEAB40B5@ben-ward.co.uk>
Message-ID: <21e770780804150125m65d0ecfemf3cd078dc39d7d90@mail.gmail.com>

2008/4/14, Ben Ward <lists@ben-ward.co.uk>:
> On 12 Apr 2008, at 10:54, Gordon Oheim wrote:
>
> > the discussion about a standardized jCard output format seems to have
> slept in a bit - so I am here to revive it.
--- my first suggestion is not to call it jCard, but something more
like JSON output of vCard or JSON to hCard mapping. As Ben said
earlier, if we start using jCard, then we'll have xCard, aCard,
pCard... all meaningless words. The same json mappings we make for
hCard will be effective for hCalendar, hReview, etc. so the
terminology should reflect this.

>  I think it's somewhat premature to suggest that we're 'pretty much done
> with' the specs.

--- i am not a JSON expert, so i can't weigh in on specifics, but
here's what i would suggest to help move things along.

Have a look at the current test suite. It has HTML and .vcf/.ics
output for the pages.
http://hg.microformats.org/tests

We should also create a .json output as well. Then we can have a
better point of discussion around real examples. This will help
clear-up any outstanding issues and at the same time give various
developers something to test their own code against.

>  Any attempt to standardise the object model of microformats is going to
> need their assistance, and they're also amongst the most experienced working
> with parsing. It's important they're give an opportunity to raise their own
> issues before this work gets pushed into finalisation.

--- i think the sample .json output from the tests will really help.
Without that, it is difficult to discuss exact parsing rules and
expected behaviours.

-brian

-- 
brian suda
http://suda.co.uk
From drew.mclellan at gmail.com  Tue Apr 15 01:42:56 2008
From: drew.mclellan at gmail.com (Drew McLellan)
Date: Tue Apr 15 01:42:59 2008
Subject: [uf-dev] Finalizing jCard
In-Reply-To: <21e770780804150125m65d0ecfemf3cd078dc39d7d90@mail.gmail.com>
References: <480086BA.3000108@onlinehome.de>
	<B2CAFE18-7E81-4D1D-B9C1-B60EDEAB40B5@ben-ward.co.uk>
	<21e770780804150125m65d0ecfemf3cd078dc39d7d90@mail.gmail.com>
Message-ID: <83a9a59b0804150142g4ccf15d1r2a69da32d1e3a93d@mail.gmail.com>

On 15/04/2008, Brian Suda <brian.suda@gmail.com> wrote:
>
> > > the discussion about a standardized jCard output format seems to have
> > slept in a bit - so I am here to revive it.
>
> --- my first suggestion is not to call it jCard, but something more
> like JSON output of vCard or JSON to hCard mapping. As Ben said
> earlier, if we start using jCard, then we'll have xCard, aCard,
> pCard... all meaningless words. The same json mappings we make for
> hCard will be effective for hCalendar, hReview, etc. so the
> terminology should reflect this.
>
> >  I think it's somewhat premature to suggest that we're 'pretty much done
> > with' the specs.
>
> --- i am not a JSON expert, so i can't weigh in on specifics, but
> here's what i would suggest to help move things along.
>
> Have a look at the current test suite. It has HTML and .vcf/.ics
> output for the pages.
> http://hg.microformats.org/tests
>
> We should also create a .json output as well. Then we can have a
> better point of discussion around real examples. This will help
> clear-up any outstanding issues and at the same time give various
> developers something to test their own code against.
>
> >  Any attempt to standardise the object model of microformats is going to
> > need their assistance, and they're also amongst the most experienced
> working
> > with parsing. It's important they're give an opportunity to raise their
> own
> > issues before this work gets pushed into finalisation.
>
> --- i think the sample .json output from the tests will really help.
> Without that, it is difficult to discuss exact parsing rules and
> expected behaviours.
>

Apologies that I'm late to this conversation ... I've been watching the idea
unfold but haven't had a moment to contribute so far.

I'd echo Brian's point about the name, but I'm not going to get hung up on
that.

However, the point about the test suites is crucial. If this is viable and
useful then having the hCard tests in JSON format will both help confirm and
encourage that. I'd just suggest a compact format for the output (no
whitespace etc) so that it becomes simple to perform a basic string
comparison to verify results.
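
A sketch of what such a comparison might look like. The expected string and the parser output are invented, and canonicalize() is a hypothetical helper; sorting object keys is an extra assumption on my part, since a plain string comparison would otherwise fail whenever two parsers emit the same properties in a different order.

```php
<?php
// Hypothetical helper: sort object keys recursively so key order cannot
// cause a spurious mismatch; the order of list items is preserved.
function canonicalize($data)
{
    if (!is_array($data)) {
        return $data;
    }
    $data = array_map('canonicalize', $data);
    $is_list = ($data === array_values($data));
    if (!$is_list) {
        ksort($data);
    }
    return $data;
}

// Expected result as it might be stored in a test suite: compact, keys sorted.
$expected = '{"fn":"John Coleman","tel":["+1 555 0100"]}';

// Made-up parser output with the keys in a different order.
$actual = array(
    'tel' => array('+1 555 0100'),
    'fn'  => 'John Coleman',
);

// json_encode() emits no whitespace by default, so a string compare works.
$matches = (json_encode(canonicalize($actual)) === $expected);
echo $matches ? "pass\n" : "fail\n";
```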

+1 to never reducing single-item arrays. This is something we're changing in
hkit already.

drew.
From brian.suda at gmail.com  Tue Apr 15 05:57:38 2008
From: brian.suda at gmail.com (Brian Suda)
Date: Tue Apr 15 05:57:41 2008
Subject: [uf-dev] Finalizing jCard
In-Reply-To: <93E6A4D1-D835-4C14-B318-4269E46C45F3@tobyinkster.co.uk>
References: <93E6A4D1-D835-4C14-B318-4269E46C45F3@tobyinkster.co.uk>
Message-ID: <21e770780804150557x67ff3a57m7afc23007b0ed910@mail.gmail.com>

2008/4/15, Toby A Inkster <mail@tobyinkster.co.uk>:
>  Brian Suda wrote:
> > my first suggestion is not to call it jCard, but something more
> > like JSON output of vCard or JSON to hCard mapping. As Ben said
> > earlier, if we start using jCard, then we'll have xCard

>  An XML version of vCard
> <http://www.watersprings.org/pub/id/draft-dawson-vcard-xml-dtd-03.txt>
> already exists and predates hCard by a number of years, though it never
> reached RFC stage.

--- i must not have explained myself well enough, but your example
proves the point i was trying to make. Rather than calling a JSON
representation of vCard details "jCard", i was suggesting we call
it something like "JSON representation of vCard", just like the XML
representation of a vCard is not called xCard, but "An XML version of
vCard".

> > aCard, pCard
>
>  Not sure what those would be, but for other hierarchical
> markup/serialisation languages, I'd suggest that formats could be defined
> as:

--- i'm not talking about definitions of the serializations... just
giving examples that if we start putting [A-Za-z0-9] in front of Card,
we'll have an alphabet soup of formats which tell us nothing.

>  I would say that there exists no such function g() which allows for jCard -
> or anything *like* jCard - to be defined in those terms, thus it is
> justified to dedicate effort into defining jCard explicitly.

--- the other thing i think we are hung-up on is solving the JSON
representation for a single format. We have several design patterns to
map VCF/ICS data to HTML: the class design pattern, the rel-design
pattern and others. IMHO the best way forward is to map
microformatted HTML to JSON in a similar manner, through patterns,
not specific formats. Let's not worry about XYZ format mapping to JSON;
we should look at a mf2json() mapping.
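
A very rough sketch of what a pattern-driven mapping could look like for the class design pattern alone. Everything here is an assumption for illustration: mf_to_array() is a made-up name, the root class and property list are passed in by the caller, and nested structures, the rel pattern, the abbr pattern and so on are deliberately ignored.

```php
<?php
// Hypothetical sketch: the same function serves any class-based
// microformat because the root class and property names are parameters.
function mf_to_array($html, $root_class, array $properties)
{
    $items = array();
    $dom = new DOMDocument();
    if (!@$dom->loadHTML($html)) {
        return $items;
    }
    $xpath = new DOMXPath($dom);
    // Match $root_class as one token of a space-separated class attribute.
    $root_query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $root_class
    );
    foreach ($xpath->query($root_query) as $root) {
        $item = array();
        foreach ($xpath->query('.//*[@class]', $root) as $el) {
            $classes = preg_split('/\s+/', trim($el->getAttribute('class')), -1, PREG_SPLIT_NO_EMPTY);
            foreach (array_intersect($properties, $classes) as $prop) {
                // Values are always collected into arrays (never reduced).
                $item[$prop][] = trim($el->textContent);
            }
        }
        $items[] = $item;
    }
    return $items;
}

$html = '<div class="vcard"><span class="fn">John Coleman</span> '
      . '<span class="tel">+1 555 0100</span></div>'
      . '<div class="vevent"><span class="summary">Lunch</span></div>';

echo json_encode(mf_to_array($html, 'vcard', array('fn', 'tel'))), "\n";
echo json_encode(mf_to_array($html, 'vevent', array('summary'))), "\n";
```

The point is only that the hCard and hCalendar calls share one code path; a real mapping would need rules for every design pattern.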

-brian

-- 
brian suda
http://suda.co.uk

From lists at ben-ward.co.uk  Tue Apr 15 06:42:35 2008
From: lists at ben-ward.co.uk (Ben Ward)
Date: Tue Apr 15 06:42:39 2008
Subject: [uf-dev] Microformat Object Models (was: Finalizing jCard)
In-Reply-To: <21e770780804150557x67ff3a57m7afc23007b0ed910@mail.gmail.com>
References: <93E6A4D1-D835-4C14-B318-4269E46C45F3@tobyinkster.co.uk>
	<21e770780804150557x67ff3a57m7afc23007b0ed910@mail.gmail.com>
Message-ID: <AD436B3D-DC5E-45B1-8BB3-02625F1A4C41@ben-ward.co.uk>


On 15 Apr 2008, at 13:57, Brian Suda wrote:
>> I would say that there exists no such function g() which allows for  
>> jCard -
>> or anything *like* jCard - to be defined in those terms, thus it is
>> justified to dedicate effort into defining jCard explicitly.
>
> --- the other thing i think we are hung-up on is solving the JSON
> representation for a single format. We have several design patterns to
> map VCF/ICS data to HTML, the class design pattern, the rel-design
> pattern and others. IMHO This is the best way forward to map
> Microformatted HTML to JSON in a similar manner, through patterns -
> not specific formats. Lets not worry about XYZ format mapping to JSON,
> we should look at a mf2json() mapping.

* Defining 'jCard' explicitly is a perfectly valid effort, but
within the microformats community, where we're working within the
scope of HTML, the focus is to solve the problem of parsers
producing inconsistent output, hence my emphasis on this being the
'hCard Object Model' (vis-à-vis the DOM, CSS OM). My view is that if
that effort produces a defined vCard in JSON format as well then so be
it, but for me, the lack of a vCard->JSON format is not the problem
itself.

* Object Model consistency needs to be fixed for all other
microformats, too, which gives weight to Brian's generic approach. If
a set of generic parsing rules and patterns is robust enough and can
be documented tightly enough to be implemented, then it's probably the
way to go. Should we perhaps be looking to better define the data
types at a schema level, which then map to parsing rules?

To Glenn Jones: You said you might have an example of the kind of  
model documentation you'd like to implement against. Were you able to  
find any examples of this?

B
From aconbere at gmail.com  Tue Apr 15 10:34:52 2008
From: aconbere at gmail.com (anders conbere)
Date: Tue Apr 15 10:34:57 2008
Subject: [uf-dev] Microformat Object Models (was: Finalizing jCard)
In-Reply-To: <AD436B3D-DC5E-45B1-8BB3-02625F1A4C41@ben-ward.co.uk>
References: <93E6A4D1-D835-4C14-B318-4269E46C45F3@tobyinkster.co.uk>
	<21e770780804150557x67ff3a57m7afc23007b0ed910@mail.gmail.com>
	<AD436B3D-DC5E-45B1-8BB3-02625F1A4C41@ben-ward.co.uk>
Message-ID: <8ca3fbe80804151034i6c01d651j528013803d66c571@mail.gmail.com>

On Tue, Apr 15, 2008 at 6:42 AM, Ben Ward <lists@ben-ward.co.uk> wrote:
>
>  On 15 Apr 2008, at 13:57, Brian Suda wrote:
>
> >
> > > I would say that there exists no such function g() which allows for
> jCard -
> > > or anything *like* jCard - to be defined in those terms, thus it is
> > > justified to dedicate effort into defining jCard explicitly.
> > >
> >
> > --- the other thing i think we are hung-up on is solving the JSON
> > representation for a single format. We have several design patterns to
> > map VCF/ICS data to HTML, the class design pattern, the rel-design
> > pattern and others. IMHO This is the best way forward to map
> > Microformatted HTML to JSON in a similar manner, through patterns -
> > not specific formats. Lets not worry about XYZ format mapping to JSON,
> > we should look at a mf2json() mapping.
> >
>
>  * Defining 'jCard' explicitly is a perfectly valid effort, but within the
> microformats community, where we're working within the scope of HTML,
> the focus is to solve the problem of parsers producing inconsistent output,
> hence my emphasis on this being the 'hCard Object Model' (vis a vis the DOM,
> CSS OM). My view is that If that effort produces a defined vCard in JSON
> format as well then so be it, but for me, the lack of a vCard->JSON format
> is not the problem itself.
>
>  * Object Model consistency needs to be fixed for all other microformats,
> too, which gives weight to Brian's generic approach. If a set of generic
> parsing rules and patterns is robust enough and can be documented tightly
> enough to be implemented, then it's probably the way to go.  Should we
> perhaps be looking to better define the data types at a schema level, which
> then map to parsing rules?

Dan Brickley and I had a couple of good conversations at BlogTalk
about how microformats could really use an assertion based approach to
parsing. If you see every data item as a claim then everything becomes
tuples.

(Anders Conbere, has a, Hcard)
(Hcard, has a, Address)
(Address, has a, Street)
(Street, is, 7511 Jones Ave NW)

When you organize data structures like this it becomes trivially easy
to define what a correct set of claims are for any given microformat
and test for the correctness of a parsing output.
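
A minimal sketch of that idea, with made-up helper names (to_claims(), same_claims()) and the nested input structure assumed for illustration: flatten whatever the parser produced into tuples, then compare them with the expected set regardless of order.

```php
<?php
// Hypothetical: flatten nested parser output into
// (subject, predicate, object) claims.
function to_claims($subject, array $data)
{
    $claims = array();
    foreach ($data as $key => $value) {
        $claims[] = array($subject, 'has a', $key);
        if (is_array($value)) {
            $claims = array_merge($claims, to_claims($key, $value));
        } else {
            $claims[] = array($key, 'is', $value);
        }
    }
    return $claims;
}

// Order-independent comparison of two sets of claims.
function same_claims(array $a, array $b)
{
    $key = function ($claim) { return implode("\t", $claim); };
    $a = array_map($key, $a);
    $b = array_map($key, $b);
    sort($a);
    sort($b);
    return $a === $b;
}

$parsed = array('Hcard' => array('Address' => array('Street' => '7511 Jones Ave NW')));

$expected = array(
    array('Anders Conbere', 'has a', 'Hcard'),
    array('Hcard', 'has a', 'Address'),
    array('Address', 'has a', 'Street'),
    array('Street', 'is', '7511 Jones Ave NW'),
);

$ok = same_claims(to_claims('Anders Conbere', $parsed), $expected);
echo $ok ? "claims match\n" : "claims differ\n";
```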

Some of you might recognize this as the stance RDF takes with its testing:

http://www.w3.org/TR/rdf-testcases/

When I brought this up a month ago there was some strong pushback
from Tantek, from what I felt was a reluctance to begin to solidify the
definitions of what is a very loose set of specs.

That being said, it's REALLY REALLY hard to parse microformats
properly today. Having a test harness to run my parser against would
help immensely, but that requires the organization to put some work
into solidifying the way the specs work.

(One of the other nice things about speccing your formats as RDF is
that you can easily create GRDDL documents for them, and parsers are
really good at parsing RDF.)

~ Anders




>
>  To Glenn Jones: You said you might have an example of the kind of model
> documentation you'd like to implement against. Were you able to find any
> examples of this?
>
>  B

From msporny at digitalbazaar.com  Tue Apr 15 13:05:48 2008
From: msporny at digitalbazaar.com (Manu Sporny)
Date: Tue Apr 15 13:05:56 2008
Subject: [uf-dev] Microformat Object Models
In-Reply-To: <8ca3fbe80804151034i6c01d651j528013803d66c571@mail.gmail.com>
References: <93E6A4D1-D835-4C14-B318-4269E46C45F3@tobyinkster.co.uk>	<21e770780804150557x67ff3a57m7afc23007b0ed910@mail.gmail.com>	<AD436B3D-DC5E-45B1-8BB3-02625F1A4C41@ben-ward.co.uk>
	<8ca3fbe80804151034i6c01d651j528013803d66c571@mail.gmail.com>
Message-ID: <48050A9C.7000100@digitalbazaar.com>

anders conbere wrote:
> Some of you might recognize this as the stance RDF takes with its testing
> 
> http://www.w3.org/TR/rdf-testcases/

It is also the approach that RDFa takes when checking parser conformance
against the RDFa specification.

Check out the RDFa Test Harness and Unit Tests:

http://rdfa.digitalbazaar.com/rdfa-test-harness/

You can plug in different parsers and test them for conformance using
the utility above - which has helped when tracking down parser issues.
It also allows a developer to check their implementation against a test
suite that the community has agreed upon.

However, to get something like the above working for this community,
we'd have to:

- Agree on a parser specification (or set of specifications) for
  Microformats.
- Agree on a serialization format for Microformats (JSON/XML/N3/etc).
- Agree on a set of unit tests for Microformats.
- Agree on a method of checking the results of parsers.

In the RDFa community, this is what happened:

- Agree on a parser specification: Standardized by the W3C
- Agree on a serialization format: RDF
- Agree on a set of unit tests   : Standardized by the W3C
- Agree on a method of checking the results of parsers: SPARQL

-- manu
From danny.ayers at gmail.com  Tue Apr 15 14:00:29 2008
From: danny.ayers at gmail.com (Danny Ayers)
Date: Tue Apr 15 14:07:40 2008
Subject: [uf-dev] Microformat Object Models
In-Reply-To: <48050A9C.7000100@digitalbazaar.com>
References: <93E6A4D1-D835-4C14-B318-4269E46C45F3@tobyinkster.co.uk>
	<21e770780804150557x67ff3a57m7afc23007b0ed910@mail.gmail.com>
	<AD436B3D-DC5E-45B1-8BB3-02625F1A4C41@ben-ward.co.uk>
	<8ca3fbe80804151034i6c01d651j528013803d66c571@mail.gmail.com>
	<48050A9C.7000100@digitalbazaar.com>
Message-ID: <1f2ed5cd0804151400h74db8320rf025431d1f9bc8b1@mail.gmail.com>

Using RDF as a model would have its advantages:

* the W3C test harness could be reused
* it's straightforward
* some of the modelling has already been done
* Semantic Web integration comes free

at http://esw.w3.org/topic/CustomRdfDialects
there are links to several microformat2rdfxml XSLT transformations - at
least some of them are less-than-perfect, but should be good enough to
bootstrap

(incidentally a lot of the material there originated on a page called
http://esw.w3.org/topic/MicroModels - it got rebranded :-)

SPARQL-capable RDF tools are available for pretty much every
language/platform, and test SPARQL would be pretty easy to write. SPARQL
results can appear in XML or JSON - which could be handy in this context.
http://www.w3.org/TR/rdf-sparql-json-res/

There's also a JSON syntax for RDF, and at least two online converters:

http://n2.talis.com/wiki/RDF_JSON_Specification
http://triplr.org
http://convert.test.talis.com/

The RDF/JSON result would no doubt look different from the intended
microformat/JSON, but  it shouldn't take much script to convert for testing
purposes.

On 15/04/2008, Manu Sporny <msporny@digitalbazaar.com> wrote:
Sorry Manu, nitpicking, expanding your shorthand -

- Agree on a serialization format for Microformats (JSON/XML/N3/etc).


presumably=>
Agree on a Microformats model
Agree on a serialization format for Microformats model (JSON/XML/N3/etc).

- Agree on a serialization format: RDF


presumably=>
Agree on an RDF model : RDF  (easy one that)
Agree on a serialization format for RDF model : RDF/XML (I'm assuming)


Cheers,
Danny.

-- 
http://dannyayers.com
~
http://blogs.talis.com/nodalities/this_weeks_semantic_web/
From rff.rff at gmail.com  Thu Apr 17 04:53:40 2008
From: rff.rff at gmail.com (gabriele renzi)
Date: Thu Apr 17 04:53:43 2008
Subject: [uf-dev] doubts intepreting the hListing spec draft
Message-ID: <828083e70804170453hf8924balcab358e61a5210d4@mail.gmail.com>

Hi everyone,

this is my first post to this list so sorry if I ask something stupid,
but I could not find details on this.

I'm trying to write an hListing parser/extractor but there is
something not clear in the draft spec page.

The schema does not have reference to item type, which is then described later.
I'd fix the page by myself but I'm not sure if we have to keep the
item-type (fix schema) or if it's not there anymore (fix summary of
changes+field details).

Also, I'm not sure: where a field is described as
 hCard | (fn || email || url || tel)
how shall I read the or's ?

I believe that the single pipe is to be read as an exclusive or (use
hcard or values), while the double pipe is inclusive (use fn, possibly
with email, url etc), is this correct? If not is there documentation
for this short-hand syntax somewhere?

Thanks in advance.
From lists at ben-ward.co.uk  Thu Apr 17 06:12:18 2008
From: lists at ben-ward.co.uk (Ben Ward)
Date: Thu Apr 17 06:12:23 2008
Subject: [uf-dev] doubts intepreting the hListing spec draft
In-Reply-To: <828083e70804170453hf8924balcab358e61a5210d4@mail.gmail.com>
References: <828083e70804170453hf8924balcab358e61a5210d4@mail.gmail.com>
Message-ID: <09305870-B939-477D-8B41-9721C532BEB3@ben-ward.co.uk>

Hi Gabriele,

Thanks for posting about hListing.

First up, the entire wiki page you're working from is due a BIG  
update, which I've got pending and which I'll follow through on very  
soon. I'm sorry for the delay on that.

On 17 Apr 2008, at 12:53, gabriele renzi wrote:
> The schema does not have reference to item type, which is then  
> described later.
> I'd fix the page by myself but I'm not sure if we have to keep the
> item-type (fix schema) or if it's not there anymore (fix summary of
> changes+field details).

So, item type somewhat conflicts with "listing action" and also the
inferred type from item itself (using hCalendar would imply being an
event, for example). My advice right now is to ignore that field, or
just parse <foo class="item"><bar class="type"> as plain text if you
can find evidence of it being used (by way of example, we didn't
publish "type" on Kelkoo as the definition was fuzzy and we didn't
want to accidentally steamroller it into the spec).

> Also, I'm not sure: where a field is described as
> hCard | (fn || email || url || tel)
> how shall I read the or's ?

That's just badly phrased. fn, email, url and tel are all fields of
hCard; every lister should be an hCard (in spec terms, probably "must",
but until the draft is updated I'll avoid such firm terms).


Thanks very much for your effort on hListing. If you've got any issues  
you find please post to the microformats-new list, or add them to the  
hlisting-issues page on the wiki: http://microformats.org/wiki/hlisting-issues

Regards,

Ben


From rff.rff at gmail.com  Thu Apr 17 07:05:08 2008
From: rff.rff at gmail.com (gabriele renzi)
Date: Thu Apr 17 07:05:15 2008
Subject: [uf-dev] doubts intepreting the hListing spec draft
In-Reply-To: <09305870-B939-477D-8B41-9721C532BEB3@ben-ward.co.uk>
References: <828083e70804170453hf8924balcab358e61a5210d4@mail.gmail.com>
	<09305870-B939-477D-8B41-9721C532BEB3@ben-ward.co.uk>
Message-ID: <828083e70804170705o1114f8a4ka3f71cb1e6579c8e@mail.gmail.com>

On Thu, Apr 17, 2008 at 2:12 PM, Ben Ward <lists@ben-ward.co.uk> wrote:
> Hi Gabriele,
>
>  Thanks for posting about hListing.
>
>  First up, the entire wiki page you're working from is due a BIG update,
> which I've got pending and which I'll follow through on very soon. I'm sorry
> for the delay on that.

No worries, thanks for the quick and detailed answer.
I'll ask on microformats-new if I find something else that's unclear
to me, and wait for the updated spec.
Meanwhile I'm more than happy to skip doubtful things :)


-- 

blog it: http://riffraff.blogsome.com
blog en: http://www.riffraff.info
From microformats at kaply.com  Mon Apr 21 10:38:29 2008
From: microformats at kaply.com (Mike Kaply)
Date: Mon Apr 21 11:43:32 2008
Subject: [uf-dev] Proper use of value
Message-ID: <e06e0e0b0804211038x3597f24ey63d58a546024755d@mail.gmail.com>

Can someone please tell me if Roger Costello's examples for value
(Pages 13, 14, and 15 here -
http://www.xfront.com/microformats/hCard.html) are correct?

There seems to be some confusion around whitespace with regard to
value and I'd like to get it clarified so I do the right thing in FF3.

Basically I am allowing all whitespace in "value" but apparently others are not.

Also note that I don't get any notes from the mailing list for some
strange reason, so please email me as well as the list.

Thank you.

Mike Kaply
From mail at tobyinkster.co.uk  Mon Apr 21 23:09:34 2008
From: mail at tobyinkster.co.uk (Toby A Inkster)
Date: Mon Apr 21 23:09:41 2008
Subject: [uf-dev] Proper use of value
Message-ID: <D6FFB98F-2B47-4097-AEF3-06505AB2FBB0@tobyinkster.co.uk>

> Can someone please tell me if Roger Costello's examples for value
> (Pages 13, 14, and 15 here -
> http://www.xfront.com/microformats/hCard.html) are correct?

They look OK to me. Thanks for posting the examples though because  
they've helped me fix an annoying bug in Cognition's handling of  
this. (It has some code for specifically avoiding trimming white  
space from value-excerpted parts, but that code wasn't being  
triggered correctly, and white space was being trimmed resulting in  
fn="JohnSmith". I've fixed it now and will include the fix in my next  
release.)

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>



From microformats at kaply.com  Tue Apr 22 07:53:57 2008
From: microformats at kaply.com (Mike Kaply)
Date: Tue Apr 22 07:54:06 2008
Subject: [uf-dev] Proper use of value
In-Reply-To: <D6FFB98F-2B47-4097-AEF3-06505AB2FBB0@tobyinkster.co.uk>
References: <D6FFB98F-2B47-4097-AEF3-06505AB2FBB0@tobyinkster.co.uk>
Message-ID: <e06e0e0b0804220753y11014d16y30d730d949201dd7@mail.gmail.com>

On Tue, Apr 22, 2008 at 1:09 AM, Toby A Inkster <mail@tobyinkster.co.uk> wrote:
> > Can someone please tell me if Roger Costello's examples for value
> > (Pages 13, 14, and 15 here -
> > http://www.xfront.com/microformats/hCard.html) are correct?
> >
>
>  They look OK to me. Thanks for posting the examples though because they've
> helped me fix an annoying bug in Cognition's handling of this. (It has some
> code for specifically avoiding trimming white space from value-excerpted
> parts, but that code wasn't being triggered correctly, and white space was
> being trimmed resulting in fn="JohnSmith". I've fixed it now and will
> include the fix in my next release.)

For the record, other parsers do this differently - they trim all
whitespace (even in values).

What I'm looking for is the definitive answer as to what the "right
thing" to do is. There are a ton of edge cases that are simply poorly
defined within the microformats spec.

Mike Kaply
From brian.suda at gmail.com  Tue Apr 22 09:57:21 2008
From: brian.suda at gmail.com (Brian Suda)
Date: Tue Apr 22 09:57:23 2008
Subject: [uf-dev] Proper use of value
In-Reply-To: <e06e0e0b0804220753y11014d16y30d730d949201dd7@mail.gmail.com>
References: <D6FFB98F-2B47-4097-AEF3-06505AB2FBB0@tobyinkster.co.uk>
	<e06e0e0b0804220753y11014d16y30d730d949201dd7@mail.gmail.com>
Message-ID: <21e770780804220957t18233eddjcae0e54693f515e8@mail.gmail.com>

2008/4/22, Mike Kaply <microformats@kaply.com>:
> For the record, other parsers do this differently - they trim all
>  whitespace (even in values).

--- we should certainly try to get them inline and decide on a single
way to do this.

>  What I'm looking for is the definitive answer as to what the "right
>  thing" to do is. There are a ton of edge cases that are simply poorly
>  defined within the microformats spec.

--- i can't give you a definitive answer, but i think i parse any
class="value" and do NOT trim white-space, but i do collapse it. Value
is something extra that the user adds, so i take the assumption they
know what they are doing and that they meant to include that space. (i
do think i reduce multiple spaces, tabs, returns to a single space - i
need to confirm this)

There were/are also some parsers that intentionally ADD a space; i would
say that this is incorrect.

If we add this to the wiki as an issue, hopefully we can document a
correct answer in some form, that way we have a reference for parser
updates.

-brian


-- 
brian suda
http://suda.co.uk
From mail at tobyinkster.co.uk  Tue Apr 22 11:01:28 2008
From: mail at tobyinkster.co.uk (Toby A Inkster)
Date: Tue Apr 22 11:10:22 2008
Subject: [uf-dev] Proper use of value
Message-ID: <DED7B391-6E99-4219-BAC1-0CBDC988A87B@tobyinkster.co.uk>

Brian Suda wrote:

> i can't give you a definitive answer, but i think i parse any
> class="value" and do NOT trim white-space, but i do collapse it. Value
> is something extra that the user adds, so i take the assumption they
> know what they are doing and that they meant to include that space. (i
> do think i reduce multiple spaces, tabs, returns to a single space - i
> need to confirm this)

For the record, the behaviour used by Cognition (or at least its  
intended behaviour - as I said, there is a bug in the latest released  
version pertaining to this issue) is:

* Within each element with class="value", expanses of white space are  
collapsed into single spaces.
* Within each element with class="value", white space is *not*  
trimmed from the beginning or end of the value (although it is  
collapsed as per above).
* All the elements with class="value" are then joined together  
without any interleaving white space to form a combined string.
* Within the combined string, expanses of white space are collapsed  
into single spaces.
* Within the combined string, white space *is* trimmed from the  
beginning and end.

In my experience, this seems to work well for the vast majority of  
real-world cases. (The percentage of pages that actually *use*  
multiple elements with class="value" for a single property is tiny  
anyway.)
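
The five steps listed above can be sketched in Python as follows. This
is an illustration of the described behaviour, not Cognition's actual
code; the function names are hypothetical, and treating "white space" as
the characters space, tab, LF, FF and CR is an assumption.

```python
import re

# Assumption: "white space" means space, tab, LF, FF and CR.
HTML_WS = " \t\n\f\r"
_WS_RUN = re.compile("[ \t\n\f\r]+")


def collapse(text: str) -> str:
    """Collapse each expanse of white space into a single space."""
    return _WS_RUN.sub(" ", text)


def combine_values(values: list[str]) -> str:
    """Combine the text of several class="value" elements into one string."""
    # Steps 1-2: collapse inside each value, but do not trim its ends.
    parts = [collapse(v) for v in values]
    # Step 3: join with no interleaving white space.
    combined = "".join(parts)
    # Steps 4-5: collapse the combined string, then trim both ends.
    return collapse(combined).strip(HTML_WS)
```

For example, `combine_values(["John", " ", "Doe"])` gives `"John Doe"`,
while `combine_values(["John", "Doe"])` gives `"JohnDoe"`.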

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>



From mkaply at us.ibm.com  Tue Apr 22 12:00:27 2008
From: mkaply at us.ibm.com (Michael Kaply)
Date: Tue Apr 22 12:00:42 2008
Subject: [uf-dev] Proper use of value
Message-ID: <OF0C855401.4FD0009F-ON86257433.0067F6D6-86257433.0068694E@us.ibm.com>


OK, how about this.

When retrieving individual values from the document, if there is any
whitespace, it is collapsed into one space, and leading and trailing white
space is NOT removed.

After the values have been concatenated to create the final value, if there
is any whitespace, it is collapsed into one space, and leading and trailing
whitespace IS removed.

So all of these:

<fn>
<value>John</value>
<value> </value>
<value>Doe</value>
</fn>
<fn>
<value>John</value>
<value>          </value>
<value>Doe</value>
</fn>
<fn>
<value>              John</value>
<value> </value>
<value>Doe                </value>
</fn>
<fn>
<value>John                 </value>
<value>              </value>
<value>                 Doe</value>
</fn>
<fn>
<value>John                 </value>
<value>                 Doe</value>
</fn>
<fn>
<value>          John                 </value>
<value>                 Doe          </value>
</fn>

become

|John Doe|

but this:

<fn>
<value>John</value>
<value>Doe</value>
</fn>

becomes


|JohnDoe|

Does that sound right?

Michael Kaply
Firefox Advocate
mkaply@us.ibm.com
http://www.kaply.com/weblog/ (External Blog)
http://blogs.tap.ibm.com/weblogs/page/mkaply@us.ibm.com (Internal Blog)
From msporny at digitalbazaar.com  Tue Apr 22 12:37:56 2008
From: msporny at digitalbazaar.com (Manu Sporny)
Date: Tue Apr 22 12:38:20 2008
Subject: [uf-dev] Proper use of value
In-Reply-To: <OF0C855401.4FD0009F-ON86257433.0067F6D6-86257433.0068694E@us.ibm.com>
References: <OF0C855401.4FD0009F-ON86257433.0067F6D6-86257433.0068694E@us.ibm.com>
Message-ID: <480E3E94.6050209@digitalbazaar.com>

Michael Kaply wrote:
> OK, how about this.
> 
> When retrieving individual values from the document, if there is any
> whitespace, it is collapsed into one space, and leading and trailing
> white space is NOT removed.

Just my $0.02 on this - we had a very involved discussion (lasting
several months) when tackling this problem at the W3C with regards to
how to do whitespace canonicalization  in RDFa. In the end, we stated
that the parser should keep the original text as is (including all
whitespace), and it's up to the application to normalize spaces in a way
that makes sense to the application.

Note that we make a strong distinction between the parser (eg:
librdfa[1]) and the application using the parser (Firefox + Fuzzbot[2]).

The primary reasoning for this is that several people had different ways
that they wanted to canonicalize whitespace and at the end of the day,
we didn't want to force application writers into a certain method of
whitespace canonicalization. Here's the actual text that we settled upon
at the W3C with regard to whitespace canonicalization:

PLAIN LITERAL (aka: basic text) CANONICALIZATION:

"The actual literal is ... a string created by concatenating the text
content of each of the descendant elements of the [current element] in
document order."

This means that all new lines, tabs, spaces and other whitespace
characters are preserved for processing at a later time by the
application that is using the parser.
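
A small sketch of what "concatenating the text content in document
order" looks like in practice. It uses Python's standard ElementTree on
a well-formed fragment as a stand-in; it is not librdfa, the helper name
`plain_literal` is hypothetical, and the sample markup is made up.

```python
import xml.etree.ElementTree as ET


def plain_literal(element: ET.Element) -> str:
    """Join every text node under the element, in document order, untouched."""
    return "".join(element.itertext())


# Hypothetical markup with leading, trailing and embedded white space.
fragment = ET.fromstring('<span class="fn">  John <b>Q.</b>\n   Doe  </span>')
literal = plain_literal(fragment)
```

Here `literal` keeps the leading spaces, the newline and the trailing
spaces exactly as they appear in the source; any normalization is left
to the caller.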

I think the above is the proper approach - otherwise you end up with the
issues that we had with whitespace canonicalization and Internet
Explorer 6. IE6 assumes that you want the whitespace canonicalized in a
certain way, thus the non-canonicalized whitespace isn't available in
the DOM accessed via Javascript. When you choose to perform whitespace
canonicalization in a certain way - you're bound to tick off a sub-set
of developers/authors. :)

Does this approach sound like a better one to take?

-- manu

[1] http://rdfa.digitalbazaar.com/librdfa/
[2] http://rdfa.digitalbazaar.com/fuzzbot/

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: RDFa Basics in 8 minutes (video)
http://blog.digitalbazaar.com/2008/01/07/rdfa-basics/
From rff.rff at gmail.com  Tue Apr 22 12:41:10 2008
From: rff.rff at gmail.com (gabriele renzi)
Date: Tue Apr 22 12:41:16 2008
Subject: [uf-dev] Proper use of value
In-Reply-To: <OF0C855401.4FD0009F-ON86257433.0067F6D6-86257433.0068694E@us.ibm.com>
References: <OF0C855401.4FD0009F-ON86257433.0067F6D6-86257433.0068694E@us.ibm.com>
Message-ID: <828083e70804221241l27b42de2h809ebbe53257aef7@mail.gmail.com>

On Tue, Apr 22, 2008 at 8:00 PM, Michael Kaply <mkaply@us.ibm.com> wrote:
>
>
> OK, how about this.
>
>  When retrieving individual values from the document, if there is any
> whitespace, it is collapsed into one space, and leading and trailing white
> space is NOT removed.
>
>  After the values have been concatenated to create the final value, if there
> is any whitespace, it is collapsed into one space, and leading and trailing
> whitespace IS removed.


Isn't the first pass of removing multiple spaces implicit in the second pass?
Is it different from just saying
* concat all values
* collapse whitespaces
* trim
 ?
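
A quick way to check that question is to run both variants over the
sample inputs from the proposal. This is only a sketch: the function
names are hypothetical, and "whitespace" is assumed to mean space, tab,
LF, FF and CR.

```python
import re

HTML_WS = " \t\n\f\r"
_WS_RUN = re.compile("[ \t\n\f\r]+")


def two_pass(values: list[str]) -> str:
    """Collapse inside each value, concatenate, collapse again, trim."""
    parts = [_WS_RUN.sub(" ", v) for v in values]
    return _WS_RUN.sub(" ", "".join(parts)).strip(HTML_WS)


def single_pass(values: list[str]) -> str:
    """Concatenate the raw values, collapse once, trim."""
    return _WS_RUN.sub(" ", "".join(values)).strip(HTML_WS)


# Inputs modelled on the <value> examples in the proposal.
samples = [
    ["John", " ", "Doe"],
    ["John", "          ", "Doe"],
    ["              John", " ", "Doe                "],
    ["John                 ", "                 Doe"],
    ["          John                 ", "                 Doe          "],
    ["John", "Doe"],
]
same = all(two_pass(s) == single_pass(s) for s in samples)
```

On these samples the two variants agree, which is what one would expect:
collapsing is idempotent, and a run of whitespace that straddles two
values is still a single run after concatenation.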

Anyway, my modest opinion as an incompetent who joined this list just
a few days ago is that this sounds correct :)
From mail at tobyinkster.co.uk  Tue Apr 22 13:38:47 2008
From: mail at tobyinkster.co.uk (Toby A Inkster)
Date: Tue Apr 22 13:38:56 2008
Subject: [uf-dev] Proper use of value
Message-ID: <0F65BA62-AEDC-4FB6-853F-B2ABB032BA2D@tobyinkster.co.uk>

Manu Sporny wrote:

> Just my $0.02 on this - we had a very involved discussion (lasting
> several months) when tackling this problem at the W3C with regards to
> how to do whitespace canonicalization  in RDFa. In the end, we stated
> that the parser should keep the original text as is (including all
> whitespace), and it's up to the application to normalize spaces in  
> a way
> that makes sense to the application.

Unfortunately for some microformats, the parser *needs* to know about  
white space. The example which springs to mind is N-optimisation in  
hCard. This:

	<span class="fn">JohnDoe</span>

is parsed as:

	FN:JohnDoe
	NICKNAME:JohnDoe

Whereas this:

	<span class="fn">John Doe</span>

is parsed as:

	FN:John Doe
	N:Doe;John

In RDF terms, the white space in the object literal affects the  
choice of predicate. So it is important to know how white space  
should be interpreted, at least in some situations.
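
A deliberately simplified sketch of that dependency, covering only the
two cases shown above (one word versus two words). The function name
`imply_from_fn` is hypothetical, and the full hCard rules handle more
cases, such as comma-separated names, which are left out here.

```python
def imply_from_fn(fn: str) -> dict[str, str]:
    """Derive NICKNAME or N from FN for the one-word and two-word cases only."""
    props = {"FN": fn}
    words = fn.split()  # splits on any run of white space
    if len(words) == 1:
        # A single word is treated as a nickname.
        props["NICKNAME"] = words[0]
    elif len(words) == 2:
        # Two words are read as "given family" and emitted as "family;given".
        given, family = words
        props["N"] = f"{family};{given}"
    return props
```

With this sketch, `imply_from_fn("JohnDoe")` yields a NICKNAME, whereas
`imply_from_fn("John Doe")` yields `N` set to `"Doe;John"`, so a single
space changes which property is produced.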

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>



From msporny at digitalbazaar.com  Tue Apr 22 19:00:07 2008
From: msporny at digitalbazaar.com (Manu Sporny)
Date: Tue Apr 22 19:26:01 2008
Subject: [uf-dev] Proper use of value
In-Reply-To: <0F65BA62-AEDC-4FB6-853F-B2ABB032BA2D@tobyinkster.co.uk>
References: <0F65BA62-AEDC-4FB6-853F-B2ABB032BA2D@tobyinkster.co.uk>
Message-ID: <480E9827.4020300@digitalbazaar.com>

Toby A Inkster wrote:
> Unfortunately for some microformats, the parser *needs* to know about
> white space. The example which springs to mind is N-optimisation in
> hCard. 

Hmm... That's not evident to me. I understand your point, and it's
certainly valid - but there's a nuance.

To say that the parser "*needs* to know about whitespace" is different
from saying that "we should preserve the original whitespace". We can
have both.

My previous post stated differently could read:

"As a general rule, we should preserve any and all whitespace in the
parser model. Only when the information is displayed or exported from
the parser model should we canonicalize whitespace, and only when it
makes sense to do so."

> This:
> 
>     <span class="fn">JohnDoe</span>
> 
> is parsed as:
> 
>     FN:JohnDoe
>     NICKNAME:JohnDoe
> 
> Whereas this:
> 
>     <span class="fn">John Doe</span>
> 
> is parsed as:
> 
>     FN:John Doe
>     N:Doe;John
> 
> In RDF terms, the white space in the object literal affects the choice
> of predicate. So it is important to know how white space should be
> interpreted, at least in some situations.

I don't think the above is a good example. I'm racking my brain to come
up with a reason to canonicalize whitespace in the parser. I don't think
throwing away the original stuff buys us anything. For example:

   <span class="fn">  John     Doe   </span>
   <span class="fn">John Doe</span>

Both of the above would parse to:

   FN:John Doe
   N:Doe;John

However, I think the proper thing to give the developer back when they
ask for the contents of FN should be "  John     Doe   ".

The application can then make the decision to canonicalize the
whitespace when a) displaying it in an interface or b) exporting it to
another format, such as VCARD.

As far as the example you gave above... I would expect that the hCard
optimization step would be performed after the parser acquired all of
the data from the page. FN would contain "  John     Doe   ", and thus
the N-optimization would trim all whitespace, split the string and
encode it as "Doe;John". In other words, N-optimization is a
post-processing step performed after the parser-proper runs.
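
A minimal sketch of that split between parser and post-processing,
assuming a hypothetical `ParsedProperty` record (not an API of librdfa,
Fuzzbot or any microformats parser): the raw text is stored untouched,
and canonicalization happens only when something asks for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedProperty:
    name: str
    raw: str  # text exactly as found in the document, white space included

    def canonical(self) -> str:
        """Collapse runs of white space and trim; computed on demand."""
        return " ".join(self.raw.split())


# The parser proper stores the text as-is.
fn = ParsedProperty("fn", "  John     Doe   ")

# A later step (display, export, N-optimization) works on the canonical form.
given, family = fn.canonical().split(" ")
n_value = f"{family};{given}"
```

`fn.raw` still returns the original string with all of its spaces, while
`n_value` is derived from the canonical form.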

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: RDFa Basics in 8 minutes (video)
http://blog.digitalbazaar.com/2008/01/07/rdfa-basics/
From mkaply at us.ibm.com  Wed Apr 23 09:11:01 2008
From: mkaply at us.ibm.com (Michael Kaply)
Date: Wed Apr 23 13:41:17 2008
Subject: [uf-dev] Proper use of value
In-Reply-To: <480E9827.4020300@digitalbazaar.com>
Message-ID: <OFD2BF50DC.6AB2156C-ON86257434.0058AC00-86257434.0058E641@us.ibm.com>

I think the RDF situation is very different than the microformats with
regards to the whitespace problem.

With microformats, you are adding the microformat classes to existing
content, so you are probably putting them around a lot of various
whitespace (carriage returns, line feeds, etc.).

With RDF, things are done a little more granularly.

I think parsers should definitely remove the whitespace, because what we
are making available should equate to the HTML content, and the HTML
content has whitespace collapsed and removed.
From msporny at digitalbazaar.com  Wed Apr 23 14:14:04 2008
From: msporny at digitalbazaar.com (Manu Sporny)
Date: Wed Apr 23 14:14:28 2008
Subject: [uf-dev] Proper use of value
In-Reply-To: <OFD2BF50DC.6AB2156C-ON86257434.0058AC00-86257434.0058E641@us.ibm.com>
References: <OFD2BF50DC.6AB2156C-ON86257434.0058AC00-86257434.0058E641@us.ibm.com>
Message-ID: <480FA69C.5000308@digitalbazaar.com>

Michael Kaply wrote:
> I think the RDF situation is very different than the microformats with
> regards to the whitespace problem.
> 
> With microformats, you are adding the microformat classes to existing
> content, so you are
> probably putting them around a lot of various whitespace (carriage
> returns, line feed, etc.)

Hmm... do you mean RDF or RDFa? :)

If you mean RDF, then yes I agree - the two situations are very
different. If you mean RDFa, then I don't agree as insertion of RDFa and
Microformats into pre-existing XHTML is done in more-or-less the same way.

The majority of the RDFa use cases have RDFa added to existing XHTML web
pages... so I believe the same whitespace issues exist for RDFa as they
do for Microformats.

> I think parsers should definitely remove the whitespace because what we
> are making available should equate to the
> HTML content and the HTML content has whitespace collapsed and removed.

What about PRE tags? Or the use of any CSS 'white-space'[1] style that
isn't 'normal'. This is important in poetry and other pre-formatted text
on the net.

For example:

<span style="white-space: pre-line">
A crash reduces
Your expensive computer
To a simple stone.
</span>

By stating that uF parsers should remove whitespace, we're unnecessarily
invalidating all of those use cases.

-- manu


[1] http://webdesign.about.com/od/styleproperties/p/blspwhitespace.htm
From brian.suda at gmail.com  Thu Apr 24 04:17:40 2008
From: brian.suda at gmail.com (Brian Suda)
Date: Thu Apr 24 04:17:45 2008
Subject: [uf-dev] Proper use of value
In-Reply-To: <OF0C855401.4FD0009F-ON86257433.0067F6D6-86257433.0068694E@us.ibm.com>
References: <OF0C855401.4FD0009F-ON86257433.0067F6D6-86257433.0068694E@us.ibm.com>
Message-ID: <21e770780804240417jb245e75o5389e7bbdb6614be@mail.gmail.com>

2008/4/22, Michael Kaply <mkaply@us.ibm.com>:
> OK, how about this.

>  So all of these:
>
>  <fn>
>  <value>John</value>
>  <value> </value>
>  <value>Doe</value>
>  </fn>
>  <fn>
>  <value>John</value>
>  <value> </value>
>  <value>Doe</value>
>  </fn>
>  <fn>
>  <value> John</value>
>  <value> </value>
>  <value>Doe </value>
>  </fn>
>  <fn>
>  <value>John </value>
>  <value> </value>
>  <value> Doe</value>
>  </fn>
>  <fn>
>  <value>John </value>
>  <value> Doe</value>
>  </fn>
>  <fn>
>  <value> John </value>
>  <value> Doe </value>
>  </fn>
>
>  become
>
>  |John Doe|
>
>  but this:
>
>  <fn>
>  <value>John</value>
>  <value>Doe</value>
>  </fn>
>
>  becomes
>
>
>  |JohnDoe|
>
>  Does that sound right?

--- i agree, this is what i personally would expect. It would need to
be codified somehow, but (i think) this is what X2V already does. We
could make a simple test page and add it to the test suite if you
think it would help?

-brian

-- 
brian suda
http://suda.co.uk
From mdagn at spraci.com  Mon Apr 28 21:01:12 2008
From: mdagn at spraci.com (Michael MD)
Date: Mon Apr 28 21:01:15 2008
Subject: [uf-dev] Proper use of value
References: <OFD2BF50DC.6AB2156C-ON86257434.0058AC00-86257434.0058E641@us.ibm.com>
	<480FA69C.5000308@digitalbazaar.com>
Message-ID: <002601c8a9ad$aaa11960$116bacca@COMCEN>

> What about PRE tags? Or the use of any CSS 'white-space'[1] style that
> isn't 'normal'. This is important in poetry and other pre-formatted text
> on the net.
>
> For example:
>
> <span style="white-space: pre-line">
> A crash reduces
> Your expensive computer
> To a simple stone.
> </span>
>
> By stating that uF parsers should remove whitespace, we're unnecessarily
> invalidating all of those use cases.



It's a tricky one ... I can think of some cases where removing whitespace
can be a problem and others where keeping it is a problem...

Perhaps a new line should be treated differently to something like a space
or tab?

...or perhaps it's better to preserve them in the parser and let the
application handle them in an appropriate way?


From contact at lumieredelune.com  Tue Apr 29 12:16:30 2008
From: contact at lumieredelune.com (=?Windows-1252?Q?Lumi=E8re_de_Lune?=)
Date: Tue Apr 29 12:16:20 2008
Subject: [uf-dev] Problems with importation of a hcard as a vCard in Outlook
Message-ID: <00e101c8aa2d$8914c870$6701a8c0@PARACOU>

Hello, 

 

I'm not sure this is the right place to post; I hope I did not make the
wrong choice.

I'm currently creating an hCard for my website. The website is in XHTML and
UTF-8.

And the hCard has an accented character: è

I tried with the three options possible in the source code (è, &grave; and
&#232;) and the two import protocols of Technorati and X2V, and both produce
"strange" characters in Outlook.

Strange meaning the kind of wrong character you get when you've got the
wrong encoding. 
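
For illustration, this is how such characters typically arise when bytes
written as UTF-8 are read back as Windows-1252. Whether that is what
happens between the converters and Outlook in this case is an
assumption, not something established here.

```python
# The accented character as it would be stored in a UTF-8 file.
text = "Lumière"
utf8_bytes = text.encode("utf-8")

# A consumer that wrongly assumes Windows-1252 sees two characters for "è".
misread = utf8_bytes.decode("cp1252")

# Reversing the wrong step recovers the original text.
repaired = misread.encode("cp1252").decode("utf-8")
```

Here `misread` is `"LumiÃ¨re"`: the single character è (two bytes in
UTF-8) shows up as the pair Ã¨.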

I also noticed that on the Wiki, or other sites, I could not find any
example of an hCard with accents.

 

Any idea to solve this problem would be highly appreciated. 

 

--

Marie-Aude 

 

From contact at lumieredelune.com  Tue Apr 29 12:22:28 2008
From: contact at lumieredelune.com (=?Windows-1252?Q?Lumi=E8re_de_Lune?=)
Date: Tue Apr 29 12:22:15 2008
Subject: [uf-dev] Problems with importation of a hcard as a vCard in Outlook
Message-ID: <00ed01c8aa2e$5cfc3830$6701a8c0@PARACOU>



From jason.karns at gmail.com  Tue Apr 29 12:57:33 2008
From: jason.karns at gmail.com (Jason Karns)
Date: Tue Apr 29 12:57:37 2008
Subject: [uf-dev] Include-Pattern Infinite Loop Test Cases
Message-ID: <1005d65f0804291257x35022f49vdf96a4499796bfc7@mail.gmail.com>

I've been working on a simple JavaScript pre-parser of sorts.  It is
designed to follow all include references (local references only, of
course) and produces a DOM with all includes replaced by the
referenced subtrees.  This is a call to all current microformat parser
implementers to produce infinite loop test cases so that I might fully
test my implementation before porting it to other languages.

If successful, I plan to post the algorithm as well as various
language implementations in the hope that existing tools may be able
to easily add support for the include-pattern, without falling back to
arbitrary max-recursion numbers.
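
For illustration only, here is a minimal sketch of how includes can be
followed without a max-recursion limit. It is not the implementation
described above. It assumes a hypothetical plain-object tree in place of the
DOM (nodes shaped like { id, include, children }), and the function names
indexById and expand are made up for this example. The idea is to remember
which ids are being expanded on the current path and to stop as soon as one
of them is requested again.

```javascript
// Hypothetical tree model standing in for the DOM: each node is
// { id?, include?, children? }, where `include` names the id of a local node.

// Build an id -> node lookup for the whole tree.
function indexById(node, index = new Map()) {
  if (node.id) index.set(node.id, node);
  for (const child of node.children || []) indexById(child, index);
  return index;
}

// Return a copy of `node` in which every include is replaced by a copy of
// the subtree it references. `active` lists the ids being expanded on the
// current path; meeting one of them again means the includes form a loop.
function expand(node, index, active = []) {
  if (node.include !== undefined) {
    const id = node.include;
    if (active.includes(id)) {
      throw new Error("include loop: " + [...active, id].join(" -> "));
    }
    const target = index.get(id);
    if (!target) throw new Error("unknown include target: " + id);
    return expand(target, index, [...active, id]);
  }
  const copy = { ...node };
  if (node.children) {
    copy.children = node.children.map((child) => expand(child, index, active));
  }
  return copy;
}

// Two sample documents: one well-formed, one where "a" and "b" include
// each other.
const ok = {
  children: [{ id: "name", text: "Jane" }, { children: [{ include: "name" }] }],
};
const loop = {
  children: [
    { id: "a", children: [{ include: "b" }] },
    { id: "b", children: [{ include: "a" }] },
  ],
};
```

Calling expand(ok, indexById(ok)) returns the expanded copy, while
expand(loop, indexById(loop)) throws instead of recursing forever. Because
every nested include adds a new id to the path and the document has a finite
number of ids, the expansion always terminates. Whether a loop should raise
an error or simply drop the offending include is a design choice this sketch
does not settle.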

Thanks,
Jason Karns
From brian.suda at gmail.com  Tue Apr 29 16:38:53 2008
From: brian.suda at gmail.com (Brian Suda)
Date: Tue Apr 29 16:38:57 2008
Subject: [uf-dev] Problems with importation of a hcard as a vCard in
	Outlook
In-Reply-To: <00e101c8aa2d$8914c870$6701a8c0@PARACOU>
References: <00e101c8aa2d$8914c870$6701a8c0@PARACOU>
Message-ID: <21e770780804291638x415f84b8of832aabae0010fa2@mail.gmail.com>

2008/4/29, Lumière de Lune <contact@lumieredelune.com>:
> Hello,
> I'm currently creating a hCard for my website. The website is in XHTML and
> utf-8

--- do you have a public url we could test against?

> And the hCard has an accented character: è
>
> I tried with the three options possible in the source code (è, &egrave; and
> &#232; ) and the two import protocols of Technorati and X2V and both produce
> "strange" characters in Outlook.

--- once we have a url, we can test to see if this is an issue with
the transformation or with Outlook. Which version of Outlook are you
using?

There is a list of known issues here:
http://microformats.org/wiki/vcard-implementations

-brian

-- 
brian suda
http://suda.co.uk

From contact at lumieredelune.com  Tue Apr 29 17:42:39 2008
From: contact at lumieredelune.com (Lumiere de Lune)
Date: Tue Apr 29 17:42:48 2008
Subject: [uf-dev] Problems with importation of a hcard as a vCard in Outlook
In-Reply-To: <21e770780804291638x415f84b8of832aabae0010fa2@mail.gmail.com>
References: <00e101c8aa2d$8914c870$6701a8c0@PARACOU>
	<21e770780804291638x415f84b8of832aabae0010fa2@mail.gmail.com>
Message-ID: <010a01c8aa5b$181ba570$6701a8c0@PARACOU>

2008/4/29, Brian Suda said

>do you have a public url we could test against?

Now yes (it was on localhost)
http://www.lumieredelune.com/res/tpl/vcardTest.php


>once we have a url, we can test to see if this is an issue with
>the transformation or with Outlook. Which version of Outlook are you
>using?

I'm using Outlook 2003 on XP SP2, with a French system. 
I asked two friends, one with Outlook 2007 and SP and a German system, and
one with Outlook 2003, XP and an English system, and both of them experience
the same problem.

>There is a list of known issues here:
>http://microformats.org/wiki/vcard-implementations
Is it better to post directly there?

Thank you for your help

--
Marie-Aude
http://www.lumieredelune.com 


From gordon at onlinehome.de  Wed Apr 30 00:15:33 2008
From: gordon at onlinehome.de (Gordon Oheim)
Date: Wed Apr 30 00:20:53 2008
Subject: [uf-dev] Problems with importation of a hcard as a vCard in Outlook
Message-ID: <48181C95.4070802@onlinehome.de>

I have encountered the same issue just recently. I don't think it is an 
issue with Brian's script.

If you save the generated vCard to your desktop and open it with 
Notepad, all characters are fine. If you open the vCard on a Mac, it is 
fine too. It is only when you open the vCard with Windows Address Book 
or Outlook that the characters are broken. Probably due to the encoding 
used within Outlook. This is a common problem when using UTF-8 encoded 
content in Windows applications.
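
If the cause is the one suspected above (a UTF-8 file being read with a
single-byte Windows code page), the effect is easy to reproduce. The sketch
below is only an illustration of that mechanism, not a confirmed diagnosis
of what Outlook does. It assumes Node.js, and it uses Node's "latin1"
encoding as a stand-in for the Windows code page, which agrees with
Windows-1252 for the two bytes involved.

```javascript
// "Lumière", written with an escape so this file's own encoding does not matter.
const original = "Lumi\u00E8re";

// Encoding as UTF-8 turns the single character è into the two bytes C3 A8.
const utf8Bytes = Buffer.from(original, "utf8");

// Reading those bytes one byte per character (as a single-byte code page
// would) turns C3 A8 into two separate characters.
const misread = utf8Bytes.toString("latin1");

// Reading them as UTF-8 recovers the original text.
const reread = utf8Bytes.toString("utf8");

console.log(misread); // LumiÃ¨re
console.log(reread);  // Lumière
```

The two-character sequence in place of the accent is the typical "strange"
character pattern for this kind of mismatch, which is consistent with the
file looking correct in Notepad and on a Mac.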

Cheers, Gordon
From mail at tobyinkster.co.uk  Wed Apr 30 00:30:29 2008
From: mail at tobyinkster.co.uk (Toby A Inkster)
Date: Wed Apr 30 00:30:42 2008
Subject: [uf-dev] Problems with importation of a hcard as a vCard in Outlook
Message-ID: <BA21E673-A9AA-4F6E-BBD6-4E80F31E2B96@tobyinkster.co.uk>

Lumiere de Lune wrote:

> http://www.lumieredelune.com/res/tpl/vcardTest.php

It does appear to be an Outlook-specific error. I've tried converting  
to vCard with both Cognition and X2V and adding to Apple Address  
Book, and the accent in the organisation name is imported perfectly.

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>



