[uf-discuss] Parsing XFN in PHP

Dan Brickley danbri at danbri.org
Thu Apr 10 12:20:26 PDT 2008

Julian Bond wrote:
> Ryan Parman <ryan.lists.warpshare at gmail.com> Thu, 10 Apr 2008 09:05:47
>> As someone with a background in parsing RSS/Atom, I can say from 
>> years of experience that RSS is only occasionally XML and that you 
>> typically find far more HTML in a feed than XML. And parsing HTML can 
>> be a bitch.
> Big snip.
> Woah! That's enough to put one off even starting on parsing and 
> reading uF. Which makes uF all a bit pointless. Oh dear. :(
> I suspect though that this Gordian knot can be cut. It seems quite 
> likely that any page marked up with uF is good enough that HTML-Tidy 
> won't remove too many uF marked up elements. If that's the case, then 
> Fetch html -> HTML-Tidy -> XML parsing is going to get 99% of the job 
> done and successfully extract the uF marked data.

Aside re 'nofollow':

If you're cleaning up HTML-ish character streams with arbitrary other code 
to make XHTML, do take care that you're not accidentally stripping 
rel='nofollow' from comment areas while leaving in potentially 
mischievous rel='me' claims. I don't know the default behaviour of 
HTML Tidy or similar tools, but this risk is worth bearing in mind.

Per http://microformats.org/wiki/xfn-clarifications#me_nofollow_interaction
    "If a link has the rel value "nofollow", then a "me" rel value DOES
    NOT indicate an identity relationship. That is, only rel attributes
    with the value "me", and WITHOUT the value "nofollow" indicate an
    identity relationship assertion."
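
In code, that rule might look something like the following. This is only a
rough sketch in Python rather than PHP; the function name is made up for
illustration, and it assumes rel is a whitespace-separated token list that
should be compared case-insensitively.

```python
def is_identity_claim(rel_value):
    """Return True only if rel contains 'me' and does NOT contain 'nofollow'.

    Hypothetical helper: assumes rel_value is the raw attribute string.
    """
    # Split on whitespace and lowercase so "ME NoFollow" is handled too.
    tokens = {token.lower() for token in rel_value.split()}
    return "me" in tokens and "nofollow" not in tokens
```

So rel="me" on its own counts as an identity claim, while rel="me nofollow"
does not, whatever order the two values appear in.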

While it might seem odd for a 'nofollow' to be stripped while leaving a 
'me' in there, I've seen enough hostility to the 'nofollow' idea 
floating around that it is certainly possible some HTML cleanup tools 
will drop that markup. For example, see 
http://www.itst.org/nonofollow/ and http://www.nonofollow.net/
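
One way to guard against this is to compare the rel values on each link
before and after whatever cleanup step you use. The sketch below is Python
using only the standard library's html.parser; the class and function names
are hypothetical, and it makes no assumption about how any particular
cleanup tool behaves. It simply takes the markup before and after cleanup as
two strings. As a simplification, links are keyed by href, so repeated links
to the same URL have their rel values merged.

```python
from html.parser import HTMLParser


class RelCollector(HTMLParser):
    """Collect the rel tokens seen on <a> elements, keyed by href."""

    def __init__(self):
        super().__init__()
        self.links = {}  # href -> set of lowercased rel tokens

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        rel = attr.get("rel") or ""  # a valueless rel attribute arrives as None
        tokens = {token.lower() for token in rel.split()}
        self.links.setdefault(href, set()).update(tokens)


def collect_rels(html):
    parser = RelCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def links_that_lost_nofollow(before_html, after_html):
    """Return hrefs that had 'nofollow' before cleanup but not after."""
    before = collect_rels(before_html)
    after = collect_rels(after_html)
    return sorted(
        href
        for href, tokens in before.items()
        if "nofollow" in tokens
        and href in after
        and "nofollow" not in after[href]
    )
```

Any href this returns that still carries rel='me' after cleanup is exactly
the case to worry about: a 'me' claim that was not an identity assertion in
the original page but looks like one in the cleaned-up version.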



