[uf-discuss] Parsing XFN in PHP
Dan Brickley
danbri at danbri.org
Thu Apr 10 12:20:26 PDT 2008
Julian Bond wrote:
> Ryan Parman <ryan.lists.warpshare at gmail.com> Thu, 10 Apr 2008 09:05:47
>> As someone with a background in parsing RSS/Atom, I can say from
>> years of experience that RSS is only occasionally XML and that you
>> typically find far more HTML in a feed than XML. And parsing HTML can
>> be a bitch.
>
> Big snip.
>
> Woah! That's enough to put one off even starting on parsing and
> reading uF. Which makes uF all a bit pointless. Oh dear. :(
>
> I suspect though that this Gordian knot can be cut. It seems quite
> likely that any page marked up with uF is good enough that HTML-Tidy
> won't remove too many uF marked up elements. If that's the case, then
> Fetch html -> HTML-Tidy -> XML parsing is going to get 99% of the job
> done and successfully extract the uF marked data.
Aside re 'nofollow':
If you're scrubbing HTMLish character streams with some other arbitrary
code to make XHTML, do take care that you're not accidentally scrubbing
rel='nofollow' from comment areas while leaving in potentially
mischievous rel='me' claims. I don't know the default behaviour of
HTML Tidy or similar tools, but this risk is worth bearing in mind.
Per http://microformats.org/wiki/xfn-clarifications#me_nofollow_interaction
"If a link has the rel value "nofollow", then a "me" rel value DOES
NOT indicate an identity relationship. That is, only rel attributes with
the value "me", and WITHOUT the value "nofollow" indicate an identity
relationship assertion."
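
To make the quoted rule concrete, here is a minimal sketch in Python (the thread is about PHP, but the logic carries over). It uses only the standard library's html.parser; the class and function names are made up for illustration, and it assumes rel is a whitespace-separated token list compared case-insensitively and that only <a> elements matter.

```python
from html.parser import HTMLParser


class RelMeCollector(HTMLParser):
    """Collect hrefs of <a> links that count as identity claims.

    Hypothetical helper: a link counts only if its rel tokens
    include "me" and do NOT include "nofollow".
    """

    def __init__(self):
        super().__init__()
        self.identity_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # rel is treated as a whitespace-separated token list (assumption:
        # tokens are compared case-insensitively).
        rel_tokens = {t.lower() for t in (attr_map.get("rel") or "").split()}
        if "me" in rel_tokens and "nofollow" not in rel_tokens:
            href = attr_map.get("href")
            if href:
                self.identity_links.append(href)


def identity_links(html_text):
    """Return hrefs that assert identity under the me/nofollow rule."""
    parser = RelMeCollector()
    parser.feed(html_text)
    parser.close()
    return parser.identity_links
```

Run against the markup as fetched, before any cleanup step, this avoids trusting whatever the cleanup tool chose to keep or drop.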
While it might seem odd for a 'nofollow' to be stripped while a 'me' is
left in place, I've seen enough hostility to the 'nofollow' idea floating
around that it is certainly possible some HTML cleanup tools will drop
that markup. For example:
http://meiert.com/en/blog/20070106/nofollow-still-considered-harmful/
http://foolswisdom.com/do-follow-wordpress/
http://www.itst.org/nonofollow/
http://www.nonofollow.net/
http://www.unintentionallyblank.co.uk/2007/02/20/on-the-redundancy-of-nofollow/
etc...
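
One way to check whether a given cleanup step has this problem is to compare rel values before and after it runs. The following is only a sketch, again in Python with the standard library's html.parser; rel_index and promoted_me_links are hypothetical names, and links are matched by href alone, which is a simplification (two links sharing an href have their rel tokens merged).

```python
from html.parser import HTMLParser


class RelIndex(HTMLParser):
    """Map each <a href> to the set of rel tokens seen on it."""

    def __init__(self):
        super().__init__()
        self.rels = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            return
        tokens = {t.lower() for t in (attr_map.get("rel") or "").split()}
        # Simplification: tokens are merged when an href appears repeatedly.
        self.rels.setdefault(href, set()).update(tokens)


def rel_index(html_text):
    parser = RelIndex()
    parser.feed(html_text)
    parser.close()
    return parser.rels


def promoted_me_links(before_html, after_html):
    """Return hrefs where cleanup removed 'nofollow' but kept 'me'."""
    before = rel_index(before_html)
    after = rel_index(after_html)
    return sorted(
        href
        for href, tokens in after.items()
        if "me" in tokens
        and "nofollow" not in tokens
        and "nofollow" in before.get(href, set())
    )
```

Feeding it the raw page and the cleaned-up output, any href it returns is a link that would wrongly start to look like an identity claim after cleanup.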
cheers,
Dan
--
http://danbri.org/