[uf-discuss] Parsing XFN in PHP
Julian Bond
julian_bond at voidstar.com
Thu Apr 10 10:04:41 PDT 2008
Ryan Parman <ryan.lists.warpshare at gmail.com> Thu, 10 Apr 2008 09:05:47
>As someone with a background in parsing RSS/Atom, I can say from years
>of experience that RSS is only occasionally XML and that you typically
>find far more HTML in a feed than XML. And parsing HTML can be a bitch.
Big snip.
Whoa! That's enough to put one off even starting on parsing and reading
uF. Which makes uF all a bit pointless. Oh dear. :(
I suspect, though, that this Gordian knot can be cut. It seems quite
likely that any page marked up with uF is clean enough that HTML Tidy
won't strip many of the uF-marked-up elements. If that's the case, then
fetch HTML -> HTML Tidy -> XML parsing is going to get 99% of the job
done and successfully extract the uF-marked-up data. But that HTML Tidy
step is going to be indispensable. It just plain won't work without it.
And the shortcut that skips even that step is
DOMDocument->loadHTML($html), which is effectively doing the same thing.
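For illustration, here is a minimal sketch of that shortcut applied to XFN. It assumes PHP 7+ with the standard DOM extension. DOMDocument::loadHTML() uses libxml2's forgiving HTML parser, so unclosed or misnested tags are repaired much as a Tidy pass would repair them, and DOMXPath is then used to pick out links whose rel attribute carries XFN values. The function name extract_xfn() is hypothetical, and the value list is taken from XFN 1.1; neither comes from the original post.

```php
<?php
// Hypothetical helper: return [href, [xfn values...]] for every <a> in the
// page whose rel attribute contains at least one XFN 1.1 value.
function extract_xfn(string $html): array
{
    // XFN 1.1 relationship values (assumed list, for this sketch only).
    $xfn = ['contact', 'acquaintance', 'friend', 'met', 'co-worker',
            'colleague', 'co-resident', 'neighbor', 'child', 'parent',
            'sibling', 'spouse', 'kin', 'muse', 'crush', 'date',
            'sweetheart', 'me'];

    $doc = new DOMDocument();
    // Tag soup produces many libxml warnings; collect them instead of printing.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);          // libxml2 repairs the broken markup here
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = [];
    foreach ($xpath->query('//a[@rel]') as $a) {
        // rel is a whitespace-separated token list; compare case-insensitively.
        $tokens = preg_split('/\s+/', strtolower(trim($a->getAttribute('rel'))));
        $found = array_values(array_intersect($tokens, $xfn));
        if ($found) {
            $result[] = [$a->getAttribute('href'), $found];
        }
    }
    return $result;
}
```

Calling extract_xfn() on a fetched page with an unclosed <a> still finds the link, because the parser closes it for you; non-XFN rel values such as "nofollow" are simply ignored. Running real-world pages through something like this would be one cheap way to do the interop testing suggested below.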
It would be interesting to do some interop testing and see just how bad
a web page has to be before the uF starts getting missed.
And a uF validator would come in handy there.
--
Julian Bond E&MSN: julian_bond at voidstar.com M: +44 (0)77 5907 2173
Webmaster: http://www.ecademy.com/ T: +44 (0)192 0412 433
Personal WebLog: http://www.voidstar.com/ skype:julian.bond?chat
Tastes Like Milk
More information about the microformats-discuss mailing list