[uf-discuss] Parsing XFN in PHP

Julian Bond julian_bond at voidstar.com
Tue Apr 8 05:10:35 PDT 2008

I need some advice about reading rel="me" links in arbitrary web pages 
using PHP. I intend to use this to help build a lifestream-style 
function. The basic intent is to cut down the amount of data entry the 
user has to do. When they give me a MyBlogLog, FriendFeed or Plaxo Pulse 
page that lists links to their other profile pages, I should be able to 
avoid asking them for all of those links again. So:

- User gives me a URL for one of their profile pages
- Use cURL to fetch the page source
- Parse the source looking for links with rel="me"
- Extract an array of link URL and link text pairs
- Do something useful with the array. (???? followed by Profit!)

I've been searching this morning, without success, for a PHP library to 
do the parsing and link extraction, or PHP examples, or an example regex 
to use with preg_match_all(), or something/anything. Since the source 
data is probably badly written and broken HTML, I don't think I can use 
XML methods, as all the XML unserialising code I've used barfs on badly 
formed XML. One possibility, I suppose, is to run it through HTML Tidy 
first, but then I run the (admittedly small) risk of HTML Tidy wiping 
out some of the links.
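
For reference, here is a rough sketch of the steps listed above. It is 
one possible approach, not a tested answer. It assumes PHP's bundled DOM 
extension, whose DOMDocument::loadHTML() uses libxml2's HTML parser 
rather than a strict XML parser and so tends to tolerate malformed 
markup. The function names fetch_page() and extract_rel_me_links() are 
made up for illustration, and fetch_page() assumes the cURL extension is 
installed. Only <a> elements are checked; <link rel="me"> elements would 
need the same treatment.

```php
<?php
// Hypothetical helper: fetch a page body with cURL (assumes ext/curl).
function fetch_page(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body, don't print it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $body = curl_exec($ch);
    return is_string($body) ? $body : '';
}

// Hypothetical helper: return a list of ['url' => ..., 'text' => ...]
// for every <a> whose rel attribute contains the token "me".
function extract_rel_me_links(string $html): array
{
    if (trim($html) === '') {
        return [];
    }

    // Collect parser warnings internally instead of emitting them,
    // since broken markup is expected here.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return [];
    }

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        // rel is a space-separated list of tokens, e.g. rel="me nofollow",
        // so match the token rather than the whole attribute value.
        $rel = strtolower(trim($a->getAttribute('rel')));
        $tokens = preg_split('/\s+/', $rel, -1, PREG_SPLIT_NO_EMPTY);
        if (in_array('me', $tokens, true)) {
            $links[] = [
                'url'  => $a->getAttribute('href'),
                'text' => trim($a->textContent),
            ];
        }
    }
    return $links;
}
```

Usage would be something like 
extract_rel_me_links(fetch_page($profileUrl)), where $profileUrl is 
whatever the user supplied. Relative href values are returned as-is, so 
they would still need resolving against the page URL.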

So what do people use to consume XFN with PHP?

