[uf-discuss] Parsing XFN in PHP

Tue Apr 8 05:10:35 PDT 2008

I need some advice about reading rel="me" links in arbitrary web pages 
using PHP. I intend to use this to help build a lifestream-style 
feature. The basic aim is to cut down the amount of data entry the 
user has to do. When they give me a MyBlogLog, Friendfeed or Plaxo Pulse 
page that lists links to their other profile pages, I should be able to 
avoid asking them for all of those again. So:

- User gives me a URL for one of their profile pages
- Use cURL to fetch the page source
- Parse the source looking for links with rel="me"
- Extract an array of link URL => link text pairs
- Do something useful with the array. (???? followed by Profit!)
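For what it's worth, steps 2 to 4 might look roughly like this using PHP's bundled DOM extension rather than a regex or an XML parser. This is a minimal sketch, not a tested tool: it assumes the curl and dom extensions are available, fetch_page() and extract_rel_me() are hypothetical helper names, and the redirect and timeout values are arbitrary.

```php
<?php
// Minimal sketch: fetch a profile page and collect its rel="me" links.
// fetch_page() and extract_rel_me() are hypothetical helper names.

// Step 2: fetch the page source with cURL (needs the curl extension).
function fetch_page(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of echoing it
        CURLOPT_FOLLOWLOCATION => true,  // profile URLs often redirect
        CURLOPT_MAXREDIRS      => 5,     // arbitrary limit
        CURLOPT_TIMEOUT        => 10,    // seconds; arbitrary
    ]);
    $body = curl_exec($ch);
    // On PHP 8+ the handle is released when $ch goes out of scope.
    return is_string($body) ? $body : null;
}

// Steps 3-4: return [link URL => link text] for every <a href> whose
// rel attribute contains the token "me".
function extract_rel_me(string $html): array
{
    if (trim($html) === '') {
        return [];  // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // libxml's HTML parser recovers from broken markup; collect its
    // warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // rel is a space-separated, case-insensitive token list: match "me" as a
    // whole token so rel="me friend" matches but rel="met" does not.
    $query = '//a[@href][contains(concat(" ", normalize-space('
           . 'translate(@rel, "ME", "me")), " "), " me ")]';

    $links = [];
    foreach ((new DOMXPath($doc))->query($query) as $a) {
        $links[$a->getAttribute('href')] = trim($a->textContent);
    }
    return $links;
}

// Usage (network access assumed):
// $html = fetch_page('http://example.com/profile');
// print_r($html === null ? [] : extract_rel_me($html));
```

Matching rel as a token list matters because XFN values are often combined, e.g. rel="me friend". Caveats with this sketch: relative hrefs come back unresolved, a repeated href keeps only its last link text, and loadHTML() assumes ISO-8859-1 unless the page declares its charset, so non-ASCII link text may need extra handling.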

I spent this morning searching, without success, for a PHP library to do 
the parsing and link extraction, for PHP examples, or for an example regex 
to use with PREG_MATCH_ALL: something, anything. Since the source data is 
probably badly written, broken HTML, I don't think I can use XML methods; 
all the XML unserialising code I've used barfs on badly formed XML. One 
possibility, I suppose, is to run it through HTML Tidy first, but then I 
run the (admittedly small) risk of Tidy wiping out some of the links.
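To illustrate the point about broken markup: an XML parser rejects typical tag soup outright, but the HTML parser in PHP's DOM extension (libxml2 underneath) repairs it, which may make the Tidy step unnecessary just for reading links. A rough sketch follows; the markup is invented for the example.

```php
<?php
// Invented tag soup: unquoted attributes, unclosed <li> and <a> elements.
$soup = '<ul><li><a href=http://twitter.com/example rel=me>Twitter<li>'
      . '<a href="http://flickr.com/photos/example" rel="me">Flickr</ul><br>';

// Collect libxml parse errors instead of printing warnings.
libxml_use_internal_errors(true);

// An XML parser gives up: simplexml_load_string() returns false here.
$asXml = simplexml_load_string($soup);

// The HTML parser recovers and still builds a usable tree.
$doc = new DOMDocument();
$doc->loadHTML($soup);

$hrefs = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    // Split rel into lower-cased tokens and look for "me".
    $tokens = preg_split('/\s+/', strtolower($a->getAttribute('rel')), -1, PREG_SPLIT_NO_EMPTY);
    if (in_array('me', $tokens, true)) {
        $hrefs[] = $a->getAttribute('href');
    }
}
libxml_clear_errors();

var_dump($asXml === false); // bool(true)
print_r($hrefs);
```

The repaired tree may nest elements differently from what the page author intended, but the <a> elements and their attributes survive, which is all this step needs.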

So what do people use to consume XFN with PHP?

-- 
Julian Bond  E&MSN: julian_bond at voidstar.com  M: +44 (0)77 5907 2173
Webmaster:          http://www.ecademy.com/      T: +44 (0)192 0412 433
Personal WebLog:    http://www.voidstar.com/     skype:julian.bond?chat
                         Not Tested On Animals