[uf-dev] Parsing XFN in PHP
Julian Bond
julian_bond at voidstar.com
Fri Apr 11 04:12:46 PDT 2008
Continuing a thread that started on the Discuss list.
My experiments have led me to two approaches, depending on the PHP release.
First, PHP 5, with error handling left as an exercise for the reader:
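    $url = 'http://ciaranmcnulty.com/';
    // Fetch the page; @ suppresses warnings on network failure.
    if ($html = @file_get_contents($url)) {
        $dom = new DomDocument();
        // loadHtml() tolerates tag soup; @ hides its parser warnings.
        if (@$dom->loadHtml($html)) {
            if ($nodes = $dom->getElementsByTagName('a')) {
                foreach ($nodes as $node) {
                    // Matches only when rel is exactly "me",
                    // so rel="me nofollow" is skipped.
                    if ($node->getAttribute('rel') == 'me') {
                        echo $node->getAttribute('href');
                    }
                }
            }
        }
    }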
Pretty easy, huh? Clearly this same approach could be used for other
values of rel=. It's probably not too hard to extend this approach to
find hCard and other uFs.
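
As a rough sketch of how the same loop could cover other rel values, the snippet below wraps it in a function that takes the rel value as a parameter. The function name find_rel_links() is made up for illustration, and it assumes PHP 5 with the DOM extension. Since rel can hold several space-separated values (rel="me nofollow", say), it compares individual tokens instead of the whole attribute.

```php
<?php
// Hypothetical helper, not part of any library: returns the href of
// every <a> element whose rel attribute contains the token $rel.
function find_rel_links($html, $rel)
{
    $hrefs = array();
    $dom = new DOMDocument();
    // @ hides the warnings loadHTML() emits for real-world tag soup.
    if (!@$dom->loadHTML($html)) {
        return $hrefs;
    }
    $wanted = strtolower($rel);
    foreach ($dom->getElementsByTagName('a') as $node) {
        // Split rel on whitespace and compare tokens case-insensitively.
        $tokens = preg_split(
            '/\s+/',
            strtolower(trim($node->getAttribute('rel'))),
            -1,
            PREG_SPLIT_NO_EMPTY
        );
        if (in_array($wanted, $tokens, true)) {
            $hrefs[] = $node->getAttribute('href');
        }
    }
    return $hrefs;
}
```

Something like find_rel_links($html, 'friend') would then pick out the XFN friend links from the same page.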
loadHtml() doesn't exist in PHP 4's dom-xml. In theory it should be
possible to use HTML Tidy's tidy_repair_string to clean the HTML first and
then feed it to domxml_open_mem. In practice, I'm having real trouble
finding the right collection of tidy_repair_string configuration
parameters to generate XML clean enough for dom-xml to accept. If that
can be done, then this should work:
    $url = 'http://ciaranmcnulty.com/';
    if ($html = @file_get_contents($url)) {
        // Ask Tidy to repair the markup before handing it to dom-xml.
        $html = @tidy_repair_string($html);
        // domxml_open_mem() expects well-formed XML.
        if ($dom = @domxml_open_mem($html)) {
            if ($nodes = $dom->get_elements_by_tagname('a')) {
                foreach ($nodes as $node) {
                    if ($node->get_attribute('rel') == 'me') {
                        echo $node->get_attribute('href');
                    }
                }
            }
        }
    }
Typical errors are things like:
- Space required after the Public Identifier
- SystemLiteral " or ' expected
- xmlParseExternalID: PUBLIC, no URI in
- invalid entity nbsp
Maybe it's possible to get Tidy's output to avoid all of these, but I
haven't managed it yet. I had a look at hkit, but that makes no attempt
to configure the Tidy module, so I'd expect lots of problems when trying
to parse arbitrary web pages.
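
One untested starting point for the Tidy configuration is sketched below. The first three errors above all concern the DOCTYPE line and the last concerns a named entity, so this asks Tidy for XML output, drops the DOCTYPE, and converts named entities to numeric ones. The option names (output-xml, numeric-entities, doctype, wrap) are standard HTML Tidy options, but the wrapper tidy_to_xml() is hypothetical, the array-style config argument is the PHP 5 tidy signature (whether the PHP 4 tidy build accepts it is an assumption), and nothing here has been checked against domxml_open_mem.

```php
<?php
// Hypothetical wrapper: returns Tidy's XML output for $html, or null
// if the tidy extension is missing or the repair fails.
function tidy_to_xml($html)
{
    if (!function_exists('tidy_repair_string')) {
        return null;
    }
    $config = array(
        'output-xml'       => true,   // emit well-formed XML
        'numeric-entities' => true,   // &nbsp; becomes &#160;
        'doctype'          => 'omit', // no DOCTYPE for the parser to reject
        'wrap'             => 0,      // do not re-wrap long lines
    );
    // Assumes the input can be treated as UTF-8.
    $xml = tidy_repair_string($html, $config, 'utf8');
    return is_string($xml) ? $xml : null;
}
```

If this does produce acceptable XML, the result of tidy_to_xml($html) would replace the bare tidy_repair_string($html) call in the PHP 4 code above.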
--
Julian Bond E&MSN: julian_bond at voidstar.com M: +44 (0)77 5907 2173
Webmaster: http://www.ecademy.com/ T: +44 (0)192 0412 433
Personal WebLog: http://www.voidstar.com/ skype:julian.bond?chat
Tastes Like Milk