[uf-dev] Parsing XFN in PHP
Mark Ng
mark at markng.me.uk
Fri Apr 11 04:36:03 PDT 2008
$html = tidy_repair_string($html, array('output-xhtml' => true,
'numeric-entities' => true));
is what I was using. Does it work for you?
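
A minimal sketch of how that call might slot in front of an XML parse, assuming the tidy and DOM extensions are both loaded. The helper name load_tidied_html() and the sample markup are made up for illustration, and this is untested against arbitrary pages:

```php
<?php
// Hypothetical helper (not from the thread): repair markup with tidy,
// then parse the result as XML. Returns a DOMDocument, or null on failure.
function load_tidied_html($html)
{
    if (!function_exists('tidy_repair_string')) {
        return null; // tidy extension not available
    }
    $clean = tidy_repair_string($html, array(
        'output-xhtml'     => true, // ask tidy for well-formed XHTML
        'numeric-entities' => true, // numeric references instead of named entities
    ));
    if (!is_string($clean)) {
        return null;
    }
    $dom = new DOMDocument();
    return @$dom->loadXML($clean) ? $dom : null;
}

// Usage: print the href of every rel="me" link in a scrap of tag soup.
$dom = load_tidied_html('<p>a&nbsp;b <a rel=me href="http://example.com/">me</a>');
if ($dom !== null) {
    foreach ($dom->getElementsByTagName('a') as $node) {
        if ($node->getAttribute('rel') == 'me') {
            echo $node->getAttribute('href'), "\n";
        }
    }
}
```

The numeric-entities option is the part aimed at the "invalid entity nbsp" error listed in the quoted message; whether it also clears the DOCTYPE complaints under PHP 4 is not something this sketch shows.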
Mark
On 11/04/2008, Julian Bond <julian_bond at voidstar.com> wrote:
> Continuing a thread that started on the Discuss list.
>
> My experiments have led me to two approaches, depending on the PHP
> release. First, PHP 5, with error handling left as an exercise for the
> reader:
>
>
> $url = 'http://ciaranmcnulty.com/';
> if ($html = @file_get_contents($url)) {
>     $dom = new DOMDocument();
>     if (@$dom->loadHTML($html)) {
>
>         if ($nodes = $dom->getElementsByTagName('a')) {
>             foreach ($nodes as $node) {
>                 if ($node->getAttribute('rel') == 'me') {
>                     echo $node->getAttribute('href');
>                 }
>             }
>         }
>     }
> }
>
> Pretty easy, huh? Clearly this same approach could be used for other
> values of rel=. It's probably not too hard to extend this approach to
> find hCard and other uFs.
>
> loadHTML() doesn't exist in PHP 4's DOM XML extension. In theory it
> should be possible to use HTML Tidy's tidy_repair_string() to clean the
> HTML first and then feed it to domxml_open_mem(). In practice, I'm
> having real trouble finding the right combination of
> tidy_repair_string() configuration parameters to generate XML clean
> enough for the DOM parser to accept. If that can be done, then this
> should work:
>
>
> $url = 'http://ciaranmcnulty.com/';
> if ($html = @file_get_contents($url)) {
>
>     $html = @tidy_repair_string($html);
>     if ($dom = @domxml_open_mem($html)) {
>         if ($nodes = $dom->get_elements_by_tagname('a')) {
>             foreach ($nodes as $node) {
>                 if ($node->get_attribute('rel') == 'me') {
>                     echo $node->get_attribute('href');
>                 }
>             }
>         }
>     }
> }
>
> Typical errors are things like:-
> - Space required after the Public Identifier
> - SystemLiteral " or ' expected
> - xmlParseExternalID: PUBLIC, no URI in
> - invalid entity nbsp
> It may be possible to get Tidy's output to avoid all of these, but I
> haven't managed it yet. I had a look at hKit, but that makes no attempt
> to configure the Tidy module, so I'd expect lots of problems when trying
> to parse arbitrary web pages.
>
>
> --
> Julian Bond E&MSN: julian_bond at voidstar.com M: +44 (0)77 5907 2173
> Webmaster: http://www.ecademy.com/ T: +44 (0)192 0412 433
> Personal WebLog: http://www.voidstar.com/ skype:julian.bond?chat
> Tastes Like Milk
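
The PHP 5 loop quoted above compares rel against the exact string 'me', so a link marked rel="me nofollow" would be skipped. Since rel holds a space-separated list of values, one way to extend the approach to other XFN values is to split the attribute into tokens first. A rough sketch under that assumption follows; find_rel_links() is a hypothetical name and the example URLs are placeholders:

```php
<?php
// Hypothetical helper: return the href of every <a> whose rel attribute
// contains $rel as one of its space-separated tokens (case-insensitive).
function find_rel_links($html, $rel)
{
    $links = array();
    $dom = new DOMDocument();
    if (!@$dom->loadHTML($html)) {
        return $links; // unparseable input: report no links
    }
    $wanted = strtolower($rel);
    foreach ($dom->getElementsByTagName('a') as $node) {
        // Split rel="friend met" into array('friend', 'met').
        $tokens = preg_split(
            '/\s+/',
            strtolower($node->getAttribute('rel')),
            -1,
            PREG_SPLIT_NO_EMPTY
        );
        if (in_array($wanted, $tokens, true)) {
            $links[] = $node->getAttribute('href');
        }
    }
    return $links;
}

// Usage with placeholder markup.
$html = '<p><a rel="me" href="http://a.example/">A</a>'
      . '<a rel="friend met" href="http://b.example/">B</a>'
      . '<a rel="me nofollow" href="http://c.example/">C</a></p>';
print_r(find_rel_links($html, 'me'));
```

Matching whole tokens rather than substrings keeps "met" from being mistaken for "me".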