[uf-dev] Preventing false positives
Zachary Carter
zack.carter at gmail.com
Fri May 2 15:47:07 PDT 2008
The MFO approach is interesting, and is probably the ideal type of
approach. For my first situation the handling would be identical to
what Cognition does for MFO, except that the classes aren't added back
at a later step. For the second situation, where the descendants are
still parsed as belonging to the scope of the uF but the element
itself is not, it would remove class/rev/rel from just the element
it's placed on.
To help elaborate my situation:
<div class="vcard">
<h1 class="fn n">
<span class="honorific-prefix">Dr.</span>
<span class="given-name">Marvin</span>
<span class="family-name">Candle</span>
</h1>
<p>
<span class="label">Website:</span>
<a href="http://example.org" class="url">http://example.org</a>
</p>
<h2 class="title">Applications</h2>
<p class="applications">
[... third party content ...]
</p>
</div>
The title and label classes are not being used as hCard properties, so
I would want to exclude them. The third-party application area I would
want to ignore completely (placing it in an iframe would likely break
lots of functionality). Are there any plans (or should there be) to
support something like this?
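
To make the request concrete, here is a rough sketch of the pre-parse
step I have in mind, written in Python with the standard library's
ElementTree. It is only an illustration under my own assumptions, not
how Cognition or any other parser is implemented. The marker class
names "mf-ignore-self" and "mf-ignore-tree" are hypothetical, made up
for this example, and the inner span in the applications area is a
made-up stand-in for third-party markup that happens to reuse an hCard
class name.

```python
import copy
import xml.etree.ElementTree as ET

MASKED_ATTRS = ("class", "rel", "rev")

# Hypothetical marker class names, invented for this sketch only.
IGNORE_SELF = "mf-ignore-self"  # mask just the element carrying the marker
IGNORE_TREE = "mf-ignore-tree"  # mask the element and all of its descendants


def _blank(element):
    """Set class/rel/rev to the empty string where they are present."""
    for name in MASKED_ATTRS:
        if name in element.attrib:
            element.set(name, "")


def masked_clone(root):
    """Return a deep copy of root with the marked elements masked.

    The original tree is left untouched, so the page keeps its class
    names for styling and scripting.
    """
    clone = copy.deepcopy(root)
    for element in clone.iter():
        classes = element.get("class", "").split()
        if IGNORE_TREE in classes:
            # iter() yields the element itself followed by its descendants.
            for node in element.iter():
                _blank(node)
        elif IGNORE_SELF in classes:
            _blank(element)
    return clone


# Cut-down version of my hCard with the hypothetical markers added.
HCARD = """<div class="vcard">
  <h1 class="fn n"><span class="given-name">Marvin</span></h1>
  <p><span class="label mf-ignore-self">Website:</span>
     <a href="http://example.org" class="url">http://example.org</a></p>
  <h2 class="title mf-ignore-self">Applications</h2>
  <p class="applications mf-ignore-tree">
    <span class="fn">[... third party content ...]</span>
  </p>
</div>"""

original = ET.fromstring(HCARD)
clone = masked_clone(original)

# Class values a parser would still see on the clone.
remaining = [e.get("class") for e in clone.iter() if e.get("class")]
print(remaining)  # ['vcard', 'fn n', 'given-name', 'url']
```

A parser would then run over the clone rather than the page itself, so
"label", "title" and anything inside the applications area would not
supply properties to the hCard, and nothing would be parsed again later.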
Alternatively, is it possible to mark content that is spread across
the page as belonging to a single microformat?
On Fri, May 2, 2008 at 4:25 AM, Toby A Inkster <mail at tobyinkster.co.uk> wrote:
> Zachary Carter wrote:
>
>
> > So I have two questions: 1) is
> > there a way to ignore an entire element and its descendants from being
> > parsed?
> >
>
> Not that I know of. I suppose that putting the content into an IFRAME
> instead of on the main page ought to do it, but it's an ugly solution; and
> because it's not an officially sanctioned method for hiding content from
> parsers, you have no guarantee that future parsers will not start parsing
> within IFRAMEs.
>
>
> > 2) Is there a way to have the parser ignore all class names on
> > an element? (as if the class names were removed from the element prior
> > to parsing)
> >
>
> The MFO effort <http://microformats.org/wiki/mfo> is an attempt to do
> something like this. The list of parsers that actually support MFO is pretty
> short though.
>
> Cognition <http://buzzword.org.uk/cognition/> does support MFO. I mention
> this because the technique it uses is close to what you describe. When it
> parses a microformat, it takes a *clone* of the element and its children (so
> as not to damage the original DOM tree), then tries to parse embedded
> microformats -- e.g. "adr", "geo" and "agent vcard" within a "vcard".
>
> I'll break off the parsing procedure here for a little terminology: I make
> a distinction between "embedded microformats" which are those that imply a
> special meaning by being nested within each other; and "nested microformats"
> which are those that are nested within each other by mere co-incidence, or
> perhaps to convey some kind of undefined relationship between the objects
> (e.g. an hCard could be nested within a geo -- perhaps the author meant to
> convey that the person represented by the hCard lives at that location, but
> this type of nesting is not defined in the specs)
>
> Anyway, after parsing *embedded* microformats, Cognition searches for
> *nested* microformats. It uses a list of all known root element classes
> (e.g. "hatom", "hresume", "hlisting", "vcalendar") -- including the class
> names for microformats which Cognition does not yet support. It also
> includes the class name "mfo".
>
> Now, if it finds any of these nested microformats, it reaches within them
> and tampers with every descendant element, setting the "rel", "rev" and
> "class" attributes to the empty string. Remember that this is on a clone of
> the DOM. Thus these elements will be excluded from supplying any
> unintentional semantics to the outer microformat.
>
> Let's look at an example:
>
> <div class="vcard">
> <h1 class="fn n">
> <span class="honorific-prefix">Dr.</span>
> <span class="given-name">Marvin</span>
> <span class="family-name">Candle</span>
> </h1>
> <p class="note">
> <span class="mfo">
> Worked for a company called
> <b class="vcard">
> <span class="fn org">The Hanzo Foundation</span>
> </b>.
> </span>
> </p>
> </div>
>
> Now, when we come to parse the outer hCard, the clone is reduced to the
> following using MFO:
>
> <div class="vcard">
> <h1 class="fn n">
> <span class="honorific-prefix">Dr.</span>
> <span class="given-name">Marvin</span>
> <span class="family-name">Candle</span>
> </h1>
> <p class="note">
> <span class="mfo">
> Worked for a company called
> <b>
> <span>The Hanzo Foundation</span>
> </b>.
> </span>
> </p>
> </div>
>
> And the following vCard may be produced:
>
> BEGIN:VCARD
> FN:Dr. Marvin Candle
> N:Candle;Marvin;;Dr.
> NOTE:Worked for a company called The Hanzo Foundation.
> END:VCARD
>
> Note that the full text of the note is included, but there is no "ORG"
> property in the vCard.
>
> As it happens, because "vcard" is included in that big list of known
> microformats (remember? "hatom", "hresume", "hlisting", "vcalendar"...), the
> same effect would have happened even if we hadn't included <span
> class="mfo"> -- but the MFO class is still useful because new microformats
> could arise at some point in the future which are not on that list.
>
> It is also worth noting that while this MFO step masks the properties of
> the inner hCard from the outer hCard, the inner hCard will still be parsed
> as a later step, resulting in a second vCard:
>
> BEGIN:VCARD
> FN:The Hanzo Foundation
> ORG:The Hanzo Foundation
> END:VCARD
>
> --
> Toby A Inkster
> <mailto:mail at tobyinkster.co.uk>
> <http://tobyinkster.co.uk>
--
Zach Carter
http://zachcarter.info