[uf-discuss] a question about concatenation and hAtom entry content

Ben Wiley Sittler bsittler at gmail.com
Sat Jun 2 06:20:30 PDT 2007


On 6/1/07, David Janes <davidjanes at blogmatrix.com> wrote:
> On 6/1/07, Ryan King <ryan at technorati.com> wrote:
> > On May 31, 2007, at 11:29 AM, David Janes wrote:
> >
> > > On 5/31/07, Ryan King <ryan at technorati.com> wrote:
> > >
> > >> Another option is that entry content is:
> > >>
> > >> <p class="entry-content">Content</p>
> > >> <p class="entry-content">More Content</p>
> > >>
> > >>
> > >> Is there a reason why hAtom as currently spec'ed only does text, not
> > >> markup?
> > >
> > > I thought it did markup! I totally see what you are saying here
> > > though; the question here is whether we include the DOM nodes that
> > > specify entry-content. This isn't in the spec, and you wouldn't want
> > > to do it everywhere (entry-title, for example) but it would make sense
> > > if it did.
> >
> > You're right, I'm suggesting that only for entry-content (and maybe
> > entry-summary) that we take the nodes that have the class name on
> > them. The reason? I've seen this several times:
> >
> > <... class="hentry">
> >   ...
> >   <p class="entry-content">...</p>
> >
> >   <p class="entry-content">...</p>
> >
> > </>
> >
> > It makes sense, to me, to put the paragraph nodes, intact, in the
> > content.
>
> I concur. Time to start ramping up for hAtom 0.2, if I can get some
> blocks of free time.
>
> Regards, etc...

why not do this for the entry title, too? according to the atom spec,
the title can contain markup too (and in my experience, often does.)
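here is a minimal Python sketch of the node-preserving behavior discussed above, applied to entry-content and entry-title alike. It assumes the hentry is well-formed XHTML parsed with xml.etree.ElementTree and, for brevity, that no XHTML namespace is declared. The helper names (find_property_nodes, property_markup) are hypothetical and not from the hAtom spec. Every element carrying the class name is kept whole, tags included, and the pieces are concatenated in document order.

```python
import xml.etree.ElementTree as ET


def has_class(el, name):
    # "class" holds a whitespace-separated list of class names
    return name in (el.get("class") or "").split()


def find_property_nodes(hentry, class_name):
    """Elements under hentry carrying class_name, in document order.
    A matched element is kept whole, so matches nested inside it are
    not collected a second time."""
    found = []

    def walk(el):
        for child in el:
            if has_class(child, class_name):
                found.append(child)
            else:
                walk(child)

    walk(hentry)
    return found


def serialize(node):
    # ET.tostring() also writes the element's tail text (whatever follows
    # its end tag), so detach the tail while serializing, then restore it.
    tail, node.tail = node.tail, None
    try:
        return ET.tostring(node, encoding="unicode")
    finally:
        node.tail = tail


def property_markup(hentry, class_name):
    """Concatenate the matched nodes, markup and all (hypothetical helper)."""
    return "".join(serialize(n) for n in find_property_nodes(hentry, class_name))


doc = ET.fromstring(
    '<div class="hentry">'
    '<h3 class="entry-title">Hello <em>world</em></h3>\n'
    '<p class="entry-content">Content</p>\n'
    '<p class="entry-content">More <b>Content</b></p>'
    '</div>'
)
print(property_markup(doc, "entry-content"))
print(property_markup(doc, "entry-title"))
```

the two paragraph nodes come out intact and back to back, and the title keeps its inline <em>. a real XHTML document with xmlns set would need namespace-aware tag handling, which this sketch skips.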

and yes, having some well-defined rules for xhtml → text flattening
would be good (not just for microformats, but for xhtml apps
generally.) here are the ones i use:

1. ignore content of the following elements: script, style, textarea, title

2. use the alt text as the text for img elements

3. normalize all runs of one or more whitespace characters to a single
space in all elements that do not have a pre, xmp, plaintext, or
listing ancestor

4. insert breaks before and after the following elements: br, p, div,
hr, h1, h2, h3, h4, h5, blockquote, address, table, tr, td, form, pre,
xmp, listing, ol, ul, menu, dir, li, dl, dt and dd
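the rules above can be sketched in Python with the standard html.parser module. This is a hedged illustration, not the code I actually run. Two choices are my own assumptions rather than part of the rules: "whitespace" means HTML's space characters only (so &nbsp; survives), and runs of consecutive breaks collapse into a single newline. The element lists are copied verbatim from the rules.

```python
import re
from html.parser import HTMLParser

# Element lists copied from the rules above.
SKIP = {"script", "style", "textarea", "title"}
PREFORMATTED = {"pre", "xmp", "plaintext", "listing"}
BREAKING = {"br", "p", "div", "hr", "h1", "h2", "h3", "h4", "h5",
            "blockquote", "address", "table", "tr", "td", "form",
            "pre", "xmp", "listing", "ol", "ul", "menu", "dir",
            "li", "dl", "dt", "dd"}
# HTML's whitespace characters only, so U+00A0 (&nbsp;) is left alone.
HTML_SPACE = re.compile(r"[ \t\n\r\f]+")
BREAK = object()  # sentinel meaning "line break here"


class Flattener(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.pieces = []     # plain str, ("pre", str), or BREAK
        self.skip_depth = 0  # > 0 while inside script/style/textarea/title
        self.pre_depth = 0   # > 0 while inside pre/xmp/plaintext/listing

    def _text(self, text):
        if self.skip_depth == 0:
            self.pieces.append(("pre", text) if self.pre_depth else text)

    def _break(self):
        if self.skip_depth == 0:
            self.pieces.append(BREAK)

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        if tag in BREAKING:
            self._break()
        if tag in PREFORMATTED:
            self.pre_depth += 1
        if tag == "img":
            self._text(dict(attrs).get("alt") or "")  # alt text stands in

    def handle_endtag(self, tag):
        if tag in PREFORMATTED and self.pre_depth:
            self.pre_depth -= 1
        if tag in BREAKING:
            self._break()
        if tag in SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        self._text(data)


def flatten(markup):
    """Flatten an (X)HTML fragment to plain text."""
    parser = Flattener()
    parser.feed(markup)
    parser.close()
    lines, segs, buf = [], [], []
    has_pre = False

    def flush_buf():
        # collapse whitespace across adjacent non-preformatted text pieces
        if buf:
            segs.append(HTML_SPACE.sub(" ", "".join(buf)))
            buf.clear()

    for piece in parser.pieces + [BREAK]:
        if piece is BREAK:
            flush_buf()
            line = "".join(segs)
            if not has_pre:
                line = line.strip(" ")
            if line:  # empty lines dropped, so runs of breaks collapse
                lines.append(line)
            segs.clear()
            has_pre = False
        elif isinstance(piece, tuple):
            flush_buf()
            segs.append(piece[1])  # preformatted text kept verbatim
            has_pre = True
        else:
            buf.append(piece)
    return "\n".join(lines)


html = ('<div><h2 class="entry-title">Hello <em>world</em></h2>'
        '<script>var x = 1;</script>'
        '<p>Some   text,\n  with <img src="a.png" alt="an image"> inside.</p>'
        '<pre>  keep   this </pre></div>')
print(flatten(html))
```

collapsing whitespace before turning breaks into newlines is what keeps the inserted line breaks from being squashed back into spaces.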

still to do:

5. table layout algorithm

6. conversion of content inside sup or sub to corresponding unicode
characters where possible, but only when the entire non-whitespace sub
or sup content can be converted. this would include e.g. <sup>TM</sup>
→ ™ and <sup>2</sup> → ²
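a rough Python sketch of the sup/sub conversion, run as a pre-pass over the markup before flattening. The character tables (digits, + - = ( ) and, for superscripts only, n and i) plus the TM/SM word mappings are my own picks from what Unicode offers, not a complete list. The regex only handles sup/sub elements containing plain text with no nested tags, so it stands in for a proper parser. Conversion is all-or-nothing, per the rule: if any non-whitespace character lacks a Unicode equivalent, the element is left untouched.

```python
import re

SUP = dict(zip("0123456789+-=()ni",
               "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"
               "\u207a\u207b\u207c\u207d\u207e\u207f\u2071"))
SUB = dict(zip("0123456789+-=()",
               "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089"
               "\u208a\u208b\u208c\u208d\u208e"))
SUP_WORDS = {"TM": "\u2122", "SM": "\u2120"}  # single-character marks


def script_to_unicode(tag, content):
    """Unicode form of <sup>/<sub> content, or None if any non-whitespace
    character has no equivalent (all-or-nothing)."""
    compact = "".join(content.split())  # whitespace is ignored
    if not compact:
        return None
    if tag == "sup" and compact in SUP_WORDS:
        return SUP_WORDS[compact]
    table = SUP if tag == "sup" else SUB
    if all(ch in table for ch in compact):
        return "".join(table[ch] for ch in compact)
    return None


# Text-only <sup>/<sub> elements; nested markup is deliberately not matched.
SCRIPT_RE = re.compile(r"<(sup|sub)>([^<]*)</\1>", re.IGNORECASE)


def convert_scripts(markup):
    def repl(m):
        converted = script_to_unicode(m.group(1).lower(), m.group(2))
        return converted if converted is not None else m.group(0)
    return SCRIPT_RE.sub(repl, markup)


print(convert_scripts("Acme<sup>TM</sup>, E = mc<sup>2</sup>, "
                      "H<sub>2</sub>O, 1<sup>st</sup>"))
```

the <sup>st</sup> case stays as markup because s and t have no entries in the table, which is the "entire content or nothing" condition at work.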

-ben


