[uf-discuss] a question about concatenation and hAtom entry content
Ben Wiley Sittler
bsittler at gmail.com
Sat Jun 2 06:20:30 PDT 2007
On 6/1/07, David Janes <davidjanes at blogmatrix.com> wrote:
> On 6/1/07, Ryan King <ryan at technorati.com> wrote:
> > On May 31, 2007, at 11:29 AM, David Janes wrote:
> >
> > > On 5/31/07, Ryan King <ryan at technorati.com> wrote:
> > >
> > >> Another option is that entry content is:
> > >>
> > >> <p class="entry-content">Content</p>
> > >> <p class="entry-content">More Content</p>
> > >>
> > >>
> > >> Is there a reason why hAtom as currently spec'ed only does text, not
> > >> markup?
> > >
> > > I thought it did markup! I totally see what you are saying here
> > > though; the question here is whether we include the DOM nodes that
> > > specify entry-content. This isn't in the spec, and you wouldn't want
> > > to do it everywhere (entry-title, for example) but it would make sense
> > > if it did.
> >
> > You're right, I'm suggesting that only for entry-content (and maybe
> > entry-summary) that we take the nodes that have the class name on
> > them. The reason? I've seen this several times:
> >
> > <... class="hentry">
> > ...
> > <p class="entry-content">...</p>
> >
> > <p class="entry-content">...</p>
> >
> > </>
> >
> > It makes sense, to me, to put the paragraph nodes, intact, in the
> > content.
>
> I concur. Time to start ramping up for hAtom 0.2, if I can get some
> blocks of free time.
>
> Regards, etc...
why not do this for the entry title, too? according to the atom spec,
this can contain markup too (and in my experience, often does.)
and yes, having some well-defined rules for xhtml → text flattening
would be good (not just for microformats, but for xhtml apps
generally.) here are the ones i use:
1. ignore content of the following elements: script, style, textarea, title
2. use the alt text as the text for img elements
3. normalize all runs of one or more whitespace characters to a single
space in all elements that do not have a pre, xmp, plaintext, or
listing ancestor
4. insert breaks before and after the following elements: br, p, div,
hr, h1, h2, h3, h4, h5, blockquote, address, table, tr, td, form, pre,
xmp, listing, ol, ul, menu, dir, li, dl, dt and dd
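
a rough sketch of the rules listed above (skipped elements, alt text,
whitespace normalization, breaks), using python's standard html.parser.
this is only an illustration, not a reference implementation: the names
Flattener and flatten are made up for this example, a "break" is assumed
to mean a single newline, and adjacent breaks are assumed to collapse
into one.

```python
import re
from html.parser import HTMLParser

# Element lists copied from the rules above.
SKIP = {"script", "style", "textarea", "title"}
PRESERVE = {"pre", "xmp", "plaintext", "listing"}
BLOCK = {"br", "p", "div", "hr", "h1", "h2", "h3", "h4", "h5",
         "blockquote", "address", "table", "tr", "td", "form", "pre",
         "xmp", "listing", "ol", "ul", "menu", "dir", "li", "dl",
         "dt", "dd"}
HTML_WS = re.compile(r"[ \t\n\r\f]+")


class Flattener(HTMLParser):
    """Hypothetical helper: flattens markup to text per the rules above."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0         # inside script/style/textarea/title
        self.pre_depth = 0          # inside pre/xmp/plaintext/listing
        self.pending_break = False  # a break is owed before the next text
        self.pending_space = False  # a single space is owed before the next text

    def _emit(self, text):
        # Assumption: no break or space is written before the very first text.
        if self.out:
            if self.pending_break:
                self.out.append("\n")
            elif self.pending_space:
                self.out.append(" ")
        self.pending_break = self.pending_space = False
        self.out.append(text)

    def _break(self):
        self.pending_break = True
        self.pending_space = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        if self.skip_depth:
            return
        if tag in BLOCK:
            self._break()
        if tag in PRESERVE:
            self.pre_depth += 1
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self._emit(alt)

    def handle_endtag(self, tag):
        if tag in SKIP:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if self.skip_depth:
            return
        if tag in PRESERVE and self.pre_depth:
            self.pre_depth -= 1
        if tag in BLOCK:
            self._break()

    def handle_data(self, data):
        if self.skip_depth:
            return
        if self.pre_depth:
            self._emit(data)  # whitespace kept as-is
            return
        collapsed = HTML_WS.sub(" ", data)
        if collapsed.startswith(" "):
            self.pending_space = True
        core = collapsed.strip(" ")
        if core:
            self._emit(core)
            self.pending_space = collapsed.endswith(" ")


def flatten(markup):
    parser = Flattener()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "".join(parser.out)
```

for example, flatten('<p>Hello   <b>big</b>\n world</p>') should give
"Hello big world". html.parser is forgiving about tag soup, so strict
xhtml input is not required; an xml parser would be another reasonable
choice for well-formed xhtml.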
still to do:
5. table layout algorithm
6. conversion of content inside sup or sub to corresponding unicode
characters where possible, but only when the entire non-whitespace sub
or sup content can be converted. this would include e.g. <sup>TM</sup>
→ ™ and <sup>2</sup> → ²
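
the sup/sub item could look something like the sketch below. it is
untested against real feeds and rests on some assumptions: the function
name convert_script and the mapping tables are invented for this
example, only digits, a few operators, and the letters n and i are
mapped, and inner whitespace is passed through unchanged.

```python
# Characters that have a Unicode superscript or subscript form.
# Assumption: this small table is enough for illustration; Unicode has more.
SUPERSCRIPTS = dict(zip("0123456789+-=()ni", "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿⁱ"))
SUBSCRIPTS = dict(zip("0123456789+-=()", "₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎"))

# Whole-string replacements, e.g. <sup>TM</sup> becomes the trademark sign.
WHOLE_SUPERSCRIPTS = {"TM": "™"}


def convert_script(content, kind):
    """Return the Unicode form of sup/sub content, or None if any
    non-whitespace character cannot be converted (all-or-nothing)."""
    table = SUPERSCRIPTS if kind == "sup" else SUBSCRIPTS
    text = content.strip()
    if not text:
        return None
    if kind == "sup" and text in WHOLE_SUPERSCRIPTS:
        return WHOLE_SUPERSCRIPTS[text]
    converted = []
    for ch in text:
        if ch.isspace():
            converted.append(ch)  # whitespace passes through unchanged
        elif ch in table:
            converted.append(table[ch])
        else:
            return None  # one unconvertible character rejects the whole run
    return "".join(converted)
```

a caller would keep the original text (or some other fallback) whenever
the function returns None, which is what makes the conversion
all-or-nothing.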
-ben