[uf-discuss] Scraping or parsing?

Mon Mar 5 21:23:37 PST 2007

On Mar 4, 2007, at 11:06 PM, Mike Schinkel wrote:

> Ryan Cannon wrote:
>> Adding an @profile attribute to the <head> element is far
>> less technically demanding than, say, creating a tag
>> space, which we also require. Especially as the addition
>> also has no performance or usability impact.
>
> It may be less technically demanding, but the latter is needed.

Creating a tag space allows a user of rel-tag to discover precisely
what the author means by the text of the tag. Profile URIs help
authors discover precisely what an attribute value means. In light
of your later point about grokability, I think both are needed.

>> I also think that authoring microformats with the intent
>> that they be usable to the CMS-using/WYSIWYG masses is a
>> pipe dream. Users should *not* be encouraged to publish
>> HTML markup they cannot read. Robust microformatted
>> content will always require either an understanding of
>> how to hand-code HTML or a tool to help generate it--is
>> it unreasonable to think that the meeting of either
>> condition implies the ability to add an @profile as well
>> for 80% of cases?
>
> I cannot overemphasize how strongly I disagree with that last paragraph
> from a philosophical standpoint, for two reasons:
>
> 1.) There are two schools of thinking, one of which I believe to be
> severely flawed:
>
> 	A.) Don't worry about the syntax or how it is implemented, the tools
> will take care of making it easy.
> 	B.) Don't even think about tools until it can be done and easily
> understood by a human. Only then should tools be created.
> 	
> Of course I strongly believe that "A" is the flawed perspective, although
> I know there are many people in that camp, you (it appears) included. I
> plan to write a paper in the future on this issue after I've done enough
> research and gathered actual evidence, but for now let's look at the
> technologies that have gained quick and *widespread* usage (a), and those
> that haven't (b):
>
> 	(a) HTML, RSS, CSS, XML, some microformats, shell scripts/batch
> files, languages using text for source, and so on.
> 	(b) XHTML, XML Namespaces, XSLT, RDF, other microformats, Visual
> programming languages, and so on.
> 	
> <snip>
> 	
> The technologies that work are the ones that are designed for humans
> first, with humans with tools second.

Although I'm not sure about the others, I know that RSS, CSS and XML were
designed not simply for humans, but for humans with a specific set of
skills in place. People with these skills built tools that then fueled (or
are fueling) widespread adoption. Perhaps I'm wrong, but I see microformats
in the same vein. I think your concepts of "quick" and "widespread" are
interesting as well--CSS 2 (Recommended 1998) and XML Namespaces
(Recommended 2006) have roughly similar penetration in Web browsers
(imperfect in most, quite poor in a major one).

> Believing that there is or should be a
> difference between "users" and "content authors" is either simply
> ignorant or actively arrogant.

To quote Andy Mabbett, this is a straw man. I never said this, nor did I
intend it. What I said was: WYSIWYG-only users can't read code.
Microformats without tools are code. In my experience, WYSIWYG users who
post code they cannot read rarely get the outcome they desire. Authoring
microformats with the intention that they be usable *as code* to content
authors *who cannot read code* is a pipe dream.

> The web with its recent social media component has
> empowered EVERYONE to become content authors, and I don't honestly see
> this abating. My expectation is that soon every kid from a first world
> country (and soon every kid in the world, if OLPC succeeds) will be as
> comfortable coding in HTML as today's office worker is comfortable using
> Microsoft Word.

And? Once this occurs, adding a single attribute to a single element will
be easy for everyone.

> And if you'll forgive the tinge of melodrama

I don't think it's appropriate, warranted or necessary.

This thread is about the necessity of profile URIs. I think the problems
started with Scott Reynen's assertion[1] that:

 > Profiles are not intended to work as parsing templates.  They just
 > identify the type of data so parsers can figure out whether or not
 > it's something they know how to parse.

Profiles are intended for machines *and humans*[2]. Providing profile URIs
adds important disambiguation for the definitions of terms and helps
content authors better understand the code they are writing. For example,
say an author unfamiliar with hCard attempted to duplicate the following
code:

<div class="vcard" id="banana">
   <p>
      <a href="http://ryancannon.com/" class="fn url bar">Ryan Cannon</a>
      is a <span class="constellation">Scorpio</span>.
   </p>
</div>

What is necessary? What is significant? Why banana? Instead of having to
wade through the vCard or hCard spec, the profile provides an easy-to-read
description of the format and its included terms. By allowing microformat
publishers to omit profile URIs, you also eliminate important clues as to
what microformats mean, what is important, and what is not. *That* is a
good way to keep content authors from becoming anointed.
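
For the machine side of this, here is a rough sketch of what "adding a
single attribute" looks like and how a parser might check for it before
deciding whether it knows how to parse the page. This is only an
illustration, not how any existing parser works: the profile URI
(http://example.org/profiles/hcard) is a made-up placeholder, and the names
ProfileFinder and declares_profile are hypothetical. It relies only on the
HTML 4.01 rule that @profile on <head> is a whitespace-separated list of
URIs.

```python
from html.parser import HTMLParser

# Hypothetical profile URI, used purely for illustration.
HCARD_PROFILE = "http://example.org/profiles/hcard"

# The hCard from the example above, wrapped in a page whose <head>
# declares the (placeholder) profile.
PAGE = """<html>
<head profile="http://example.org/profiles/hcard">
<title>About Ryan</title>
</head>
<body>
<div class="vcard" id="banana">
   <p>
      <a href="http://ryancannon.com/" class="fn url bar">Ryan Cannon</a>
      is a <span class="constellation">Scorpio</span>.
   </p>
</div>
</body>
</html>"""


class ProfileFinder(HTMLParser):
    """Collect the URIs listed in the profile attribute of <head>."""

    def __init__(self):
        super().__init__()
        self.profiles = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            # @profile holds one or more URIs separated by whitespace;
            # a missing or empty attribute yields no URIs.
            value = dict(attrs).get("profile") or ""
            self.profiles.extend(value.split())


def declares_profile(html, uri):
    """Return True if the page's <head> lists the given profile URI."""
    finder = ProfileFinder()
    finder.feed(html)
    return uri in finder.profiles
```

With the page above, declares_profile(PAGE, HCARD_PROFILE) returns True; drop
the attribute from <head> and it returns False, which is all a parser needs
to decide whether this is something it knows how to parse. The human reader
gets more: following the URI answers the "why banana?" questions.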

[1]: http://microformats.org/discuss/mail/microformats-discuss/2007-March/008892.html
[2]: http://www.gmpg.org/xmdp/