Microformalyze (was: Playlists and Albums (was: Re: [uf-new] item property))

Manu Sporny msporny at digitalbazaar.com
Fri Oct 19 07:57:13 PDT 2007


Martin McEvoy wrote:
>>> I Really dont think that we can have a clear Idea of what hAudio is
>>> Until our our examples are re-studied without the use of a program.
> 
> Because it is my opinion that the data output of your application is not
> to be relied upon

I don't want this to become a nasty discussion, Martin. I realize that
you have questions about Microformalyze and I am attempting to answer them.

I believe the tone of this discussion is a bit off... right now, it
sounds like you're alluding to the notion that there has been some sort
of "nefarious behavior" when gathering data for hAudio, or that the data
we have is not dependable. I realize that my responses could have been
less inflammatory and more explanatory.

I am going to attempt to explain how Microformalyze works in more
detail.

>> Why do you think this approach is going to help us?
> 
> Why do you think that the Microformalyze approach is going to help us?
> do you not think the Hand and Eye are a better approach? 

Microformalyze is a "Hand and Eye" approach... there is no automation in
the "analyzing a web page" part of the tool.

It saves us the time of tallying statistics by hand. It is also far more
accurate to have a machine tally the results and statistics.

Before we started using Microformalyze, I made several errors when
calculating the statistics. It is difficult to go through 48 examples
and over 1,000 properties by hand, calculate statistics, and not expect
some human error.

Here's how we used to gather examples for hAudio:

1. Open up the hAudio Wiki.
2. Copy/Paste one example URL into a different tab in the web browser.
3. Copy/Paste the hAudio example template that had all of the properties
   into the correct part of the wiki page.
4. Flip between the hAudio Wiki tab and the example URL page, adding or
   deleting properties from hAudio.
5. Repeat this process 84 times (each page took around 20 minutes to
   analyze).

Here's how it works with Microformalyze:

1. Open up Microformalyze.
2. Click "Add URL" to add URLs that need to be analyzed.
3. Click "Add property" to add properties that you expect to see (this
   can also be done while you're analyzing the pages)
4. Once all of the URLs that need to be analyzed have been added, you
   click the "Next URL" button.
5. Microformalyze displays the URL in a web browser and you click
   checkboxes to specify what properties exist on the example URL page.
   This small change to the process cuts down the time to analyze a
   page greatly... mainly because you're not editing wiki text, you're
   just clicking a checkbox.
6. Repeat this process 84 times (each page took around 5 minutes to
   analyze).
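
As a rough illustration of the bookkeeping behind these steps, here is
a minimal Python sketch. It is not Microformalyze's actual code: the
Analysis class, its method names, and the example URL are all
hypothetical, chosen only to mirror "Add URL", "Add property", and the
per-page checkboxes. The person still decides what is on each page; the
program only records the clicks.

```python
from dataclasses import dataclass, field


@dataclass
class Analysis:
    """Hypothetical record of one analysis session (not the real tool)."""
    urls: list[str] = field(default_factory=list)
    properties: list[str] = field(default_factory=list)
    # Maps each URL to the set of properties a person ticked for it.
    checked: dict[str, set[str]] = field(default_factory=dict)

    def add_url(self, url: str) -> None:
        # Mirrors "Add URL": register a page that still needs analysis.
        if url not in self.urls:
            self.urls.append(url)
            self.checked[url] = set()

    def add_property(self, name: str) -> None:
        # Mirrors "Add property": register a property we expect to see.
        if name not in self.properties:
            self.properties.append(name)

    def set_checkbox(self, url: str, name: str, present: bool) -> None:
        # Mirrors ticking or unticking a checkbox while viewing a page.
        # Unknown URLs and properties are registered on the fly, since
        # properties can also be added during analysis.
        self.add_url(url)
        self.add_property(name)
        if present:
            self.checked[url].add(name)
        else:
            self.checked[url].discard(name)


session = Analysis()
url = "http://example.com/album-page"  # placeholder, not a real example URL
session.add_url(url)
session.add_property("title")
session.set_checkbox(url, "title", True)
session.set_checkbox(url, "artist", True)   # property added mid-analysis
session.set_checkbox(url, "artist", False)  # checkbox unticked again
```

Nothing in this sketch inspects the page content, which is the point:
the analysis itself stays manual.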

The old way of doing things took around 20 minutes per website. The
Microformalyze way of doing things takes around 5 minutes per website.
Over 84 pages, that is roughly 28 hours versus 7 hours.

Now let's compare how we calculated statistics before and after.

Here's how we did it via the Wiki:

Every time a new property was created, I would have to go through and
tally the results by hand. This was error-prone, and on more than one
occasion I had to wipe everything and start over. It also required me
to triple-check my work to make sure I was reporting the correct
statistics to the list. I spent hours doing this - just calculating
statistics. There is a reason not many people help out with gathering
examples and calculating statistics - it is tedious and excruciatingly
time-consuming.

Here's how it was done using Microformalyze:

You click a button and the statistics are automatically calculated for
you. You click another button and it dumps the wiki formatted text for
displaying the properties. It is no longer time consuming or error prone
to do this!
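
To make the "click a button" step concrete, here is a minimal Python
sketch of that kind of tally. It is an illustration under assumptions,
not the tool's real implementation: the function names tally and
to_wiki, the sample data, and the bullet-list wiki output are all
hypothetical. The input is simply which properties a person ticked for
each page.

```python
from collections import Counter


def tally(checked: dict[str, set[str]]) -> Counter:
    """Count how many analyzed pages contain each property."""
    counts = Counter()
    for props in checked.values():
        counts.update(props)
    return counts


def to_wiki(counts: Counter, total_urls: int) -> str:
    """Render the tally as wiki bullet lines (assumed output format)."""
    if total_urls <= 0:
        raise ValueError("total_urls must be positive")
    lines = []
    # Most frequent property first; ties are broken alphabetically.
    for name, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        pct = 100.0 * n / total_urls
        lines.append(f"* {name}: {n} of {total_urls} ({pct:.1f}%)")
    return "\n".join(lines)


# Made-up sample data: four pages and the properties ticked on each.
checked = {
    "page-a": {"title", "artist"},
    "page-b": {"title"},
    "page-c": {"title", "price"},
    "page-d": {"artist", "title"},
}
report = to_wiki(tally(checked), len(checked))
print(report)
```

Because the counts are derived from the recorded data every time, adding
a new property means re-running the tally rather than re-counting by
hand.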

However, the most important aspect of Microformalyze is that ANYBODY can
go back and validate our findings easily. The data files are there, and
there is a common namespace across all properties/websites. In other
words: there is a verifiable paper trail.

It is important to point out that this does not exist for any other
Microformat that I know about. Verifiability of analysis results is very
important! Reducing human error in statistics calculations is very
important! Microformalyze builds this into the examples gathering and
statistics calculation process.

>> that helps the user track the properties on each page. It can
>> automatically calculate statistics and helped the process of analysis
>> immensely.
> 
> This is my concern *HOW* does Microformalize do this? 
> 
> Microformalize has all the power of a high profile search engine that
> can output the relevance of a given keyword in order and frequency of
> occurrence correct?

No, absolutely not. This is the core of your misunderstanding of what
Microformalyze does. There is no "search engine" or "keyword matching"
technology in Microformalyze. That would be a horrible way to go about
gathering examples.

All Microformalyze does is automate the tedious and error-prone parts of
the examples and statistics gathering portion of the Microformats
process. It also adds verifiability - which is really its most
important contribution to the process.

If you would like to see a detailed tutorial on how it works, the
tutorial is available here:

http://wiki.digitalbazaar.com/en/Microformalyze#Tutorial

I'd be happy to answer any other questions or concerns that you have
about Microformalyze. Like I said before, all of the data files, source
code (which I placed under the GPL), and documentation are available via
the website listed above. You don't have to take my word for it... you
could read the code, look at the data and see for yourself.

-- manu


More information about the microformats-new mailing list