Microformalyze (was: Playlists and Albums (was: Re: [uf-new] item property))

Rob Manson roBman at MobileOnlineBusiness.com.au
Fri Oct 19 20:23:03 PDT 2007


Martin,

It doesn't analyse the page for Baba and Flumps... you do.

You checked the boxes telling it that Baba and Flumps are there, so it
gave you a 100% answer.

8)
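
To make that concrete, here is a minimal Python sketch of the kind of tally that would produce those numbers. It is not quoted from Microformalyze's source. It assumes that each property's percentage is the share of analysed URLs on which its checkbox was ticked. `tally` and `martin_test` are hypothetical names.

```python
# Hypothetical sketch, NOT Microformalyze's actual code. It assumes a
# property's score is the share of analysed URLs on which a reviewer
# ticked that property's checkbox; the page content is never consulted.


def tally(properties, checked_by_url):
    """Return {property: percentage of URLs on which it was ticked}.

    properties: the declared property names
    checked_by_url: {url: set of property names ticked for that page}
    """
    total = len(checked_by_url)
    stats = {}
    for prop in properties:
        ticks = sum(1 for ticked in checked_by_url.values() if prop in ticked)
        # Guard against an empty session so we never divide by zero.
        stats[prop] = 100.0 * ticks / total if total else 0.0
    return stats


# Martin's second test: both boxes ticked for the first URL,
# only Baba ticked for the second.
martin_test = {
    "http://blog.digitalbazaar.com/": {"Baba", "Flumps"},
    "http://weborganics.co.uk/": {"Baba"},
}

for prop, pct in tally(["Baba", "Flumps"], martin_test).items():
    print(f"{prop:<35}: {pct:.2f}%")  # Baba 100.00%, Flumps 50.00%
```

Nothing in this calculation fetches or reads the page. Baba and Flumps therefore score 100% on any page where both boxes are ticked, whatever the page actually says.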


Manu, I think it's a very useful tool to help standardise analysis.


roBman


On Fri, 2007-10-19 at 22:15 +0100, Martin McEvoy wrote:
> On Fri, 2007-10-19 at 10:57 -0400, Manu Sporny wrote:
> > Martin McEvoy wrote:
> > >>> I really don't think that we can have a clear idea of what hAudio is
> > >>> until our examples are re-studied without the use of a program.
> > > 
> > > Because it is my opinion that the data output of your application is not
> > > to be relied upon.
> > 
> > I don't want this to become a nasty discussion, 
> 
> ?? Now you are confusing me. Is this a nasty discussion because I ask
> questions?
> 
> > Martin. I realize that
> > you have questions about Microformalyze and I am attempting to answer them.
> > 
> > I believe the tone of this discussion is a bit off... right now, it
> > sounds like you're alluding to the notion that there has been some sort
> > of "nefarious behavior" when gathering data for hAudio, 
> 
> I am not saying that there is some sort of sinister behavior going on at
> all. I am pointing out that the data that Microformalyze outputs (in the
> terminal) is not to be trusted.
> 
> > or that the data
> > we have is not dependable. I realize that my responses could have been
> > less inflammatory and more explanatory.
> > 
> > I am going to attempt to explain more clearly how Microformalyze works.
> > 
> > >> Why do you think this approach is going to help us?
> > > 
> > > Why do you think that the Microformalyze approach is going to help us?
> > > Do you not think the "Hand and Eye" approach is better?
> > 
> > Microformalyze is a "Hand and Eye" approach... there is no automation to
> > the "analyzing a web page" part of the tool.
> > 
> 
> ...
> 
> > It saves us from having to tally statistics by hand. It is also
> > far more accurate to have a machine tally the results and statistics.
> > 
> 
> ...
> 
> > Before we were using Microformalyze, I made several errors when
> > calculating the statistics. It is difficult to go through 48 examples
> > and over 1,000 properties by hand, calculate statistics, and not expect
> > some human error.
> > 
> > Here's how we used to gather examples for hAudio:
> > 
> > 1. Open up the hAudio Wiki.
> > 2. Copy/Paste one example URL into a different tab in the web browser.
> > 3. Copy/Paste the hAudio example template that had all of the properties
> >    into the correct part of the wiki page.
> > 4. Flip between the hAudio Wiki tab and the example URL page, adding or
> >    deleting properties from hAudio.
> > 5. Repeat this process 84 times (each page took around 20 minutes to
> >    analyze).
> > 
> > Here's how it works with Microformalyze:
> > 
> > 1. Open up Microformalyze
> > 2. Click "Add URL" to add URLs that need to be analyzed.
> > 3. Click "Add property" to add properties that you expect to see (this
> >    can also be done while you're analyzing the pages)
> > 4. Once all of the URLs that need to be analyzed have been added, you
> >    click the "Next URL" button.
> > 5. Microformalyze displays the URL in a web browser and you click
> >    checkboxes to specify what properties exist on the example URL page.
> 
> So I tell the application what properties exist on a given page, and it
> confirms whether this is true or not?
> 
> >    This small change to the process cuts down the time to analyze a
> >    page greatly... mainly because you're not editing wiki text, you're
> >    just clicking a checkbox.
> > 6. Repeat this process 84 times (each page took around 5 minutes to
> >    analyze).
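
The checkbox step boils down to recording, for each URL, which properties the reviewer saw. Below is a minimal Python sketch of saving that record. It assumes the tab-separated `property` / `url` / `properties` layout that appears in Martin's test file further down this thread. `write_session` and its parameters are hypothetical names, not Microformalyze's API.

```python
# Hypothetical sketch, not Microformalyze's code. The record layout is
# inferred from the test file in Martin's message; the real format may differ.


def write_session(descriptions, checked_by_url, labels=None):
    """Serialize reviewer ticks as tab-separated text.

    descriptions: {property name: description}, in display order
    checked_by_url: {url: set of property names ticked for that page}
    labels: optional {url: short label}; defaults to the URL itself
    """
    labels = labels or {}
    lines = [f"property\t{name}\t{desc}" for name, desc in descriptions.items()]
    for url, ticked in checked_by_url.items():
        lines.append(f"url\t{labels.get(url, url)}\t{url}")
        # Write ticks in declared order; undeclared names are skipped here.
        ordered = [name for name in descriptions if name in ticked]
        lines.append("\t".join(["properties", *ordered]))
    return "\n".join(lines) + "\n"


print(write_session(
    {"Baba": "The Elephant", "Flumps": "A sweetie"},
    {"http://blog.digitalbazaar.com/": {"Flumps", "Baba"}},
    labels={"http://blog.digitalbazaar.com/": "Bazaar"},
), end="")
```

Because the output is plain text, anyone can reopen it and see exactly which boxes were ticked for which page. It records a reviewer's judgment, not a measurement taken from the page.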
> > 
> > The old way of doing things took around 20 minutes per website. The
> > Microformalyze way of doing things takes around 5 minutes per website.
> > 
> > Now let's examine how we calculated statistics before:
> > 
> > Here's how we did it via the Wiki:
> > 
> > Every time a new property was created, I would have to go through and
> > tally the results by hand. This was error-prone and, on more than one
> > occasion, I had to wipe everything and start over. It also required me
> > to triple-check my work to make sure I was reporting the correct
> > statistics to the list. I spent hours doing this - just calculating
> > statistics. There is a reason not many people help out with gathering
> > examples and calculating statistics - it is tedious and excruciatingly
> > time-consuming.
> > 
> > Here's how it was done using Microformalyze:
> > 
> > You click a button and the statistics are automatically calculated for
> > you. You click another button and it dumps the wiki formatted text for
> > displaying the properties. It is no longer time consuming or error prone
> > to do this!
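
The "dump the wiki formatted text" step can be pictured as a small formatting pass over those percentages. Below is a minimal Python sketch that assumes MediaWiki table markup. The wiki text Microformalyze really emits may be laid out differently, and `wiki_table` is a hypothetical name.

```python
# Hypothetical sketch: renders {property: percentage} as MediaWiki table
# markup. The layout is an assumption, not Microformalyze's real output.


def wiki_table(stats):
    """Return MediaWiki table markup, most frequently ticked property first."""
    rows = ['{| class="wikitable"', "! Property !! Ticked on % of examples"]
    # Sort by descending percentage, then alphabetically to break ties.
    for prop, pct in sorted(stats.items(), key=lambda item: (-item[1], item[0])):
        rows.append("|-")
        rows.append(f"| {prop} || {pct:.2f}%")
    rows.append("|}")
    return "\n".join(rows)


print(wiki_table({"Flumps": 50.0, "Baba": 100.0}))
```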
> > 
> > However, the most important aspect of Microformalyze is that ANYBODY can
> > go back and validate our findings easily. The data files are there,
> > there is a common namespace across all properties/websites, in other
> > words: there is a verifiable paper trail.
> > 
> > It is important to point out that this does not exist for any other
> > Microformat that I know about. Verifiability of analysis results is very
> > important! Reducing human error in statistics calculations is very
> > important! Microformalyze builds this into the examples gathering and
> > statistics calculation process.
> > 
> > >> that helps the user track the properties on each page. It can
> > >> automatically calculate statistics and has helped the process of analysis
> > >> immensely.
> > > 
> > > This is my concern: *HOW* does Microformalyze do this?
> > > 
> > > Microformalyze has all the power of a high-profile search engine that
> > > can output the relevance of a given keyword in order and frequency of
> > > occurrence, correct?
> > 
> > No, absolutely not. This is the core of your misunderstanding of what
> > Microformalyze does. There is no "search engine" or "keyword matching"
> > technology in Microformalyze. That would be a horrible way to go about
> > gathering examples.
> > 
> > All Microformalyze does is automate the tedious and error-prone parts of
> > the examples and statistics gathering portion of the Microformats
> > process. It also adds verifiability - which is really its most
> > important contribution to the process.
> 
> Sorry, my friend, I don't think I was being very clear.
> 
> *HOW* does Microformalyze do this?
> 
> What is a property?
> How is a property determined?
> Does Microformalyze analyze the raw HTML to determine the existence of
> these properties? Does it look for actual output on a web page?
> 
> How does it gather statistics?
> How are they compared? Are they compared against other URLs loaded
> into Microformalyze, does it calculate the occurrence of a "property"
> on a page, or does it work some other way?
> 
> > 
> > If you would like to see a detailed tutorial on how it works, the
> > tutorial is available here:
> > 
> > http://wiki.digitalbazaar.com/en/Microformalyze#Tutorial
> 
> Thanks for the tutorial, but "How do I use Microformalyze?" was not the
> question.
> 
> > 
> > I'd be happy to answer any other questions or concerns that you have
> > about Microformalyze. Like I said before, all of the data files, source
> > code (which I placed under the GPL), and documentation are available via
> > the website listed above. You don't have to take my word for it... you
> > could read the code, look at the data and see for yourself.
> 
> I have had a look at the code, but Python is not my strong point.
> Perhaps you might like to explain?
> 
> I did a test. The "properties" I was looking for were Baba and Flumps
> (because there is a good chance that these properties will NOT exist in
> any of the pages I'm likely to test).
> 
> Here is the test file (copy and paste it if you like):
> 
> property	Baba	The Elephant
> property	Flumps	A sweetie
> url	Bazaar	http://blog.digitalbazaar.com/
> properties	Baba	Flumps
> 
> Sorry to use your URL, but it was the first thing that sprang to mind :)
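
Reading Martin's file back shows how little it contains: declared properties, URLs, and the boxes ticked for each URL. Below is a minimal Python sketch of a reader for it. Its assumptions come from these two test files only, not from Microformalyze's source: fields are tab-separated, a `url` line ends with the address, and each `properties` line applies to the most recent `url` line. `parse_session` is a hypothetical name.

```python
# Hypothetical reader for the tab-separated test files in this thread. The
# record semantics are inferred from the examples, not from Microformalyze.


def parse_session(text):
    """Return (descriptions, checked_by_url) parsed from the file text."""
    descriptions = {}    # property name -> description
    checked_by_url = {}  # url -> set of property names ticked for it
    current_url = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        kind, *fields = line.split("\t")
        if kind == "property":
            descriptions[fields[0]] = fields[1] if len(fields) > 1 else ""
        elif kind == "url":
            # The label may contain spaces; the address is the last field.
            current_url = fields[-1]
            checked_by_url.setdefault(current_url, set())
        elif kind == "properties":
            if current_url is None:
                raise ValueError("'properties' line appears before any 'url' line")
            checked_by_url[current_url].update(name for name in fields if name)
        else:
            raise ValueError(f"unknown record type: {kind!r}")
    return descriptions, checked_by_url


martin_file = (
    "property\tBaba\tThe Elephant\n"
    "property\tFlumps\tA sweetie\n"
    "url\tBazaar\thttp://blog.digitalbazaar.com/\n"
    "properties\tBaba\tFlumps\n"
    "url\tno foo in this\thttp://weborganics.co.uk/\n"
    "properties\tBaba\n"
)
descriptions, checked_by_url = parse_session(martin_file)
for url, ticked in checked_by_url.items():
    print(url, sorted(ticked))
```

Nothing in the file comes from the pages themselves. That is consistent with the result of Martin's Baba and Flumps test.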
> 
> I ticked both boxes in the GUI, Baba and Flumps, and output the data in
> the terminal:
> 
> Baba                               : 100.00%
> Flumps                             : 100.00%
> 
> 
> I looked at your page thinking, "Huh? How can that be correct?"
> 
> There is no mention of the words Baba or Flumps in the web page text.
> 
> I looked at the source code: no mention there either.
> 
> Does Microformalyze determine the existence of these properties in
> another way?
> 
> I added another URL to examine:
> 
> property	Baba	The Elephant
> property	Flumps	A sweetie
> url	Bazaar	http://blog.digitalbazaar.com/
> properties	Baba	Flumps
> url	no foo in this	http://weborganics.co.uk/
> properties	Baba
> 
> 
> The output data after adding the second URL:
> 
> Baba                               : 100.00%
> Flumps                             : 50.00%
> 
> I KNOW these two properties do not exist in any way at WebOrganics.
> 
> Can you see now WHY I am concerned and moderately confused?
> Microformalyze does not seem to be calculating the existence of these
> properties on a page; it seems to be just calculating whether I have
> ticked a box or not.
> 
> 
> Am I missing something?
> 
> 
> Thanks
> 
> Martin
>  
> > 
> > -- manu
> > _______________________________________________
> > microformats-new mailing list
> > microformats-new at microformats.org
> > http://microformats.org/mailman/listinfo/microformats-new
> 


