[uf-discuss] Request for Microformats.org MediaWiki SQL data

Rob Manson roBman at MobileOnlineBusiness.com.au
Tue Mar 11 16:10:38 PST 2008


Hey Manu,

Here's another way to get the same data, since the admins generally
don't seem to have time to respond to system-level requests.

A few months back I downloaded all the IRC logs from:

	http://rbach.priv.at/Microformats/IRC/

All wiki edits are logged there.

Then I used a dirty little script (included below) to build the
following maps:

- Overall history : a map of all wiki edits
- Editor history : a map of all the edits grouped by editor
- Page history : a map of all edits grouped by page

I'm sure the pattern hasn't really changed since I ran this.

Basically, the top three editors were AndyMabbett, Tantek and
ChristopheDucamp, by a long way.

And below is the top of the page history, sorted by number of distinct
editors (columns: distinct editors, total edits, page name).
NOTE: this is old data.


Personally, it seems like contributing to this community takes a
herculean effort and a lot of persistence, and I'm very impressed by
anyone who manages it.

And requests for basic system-level updates, for example useful upgrades
to the wiki engine or simply being able to contribute open source code
(note: http://microformats.org/code/ currently just shows a 404 in the
body), seem to be piped directly to /dev/null.


roBman



Page history:
===========================================================================
Editors         Edits           Page
---------------------------------------------------------------------------
139     :       407     :       Main Page
133     :       346     :       hcard
97      :       246     :       hcard-examples-in-wild
81      :       192     :       hcalendar
62      :       117     :       irc
56      :       84      :       irc-people
47      :       173     :       hreview
42      :       177     :       implementations
42      :       87      :       hcalendar-examples-in-wild
41      :       119     :       hresume
37      :       225     :       hatom
32      :       196     :       to-do
30      :       56      :       rel-tag
29      :       195     :       events
27      :       226     :       hatom-issues
27      :       42      :       hreview-examples-in-wild
25      :       103     :       geo
24      :       209     :       species-brainstorming
22      :       35      :       Talk:Main Page
20      :       80      :       faq
20      :       116     :       citation-brainstorming
20      :       108     :       hcard-issues
20      :       98      :       hcard-brainstorming
19      :       30      :       rel-license
17      :       59      :       presentations
17      :       69      :       posh
16      :       104     :       citation-examples
16      :       48      :       hcalendar-issues
16      :       21      :       xfolk
16      :       33      :       process
16      :       19      :       xfn-implementations
16      :       26      :       hresume-examples-in-wild
16      :       27      :       xoxo
15      :       29      :       events/2006-03-13-sxsw-microformats
15      :       36      :       recipe-brainstorming
15      :       28      :       vote-links
15      :       81      :       mailing-lists
15      :       30      :       hreview-issues
15      :       70      :       hcard-faq
15      :       40      :       measure-brainstorming
14      :       36      :       citation-formats
14      :       79      :       media-info-examples
14      :       37      :       rel-tag-faq
14      :       57      :       include-pattern-feedback
14      :       33      :       icons
14      :       17      :       Talk:xmdp-faq
13      :       15      :       media-info-formats
13      :       29      :       hlisting-feedback
13      :       18      :       plazes-syntax
13      :       40      :       podcasts
13      :       33      :       mailing-list-unmoderation
13      :       37      :       what-are-microformats
13      :       16      :       Talk:posh
13      :       28      :       hcalendar-implementations
13      :       266     :       advocacy
13      :       54      :       press
12      :       34      :       picoformats
12      :       27      :       podcasts-fr
12      :       14      :       Talk:zen-garden
12      :       33      :       rest/ahah
12      :       27      :       events/2007-04-18-web-2-expo-dinner
12      :       17      :       citation-irc-meetup
12      :       37      :       hcard-examples
12      :       13      :       Talk:WikiNode
11      :       18      :       hatom-hints
11      :       18      :       hcalendar-profile
11      :       17      :       photo-note-examples
11      :       96      :       species
11      :       20      :       hcard-implementations
11      :       18      :       User:TimG
11      :       38      :       rel-tag-issues
11      :       50      :       resume-brainstorming
11      :       30      :       hcalendar-brainstorming
11      :       31      :       alternates-brainstorming
11      :       62      :       currency-examples
11      :       28      :       citation
10      :       690     :       Special:Log/block
10      :       26      :       events/2006-03-12-sxsw-growth-evolution-of
10      :       21      :       hlisting-proposal
10      :       14      :       rest/forms-brainstorming
10      :       23      :       adr
10      :       13      :       Talk:selected-test-cases-from-the-web
10      :       52      :       governance-issues
10      :       11      :       Talk:rel-design-pattern
10      :       13      :       hbib
10      :       12      :       Talk:rest/examples
10      :       33      :       events/2006-06-13-where-2-bof
10      :       41      :       hcard-cheatsheet
10      :       12      :       rel-home-issues
10      :       18      :       twitter-syntax
10      :       10      :       Talk:datetime-design-pattern
10      :       11      :       Talk:chat-examples
10      :       20      :       rel-faq
10      :       13      :       Talk:rel-enclosure
9       :       20      :       mailing-lists-proposals
9       :       45      :       buttons




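#!/usr/bin/perl
# Build maps of wiki edits from the microformats IRC logs.
# Usage: perl <script> <irc-log-file>
use strict;
use warnings;
use Data::Dumper;

$, = "\n";    # print() separates list items with newlines

# Separate hashes, so a page literally named "pages" or "editors"
# cannot clobber the totals.
my (%page_editors, %page_edits, %editor_edits);

open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
while (my $line = <$fh>) {
        chomp $line;
        # Edit lines look like "[[Page]] ... *Editor* ...". Skip everything
        # else; otherwise $1/$2 would silently keep the previous match.
        next unless $line =~ /\[\[(.*?)\]\].*\*(.*?)\*/;
        my ($page, $editor) = ($1, $2);
        $editor =~ s/\s+//g;
        $page_editors{$page}{$editor}++;
        $page_edits{$page}++;
        $editor_edits{$editor}++;
}
close $fh;

print "Overall wiki history:\n";
print Dumper({ pages => \%page_edits, editors => \%editor_edits,
               by_page => \%page_editors }), "\n\n";

# Page history: distinct editors : total edits : page,
# sorted by editors, then edits, then page name.
my %editor_count = map { $_ => scalar keys %{ $page_editors{$_} } } keys %page_edits;
my @pages = sort {
        $editor_count{$b} <=> $editor_count{$a}
                or $page_edits{$b} <=> $page_edits{$a}
                or $a cmp $b
} keys %page_edits;
my @page_lines = map { "$editor_count{$_} : $page_edits{$_} : $_" } @pages;
print "Page history:\n";
print @page_lines, "\n\n";

# Editor history: total edits : editor, most active first.
my @editors = sort {
        $editor_edits{$b} <=> $editor_edits{$a} or $a cmp $b
} keys %editor_edits;
my @editor_lines = map { "$editor_edits{$_} : $_" } @editors;
print "Editor history:\n";
print @editor_lines, "\n";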
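On the edit-war detection idea, a crude first pass might be possible from the same IRC data without waiting for the SQL dump. This is only a hypothetical sketch: find_alternating_pages, its [page, editor] input and the threshold are all made up for illustration, and the sample data is invented. It flags a page when the same two editors keep taking turns on it (A, B, A, B).

```perl
use strict;
use warnings;

# Hypothetical heuristic, not part of the script above: flag a page when
# two editors take turns on it (A, B, A, B, ...) at least $min_swaps times
# in a row. $edits is a list of [page, editor] pairs in log order.
sub find_alternating_pages {
        my ($edits, $min_swaps) = @_;
        my (%last_editor, %prev_editor, %run, %flagged);
        for my $edit (@$edits) {
                my ($page, $editor) = @$edit;
                my $last = $last_editor{$page};
                # Several saves in a row by one editor count as a single turn.
                next if defined $last && $editor eq $last;
                if (defined $last) {
                        # The run continues only if this editor is the one from
                        # two turns ago; a third editor starts a new run.
                        if (defined $prev_editor{$page} && $editor eq $prev_editor{$page}) {
                                $run{$page}++;
                        } else {
                                $run{$page} = 1;
                        }
                        $flagged{$page} = 1 if $run{$page} >= $min_swaps;
                }
                $prev_editor{$page} = $last;
                $last_editor{$page} = $editor;
        }
        return sort keys %flagged;
}

# Made-up sample data for illustration.
my @edits = (
        [ 'hcard', 'Alice' ], [ 'hcard', 'Bob' ], [ 'hcard', 'Alice' ],
        [ 'hcard', 'Bob' ],   [ 'geo', 'Carol' ], [ 'geo', 'Dave' ],
        [ 'geo', 'Eve' ],
);
print "$_\n" for find_alternating_pages(\@edits, 3);    # prints "hcard"
```

Fed from the real logs, the pairs would come from the same [[Page]] / *Editor* regex the script uses. It only sees who edited, not what changed, so it can't tell a revert war from two people collaborating closely; the revision data Manu asked for would be needed to check for actual reverts.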



On Tue, 2008-03-11 at 13:13 -0400, Manu Sporny wrote:
> This is a request for the Microformats.org MediaWiki MySQL database
> data. If one of the admins could do a mysqldump of the database (or
> selected tables) and place it onto a public HTTP/FTP site, that would be
> ideal.
> 
> WARNING: Do not dump the password or e-mail field for the user table.
> 
> I'd like to run an analysis on the number of contributions made by
> everyone involved in this community and attempt to write an algorithm to
> detect edit wars.
> 
> This request is two-fold:
> 
> 1. I'm curious to see who the most prolific wiki contributors are and if
>    they have any correlation with the most prolific mailing list
>    contributors.
> 2. It would be good to have an automatic process that could detect and
>    log wiki edit wars, thus reducing the load on the admins and the rest
>    of the community.
> 
> -- manu
> 



More information about the microformats-discuss mailing list