[uf-new] Microformat for Datasets

Jordan Mendelson jordan at commoncrawl.org
Mon Mar 25 11:12:06 PDT 2013


Has there been any work towards a microformat for datasets like what you'd find at http://data.gov, http://commoncrawl.org, etc?

Open data is becoming more common, and there is a lot of metadata surrounding it (URL, format of the data, size of the dataset, when it was published, when it was updated, description, sample data, license/terms of use, contributors, geo (if the data relates to an area), etc.), but there is really no way to easily find it outside some very incomplete directories.
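
To make that list concrete, here is a rough sketch in Python of the fields a single dataset entry might carry. The class name, the field names, and the matches helper are all made up for illustration; none of this is an existing microformat or a proposal for exact property names, and the sample values are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical dataset entry; field names are illustrative only."""
    url: str                               # where the dataset lives
    data_format: str                       # format of the data
    size_bytes: int                        # size of the dataset
    published: str                         # publication date, as ISO 8601 text
    updated: Optional[str] = None          # last update, if any
    description: str = ""
    sample_url: Optional[str] = None       # pointer to sample data
    license: Optional[str] = None          # license / terms of use
    contributors: List[str] = field(default_factory=list)
    geo: Optional[str] = None              # only if the data relates to an area


def matches(record: DatasetRecord, query: str) -> bool:
    """Naive keyword match: every query term must appear in the
    description or format (case-insensitive)."""
    haystack = f"{record.description} {record.data_format}".lower()
    return all(term in haystack for term in query.lower().split())


# Placeholder values, not a real dataset.
example = DatasetRecord(
    url="http://example.org/datasets/crawl",
    data_format="ARC",
    size_bytes=10**12,
    published="2013-01-01",
    description="Sample web crawl archive",
    contributors=["Example Org"],
)
```

A crawler that extracted entries like this from published pages could answer something as simple as matches(example, "web crawl"), which is about the least a dataset search would need.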

With a microformat, one might actually be able to build a decent search engine to help people who are searching for datasets for use in research, commerce, etc.

My organization publishes several hundred TB of web crawl data, and at a recent talk at Strata someone asked me about a microformat for datasets. If one hasn't been started yet, I feel like one needs to be.

