dataset-examples

From Microformats Wiki
Revision as of 22:40, 1 May 2013 by Aloisius (talk | contribs) (Initial version)

<entry-title>Dataset examples</entry-title>

Many people and organizations publish datasets online in a wide variety of formats (csv, sequence, xls, etc.). This page explores examples of web pages that describe and link to datasets.

The Problem

Discovering these datasets is incredibly difficult because there is no simple way of marking up the pages that describe them. Today, links to datasets may be scattered throughout the web or entered into various central repositories. Being able to publish a dataset in a way that an automated search engine or software tool could discover it would go a long way towards easing the discovery process.

Use Cases

As the originator of the data, you publish a webpage with a link to that data for discovery purposes.

Alternatively, a third party may publish links to your data (or webpage describing the data) and include extra metadata about it that the originator may not have included.

Real-World Examples

Links to public web pages, either popular or insightful

Individual/Organizational Publishers

Centralized Repositories and/or Directories

Common Practices

Datasets are typically described using several common fields:

  • fn - name of the dataset
  • records - number of records
  • size - byte size of dataset
  • schema - link to something describing the schema or a description of the schema itself
  • url - url to dataset
    • type - format the data is in
  • sample - sample of data or link to the sample
    • type - format the data is in
  • summary - summary of the dataset
  • description - description of the dataset
  • terms - terms of use for the dataset, likely a url
  • dtpublished - date dataset was published
  • dtupdated - date dataset was updated
  • contributor - people/organizations contributing to the dataset
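
To illustrate how a software tool might pick these fields out of a page, here is a minimal Python sketch. It assumes, purely for illustration, that each field name in the list above is used as an HTML class name; no such markup has been agreed on here. The root class `hdataset`, the sample page, and the helper names `DatasetFieldParser` and `parse_dataset` are all hypothetical. Nested `type` values are not handled.

```python
from html.parser import HTMLParser

# Field names come from the list above; treating them as HTML class
# names is an assumption made for this sketch, not an agreed format.
FIELDS = {"fn", "records", "size", "schema", "url", "sample", "summary",
          "description", "terms", "dtpublished", "dtupdated", "contributor"}

# Elements without a closing tag; they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DatasetFieldParser(HTMLParser):
    """Collect the text (or the href, for links) of elements whose class is a known field."""

    def __init__(self):
        super().__init__()
        self.fields = {}   # field name -> list of collected values
        self._stack = []   # one entry per open element: a field name or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        field = next((c for c in classes if c in FIELDS), None)
        if field and tag == "a" and attrs.get("href"):
            # For links, record the target rather than the link text.
            self.fields.setdefault(field, []).append(attrs["href"])
            field = None
        if tag not in VOID_TAGS:
            self._stack.append(field)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute text to the innermost enclosing field element, if any.
        field = next((f for f in reversed(self._stack) if f), None)
        text = data.strip()
        if field and text:
            self.fields.setdefault(field, []).append(text)


def parse_dataset(html):
    """Return a dict mapping each field found in the page to its values."""
    parser = DatasetFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# Hypothetical page fragment; the "hdataset" root class is made up.
SAMPLE = """
<div class="hdataset">
  <h1 class="fn">City bike counts</h1>
  <p class="summary">Hourly bicycle counts.</p>
  <span class="records">1200</span> records,
  <span class="size">48000</span> bytes.
  <a class="url" href="http://example.org/bikes.csv">Download (CSV)</a>
  <a class="terms" href="http://example.org/terms">Terms of use</a>
</div>
"""

if __name__ == "__main__":
    for name, values in parse_dataset(SAMPLE).items():
        print(name, values)
```

Values are kept as lists because a field such as contributor or url may reasonably appear more than once on a page.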

Existing Practices

Brainstorming

See Also