dataset-examples: Difference between revisions

From Microformats Wiki

Latest revision as of 16:21, 18 July 2020


Many people and organizations publish datasets online in a wide variety of formats (CSV, sequence, XLS, etc.). This page explores examples of web pages that describe and link to datasets.

The Problem

Discovering these datasets is difficult because there is no simple way of marking up the pages that describe them. Today, links to datasets are scattered throughout the web or entered into various central repositories. Being able to publish a dataset in a way that an automated search engine or software tool could discover it would go a long way towards easing the discovery process.

Use Cases

As the originator of the data, you publish a webpage with a link to that data for discovery purposes.

Alternatively, a third party may publish links to your data (or webpage describing the data) and include extra metadata about it that the originator may not have included.

Real-World Examples

Links to public web pages, either popular or insightful

Individual/Organizational Publishers

Centralized Repositories and/or Directories

Common Practices

Datasets are typically described using several common fields:

  • fn - name of the dataset
  • records - number of records in the dataset
  • size - size of the dataset in bytes
  • schema - description of the schema, or a link to one
  • url - URL of the dataset
    • type - format the data is in
  • sample - sample of the data, or a link to a sample
    • type - format the sample is in
  • summary - summary of the dataset
  • description - description of the dataset
  • terms - terms of use for the dataset, most likely a URL
  • dtpublished - date the dataset was published
  • dtupdated - date the dataset was updated
  • contributor - people or organizations contributing to the dataset
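
To make the field list concrete, here is a minimal Python sketch that holds these fields in a small data structure and renders them as HTML with one class name per field. It is an illustration only: no dataset microformat is defined on this page, so the `DatasetDescription` and `Link` classes, the `to_html` helper, the `dataset` root class, and the choice to treat `schema` and `terms` as URLs are all assumptions, and the example data is invented.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List, Optional


@dataclass
class Link:
    """A URL plus the optional format ("type") of what it points to."""
    url: str
    type: Optional[str] = None


@dataclass
class DatasetDescription:
    """Hypothetical container mirroring the common fields listed above."""
    fn: str
    url: Link
    records: Optional[int] = None
    size: Optional[int] = None           # size in bytes
    schema: Optional[str] = None         # assumed here to be a URL
    sample: Optional[Link] = None
    summary: Optional[str] = None
    description: Optional[str] = None
    terms: Optional[str] = None          # assumed here to be a URL
    dtpublished: Optional[str] = None    # date kept as a plain string
    dtupdated: Optional[str] = None
    contributor: List[str] = field(default_factory=list)


def _link(cls: str, link: Link, text: str) -> str:
    """Render one anchor; the format goes into the HTML `type` attribute."""
    type_attr = f' type="{escape(link.type)}"' if link.type else ""
    return f'<a class="{cls}" href="{escape(link.url)}"{type_attr}>{escape(text)}</a>'


def to_html(ds: DatasetDescription) -> str:
    """Render the fields that are set, using each field name as a class name."""
    parts = [
        f'<span class="fn">{escape(ds.fn)}</span>',
        _link("url", ds.url, "download"),
    ]
    for name in ("records", "size", "summary", "description",
                 "dtpublished", "dtupdated"):
        value = getattr(ds, name)
        if value is not None:
            parts.append(f'<span class="{name}">{escape(str(value))}</span>')
    for name in ("schema", "terms"):
        value = getattr(ds, name)
        if value is not None:
            parts.append(_link(name, Link(value), name))
    if ds.sample is not None:
        parts.append(_link("sample", ds.sample, "sample"))
    for person in ds.contributor:
        parts.append(f'<span class="contributor">{escape(person)}</span>')
    return '<div class="dataset">\n  ' + "\n  ".join(parts) + "\n</div>"


# Invented example data, for illustration only.
example = DatasetDescription(
    fn="Example rainfall measurements",
    url=Link("https://example.org/rainfall.csv", "text/csv"),
    records=1200,
    contributor=["Example Weather Office"],
)
print(to_html(example))
```

Fields that are not set are simply omitted from the output, which matches the observation that publishers rarely provide every field.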

Existing Practices

Brainstorming

See Also