species

From Microformats Wiki
Revision as of 09:20, 7 June 2007 by AndyMabbett (talk | contribs) (Examples in the wild)

Species

For the latest ideas, and to make comments, please see species-brainstorming.
Note: the original name of the proposed microformat, "species", is likely to change, probably to "biota" or "taxon". The original name has been retained here to avoid many repetitive and perhaps redundant edits.
Updated: the new beta of Operator detects Species. A test page is available. Work on both continues!

Introduction

People use the vernacular AND taxonomic names of species in everyday speech and writing - just read or watch any populist gardening magazine or television programme.

  • Consider this list: "Blackbird", "poodle", "T Rex", "potato", "French Marigold", "Wisteria", "E. Coli", "HIV", "Rubella" and "human being".
"T Rex" is "Tyrannosaurus rex"; "E. Coli" is "Escherichia coli"; "HIV" is "Human immunodeficiency virus" and "Rubella" is "Rubella virus". All are the taxonomic (or scientific) names of unique species.
"Wisteria" is a taxonomic genus.
"Blackbird", "poodle", "potato", "French Marigold" and "human being" (arguments about Neanderthals notwithstanding) are vernacular (or common) names, but still refer to individual species.
  • The scientific naming of organisms is a part of biodiversity informatics - "the application of information technology to the domain of biodiversity".

...that we will work together to help create the key tool that we need to inspire preservation of Earth's biodiversity: the Encyclopedia of Life [...] an encyclopedia that lives on the Internet, with an ever-evolving page for every species [and which] does not duplicate existing efforts, but instead incorporates them through linking [with a] search technology that can aggregate existing biological information and make it easily accessible.

What is missing [from HTML] is an element for marking up "proper names" (names of people, geographic locations, institutions, or even scientific names such as genus/species).

It's interesting that microformats have given us the first three missing items - and we're now debating the fourth!

Proposal

Imagine viewing a web page with a reference to a species - and being able to use an add-on to your browser to be taken directly to information about that species on, say, Wikipedia, Wikispecies, Google Images, or another site of your choosing, such as an academic database.

Your software would automatically know to search site A if the scientific name referred to a moth, site B for a bird, and site C for a plant - and you could set your preferences as to which sites those were to be, and in which order two or more were to be searched (e.g. for moths, try UK Moths first, if not found try The Global Lepidoptera Names Index).
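
The lookup order described above could be sketched roughly as follows. This is a minimal illustration, not part of the proposal: the preference table, the `find_species` function and the `lookup` callable are all hypothetical names, and the stand-in lookup uses an in-memory table rather than querying any real site.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical preference table: taxonomic group -> sites to try, in order.
# The moth entry follows the example in the text; the others are placeholders.
PREFERENCES: Dict[str, List[str]] = {
    "moth": ["UK Moths", "The Global Lepidoptera Names Index"],
    "bird": ["site B"],
    "plant": ["site C"],
}

# A lookup takes (site, scientific_name) and returns a result, or None if not found.
Lookup = Callable[[str, str], Optional[str]]


def find_species(
    name: str,
    group: str,
    lookup: Lookup,
    preferences: Dict[str, List[str]] = PREFERENCES,
) -> Optional[Tuple[str, str]]:
    """Try each preferred site for the group in order and return the first hit."""
    for site in preferences.get(group, []):
        result = lookup(site, name)
        if result is not None:
            return site, result
    return None


# Usage with a stand-in lookup backed by an in-memory table (no network access).
FAKE_INDEX = {("The Global Lepidoptera Names Index", "Biston betularia"): "record-1"}


def fake_lookup(site: str, name: str) -> Optional[str]:
    return FAKE_INDEX.get((site, name))


hit = find_species("Biston betularia", "moth", fake_lookup)
```

Keeping the site list per group as plain data means a user could reorder or replace sites without touching the search logic.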

Or suppose someone writes a long, chronologically ordered web page about all the birds, insects, mammals and plants they saw on a wildlife safari, with lots of prose description of the places where they saw them and the people they were with, but you want to extract a list of species, sorted into alphabetical order within taxonomic class (birds first, then insects, then...) or in taxonomic order.
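
Once the names have been extracted from such a page, the sorting step is straightforward. The sketch below assumes the extraction has already produced (group, name) records; the `Sighting` type, the `checklist` function and the group order are illustrative assumptions, not anything defined by the proposal.

```python
from typing import Iterable, List, NamedTuple


class Sighting(NamedTuple):
    group: str  # informal taxonomic class, e.g. "birds"
    name: str   # scientific name as it appeared on the page


# Assumed display order, following "birds first, then insects then..." in the text.
GROUP_ORDER = ["birds", "insects", "mammals", "plants"]


def checklist(sightings: Iterable[Sighting],
              group_order: List[str] = GROUP_ORDER) -> List[Sighting]:
    """Drop duplicates, then sort by group order and alphabetically within a group."""
    rank = {group: i for i, group in enumerate(group_order)}
    unique = set(sightings)
    # Groups missing from group_order sort after all the known ones.
    return sorted(unique, key=lambda s: (rank.get(s.group, len(rank)), s.name.lower()))


# Usage: records in the order they were mentioned in the trip report.
seen = [
    Sighting("mammals", "Panthera leo"),
    Sighting("birds", "Struthio camelus"),
    Sighting("insects", "Apis mellifera"),
    Sighting("birds", "Bubulcus ibis"),
    Sighting("birds", "Struthio camelus"),  # mentioned twice in the prose
]
ordered = checklist(seen)
```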

Those are just two of the things a "species" microformat might do for you.

Your software, or a search engine, would be able to differentiate between pages discussing HMS Beagle, the ship, and a Beagle dog; or between birds that fly and a slang term for women.

Another benefit would be that user-agents could be instructed to treat text marked up in this way as not being in the base language of the document or element in which it occurs: it should be pronounced as Latin; it should not be translated (e.g. where a component word happens also to be a valid word in that language, such as the genus Colon, Circus cyaneus, Hesperia comma, or anything with major or minor on an English-language page); and it should either not be spell-checked or be spell-checked with a specialised dictionary (a need identified in this 2003 ietf-languages discussion of language values for taxonomic names).

A further benefit of the species microformat would be the enrichment of species checklists, which are commonly found on the web. Broadly speaking, a species checklist is a list of taxa, usually for a particular group of similar organisms such as birds or vascular plants, found within a particular geographical region (a country, a region, a county, or a specific site, large or small). A typical example is the Checklist of Beetles of the British Isles which, as the name suggests, lists beetles known to be found within the British Isles. A Google search for "species checklist" will reveal many other examples. Species checklists are presented in a broadly consistent manner but usually cannot be parsed and used by computers, because there is no common standard for marking them up in HTML. The species microformat would provide that common standard. A fully microformat-enabled checklist would be parsable by computers and would thus give developers a means to aggregate and otherwise make use of this invaluable content, beyond the current, rather limited, use of simple online viewing.

A specific example of checklist use might be enabling biological recording software to parse and aggregate checklists in order to include them in its own species dictionaries. Typically there are waits of many months or even years while humans collate checklist changes manually for inclusion in recording software; automated checklist parsing and aggregation would make this process much faster and more efficient.

Existing taxonomies

The proposal respects all existing biological taxonomies, and is not intended to change or supplant any of them - it is intended merely to provide webmasters (from personal hobby sites to major academic databases; from news outlets to retail organisations) with a method of either:

  1. marking-up a taxonomical name (or taxon-common name pair) in such a way that its components can be recognised by computers or
  2. marking up a common name, so as to associate with it a taxonomical name, in such a way that the latter's components can be recognised by computers.
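
To make the first case concrete, here is a rough sketch of how software might recognise a taxon-common name pair. The class names `species`, `vernacular` and `binomial` are placeholders invented for this illustration and are not taken from the proposal, which is still under discussion. The parser also assumes well-formed markup in which one marked-up name is not nested inside another.

```python
from html.parser import HTMLParser

FIELDS = ("vernacular", "binomial")  # hypothetical class names, for illustration only
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta", "wbr"}  # tags with no end tag


class SpeciesParser(HTMLParser):
    """Collect {"vernacular": ..., "binomial": ...} records from marked-up HTML."""

    def __init__(self):
        super().__init__()
        self.records = []     # finished records, in document order
        self._stack = []      # role of each open element: "root", a field name, or None
        self._current = None  # record being built while inside a root element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # never closed, so keep it off the stack
        classes = (dict(attrs).get("class") or "").split()
        role = None
        if self._current is None and "species" in classes:
            self._current, role = {}, "root"
        elif self._current is not None:
            for field in FIELDS:
                if field in classes:
                    role = field
                    self._current.setdefault(field, "")
        self._stack.append(role)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._stack:
            return
        if self._stack.pop() == "root":
            self.records.append({k: v.strip() for k, v in self._current.items()})
            self._current = None

    def handle_data(self, data):
        # Credit text to the innermost enclosing field element, if there is one.
        for role in reversed(self._stack):
            if role in FIELDS:
                self._current[role] += data
                return
            if role == "root":
                return


# Usage on a fragment that uses the placeholder class names.
page = ('<p>We saw a <span class="species"><span class="vernacular">Blackbird</span> '
        '(<i class="binomial">Turdus merula</i>)</span> today.</p>')
parser = SpeciesParser()
parser.feed(page)
```

After `feed`, `parser.records` holds one dictionary per marked-up name, which is the kind of component-level access the two methods above are meant to allow.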

Embedding within other microformats

The proposed plant microformat (with care regime, supplier, etc.), hListing, recipe or hReview (and possibly others) could contain a scientific name microformat, in the same way that an hCalendar can contain an hCard.

See also: species-brainstorming#Future development

References

Contributors & Supporters

See also

Here's some work-in-progress:

Examples in the wild

  • The current "straw man" for the Species microformat has been deployed, in part, on Wikipedia. All Wikipedia articles with "taxoboxes" (information panels on living things, of which there are thousands) now emit a species microformat. Examples include Southern Tamandua (a species) and Anteater (a family of species).
  • A test page is available.

Implementations (pending)

  • Taxon Checker - a software tool which, given a common name, searches for the relevant taxonomic data and outputs the selected species' details as (among other options) an HTML fragment. It is intended to provide templates for outputting such fragments with "species" microformat markup, once this proposal is implemented.
  • Wikipedia's Beastie Bot can be used to update the "taxoboxes" of articles about living things.