species-brainstorming

From Microformats Wiki
Revision as of 16:55, 21 October 2006 by AndyMabbett (talk | contribs) (→‎Typography: fix, sign, my last)

Species Brainstorming

Andy Mabbett

Proposal

There should, I believe, be a "species" microformat for the markup of plant and animal names, to include their scientific names. Consider:

<abbr class="sci" title="Anas platyrhynchos">Mallard</abbr>

or

<span class="sci">Anas platyrhynchos</span>

The microformat would allow user agents to be configured to perform look-ups on on-line databases of species, according to user preferences. Specification of the taxonomic class would help user agents to know which such databases were applicable (e.g., use database A for plants, but database B for mammals and database C for insects).
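
As a rough illustration of that look-up idea, a user agent might choose a database from whichever taxonomic group it finds in the markup. This is only a sketch: the database labels, the mapping and the fall-back default are hypothetical placeholders, not real services or part of the proposal.

```python
# Hypothetical mapping from a taxonomic group to a look-up database label.
# "database-A" etc. stand in for whatever services the user has configured.
DATABASES = {
    "Plantae": "database-A",
    "Mammalia": "database-B",
    "Insecta": "database-C",
}


def choose_database(taxa, default="user-default"):
    """Return the database for the first recognised group in `taxa`.

    `taxa` is a list of group names found in the markup (kingdom, class, ...),
    ordered from most specific to most general.
    """
    for name in taxa:
        if name in DATABASES:
            return DATABASES[name]
    return default
```

A group that is not in the mapping simply falls back to the user's default choice.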

It would also allow for more specific searching (do I mean "crow" or do I mean "Corvus corone"?).

The specification should encourage, but not mandate, the correct capitalisation of scientific names, so "Anas platyrhynchos", not "anas platyrhynchos" nor (except historically) "Anas Platyrhynchos". A reminder that such names should be styled with italics will also be included.

Straw man proposal

I'm tending towards this model, nested according to components of the microformat, not taxonomically:

  • sci (scientific name; aka botanical name) (better: taxon; also biota)
    • domain (alternatively: "superregnum")
    • kingdom (alt: "regnum")
    • subkingdom (alt: "subregnum")
    • superphylum
    • phylum
    • subphylum
    • class (alt: "classis")
    • subclass (alt: "subclassis")
    • infraclass (alt: "infraclassis")
    • superorder (alt: "superordo")
    • order (alt: "ordo")
    • suborder (alt: "subordo")
    • infraorder (alt: "infraordo")
    • parvorder
    • superfamily (alt: "superfamilia")
    • family (alt: "familia")
    • subfamily (alt: "subfamilia")
    • bin ("binominal name") (better: taxon-name or txname - if a subspecies, var, subvar, etc are involved, then the binomial name becomes trinomial or even quadrinomial; point being, "binomial name" would be semantically incorrect in many cases.)
      • genus
      • species (="specific epithet")
      • subsp ("subspecies")
      • var ("variety")
      • subvar ("subvariety")
      • form
      • subform
      • cult ("cultivar")
      • cultgp ("cultivar group")
      • cross (e.g. "F1")
      • strain
      • ? morph (or phase) (e.g. "Gyrfalcons, for example, have a grey morph and a white morph" [1]; "the Lesser Snow Goose (C. c. caerulescens), commonly occurs in two plumage variants. White-phase birds are white except for black wing tips, but blue-phase geese have bluish-grey plumage replacing most of the white except on the head, neck and tail tip." [2])
    • trade ("trade name")
    • breed (e.g. Bull Terrier)
    • sense (botanical - see examples)
    • authority
      • year (...of authority)
    • cname ("common name")
    • guid
    • vgroup ("vernacular group" - there is possibly a better term for this. Often, a genus or family doesn't encapsulate a particular group of species in a practical or useful fashion. For example, it is difficult to separate fungi species and lichen species as they are taxonomically intermingled. Thus, within taxonomic databases, a vernacular group of "fungi" and "lichen" is often applied to species falling into either of these groups. A vernacular group could be considered similar to a common name, but for groups of species. See the NBN Gateway for an example of vernacular groups in use; these group names are also used in the Recorder biological recording software.)
    • ? gender (useful for species exhibiting sexual dimorphism - "find me a picture of a male Pintail"; "I want to buy a female Holly bush" - a binary value, male or female; or including neuter, hermaphrodite, unspecified and/ or mixed?) - see Future development
    • ? age bracket (adult/ juvenile/ seed/ egg/ nymph/ nestling/ pup/ cub/ instar1/ instar2 etc. - needs more work) - see Future development
    • ? count (a number, or representation of some other value - none, unspecified, "present", etc.) - see Future development
    • ? [name to be suggested] (an indicator of type, e.g. for bees, "queen" or "worker")

where all except "bin" are optional, and it is possible to infer from simply:

<abbr class="bin" title="Anas platyrhynchos">Mallard</abbr>

or

<span class="bin">Anas platyrhynchos</span>

that the genus is Anas and the species is platyrhynchos (and, thus, "bin" is to "sci" as "adr" is to "hCard 1.0").
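
The inference described above could be sketched as follows. This assumes the "bin" text is a whitespace-separated name with the genus first and the specific epithet second; the function name and the "rest" key are illustrative only, and forms such as "Podiceps sp." are not handled.

```python
def split_bin(name):
    """Split a "bin" string into its parts.

    Assumes whitespace-separated words with the genus first and the specific
    epithet second; anything further is returned unparsed in "rest".
    """
    words = name.split()
    return {
        "genus": words[0] if words else None,
        "species": words[1] if len(words) > 1 else None,
        "rest": words[2:],
    }
```

For "Anas platyrhynchos" this yields the genus "Anas" and the species "platyrhynchos"; a lone genus leaves the species as None.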

Examples

Extreme case (Pied Wagtail, a bird):

  <span class="sci">
    <span class="domain">Eukarya</span>
    <span class="kingdom">Animalia</span>
    <span class="subkingdom">Eumetazoa</span>
    <span class="superphylum">Deuterostomia</span>
    <span class="phylum">Chordata</span>
    <span class="subphylum">Vertebrata</span>
    <span class="class">Aves</span>
    <span class="subclass">Neognathae</span>
    <span class="order">Passeriformes</span>
    <span class="suborder">Passeri</span>
    <span class="parvorder">Passerida</span>
    <span class="superfamily">Passeroidea</span>
    <span class="family">Motacillidae</span>
    <span class="bin">
	<span class="genus">Motacilla</span>
	<span class="species">alba</span>
	<span class="subspecies">yarrellii</span>
    </span>
    <span class="cname">Pied Wagtail</span>
    <span class="authority">Linnaeus</span>
    <span class="year">1758</span>
  </span>

Simplified equivalent of the above:

    <span class="bin">
	<span class="genus">Motacilla</span>
	<span class="species">alba</span>
	<span class="subspecies">yarrellii</span>
    </span>
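
A consumer of such markup might collect the text of each classed element. The following is a minimal sketch using Python's standard html.parser; the class and function names are hypothetical, and it assumes well-nested markup with no void elements (a `<br>` would unbalance the stack).

```python
from html.parser import HTMLParser


class TaxonParser(HTMLParser):
    """Collect the text found directly inside each classed element."""

    def __init__(self):
        super().__init__()
        self.stack = []   # class attribute of each currently open element
        self.fields = {}  # class name -> text directly inside that element

    def handle_starttag(self, tag, attrs):
        self.stack.append(dict(attrs).get("class") or "")

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack and self.stack[-1]:
            # An element may carry several class names; record text for each.
            for name in self.stack[-1].split():
                self.fields[name] = (self.fields.get(name, "") + " " + text).strip()


def parse_taxon(markup):
    parser = TaxonParser()
    parser.feed(markup)
    parser.close()
    return parser.fields
```

Wrapper elements such as "bin" that contain only child elements produce no entry of their own; only the innermost classed text is recorded.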

Sub-species (animal, scientific name displayed):

    <span class="sci">
        <span class="bin">Larus glaucoides</span>
        <span class="sub">kumlieni</span>
    </span>

Variety (plant):

  <span class="sci">
    <span class="bin">Pisum sativum</span>
    var. <span class="var">macrocarpon</span> 
  </span> 

Species (animal, common name displayed):

    <span class="sci">
        <abbr class="bin" title="Larus thayeri">
            <span class="common">Thayer's Gull</span>
        </abbr>
    </span> 

Species (animal, scientific name displayed):

    <span class="sci">
        <abbr class="common" title="Thayer's Gull"> 
            <span class="bin">Larus thayeri</span>
        </abbr> 
    </span> 

Fungus, kingdom included:

    <span class="sci"> 
        <abbr class="kingdom" title="Fungi"> 
            <span class="bin">Amanita muscaria</span> 
        </abbr> 
    </span> 

Same name for different Genera:

    <p class="biota">
        An unidentified
         <abbr class="taxoclass" title="Aves"> 
         <abbr class="genus" title="Oenanthe">
         <span class="common">
            Wheatear
         </span>
         </abbr>
         </abbr>
    </p>

and:

    <p class="biota">
        An unidentified
         <abbr class="taxoclass" title="Magnoliopsida"> 
         <abbr class="genus" title="Oenanthe">
         <span class="common">
            Water Dropwort
         </span>
         </abbr>
         </abbr>
        sp.
    </p>

Species (animal, with authority and year):

    <span class="sci"> 
        <span class="bin">Pica pica</span> 
        <span class="authority">Linnaeus</span>, 
        (<span class="year">1758</span>) 
    </span>

Re-classified species (animal):

    The species was classified as
    <span class="sci">
        <abbr class="bin" title="Bartramia longicauda">Tringa longicauda</abbr>
        by Johann Bechstein in 1812.
    </span>

Expressing a species with a GUID

Work is currently underway, through TDWG, to develop a truly global GUID system based on LSIDs. More on LSIDs.

In the following example case an NBN GUID is provided. This GUID would be usable on the NBN Gateway, The NHM Species Dictionary, in Recorder 2002 and Recorder 6, and in the forthcoming OpenRecorder online recording toolkit. As there are different GUIDs for different databases, the type of GUID can be indicated with a code followed by a hyphen followed by the GUID (e.g. nbn-NBNSYS0000005133).

    <span class="sci nbn-NBNSYS0000005133">
        <span class="bin">Lutra lutra</span>
    </span>
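
Reading such a prefixed class token back out could look like the sketch below. It assumes the "code-GUID" convention just described and a caller-supplied set of known codes; only "nbn" comes from the example above, and the function name is hypothetical.

```python
def guid_from_class(class_attr, known_types=("nbn",)):
    """Return (type, guid) from a class attribute, or None if no GUID token is found.

    Only tokens whose prefix is in `known_types` are treated as GUIDs, so an
    ordinary hyphenated class such as "taxon-name" is ignored.
    """
    for token in class_attr.split():
        prefix, sep, guid = token.partition("-")
        if sep and guid and prefix in known_types:
            return prefix, guid
    return None
```

Restricting matches to known codes is a design choice here: without it, any hyphenated class name would be mistaken for a GUID.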

Alternatively, the GUID could be expressed as an element in its own right, with the GUID type being expressed as a secondary class name:

    <span class="sci">
        <span class="bin">Lutra lutra</span>
        <span class="uid nbn">NBNSYS0000005133</span>
    </span>

As a further alternative, the abbr design pattern could potentially be used, although this is semantically questionable:

    <span class="sci">
        <abbr class="bin" title="NBNSYS0000005133">Lutra lutra</abbr>
    </span>

Questions

  • Is "sci" the best attribute name for the top-level?
    • No - Scott Reynen
      • What do you think would be better? - Andy Mabbett
        • Assuming "sci" is short for "scientific name", I propose "scientific-name".
          • It is. That's 12 extra characters! - Andy Mabbett
    • Taxon is a far better solution [3]. It's short, meaningful and in keeping with the other class types. - Andy Mabbett
      • I think "taxonname" or "taxon-name" would be a better value for the class attribute. It is more descriptive of the data you're trying to specify the format of. Taxon refers more to the classification grouping I thought. The class attribute is used frequently for the application of CSS styling so the top level class at least needs to be fairly distinctive I would have thought to avoid clashes with other class attribute values in the page and CSS files. - Tony Prichard
        • The OED defines taxon as "A taxonomic group". See also the URL cited, [4]. - Andy Mabbett
          • I agree that taxon would be the most suitable name. It could be considered as a shortening of TaxonConcept (or TaxonName), which is the term used by the TCS - Charles Roper
    • or Biota - Andy Mabbett
  • Should "bin", "var", "cult", etc., be written in full? (I think not, to save bloating file sizes)
    • Yes - Scott Reynen
  • Should other attribute names be abbreviated for brevity?
    • No, brevity is not one of the naming principles. "bin", "var", and "cult" all leave ambiguous meaning, which is a problem. We should "Use class names based on names from the original schema," e.g. full words or phrases where they aren't especially long. - Scott Reynen
      • Fair enough, though I worry about some of my pages, with tens or hundreds of species listed! Also, note that "var" "sub" and suchlike are the proper abbreviations to use, in botanical nomenclature (see the posted examples). - Andy Mabbett
      • I think a balance will need to be achieved between brevity in the interests of avoiding bloated html in a page with many species names and giving a meaningful name - Tony Prichard
        • Would bloating really be an issue? Many, if not most, servers (including this one) now gzip/deflate content, and thus transfer times aren't so much of an issue. The front page of the microformats site states "Designed for humans first and machines second[...]", so unabbreviated terms would be more consistent with this aim. - Charles Roper
          • 341 species, 58Kb. 'Nuff said? AndyMabbett 11:53, 26 Sep 2006 (PDT)
            • Your bird list page can be compressed by 79%, i.e. it would go down from 58KB to 12KB by enabling output compression on your server. It would also make the page load faster and save you bandwidth. No doubt compression technologies will improve over time, as will connection speeds and server speeds, so the technical solution to reducing page size would seem to me to be preferable over the "manual compression" method, i.e. using abbreviated, less clear, less readable class names. While it is easy to improve the compression technology (or switch it on, even), it's much harder to change an established microformat standard. - Charles Roper
  • Is "class" a potentially confusing attribute name, and what should replace it ("taxoclass", perhaps? or "classis"?)
    • Yes, I would avoid class as it is a frequent keyword in software languages - Tony Prichard
      • "bin" and "var" are also extremely common terms used in programming languages - Charles Roper
  • What other attribute names are needed, if any (we could do with help from a taxonomist!)
  • How to deal with: "Podiceps sp." (a grebe of unknown species)
    • How about the following, where we can infer an unknown species by the absence of that attribute?:
<span class="bin"><span class="genus">Podiceps</span></span>
    • There are also species aggregates and groups to be considered Grey/Dark Dagger sp., where it is one of two species but where the genus Acronicta cannot be used as there are more than the two species in the genus - Tony Prichard
      • Any suggestions? Or other examples? - Andy Mabbett
  • Should we allow divisions of "bin" with no parent "sci", such as:
<span class="bin">Larus glaucoides <span class="sub">kumlieni</span></span>
  • Is the "fungus" example OK, given that Amanita muscaria is not an abbreviation of "Fungi"?
    • I do not like the use of the abbr tag at all in the examples given. The abbr tag is for abbreviations with the suggestion that the title is used for the full name. The implication in the Mallard example is that Mallard is an abbreviation for the scientific name; it is not, it is a different type of name - Tony Prichard
  • Do the "authority" and "year" pair need a joint wrapper?
  • I first thought that "all except "bin" are optional"; now I'm not so sure. Should we be able to mark up:
An unidentified <abbr class="taxoclass" title="Sauropsida">reptile</abbr>
  • Is "bin" (short for binominal) the most appropriate term for a taxon name? When subspecies, var, subvar, etc. are nested, then surely it becomes trinomial? Would simply name or TaxonName not be more flexible? - Charles Roper
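
The compression point raised in the abbreviation discussion above can be checked roughly with a sketch like this. The generated page is synthetic (341 made-up rows, echoing the species count quoted above), and the long class names are just the candidates floated in the discussion, so the numbers only indicate the general effect.

```python
import gzip


def page(class_names, count=341):
    """Build a repetitive species list using the given (outer, inner) class names."""
    outer, inner = class_names
    rows = [
        f'<li><span class="{outer}"><span class="{inner}">Genus{i} species{i}</span></span></li>'
        for i in range(count)
    ]
    return "\n".join(rows).encode("utf-8")


short = page(("sci", "bin"))
full = page(("scientific-name", "binominal-name"))

# Size difference between long and short class names, before and after gzip.
raw_gap = len(full) - len(short)
gzip_gap = len(gzip.compress(full)) - len(gzip.compress(short))
print(raw_gap, gzip_gap)
```

Because the class names repeat on every row, the compressor encodes them as back-references, so the gap after compression should be much smaller than the raw gap.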

To add

  • Animal hybrids
  • GUID (Globally Unique Identifier). When referring to a taxon name, there is also often the need to provide a GUID which relates to a taxonomic concept database (such as the NHM Species Dictionary). By providing a GUID, ambiguity is removed. - Charles Roper

Future development

Instead of including gender, age-bracket and count, we could allow for a future microformat, called, perhaps, "sighting", which might have the components:

  • sighting
    • species (a "species" microformat)
      • set (one or more)
        • count
        • age-bracket
        • gender
    • location (hCard or geo)
    • date-time
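
As a data structure, the outline above might be modelled like this. It is only a sketch: the class names, field types and the use of plain strings for the species, location and date-time are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SightingSet:
    count: Optional[int] = None        # None means "unspecified"
    age_bracket: Optional[str] = None  # e.g. "adult", "juvenile"
    gender: Optional[str] = None       # e.g. "male", "female"


@dataclass
class Sighting:
    species: str                       # stands in for a "species" microformat
    sets: List[SightingSet] = field(default_factory=list)  # one or more
    location: Optional[str] = None     # stands in for hCard or geo
    date_time: Optional[str] = None


example = Sighting(
    species="Anas platyrhynchos",
    sets=[SightingSet(count=2, age_bracket="adult", gender="male")],
)
```

Keeping count, age-bracket and gender together in a "set" lets one sighting record, say, two adult males and one juvenile of the same species.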

See West Midland Bird Club's Latest news from Ladywalk and In and around South Staffordshire 2006 (blog) for simple examples.

Bill Hull

My website has 17000+ photos of 4700+ bird species. There are also a handful of butterflies (organized very poorly as I am unaware of any published butterfly world taxonomies) and shortly will have a number of dragon/damselflies. The site is made up of static pages but is built from a database so it is easy for me to add new HTML tags to the pages. If you are interested in some prototyping at some point I can probably build stuff into the pages. - Bill Hull

Roger Hyam

Taxonomic Databases Working Group

TDWG is the organisation for standardisation in exchange of biodiversity data. The organisation is currently undergoing a degree of re-organisation and is developing an architecture to integrate the different standards it produces with each other and with those in use in the semantic web and geospatial communities. Part of this architecture will be a central ontology for things like scientific biological names.

Because of its role in bridging technologies, the application that manages the ontology will need to be able to express the same basic semantics in multiple formats (e.g. RDFS, OWL, Geography Mark Up, OBO etc). It seems logical that this application should also generate basic microformat definitions for each of the classes it contains. If we have an ontology defining 'Taxon Name' and 'specific epithet', for example, the same notion should be mapped to as many technologies as possible.

TDWG is also supporting a system for Globally Unique Identifiers based on Life Science Identifiers for biodiversity objects such as taxon names, specimens, herbaria etc., which it would be cool to integrate into any microformat.

There is a meeting in St Louis, USA, October 2006 where the way forward for the ontology will be discussed. Decisions made at the meeting will govern what is possible. It is difficult to take this further without consensus from that meeting.

If it is after October 2006 and you are interested in learning more please contact me (Roger Hyam).

  • Thanks, Roger - it's good to have the involvement of such an august body, especially just before such a fortuitously-scheduled event. Is there any chance (and I realise that this is rather late in the day) that this proposal could be on the agenda in St Louis (even if only through a note in the papers/ programme); or that someone from the microformat community could attend/ speak there? Or that the TDWG and/or conference websites could link to http://microformats.org/wiki/species? - Andy Mabbett

Malcolm Storey

(extracted from e-mails to Andy Mabbett, by kind permission)

  • "Hopefully I'll have more time for things like this in the New Year, but expect it all be done and dusted by then!!" - Malcolm Storey, BioImages

ICZN, ICBN et al

You don't cover the full set of levels of taxonomic hierarchy defined by the relevant body, ICZN or ICBN (plus the others - one each for garden plant varieties, bacteria, viruses. Don't know about mycoplasmas, diseases, BSE factors etc.).

ICBN Ranks listed [5], [6]

AIUI ICBN only goes down to species.

ICZN isn't so easy: [7]

1.2.2. The Code regulates the names of taxa in the family group, genus group, and species group. Articles 1-4, 7-10, 11.1-11.3, 14, 27, 28 and 32.5.2.5 also regulate names of taxa at ranks above the family group. (But none of the above articles list the taxonomic ranks.)

ICZN only goes down to subspecies (art 1.3.4)

Note also:

1.4. Independence. Zoological nomenclature is independent of other systems of nomenclature in that the name of an animal taxon is not to be rejected merely because it is identical with the name of a taxon that is not animal (see Article 1.1.1)

(e.g. Trichia, Oenanthe, Melanotus)

Myxomycetes are the exception - they're in kingdom protozoa which falls under ICZN but they fall under the ICBN name space. (Hence "Trichia").

DNA

You may want to consider refs to DNA sequences. They're not part of taxonomy, but they can be considered the bottom rung of the taxonomic hierarchy and they will be of increasing significance.

Typography

What about Adalia 2-punctata, and Adalia bipunctata (not to mention those with hyphens [or apostrophes] which may get left out)? And what about accented characters?

Adalia 2-punctata is an abbreviation of Adalia bipunctata, so:
<abbr title="Adalia bipunctata">Adalia 2-punctata</abbr>

AndyMabbett 09:55, 21 Oct 2006 (PDT)

Gaps

The hierarchy is not always fully populated. Not every species belongs to a class. Maybe this was where fungi are different. In Paul Kirk's databases (which are the official ones used to drive the checklists and NBN) he has fixed fields for the higher level taxa, which means that only certain ranks can be used. The blanks he fills in (mostly!!) with "incertae sedis" (think it's Latin for "unknown seat"). In my database I use a self-join, which gives much more flexibility. Anyway, there are lots of "incertae sedis" in Paul's database!

Homonyms

Apion carduorum sensu Morris 1990 is Apion gibbirostre (Gyllenhal, 1813). Apion carduorum Kirby, 1808 is a different species.

You'd mark the former up as something like
<abbr class="binominal" title="Apion gibbirostre">Apion carduorum sensu Morris 1990</abbr>
AndyMabbett 12:21, 5 Oct 2006 (PDT)

Citations for authorities

If people are citing the authority in full they would include the literature reference, not just the date e.g.

Cuphophyllus niveus (Scop.) Bon, Doc. Mycol. 14(56): 11 (1985)[1984]

Hyppo

Nomenclatural challenge

You asked for comments. One challenge I see is the difference in nomenclature for Animalia and Plantae (coming from the old two-kingdom system). For Plantae the International Code of Botanical Nomenclature[8] is used and for Animalia the code from http://www.iczn.org/. The Animalia code is not officially accepted, but ICZN aims to be authoritative starting from 2008.

The two different nomenclatural systems differ in a few areas, and they affect markup.

  • Subgenus (Plantae): Dendroceros subg. Apoceros
  • Subgenus (Animalia): Sula (Morus)
  • Subspecies (Plantae): Begonia grandis ssp. evansiana
  • Subspecies (Animalia): Gorilla beringei graueri
--Hyppo 14:23, 9 Oct 2006 (PDT)
I would mark those up as:
<span class="genus">Dendroceros</span> subg. <span class="subgenus">Apoceros</span>
<span class="genus">Sula</span> <span class="subgenus">Morus</span>
<span class="binominal">Begonia grandis</span> ssp. <span class="subspecies">evansiana</span>
<span class="binominal">Gorilla beringei</span> <span class="subspecies">graueri</span>
With a wrapping class="biota" and, possibly, kingdom attributes.
AndyMabbett 11:37, 10 Oct 2006 (PDT)
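
The difference between the two styles can be summarised in a small formatter. This sketch covers only the four cases listed above; the function name and the "ICBN"/"ICZN" switch are illustrative assumptions, not a full treatment of either code.

```python
def format_name(genus, species=None, subgenus=None, subspecies=None, code="ICZN"):
    """Render a name as plain text following the four examples listed above.

    `code` is "ICBN" for the botanical style or "ICZN" for the zoological style.
    """
    parts = [genus]
    if subgenus:
        # Botanical: "subg." connecting term; zoological: parentheses.
        parts.append(f"subg. {subgenus}" if code == "ICBN" else f"({subgenus})")
    if species:
        parts.append(species)
    if subspecies:
        # Botanical: "ssp." connecting term; zoological: bare trinomial.
        parts.append(f"ssp. {subspecies}" if code == "ICBN" else subspecies)
    return " ".join(parts)
```

The same classed components can therefore be displayed differently depending on which code applies, which is one argument for recording the kingdom alongside the name.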

See also