species-brainstorming: Difference between revisions
AndyMabbett (talk | contribs) (→Response by Andy Mabbett: addendum & tyops) |
(Added inactive template per discussion in IRC) |
||
(77 intermediate revisions by 24 users not shown) | |||
Line 1: | Line 1: | ||
{{inactive}} | |||
=Species Brainstorming= | =Species Brainstorming= | ||
:'''Note: the original name of the proposed microformat, "species", is likely to change, probably to "biota" or "taxon". The former has been retained here, to avoid having to make many repetitive and perhaps redundant edits''' | :'''Note: the original name of the proposed microformat, "species", is likely to change, probably to "biota" or "taxon". The former has been retained here, to avoid having to make many repetitive and perhaps redundant edits''' | ||
:'''{{UpdateMarker}} The [http://www.kaply.com/weblog/2007/02/16/operator-07a-is-available/ Operator] extension now detects ''Species''. [http://www.westmidlandbirdclub.com/records/lists-2004uf.htm A test page is available]. Work on both continues!''' | |||
==Andy Mabbett== | ==Andy Mabbett== | ||
Line 20: | Line 24: | ||
===Straw man proposal=== | ===Straw man proposal=== | ||
See : [[species-strawman-01]] | |||
==Bill Hull== | ==Bill Hull== | ||
Line 319: | Line 32: | ||
===Taxonomic Databases Working Group=== | ===Taxonomic Databases Working Group=== | ||
[http://www.tdwg.org/index.html TDWG] is the organisation for standardisation in exchange of biodiversity data. The organisation | [http://www.tdwg.org/index.html TDWG] is the organisation for standardisation in exchange of biodiversity data. The organisation has now (November 2007) undergone some re-organization. It has a new collaborative development environment, standards process, standards architecture and it has formed alliances with major organizations in the domains of geospatial and ecological data. | ||
Central to the TDWG standards architecture are the [http://wiki.tdwg.org/twiki/bin/view/TAG/LsidVocs LSID vocabularies]. The role of these vocabularies is to define URIs for the nuts-and-bolts concepts that occur in the biodiversity informatics domain. See [http://wiki.tdwg.org/twiki/bin/view/TAG/WhatIsTheOntology a description of what the TDWG ontology is] for details. Although the vocabularies are defined in OWL the intention is for their URIs to be used as namespaces across different XML and non-XML based technologies. They can act as a central mapping point for those hard pressed developers who want to combine data presented to them in many formats. | |||
TDWG is | The species microformats that are proposed here are a good thing. The only danger is that they re-define any of the central terms defined in the TDWG vocabularies. If they do that then they are creating another language instead of extending HTML to embrace existing semantics - which I don't think is their intent. It would be nice to have the data in web pages in a form that can be combined with the hundreds of millions of records marked up with the TDWG URIs. | ||
If there is enough belief in the need for a Species Microformat why not propose a TDWG Applicability Statement and take it through a peer review process. The [http://www.tdwg.org/about-tdwg/process/ TDWG process] is quite simple and free (unless you count blood, sweat and tears). You would need to form a Task Group with a charter saying what you intended to do. As convener of the TAG Interest Group I would willingly host the Task Group. You could then propose a standard and have it reviewed by a range of biologists and IT people before it becomes ratified and recommended for adoption. RogerHyam 2007-11-5 | |||
==Malcolm Storey== | ==Malcolm Storey== | ||
Line 382: | Line 91: | ||
If people are citing the authority in full they would include the literature reference, not just the date e.g. | If people are citing the authority in full they would include the literature reference, not just the date e.g. | ||
:''Cuphophyllus niveus'' (Scop.) Bon, ''Doc. Mycol.'' 14(56): 11 (1985)[1984] | :''Cuphophyllus niveus'' (Scop.) Bon, ''Doc. Mycol.'' 14(56): 11 (1985)[1984] | ||
::Perhaps we should allow for the inclusion of an [[hcitation|hCitation]]? [[User:AndyMabbett|Andy Mabbett]] 15:08, 28 Feb 2007 (PST) | |||
==Hyppo== | ==Hyppo== | ||
Line 418: | Line 129: | ||
We would discourage full expression of the Linnaean hierarchy except for those who are maintaining such classifications (such as uBio). The rest of the hierarchy can be retrieved ontologically as necessary. | We would discourage full expression of the Linnaean hierarchy except for those who are maintaining such classifications (such as uBio). The rest of the hierarchy can be retrieved ontologically as necessary. | ||
Better to tie the scientific name (taxon name) to the authority or ontology from which it came. I.e. for those who are able to provide information on taxonomic concepts, support for TCS (Taxonomic Concept Schema) fields would be important. | Better to tie the scientific name (taxon name) to the authority or ontology from which it came. I.e. for those who are able to provide information on taxonomic concepts, support for TCS (Taxonomic Concept Schema) fields would be important. | ||
I prefer "taxon" or "taxon-name" or TaxonName over biota (which is plural, and too close to biotic which has a far larger scope than taxa). Would prefer "binomial" to "binominal" | I prefer "taxon" or "taxon-name" or TaxonName over biota (which is plural, and too close to biotic which has a far larger scope than taxa). Would prefer "binomial" to "binominal" | ||
*I also favour "taxon" over "biota" simply because it is the more commonly used term. I also prefer "binomial". I did a quick straw poll of various experts and all favoured binomial. Neither is technically incorrect, but binomial is more commonly used. Indeed, a Google search for binomial returns 6,580,000 results while binominal returns 342,000 and a "did you mean: binomial" prompt. --[[User:CharlesRoper|Charles Roper]] 04:12, 9 Jan 2007 (PST) | |||
**This [http://www.googlebattle.com/index.php?domain=%22binomial+name%22+-equation&domain2=%22binominal+name%22+-equation&submit=Go%21 binomial vs. binominal Google battle] seems even more conclusive. [[User:AndyMabbett|Andy Mabbett]] 06:17, 9 Jan 2007 (PST) | |||
"class" is difficult not only because of the confusion with the programming concept of classes, but because it is a taxonomic rank. However, most of us have figured out the difference by now so this is not critical. | "class" is difficult not only because of the confusion with the programming concept of classes, but because it is a taxonomic rank. However, most of us have figured out the difference by now so this is not critical. | ||
"cname" should be "comname" or "common-name" or "vernacular" to make it more obvious what the information is. A sub-component would be the language for which that common name is used ( something like an HTML attribute lang="en") | "cname" should be "comname" or "common-name" or "vernacular" to make it more obvious what the information is. A sub-component would be the language for which that common name is used ( something like an HTML attribute lang="en") | ||
*I also favour "common-name" or "vernacular" --[[User:CharlesRoper|Charles Roper]] 04:12, 9 Jan 2007 (PST) | |||
There are known conflicts between names across kingdoms (as current codes of nomenclature allow these). Thus specification of kingdom may be encouraged. Disambiguation could be handled by applications outside the microformats (this could be difficult), or they could be dealt with in the core microformat: e.g. plant-taxon or fungal-taxon or animal-taxon. | There are known conflicts between names across kingdoms (as current codes of nomenclature allow these). Thus specification of kingdom may be encouraged. Disambiguation could be handled by applications outside the microformats (this could be difficult), or they could be dealt with in the core microformat: e.g. plant-taxon or fungal-taxon or animal-taxon. | ||
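As a rough illustration of the kingdom point above: ''Pieris'' is both a genus of butterflies and a genus of shrubs, so the bare genus name is ambiguous until the kingdom is stated. The class names <code>biota</code>, <code>kingdom</code> and <code>genus</code> in this sketch are assumptions drawn from the names used in this discussion, not settled parts of any specification:
<pre><nowiki>
<!-- Sketch only: class names are assumed, not taken from a ratified spec -->
<span class="biota">
  <span class="kingdom">Animalia</span>:
  <i class="genus">Pieris</i>
</span>

<span class="biota">
  <span class="kingdom">Plantae</span>:
  <i class="genus">Pieris</i>
</span>
</nowiki></pre>
A consuming application could then key on the (kingdom, genus) pair, without needing separate root names such as plant-taxon or animal-taxon.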
Line 453: | Line 166: | ||
Simply marking up the word as a taxon would lighten the load of any parser, making its job much simpler. --[[User:CharlesRoper|Charles Roper]] 10:50, 8 Jan 2007 (PST) | Simply marking up the word as a taxon would lighten the load of any parser, making its job much simpler. --[[User:CharlesRoper|Charles Roper]] 10:50, 8 Jan 2007 (PST) | ||
***Your first example requires the author of that page to find LSID, even assuming that they know such a thing exists. How is that "paving the cowpaths"? Your latter example removes semantic detail which is included in the straw-man proposal. It is akin to removing all the children of "adr" in hCard. I think your parser-load issue is a red herring. [[User:AndyMabbett|Andy Mabbett]] 11:07, 8 Jan 2007 (PST) | ***Your first example requires the author of that page to find LSID, even assuming that they know such a thing exists. How is that "paving the cowpaths"? Your latter example removes semantic detail which is included in the straw-man proposal. It is akin to removing all the children of "adr" in hCard. I think your parser-load issue is a red herring. [[User:AndyMabbett|Andy Mabbett]] 11:07, 8 Jan 2007 (PST) | ||
**** I would argue that finding and using an LSID would not be a difficult task for any author who is using a microformat. I don't see how it is any more difficult - in fact I see it as being easier - than manually marking up ranks. Why is parser-load a red herring? --[[User:CharlesRoper|Charles Roper]] 12:26, 8 Jan 2007 (PST) | |||
***Nice example (having done my doctoral work on a Passerine that may or may not be singing...). Absolutely I'd recommend marking up "Passeriformes" but no need to go on to specify "Aves." I'm still grokking microformats so I don't think we've got a conflict. [[User:CyndyParr|CyndyParr]] 10:20, 10 Jan 2007 (PST) | |||
****''Aves'' is available for use, but not required, so indeed, we don't have conflict ;-) [[User:AndyMabbett|Andy Mabbett]] 10:42, 10 Jan 2007 (PST) | |||
*The rest of the hierarchy can be retrieved ontologically as necessary. | *The rest of the hierarchy can be retrieved ontologically as necessary. | ||
**That's a use-case once the uF is published, certainly. The proposal doesn't require that the hierarchy be marked-up; it merely allows for it, in cases where it is '''already published'''. | **That's a use-case once the uF is published, certainly. The proposal doesn't require that the hierarchy be marked-up; it merely allows for it, in cases where it is '''already published'''. | ||
***I've yet to see any consistent examples of a hierarchy being marked-up using class names resembling those found in the proposal. A microformat is supposed to take (and perhaps tweak, or clean up) mark-up practises that are '''already in use''', not invent new ones. In other words, microformats should pave the cowpaths. While allowing for the marking-up of the hierarchy is fair enough (I understand the reasons for wanting that option), I believe the vast majority of authors do not need that facility, or (from my own experience) do not have time or energy to make use of anything more complex than simply marking-up a piece of text as a taxonomic name. In its current state, I don't believe the current species microformat proposal fulfils any of the "philosophy of microformats" points raised in [http://ifindkarma.typepad.com/relax/2004/12/microformats.html this article]. I believe the added complexity acts as a disincentive to potential users and is also clearly confusing. With taxonomic intelligence (hierarchies, synonymy, etc) being available from elsewhere (e.g. uBio), why have it embedded in the microformat? What examples of this kind of usage are there and what leads you to believe authors '''will''' use it, if it's available? [[rel-license]] is an example of a microformat that is simple and holds intelligence elsewhere. I believe simplicity is the key to a successful species microformat. --[[User:CharlesRoper|Charles Roper]] 10:50, 8 Jan 2007 (PST) | ***I've yet to see any consistent examples of a hierarchy being marked-up using class names resembling those found in the proposal. A microformat is supposed to take (and perhaps tweak, or clean up) mark-up practises that are '''already in use''', not invent new ones. In other words, microformats should pave the cowpaths. 
While allowing for the marking-up of the hierarchy is fair enough (I understand the reasons for wanting that option), I believe the vast majority of authors do not need that facility, or (from my own experience) do not have time or energy to make use of anything more complex than simply marking-up a piece of text as a taxonomic name. In its current state, I don't believe the current species microformat proposal fulfils any of the "philosophy of microformats" points raised in [http://ifindkarma.typepad.com/relax/2004/12/microformats.html this article]. I believe the added complexity acts as a disincentive to potential users and is also clearly confusing. With taxonomic intelligence (hierarchies, synonymy, etc) being available from elsewhere (e.g. uBio), why have it embedded in the microformat? What examples of this kind of usage are there and what leads you to believe authors '''will''' use it, if it's available? [[rel-license]] is an example of a microformat that is simple and holds intelligence elsewhere. I believe simplicity is the key to a successful species microformat. --[[User:CharlesRoper|Charles Roper]] 10:50, 8 Jan 2007 (PST) | ||
****''I've yet to see any consistent examples of a hierarchy being marked-up using class names resembling those found in the proposal.'' Perhaps not but, unlike other uFs, in taxonomy there exist clearly defined standards for the names of the components of taxonomic names. This is akin to the pre-existing class names from vCard, as used in hCard. | ****''I've yet to see any consistent examples of a hierarchy being marked-up using class names resembling those found in the proposal.'' Perhaps not but, unlike other uFs, in taxonomy there exist clearly defined standards for the names of the components of taxonomic names. This is akin to the pre-existing class names from vCard, as used in hCard. | ||
****''A microformat is supposed to take (and perhaps tweak, or clean up) mark-up practises that are '''already in use''''' Taxonomic classes ''''are '''' already in use. | *****Not so: vCard is a widely used standard already and thus it was a natural progression to develop hCard. There is no software-based vCard equivalent of the taxonomic hierarchy in common use that I am aware of. | ||
****''A microformat is supposed to take (and perhaps tweak, or clean up) mark-up practises that are '''already in use''''' Taxonomic classes ''''are '''' already in use. [[User:AndyMabbett|Andy Mabbett]] | |||
*****My concern still stands that there is no consistent mark-up usage that I can find. | |||
***Fair enough [[User:CyndyParr|CyndyParr]] 10:20, 10 Jan 2007 (PST) | |||
****''I believe the vast majority of authors [...] do not have time or energy to make use of anything more complex than simply marking-up a piece of text as a taxonomic name''' and - as has been pointed out previously, they will be able to do the latter, and nobody will force them to do the former. Why should they not, though, be able to do the former should they wish? | ****''I believe the vast majority of authors [...] do not have time or energy to make use of anything more complex than simply marking-up a piece of text as a taxonomic name''' and - as has been pointed out previously, they will be able to do the latter, and nobody will force them to do the former. Why should they not, though, be able to do the former should they wish? | ||
*****As I say, I find the concept of allowing the full suite of ranks to be fair - I understand your desire to have them in there. I just feel that the complexity they add to the specification will put off authors and confuse them. I also maintain that very few authors will make use of this extra complexity. Should we have some sort of poll to try and determine how many people would be able to make use of the full proposal? I'm not totally against having all of the ranks in the Species microformat, I've just yet to be convinced they are necessary or conducive to adoption of the standard. --[[User:CharlesRoper|Charles Roper]] 12:26, 8 Jan 2007 (PST) | |||
****''What examples of this kind of usage are there'' Those on [[species-examples]], e.g. Wikipedia. | ****''What examples of this kind of usage are there'' Those on [[species-examples]], e.g. Wikipedia. | ||
*****I've yet to find any consistent mark-up usage.--[[User:CharlesRoper|Charles Roper]] 12:26, 8 Jan 2007 (PST) | |||
****''[[rel-license]] is an example of a microformat that is simple and holds intelligence elsewhere'''' It holds no intelligence elsewhere, which was not already on the pre-microformatting page.[[User:AndyMabbett|Andy Mabbett]] 11:07, 8 Jan 2007 (PST) | ****''[[rel-license]] is an example of a microformat that is simple and holds intelligence elsewhere'''' It holds no intelligence elsewhere, which was not already on the pre-microformatting page.[[User:AndyMabbett|Andy Mabbett]] 11:07, 8 Jan 2007 (PST) | ||
*****The license on the end of the rel-license link is the intelligence. To look at it from a different angle, why not embed the license information within class attributes? Why not have a full license microformat, just in case some author needs it? Rel-license as it stands serves the needs of most authors most of the time, which is a fundamental philosophy of microformats. | |||
*Better to tie the scientific name (taxon name) to the authority or ontology from which it came. | *Better to tie the scientific name (taxon name) to the authority or ontology from which it came. | ||
**That would require the publisher to add extra data, which they might not wish to publish, nor, indeed, have to hand. Microformats are about recognising what data is '''already''' published and then enabling people to add semantics which identify the type of data on their pages. | **That would require the publisher to add extra data, which they might not wish to publish, nor, indeed, have to hand. Microformats are about recognising what data is '''already''' published and then enabling people to add semantics which identify the type of data on their pages. | ||
***I'm just suggesting support for such authority or ontology for those of us who think it important [[User:CyndyParr|CyndyParr]] 10:20, 10 Jan 2007 (PST) | |||
****Again, the option to do so is in the current proposal. [[User:AndyMabbett|Andy Mabbett]] 10:42, 10 Jan 2007 (PST) | |||
*[common names] A sub-component would be the language for which that common name is used (something like an HTML attribute lang="en") | *[common names] A sub-component would be the language for which that common name is used (something like an HTML attribute lang="en") | ||
**Indeed, but that's already available, and (on properly constructed pages) should already be on the parent container. | **Indeed, but that's already available, and (on properly constructed pages) should already be on the parent container. [[User:AndyMabbett|Andy Mabbett]] | ||
***Fair enough [[User:CyndyParr|CyndyParr]] 10:20, 10 Jan 2007 (PST) | |||
*conflicts between names across kingdoms (as current codes of nomenclature allow these). Thus specification of kingdom may be encouraged. | *conflicts between names across kingdoms (as current codes of nomenclature allow these). Thus specification of kingdom may be encouraged. | ||
**already in the proposal! | **already in the proposal! [[User:AndyMabbett|Andy Mabbett]] | ||
***but perhaps the proposal could be more explicit about the importance of kingdom given its important role in disambiguating species names (using a name of any other rank is less desirable given instability and required application overhead). I realize that I'm going beyond the microformat itself to "best practices" but please forgive me; I've been wrangling with taxonomic databases for a long time. [[User:CyndyParr|CyndyParr]] 10:20, 10 Jan 2007 (PST) | |||
*Disambiguation could be handled by applications outside the microformats | *Disambiguation could be handled by applications outside the microformats | ||
**Not sure what you mean here, since all parsing is done "outside microformats". | **Not sure what you mean here, since all parsing is done "outside microformats". [[User:AndyMabbett|Andy Mabbett]] | ||
***Thanks for the clarification [[User:CyndyParr|CyndyParr]] 10:20, 10 Jan 2007 (PST) | |||
***Another reason to make use of nameservers, rather than embedding the information within the microformat. --[[User:CharlesRoper|Charles Roper]] 10:50, 8 Jan 2007 (PST) | ***Another reason to make use of nameservers, rather than embedding the information within the microformat. --[[User:CharlesRoper|Charles Roper]] 10:50, 8 Jan 2007 (PST) | ||
** And how is enforcing the use of nameservers "paving the cowpaths"? [[User:AndyMabbett|Andy Mabbett]] 11:07, 8 Jan 2007 (PST) | **** And how is enforcing the use of nameservers "paving the cowpaths"? [[User:AndyMabbett|Andy Mabbett]] 11:07, 8 Jan 2007 (PST) | ||
*****The use of nameservers isn't enforced; it's optional (if disambiguation or further taxonomic intelligence is required). --[[User:CharlesRoper|Charles Roper]] 12:26, 8 Jan 2007 (PST) | |||
******Agreed [[User:CyndyParr|CyndyParr]] 10:20, 10 Jan 2007 (PST) | |||
(I'm either in agreement with your other points, or ambivalent.) | (I'm either in agreement with your other points, or ambivalent.) | ||
Line 483: | Line 206: | ||
[[User:AndyMabbett|Andy Mabbett]] 11:06, 5 Jan 2007 (PST) | [[User:AndyMabbett|Andy Mabbett]] 11:06, 5 Jan 2007 (PST) | ||
*I am now! [[User:CyndyParr|CyndyParr]] 10:20, 10 Jan 2007 (PST) | |||
==Pengo== | |||
Unfortunately scientific names seem to change as often as common names. I have some examples and use cases this microformat needs to address, around the problems of ambiguity: | |||
Ambiguity 1. Ambiguous scientific names. ''[http://en.wikipedia.org/wiki/Sousa_chinensis Sousa chinensis]'' may either refer to '''Chinese White Dolphin''' (also known as ''Sousa chinensis chinensis'') or Humpback dolphin, also known as '''''Sousa''''' (genus) which includes up to five species or subspecies of dolphin including the Chinese White Dolphin. I don't care whether the Chinese White Dolphin is a species or subspecies, but the microformat needs to allow the user to be specific about which system is being addressed. | |||
Ambiguity 2. Another example is the [http://en.wikipedia.org/wiki/Orangutan Orangutan]... or Orangutans. Orangutans were once believed to be a single species, but are now considered two separate species. The problem is that the new scientific name for just the Bornean species (''Pongo pygmaeus'') is the same as the old scientific name which encompassed both species (''Pongo pygmaeus''). Meanwhile the new scientific name for the Sumatran Orangutan (''Pongo abelii'') is always unambiguous. | |||
Ambiguity 3. ''[http://en.wikipedia.org/wiki/Doronomyrmex_pocahontas Doronomyrmex pocahontas]'' is an ant species that probably doesn't belong in the genus ''Doronomyrmex'', but rather ''Leptothorax''. But, until a full taxonomic study of the known species of ''Doronomyrmex'' and ''Leptothorax'' is carried out, it will stay there. Meanwhile the term "''Leptothorax'' ([http://en.wiktionary.org/wiki/sensu_stricto sensu stricto])" is used to mean "in the sense of the original author". | |||
Use cases: | |||
So how do we: | |||
# tag species in new documents, where we are using the most current nomenclature in the tags, to indicate that we don't mean the old nomenclature | |||
# tag species allowing for new nomenclature to arise which may obsolete what we're using | |||
# tag species in old documents, where we have updated the nomenclature in the tag, but the text may be referring to the old nomenclature, and we want to indicate that the updated nomenclature is being used. | |||
# tag species in [others'] documents that are tagged automatically and where the specific nomenclature being used is unknown or ambiguous | |||
# address issues where competing nomenclatures exist side-by-side, or transition periods | |||
# tag species that have some clues as to which nomenclature is being used, e.g. the date of publication, and the author. | |||
# tag a taxon which is now considered paraphyletic | |||
# decide what's out of the scope of this microformat | |||
Brainstorm solutions: | |||
* Allow an "old-synonym" field, which strictly lists the previous name of the species (and never a newer name). So, e.g. | |||
<pre><nowiki><span species="Pongo pygmaeus" old-synonym="Pongo pygmaeus pygmaeus">Bornean Orangutan</span></nowiki></pre> | |||
* Allow the English common name to be included, when it clears ambiguity. E.g. the "Chinese White Dolphin" has always been called that, regardless of whether it was considered a species or subspecies. | |||
* Allow a taxonomy-year field for what year the taxonomy used in the tag comes from. | |||
* Use a UID as described by others above. | |||
* Have an ad-hoc "disambiguation" field which could include anything to disambiguate, such as years, synonyms, "sensu stricto", common names, authors (i.e. "in the sense of this author") etc. What goes in it for a particular taxon will develop from usage. | |||
* Have a taxonomy-uncertain="true" field to indicate it has been (for example) automatically tagged and may not be accurate, so that other suggestions can be given by 3rd party software. | |||
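Several of the suggestions in the list above could be sketched with class names instead of new attributes, since attributes such as <code>species</code> or <code>old-synonym</code> are not defined in HTML. Every class name below (<code>vernacular</code>, <code>old-synonym</code>, <code>taxonomy-year</code>, <code>taxonomy-uncertain</code>) is hypothetical and simply mirrors the suggestions on this page; none is part of an agreed specification, and <code>YYYY</code> is a placeholder, not a real date:
<pre><nowiki>
<!-- Sketch only: all class names are hypothetical; YYYY is a placeholder year -->
<span class="biota taxonomy-uncertain">
  <span class="vernacular">Bornean Orangutan</span>
  (<i class="binominal">Pongo pygmaeus</i>,
  formerly <i class="old-synonym">Pongo pygmaeus pygmaeus</i>;
  taxonomy as of <span class="taxonomy-year">YYYY</span>)
</span>
</nowiki></pre>
Here the uncertainty flag is carried as an extra class on the root element, so third-party software could look for it without the page needing any non-standard attribute.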
Basically I don't think synonyms are necessary unless they are to show that the species was previously called something else, which may help to give a more exact meaning. | |||
Comments? Are there already existing solutions to this problem in the real world? | |||
[[User:PeNGo|Pengo]] 19:49, 28 Jan 2007 (PST) | |||
===Response to Pengo by Andy Mabbett=== | |||
Thank you for your expert contribution. Of your proposed solutions, the common (or vernacular) name, UID and author/ year are already in the current proposal. It may be sensible to have a "synonym" property (as used on http://en.wikipedia.org/wiki/Doronomyrmex_pocahontas), but I don't think "old-synonym" is particularly well named. Perhaps, if it's needed at all, "formerly" would be better? It is worth remembering, though, that the microformat is meant for labelling what people '''''already''''' publish and, for instance, http://en.wikipedia.org/wiki/Bornean_Orangutan refers to ''Pongo pygmaeus'', not any previous name. [[User:AndyMabbett|Andy Mabbett]] 02:20, 30 Jan 2007 (PST) | |||
* The most effective way to disambiguate is to use a UID. I feel other solutions would over-complicate the specification. I think we need to put some effort into populating the examples regrouped page with specific examples of publishing styles, e.g., plain binomials, plain common names, scientific names with common names, scientific names with synonyms, binomial with subspecies, etc. [[User:CharlesRoper|Charles Roper]] 05:25, 2 Feb 2007 (PST) | |||
**There are very few examples of UIDs being published in-the-wild. [[User:AndyMabbett|Andy Mabbett]] 06:00, 2 Feb 2007 (PST) | |||
***I found a good source of synonym usage; see the Coleopterist's Checklist of Beetles of the British Isles: http://www.coleopterist.org.uk/checklist.htm. Look for the indented specific epithet names, e.g. in the family [http://www.coleopterist.org.uk/haliplidae-list.htm HALIPLIDAE], ''pallens'' Fowler, 1887 and ''halberti'' Bullock, 1928 are examples of synonyms, with the favoured specific epithet being ''confinis'' Stephens, 1828. On a more general note, checklists such as this are ripe for microformatting and are an excellent example of common markup practice. The species microformat could be used to great effect with content such as this, creating minable dictionaries of species names which are, in turn, essential tools for use in biodiversity informatics. [[User:CharlesRoper|Charles Roper]] 14:43, 24 Feb 2007 (PST) | |||
***Re. UIDs: yes, we have an interesting chicken & egg situation here. Without a reliable way to publish a UID (other than making them human readable text, which is undesirable) how are we supposed to be able to make use of them? A microformat would be a good means with which to deploy UIDs, but it is frowned upon to implement a pattern that isn't already being practised. Judging by the messages here, on the discussion list and elsewhere, there is clearly a desire for linking taxon names with UIDs, particularly LSIDs, which look set to become the standard UID for taxonomic naming. [[User:CharlesRoper|Charles Roper]] 10:41, 2 Feb 2007 (PST) | |||
****Well, the proposal allows for the inclusion of UIDs (as with all the suggested attributes, some work on the exact format might need to be done), should people choose to publish them; whether or not they do is not something for uFs to push for. [[User:AndyMabbett|Andy Mabbett]] 13:13, 2 Feb 2007 (PST) | |||
==Charles Roper== | |||
===Synonyms=== | |||
I found an interesting [http://collections2.eeb.uconn.edu/collections/insects/CTBnew/duodecimguttata.html example of synonym usage] in the [http://collections2.eeb.uconn.edu/collections/insects/CTBnew/checklist.htm Tiger Beetles of Connecticut checklist]. In the particular example cited, the synonyms refer to, or are associated with, the species name - ''Cicindela duodecimguttata'' Dejean 1825. Synonyms are often mentioned alongside or near preferred scientific names; how should we tie them together, especially when, as in this case, the name and the synonym are not positioned close to one another, but are still clearly associated? As a segue to this question, how should multiple synonymous common names be represented? How about common names in different languages? For example, the [http://names.ubio.org/browser/details.php?names=on&authors=on&sci=on&vern=on&namebankID=2478269 Otter has many different common names]. | |||
:I take it you refer to the text which may be paraphrased (by omitting some prose) as: | |||
::'''''Cicindela duodecimguttata'' is known from 23 localities. ''Cicindela duodecimguttata'', once classified as a subspecies of ''C. repanda'', shares many traits with ''C. repanda''. Where ''C. duodecimguttata'' occurs, the more common ''C. repanda'' is usually found.''' | |||
::'''Synonomies: ''Cicindela proteus'' Kirby 1837:9. ''Cicindela bucolica'' Casey 1913:28. ''Cicindela hudsonica'' Casey 1916:29. ''Cicindela edmontonensis'' Carr 1920:21''' | |||
:The problem would seem to be that ''C. repanda'' is referred to both as a species in its own right, and as a past synonym of ''C. duodecimguttata''. If the whole thing is wrapped in one <code>div class="biota"</code>, allowing the other listed synonyms to be included, then how is ''C. repanda'' to be marked up as a species in its own right? | |||
:I would mark up the first occurrence of each, then use the include-pattern to "attach" the other listed synonyms with the former (I've only included one synonym in the following, for clarity): | |||
:<pre><nowiki> | |||
<span class="biota"> | |||
<span class="binominal">Cicindela duodecimguttata</span> | |||
<object class="include" data="#C-proteus"></object> | |||
</span> | |||
is known from 23 localities. Cicindela duodecimguttata, once classified as a subspecies of | |||
<span class="biota"> | |||
<span class="binominal">C. repanda</span> | |||
</span> | |||
, shares many traits with C. repanda. Where C. duodecimguttata occurs, the more common C. repanda is usually found. | |||
Synonomies: | |||
<span class="synonym" id="C-proteus"> | |||
<span class="binominal">Cicindela proteus</span> [or maybe "synonym-binominal" ?] | |||
<span class="authority">Kirby</span> | |||
<span class="year">1837</span>:9.</span> | |||
Cicindela bucolica Casey 1913:28. Cicindela hudsonica Casey 1916:29. Cicindela edmontonensis Carr 1920:21 | |||
</nowiki></pre> | |||
:I might then use its entry on the "shares many traits" line to mark up ''C. repanda'' as a synonym, and include it in the same way. | |||
:Multiple and foreign-language common names would be catered for by allowing the common name attribute to be "0 or many" (the first such occurrence having precedence), and using a <code>lang</code> attribute where appropriate. | |||
:[[User:AndyMabbett|Andy Mabbett]] 14:42, 28 Feb 2007 (PST) | |||
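:To see how a parser might follow the include-pattern markup above, here is a rough Python sketch. It assumes well-formed markup with matched tags, and it uses the class names from the example ("biota", "binominal", "synonym"), none of which is settled. The names <code>IncludeResolver</code> and <code>resolve_synonyms</code> are made up for illustration.

```python
from html.parser import HTMLParser

# Tags with no closing tag; ignored so the open-element stack stays balanced.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class IncludeResolver(HTMLParser):
    """Tie class="synonym" elements to class="biota" elements via includes."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open elements
        self.biota = []      # one record per class="biota" element
        self.synonyms = {}   # id of a synonym element -> its binominal text

    def _nearest(self, cls):
        # Innermost open element carrying the given class, if any.
        for node in reversed(self.stack):
            if cls in node["classes"]:
                return node
        return None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        node = {"classes": classes, "id": attr.get("id"), "text": []}
        if "biota" in classes:
            node["record"] = {"name": None, "includes": []}
            self.biota.append(node["record"])
        if tag == "object" and "include" in classes:
            # data="#some-id" points at an element elsewhere on the page.
            target = (attr.get("data") or "").lstrip("#")
            owner = self._nearest("biota")
            if owner is not None and target:
                owner["record"]["includes"].append(target)
        self.stack.append(node)

    def handle_data(self, data):
        if self.stack and "binominal" in self.stack[-1]["classes"]:
            self.stack[-1]["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        node = self.stack.pop()
        if "binominal" not in node["classes"]:
            return
        name = "".join(node["text"]).strip()
        synonym = self._nearest("synonym")
        biota = self._nearest("biota")
        if synonym is not None and synonym["id"]:
            self.synonyms[synonym["id"]] = name
        elif biota is not None and biota["record"]["name"] is None:
            biota["record"]["name"] = name


def resolve_synonyms(html):
    """Map each biota name to the synonym names it includes."""
    parser = IncludeResolver()
    parser.feed(html)
    parser.close()
    return {
        rec["name"]: [parser.synonyms[i] for i in rec["includes"] if i in parser.synonyms]
        for rec in parser.biota
    }
```

:Synonyms are resolved only after the whole page has been read, so it does not matter that the synonym list appears after the name that includes it.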
==Ryan Kaldari== | |||
1. The name of this microformat needs to be changed ASAP. Calling it "species" is confusing and misleading. There was even resistance to implementing this microformat in Wikipedia solely because of the confusing name.[http://en.wikipedia.org/wiki/Template_talk:Taxobox#Hidden_microformat_category] | |||
2. Synonyms should be added as a node to the format. | |||
3. I wouldn't worry too much about accommodating LSIDs specifically, as it seems to be a rapidly dying format. Just concentrate on accommodating GUIDs in general (of any format) and identifying them as such. Also, keep in mind that many objects are going to have more than one GUID (even though this is discouraged), so we should be able to accommodate this. | |||
== Other use cases == | |||
Please add your suggestions! | |||
'Species' microformats could be used to: | |||
*... | |||
==See also== | ==See also== | ||
{{species}} | {{species}} |
Latest revision as of 17:17, 28 April 2021
Species Brainstorming
- Note: the original name of the proposed microformat, "species", is likely to change, probably to "biota" or "taxon". The former has been retained here, to avoid having to make many repetitive and perhaps redundant edits
- updated! The Operator extension now detects Species. A test page is available. Work on both continues!
Andy Mabbett
Proposal
There should, I believe, be a "species" microformat for the markup of plant and animal names, to include their scientific names. Consider:
<abbr class="species" title="Anas platyrhynchos">Mallard</abbr>
or
<span class="species">Anas platyrhynchos</span>
The microformat would allow user agents to be configured to perform look-ups on on-line databases of species, according to user preferences. Specification of the taxonomic class would help user agents to know which such databases were applicable (i.e., use database A for plants, but database B for mammals and database C for insects.)
It would also allow for more specific searching (do I mean "crow" or do I mean "Corvus corone"?).
The specification should encourage, but not mandate, the correct capitalisation of scientific names, so "Anas platyrhynchos", not "anas platyrhynchos" nor (except historically) "Anas Platyrhynchos". A reminder that such names should be styled with italics will also be included.
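As a rough illustration of what a user agent might do with the two markup forms above, here is a minimal Python sketch that pulls out the scientific names so they can be passed to a look-up service. It is only a sketch under assumptions: the class name "species" is the provisional one used on this page, the names `SpeciesExtractor` and `extract_species` are hypothetical, and elements marked "species" are assumed to contain plain text only.

```python
from html.parser import HTMLParser


class SpeciesExtractor(HTMLParser):
    """Collect scientific names from elements carrying class="species"."""

    def __init__(self):
        super().__init__()
        self.names = []        # scientific names, in document order
        self._open_tag = None  # tag of the species element being read
        self._text = []        # text fragments of that element

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if "species" not in (attr.get("class") or "").split():
            return
        if tag == "abbr" and attr.get("title"):
            # abbr form: the visible text is a common name and the
            # scientific name is carried in the title attribute.
            self.names.append(attr["title"].strip())
        elif self._open_tag is None:
            # Plain form: the element text is the scientific name.
            self._open_tag = tag
            self._text = []

    def handle_data(self, data):
        if self._open_tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            self.names.append("".join(self._text).strip())
            self._open_tag = None


def extract_species(html):
    """Return the scientific names marked up in an HTML fragment."""
    parser = SpeciesExtractor()
    parser.feed(html)
    parser.close()
    return parser.names
```

Called on a fragment containing both examples, `extract_species` would yield the two scientific names, leaving the common name "Mallard" for display only.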
Straw man proposal
See : species-strawman-01
Bill Hull
My website has 17000+ photos of 4700+ bird species. There are also a handful of butterflies (organized very poorly as I am unaware of any published butterfly world taxonomies) and it will shortly have a number of dragon/damselflies. The site is made up of static pages but is built from a database so it is easy for me to add new HTML tags to the pages. If you are interested in some prototyping at some point I can probably build stuff into the pages. - Bill Hull
Roger Hyam
Taxonomic Databases Working Group
TDWG is the organisation for standardisation in exchange of biodiversity data. The organisation has now (November 2007) undergone some re-organization. It has a new collaborative development environment, standards process, standards architecture and it has formed alliances with major organizations in the domains of geospatial and ecological data.
Central to the TDWG standards architecture are the LSID vocabularies. The role of these vocabularies is to define URIs for the nuts-and-bolts concepts that occur in the biodiversity informatics domain. See a description of what the TDWG ontology is for details. Although the vocabularies are defined in OWL the intention is for their URIs to be used as namespaces across different XML and non-XML based technologies. They can act as a central mapping point for those hard pressed developers who want to combine data presented to them in many formats.
The species microformats that are proposed here are a good thing. The only danger is that they re-define any of the central terms defined in the TDWG vocabularies. If they do that then they are creating another language instead of extending HTML to embrace existing semantics - which I don't think is their intent. It would be nice to have the data in web pages in a form that can be combined with the hundreds of millions of records marked up with the TDWG URIs.
If there is enough belief in the need for a Species Microformat, why not propose a TDWG Applicability Statement and take it through a peer review process? The TDWG process is quite simple and free (unless you count blood, sweat and tears). You would need to form a Task Group with a charter saying what you intended to do. As convener of the TAG Interest Group I would willingly host the Task Group. You could then propose a standard and have it reviewed by a range of biologists and IT people before it becomes ratified and recommended for adoption. RogerHyam 2007-11-5
Malcolm Storey
(extracted from e-mails to Andy Mabbett, by kind permission)
- "Hopefully I'll have more time for things like this in the New Year, but expect it all to be done and dusted by then!!" - Malcolm Storey, BioImages
ICZN, ICBN et al
You don't cover the full set of levels of taxonomic hierarchy defined by the relevant body, ICZN or ICBN (plus the others - one each for garden plant varieties, bacteria, viruses). Don't know about mycoplasmas, diseases, BSE factors etc.
AIUI ICBN only goes down to species.
ICZN isn't so easy: [3]
- 1.2.2. The Code regulates the names of taxa in the family group, genus group, and species group. Articles 1-4, 7-10, 11.1-11.3, 14, 27, 28 and 32.5.2.5 also regulate names of taxa at ranks above the family group. (But none of the above articles list the taxonomic ranks.)
ICZN only goes down to subspecies (art 1.3.4)
Note also:
- 1.4. Independence. Zoological nomenclature is independent of other systems of nomenclature in that the name of an animal taxon is not to be rejected merely because it is identical with the name of a taxon that is not animal (see Article 1.1.1)
(eg Trichia, Oenanthe, Melanotus)
Myxomycetes are the exception - they're in kingdom protozoa which falls under ICZN but they fall under the ICBN name space. (Hence "Trichia").
DNA
You may want to consider refs to DNA sequences. They're not part of taxonomy, but they can be considered the bottom rung of the taxonomic hierarchy and they will be of increasing significance.
Typography
what about Adalia 2-punctata, and Adalia bipunctata (not to mention those with hyphens [or apostrophes] which may get left out. And what about accented characters)?
- Adalia 2-punctata is an abbreviation of Adalia bipunctata, so:
<abbr class="binominal" title="Adalia bipunctata">Adalia 2-punctata</abbr>
AndyMabbett 09:55, 21 Oct 2006 (PDT)
Gaps
The hierarchy is not always fully populated. Not every species belongs to a class. Maybe this was where fungi are different. In Paul Kirk's databases (which are the official ones used to drive the checklists and NBN) he has fixed fields for the higher level taxa which means that only certain ranks can be used. The blanks he fills in (mostly!!) with "incertae sedis" (think it's Latin for "unknown seat"). In my database I use a self-join which gives much more flexibility. Anyway there are lots of "incertae sedis" in Paul's database!
Homonyms
Apion carduorum sensu Morris 1990 is Apion gibbirostre (Gyllenhal, 1813). Apion carduorum Kirby, 1808 is a different species.
- You'd mark the former up as something like
<abbr class="binominal" title="Apion gibbirostre">''Apion carduorum'' sensu Morris 1990</abbr>
- AndyMabbett 12:21, 5 Oct 2006 (PDT)
Citations for authorities
If people are citing the authority in full they would include the literature reference, not just the date e.g.
- Cuphophyllus niveus (Scop.) Bon, Doc. Mycol. 14(56): 11 (1985)[1984]
- Perhaps we should allow for the inclusion of an hCitation? Andy Mabbett 15:08, 28 Feb 2007 (PST)
Hyppo
Nomenclatural challenge
You asked for comments. One challenge I see is the difference in Nomenclature for Animalia and Plantae (coming from the old 2 kingdom system). For Plantae the International Code of Botanical Nomenclature[4] is used and for Animalia the code from http://www.iczn.org/. The Animalia code is not officially accepted, but ICZN tries to be authoritative starting from 2008.
The two different nomenclatural systems differ in a few areas, and they affect markup.
- Subgenus (Plantae): Dendroceros subg. Apoceros
- Subgenus (Animalia): Sula (Morus)
- Subspecies (Plantae): Begonia grandis ssp. evansiana
- Subspecies (Animalia): Gorilla beringei graueri
- --Hyppo 14:23, 9 Oct 2006 (PDT)
- I would mark those up as:
<span class="genus">Dendroceros</span> subg. <span class="subgenus">Apoceros</span>
<span class="genus">Sula</span> <span class="subgenus">Morus</span>
<span class="binominal">Begonia grandis</span> ssp. <span class="subspecies">evansiana</span>
<span class="binominal">Gorilla beringei</span> <span class="subspecies">graueri</span>
- With a wrapping class="biota" and, possibly, kingdom attributes.
- AndyMabbett 11:37, 10 Oct 2006 (PDT)
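Once the parts of a name are marked up separately, as in the examples above, software could in principle rebuild the display form that each code expects. The following Python sketch covers only the four examples quoted in this thread. The function names and the "botanical" / "zoological" labels are hypothetical, not part of any proposal.

```python
def format_subgenus(genus, subgenus, code):
    """Render a genus plus subgenus in the style of the given code."""
    if code == "botanical":
        # Botanical style uses the connecting term "subg."
        return f"{genus} subg. {subgenus}"
    if code == "zoological":
        # Zoological style puts the subgenus in parentheses.
        return f"{genus} ({subgenus})"
    raise ValueError(f"unknown nomenclatural code: {code!r}")


def format_subspecies(binomial, subspecies, code):
    """Render a binomial plus subspecies in the style of the given code."""
    if code == "botanical":
        # Botanical style uses the connecting term "ssp."
        return f"{binomial} ssp. {subspecies}"
    if code == "zoological":
        # Zoological style is a plain trinomial.
        return f"{binomial} {subspecies}"
    raise ValueError(f"unknown nomenclatural code: {code!r}")
```

This suggests the markup itself need not differ between kingdoms, provided a kingdom (or the governing code) is available to say which convention applies.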
Cyndy Parr
The ideas expressed here are promising. Below are my comments on all the preceding -- as I have time I'll organize, elaborate, and try to move parts into the right discussion threads above.
In the Spire project we have been developing ontologies in OWL for taxonomic names and hierarchies. Ideally, we'd like to have a microformat where people can tag a scientific name and an application can then check an ontology of their choice for more information (richer semantics).
We would discourage full expression of the Linnaean hierarchy except for those who are maintaining such classifications (such as uBio). The rest of the hierarchy can be retrieved ontologically as necessary.
Better to tie the scientific name (taxon name) to the authority or ontology from which it came. I.e. for those who are able to provide information on taxonomic concepts, support for TCS (Taxonomic Concept Schema) fields would be important.
I prefer "taxon" or "taxon-name" or TaxonName over biota (which is plural, and too close to biotic which has a far larger scope than taxa). Would prefer "binomial" to "binominal"
- I also favour "taxon" over "biota" simply because it is the more commonly used term. I also prefer "binomial". I did a quick straw poll of various experts and all favoured binomial. Neither is technically incorrect, but binomial is more commonly used. Indeed, a Google search for binomial returns 6,580,000 results while binominal returns 342,000 and a "did you mean: binomial" prompt. --Charles Roper 04:12, 9 Jan 2007 (PST)
- This binomial vs. binominal Google battle seems even more conclusive. Andy Mabbett 06:17, 9 Jan 2007 (PST)
"class" is difficult not only because of the confusion with the programming concept of classes, but because it is a taxonomic rank. However, most of us have figured out the difference by now so this is not critical.
"cname" should be "comname" or "common-name" or "vernacular" to make it more obvious what the information is. A sub-component would be the language for which that common name is used ( something like an HTML attribute lang="en")
- I also favour "common-name" or "vernacular" --Charles Roper 04:12, 9 Jan 2007 (PST)
There are known conflicts between names across kingdoms (as current codes of nomenclature allow these). Thus specification of kingdom may be encouraged. Disambiguation could be handled by applications outside the microformats (this could be difficult), or they could be dealt with in the core microformat: e.g. plant-taxon or fungal-taxon or animal-taxon.
A sightings microformat is a good idea and I would be interested in being involved in that. We've been toying with this in OWL and also using structured blogging over at http://fieldmarking.reger.com
Your terms such as gender (better: sex), age bracket (better: life stage), count, type (better: depending on the meaning, caste or morph) all belong in a specimen or sighting microformat, to be used in combination with the taxon microformat, not as part of it.
Response by Andy Mabbett
Thank you very much for your detailed contribution. I have a few responses:
- We would discourage full expression of the Linnaean hierarchy except for those who are maintaining such classifications (such as uBio).
- Why? Also, I'm not aware of any microformat which is restricted to a subset of users, nor how this would be done. How would you suggest that someone mark up this: "Not all of the Passeriformes sing"?
- I would prefer to express this something like so:
<span class="taxon lsidres:urn:lsid:ubio.org:namebank:21833"> <i class="sci-name">Passeriformes</i> </span>
Or, to simplify further:
<i class="taxon sci-name lsidres:urn:lsid:ubio.org:namebank:21833">Passeriformes</i>
Or, at the simplest level:
<i class="taxon">Passeriformes</i>
Simply marking up the word as a taxon would lighten the load of any parser, making its job much simpler. --Charles Roper 10:50, 8 Jan 2007 (PST)
- Your first example requires the author of that page to find LSID, even assuming that they know such a thing exists. How is that "paving the cowpaths"? Your latter example removes semantic detail which is included in the straw-man proposal. It is akin to removing all the children of "adr" in hCard. I think your parser-load issue is a red herring. Andy Mabbett 11:07, 8 Jan 2007 (PST)
- I would argue that finding and using an LSID would not be a difficult task for any author who is using a microformat. I don't see how it is any more difficult - in fact I see it as being easier - than manually marking up ranks. Why is parser-load a red herring? --Charles Roper 12:26, 8 Jan 2007 (PST)
- Nice example (having done my doctoral work on a Passerine that may or may not be singing...). Absolutely I'd recommend marking up "Passeriformes" but no need to go on to specify "Aves." I'm still grokking microformats so I don't think we've got a conflict. CyndyParr 10:20, 10 Jan 2007 (PST)
- Aves is available for use, but not required, so indeed, we don't have conflict ;-) Andy Mabbett 10:42, 10 Jan 2007 (PST)
- The rest of the hierarchy can be retrieved ontologically as necessary.
- That's a use-case once the uF is published, certainly. the proposal doesn't require that the hierarchy be marked-up, it merely allows for it, in cases where it is already published.
- I've yet to see any consistent examples of a hierarchy being marked-up using class names resembling those found in the proposal. A microformat is supposed to take (and perhaps tweak, or clean up) mark-up practises that are already in use, not invent new ones. In other words, microformats should pave the cowpaths. While allowing for the marking-up of the hierarchy is fair enough (I understand the reasons for wanting that option), I believe the vast majority of authors do not need that facility, or (from my own experience) do not have time or energy to make use of anything more complex than simply marking-up a piece of text as a taxonomic name. In its current state, I don't believe the current species microformat proposal fulfils any of the "philosophy of microformats" points raised in this article. I believe the added complexity acts as a disincentive to potential users and is also clearly confusing. With taxonomic intelligence (hierarchies, synonymy, etc) being available from elsewhere (e.g. uBio), why have it embedded in the microformat? What examples of this kind of usage are there and what leads you to believe authors will use it, if it's available? rel-license is an example of a microformat that is simple and holds intelligence elsewhere. I believe simplicity is the key to a successful species microformat. --Charles Roper 10:50, 8 Jan 2007 (PST)
- "I've yet to see any consistent examples of a hierarchy being marked-up using class names resembling those found in the proposal." Perhaps not but, unlike other uFs, in taxonomy there exist clearly defined standards for the names of the components of taxonomic names. This is akin to the pre-existing class names from vCard, as used in hCard.
- Not so: vCard is a widely used standard already and thus it was a natural progression to develop hCard. There is no software-based vCard equivalent of the taxonomic hierarchy in common use that I am aware of.
- "A microformat is supposed to take (and perhaps tweak, or clean up) mark-up practises that are already in use." Taxonomic classes are already in use. Andy Mabbett
- My concern still stands that there is no consistent mark-up usage that I can find.
- Fair enough CyndyParr 10:20, 10 Jan 2007 (PST)
- "I believe the vast majority of authors [...] do not have time or energy to make use of anything more complex than simply marking-up a piece of text as a taxonomic name" - and, as has been pointed out previously, they will be able to do the latter, and nobody will force them to do the former. Why should they not, though, be able to do the former should they wish?
- As I say, I find the concept of allowing the full suite of ranks to be fair - I understand your desire to have them in there. I just feel that the complexity they add to the specification will put off authors and confuse them. I also maintain that very few authors will make use of this extra complexity. Should we have some sort of poll to try and determine how many people would be able to make use of the full proposal? I'm not totally against having all of the ranks in the Species microformat, I've just yet to be convinced they are necessary or conducive to adoption of the standard. --Charles Roper 12:26, 8 Jan 2007 (PST)
- "What examples of this kind of usage are there?" Those on species-examples, e.g. Wikipedia.
- I've yet to find any consistent mark-up usage.--Charles Roper 12:26, 8 Jan 2007 (PST)
- "rel-license is an example of a microformat that is simple and holds intelligence elsewhere." It holds no intelligence elsewhere which was not already on the pre-microformatting page. Andy Mabbett 11:07, 8 Jan 2007 (PST)
- The license on the end of the rel-license link is the intelligence. To look at it from a different angle, why not embed the license information within class attributes? Why not have a full license microformat, just in case some author needs it? Rel-license as it stands serves the needs of most authors most of the time, which is a fundamental philosophy of microformats.
- Better to tie the scientific name (taxon name) to the authority or ontology from which it came.
- That would require the publisher to add extra data, which they might not wish to publish, nor, indeed, have to hand. Microformats are about recognising what data is already published and then enabling people to add semantics which identify the type of data on their pages.
- I'm just suggesting support for such authority or ontology for those of us who think it important CyndyParr 10:20, 10 Jan 2007 (PST)
- Again, the option to do so is in the current proposal. Andy Mabbett 10:42, 10 Jan 2007 (PST)
- [common names] A sub-component would be the language for which that common name is used (something like an HTML attribute lang="en")
- Indeed, but that's already available, and (on properly constructed pages) should already be on the parent container. Andy Mabbett
- Fair enough CyndyParr 10:20, 10 Jan 2007 (PST)
- conflicts between names across kingdoms (as current codes of nomenclature allow these). Thus specification of kingdom may be encouraged.
- already in the proposal! Andy Mabbett
- but perhaps the proposal could be more explicit about the importance of kingdom given its important role in disambiguating species names (using a name of any other rank is less desirable given instability and required application overhead). I realize that I'm going beyond the microformat itself to "best practices" but please forgive me; I've been wrangling with taxonomic databases for a long time. CyndyParr 10:20, 10 Jan 2007 (PST)
- Disambiguation could be handled by applications outside the microformats
- Not sure what you mean here, since all parsing is done "outside microformats". Andy Mabbett
- Thanks for the clarification CyndyParr 10:20, 10 Jan 2007 (PST)
- Another reason to make use of nameservers, rather than embedding the information within the microformat. --Charles Roper 10:50, 8 Jan 2007 (PST)
- And how is enforcing the use of nameservers "paving the cowpaths"? Andy Mabbett 11:07, 8 Jan 2007 (PST)
- The use of nameservers isn't enforced; it's optional (if disambiguation or further taxonomic intelligence is required). --Charles Roper 12:26, 8 Jan 2007 (PST)
- Agreed CyndyParr 10:20, 10 Jan 2007 (PST)
(I'm either in agreement with your other points, or ambivalent.)
Thank you again - do stick around. Are you on the mailing list?
Andy Mabbett 11:06, 5 Jan 2007 (PST)
- I am now! CyndyParr 10:20, 10 Jan 2007 (PST)
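For reference, the class-token form suggested earlier in this thread (a token beginning "lsidres:" followed by an LSID) could be read by a parser along the following lines. This Python sketch is only an illustration: the "lsidres:" prefix comes from the example above and is not an agreed convention, the function names are hypothetical, and the split into authority, namespace and object follows the general LSID layout urn:lsid:authority:namespace:object.

```python
LSID_PREFIX = "lsidres:"  # prefix used in the example above; not an agreed convention


def lsid_from_class(class_attr):
    """Return the LSID embedded in a class attribute value, or None."""
    for token in class_attr.split():
        if token.startswith(LSID_PREFIX):
            return token[len(LSID_PREFIX):]
    return None


def split_lsid(lsid):
    """Split an LSID into its authority, namespace and object parts."""
    parts = lsid.split(":")
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    return {"authority": parts[2], "namespace": parts[3], "object": parts[4]}
```

An element marked only as class="taxon" simply yields no LSID, which matches the view that identifiers are optional.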
Pengo
Unfortunately scientific names seem to change as often as common names. I have some examples and use cases this microformat needs to address, around the problems of ambiguity:
Ambiguity 1. Ambiguous scientific names. Sousa chinensis may either refer to Chinese White Dolphin (also known as Sousa chinensis chinensis) or Humpback dolphin, also known as Sousa (genus) which includes up to five species or subspecies of dolphin including the Chinese White Dolphin. I don't care whether the Chinese White Dolphin is a species or subspecies, but the microformat needs to allow the user to be specific about which system is being addressed.
Ambiguity 2. Another example is the Orangutan... or Orangutans. Orangutans were once believed to be a single species, but are now considered two separate species. The problem is that the new scientific name for just the Bornean species (Pongo pygmaeus) is the same as the old scientific name which encompassed both species (Pongo pygmaeus). Meanwhile the new scientific name for the Sumatran Orangutan (Pongo abelii) is always unambiguous.
Ambiguity 3. Doronomyrmex pocahontas is an ant species that probably doesn't belong in the genus Doronomyrmex, but rather Leptothorax. But, until a full taxonomic study of the known species of Doronomyrmex and Leptothorax is carried out, it will stay there. Meanwhile the term "Leptothorax (sensu stricto)" is used to mean "in the sense of the original author".
Use cases: So how do we:
- tag species in new documents, where we are using the most current nomenclature in the tags, to indicate that we don't mean the old nomenclature
- tag species allowing for new nomenclature to arise which may obsolete what we're using
- tag species in old documents, where we have updated the nomenclature in the tag, but the text may be referring to the old nomenclature, and we want to indicate that the updated nomenclature is being used.
- tag species in [others'] documents that are tagged automatically and where the specific nomenclature being used is unknown or ambiguous
- address issues where competing nomenclatures exist side-by-side, or transition periods
- tag species that have some clues as to which nomenclature is being used, e.g. the date of publication, and the author.
- tag a taxon which is now considered paraphyletic
- decide what's out of the scope of this microformat
Brainstorm solutions:
- Allow an "old-synonym" field, which strictly lists the previous name of the species (and never a newer name). So, e.g.
<span species="Pongo pygmaeus" old-synonym="Pongo pygmaeus pygmaeus">Bornean Orangutan</span>
- Allow the English common name to be included, when it clears ambiguity. E.g. the "Chinese White Dolphin" has always been called that, regardless of whether it was considered a species or subspecies.
- Allow a taxonomy-year field for what year the taxonomy used in the tag comes from.
- Use a UID as described by others above.
- Have an ad-hoc "disambiguation" field which could include anything to disambiguate, such as years, synonyms, "sensu stricto", common names, authors (i.e. "in the sense of this author") etc. What goes in it for a particular taxon will develop from usage.
- Have a taxonomy-uncertain="true" field to indicate it has been (for example) automatically tagged and may not be accurate, so that other suggestions can be given by 3rd party software.
Basically I don't think synonyms are necessary unless they are to show that the species was previously called something else, which may help to give a more exact meaning.
Comments? Are there already existing solutions to this problem in the real world? Pengo 19:49, 28 Jan 2007 (PST)
Response to Pengo by Andy Mabbett
Thank you for your expert contribution. Of your proposed solutions, the common (or vernacular) name, UID and author/ year are already in the current proposal. It may be sensible to have a "synonym" property (as used on http://en.wikipedia.org/wiki/Doronomyrmex_pocahontas), but I don't think "old-synonym" is particularly well named. Perhaps, if it's needed at all, "formerly" would be better? It is worth remembering, though, that the microformat is meant for labelling what people already publish and, for instance, http://en.wikipedia.org/wiki/Bornean_Orangutan refers to Pongo pygmaeus, not any previous name. Andy Mabbett 02:20, 30 Jan 2007 (PST)
- The most effective way to disambiguate is to use a UID. I feel other solutions would over-complicate the specification. I think we need to put some effort into populating the examples regrouped page with specific examples of publishing styles, e.g., plain binomials, plain common names, scientific names with common names, scientific names with synonyms, binomial with subspecies, etc. Charles Roper 05:25, 2 Feb 2007 (PST)
- There are very few examples of UIDs being published in-the-wild. Andy Mabbett 06:00, 2 Feb 2007 (PST)
- I found a good source of synonym usage; see the Coleopterist's Checklist of Beetles of the British Isles: http://www.coleopterist.org.uk/checklist.htm. Look for the indented specific epithet names, e.g. in the family HALIPLIDAE, pallens Fowler, 1887 and halberti Bullock, 1928 are examples of synonyms, with the favoured specific epithet being confinis Stephens, 1828. On a more general note, checklists such as this are ripe for microformatting and are an excellent example of common markup practice. The species microformat could be used to great effect with content such as this, creating minable dictionaries of species names which are, in turn, essential tools for use in biodiversity informatics. Charles Roper 14:43, 24 Feb 2007 (PST)
- Re. UIDs: yes, we have an interesting chicken & egg situation here. Without a reliable way to publish a UID (other than making them human readable text, which is undesirable) how are we supposed to be able to make use of them? A microformat would be a good means with which to deploy UIDs, but it is frowned upon to implement a pattern that isn't already being practised. Judging by the messages here, on the discussion list and elsewhere, there is clearly a desire for linking taxon names with UIDs, particularly LSIDs, which look set to become the standard UID for taxonomic naming. Charles Roper 10:41, 2 Feb 2007 (PST)
- Well, the proposal allows for the inclusion of UIDs (as with all the suggested attributes, some work on the exact format might need to be done), should people choose to publish them; whether or not they do is not something for uFs to push for. Andy Mabbett 13:13, 2 Feb 2007 (PST)
Charles Roper
Synonyms
I found an interesting example of synonym usage in the Tiger Beetles of Connecticut checklist. In the particular example cited, the synonyms refer to, or are associated with, the species name - Cicindela duodecimguttata Dejean 1825. Synonyms are often mentioned alongside or near preferred scientific names; how should we tie them together, especially when, as in this case, the name and the synonym are not positioned close to one another, but are still clearly associated? As a segue to this question, how should multiple synonymous common names be represented? How about common names in different languages? For example, the Otter has many different common names.
- I take it you refer to the text which may be paraphrased (by omitting some prose) as:
- Cicindela duodecimguttata is known from 23 localities. Cicindela duodecimguttata, once classified as a subspecies of C. repanda, shares many traits with C. repanda. Where C. duodecimguttata occurs, the more common C. repanda is usually found.
- Synonomies: Cicindela proteus Kirby 1837:9. Cicindela bucolica Casey 1913:28. Cicindela hudsonica Casey 1916:29. Cicindela edmontonensis Carr 1920:21
- The problem would seem to be that C. repanda is referred to both as a species in its own right, and as a past synonym of C. duodecimguttata. If the whole thing is wrapped in one <div class="biota">, allowing the other listed synonyms to be included, then how is C. repanda to be marked up as a species in its own right?
- I would mark up the first occurrence of each, then use the include-pattern to "attach" the other listed synonyms with the former (I've only included one synonym in the following, for clarity):
<span class="biota">
<span class="binominal">Cicindela duodecimguttata</span> <object class="include" data="#C-proteus"></object>
</span>
is known from 23 localities. Cicindela duodecimguttata, once classified as a subspecies of
<span class="biota">
<span class="binominal">C. repanda</span>
</span>
, shares many traits with C. repanda. Where C. duodecimguttata occurs, the more common C. repanda is usually found.
Synonomies: <span class="synonym" id="C-proteus">
<span class="binominal">Cicindela proteus</span> [or maybe "synonym-binominal" ?] <span class="authority">Kirby</span> <span class="year">1837</span>:9.</span>
Cicindela bucolica Casey 1913:28. Cicindela hudsonica Casey 1916:29. Cicindela edmontonensis Carr 1920:21
- I might then use its entry on the "shares many traits" line to mark up C. repanda as a synonym, and include it in the same way.
- Multiple and foreign-language common names would be catered for by allowing the common name attribute to be "0 or many" (the first such occurrence having precedence), and using a lang attribute where appropriate.
- Andy Mabbett 14:42, 28 Feb 2007 (PST)
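As a rough sketch of the "0 or many" idea for common names, the markup below attaches several common names, two of them in other languages, to a single taxon. The class name "vernacular" is an assumption made purely for illustration (the discussion only speaks of a "common name attribute"), and the names listed are examples rather than a complete list for the Otter.

```html
<span class="biota">
  <!-- Scientific name, using the "binominal" class from the example above -->
  <span class="binominal">Lutra lutra</span>
  <!-- "vernacular" is a hypothetical class name for the common name attribute -->
  <!-- The first occurrence would take precedence -->
  <span class="vernacular">Otter</span>,
  <span class="vernacular">Eurasian Otter</span>,
  <!-- Foreign-language names carry a lang attribute -->
  <span class="vernacular" lang="de">Fischotter</span>,
  <span class="vernacular" lang="cy">Dyfrgi</span>
</span>
```

A parser following this sketch would presumably treat "Otter" as the preferred common name and could use each lang value to pick a name suited to the reader's language.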
Ryan Kaldari
1. The name of this microformat needs to be changed ASAP. Calling it "species" is confusing and misleading. There was even resistance to implementing this microformat in Wikipedia solely because of the confusing name.[5]
2. Synonyms should be added as a node to the format.
3. I wouldn't worry too much about accommodating LSIDs specifically, as it seems to be a rapidly dying format. Just concentrate on accommodating GUIDs in general (of any format) and identifying them as such. Also, keep in mind that many objects are going to have more than one GUID (even though this is discouraged), so we should be able to accommodate this.
Other use cases
Please add your suggestions!
'Species' microformats could be used to:
- ...
See also
- species
- examples
- quantitative evidence
- brainstorming (includes the straw man, or draft, standard)