From Microformats Wiki
Revision as of 21:15, 3 August 2007 by Bob Jonkman (talk | contribs) (GEDCOM _IS_ a format for genealogical data.)

Genealogy Formats

I started this page because someone (Bob Jonkman apparently) added a bunch of stuff to the Technorati microformats page on genealogy, and I moved it here. -Tantek

In the wild

See the Dring tree [1] for an interesting family tree website.

This family group is pretty much a direct translation of a GEDCOM FAM structure, but with some names added to the links. It also includes back links to parents.

An individual from the same tree: this is basically an INDI record from GEDCOM.

Problem statement

The main problem for genealogy on the web is that many people are posting their family trees, but if you are searching for your ancestors, there are no semantics in these pages to help you link them to similarly named individuals in your own tree. Some sites, like freeCEN and freeBMD, have databases which can assist in this linkage, but they are incomplete and frustrating to use.

If there were some kind of order to this process, ordinary web searching might be used; and we could interlink family trees more readily.

RDF and the semantic web have been used to tackle this problem, but this doesn't help people who want to publish, or search published trees, until there is a real semantic web.

What I think we need is some kind of microformat markup to add to examples like this tree of Abraham Lincoln.


GEDCOM has become pretty much the de facto standard for sharing data between genealogy systems. It is hierarchical and link based, much like HTML, but it encodes family structure (which is a general graph) outside of this structural hierarchy.

GEDCOM was developed (...) to provide a flexible, uniform format for exchanging computerized genealogical data.[2]
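
To make the "hierarchical and link based" description concrete, here is a minimal Python sketch; it is not part of the original discussion. It assumes a tiny invented sample (the names and the @I1@-style identifiers are made up), reads only level 0 and level 1 lines, and relies on the GEDCOM FAMC/FAMS pointer tags in addition to HUSB, WIFE and CHIL. The helper names parse_gedcom and siblings are hypothetical.

```python
from collections import defaultdict

# Invented sample data for illustration only.
SAMPLE = """\
0 @I1@ INDI
1 NAME John /Smith/
1 SEX M
1 FAMS @F1@
0 @I2@ INDI
1 NAME Mary /Jones/
1 SEX F
1 FAMS @F1@
0 @I3@ INDI
1 NAME Ann /Smith/
1 SEX F
1 FAMC @F1@
0 @I4@ INDI
1 NAME Tom /Smith/
1 SEX M
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
1 CHIL @I4@
"""


def parse_gedcom(text):
    """Return {xref: {"type": tag, "fields": {tag: [values]}}}.

    Only level 0 records with an @...@ identifier and their level 1
    lines are kept; deeper levels are ignored in this sketch.
    """
    records = {}
    current = None
    for line in text.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) < 2:
            continue  # blank or malformed line
        level = int(parts[0])
        if level == 0:
            if len(parts) == 3 and parts[1].startswith("@"):
                current = {"type": parts[2], "fields": defaultdict(list)}
                records[parts[1]] = current
            else:
                current = None  # a record without an identifier
        elif level == 1 and current is not None:
            value = parts[2] if len(parts) == 3 else ""
            current["fields"][parts[1]].append(value)
    return records


def siblings(records, xref):
    """Follow FAMC -> FAM -> CHIL pointers to list the siblings of xref."""
    found = []
    for fam_id in records[xref]["fields"].get("FAMC", []):
        family = records.get(fam_id)
        if family is None:
            continue  # dangling pointer
        for child in family["fields"].get("CHIL", []):
            if child != xref and child not in found:
                found.append(child)
    return found


records = parse_gedcom(SAMPLE)
print(siblings(records, "@I3@"))
```

The line hierarchy (the leading level numbers) only nests fields inside one record. The family graph lives entirely in the @F1@ and @I…@ pointers, which is why the sibling lookup has to hop through a FAM record rather than walk a tree.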

  • I'm not sure whether it makes sense to do GEDCOM as its own format. The FAM structure, and the need to present different reports, suggest to me that we need some kind of post-GEDCOM markup. To see how direct use of GEDCOM might pan out I hacked up this GEDCOM Worked example. To me the main issue seems to revolve around the FAM structure. I think the Jay Askren approach might be better than the Gene Stark work as a starting point.

  • Had a look at some examples of what GEDCOM creates [3]. Basically, it seems to be XFN relationships (siblings, spouses etc.) and hCard information (could genealogy be inferred from existing XFNs regardless of an hGED format?). The only additional information we do not currently hold in a format is gender: GEDCOM specifies male or female for each individual. Creating something using these formats would be quite straightforward, but I'm not sure its takeup would be good unless someone was interested in creating an hGEDCOM2GEDCOM. -- Frances Berriman
  • GEDCOM is basically a set of INDIvidual records, related by FAMily nodes; the family nodes contain the HUSBand, WIFE and CHILd. The INDI records are quite similar to hCard records and might be replaced by them, but the graph structure is a little harder to capture: families aren't strict trees, so a direct mapping to XML doesn't really work. Publishing a GEDCOM database directly to the web might not be the most logical thing to do.
  • Genealogical information has date-of-death, which is missing in hCard format (although hCard does have date-of-birth). Much of genealogical information is event based: date of birth, date of death, dates of marriages and divorces, and many other significant events such as religious observances (Baptisms, Bar/Bat Mitzvahs) and migrations ("Moved to Canada from the Netherlands"). This all translates wonderfully to hCalendar 1.0. Additionally, a properly researched family tree will cite sources for all the data listed, and so could use hCite. The biggest problem I see in using hCalendar is that genealogical data allows approximate dates, specifically "ABT 4 July 1776", "BEF 25 Dec 1903", "AFT 11 Nov 1918". It also allows ambiguous dates: "July 1867", or just "1886", or even "4 July". And these can be combined (Approximately ambiguous dates? Ambiguously approximate dates?), e.g. "BEF Feb 2007", "AFT 1945". The most ambiguous entries I've seen for dates are "DECEASED" when date-of-death is unknown, and "NOT MARRIED" for couples who have not had a wedding ceremony. (Info from Guidelines for event dates in the PAF Help File).
The only relationship links in GEDCOM are HUSBand, WIFE and CHILd. All other relationships (brother, sister, grandparents, grandchildren, uncles, aunts, nieces, nephews, cousins) can be inferred by traversing family records. This does mean that any collection of genealogical pages needs some way to cross-reference each other. This isn't a problem for pages on a single Web site, which can use RIN (Record Identifier) or REFN (User Reference Number). However, different Web pages maintained by different genealogists may have conflicting RINs and REFNs. There is a globally-unique AFN (Ancestral File Number) issued by the Church of Jesus Christ of Latter-Day Saints (LDS), but I don't know how they're issued and most genealogical sites don't use them anyway.
The GEDCOM format contains much other data specific to the LDS, but I don't know how widespread it is, nor how appropriate it would be to code it into a microformat intended to reach well beyond the LDS.
Regardless of whether an hGED microformat is developed, it would still be valuable to mark up genealogical information with microformats on Web pages for the semantic value.
Bob Jonkman 07:58, 9 Feb 2007 (PST)
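
The approximate and partial dates described above could be split into a qualifier plus whatever date parts are actually present, before deciding how to mark them up. The following Python sketch is illustrative only: parse_event_date and its result keys are hypothetical names, only the ABT, BEF and AFT prefixes quoted above are handled, one- or two-digit numbers are assumed to be days, and anything unrecognised (such as "DECEASED") is passed through untouched.

```python
MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Accept both full names ("JULY") and three-letter forms ("JUL").
MONTHS = {}
for number, name in enumerate(MONTH_NAMES, start=1):
    MONTHS[name.upper()] = number
    MONTHS[name[:3].upper()] = number

QUALIFIERS = {"ABT": "about", "BEF": "before", "AFT": "after"}


def parse_event_date(text):
    """Split a genealogical date string into qualifier, day, month, year.

    Parts that are absent stay None. If any token is not understood,
    every part stays None and only the raw text is returned.
    """
    empty = {"qualifier": None, "day": None, "month": None,
             "year": None, "raw": text}
    result = dict(empty)
    tokens = text.strip().split()
    if tokens and tokens[0].upper() in QUALIFIERS:
        result["qualifier"] = QUALIFIERS[tokens.pop(0).upper()]
    if not tokens:
        return empty  # nothing left to interpret as a date
    for token in tokens:
        if token.upper() in MONTHS:
            result["month"] = MONTHS[token.upper()]
        elif token.isdigit() and len(token) <= 2:
            result["day"] = int(token)   # assumption: short numbers are days
        elif token.isdigit():
            result["year"] = int(token)
        else:
            return empty  # e.g. "DECEASED" or "NOT MARRIED"
    return result


for sample in ["ABT 4 July 1776", "BEF Feb 2007", "4 July", "DECEASED"]:
    print(sample, "->", parse_event_date(sample))
```

Keeping the raw string alongside the parsed parts means nothing is lost when a value cannot be interpreted, which matters for entries that are statements rather than dates.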

Wikipedia's Persondata

Wikipedia's Persondata aligns very closely with hCard, but has additional date and place of birth & death fields. Andy Mabbett 13:04, 28 Jan 2007 (PST)

External Links

See also