Hoofman Genealogy Website
I created a website to document my family’s genealogy, accessible at hoofman.nl. The source data and website generator are available in the GitHub repository.
Technical Overview
The Hoofman genealogy website presents family history from structured genealogical data stored in GEDCOM format. A custom Python website generator processes this source data and incorporates additional metadata stored as YAML notes. The site is built with Astro, and PlantUML renders the family relationship diagrams.
GEDCOM (GEnealogical Data Communication) is an open standard for storing and exchanging genealogical data. The Hoofman website uses version 5.5.1 of the standard. YAML-formatted notes store information that does not fit neatly within the GEDCOM model, for example:
- Marking individuals as private:
---
private: true
- Documenting timestamps:
---
timestamp: des morgens te tien ure
- Listing witnesses for events:
---
witnesses:
  - name: Pieter Hoofman
    xref_id: I00013
    occupation: werkman
    age: 34
    residence: Graauw
  - name: Jan Francies van Dosselaar
    occupation: werkman
    age: 39
    residence: Graauw
  - name: Francies Arnold
    occupation: landbouwer
    age: 44
    residence: Graauw
- Adding external source links:
---
links:
  - https://.....
  - https://.....
  - https://.....
Data Management
The genealogical data is created and managed in GEDCOM format with Ancestris, open-source genealogy software that fully adheres to the GEDCOM standard. Although Ancestris can export family websites, I developed a custom website generator for greater control over data presentation and to process the custom YAML conventions.
Website Generator
The generator is written in Python, which I chose over Java and TypeScript to gain more experience with the language.
GEDCOM Parsing
GEDCOM consists of two layers:
- Hierarchical Data Format – A general-purpose data representation structure for sequential data (similar to XML or JSON).
- Genealogical Structures – Nested records representing individuals, families, events, and source citations.
The first layer, the GEDCOM hierarchical data format, is a simple tagged-line format:
Level [Xref] Tag [Value]
Level and Xref make it possible to express relationships between GEDCOM lines:
- Level: Indicates substructure relationships. A line with a higher level number is a subrecord of the nearest preceding line with a lower level number.
- Xref: A unique identifier (@XXXX@) linking records via cross-references.
Example:
0 @I00027@ INDI
1 NAME Maria Louisa /Hoofman/
1 SEX F
1 FAMC @F00005@
1 BIRT
2 DATE 29 JUN 1870
2 PLAC , Graauw, , , ,
2 ADDR Wijk B Nummer 160
Since existing GEDCOM Python libraries lacked recent updates (at least for version 5.5.1), I developed a custom parser.
The GedcomParser class parses the first-layer GEDCOM hierarchical line format using a regular expression:
^\s*(\d+)\s+(@[^@]+@)?\s*([A-Za-z0-9_]+)\s?(@[^@]+@)?(.+)?$
This parser:
- Converts GEDCOM lines into GedcomLine objects.
- Stores subrecord relationships in the sublines field of a GedcomLine object, which has type list[GedcomLine].
- Builds a cross-reference dictionary (id_map) mapping xref IDs to GedcomLine objects.
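The parsing steps above can be sketched roughly as follows. This is a minimal illustration, not the actual GedcomParser: the function name parse_gedcom and every GedcomLine field other than sublines are assumptions made for the example, while the regular expression, sublines, and id_map come from the description above. A stack of open lines is one common way to turn level numbers into parent/child links.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Regular expression quoted in the text: level, optional xref, tag,
# optional pointer value, optional free-text value.
LINE_RE = re.compile(
    r"^\s*(\d+)\s+(@[^@]+@)?\s*([A-Za-z0-9_]+)\s?(@[^@]+@)?(.+)?$"
)


@dataclass
class GedcomLine:
    # Field names other than `sublines` are assumptions for this sketch.
    level: int
    tag: str
    xref_id: Optional[str] = None
    pointer: Optional[str] = None
    value: Optional[str] = None
    sublines: list["GedcomLine"] = field(default_factory=list)


def parse_gedcom(text: str) -> tuple[list[GedcomLine], dict[str, GedcomLine]]:
    """Hypothetical helper: returns (top-level records, id_map)."""
    records: list[GedcomLine] = []
    id_map: dict[str, GedcomLine] = {}
    stack: list[GedcomLine] = []  # currently open lines, outermost first

    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        match = LINE_RE.match(raw)
        if match is None:
            raise ValueError(f"Not a valid GEDCOM line: {raw!r}")
        level, xref, tag, pointer, value = match.groups()
        line = GedcomLine(int(level), tag, xref, pointer, value)

        # Close every open line that is not an ancestor of this one.
        while stack and stack[-1].level >= line.level:
            stack.pop()
        if stack:
            stack[-1].sublines.append(line)  # subrecord of nearest ancestor
        else:
            records.append(line)  # a top-level record

        if xref:
            id_map[xref] = line
        stack.append(line)

    return records, id_map


SAMPLE = """\
0 @I00027@ INDI
1 NAME Maria Louisa /Hoofman/
1 BIRT
2 DATE 29 JUN 1870
"""

records, id_map = parse_gedcom(SAMPLE)
birth = id_map["@I00027@"].sublines[1]
print(birth.tag, [sub.tag for sub in birth.sublines])
```

Keeping the line layer this generic means the second layer can interpret tags such as INDI or BIRT without the parser knowing about them.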
The second layer of GEDCOM structures is modeled in GedcomModel, which organizes individuals, families, and events. For instance, GedcomModel contains a list of Individual objects and provides a method, GedcomModel.get_individual, to look up individuals by their ID. In this way, GedcomModel and related classes (such as Individual, Family, EventDetail, Source, etc.) offer an API for interacting with the genealogical data from the original GEDCOM file. In addition to the basic information directly expressed in the GEDCOM format, such as “X is the father of Y,” more complex data queries can also be implemented. One example is the chronology of an individual, which includes all potentially influential events in that person’s life.
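To make the idea of a model layer with derived queries more concrete, here is a small sketch. Only the names GedcomModel, Individual, and get_individual come from the text; the Event class (a simplified stand-in for EventDetail), the relative_ids field, and the chronology rule used here (a person's own events plus relatives' events that fall within the person's recorded lifetime) are assumptions made for illustration, and the sample data is made up.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Event:
    # Hypothetical, simplified stand-in for EventDetail.
    kind: str
    when: date
    subject_id: str


@dataclass
class Individual:
    xref_id: str
    name: str
    events: list[Event] = field(default_factory=list)
    # Assumed field: IDs of parents, spouses, and children.
    relative_ids: list[str] = field(default_factory=list)


class GedcomModel:
    def __init__(self, individuals: list[Individual]) -> None:
        self.individuals = individuals
        self._by_id = {person.xref_id: person for person in individuals}

    def get_individual(self, xref_id: str) -> Optional[Individual]:
        """Look up an individual by ID; None if the ID is unknown."""
        return self._by_id.get(xref_id)

    def chronology(self, xref_id: str) -> list[Event]:
        """Own events plus relatives' events within the person's lifetime."""
        person = self._by_id[xref_id]
        own = sorted(person.events, key=lambda e: e.when)
        if not own:
            return []
        start, end = own[0].when, own[-1].when
        timeline = list(own)
        for relative_id in person.relative_ids:
            relative = self._by_id.get(relative_id)
            if relative is None:
                continue
            timeline += [e for e in relative.events if start <= e.when <= end]
        return sorted(timeline, key=lambda e: e.when)


# Made-up placeholder data, not taken from the Hoofman GEDCOM file.
child = Individual(
    "I1", "Person A",
    [Event("birth", date(1900, 1, 1), "I1"), Event("death", date(1970, 1, 1), "I1")],
    ["I2"],
)
parent = Individual(
    "I2", "Person B",
    [Event("birth", date(1870, 1, 1), "I2"), Event("death", date(1930, 1, 1), "I2")],
)
model = GedcomModel([child, parent])
for event in model.chronology("I1"):
    print(event.when.isoformat(), event.kind, event.subject_id)
```

The point of the sketch is the split of responsibilities: lookups such as get_individual are simple dictionary access, while queries such as a chronology combine data from several records.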
The following UML diagram represents the data model:
Website Generation
Each individual and source entry is converted into a Markdown file based on data extracted from GedcomModel. These Markdown files serve as content for an Astro web project, which compiles them into a static website.
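As a rough illustration of this step, the sketch below renders one individual as a Markdown file with YAML front matter, which is the usual shape for Markdown content in an Astro project. The function names, the front matter keys (title, id), the page layout, and the output directory are all assumptions; the real generator may structure its pages quite differently.

```python
import json
from pathlib import Path


def render_individual(xref_id: str, name: str, birth_date: str, birth_place: str) -> str:
    """Hypothetical renderer: returns Markdown with YAML front matter."""
    # json.dumps yields a double-quoted string, which is also a valid YAML scalar.
    front_matter = "\n".join([
        "---",
        f"title: {json.dumps(name, ensure_ascii=False)}",
        f"id: {json.dumps(xref_id)}",
        "---",
    ])
    body = f"# {name}\n\n- Born: {birth_date}, {birth_place}\n"
    return front_matter + "\n\n" + body


def write_page(out_dir: Path, xref_id: str, markdown: str) -> Path:
    """Write one page as <out_dir>/<xref_id>.md and return its path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{xref_id}.md"
    path.write_text(markdown, encoding="utf-8")
    return path


page = render_individual("I00027", "Maria Louisa Hoofman", "29 JUN 1870", "Graauw")
print(page)
```

Writing plain Markdown files keeps the generator independent of the site framework: Astro only sees a directory of content files.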
Visualizing Family Relations
Family tree diagrams are generated using PlantUML. The website generator creates PlantUML diagrams in plain text, encodes them into a URL, and embeds them in HTML via an <img> tag:
<img src="https://www.plantuml.com/plantuml/svg/dPLlRzem4CRVvrESsXTz...">
Because the PlantUML server renders each diagram from its URL when the page is viewed, the site does not need to store pre-rendered image files, which keeps it lightweight.