Usage
Currently ged4py supports only parsing of existing GEDCOM files; there
is no support for (re-)generating GEDCOM data. The main interface for parsing
is the ged4py.parser.GedcomReader class. To create a parser instance,
pass the file with GEDCOM data as the single required parameter; this
can be either a file name or a Python file object. If a file object is passed,
the file has to be open in binary mode and has to support the
seek() and tell() methods. Example of instantiating a parser:
from ged4py import GedcomReader
path = "/path/to/file.gedcom"
with GedcomReader(path) as parser:
    # GedcomReader provides context support
    ...
or using an in-memory buffer as a file (could be useful for testing):
import io
from ged4py import GedcomReader
data = b"..."  # make some binary data here
with io.BytesIO(data) as file:
    parser = GedcomReader(file)
    ...
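The file-object requirements described above (binary mode, working seek() and tell()) can be checked before handing an object to the parser. The helper below is a hypothetical illustration that uses only the standard library; the name usable_as_gedcom_input is an assumption and is not part of ged4py:

```python
import io


def usable_as_gedcom_input(fileobj):
    """Hypothetical pre-check: is this a binary, seekable file object?"""
    # Text-mode objects derive from io.TextIOBase and yield str, not bytes.
    if isinstance(fileobj, io.TextIOBase):
        return False
    # seekable() reports whether seek() and tell() are supported.
    seekable = getattr(fileobj, "seekable", None)
    return bool(seekable and seekable())


print(usable_as_gedcom_input(io.BytesIO(b"0 HEAD\n")))  # binary in-memory buffer
print(usable_as_gedcom_input(io.StringIO("0 HEAD\n")))  # text in-memory buffer
```

A file opened with open(path, "rb") would pass the same check, while one opened in text mode would not.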
In most cases the parser can determine the input file encoding from the file itself, provided the data follows the GEDCOM specification. In other cases the parser may need external help; if you know the file encoding, you can provide it as an argument to the parser:
parser = GedcomReader(path, encoding="utf-8")
Any encoding supported by the Python codecs module can be used as
an argument. In addition, this package registers two encodings
from the ansel package:
ansel | American National Standard for Extended Latin Alphabet Coded Character Set for Bibliographic Use (ANSEL)
gedcom | GEDCOM extensions for ANSEL
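When deciding whether an explicit encoding is needed, it can help to see what the file header itself declares. GEDCOM headers normally carry a level-1 CHAR line naming the character set. The sketch below is a hypothetical helper (declared_charset is not a ged4py function and is not how the parser detects encodings); it assumes an ASCII-compatible file, so it would not work for UTF-16 data:

```python
import io


def declared_charset(stream, max_lines=50):
    """Return the value of the header's "1 CHAR" line, or None if absent.

    Hypothetical helper; assumes an ASCII-compatible encoding.
    """
    start = stream.tell()  # remember the position so it can be restored
    try:
        for _ in range(max_lines):
            raw = stream.readline()
            if not raw:
                break
            # latin-1 maps every byte to a character, so decoding never fails.
            parts = raw.decode("latin-1").strip().split(None, 2)
            if len(parts) == 3 and parts[:2] == ["1", "CHAR"]:
                return parts[2]
            if len(parts) >= 2 and parts[0] == "0" and parts[1] != "HEAD":
                break  # left the header without finding a CHAR line
    finally:
        stream.seek(start)
    return None


sample = b"0 HEAD\n1 SOUR EXAMPLE\n1 CHAR UTF-8\n0 TRLR\n"
print(declared_charset(io.BytesIO(sample)))
```

The declared name is a GEDCOM label, not necessarily a Python codec name, so it may need mapping before being passed as the encoding argument.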
By default the parser raises an exception if it encounters errors while decoding
data in a file. To override this behavior one can specify a different error
policy, following the same pattern as the standard codecs.decode()
function, e.g.:
parser = GedcomReader(path, encoding="utf-8", errors='replace')
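The effect of an error policy can be seen with the standard codecs.decode() function alone. The snippet below does not involve ged4py; the sample bytes are made up for illustration and contain a Latin-1 byte that is not valid UTF-8:

```python
import codecs

# 0xE9 is "é" in Latin-1 but is not a valid UTF-8 sequence at this position.
raw = b"1 NAME Ren\xe9 /Dupont/"

try:
    codecs.decode(raw, "utf-8")  # default policy is errors="strict"
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With errors="replace" the bad byte becomes U+FFFD instead of raising.
replaced = codecs.decode(raw, "utf-8", errors="replace")

print(strict_failed)
print(replaced)
```

A replacement policy keeps parsing going at the cost of losing the original characters, so it is usually worth trying the correct encoding first.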
The main mode of operation for the parser is iterating over the records in a file sequentially. GEDCOM records are organized in hierarchical structures, and the ged4py parser facilitates access to these hierarchies by grouping records into tree-like structures. Instead of providing an iterator over every record in a file, the parser iterates over top-level (level 0) records, and for each level-0 record it returns a record structure which includes the nested records below level 0.
The main method of the parser is
records0(), which returns an iterator over all
level-0 records. The method takes an optional argument for a tag name; without
the argument all level-0 records are returned by the iterator (starting with “HEAD”
and ending with “TRLR”). If a tag is given, only the records with a
matching tag are returned:
with GedcomReader(path) as parser:
    # iterate over all INDI records
    for record in parser.records0("INDI"):
        ...
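The filtering behaviour described above can be pictured with a small pure-Python stand-in that works on raw lines. This is only a sketch of the idea (iter_level0 is a hypothetical name, and the real records0() returns record objects, not tuples):

```python
def iter_level0(lines, tag=None):
    """Yield (xref_id, tag) for level-0 lines, optionally filtered by tag."""
    for line in lines:
        parts = line.split()
        if not parts or parts[0] != "0":
            continue  # skip nested (level 1 and deeper) lines
        # A level-0 line is either "0 TAG ..." or "0 @XREF@ TAG ...".
        if len(parts) >= 3 and parts[1].startswith("@"):
            xref_id, rec_tag = parts[1], parts[2]
        elif len(parts) >= 2:
            xref_id, rec_tag = None, parts[1]
        else:
            continue
        if tag is None or rec_tag == tag:
            yield xref_id, rec_tag


lines = [
    "0 HEAD",
    "1 CHAR UTF-8",
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "0 @F1@ FAM",
    "0 @I2@ INDI",
    "0 TRLR",
]
print(list(iter_level0(lines)))
print(list(iter_level0(lines, "INDI")))
```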
Records returned by the iterator are instances of the
ged4py.model.Record class or one of its few sub-classes. Each record
instance has a small set of attributes:
level - record level, 0 for top-level records
xref_id - record reference ID, may be None
tag - record tag name
value - record value, can be None, a string, or a value of some other type depending on the record type
sub_records - list of subordinate records (direct sub-records of this record); it is easier to access items in this list using the methods described below
If, for example, a GEDCOM file contains a sequence of records like this:
0 @ID12345@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 1 JAN 1901
2 PLAC Some place
1 FAMC @ID45623@
1 FAMS @ID7612@
then the record object returned from the iterator will have these attributes:
level - 0 (true for all records returned by records0())
xref_id - “@ID12345@”
tag - “INDI”
value - None
sub_records - list of Record instances corresponding to the “NAME”, “BIRT”, “FAMC”, and “FAMS” tags (but not “DATE” or “PLAC”; records for these tags will be in the sub_records of the “BIRT” record)
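To make the grouping concrete, the sketch below builds the same kind of tree from the sample lines using a stand-in class. Node and build_tree are hypothetical names introduced for illustration; this is not how ged4py is implemented, and it assumes well-formed lines whose levels never skip a step:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a record; not ged4py.model.Record."""
    level: int
    xref_id: object
    tag: str
    value: object
    sub_records: list = field(default_factory=list)


def build_tree(lines):
    """Group GEDCOM lines into level-0 nodes with nested sub-records."""
    roots, stack = [], []
    for line in lines:
        level_text, rest = line.split(None, 1)
        level = int(level_text)
        xref_id = None
        if rest.startswith("@"):  # optional reference ID before the tag
            xref_id, rest = rest.split(None, 1)
        tag, _, value = rest.partition(" ")
        node = Node(level, xref_id, tag, value or None)
        del stack[level:]  # drop nodes at the same or a deeper level
        if stack:
            stack[-1].sub_records.append(node)
        else:
            roots.append(node)
        stack.append(node)
    return roots


sample = [
    "0 @ID12345@ INDI",
    "1 NAME John /Smith/",
    "1 BIRT",
    "2 DATE 1 JAN 1901",
    "2 PLAC Some place",
    "1 FAMC @ID45623@",
    "1 FAMS @ID7612@",
]
indi = build_tree(sample)[0]
print(indi.level, indi.xref_id, indi.tag, indi.value)
print([sub.tag for sub in indi.sub_records])
```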
The Record class has a few convenience methods:
sub_tags() - return all direct subordinate records with a given tag name; a list of records is returned, possibly empty
sub_tag() - return the subordinate record with a given tag name (or tag “path”); if there is more than one record with a matching tag then the first one is returned, and None is returned if there is no match
sub_tag_value() - return the value of the subordinate record with a given tag name (or tag “path”), or None if the record is not found or its value is None
With the example records from above one can call record.sub_tag("BIRT/DATE")
on the level-0 record to retrieve the Record instance
corresponding to the level-2 “DATE” record, or alternatively use
record.sub_tag_value("BIRT/DATE") to retrieve the value attribute of
the same record.
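The tag “path” lookup can be sketched with plain functions over a stand-in class. Everything below is hypothetical illustration code, not the ged4py implementation; in particular the stand-in keeps the DATE value as a plain string, whereas the library converts it to a dedicated date type:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a record; not ged4py.model.Record."""
    tag: str
    value: object = None
    sub_records: list = field(default_factory=list)


def sub_tags(node, tag):
    """All direct sub-records with the given tag (possibly an empty list)."""
    return [rec for rec in node.sub_records if rec.tag == tag]


def sub_tag(node, path):
    """Follow a "/"-separated tag path, taking the first match at each step."""
    for tag in path.split("/"):
        matches = sub_tags(node, tag)
        if not matches:
            return None
        node = matches[0]
    return node


def sub_tag_value(node, path):
    """Value of the record found by sub_tag(), or None if nothing matched."""
    found = sub_tag(node, path)
    return None if found is None else found.value


indi = Node("INDI", None, [
    Node("NAME", "John /Smith/"),
    Node("BIRT", None, [Node("DATE", "1 JAN 1901"), Node("PLAC", "Some place")]),
])
print(sub_tag(indi, "BIRT/DATE").tag)
print(sub_tag_value(indi, "BIRT/DATE"))
print(sub_tag_value(indi, "DEAT/DATE"))
```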
There are a few specialized sub-classes of Record,
each corresponding to a specific record tag:
NAME records generate ged4py.model.NameRec instances; this class knows how to split the name representation into name components (first, last, maiden) and has attributes for accessing those.
DATE records generate ged4py.model.Date instances; the value attribute of this class is converted into a ged4py.date.DateValue instance.
INDI records are represented by the ged4py.model.Individual class.
“Pointer” records, whose value has the special GEDCOM <POINTER> syntax (@xref_id@), are represented by the ged4py.model.Pointer class. This class has a special property ref which returns the referenced record. The methods sub_tag() and sub_tag_value() have a keyword argument follow which can be set to True to allow automatic dereferencing of the pointer records.
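The idea behind pointer dereferencing can be sketched with a lookup table from reference IDs to records. The names below (Node, is_pointer, resolve, index) are hypothetical and only mimic the follow behaviour described above; they are not the ged4py API:

```python
import re
from dataclasses import dataclass, field

# A pointer value looks like "@xref_id@".
POINTER_RE = re.compile(r"^@[^@]+@$")


@dataclass
class Node:
    """Hypothetical stand-in for a record; not a ged4py class."""
    tag: str
    value: object = None
    xref_id: object = None
    sub_records: list = field(default_factory=list)


def is_pointer(value):
    """True if the value has the "@xref_id@" pointer syntax."""
    return isinstance(value, str) and POINTER_RE.match(value) is not None


def resolve(node, index, follow=False):
    """Return the referenced record when follow is set, else the node itself."""
    if follow and is_pointer(node.value):
        return index.get(node.value)
    return node


family = Node("FAM", xref_id="@ID45623@")
index = {family.xref_id: family}  # reference ID -> level-0 record
famc = Node("FAMC", value="@ID45623@")

print(resolve(famc, index).tag)               # pointer record itself
print(resolve(famc, index, follow=True).tag)  # referenced record
```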