initial commit
tychonievich committed May 21, 2021
0 parents commit c660be1
Showing 12 changed files with 5,358 additions and 0 deletions.
34 changes: 34 additions & 0 deletions README.md
@@ -0,0 +1,34 @@
# FamilySearch GEDCOM

The official FamilySearch GEDCOM specification for exchanging genealogical data.

This repository is for the collaborative development of the FamilySearch GEDCOM specification.
If you are looking for the specification itself, see <https://gedcom.io>.

If you are looking for FamilySearch's GEDCOM 5.5.1 Java parser, which previously had this same repository name, see <https://github.com/familysearch/gedcom5-java>.


## Repository structure

- [`changelog.md`](changelog.md) is a running log of major changes made to the specification.
- [`specification/`](specification/) contains the FamilySearch GEDCOM specification:
    - [`specification/gedcom.md`](specification/gedcom.md) is the source document used to define the FamilySearch GEDCOM specification. It is written in pandoc-flavor markdown and is intended to be more easily written than read.
    - other files are rendered versions of `gedcom.md`. One of these is likely to be the one users of the specification want.
- [`build/`](build/) contains files needed to render the specification.
    - See [`build/README.md`](build/) for more.
- [`extracted-files/`](extracted-files/) contains digested information automatically extracted from the specification. All files in this directory are automatically generated by scripts in the [`build/`](build/) directory.
    - [`extracted-files/grammar.abnf`](extracted-files/grammar.abnf) contains all the character-level ABNF for parsing lines and datatypes.
    - [`extracted-files/grammar.gedstruct`](extracted-files/grammar.gedstruct) contains a custom structure organization metasyntax.
    - [`extracted-files/tags/`](extracted-files/tags/) contains summary information for each <https://gedcom.io/terms/>-based URI defined in the specification.
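
As a rough illustration of what "automatically extracted" means here, the sketch below collects the bodies of fenced code blocks whose opening fence names a given language. The function name `extract_fenced` and the sample text are hypothetical and exist only for this example; the scripts that actually generate `extracted-files/` live in `build/` and may work differently.

```python
def extract_fenced(markdown, language):
    """Return the bodies of fenced code blocks whose opening fence mentions `language`."""
    fence = "`" * 3     # built at runtime so this example can itself sit inside a fence
    blocks = []
    current = None      # lines of the block being collected, or None if not collecting
    inside = False      # True while inside any fenced block
    for line in markdown.splitlines():
        if line.startswith(fence):
            if inside:
                # Closing fence: keep the block only if we were collecting it.
                if current is not None:
                    blocks.append("\n".join(current))
                current = None
                inside = False
            else:
                # Opening fence: start collecting if the language matches.
                inside = True
                if language in line:
                    current = []
        elif current is not None:
            current.append(line)
    return blocks


if __name__ == "__main__":
    fence = "`" * 3
    sample = (
        "# Title\n\n"
        + fence + "abnf\nrule = %x41\n" + fence + "\n\n"
        + "Some prose.\n\n"
        + fence + "gedcom\n0 HEAD\n" + fence + "\n"
    )
    print(extract_fenced(sample, "abnf"))
```

Matching on the text of the opening fence keeps the sketch short, at the cost of depending on how the source document labels its code blocks.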

## Branches

- `main` contains the current release.
Patch versions are generally pushed directly to `main` upon approval.

- `next-minor` contains a working draft of the next minor release. Changes from `main` have been discussed and approved by the working group supervising the next minor release, but have not been fully vetted and approved for inclusion in the standard and may change at any time without notice.

- `next-major` contains a working draft of the next major release. Changes from `main` have been discussed and approved by the working group supervising the next major release, but have not been fully vetted and approved for inclusion in the standard and may change at any time without notice.

- All other branches are for conversation drafts that may or may not be incorporated into a future version of the specification.

59 changes: 59 additions & 0 deletions build/README.md
@@ -0,0 +1,59 @@
This directory is used to convert the `specification/gedcom.md` source file into fully-hyperlinked HTML and PDF.

# Building -- quick-start guide

1. Install dependencies:

    - [python 3](https://python.org)
    - [pandoc](https://pandoc.org)
    - [weasyprint](https://weasyprint.org), installed by running `python3 -mpip install --user --upgrade weasyprint`
    - [git](https://git-scm.com/)
    - a `make`-compatible executable

2. From the directory containing this README, run `make`
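
Before running `make`, it can help to confirm that the dependencies listed above are reachable. The following is a minimal sketch, not part of the build itself: the function name `missing_dependencies` is hypothetical, and it assumes the executables are named `python3`, `pandoc`, `git`, and `make` on your system (a `make`-compatible tool may go by another name).

```python
import importlib.util
import shutil


def missing_dependencies(executables=("python3", "pandoc", "git", "make"),
                         modules=("weasyprint",)):
    """Return the names of executables not on PATH and Python modules not importable."""
    # shutil.which returns None when no matching executable is found on PATH.
    missing = [name for name in executables if shutil.which(name) is None]
    # find_spec returns None when a top-level module cannot be located.
    missing += [name for name in modules if importlib.util.find_spec(name) is None]
    return missing


if __name__ == "__main__":
    missing = missing_dependencies()
    print("missing: " + ", ".join(missing) if missing else "all dependencies found")
```

An empty result only means each tool was found; it does not check versions.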

# Building -- how it works

Getting from `gedcom.md` to `gedcom.pdf` is a multi-step process, all of which is handled by the `Makefile`:

1. `hyperlink.py` reads `gedcom.md` and adds hyperlinks into `gedcom-tmp.md`. It is somewhat dependent on the internal formatting of `gedcom.md` and may need adjustment if, e.g., tables are switched to a different markdown table format.

2. `pandoc` converts `gedcom-tmp.md` into `gedcom-tmp.html`.
It uses `template.html` for structure,
`pandoc.css` for styling,
and `gedcom.xml`, `gedstruct.xml`, and `abnf.xml` for syntax highlighting.

    Pandoc's command-line options include

    - syntax highlighting options:
        - `--syntax-definition=gedcom.xml`
        - `--syntax-definition=gedstruct.xml`
        - `--syntax-definition=abnf.xml`
        - `--highlight-style=kate`
    - general formatting options:
        - `--from=markdown+smart`
        - `--standalone`
        - `--toc`
        - `--number-sections`
        - `--self-contained`
        - `--metadata="date:DATE"`, where `DATE` is the date you want on the cover page
    - stylistic options:
        - `--css=pandoc.css`
        - `--template=template.html`
    - input/output options:
        - `--wrap=none`
        - `--to=html5`
        - `--output=gedcom-tmp.html`
        - `gedcom-tmp.md`

3. `hyperlink-code.py` converts `gedcom-tmp.html` into `gedcom.html` by

    - removing all `col` and `colgroup` elements, which are incorrectly handled by some versions of the webkit rendering engine used by weasyprint.
    - adding hyperlinks inside code blocks (which markdown cannot do)

    This is dependent on the code environment classes created by syntax highlighting, and may need adjusting if pandoc changes these class names or if the syntax highlighting definition XML files are edited.

4. `python3 -mweasyprint gedcom.html gedcom.pdf` turns the HTML into PDF

    Note that a relatively recent version of `weasyprint` (published in 2020 or later) is needed to correctly handle syntax-highlighted code blocks.
    This step is expected to emit a variety of warning messages triggered by CSS rules intended for screen rather than print. Any error messages it emits should be resolved, whether or not they prevent the PDF from being created.
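
The pandoc and weasyprint invocations described in steps 2 and 4 can be sketched as argument lists. This is only an illustration of how the options listed above fit together; the `Makefile` remains the authoritative source. The function names and the sample date are hypothetical.

```python
import shlex


def pandoc_command(date, source="gedcom-tmp.md", output="gedcom-tmp.html"):
    """Assemble the pandoc arguments listed in step 2 (assumed order; see the Makefile)."""
    return [
        "pandoc",
        # syntax highlighting options
        "--syntax-definition=gedcom.xml",
        "--syntax-definition=gedstruct.xml",
        "--syntax-definition=abnf.xml",
        "--highlight-style=kate",
        # general formatting options
        "--from=markdown+smart",
        "--standalone",
        "--toc",
        "--number-sections",
        "--self-contained",
        f"--metadata=date:{date}",
        # stylistic options
        "--css=pandoc.css",
        "--template=template.html",
        # input/output options
        "--wrap=none",
        "--to=html5",
        f"--output={output}",
        source,
    ]


def weasyprint_command(html="gedcom.html", pdf="gedcom.pdf"):
    """Assemble the command from step 4 that turns the HTML into PDF."""
    return ["python3", "-mweasyprint", html, pdf]


if __name__ == "__main__":
    # Print the commands instead of running them; the date is a made-up example.
    print(shlex.join(pandoc_command("21 May 2021")))
    print(shlex.join(weasyprint_command()))
```

To execute a command rather than print it, the list can be passed to `subprocess.run(..., check=True)` from the `build/` directory. Keeping the arguments as a list avoids shell quoting problems with the date value.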
90 changes: 90 additions & 0 deletions build/abnf.xml
@@ -0,0 +1,90 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE language SYSTEM "language.dtd">
<language name="ABNF" section="Other" extensions="*.abnf;*.ABNF" mimetype="" version="1" kateversion="5.0" author="Luther Tychonievich" license="Public Domain" >
<!-- see https://docs.kde.org/trunk5/en/applications/katepart/highlight.html for this XML file format -->

<highlighting>
<contexts>
<context name="abnf" attribute="Error" lineEndContext="#stay">
<DetectChar attribute="Comment" char=";" context="comment" />
<RegExpr attribute="Rule" String="^[a-zA-Z][-a-zA-Z0-9]*" context="define" />
</context>

<context name="define" attribute="Error" lineEndContext="#pop">
<DetectChar attribute="Comment" char=";" context="comment" />
<DetectSpaces attribute="Normal Text"/>
<Detect2Chars attribute="Delim" char="/" char1="=" context="#pop!elements" />
<DetectChar attribute="Delim" char="=" context="#pop!elements" />
</context>

<context name="comment" attribute="Comment" lineEndContext="#pop" />
<context name="elements" attribute="Error" lineEndContext="#stay">
<DetectChar attribute="Comment" char=";" context="comment" />
<DetectSpaces attribute="Normal Text"/>
<DetectChar attribute="Delim" char="/" />
<DetectChar attribute="Delim" char="(" context="paren" />
<DetectChar attribute="Delim" char="[" context="bracket" />
<RegExpr attribute="Rule" String="^[a-zA-Z][-a-zA-Z0-9]*" context="#pop!define" />
<RegExpr attribute="String2" String="%s&quot;" context="cstr" />
<RegExpr attribute="String" String="(?:%i)?&quot;" context="istr" />
<RegExpr attribute="Normal Text" String="[a-zA-Z][-a-zA-Z0-9]*" />
<RegExpr attribute="Char" String="%x[0-9a-fA-F]+(?:(?:[.][0-9a-fA-F]+)+|-[0-9a-fA-F]+)?" />
<RegExpr attribute="Char" String="%d[0-9]+(?:(?:[.][0-9]+)+|-[0-9]+)?" />
<RegExpr attribute="Char" String="%b[01]+(?:(?:[.][01]+)+|-[01]+)?" />
<RegExpr attribute="Repeat" String="[0-9]+|[0-9]*\*[0-9]*" context="element" />
<!-- omitting prose descriptions -->
</context>

<context name="element" attribute="Error" lineEndContext="#pop!error">
<DetectChar attribute="Delim" char="(" context="#pop!paren" />
<DetectChar attribute="Delim" char="[" context="#pop!bracket" />
<RegExpr attribute="String2" String="%s&quot;" context="#pop!cstr" />
<RegExpr attribute="String" String="(?:%i)?&quot;" context="#pop!istr" />
<RegExpr attribute="Normal Text" String="[a-zA-Z][-a-zA-Z0-9]*" context="#pop" />
<RegExpr attribute="Char" String="%x[0-9a-fA-F]+(?:(?:[.][0-9a-fA-F]+)+|-[0-9a-fA-F]+)?" context="#pop" />
<RegExpr attribute="Char" String="%d[0-9]+(?:(?:[.][0-9]+)+|-[0-9]+)?" context="#pop" />
<RegExpr attribute="Char" String="%b[01]+(?:(?:[.][01]+)+|-[01]+)?" context="#pop" />
<!-- omitting prose descriptions -->
</context>

<context name="paren" attribute="Error" lineEndContext="#stay">
<DetectChar attribute="Delim" char=")" context="#pop" />
<IncludeRules context="elements" />
</context>

<context name="bracket" attribute="Error" lineEndContext="#stay">
<DetectChar attribute="Delim" char="]" context="#pop" />
<IncludeRules context="elements" />
</context>

<context name="istr" attribute="String" lineEndContext="error">
<DetectChar char="&quot;" context="#pop" />
</context>
<context name="cstr" attribute="String2" lineEndContext="error">
<DetectChar char="&quot;" context="#pop" />
</context>
<context name="error" attribute="Error" lineEndContext="#pop" />

</contexts>

<itemDatas>
<itemData name="Normal Text" defStyleNum="dsNormal" />
<itemData name="Comment" defStyleNum="dsComment" />
<itemData name="Rule" defStyleNum="dsVariable" />
<itemData name="Delim" defStyleNum="dsControlFlow" />
<itemData name="String" defStyleNum="dsString" />
<itemData name="String2" defStyleNum="dsSpecialString" />
<itemData name="Char" defStyleNum="dsChar" />
<itemData name="Repeat" defStyleNum="dsControlFlow" />
<itemData name="Error" defStyleNum="dsError" />
</itemDatas>
</highlighting>

<general>
<comments>
<comment name="singleLine" start=";" />
</comments>
</general>


</language>
59 changes: 59 additions & 0 deletions build/extract-grammars.py
@@ -0,0 +1,59 @@
from sys import argv
from os.path import join, dirname, isfile, isdir, exists
from os import makedirs

def get_paths():
    """Parses command-line arguments, if present; else uses defaults"""
    spec = join(dirname(argv[0]),'../GEDCOM.md') if len(argv) < 2 or not isfile(argv[1]) else argv[1]
    dest = join(dirname(argv[0]),'../')
    for arg in argv:
        if arg and isdir(arg):
            dest = arg
            break
        if arg and not exists(arg) and arg[0] != '-' and isdir(dirname(arg)):
            dest = arg
            break

    if not isdir(dest):
        makedirs(dest)

    return spec, dest


if __name__ == '__main__':
    src, dst = get_paths()
    abnf = []
    gedstruct = []
    where = None
    header = ''
    with open(src) as f:
        for line in f:
            if line.startswith('```'):
                if where:
                    if where == 'abnf': abnf.append('\n\n')
                    elif where == 'gedstruct': gedstruct.append('\n\n')
                    where = None
                elif 'gedstruct' in line:
                    where = 'gedstruct'
                    if header:
                        gedstruct.append(header.replace('`', '') + '\n')
                        header = ''
                elif 'abnf' in line:
                    where = 'abnf'
                    if header:
                        abnf.append('; ' + '-'*13 + ' ' +header + ' ' + '-'*13 + '\n\n')
                        header = ''
            elif where == 'abnf': abnf.append(line)
            elif where == 'gedstruct': gedstruct.append(line)
            elif line.startswith('#'):
                header = line
                if '{' in header: header = header[:header.find('{')]
                header = header.strip('# \n\r\t')
    with open(join(dst,'grammar.abnf'), 'w') as f:
        f.write('''; This document is in ABNF, see <https://tools.ietf.org/html/std68>
; This document uses RFC 7405 to add case-sensitive literals to ABNF.
''')
        f.write(''.join(abnf))
    with open(join(dst,'grammar.gedstruct'), 'w') as f:
        f.write(''.join(gedstruct))
39 changes: 39 additions & 0 deletions build/gedcom.xml
@@ -0,0 +1,39 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE language SYSTEM "language.dtd">
<language name="gedcom" section="Other" extensions="*.ged;*.GED;*.gedcom;*.GEDCOM" mimetype="" version="1" kateversion="5.0" author="Luther Tychonievich" license="Public Domain" >
<!-- see https://docs.kde.org/trunk5/en/applications/katepart/highlight.html for this XML file format -->

<highlighting>
<contexts>
<context name="gedcom" attribute="Error" lineEndContext="#stay">
<RegExpr String="^0 (?! )" attribute="Level" context="anchor" />
<RegExpr String="^[1-9][0-9]* (?! )" attribute="Level" context="tag" />
</context>
<context name="anchor" attribute="Error" lineEndContext="#pop">
<RegExpr String="@[A-Z0-9_]+@ (?! )" attribute="Anchor" context="#pop!tag" />
<RegExpr String="[A-Z0-9_]+(?: |$)" attribute="Tag" context="#pop!payload" />
</context>
<context name="tag" attribute="Error" lineEndContext="#pop">
<RegExpr String="[A-Z0-9_]+(?: |$)" attribute="Tag" context="#pop!payload" />
</context>
<context name="payload" attribute="Error" lineEndContext="#pop">
<RegExpr String="@@" attribute="Escape" context="#pop!text" />
<RegExpr String="@[A-Z0-9_]+@$" attribute="Pointer" context="#pop!error" />
<RegExpr String="[^@].*" attribute="Text" context="#pop!error" />
</context>
<context name="error" attribute="Error" lineEndContext="#pop" />
<context name="text" attribute="Text" lineEndContext="#pop" />
</contexts>

<itemDatas>
<itemData name="Text" defStyleNum="dsNormal" />
<itemData name="Pointer" defStyleNum="dsVariable" />
<itemData name="Anchor" defStyleNum="dsControlFlow" />
<itemData name="Tag" defStyleNum="dsFunction" />
<itemData name="Level" defStyleNum="dsDataType" />
<itemData name="Escape" defStyleNum="dsSpecialChar" />
<itemData name="Error" defStyleNum="dsError" />
</itemDatas>
</highlighting>

</language>
84 changes: 84 additions & 0 deletions build/gedstruct.xml
@@ -0,0 +1,84 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE language SYSTEM "language.dtd">
<language name="gedstruct" section="Other" extensions="*.gedstruct" mimetype="" version="1" kateversion="5.0" author="Luther Tychonievich" license="Public Domain" >
<!-- see https://docs.kde.org/trunk5/en/applications/katepart/highlight.html for this XML file format -->

<highlighting>
<contexts>
<context name="gedstruct" attribute="Error" lineEndContext="#stay">
<DetectChar attribute="Alternate" char="[" context="alt" />
<StringDetect String="0 " column="0" attribute="Level" context="tag1" />
<StringDetect String="n " column="0" attribute="Level" context="tag1" />
<StringDetect String="+1 " column="2" attribute="Level" context="tag1" />
<StringDetect String="+2 " column="5" attribute="Level" context="tag1" />
<StringDetect String="+3 " column="8" attribute="Level" context="tag1" />
<StringDetect String="+4 " column="11" attribute="Level" context="tag1" />
<DetectSpaces attribute="Text"/>
</context>
<context name="alt" attribute="Error" lineEndContext="#stay">
<DetectChar attribute="Alternate" char="]" context="#pop" />
<DetectChar attribute="Alternate" char="|" />
<IncludeRules context="gedstruct" />
<DetectSpaces attribute="Text"/>
</context>
<context name="tag1" attribute="Error" lineEndContext="#stay">
<RegExpr String="@XREF:[A-Z_][A-Z0-9_]+@ " attribute="Anchor" context="#pop!tag" />
<IncludeRules context="tag" />
</context>
<context name="tag" attribute="Error" lineEndContext="#stay">
<RegExpr String="[A-Z_][A-Z0-9_]+ " attribute="Tag" context="#pop!payload" />
<RegExpr String="&lt;&lt;[^&gt;]*&gt;&gt;" attribute="Recur" context="#pop!just-count" />
<DetectChar attribute="Alternate" char="[" context="alt-tag" />
</context>
<context name="alt-tag" attribute="Error" lineEndContext="#stay">
<DetectChar attribute="Alternate" char="]" context="#pop#pop!payload" />
<DetectChar attribute="Alternate" char="|" />
<RegExpr String="[A-Z_][A-Z0-9_]+ " attribute="Tag" />
<DetectSpaces attribute="Text"/>
</context>
<!--
<context name="payload" attribute="Error" lineEndContext="#pop">
<DetectSpaces attribute="Text" />
<RegExpr String="\{[01]:[1M]\}$" attribute="Count" />
<RegExpr String="&lt;[^&gt;]*&gt;" attribute="Text" />
<RegExpr String="@&lt;XREF:[A-Z_][A-Z0-9_]*&gt;@" attribute="Pointer" />
<RegExpr String="\[[^\]|]+\|&lt;NULL&gt;\]" attribute="Text" />
</context>
-->
<context name="just-count" attribute="Error" lineEndContext="#pop">
<DetectSpaces attribute="Text" />
<RegExpr String="\{[01]:[1M]\}$" attribute="Count" />
</context>
<context name="payload" attribute="Error" lineEndContext="#pop">
<DetectSpaces attribute="Text" />
<RegExpr String="\{[01]:[1M]\} " attribute="Count" context="#pop!iri"/>
<RegExpr String="&lt;[^&gt;]*&gt;" attribute="Text" context="#pop!count"/>
<RegExpr String="@&lt;XREF:[A-Z_][A-Z0-9_]*&gt;@" attribute="Pointer" context="#pop!count"/>
<RegExpr String="\[[^\]|]+\|&lt;NULL&gt;\]" attribute="Text" context="#pop!count"/>
</context>
<context name="count" attribute="Error" lineEndContext="#pop">
<DetectSpaces attribute="Text" />
<RegExpr String="\{[01]:[1M]\} " attribute="Count" context="#pop!iri"/>
</context>
<context name="iri" attribute="Error" lineEndContext="#pop">
<DetectSpaces attribute="Text" />
<RegExpr String="[a-z0-9]+:\S+$" attribute="IRI" />
</context>
</contexts>

<itemDatas>
<itemData name="Text" defStyleNum="dsNormal" />
<itemData name="Alternate" defStyleNum="dsControlFlow" />
<itemData name="Pointer" defStyleNum="dsVariable" />
<itemData name="Anchor" defStyleNum="dsControlFlow" />
<itemData name="Tag" defStyleNum="dsFunction" />
<itemData name="Level" defStyleNum="dsDataType" />
<itemData name="Recur" defStyleNum="dsKeyword" />
<itemData name="Count" defStyleNum="dsChar" />
<itemData name="Value" defStyleNum="dsSpecialChar" />
<itemData name="Error" defStyleNum="dsError" />
<itemData name="IRI" defStyleNum="dsAttribute" />
</itemDatas>
</highlighting>

</language>