Format markdown in Python instead of removing it with pandoc
Requested in comments of #383
tychonievich committed Nov 30, 2023
1 parent 5ef5ce3 commit e7a118b
Showing 216 changed files with 1,582 additions and 1,343 deletions.
1 change: 1 addition & 0 deletions build/README.md
@@ -9,6 +9,7 @@ This directory is used to convert the `specifications/gedcom.md` source file int
- [weasyprint](https://weasyprint.org) installed by running `python3 -mpip install --user --upgrade weasyprint`
- Note: version 52.5 was notably faster than later versions;
`python3 -mpip install --user weasyprint==52.5` will install that version instead of the latest version.
- [mdformat_gfm](https://pypi.org/project/mdformat-gfm/) installed by running `python3 -mpip install --user --upgrade mdformat_gfm`
- [git](https://git-scm.com/)
- `make`-compatible executable
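
Note: the prerequisites visible in this hunk can be checked before running the build. The following is only a sketch: the function name `missing_build_tools` is hypothetical, the import names `weasyprint` and `mdformat_gfm` are assumed to match the installed packages, and `make` stands in for whatever `make`-compatible executable is actually used. Prerequisites listed in parts of the README outside this hunk are not covered.

```python
import importlib.util
import shutil

# Names taken from the README lines shown in this hunk; the check is illustrative.
PYTHON_MODULES = ["weasyprint", "mdformat_gfm"]
EXECUTABLES = ["git", "make"]  # "make" is an assumed executable name


def missing_build_tools():
    """Return the names of prerequisites that cannot be located."""
    # find_spec locates a top-level module without importing it.
    missing = [m for m in PYTHON_MODULES if importlib.util.find_spec(m) is None]
    # shutil.which returns None when the executable is not on PATH.
    missing += [e for e in EXECUTABLES if shutil.which(e) is None]
    return missing
```

An empty list means everything in this hunk's list was found; otherwise the list names what still needs installing.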

35 changes: 32 additions & 3 deletions build/uri-def.py
@@ -302,12 +302,41 @@ def find_enumsets(txt):
     return res

 def tidy_markdown(md, indent, width=79):
-    """Run markdown through pandoc to remove markup and wrap columns"""
+    """
+    The markdown files in the specification directory use the following Markdown dialect:
+    Part of GFM:
+    - setext headers with classes like `{.unnumbered}`, unlisting marks like `{-}`, and anchors like `{#container}`
+    - language-specific code blocks both <code>\`\`\`gedcom</code> and <code>\`\`\` {.gedstruct .long}</code> headers
+    - markdown inside HTML between lines `<div class="example">` and `</div>` (only inside lists) and between definition list tags `<dl>`, `<dt>`, and `<dd>` (only in acknowledgements)
+    - code blocks with no leading blank line
+    - tables with `|---|:--|` format
+    - tables with `--- | --- | ---` format
+    Not part of GFM:
+    - YAML front matter
+    - divs with `:::class` headers and `:::` footers
+    - automatic links with `[name of section to link to]`
+    - inline code with class `1 NO MARR`{.gedcom} (used only once)
+    pip install mdformat-gfm
+    """
     global prefixes
     for k,v in prefixes.items():
         md = re.sub(r'\b'+k+':', v, md)
-    out = run(['pandoc','-t','plain','--columns='+str(width-indent)], input=md.encode('utf-8'), capture_output=True)
-    return out.stdout.rstrip().decode('utf-8').replace('\n','\n'+' '*indent)
+
+    # for now ignoring YAML frontmatter
+    md = re.sub(r':::(\S+)', r'<div class="\1">\n', md) # convert ::: divs to <div>s
+    md = re.sub(r':::', '\n</div>', md) # convert ::: divs to <div>s
+    md = re.sub(r'\]\([^\)]*\)({[^}]*})?', ']', md) # remove links
+    md = re.sub(r'`{\.\S+\}', '`', md) # remove inline code classes
+
+    import mdformat
+    out = mdformat.text(md, extensions={"gfm"}, options={"number":True, "wrap":width})
+
+    return out.rstrip().replace('\n','\n'+' '*indent).replace('\[','[').replace('\]',']')

 def yaml_str_helper(pfx, md, width=79):
     txt = tidy_markdown(md, len(pfx), width)
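
Note: the four regex substitutions added in this hunk can be tried in isolation, without `mdformat` installed. The sketch below copies the patterns from the diff into a hypothetical helper named `preprocess`. The sample input is made up for illustration, and the later `mdformat.text` call is deliberately left out.

```python
import re


def preprocess(md: str) -> str:
    """Apply the four substitutions from the diff, in the same order."""
    md = re.sub(r':::(\S+)', r'<div class="\1">\n', md)  # ":::class" header becomes an opening <div>
    md = re.sub(r':::', '\n</div>', md)                   # remaining ":::" footer becomes </div>
    md = re.sub(r'\]\([^\)]*\)({[^}]*})?', ']', md)       # drop link targets and link attributes
    md = re.sub(r'`{\.\S+\}', '`', md)                    # drop classes attached to inline code
    return md


# Hypothetical sample input, not taken from the specification.
sample = ":::note\nSee [the spec](https://example.com) and `1 NO MARR`{.gedcom}.\n:::\n"
print(preprocess(sample))
```

The order matters: the `:::(\S+)` pattern has to run first so that only bare `:::` footers remain for the second substitution.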
4 changes: 2 additions & 2 deletions extracted-files/grammar.gedstruct
@@ -109,12 +109,12 @@ n @XREF:INDI@ INDI {1:1} g7:record-INDI
MULTIMEDIA_RECORD :=
n @XREF:OBJE@ OBJE {1:1} g7:record-OBJE
+1 RESN <List:Enum> {0:1} g7:RESN
+1 FILE <Special> {1:M} g7:FILE
+1 FILE <FilePath> {1:M} g7:FILE
+2 FORM <MediaType> {1:1} g7:FORM
+3 MEDI <Enum> {0:1} g7:MEDI
+4 PHRASE <Text> {0:1} g7:PHRASE
+2 TITL <Text> {0:1} g7:TITL
+2 TRAN <Special> {0:M} g7:FILE-TRAN
+2 TRAN <FilePath> {0:M} g7:FILE-TRAN
+3 FORM <MediaType> {1:1} g7:FORM
+1 <<IDENTIFIER_STRUCTURE>> {0:M}
+1 <<NOTE_STRUCTURE>> {0:M}
4 changes: 2 additions & 2 deletions extracted-files/payloads.tsv
@@ -71,7 +71,7 @@ https://gedcom.io/terms/v7/ADOP-FAMC @<https://gedcom.io/terms/v7/record-FAM>@
https://gedcom.io/terms/v7/FAMS @<https://gedcom.io/terms/v7/record-FAM>@
https://gedcom.io/terms/v7/FAX http://www.w3.org/2001/XMLSchema#string
https://gedcom.io/terms/v7/FCOM Y|<NULL>
https://gedcom.io/terms/v7/FILE http://www.w3.org/2001/XMLSchema#string
https://gedcom.io/terms/v7/FILE https://gedcom.io/terms/v7/type-FilePath
https://gedcom.io/terms/v7/FORM http://www.w3.org/ns/dcat#mediaType
https://gedcom.io/terms/v7/PLAC-FORM https://gedcom.io/terms/v7/type-List#Text
https://gedcom.io/terms/v7/HEAD-PLAC-FORM https://gedcom.io/terms/v7/type-List#Text
@@ -165,7 +165,7 @@ https://gedcom.io/terms/v7/TOP http://www.w3.org/2001/XMLSchema#nonNegativeInteg
https://gedcom.io/terms/v7/NAME-TRAN https://gedcom.io/terms/v7/type-Name
https://gedcom.io/terms/v7/PLAC-TRAN https://gedcom.io/terms/v7/type-List#Text
https://gedcom.io/terms/v7/NOTE-TRAN http://www.w3.org/2001/XMLSchema#string
https://gedcom.io/terms/v7/FILE-TRAN http://www.w3.org/2001/XMLSchema#string
https://gedcom.io/terms/v7/FILE-TRAN https://gedcom.io/terms/v7/type-FilePath
https://gedcom.io/terms/v7/TRLR
https://gedcom.io/terms/v7/TYPE http://www.w3.org/2001/XMLSchema#string
https://gedcom.io/terms/v7/NAME-TYPE https://gedcom.io/terms/v7/type-Enum
39 changes: 22 additions & 17 deletions extracted-files/tags/ADDR
@@ -10,31 +10,36 @@ standard tag: ADDR

specification:
- Address
- The location of, or most relevant to, the subject of the superstructure.
See ADDRESS_STRUCTURE for more.
- The location of, or most relevant to, the subject of the superstructure. See
`ADDRESS_STRUCTURE` for more.
- |
A specific building, plot, or location. The payload is the full formatted
address as it would appear on a mailing label, including appropriate line
breaks (encoded using CONT tags). The expected order of address components
varies by region; the address should be organized as expected by the
addressed region.
breaks (encoded using `CONT` tags). The expected order of address components
varies by region; the address should be organized as expected by the addressed
region.

Optionally, additional substructures such as STAE and CTRY are provided to
Optionally, additional substructures such as `STAE` and `CTRY` are provided to
be used by systems that have structured their addresses for indexing and
sorting. If the substructures and ADDR payload disagree, the ADDR payload
shall be taken as correct. Because the regionally-correct order and
formatting of address components cannot be determined from the
substructures alone, the ADDR payload is required, even if its content
appears to be redundant with the substructures.
sorting. If the substructures and `ADDR` payload disagree, the `ADDR` payload
shall be taken as correct. Because the regionally-correct order and formatting
of address components cannot be determined from the substructures alone, the
`ADDR` payload is required, even if its content appears to be redundant with
the substructures.

ADR1 and ADR2 were introduced in version 5.5 (1996) and ADR3 in version
5.5.1 (1999), defined as “The first/second/third line of an address.” Some
applications interpreted ADR1 as “the first line of the street address”,
but most took the spec as-written and treated it as a straight copy of a
line of text already available in the ADDR payload.
<div class="deprecation">

`ADR1` and `ADR2` were introduced in version 5.5 (1996) and `ADR3` in version
5.5.1 (1999), defined as "The first/second/third line of an address." Some
applications interpreted ADR1 as "the first line of the *street* address", but
most took the spec as-written and treated it as a straight copy of a line of
text already available in the `ADDR` payload.

Duplicating information bloats files and introduces the potential for
self-contradiction. ADR1, ADR2, and ADR3 should not be added to new files.
self-contradiction. `ADR1`, `ADR2`, and `ADR3` should not be added to new
files.

</div>

label: 'Address'
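
Note: the "line breaks (encoded using `CONT` tags)" wording above can be illustrated with a small sketch. The helper name `encode_addr`, the default level, and the sample address are all hypothetical. The sketch assumes the payload needs no other escaping, for example a leading `@`, which is not handled here.

```python
def encode_addr(address: str, level: int = 1) -> list[str]:
    """Encode a multi-line address as one ADDR line followed by CONT lines."""
    first, *rest = address.split("\n")
    lines = [f"{level} ADDR {first}"]
    for line in rest:
        # An empty continuation line carries no payload and no trailing space.
        lines.append(f"{level + 1} CONT" + (f" {line}" if line else ""))
    return lines


# Hypothetical address, formatted as it would appear on a mailing label.
for out in encode_addr("123 Example Street\nSpringfield"):
    print(out)
```

Each line after the first becomes a `CONT` substructure one level below the `ADDR` line, so the full label can be rebuilt by joining the payloads with newlines.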

6 changes: 3 additions & 3 deletions extracted-files/tags/ADOP
@@ -10,10 +10,10 @@ standard tag: ADOP

specification:
- Adoption
- An Individual Event. See also INDIVIDUAL_EVENT_STRUCTURE.
- An [Individual Event]. See also `INDIVIDUAL_EVENT_STRUCTURE`.
- adoption
- Creation of a legally approved child-parent relationship that does not
exist biologically.
- Creation of a legally approved child-parent relationship that does not exist
biologically.

label: 'Adoption'

10 changes: 5 additions & 5 deletions extracted-files/tags/ADOP-FAMC
@@ -13,11 +13,11 @@ specification:
- |
The individual or couple that adopted this individual.

Adoption by an individual, rather than a couple, may be represented either
by pointing to a FAM where that individual is a HUSB or WIFE and using a
https://gedcom.io/terms/v7/FAMC-ADOP substructure to indicate which 1
performed the adoption; or by using a FAM where the adopting individual is
the only HUSB/WIFE.
Adoption by an individual, rather than a couple, may be represented either by
pointing to a `FAM` where that individual is a `HUSB` or `WIFE` and using a
`https://gedcom.io/terms/v7/FAMC-ADOP` substructure to indicate which 1
performed the adoption; or by using a `FAM` where the adopting individual is
the only `HUSB`/`WIFE`.

label: 'Family child'

12 changes: 8 additions & 4 deletions extracted-files/tags/ADR1
@@ -11,11 +11,15 @@ standard tag: ADR1
specification:
- Address Line 1
- |
The first line of the address, used for indexing. This structures payload
should be a single line of text equal to the first line of the
corresponding ADDR. See ADDRESS_STRUCTURE for more.
The first line of the address, used for indexing. This structure's payload
should be a single line of text equal to the first line of the corresponding
`ADDR`. See `ADDRESS_STRUCTURE` for more.

ADR1 should not be added to new files; see ADDRESS_STRUCTURE for more.
<div class="deprecation">

`ADR1` should not be added to new files; see `ADDRESS_STRUCTURE` for more.

</div>

label: 'Address Line 1'

12 changes: 8 additions & 4 deletions extracted-files/tags/ADR2
@@ -11,11 +11,15 @@ standard tag: ADR2
specification:
- Address Line 2
- |
The second line of the address, used for indexing. This structures payload
should be a single line of text equal to the second line of the
corresponding ADDR. See ADDRESS_STRUCTURE for more.
The second line of the address, used for indexing. This structure's payload
should be a single line of text equal to the second line of the corresponding
`ADDR`. See `ADDRESS_STRUCTURE` for more.

ADR2 should not be added to new files; see ADDRESS_STRUCTURE for more.
<div class="deprecation">

`ADR2` should not be added to new files; see `ADDRESS_STRUCTURE` for more.

</div>

label: 'Address Line 2'

12 changes: 8 additions & 4 deletions extracted-files/tags/ADR3
@@ -11,11 +11,15 @@ standard tag: ADR3
specification:
- Address Line 3
- |
The third line of the address, used for indexing. This structures payload
should be a single line of text equal to the third line of the
corresponding ADDR. See ADDRESS_STRUCTURE for more.
The third line of the address, used for indexing. This structure's payload
should be a single line of text equal to the third line of the corresponding
`ADDR`. See `ADDRESS_STRUCTURE` for more.

ADR3 should not be added to new files; see ADDRESS_STRUCTURE for more.
<div class="deprecation">

`ADR3` should not be added to new files; see `ADDRESS_STRUCTURE` for more.

</div>

label: 'Address Line 3'

4 changes: 2 additions & 2 deletions extracted-files/tags/AGE
@@ -10,8 +10,8 @@ standard tag: AGE

specification:
- Age at event
- The age of the individual at the time an event occurred, or the age listed
in the document.
- The age of the individual at the time an event occurred, or the age listed in
the document.

label: 'Age at event'

9 changes: 4 additions & 5 deletions extracted-files/tags/AGNC
@@ -10,11 +10,10 @@ standard tag: AGNC

specification:
- Responsible agency
- The organization, institution, corporation, person, or other entity that
has responsibility for the associated context. Examples are an employer of
a person of an associated occupation, or a church that administered rites
or events, or an organization responsible for creating or archiving
records.
- The organization, institution, corporation, person, or other entity that has
responsibility for the associated context. Examples are an employer of a person
of an associated occupation, or a church that administered rites or events, or
an organization responsible for creating or archiving records.

label: 'Responsible agency'

18 changes: 11 additions & 7 deletions extracted-files/tags/ALIA
@@ -12,15 +12,19 @@ specification:
- Alias
- |
A single individual may have facts distributed across multiple individual
records, connected by ALIA pointers (named after alias in the computing
records, connected by `ALIA` pointers (named after "alias" in the computing
sense, not the pseudonym sense).

This specification does not define how to connect INDI records with ALIA.
Some systems organize ALIA pointers to create a tree structure, with the
root INDI record containing the composite view of all facts in the leaf
INDI records. Others distribute events and attributes between INDI records
mutually linked by symmetric pairs of ALIA pointers. A future version of
this specification may adjust the definition of ALIA.
<div class="note">

This specification does not define how to connect `INDI` records with `ALIA`.
Some systems organize `ALIA` pointers to create a tree structure, with the root
`INDI` record containing the composite view of all facts in the leaf `INDI`
records. Others distribute events and attributes between `INDI` records
mutually linked by symmetric pairs of `ALIA` pointers. A future version of this
specification may adjust the definition of `ALIA`.

</div>

label: 'Alias'

4 changes: 2 additions & 2 deletions extracted-files/tags/ANCI
@@ -10,8 +10,8 @@ standard tag: ANCI

specification:
- Ancestor interest
- Indicates an interest in additional research for ancestors of this
individual. (See also DESI).
- Indicates an interest in additional research for ancestors of this individual.
(See also `DESI`).

label: 'Ancestor interest'

2 changes: 1 addition & 1 deletion extracted-files/tags/ANUL
@@ -10,7 +10,7 @@ standard tag: ANUL

specification:
- Annulment
- A Family Event. See also FAMILY_EVENT_STRUCTURE.
- A [Family Event]. See also `FAMILY_EVENT_STRUCTURE`.
- annulment
- Declaring a marriage void from the beginning (never existed).

38 changes: 22 additions & 16 deletions extracted-files/tags/ASSO
@@ -10,26 +10,32 @@ standard tag: ASSO

specification:
- Associates
- A pointer to an associated individual. See ASSOCIATION_STRUCTURE for more.
- A pointer to an associated individual. See `ASSOCIATION_STRUCTURE` for more.
- |
An individual associated with the subject of the superstructure. The nature
of the association is indicated in the ROLE substructure.
An individual associated with the subject of the superstructure. The nature of
the association is indicated in the `ROLE` substructure.

A voidPtr and PHRASE can be used to describe associations to people not
referenced by any INDI record.
A `voidPtr` and `PHRASE` can be used to describe associations to people not
referenced by any `INDI` record.

The following indicates that “Mr Stockdale” was the individual’s teacher
and that individual @I2@ was the clergy officiating at their baptism.
<div class="example">

0 @I1@ INDI
1 ASSO @VOID@
2 PHRASE Mr Stockdale
2 ROLE OTHER
3 PHRASE Teacher
1 BAPM
2 DATE 1930
2 ASSO @I2@
3 ROLE CLERGY
The following indicates that "Mr Stockdale" was the individual's teacher and
that individual `@I2@` was the clergy officiating at their baptism.

```gedcom
0 @I1@ INDI
1 ASSO @VOID@
2 PHRASE Mr Stockdale
2 ROLE OTHER
3 PHRASE Teacher
1 BAPM
2 DATE 1930
2 ASSO @I2@
3 ROLE CLERGY
```

</div>

label: 'Associates'
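
Note: the `voidPtr` plus `PHRASE` pattern described above can be picked out of a file with a few lines of code. This is a minimal sketch, not a full GEDCOM parser: the names `LINE`, `parse`, and `void_associations` are hypothetical, and the regex only covers the simple `level [xref] tag [payload]` line shape used in the example.

```python
import re

# level, optional @XREF@, tag, optional payload
LINE = re.compile(r'^(\d+) (?:(@[^@]+@) )?(\S+)(?: (.*))?$')


def parse(text):
    """Yield (level, xref, tag, payload) for each line that matches LINE."""
    for raw in text.splitlines():
        m = LINE.match(raw.strip())
        if m:
            level, xref, tag, payload = m.groups()
            yield int(level), xref, tag, payload or ''


def void_associations(text):
    """Return the PHRASE payload of every ASSO whose pointer is @VOID@."""
    phrases = []
    void_level = None  # level of the ASSO @VOID@ structure currently open, if any
    for level, _xref, tag, payload in parse(text):
        if void_level is not None and level <= void_level:
            void_level = None  # left the ASSO structure
        if tag == 'ASSO' and payload == '@VOID@':
            void_level = level
        elif void_level is not None and level == void_level + 1 and tag == 'PHRASE':
            phrases.append(payload)
    return phrases


example = """0 @I1@ INDI
1 ASSO @VOID@
2 PHRASE Mr Stockdale
2 ROLE OTHER
3 PHRASE Teacher
1 BAPM
2 DATE 1930
2 ASSO @I2@
3 ROLE CLERGY"""
print(void_associations(example))
```

Only a `PHRASE` directly under the void `ASSO` is collected; the `PHRASE` under `ROLE` sits one level deeper and is skipped.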

6 changes: 3 additions & 3 deletions extracted-files/tags/AUTH
@@ -11,9 +11,9 @@ standard tag: AUTH
specification:
- Author
- The person, agency, or entity who created the record. For a published work,
this could be the author, compiler, transcriber, abstractor, or editor. For
an unpublished source, this may be an individual, a government agency,
church organization, or private organization.
this could be the author, compiler, transcriber, abstractor, or editor. For an
unpublished source, this may be an individual, a government agency, church
organization, or private organization.

label: 'Author'

6 changes: 3 additions & 3 deletions extracted-files/tags/BAPL
@@ -10,10 +10,10 @@ standard tag: BAPL

specification:
- Baptism, Latter-Day Saint
- A Latter-Day Saint Ordinance. See also LDS_INDIVIDUAL_ORDINANCE.
- A [Latter-Day Saint Ordinance]. See also `LDS_INDIVIDUAL_ORDINANCE`.
- baptism
- The event of baptism performed at age 8 or later by priesthood authority of
The Church of Jesus Christ of Latter-day Saints. (See also BAPM)
- The event of baptism performed at age 8 or later by priesthood authority of The
Church of Jesus Christ of Latter-day Saints. (See also [`BAPM`])

label: 'Baptism, Latter-Day Saint'

4 changes: 2 additions & 2 deletions extracted-files/tags/BAPM
@@ -10,9 +10,9 @@ standard tag: BAPM

specification:
- Baptism
- An Individual Event. See also INDIVIDUAL_EVENT_STRUCTURE.
- An [Individual Event]. See also `INDIVIDUAL_EVENT_STRUCTURE`.
- baptism
- Baptism, performed in infancy or later. (See also BAPL and CHR.)
- Baptism, performed in infancy or later. (See also [`BAPL`] and `CHR`.)

label: 'Baptism'
