Update how tables with URIs in them are parsed (#590)
* Update how tables with URIs in them are parsed

Resolves #574

* Add the extracted files this PR changes

* Fix over-eager change of tag names to enum-TAG
tychonievich authored Feb 10, 2025
1 parent fb25aff commit 3708d06
Showing 7 changed files with 24 additions and 19 deletions.
11 changes: 8 additions & 3 deletions build/uri-def.py
@@ -251,7 +251,9 @@ def find_descriptions(txt, g7, ssp):
 if header.startswith('Fam'): pfx = 'FAM-'
 if header.startswith('Indi'): pfx = 'INDI-'
 for tag, name, desc in re.findall(r'`([A-Z_0-9]+)` *\| *([^|\n]*?) *\| *([^|\n]*[^ |\n]) *', table.group(2)):
-    if '<br' in name: name = name[:name.find('<br')]
+    if '<br' in name:
+        tag = name[name.find('`g7:')+4:name.rfind('`')]
+        name = name[:name.find('<br')]
     if tag not in g7: tag = pfx+tag
     if tag not in g7:
         raise Exception('Found table for '+tag+' but no section or structure')
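The hunk above can be tried in isolation. The following is a minimal sketch, assuming a table row whose name cell has the form `name<br/>`g7:TAG``; the sample rows and the `parse_rows` helper are hypothetical and not part of `build/uri-def.py`. Only the regular expression and the two slicing expressions are taken from the diff.

```python
import re

# Row pattern copied from the hunk: `TAG` | name | description
ROW = re.compile(r'`([A-Z_0-9]+)` *\| *([^|\n]*?) *\| *([^|\n]*[^ |\n]) *')

def parse_rows(table_text):
    """Hypothetical helper: return (tag, name, desc) for each table row."""
    rows = []
    for tag, name, desc in ROW.findall(table_text):
        if '<br' in name:
            # Prefer the tag from the URI in the name cell over the first column.
            tag = name[name.find('`g7:')+4:name.rfind('`')]
            name = name[:name.find('<br')]
        rows.append((tag, name, desc))
    return rows

# Assumed sample rows, one with a URI in the name cell and one without.
sample = (
    "| `RELI` | religion<br/>`g7:INDI-RELI` | A religious denomination. |\n"
    "| `CAST` | caste | The name of a caste. |\n"
)
rows = parse_rows(sample)
```

With these assumed rows, the first row takes its tag from the URI (`INDI-RELI`) rather than from the first column (`RELI`), while the second row is left unchanged.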
@@ -270,14 +272,16 @@ def find_enum_by_link(txt, enums, tagsets):
 # 'g7:INDI-FACT',
 # 'g7:FAM-FACT',
 # )) ## do not do for enumset-EVEN
+enum_prefix = {k[k.find('enum-')+5:] for e in enums.values() for k in e }
 for sect in re.finditer(r'# *`(g7:enumset-[^`]*)`[\s\S]*?\n#', txt):
     if '[Events]' in sect.group(0):
         key = sect.group(1).replace('`','').replace('.','-')
         for k in tagsets:
             if 'Event' in k:
                 enums.setdefault(key, [])
                 for tag in tagsets[k]:
-                    tag = tag.replace('INDI-','enum-').replace('FAM-','enum-')
+                    if tag.startswith('INDI-') and tag[5:] in enum_prefix: tag = 'enum-'+tag[5:]
+                    if tag.startswith('FAM-') and tag[4:] in enum_prefix: tag = 'enum-'+tag[4:]
                     tag = 'g7:'+tag
                     if tag in enums[key]: continue
                     enums[key].append(tag)
@@ -287,7 +291,8 @@ def find_enum_by_link(txt, enums, tagsets):
 if 'Attribute' in k:
     enums.setdefault(key, [])
     for tag in tagsets[k]:
-        tag = tag.replace('INDI-','enum-').replace('FAM-','enum-')
+        if tag.startswith('INDI-') and tag[5:] in enum_prefix: tag = 'enum-'+tag[5:]
+        if tag.startswith('FAM-') and tag[4:] in enum_prefix: tag = 'enum-'+tag[4:]
         tag = 'g7:'+tag
         if tag in enums[key]: continue
         enums[key].append(tag)
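The "over-eager" rename named in the commit message is easier to see side by side. This is a minimal sketch, assuming a small hypothetical `enums` mapping; `old_rename` and `new_rename` are illustrative wrappers around the removed and added lines of `find_enum_by_link`, not functions that exist in the script.

```python
def old_rename(tag):
    # Removed line: rewrites every INDI-/FAM- occurrence, whether or not
    # a matching enum-TAG is known.
    return tag.replace('INDI-', 'enum-').replace('FAM-', 'enum-')

def new_rename(tag, enum_prefix):
    # Added lines: rename only when the unprefixed tag is a known enum name.
    if tag.startswith('INDI-') and tag[5:] in enum_prefix: tag = 'enum-' + tag[5:]
    if tag.startswith('FAM-') and tag[4:] in enum_prefix: tag = 'enum-' + tag[4:]
    return tag

# Hypothetical enumeration data, for illustration only.
enums = {'g7:enumset-EVEN': ['g7:enum-CENS', 'g7:enum-EVEN']}

# Same set comprehension as in the diff: the text after 'enum-' in each value.
enum_prefix = {k[k.find('enum-')+5:] for e in enums.values() for k in e}

unguarded = old_rename('INDI-RELI')              # rewritten unconditionally
guarded = new_rename('INDI-RELI', enum_prefix)   # left alone: RELI is not a known enum
renamed = new_rename('INDI-CENS', enum_prefix)   # renamed: CENS is a known enum
```

Under these assumptions the guarded version keeps `INDI-RELI` as it is, which matches the `INDI-RELI` and `INDI-TITL` entries in the extracted files changed by this commit.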
4 changes: 2 additions & 2 deletions extracted-files/enumerationsets.tsv
@@ -75,9 +75,9 @@ https://gedcom.io/terms/v7/enumset-EVENATTR https://gedcom.io/terms/v7/NATI
 https://gedcom.io/terms/v7/enumset-EVENATTR https://gedcom.io/terms/v7/NMR
 https://gedcom.io/terms/v7/enumset-EVENATTR https://gedcom.io/terms/v7/OCCU
 https://gedcom.io/terms/v7/enumset-EVENATTR https://gedcom.io/terms/v7/PROP
-https://gedcom.io/terms/v7/enumset-EVENATTR https://gedcom.io/terms/v7/RELI
+https://gedcom.io/terms/v7/enumset-EVENATTR https://gedcom.io/terms/v7/INDI-RELI
 https://gedcom.io/terms/v7/enumset-EVENATTR https://gedcom.io/terms/v7/SSN
-https://gedcom.io/terms/v7/enumset-EVENATTR https://gedcom.io/terms/v7/TITL
+https://gedcom.io/terms/v7/enumset-EVENATTR https://gedcom.io/terms/v7/INDI-TITL
 https://gedcom.io/terms/v7/enumset-MEDI https://gedcom.io/terms/v7/enum-AUDIO
 https://gedcom.io/terms/v7/enumset-MEDI https://gedcom.io/terms/v7/enum-BOOK
 https://gedcom.io/terms/v7/enumset-MEDI https://gedcom.io/terms/v7/enum-CARD
6 changes: 6 additions & 0 deletions extracted-files/tags/INDI-RELI
@@ -11,6 +11,9 @@ standard tag: 'RELI'
 specification:
   - Religion
   - An [Individual Attribute]. See also `INDIVIDUAL_ATTRIBUTE_STRUCTURE`.
+  - religion
+  - A religious denomination to which a person is affiliated or for which a record
+    applies.

 label: 'Religion'

@@ -41,5 +44,8 @@ substructures:
 superstructures:
   "https://gedcom.io/terms/v7/record-INDI": "{0:M}"

+value of:
+  - "https://gedcom.io/terms/v7/enumset-EVENATTR"
+
 contact: "https://gedcom.io/community/"
 ...
6 changes: 6 additions & 0 deletions extracted-files/tags/INDI-TITL
@@ -11,6 +11,9 @@ standard tag: 'TITL'
 specification:
   - Title
   - An [Individual Attribute]. See also `INDIVIDUAL_ATTRIBUTE_STRUCTURE`.
+  - title
+  - A formal designation used by an individual in connection with positions of
+    royalty or other social status, such as Grand Duke.

 label: 'Title'

@@ -41,5 +44,8 @@ substructures:
 superstructures:
   "https://gedcom.io/terms/v7/record-INDI": "{0:M}"

+value of:
+  - "https://gedcom.io/terms/v7/enumset-EVENATTR"
+
 contact: "https://gedcom.io/community/"
 ...
6 changes: 0 additions & 6 deletions extracted-files/tags/RELI
@@ -12,9 +12,6 @@ specification:
   - Religion
   - A religious denomination associated with the event or attribute described by
     the superstructure.
-  - religion
-  - A religious denomination to which a person is affiliated or for which a record
-    applies.

 label: 'Religion'

@@ -75,8 +72,5 @@ superstructures:
   "https://gedcom.io/terms/v7/SSN": "{0:1}"
   "https://gedcom.io/terms/v7/WILL": "{0:1}"

-value of:
-  - "https://gedcom.io/terms/v7/enumset-EVENATTR"
-
 contact: "https://gedcom.io/community/"
 ...
6 changes: 0 additions & 6 deletions extracted-files/tags/TITL
@@ -37,9 +37,6 @@ specification:
     the `SOURCE_RECORD` substructures `AUTH`, `PUBL`, `REPO`, and so on. In such
     cases, the entire citation text may be presented as the payload of the
     `SOUR`.`TITL`.
-  - title
-  - A formal designation used by an individual in connection with positions of
-    royalty or other social status, such as Grand Duke.

 label: 'Title'

@@ -52,8 +49,5 @@ superstructures:
   "https://gedcom.io/terms/v7/OBJE": "{0:1}"
   "https://gedcom.io/terms/v7/record-SOUR": "{0:1}"

-value of:
-  - "https://gedcom.io/terms/v7/enumset-EVENATTR"
-
 contact: "https://gedcom.io/community/"
 ...
4 changes: 2 additions & 2 deletions extracted-files/tags/enumset-EVENATTR
@@ -50,9 +50,9 @@ enumeration values:
   - "https://gedcom.io/terms/v7/NMR"
   - "https://gedcom.io/terms/v7/OCCU"
   - "https://gedcom.io/terms/v7/PROP"
-  - "https://gedcom.io/terms/v7/RELI"
+  - "https://gedcom.io/terms/v7/INDI-RELI"
   - "https://gedcom.io/terms/v7/SSN"
-  - "https://gedcom.io/terms/v7/TITL"
+  - "https://gedcom.io/terms/v7/INDI-TITL"

 contact: "https://gedcom.io/community/"
 ...
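
The extracted files in this commit change in pairs: each URI added to the `enumeration values` of `enumset-EVENATTR` gains a matching `value of` entry in its own tag file, and each URI removed loses one. A minimal sketch of a check for that pairing follows, assuming the two mappings have already been loaded into plain dictionaries; the `asymmetric_links` helper and the in-memory data are hypothetical and not part of the repository.

```python
def asymmetric_links(enumeration_values, value_of):
    """Return sorted (enumset, term) pairs listed on only one side."""
    forward = {(s, t) for s, terms in enumeration_values.items() for t in terms}
    backward = {(s, t) for t, sets in value_of.items() for s in sets}
    return sorted(forward ^ backward)

BASE = "https://gedcom.io/terms/v7/"

# Only the entries touched by this commit, as they read after the change.
enumeration_values = {
    BASE + "enumset-EVENATTR": [BASE + "INDI-RELI", BASE + "INDI-TITL"],
}
value_of = {
    BASE + "INDI-RELI": [BASE + "enumset-EVENATTR"],
    BASE + "INDI-TITL": [BASE + "enumset-EVENATTR"],
}

mismatches = asymmetric_links(enumeration_values, value_of)
```

An empty `mismatches` list means both sides agree. If the `value of` entry were left on `RELI` instead of moving to `INDI-RELI`, both URIs would be reported.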
