Indexing Tool: Code Revision and Cleanup #38

Merged
merged 25 commits into from
Nov 26, 2024
Commits
d009128
Added --rename-headers functionality
esimpsons3ti Jul 31, 2024
e114d3e
Updated code, got to 100% unit test coverage
esimpsons3ti Aug 19, 2024
e1cf1f9
Adding missing docstrings
esimpsons3ti Aug 19, 2024
6edcafc
Making everything flake8 compliant
esimpsons3ti Aug 19, 2024
fe85192
Remove terminator dependence from tests
rfrenchseti Aug 19, 2024
641c87c
Force \n line terminator on writing CSV
rfrenchseti Aug 20, 2024
4fa4dc6
Missed a to_csv
rfrenchseti Aug 20, 2024
328b339
Removing --rename-headers and --dont-number-unique-tags
esimpsons3ti Aug 20, 2024
c601ada
Merge branch 'es-overhaul' of https://github.com/SETI/rms-pds4indexto…
esimpsons3ti Aug 20, 2024
82294e8
Minor syntax changes, fixed issue with label generation
esimpsons3ti Aug 26, 2024
d85ce1c
Fixing incorrect capitalization
esimpsons3ti Aug 26, 2024
a5a3fb5
Updated config file, cleaned up debugging code
esimpsons3ti Aug 28, 2024
f4d745b
Updated label template with statements
esimpsons3ti Aug 28, 2024
1143081
Fixed duplicate scraped label issue caused by generalized glob patterns
esimpsons3ti Aug 29, 2024
2fb7bab
Got unit test coverage back up to 100%
esimpsons3ti Aug 29, 2024
9f6bfc5
Making flake8 compliant
esimpsons3ti Aug 29, 2024
a138f7c
Making changes according to pull request
esimpsons3ti Sep 4, 2024
59be059
Adding further implementation in label template
esimpsons3ti Sep 4, 2024
94b43c1
Added unit tests for references in label generation
esimpsons3ti Sep 5, 2024
6639527
Making changes according to pull request
esimpsons3ti Oct 17, 2024
d4411fd
Fixing f-string format
esimpsons3ti Oct 17, 2024
7ca45a6
Making changes according to pull request
esimpsons3ti Nov 13, 2024
864ee89
Using DataFrame.applymap(), not DataFrame.map()
esimpsons3ti Nov 13, 2024
5548c89
unified column order for extra file info, removed Python 3.8 requirement
esimpsons3ti Nov 19, 2024
a299746
limit-xpaths-file takes priority over add-extra-file-info term order
esimpsons3ti Nov 26, 2024
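Commits 641c87c and 4fa4dc6 force a `\n` line terminator when the index CSV is written, which lines up with the `Line-Feed` record delimiter declared in the label template. The PR does this through a `to_csv` call; the following is only a minimal sketch of the same idea using the standard-library `csv` module instead. The function name `write_rows` and the sample rows are made up for illustration.

```python
import csv
import io


def write_rows(rows, stream):
    """Write rows as CSV, ending every record with a bare line feed."""
    # csv.writer defaults to '\r\n'; overriding it keeps the output
    # byte-identical across platforms.
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerows(rows)


# newline="" disables any newline translation in the in-memory stream.
buffer = io.StringIO(newline="")
write_rows([["filename", "lid"], ["a.xml", "urn:example:a"]], buffer)
text = buffer.getvalue()
```

When writing to a real file, opening it with `newline=""` likewise keeps Python from translating the terminator on Windows.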
2 changes: 1 addition & 1 deletion .github/workflows/run-tests.yml
@@ -31,7 +31,7 @@ jobs:
strategy:
matrix:
os: [ubuntu-latest, macos-latest, windows-latest]
python-version: ['3.8', '3.9', '3.10', '3.11', '3.12']
python-version: ['3.9', '3.10', '3.11', '3.12']
fail-fast: false
steps:
- name: Checkout
65 changes: 35 additions & 30 deletions docs/pds4_create_xml_index.rst
@@ -145,9 +145,10 @@ Limiting results

- ``--limit-xpaths-file XPATHS_FILEPATH``: Specify a text file containing a list of
specific XPaths to extract from the label files. If this argument is not specified, all
elements found in the label files will be included. The given text file can specify
XPaths using ``glob``-style syntax, where each XPath level is treated as if it were a
directory in a filesystem. Available wildcards are:
elements found in the label files will be included. This command uses only the whole
versions of the XPath(s) -- simplified versions are not allowed. The given text file
can specify XPaths using ``glob``-style syntax, where each XPath level is treated as if
it were a directory in a filesystem. Available wildcards are:

- ``?`` matches any single character within an XPath level
- ``*`` matches any series of characters within an XPath level
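The per-level matching described in this hunk can be sketched with the standard-library `fnmatch` module. This is an illustration under assumptions, not the tool's implementation: `xpath_matches` is a hypothetical helper, the XPaths are invented, and only the two wildcards visible in this hunk are considered (note that `fnmatch` also interprets `[...]`, which the excerpt does not mention).

```python
from fnmatch import fnmatchcase


def xpath_matches(xpath, pattern):
    """Return True if each level of xpath matches the same level of pattern."""
    xpath_levels = xpath.split("/")
    pattern_levels = pattern.split("/")
    # In this sketch a wildcard never crosses a "/", so the number of
    # levels has to agree before the levels are compared one by one.
    if len(xpath_levels) != len(pattern_levels):
        return False
    return all(
        fnmatchcase(level, level_pattern)
        for level, level_pattern in zip(xpath_levels, pattern_levels)
    )


# Hypothetical XPaths, for illustration only.
xpaths = [
    "pds:Product_Observational/pds:Identification_Area/pds:version_id",
    "pds:Product_Observational/pds:Identification_Area/pds:title",
    "pds:Product_Observational/pds:File_Area_Observational/pds:File/pds:file_name",
]
selected = [
    x for x in xpaths
    if xpath_matches(x, "pds:*/pds:Identification_Area/pds:*")
]
```

Here `selected` keeps the two `Identification_Area` entries; the third XPath has a different number of levels and is rejected.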
@@ -302,6 +303,8 @@ Below is the ``label-contents`` section of the default configuration file::
External_Reference:
Source_Product_Internal:
Source_Product_External:
File_Area_Ancillary:
File_Area_Metadata:

Each listed value with an empty dictionary is an optional field the user can include in
their generated label. If the user does decide to include one of these fields, **they must
@@ -311,39 +314,41 @@ element will remain empty**.
For reference, provided below are the full contents of the optional label classes::

Citation_Information:
author_list
editor_list
publication_year
doi
keyword
description
author_list:
editor_list:
publication_year:
doi:
keyword:
description:
Funding_Acknowledgement:
funding_source
funding_year
funding_award
funding_acknowledgement_text
funding_source:
funding_year:
funding_award:
funding_acknowledgement_text:
Modification_Detail:
modification_date
version_id
description
modification_date:
version_id:
description:
Internal_Reference:
lid_reference
reference_type
comment
lid_reference:
reference_type:
comment:
External_Reference:
doi
reference_text
description
doi:
reference_text:
description:
Source_Product_Internal:
lidvid_reference
reference_type
comment
lidvid_reference:
reference_type:
comment:
Source_Product_External:
external_source_product_identifier
reference_type
doi
curating_facility
description
external_source_product_identifier:
reference_type:
doi:
curating_facility:
description:
File_Area_Ancillary / File_Area_Metadata:
creation_date_time:
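As a rough aid for checking a user configuration against the listing above, the following sketch flags any class or field name that the listing does not contain. It is an assumption-laden illustration: `OPTIONAL_LABEL_FIELDS` and `unknown_fields` are hypothetical names, the field sets are copied from this diff, and `Funding_Acknowledgement` is additionally accepted inside `Citation_Information` because the label template in this PR reads it from there.

```python
# Optional label classes and their fields, copied from the listing above.
OPTIONAL_LABEL_FIELDS = {
    "Citation_Information": {
        "author_list", "editor_list", "publication_year", "doi",
        "keyword", "description",
        "Funding_Acknowledgement",  # nested by the label template
    },
    "Funding_Acknowledgement": {
        "funding_source", "funding_year", "funding_award",
        "funding_acknowledgement_text",
    },
    "Modification_Detail": {"modification_date", "version_id", "description"},
    "Internal_Reference": {"lid_reference", "reference_type", "comment"},
    "External_Reference": {"doi", "reference_text", "description"},
    "Source_Product_Internal": {"lidvid_reference", "reference_type", "comment"},
    "Source_Product_External": {
        "external_source_product_identifier", "reference_type", "doi",
        "curating_facility", "description",
    },
    "File_Area_Ancillary": {"creation_date_time"},
    "File_Area_Metadata": {"creation_date_time"},
}


def unknown_fields(label_contents):
    """Return (class, field) pairs that are not in the documented listing."""
    problems = []
    for class_name, value in label_contents.items():
        allowed = OPTIONAL_LABEL_FIELDS.get(class_name)
        if allowed is None:
            problems.append((class_name, None))
            continue
        # A class may hold one mapping or a list of mappings.
        entries = value if isinstance(value, list) else [value]
        for entry in entries:
            for field in (entry or {}):
                if field not in allowed:
                    problems.append((class_name, field))
    return problems


# Invented sample configuration with one misspelled field.
sample = {
    "Citation_Information": {"author_list": "Doe, J.", "publisher": "n/a"},
    "Internal_Reference": [{"lid_reference": "urn:example:x", "comment": ""}],
    "External_Reference": None,
}
problems = unknown_fields(sample)
```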


If no new contents are specified for label generation, the label will contain the
2 changes: 2 additions & 0 deletions pds4indextools/default_config.yaml
@@ -44,3 +44,5 @@ label-contents:
External_Reference:
Source_Product_Internal:
Source_Product_External:
File_Area_Ancillary:
File_Area_Metadata:
107 changes: 76 additions & 31 deletions pds4indextools/index_label_template_pds.xml
@@ -20,25 +20,37 @@ $END_IF
<product_class>Product_Ancillary</product_class>
$IF(Citation_Information)
<Citation_Information>
$IF(Citation_Information['author_list'] and isinstance(Citation_Information['author_list'], list))
$FOR(Citation_Information['author_list'])
<author_list>$VALUE$</author_list>
$END_FOR
$ELSE_IF(Citation_Information['author_list'] and not isinstance(Citation_Information['author_list'], list))
<author_list>$Citation_Information['author_list']$</author_list>
$END_IF
<editor_list>$Citation_Information['editor_list']$</editor_list>
<publication_year>$Citation_Information['publication_year']$</publication_year>
<doi>$Citation_Information['doi']$</doi>
$IF(Citation_Information['keyword'] and isinstance(Citation_Information['keyword'], list))
$FOR(Citation_Information['keyword'])
<keyword>$VALUE$</keyword>
$END_FOR
$ELSE_IF(Citation_Information['keyword'] and not isinstance(Citation_Information['keyword'], list))
<keyword>$Citation_Information['keyword']$</keyword>
$END_IF
<description>$Citation_Information['description']$</description>
$IF(Citation_Information.get('Funding_Acknowledgement'))
$IF('Funding_Acknowledgement' in Citation_Information)
$IF(Citation_Information['Funding_Acknowledgement'])
<Funding_Acknowledgement>
<funding_source>$Funding_Acknowledgement['funding_source']$</funding_source>
<funding_year>$Funding_Acknowledgement['funding_year']$</funding_year>
<funding_award>$Funding_Acknowledgement['funding_award']$</funding_award>
<funding_acknowledgement_text>$Funding_Acknowledgement['funding_acknowledgement_text']$</funding_acknowledgement_text>
<funding_source>$Citation_Information['Funding_Acknowledgement']['funding_source']$</funding_source>
<funding_year>$Citation_Information['Funding_Acknowledgement']['funding_year']$</funding_year>
<funding_award>$Citation_Information['Funding_Acknowledgement']['funding_award']$</funding_award>
<funding_acknowledgement_text>$Citation_Information['Funding_Acknowledgement']['funding_acknowledgement_text']$</funding_acknowledgement_text>
</Funding_Acknowledgement>
$END_IF
$END_IF
</Citation_Information>
$END_IF
$IF(Modification_Detail)
$IF(Modification_Detail and isinstance(Modification_Detail, list))
<Modification_History>
$FOR(field, k=Modification_Detail)
<Modification_Detail>
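The hunk above makes `author_list` and `keyword` accept either a single value or a list, emitting one element per entry and nothing when the value is empty. In plain Python (not the template language) the same branching might look like the sketch below. `render_repeated` is a hypothetical helper, and the XML escaping is an addition of this sketch; whether the template processor escapes values is not shown in the diff.

```python
from xml.sax.saxutils import escape


def render_repeated(tag, value):
    """Return one '<tag>...</tag>' line per entry, or [] if value is empty."""
    if not value:
        return []
    # Treat a bare value as a one-element list so both cases share a path.
    values = value if isinstance(value, list) else [value]
    return [f"<{tag}>{escape(str(v))}</{tag}>" for v in values]


keyword_lines = render_repeated("keyword", ["rings", "saturn"])
author_lines = render_repeated("author_list", "Doe, J.")
```

The same normalize-to-list step would also cover `Modification_Detail`, which the next hunk allows to be either a list of entries or a single entry.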
@@ -48,6 +60,14 @@ $END_IF
</Modification_Detail>
$END_FOR
</Modification_History>
$ELSE_IF(Modification_Detail)
<Modification_History>
<Modification_Detail>
<modification_date>$Modification_Detail['modification_date']$</modification_date>
<version_id>$Modification_Detail['version_id']$</version_id>
<description>$Modification_Detail['description']$</description>
</Modification_Detail>
</Modification_History>
$END_IF
<License_Information>
<name>Creative Common Public License CC0 1.0 (2024)</name>
@@ -58,59 +78,86 @@ $END_IF
</Internal_Reference>
</License_Information>
</Identification_Area>
$IF(Internal_Reference or External_Reference or Source_Product_Internal or Source_Product_External)
<Reference_List>
$IF(Internal_Reference)
$FOR(field, k=Internal_Reference)
<Internal_Reference>
<lid_reference></lid_reference>
<reference_type></reference_type>
<comment></comment>
<lid_reference>$field['lid_reference']$</lid_reference>
<reference_type>$field['reference_type']$</reference_type>
<comment>$field['comment']$</comment>
</Internal_Reference>
$END_FOR
$END_IF
$IF(External_Reference)
$FOR(field, k=External_Reference)
<External_Reference>
<doi></doi>
<reference_text></reference_text>
<description></description>
<doi>$field['doi']$</doi>
<reference_text>$field['reference_text']$</reference_text>
<description>$field['description']$</description>
</External_Reference>
$END_FOR
$END_IF
$IF(Source_Product_Internal)
$FOR(field, k=Source_Product_Internal)
<Source_Product_Internal>
<lidvid_reference></lidvid_reference>
<reference_type></reference_type>
<comment></comment>
<lidvid_reference>$field['lidvid_reference']$</lidvid_reference>
<reference_type>$field['reference_type']$</reference_type>
<comment>$field['comment']$</comment>
</Source_Product_Internal>
$END_FOR
$END_IF
$IF(Source_Product_External)
$FOR(field, k=Source_Product_External)
<Source_Product_External>
<external_source_product_identifier></external_source_product_identifier>
<reference_type></reference_type>
<doi></doi>
<curating_facility></curating_facility>
<description></description>
<external_source_product_identifier>$field['external_source_product_identifier']$</external_source_product_identifier>
<reference_type>$field['reference_type']$</reference_type>
<doi>$field['doi']$</doi>
<curating_facility>$field['curating_facility']$</curating_facility>
<description>$field['description']$</description>
</Source_Product_External>
$END_FOR
$END_IF
</Reference_List>
$END_IF
$IF(Product_Ancillary)
<File_Area_Ancillary>
$END_IF
$IF(Product_Metadata_Supplemental)
$ELSE
<File_Area_Metadata>
$END_IF
$IF(Product_Ancillary and File_Area_Ancillary)
<File>
<file_name>$BASENAME(TEMPFILE)$</file_name>
<file_name>$BASENAME(index_file_name)$</file_name>
<local_identifier>index-table</local_identifier>
<creation_date_time>$DATETIME(creation_date_time)$</creation_date_time>
<md5_checksum>$FILE_MD5(TEMPFILE)$</md5_checksum>
$IF(File_Area_Ancillary['creation_date_time'])
<creation_date_time>$File_Area_Ancillary['creation_date_time']$</creation_date_time>
$ELSE
<creation_date_time>$DATETIME(calculated_creation_date_time)$</creation_date_time>
$END_IF
<md5_checksum>$FILE_MD5(index_file_name)$</md5_checksum>
<comment></comment>
</File>
$ELSE_IF(Product_Metadata_Supplemental and File_Area_Metadata)
<File>
<file_name>$BASENAME(index_file_name)$</file_name>
<local_identifier>index-table</local_identifier>
$IF(File_Area_Metadata['creation_date_time'])
<creation_date_time>$File_Area_Metadata['creation_date_time']$</creation_date_time>
$ELSE
<creation_date_time>$DATETIME(calculated_creation_date_time)$</creation_date_time>
$END_IF
<md5_checksum>$FILE_MD5(index_file_name)$</md5_checksum>
<comment></comment>
</File>
$ELSE
<File>
<file_name>$BASENAME(index_file_name)$</file_name>
<local_identifier>index-table</local_identifier>
<creation_date_time>$DATETIME(calculated_creation_date_time)$</creation_date_time>
<md5_checksum>$FILE_MD5(index_file_name)$</md5_checksum>
<comment></comment>
</File>
$END_IF
<Header>
<offset unit="byte">0</offset>
<object_length unit="byte">$object_length_h$</object_length>
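The three `<File>` branches above differ only in where `creation_date_time` comes from: a user-supplied value under the file-area class that matches the product class, or otherwise the calculated value. A compact sketch of that selection follows. The function name and the use of a product-class string are assumptions; the template itself tests separate `Product_Ancillary` and `Product_Metadata_Supplemental` variables.

```python
# Which configuration section applies to which product class.
FILE_AREA_FOR_PRODUCT = {
    "Product_Ancillary": "File_Area_Ancillary",
    "Product_Metadata_Supplemental": "File_Area_Metadata",
}


def choose_creation_date_time(product_class, label_contents, calculated):
    """Prefer the user-supplied creation_date_time, else the calculated one."""
    section_name = FILE_AREA_FOR_PRODUCT.get(product_class)
    # A missing or empty section falls through to the calculated value.
    section = label_contents.get(section_name) or {}
    return section.get("creation_date_time") or calculated


# Invented values, for illustration only.
user_config = {"File_Area_Ancillary": {"creation_date_time": "2024-01-01T00:00:00Z"}}
from_user = choose_creation_date_time(
    "Product_Ancillary", user_config, "2024-11-26T12:00:00Z")
from_calc = choose_creation_date_time(
    "Product_Metadata_Supplemental", user_config, "2024-11-26T12:00:00Z")
```

Note that a `File_Area_Ancillary` entry has no effect when the product class is `Product_Metadata_Supplemental`, mirroring the paired conditions in the template.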
@@ -121,7 +168,7 @@ $END_IF
<Table_Character>
<offset unit="byte"></offset>
<object_length unit="byte">$object_length_t$</object_length>
<records>$FILE_RECORDS(TEMPFILE)$</records>
<records>$FILE_RECORDS(index_file_name)$</records>
<record_delimiter>Line-Feed</record_delimiter>
<description></description>
<Record_Character>
@@ -145,7 +192,7 @@ $END_IF
<offset unit="byte">0</offset>
<object_length unit="byte">$object_length_t$</object_length>
<parsing_standard_id>PDS DSV 1</parsing_standard_id>
<records>$FILE_RECORDS(TEMPFILE)$</records>
<records>$FILE_RECORDS(index_file_name)$</records>
<record_delimiter>Line-Feed</record_delimiter>
<field_delimiter>Comma</field_delimiter>
<Record_Delimited>
@@ -166,13 +213,11 @@ $END_IF
$END_IF
$IF(Product_Ancillary)
</File_Area_Ancillary>
$END_IF
$IF(Product_Metadata_Supplemental)
$ELSE
</File_Area_Metadata>
$END_IF
$IF(Product_Ancillary)
</Product_Ancillary>
$END_IF
$IF(Product_Metadata_Supplemental)
$ELSE
</Product_Metadata_Supplemental>
$END_IF