Indexing Tool: Code Revision and Cleanup #38
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##             main      #38      +/-   ##
===========================================
+ Coverage   72.01%   96.34%   +24.33%
===========================================
  Files           1        1
  Lines         611      602        -9
  Branches      142      124       -18
===========================================
+ Hits          440      580      +140
+ Misses        141       17      -124
+ Partials       30        5       -25 |
/review |
Code Review Agent Run #91ec65
High-level Feedback
Ensure all XPath expressions follow PDS4 standards by removing non-standard suffixes like '<1>'. Update special constant values in configuration files to align with PDS4 specifications. Add comprehensive assertions to test cases to verify expected outcomes, including file content checks and error handling. Improve error messages and logging for better debugging. Consider refactoring complex functions for better maintainability. Ensure all datetime formats comply with PDS4 standards. Review and update documentation to reflect changes in functionality and configuration options.
Actionable Issues
📄 test_files/expected/tester_config.yaml
Issues: Total - 1, High importance - 1
📄 test_files/samples/element_extra_file_info.txt
Issues: Total - 1, High importance - 1
📄 tests/test_pds4_create_xml_index_blackbox.py
Issues: Total - 1, High importance - 1
📄 test_files/labels/bad_lid_label.xml
Issues: Total - 1, High importance - 1
📄 tests/test_pds4_create_xml_index_whitebox.py
Issues: Total - 1, High importance - 0
📄 test_files/expected/simplify_xpaths_success_1.txt
Issues: Total - 1, High importance - 1
📄 test_files/expected/tester_config_nillable.yaml
Issues: Total - 1, High importance - 1
|
The
|
I think I mentioned this before, but it would be nice if |
I get this result (I sorted the lines so it's easier to see relationships):
The xpath Also, I don't see how the xpath:
can even exist. There is no place in these labels where there are two Here is
Here is
|
When generating a fixed-width table, the label has the wrong positions for the fields:
Note the math does not add up (a 52 character field and then the next one starts at 26). |
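A mismatch like this can be caught mechanically. The following is only a minimal sketch, assuming each field in the label is described by a 1-based start position and a length in bytes; `find_overlaps` is a hypothetical helper, not part of the indexing tool, and the field names and numbers are illustrative rather than taken from the actual label.

```python
def find_overlaps(fields):
    """Return the names of fields that start before an earlier field ends.

    `fields` is a list of (name, offset, length) tuples in label order,
    where `offset` is the 1-based start position of the field.
    """
    overlaps = []
    next_free = 1  # first byte not yet claimed by an earlier field
    for name, offset, length in fields:
        if offset < next_free:
            overlaps.append(name)
        next_free = max(next_free, offset + length)
    return overlaps


# Illustrative values only: a 52-byte field followed by one starting at 26.
example = [("first", 1, 52), ("second", 26, 10)]
print(find_overlaps(example))
```

A check of this kind could run in a unit test against every generated fixed-width label, so that overlapping field positions fail the test suite instead of reaching a reviewer.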
In the documentation, it says:
but this is missing the last two lines:
|
In the documentation under "For reference, provided below are the full contents of the optional label classes:" I think every line should end with a colon. Right now if you copy this list into your yaml config file, it won't parse. |
If there are no references, the label contains:
It seems like in this case the section should be left out entirely. |
I'm not sure why, but this part of the label template doesn't work:
If a
If I remove that loop from the label template, it works. Is this a bug in |
The documentation for the label config is a little confusing. For example,
then it just separates them with spaces:
On the other hand,
yields
|
Likewise, let's say you put this in your config file:
You get this:
|
gives:
|
If you follow those instructions and look at the output headers file, you will see the cleaned names, which then don't actually work with the sort option. Perhaps the message should say "For a list of available sort keys, use the --output-headers-file option without --clean-header-field-names". Alternatively, you could do the sort on the cleaned names instead of the original names, which might make more sense given that those are the actual header names in the index file. |
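One way to make the sort option work with either form of the name is sketched below. This is an assumption-laden illustration, not the tool's actual behavior: `resolve_sort_keys` is a hypothetical helper, and the mapping from cleaned to original header names is assumed to be available from wherever the tool cleans the headers. The names in the example are placeholders.

```python
def resolve_sort_keys(requested, cleaned_to_original):
    """Translate requested sort keys into original header names.

    A key is accepted if it is either an original header name or a
    cleaned header name; anything else raises KeyError.
    """
    originals = set(cleaned_to_original.values())
    resolved = []
    for key in requested:
        if key in originals:
            resolved.append(key)
        elif key in cleaned_to_original:
            resolved.append(cleaned_to_original[key])
        else:
            raise KeyError(f"unknown sort key: {key}")
    return resolved


# Placeholder names; the real cleaning rule is not reproduced here.
mapping = {"a_b": "a:b", "c_d": "c:d"}
print(resolve_sort_keys(["a_b", "c:d"], mapping))
```

With something like this in place, the headers file and the sort option would agree regardless of whether --clean-header-field-names was used.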
For this option:
I suggest adding a note that the XPaths in the file must be the full, not simplified, version. This should be put in the main documentation, too. |
The column byte positions in the label are wrong for fixed-width index files. Consider:
The first three columns actually start at 1, 80, and 151, but the label says they start at 1, 79, and 150. Also, any column that has a string in quotes, like |
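For comparison, the expected start positions can be derived directly from a row of the index file. The sketch below is a rough illustration under stated assumptions: `column_starts` is a hypothetical helper, the delimiter is a single comma, and no value contains the delimiter itself. Quote characters are counted because they occupy bytes in the file.

```python
def column_starts(row, delimiter=","):
    """Return the 1-based byte position at which each column starts.

    Assumes the delimiter never appears inside a value. Positions are
    counted in bytes of the UTF-8 encoding, so quote characters and
    padding spaces all contribute to the offsets.
    """
    starts = []
    position = 1
    sep = delimiter.encode("utf-8")
    for value in row.encode("utf-8").split(sep):
        starts.append(position)
        # Advance past the value and the delimiter that follows it.
        position += len(value) + len(sep)
    return starts


# Illustrative row: a quoted value, a padded value, and a short value.
print(column_starts('"ab",cd  ,e'))
```

Comparing these positions with the ones written to the label would expose an off-by-one of the kind reported here, for example when a delimiter or a pair of quotes is left out of the running total.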
In the
I assume this general issue could be true for the other optional label fields as well. Perhaps for the optional fields there should be IF statements in the template to allow the individual fields to be omitted? |
The indexing tool pds4_create_xml_index.py contained code that was redundant/nonfunctional. This pull request contains a revision of a large portion of the indexing tool, along with redone supplemental files and 100% unit test coverage.
pds4_create_xml_index tool:
get_true_type and sort_dataframe.
label_results to have a simplified structure, allowing for easier data extraction and header modification.
creation_date_time value, under the File class in File_Area_Ancillary/File_Area_Metadata
md5_checksum was different between operating systems was fixed.
Fixes #37
Summary by Bito
This PR enhances the PDS4 XML index creation tool with expanded test coverage, improved error handling, and optimized processing. Key updates include refactoring the 'get_true_type' function, enhancing XML template handling, and optimizing XPath processing. New configuration files and test cases were added to validate these changes.
Code change type: Refactoring, Testing, Optimization, Documentation
Unit tests added: True
Estimated effort to review (1-5, lower is better): 5