Created scripts/hhs_doi_cli.py and updated dependencies in pyproject.… #11

joshlawrimore · 2024-11-27T19:00:28Z

…toml.

@agt24 modified the filter_cli.py CLI to include DOI extraction with pdf2doi. I modified it further and created this PR. @leej3, if it doesn't break anything, please approve the review. I'm happy to take any suggestions.

quang-ng

Can we relocate the processing code to the dsst-etl folder and create a unit test for it? The script here should only call the function.

quang-ng · 2024-11-29T10:16:17Z

scripts/hhs_doi_cli.py

+    output_exists: bool = output_csv.exists()
+
+    if output_exists and new_run is False:
+        print("Removing previously processed PDFs")


Could we use a logger instead of print?

Done! Set up a package-level logger at dsst_etl.logger
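For readers following the thread, a package-level logger of the kind described could look roughly like the sketch below. The logger name "dsst_etl", the helper `configure_logger`, and the format string are assumptions for illustration, not the code merged in this PR.

```python
import logging

# Assumption: the package logger is named after the package itself.
logger = logging.getLogger("dsst_etl")


def configure_logger(level: int = logging.INFO) -> logging.Logger:
    """Attach one stream handler to the package logger (safe to call repeatedly)."""
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger


configure_logger()
# Replaces the print() call flagged in the review.
logger.info("Removing previously processed PDFs")
```

Modules in the package can then import this one logger instead of calling print, and the level can be raised or lowered in a single place.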

quang-ng · 2024-11-29T10:17:19Z

scripts/hhs_doi_cli.py

+            pdfs = list(pdfs)
+
+    dicts: list[dict] = []
+    with open(output_csv, "a") as f_out:


This is too complex. Could we break it into smaller functions?

I agree. I broke the main function into 3 and then moved all functionality outside of argparse into dsst_etl.hhs_doi.py
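A minimal sketch of what a three-function split might look like is shown below. The function names (`select_pdfs`, `build_row`, `append_rows`), the CSV column names, and the injected `extract` callable are all hypothetical; the dictionary keys mirror the `doi_info.get(...)` calls in the diff, and the actual split in the package may differ.

```python
import csv
from pathlib import Path
from typing import Callable, Iterable

# Assumed column layout for the output CSV.
FIELDS = ["file", "identifier", "identifier_type", "extraction_method"]


def select_pdfs(pdf_dir: Path, already_done: set[str]) -> list[Path]:
    """Return PDFs under pdf_dir whose file names are not already recorded."""
    return sorted(p for p in pdf_dir.rglob("*.pdf") if p.name not in already_done)


def build_row(pdf: Path, extract: Callable[[Path], dict]) -> dict:
    """Run the extractor on one PDF and map its result to a CSV row."""
    doi_info = extract(pdf)
    return {
        "file": pdf.name,
        "identifier": doi_info.get("identifier"),
        "identifier_type": doi_info.get("identifier_type"),
        "extraction_method": doi_info.get("method"),
    }


def append_rows(output_csv: Path, rows: Iterable[dict]) -> None:
    """Append rows to the CSV, writing the header only when the file is new."""
    write_header = not output_csv.exists()
    with open(output_csv, "a", newline="") as f_out:
        writer = csv.DictWriter(f_out, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
```

Passing the extractor in as a callable keeps each piece testable without real PDFs: a test can supply a stub that returns a fixed dictionary.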

quang-ng · 2024-11-29T10:17:57Z

scripts/hhs_doi_cli.py

+                    identifier = doi_info.get("identifier")
+                    identifier_type = doi_info.get("identifier_type")
+                    extraction_method = doi_info.get("method")
+                except Exception:


Insert a logger here so we can identify which exception occurs.
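One way to act on this suggestion is `logger.exception`, which records the message together with the traceback of the exception being handled. The sketch below assumes a logger named "dsst_etl" and uses a hypothetical wrapper `safe_extract` with an injected `extract` callable standing in for the real extraction call.

```python
import logging

logger = logging.getLogger("dsst_etl")  # assumed logger name


def safe_extract(pdf_path, extract):
    """Call extract(pdf_path); on failure, log the traceback and return empty fields."""
    try:
        return extract(pdf_path)
    except Exception:
        # logger.exception logs at ERROR level and appends the active traceback,
        # so the exception type and message are visible in the log.
        logger.exception("DOI extraction failed for %s", pdf_path)
        return {"identifier": None, "identifier_type": None, "method": None}
```

Returning a dictionary of None values lets the caller keep processing the remaining PDFs while the log shows which file failed and why.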

Co-authored-by: Quang Nguyen <[email protected]>

leej3

I haven't run the code but it looks reasonable to merge. I'm assuming this functionality will be reviewed/used in upcoming development.

I agree with @quang-ng's points:

Using different logging levels will make things easier to debug in the long run.
Files in the scripts directory should contain little more than a CLI; other functionality should live in the package and, ideally, be tested.
The function is too large, which will harm maintainability. The deep nesting can be avoided by moving the core functionality into its own function.
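The "thin CLI" layout described in these points can be sketched as follows. `process_pdfs` is a hypothetical stand-in for a function that would live in the package, and the argument names (`pdf_dir`, `output_csv`, `--new-run`) are assumptions loosely based on the variables visible in the diff.

```python
import argparse
from pathlib import Path


def process_pdfs(pdf_dir: Path, output_csv: Path, new_run: bool) -> int:
    """Hypothetical stand-in for the package function; returns the number of PDFs found."""
    return len(list(pdf_dir.rglob("*.pdf")))


def build_parser() -> argparse.ArgumentParser:
    """Define the command-line interface only; no processing logic lives here."""
    parser = argparse.ArgumentParser(description="Extract DOIs from a directory of PDFs.")
    parser.add_argument("pdf_dir", type=Path, help="directory containing PDFs")
    parser.add_argument("output_csv", type=Path, help="CSV file to append results to")
    parser.add_argument("--new-run", action="store_true", help="ignore earlier results")
    return parser


def main(argv=None) -> int:
    """Parse arguments and delegate all work to the package function."""
    args = build_parser().parse_args(argv)
    count = process_pdfs(args.pdf_dir, args.output_csv, args.new_run)
    print(f"Found {count} PDFs")
    return 0
```

A script in scripts/ would then only need to call `main()` under an `if __name__ == "__main__":` guard, while unit tests target `process_pdfs` directly.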

quang-ng

Great!!! 🚀 🚀 🚀 🚀 🚀 🚀 🚀

Created scripts/hhs_doi_cli.py and updated dependencies in pyproject.…

d45e5ee

…toml

quang-ng reviewed Nov 29, 2024

Update scripts/hhs_doi_cli.py

48ee376

Co-authored-by: Quang Nguyen <[email protected]>

leej3 approved these changes Nov 29, 2024

joshlawrimore added 3 commits November 29, 2024 14:13

Created a package logger, refactored /scripts/hhs_doi_cli.py into a m…

090c88e

…odule in the dsst-etl package.

Fixed failing isort tests

71fa678

Removed doc-string references to logger variable

cf670d0

quang-ng approved these changes Dec 2, 2024

Changed the Path to be relative to the inputted pdf_dir rather than t…

354717f

…he current working directory.
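As an illustration of the idea in this commit message, a path can be expressed relative to the input directory with `pathlib`. The helper name `relative_to_input` is hypothetical, and the commit itself may do this differently.

```python
from pathlib import Path


def relative_to_input(pdf: Path, pdf_dir: Path) -> str:
    """Express a PDF path relative to the input directory, not the working directory."""
    # Resolving both sides first makes the result independent of where the
    # script was launched from and of any symlinks in the path.
    return pdf.resolve().relative_to(pdf_dir.resolve()).as_posix()
```

Recording paths this way keeps the output stable when the same directory of PDFs is processed from different working directories.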

joshlawrimore merged commit d372ddb into main Dec 2, 2024
4 checks passed

joshlawrimore deleted the filter branch December 2, 2024 17:40
