custom_target and UMCG lesion_PIRADS (#8)
### Data Processing Enhancements:
* [`src/dragon_prep/prepare_umcg.py`](diffhunk://#diff-317657660713de29f27d6c002d61cdce86787751e6188f62c09cf05f28e7fc59R53):
Added a new field `lesion_PIRADS` to the `umcg_entries` dictionary and
cleaned up the notation by replacing "NA" with "N/A".
[[1]](diffhunk://#diff-317657660713de29f27d6c002d61cdce86787751e6188f62c09cf05f28e7fc59R53)
[[2]](diffhunk://#diff-317657660713de29f27d6c002d61cdce86787751e6188f62c09cf05f28e7fc59R68-R70)
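
The two steps described above (joining per-lesion scores into one string per study, then normalising the notation) can be sketched roughly as follows. This is a minimal illustration rather than the repository's code: the toy `lesions` table, the `uid` grouping key, and the `groupby` loop are assumptions, and only the `",".join(study_df.pirads.values)` and `.str.replace("NA", "N/A")` expressions are taken from the diff.

```python
import pandas as pd

# Hypothetical per-lesion table; only the `pirads` column name comes from the diff.
lesions = pd.DataFrame({
    "uid": ["study_a", "study_a", "study_b"],
    "pirads": ["4", "NA", "2"],
})

entries = []
for uid, study_df in lesions.groupby("uid"):
    entries.append({
        "uid": uid,
        # One comma-separated string per study, as in the commit.
        "lesion_PIRADS": ",".join(study_df.pirads.values),
    })

df = pd.DataFrame(entries)

# Clean up notation: literal (non-regex) replacement of "NA" with "N/A".
df["lesion_PIRADS"] = df["lesion_PIRADS"].str.replace("NA", "N/A", regex=False)

lesion_pirads = df.set_index("uid")["lesion_PIRADS"].to_dict()
```

Note that the replacement matches the substring `NA` anywhere in the joined string, so this sketch assumes the scores never contain `NA` as part of a longer token.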

### Classification Target Enhancements:
* [`src/dragon_prep/utils.py`](diffhunk://#diff-65eab55e9c448f1d168962379b7c7ca3fcb3021ffdfb45023c6bed36edd41c8dR46):
Added a new `custom_target` type to the list of classification targets
and updated the `split_and_save_data` function to include this new
target type.
[[1]](diffhunk://#diff-65eab55e9c448f1d168962379b7c7ca3fcb3021ffdfb45023c6bed36edd41c8dR46)
[[2]](diffhunk://#diff-65eab55e9c448f1d168962379b7c7ca3fcb3021ffdfb45023c6bed36edd41c8dR310)
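
As a rough illustration of what adding an entry to the list of target types enables, the sketch below checks a requested target type against an allow-list. The names `ALLOWED_TARGET_TYPES` and `check_target_type` are hypothetical (the diff shows the list entries but not the variable they belong to, nor how `split_and_save_data` consumes them), and the list is abbreviated to the entries visible in the diff.

```python
# Hypothetical name; abbreviated to the entries visible in the diff.
ALLOWED_TARGET_TYPES = [
    "single_label_binary_classification_target",
    "multi_label_binary_classification_target",
    "multi_label_named_entity_recognition_target",
    "custom_target",  # added in this commit (experimental)
]


def check_target_type(target_type: str) -> str:
    """Return the target type unchanged, or raise if it is not in the list."""
    if target_type not in ALLOWED_TARGET_TYPES:
        raise ValueError(f"Unknown target type: {target_type!r}")
    return target_type
```

Under this assumed design, `check_target_type("custom_target")` passes after the commit, whereas an unlisted name raises a `ValueError`.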
joeranbosma authored Oct 28, 2024
1 parent 72f6891 commit 33e7166
Showing 5 changed files with 9 additions and 3 deletions.
2 changes: 1 addition & 1 deletion build.sh
@@ -1 +1 @@
-docker build . --tag dragon_prep:latest --tag dragon_prep:v0.2.5
+docker build . --tag dragon_prep:latest --tag dragon_prep:v0.2.6
2 changes: 1 addition & 1 deletion setup.py
@@ -5,7 +5,7 @@
long_description = fh.read()

setuptools.setup(
-version='0.2.5',
+version='0.2.6',
author_email='[email protected]',
long_description=long_description,
long_description_content_type="text/markdown",
4 changes: 4 additions & 0 deletions src/dragon_prep/prepare_umcg.py
@@ -50,6 +50,7 @@ def prepare_umcg_radiology_reports(input_dir: Path) -> pd.DataFrame:

umcg_entries.append({
"num_pirads_345": sum([score >= 3 for score in scores]),
+"lesion_PIRADS": ",".join(study_df.pirads.values),
**study_df.iloc[0].to_dict(),
})

@@ -64,6 +65,9 @@ def prepare_umcg_radiology_reports(input_dir: Path) -> pd.DataFrame:
# merge with radiology reports
df["text"] = df["uid"].map(df_reports.text.to_dict())

+# clean up notation
+df["lesion_PIRADS"] = df["lesion_PIRADS"].str.replace("NA", "N/A")
+
return df


2 changes: 2 additions & 0 deletions src/dragon_prep/utils.py
@@ -43,6 +43,7 @@
"single_label_binary_classification_target",
"multi_label_binary_classification_target",
"multi_label_named_entity_recognition_target",
+"custom_target",
]


@@ -306,6 +307,7 @@ def split_and_save_data(
- "multi_label_multi_class_classification_target": must be an array of strings for each case.
- "single_label_binary_classification_target": must be an int for each case.
- "multi_label_binary_classification_target": must be an array of ints (each either 0 or 1) for each case.
+- "custom_target": custom target column (experimental).
task_name (str): The name of the task. The splits will be saved in a directory with this name.
output_dir (Path or str, optional): The output directory. Defaults to None (don't save anything).
seed (int, optional): The random seed. Defaults to 42.
2 changes: 1 addition & 1 deletion tests/development-README.md
@@ -33,4 +33,4 @@ AutoPEP8 for formatting (this can be done automatically on save, see e.g. https:
# Push release to PyPI
1. Increase version in setup.py, and set below
2. Build: `python -m build`
-3. Distribute package to PyPI: `python -m twine upload dist/*0.2.5*`
+3. Distribute package to PyPI: `python -m twine upload dist/*0.2.6*`
