rklocke committed Dec 20, 2024
1 parent 79c054f commit 4a2f3b3
Showing 3 changed files with 18 additions and 36 deletions.
14 changes: 7 additions & 7 deletions DI-1299/README.md
@@ -1,9 +1,9 @@
# Compare Somalier outputs in GRCh37 and GRCh38

### Description
## Description
The `compare_b37_and_b38.py` script is used to compare Somalier Predicted_Sex values for samples in GRCh37 and GRCh38 and return any mismatches.

### Inputs
## Inputs
- `--config (str)`: Path to a config file with relevant variables for the assay.

Example config:
@@ -33,21 +33,21 @@ Example config:
}
```

### Running
## Running
Example command:
```
```bash
python compare_b37_and_b38.py --config TWE_config.json
```

### How it works
## How it works
The script:
- Finds all projects using the `search_term` in GRCh38 in DNAnexus.
- Gets the related GRCh37 project for each GRCh38 project with the prefix `002_` and the suffix `_{assay}`.
- Within the GRCh37 and GRCh38 projects, finds all files by `filename` and reads them all in with pandas.
- Within the GRCh37 and GRCh38 projects, finds all files by `filename` and reads them all in with pandas.
- Identifies any mismatches in `column_to_compare` between the GRCh37 and GRCh38 values for each sample, so that they can be investigated.
- If mismatches are found, plots the variables in `variables_to_plot` to look at differences in these values for each sample between genome builds.

### Output
## Output
- `{assay}_all_results.tsv`: A TSV with Somalier results for all samples found; headers have '_GRCh37' and '_GRCh38' suffixes to show which genome the result is from.
- `{assay}_all_mismatches.tsv`: A TSV with all rows where there is a mismatch between the `column_to_compare` value in GRCh37 and GRCh38.
- If there are samples with mismatches found, one plot is created for all metrics included in each nested list within `variables_to_plot` for each sample in GRCh37 and GRCh38. Each variable in each nested list is plotted as a scatter plot, one per row, and each plot is saved to HTML.
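
The comparison described under "How it works" can be sketched roughly as follows. This is a minimal illustration and not the code in `compare_b37_and_b38.py`: it uses plain dictionaries from the standard library instead of pandas, it keeps only samples present in both builds (an assumption), and the `find_mismatches` helper, the `sample_id` column name and the example values are all hypothetical.

```python
def find_mismatches(rows_b37, rows_b38, sample_column, column_to_compare):
    """Join rows per sample and return (all_results, mismatches)."""
    # Index the GRCh38 rows by sample so each GRCh37 row can find its partner.
    b38_by_sample = {row[sample_column]: row for row in rows_b38}
    all_results, mismatches = [], []
    for row37 in rows_b37:
        row38 = b38_by_sample.get(row37[sample_column])
        if row38 is None:
            continue  # assumption: skip samples not found in both builds
        merged = {sample_column: row37[sample_column]}
        # Suffix every other column with the genome build it came from.
        for suffix, row in (("_GRCh37", row37), ("_GRCh38", row38)):
            for key, value in row.items():
                if key != sample_column:
                    merged[key + suffix] = value
        all_results.append(merged)
        if merged[column_to_compare + "_GRCh37"] != merged[column_to_compare + "_GRCh38"]:
            mismatches.append(merged)
    return all_results, mismatches


# Made-up example data for two samples.
rows_b37 = [
    {"sample_id": "S1", "Predicted_Sex": "female"},
    {"sample_id": "S2", "Predicted_Sex": "male"},
]
rows_b38 = [
    {"sample_id": "S1", "Predicted_Sex": "female"},
    {"sample_id": "S2", "Predicted_Sex": "female"},
]

all_results, mismatches = find_mismatches(
    rows_b37, rows_b38, "sample_id", "Predicted_Sex"
)
print([row["sample_id"] for row in mismatches])
```

Here only `S2` is reported, because its `Predicted_Sex` value differs between the two builds. The suffixed keys mirror the `_GRCh37` and `_GRCh38` headers described for `{assay}_all_results.tsv`.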
20 changes: 1 addition & 19 deletions DI-1299/compare_b37_and_b38.py
@@ -89,25 +89,7 @@ def get_config_info(config_dict):
        "variables_to_plot",
    ]

    (
        assay,
        search_term,
        number_of_projects,
        filename,
        column_to_compare,
        sample_column,
        variables_to_plot,
    ) = list(map(config_dict.get, keys))

    return (
        assay,
        search_term,
        number_of_projects,
        filename,
        column_to_compare,
        sample_column,
        variables_to_plot,
    )
    return list(map(config_dict.get, keys))


def find_projects(search_term, number_of_projects=None):
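
As a rough check of the simplified `get_config_info` return above, the sketch below shows that a caller can still unpack the returned list into seven names. The full `keys` list is not visible in this hunk, so the key names here are assumed to match the unpacked variable names, and the config values are made up for illustration.

```python
def get_config_info(config_dict):
    # Assumed key names: the hunk only shows the last entry of the real list.
    keys = [
        "assay",
        "search_term",
        "number_of_projects",
        "filename",
        "column_to_compare",
        "sample_column",
        "variables_to_plot",
    ]
    # dict.get returns None for any key missing from the config.
    return list(map(config_dict.get, keys))


# Hypothetical config; "number_of_projects" is deliberately left out.
example_config = {
    "assay": "TWE",
    "search_term": "example_search_term",
    "filename": "example_filename.tsv",
    "column_to_compare": "Predicted_Sex",
    "sample_column": "sample_id",
    "variables_to_plot": [["metric_a", "metric_b"]],
}

# The caller unpacks the list in the same order as the keys.
(
    assay,
    search_term,
    number_of_projects,
    filename,
    column_to_compare,
    sample_column,
    variables_to_plot,
) = get_config_info(example_config)

print(assay, number_of_projects)
```

Because `dict.get` is used, a key that is absent from the config comes back as `None` rather than raising `KeyError`, so the order of `keys` is the only thing tying each value to its name.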
20 changes: 10 additions & 10 deletions README.md
@@ -3,14 +3,14 @@ Repository for code related to small tasks for supporting the RD service.

| Ticket | Summary |
| --- | --- |
| [EBH-3050] | Determine how many workbooks were released/not released (based on number of variants filtered) per clinical indication since new filtering introduced
| [DI-773] | Gather and plot QC metrics
| [DI-1057] | Validate East GLH RD Test Directory hosted on AWS RDS
| [DI-1094] | CEN/WES GRCh38 egg_sex_check thresholds - August 2024
| [DI-1189] | Create new PanelApp JSON file
| [DI-435] | Find and merge VCFs for creation of a GRCh38 POP AF VCF
| [DI-1294] | Remove any content from the PID fields in workbooks in a given path
| [DI-1299] | Compare values between GRCh37 and GRCh38 for Somalier
| [EBH-3050] | Determine how many workbooks were released/not released (based on number of variants filtered) per clinical indication since new filtering introduced |
| [DI-773] | Gather and plot QC metrics |
| [DI-1057] | Validate East GLH RD Test Directory hosted on AWS RDS |
| [DI-1094] | CEN/WES GRCh38 egg_sex_check thresholds - August 2024 |
| [DI-1189] | Create new PanelApp JSON file |
| [DI-435] | Find and merge VCFs for creation of a GRCh38 POP AF VCF |
| [DI-1294] | Remove any content from the PID fields in workbooks in a given path |
| [DI-1299] | Compare values between GRCh37 and GRCh38 for Somalier |


[EBH-3050]: https://cuhbioinformatics.atlassian.net/browse/EBH-3050
@@ -19,5 +19,5 @@ Repository for code related to small tasks for supporting the RD service.
[DI-1094]: https://cuhbioinformatics.atlassian.net/browse/DI-1094
[DI-1189]: https://cuhbioinformatics.atlassian.net/browse/DI-1189
[DI-435]: https://cuhbioinformatics.atlassian.net/browse/DI-435
[DI-1294]: https://cuhbioinformatics.atlassian.net/issues/DI-1294
[DI-1299]: https://cuhbioinformatics.atlassian.net/browse/DI-1299
[DI-1294]: https://cuhbioinformatics.atlassian.net/browse/DI-1294
[DI-1299]: https://cuhbioinformatics.atlassian.net/browse/DI-1299
