Beta v0.10.0 #184

Merged on Sep 18, 2024 (84 commits)
6032359
Use scm for versioning and -v to get version #179
timothymillar May 8, 2024
f3fabf9
Add more optional --report fields #174
timothymillar May 9, 2024
831f108
Improve handling of INFO and FORMAT fields
timothymillar May 9, 2024
a577a49
Add UAN and MCI INFO fields
timothymillar May 10, 2024
4f1b6b2
Rename fields PHQ and PHPM to SQ and SPM
timothymillar May 10, 2024
dbf1c3e
Test AOPSUM with AOP
timothymillar May 10, 2024
c99d2fa
Test SNVDP optional field
timothymillar May 10, 2024
d08bec0
Add tests for optional ACP fields
timothymillar May 10, 2024
b81daa0
Update CLI help text
timothymillar May 10, 2024
f398441
Update docs
timothymillar May 12, 2024
498f6b2
Update example
timothymillar May 12, 2024
0237a1e
Changelog
timothymillar May 13, 2024
ac71155
Initial pedigree submodule
timothymillar Jul 29, 2022
eb28099
Correct PMF for second gamete of unknown origin
timothymillar Jul 29, 2022
3265e67
Docstrings
timothymillar Jul 29, 2022
7563915
Rename pedigree sampler and add docstring
timothymillar Aug 1, 2022
40f5f24
Ensure zero-count reads removed from likelihood function
timothymillar Aug 1, 2022
cb84a02
Introduce H-K lambda parameter for diploid gametes
timothymillar Oct 31, 2022
9c53f8c
Tests
timothymillar Oct 31, 2022
9f11db8
Tests for log_gamete_pmf
timothymillar Oct 31, 2022
1a96491
Test second_gamete_log_pmf
timothymillar Oct 31, 2022
76aeae0
Reorder test perms
timothymillar Nov 1, 2022
cf2d5d3
Initial PedigreeCallingMCMC class
timothymillar Nov 1, 2022
d6b9a72
Initial call-pedigree CLI
timothymillar Nov 6, 2022
8d37298
Report pedigree posterior error PEDERR
timothymillar Nov 7, 2022
b608eec
Test pedigree validation
timothymillar Nov 7, 2022
33f3e78
Improve ploidy and inbreeding handling with pedigree
timothymillar Nov 8, 2022
2341e8f
Note to handle read io for dummy samples
timothymillar Nov 8, 2022
38d7104
Separate out mh-probabilities function
timothymillar Nov 8, 2022
fe07831
Simplify use of lambda in gamete PMF
timothymillar Nov 8, 2022
65a0366
Add gamete_allele_log_pmf
timothymillar Nov 8, 2022
c462722
Fix pedigree MH step proposal ratios
timothymillar Nov 15, 2022
d6802a6
WIP pedigree gibbs step
timothymillar Nov 15, 2022
66f3a92
Fix gibbs prior
timothymillar Jun 14, 2023
8620ad9
fixup
timothymillar Jun 19, 2023
cf17e92
assume unknown gametes are outcrossed
timothymillar Jun 19, 2023
c44dcd6
Add Gibbs option to pedigree MCMC
timothymillar Jun 20, 2023
0812217
Fix tests for pedigree prior
timothymillar Jun 20, 2023
adf7caa
Update call_pedigree for arg changes in main
timothymillar Nov 3, 2023
95a1f05
Handle samples with no alignmentfiles
timothymillar Nov 3, 2023
c7b73ec
Fix whitespace
timothymillar Nov 6, 2023
fd5bc53
Fix bug in log_unknown_const_prior
timothymillar Nov 6, 2023
000f8ab
Correct markov_blanket_log_allele_probability
timothymillar Nov 6, 2023
30cac1c
Improve testing of pedigree prior
timothymillar Nov 8, 2023
2127cd0
Add test for gamete_const_log_pmf
timothymillar Nov 8, 2023
526dbd0
Handle edge cases
timothymillar Nov 8, 2023
65e8735
Document details of trio_allele_log_pmf
timothymillar Nov 9, 2023
bf83592
Add more pedigree transition probabilities functions
timothymillar Nov 9, 2023
d101820
Test for PedigreeCallingMCMC
timothymillar Nov 10, 2023
1b64cc3
Allow prior on frequencies in pedigree calling
timothymillar Nov 15, 2023
70630e4
Clean up call application handling of prior freqs
timothymillar Nov 15, 2023
7c87670
Pass prior freqs to pedigree calling application
timothymillar Nov 15, 2023
114e10d
Tests for PedigreeAllelesMultiTrace
timothymillar Nov 15, 2023
88a3578
Remove unsupported inbreeding argument from call-pedigree
timothymillar Nov 15, 2023
aec9b7d
Add experimental warning to call-pedigree
timothymillar Nov 15, 2023
e604042
Tests for mchap call-pedigree cli
timothymillar Nov 15, 2023
8632fca
Use sample_children matrix
timothymillar Nov 24, 2023
e660ac2
Cache smaller combinatorial values
timothymillar Nov 29, 2023
f5b1895
Reuse dosage arrays in pedigree mcmc
timothymillar Nov 30, 2023
e526784
Update pedigree prior docstrings and tests
timothymillar Nov 30, 2023
17dd87b
Use a scratch array for dosage allele freqs
timothymillar Nov 30, 2023
7d83fa0
Initial implementation of parental allele swap step
timothymillar Dec 11, 2023
f81f3e1
Fixup after rebase on 0.9.1
timothymillar Dec 12, 2023
3c3426c
Bump version in test VCFs
timothymillar Mar 13, 2024
175271b
Change default for --gamete-error to 0.01
timothymillar Mar 13, 2024
eee6b03
Bump version in test VCFs
timothymillar Mar 14, 2024
ad9521d
Add draft call-pedigree tutorial
timothymillar Mar 14, 2024
c045291
Update setup.py
rfinkers Apr 4, 2024
66aacb0
Add call-pedigree branch to CI testing
timothymillar Apr 4, 2024
a206636
Force formatting with black
timothymillar Apr 4, 2024
6c38c5a
Rebase call-pedigree on master
timothymillar Jul 10, 2024
719498f
Test new optional fields for call-pedigree
timothymillar Jul 10, 2024
d32a90c
Minor fixes for numpy2
timothymillar Jul 10, 2024
4de1ca0
Pin numpy to 1.x https://github.com/numpy/numpy/issues/26898
timothymillar Jul 10, 2024
32e4fe9
Add atomize application
timothymillar May 13, 2024
8c42b7d
Tests for atomize tool
timothymillar May 13, 2024
0e13f9f
Add INFO AC, ACP and DP fields to atomize
timothymillar Jun 14, 2024
8489149
Update tests for atomize
timothymillar Jun 16, 2024
4fcdbd5
Additional tests for atomize
timothymillar Jul 10, 2024
1847e90
Update examples
timothymillar Sep 10, 2024
a7b57b9
Mark atomize as experimental
timothymillar Sep 10, 2024
e71a094
Update changelog
timothymillar Sep 10, 2024
e8839ea
Run pre-commit on cli.py
timothymillar Sep 10, 2024
4abb69c
Flag experimental tools in readme
timothymillar Sep 18, 2024
9 changes: 4 additions & 5 deletions .github/workflows/python-package.yml
@@ -5,17 +5,17 @@ name: Python package

 on:
   push:
-    branches: [ master ]
+    branches: [ master, "call-pedigree"]
   pull_request:
-    branches: [ master ]
+    branches: [ master, "call-pedigree"]

 jobs:
   build:

     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: ["3.8", "3.9", "3.10", "3.11"]
+        python-version: ["3.10", "3.11"]

     steps:
     - uses: actions/checkout@v2
@@ -32,8 +32,7 @@ jobs:
       uses: pre-commit/[email protected]
     - name: Build and install mchap
       run: |
-        python setup.py sdist
-        pip install dist/mchap-*.tar.gz
+        pip install .
     - name: Test with pytest (bounds checked)
       env:
         NUMBA_BOUNDSCHECK: 1
18 changes: 18 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,24 @@
 ## Unreleased


+## Beta v0.10.0
+
+New Features:
+- New experimental `atomize` tool for splitting haplotypes into basis SNVs #72.
+- New experimental `call-pedigree` tool for pedigree informed genotype calling.
+- Optionally specify just the `INFO` or `FORMAT` variant of an optional VCF field.
+- Use `setuptools_scm` for versioning #179.
+
+VCF Changes:
+- Renamed `PHQ` and `PHPM` to `SQ` and `SPM` for clarity.
+- Added `INFO/UAN` field for number of unique alleles called #174.
+- Added `INFO/MCI` field for proportion of samples with Markov chain incongruence.
+- Added optional fields #174:
+  * `INFO/AOPSUM` (sum of `FORMAT/AOP`).
+  * `INFO/ACP` and `FORMAT/ACP`.
+  * `INFO/SNVDP` and `FORMAT/SNVDP`.
+
+
 ## Beta v0.9.3

 Bug Fixes:
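The changelog describes `INFO/AOPSUM` as the sum of `FORMAT/AOP` and `INFO/UAN` as the number of unique alleles called. The following is a minimal sketch of those two summaries on toy data. The function names, the list-of-lists layout, and the use of `-1` for missing alleles are assumptions made for illustration and are not taken from the MCHap source.

```python
# Hypothetical illustration only; not MCHap's implementation.


def aopsum(format_aop):
    """Sum per-sample allele occurrence probabilities for each allele.

    format_aop holds one list per sample, with the posterior probability
    that each allele occurs in that sample (FORMAT/AOP-like values).
    """
    n_alleles = len(format_aop[0])
    return [sum(sample[i] for sample in format_aop) for i in range(n_alleles)]


def unique_alleles_called(genotypes):
    """Count distinct alleles across all called genotypes (INFO/UAN-like).

    Missing alleles are encoded as -1 here (an assumption of this sketch).
    """
    return len({allele for call in genotypes for allele in call if allele >= 0})


# Three tetraploid samples and three alleles (toy values).
aop = [
    [1.0, 0.5, 0.0],
    [1.0, 0.0, 0.25],
    [0.5, 1.0, 0.0],
]
calls = [[0, 0, 0, 1], [0, 0, 0, 2], [0, 1, 1, 1]]

print(aopsum(aop))
print(unique_alleles_called(calls))
```

Under this reading, `INFO/AOPSUM` can be interpreted as the expected number of samples carrying each allele, which matches the wording of the `--report` help text.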
14 changes: 13 additions & 1 deletion README.rst
@@ -66,7 +66,17 @@ frequencies (estimated from the mean of individual frequencies), but no genotype
 Example notebook
 ----------------

-An `example notebook`_ demonstrating genotype calling with MCHap in a bi-parental population.
+See the `example notebook`_ demonstrating genotype calling with MCHap in a bi-parental population.
+
+Experimental features
+---------------------
+
+\:warning: **WARNING: The following tools are highly experimental!!!** :warning:
+
+- ``mchap call-pedigree``: for pedigree informed genotype calling.
+- ``mchap atomize``: for converting micro-haplotype calls to phased sets of SNVs.
+
+See the `experimental notebook`_ demonstrating the `call-pedigree` tool as presented at the 2024 `Tools for Polyploids`_ workshop.

 Funding
 -------
@@ -80,3 +90,5 @@ The development of MCHap was partially funded by the "Tools for Polyploids" Spec
 .. _`MCHap assemble documentation`: docs/assemble.rst
 .. _`MCHap call documentation`: docs/call.rst
 .. _`example notebook`: docs/example/bi-parental.ipynb
+.. _`experimental notebook`: docs/example/bi-parental-pedigree.ipynb
+.. _`Tools for Polyploids`: https://www.polyploids.org/
25 changes: 17 additions & 8 deletions cli-assemble-help.txt
@@ -115,11 +115,21 @@ options:
                         The chosen field determines the sample ids required in
                         other input files e.g. the --sample-list argument.
   --report [REPORT ...]
-                        Extra fields to report within the output VCF: AFPRIOR
-                        = prior allele frequencies; AFP = posterior mean
-                        allele frequencies; AOP = posterior probability of
-                        allele occurring at any copy number; GP = genotype
-                        posterior probabilities; GL = genotype likelihoods.
+                        Extra fields to report within the output VCF. The
+                        INFO/FORMAT prefix may be omitted to return both
+                        variations of the named field. Options include:
+                        INFO/AFPRIOR = Prior allele frequencies; INFO/ACP =
+                        Posterior allele counts; INFO/AFP = Posterior mean
+                        allele frequencies; INFO/AOP = Posterior probability
+                        of allele occurring across all samples; INFO/AOPSUM =
+                        Posterior estimate of the number of samples containing
+                        an allele; INFO/SNVDP = Read depth at each SNV
+                        position; FORMAT/ACP: Posterior allele counts;
+                        FORMAT/AFP: Posterior mean allele frequencies;
+                        FORMAT/AOP: Posterior probability of allele occurring;
+                        FORMAT/GP: Genotype posterior probabilities;
+                        FORMAT/GL: Genotype likelihoods; FORMAT/SNVDP: Read
+                        depth at each SNV position
   --cores CORES         Number of cpu cores to use (default = 1).
   --mcmc-chains MCMC_CHAINS
                         Number of independent MCMC chains per assembly
@@ -133,9 +143,8 @@ options:
   --mcmc-seed MCMC_SEED
                         Random seed for MCMC (default = 42).
   --mcmc-chain-incongruence-threshold MCMC_CHAIN_INCONGRUENCE_THRESHOLD
-                        Posterior phenotype probability threshold for
-                        identification of incongruent posterior modes (default
-                        = 0.60).
+                        Posterior probability threshold for identification of
+                        incongruent posterior modes (default = 0.60).
   --mcmc-fix-homozygous MCMC_FIX_HOMOZYGOUS
                         Fix alleles that are homozygous with a probability
                         greater than or equal to the specified value (default
15 changes: 15 additions & 0 deletions cli-atomize-help.txt
@@ -0,0 +1,15 @@
+usage: Split MCHap haplotype calls into phased blocks of basis SNVs.
+       [-h] haplotypes
+
+positional arguments:
+  haplotypes  VCF file containing haplotype variants to be atomized. This file
+              must contain INFO/SNVPOS. The INFO/DP and FORMAT/DP fields will
+              be calculated from FORMAT/SNVDP if present in the input VCF
+              file. The INFO/ACP and FORMAT/DS fields will be calculated from
+              FORMAT/ACP or FORMAT/AFP if either is present in the input VCF
+              file. Note that the FORMAT/ACP or FORMAT/AFP fields from the
+              input VCF file will be normalized in the event that they do not
+              sum to ploidy or one respectively.
+
+options:
+  -h, --help  show this help message and exit
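The `atomize` help text notes that `FORMAT/ACP` and `FORMAT/AFP` values are normalized when they do not sum to ploidy or one respectively. A minimal sketch of one way such a normalization could work is shown below. Proportional rescaling, the function name `normalize_to_total`, and the handling of all-zero input are assumptions of this sketch; the help text does not say how MCHap performs the normalization.

```python
# Hypothetical illustration only; not MCHap's implementation.


def normalize_to_total(values, total):
    """Rescale values proportionally so that they sum to total.

    Input that sums to zero (or less) is returned unchanged, because
    there is nothing to rescale (an assumption of this sketch).
    """
    current = sum(values)
    if current <= 0:
        return list(values)
    return [value * total / current for value in values]


# Posterior allele counts for a tetraploid sample that sum to 2 rather than 4.
acp = normalize_to_total([1.0, 0.5, 0.5], total=4)

# Posterior allele frequencies that sum to 1.25 rather than 1.
afp = normalize_to_total([0.5, 0.25, 0.5], total=1.0)

print(acp)
print(afp)
```

Proportional rescaling preserves the relative weight of each allele, which is why it is used here; other schemes are possible.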
20 changes: 15 additions & 5 deletions cli-call-exact-help.txt
@@ -102,9 +102,19 @@ options:
                         The chosen field determines the sample ids required in
                         other input files e.g. the --sample-list argument.
   --report [REPORT ...]
-                        Extra fields to report within the output VCF: AFPRIOR
-                        = prior allele frequencies; AFP = posterior mean
-                        allele frequencies; AOP = posterior probability of
-                        allele occurring at any copy number; GP = genotype
-                        posterior probabilities; GL = genotype likelihoods.
+                        Extra fields to report within the output VCF. The
+                        INFO/FORMAT prefix may be omitted to return both
+                        variations of the named field. Options include:
+                        INFO/AFPRIOR = Prior allele frequencies; INFO/ACP =
+                        Posterior allele counts; INFO/AFP = Posterior mean
+                        allele frequencies; INFO/AOP = Posterior probability
+                        of allele occurring across all samples; INFO/AOPSUM =
+                        Posterior estimate of the number of samples containing
+                        an allele; INFO/SNVDP = Read depth at each SNV
+                        position; FORMAT/ACP: Posterior allele counts;
+                        FORMAT/AFP: Posterior mean allele frequencies;
+                        FORMAT/AOP: Posterior probability of allele occurring;
+                        FORMAT/GP: Genotype posterior probabilities;
+                        FORMAT/GL: Genotype likelihoods; FORMAT/SNVDP: Read
+                        depth at each SNV position
   --cores CORES         Number of cpu cores to use (default = 1).
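The updated `--report` help text says the `INFO/` or `FORMAT/` prefix may be omitted to request both variations of a field. The sketch below shows one way that expansion could be expressed. The field names are copied from the help text, but the function `expand_report_fields`, its return shape, and its error handling are hypothetical and are not taken from the MCHap source.

```python
# Hypothetical illustration only; not MCHap's implementation.

# Field names as listed in the --report help text.
INFO_FIELDS = {"AFPRIOR", "ACP", "AFP", "AOP", "AOPSUM", "SNVDP"}
FORMAT_FIELDS = {"ACP", "AFP", "AOP", "GP", "GL", "SNVDP"}


def expand_report_fields(requested):
    """Split requested report fields into sorted INFO and FORMAT lists.

    A bare name such as "AFP" selects every variation that exists for it,
    while "INFO/AFP" or "FORMAT/AFP" selects only that variation.
    """
    info, fmt = set(), set()
    for name in requested:
        # "AFP" -> ("", "", "AFP"); "INFO/AFP" -> ("INFO", "/", "AFP")
        prefix, _, field = name.rpartition("/")
        in_info = prefix in ("", "INFO") and field in INFO_FIELDS
        in_format = prefix in ("", "FORMAT") and field in FORMAT_FIELDS
        if not (in_info or in_format):
            raise ValueError(f"Unknown report field: {name!r}")
        if in_info:
            info.add(field)
        if in_format:
            fmt.add(field)
    return sorted(info), sorted(fmt)


info, fmt = expand_report_fields(["AFP", "INFO/ACP", "GP"])
print("INFO:", info)
print("FORMAT:", fmt)
```

In this sketch a bare name that exists in only one namespace, such as `GP`, resolves to that single variation, and a prefixed name that does not exist, such as `FORMAT/AFPRIOR`, raises an error. Whether MCHap behaves the same way in those edge cases is not stated in the help text.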
25 changes: 17 additions & 8 deletions cli-call-help.txt
@@ -106,11 +106,21 @@ options:
                         The chosen field determines the sample ids required in
                         other input files e.g. the --sample-list argument.
   --report [REPORT ...]
-                        Extra fields to report within the output VCF: AFPRIOR
-                        = prior allele frequencies; AFP = posterior mean
-                        allele frequencies; AOP = posterior probability of
-                        allele occurring at any copy number; GP = genotype
-                        posterior probabilities; GL = genotype likelihoods.
+                        Extra fields to report within the output VCF. The
+                        INFO/FORMAT prefix may be omitted to return both
+                        variations of the named field. Options include:
+                        INFO/AFPRIOR = Prior allele frequencies; INFO/ACP =
+                        Posterior allele counts; INFO/AFP = Posterior mean
+                        allele frequencies; INFO/AOP = Posterior probability
+                        of allele occurring across all samples; INFO/AOPSUM =
+                        Posterior estimate of the number of samples containing
+                        an allele; INFO/SNVDP = Read depth at each SNV
+                        position; FORMAT/ACP: Posterior allele counts;
+                        FORMAT/AFP: Posterior mean allele frequencies;
+                        FORMAT/AOP: Posterior probability of allele occurring;
+                        FORMAT/GP: Genotype posterior probabilities;
+                        FORMAT/GL: Genotype likelihoods; FORMAT/SNVDP: Read
+                        depth at each SNV position
   --cores CORES         Number of cpu cores to use (default = 1).
   --mcmc-chains MCMC_CHAINS
                         Number of independent MCMC chains per assembly
@@ -124,6 +134,5 @@ options:
   --mcmc-seed MCMC_SEED
                         Random seed for MCMC (default = 42).
   --mcmc-chain-incongruence-threshold MCMC_CHAIN_INCONGRUENCE_THRESHOLD
-                        Posterior phenotype probability threshold for
-                        identification of incongruent posterior modes (default
-                        = 0.60).
+                        Posterior probability threshold for identification of
+                        incongruent posterior modes (default = 0.60).