
probability distribution for the states #17

Open · colinveal opened this issue Mar 26, 2020 · 4 comments
Labels: story-telling (Discuss on a new feature)

@colinveal (Collaborator)

We need to think about how we model the expected distribution for each state:

e.g. normal read depth can be modelled as normal, Poisson, or negative binomial. With high enough read depth it approximates a normal distribution, except that it can't take negative values. Previously we were performing a negative binomial transformation to give a normal distribution; that was so we could use the distribution of the difference between two normal distributions. However, we could model it directly as negative binomial.

Duplications and a single-copy deletion will be similar but have different parameters.

Two-copy deletions will be near uniform at 0.

Alternatively, we could use the distribution for the normal copy number as the only distribution and base the probabilities on distance from its mean, i.e. well above the mean with low p under that distribution = high probability of duplication.
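
For concreteness, a minimal sketch of modelling each state's read depth directly as a negative binomial, assuming scipy is available; every mean/variance below is a hypothetical placeholder:

```python
# scipy.stats.nbinom is parameterised by (n, p) with mean n*(1-p)/p and
# variance n*(1-p)/p**2, so we convert from a (mean, variance) pair.
from scipy import stats

def nbinom_params(mean, var):
    """Convert (mean, variance) to scipy's (n, p) parameterisation."""
    assert var > mean, "negative binomial requires over-dispersion"
    p = mean / var
    n = mean * p / (1.0 - p)
    return n, p

# Hypothetical per-state parameters: duplications and single-copy deletions
# reuse the normal-depth shape with shifted mean/variance.
states = {
    "normal":      nbinom_params(40.0, 80.0),
    "deletion":    nbinom_params(20.0, 40.0),   # single-copy loss
    "duplication": nbinom_params(60.0, 120.0),
}

depth = 55  # observed read depth in one window
for state, (n, p) in states.items():
    print(f"P(depth={depth} | {state}) = {stats.nbinom.pmf(depth, n, p):.4g}")
```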

@pockerman (Owner)

You mean in terms of the observations, if I understand correctly? In other words, how to model the emission probabilities for each state?

@pockerman (Owner)

How far astray does the following approach sound: cluster the observations into as many clusters as we need states, then fit a distribution to each cluster and use that fitted distribution as the emission probability distribution for the corresponding state in the HMM?
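
If it helps, a rough sketch of that idea, assuming scikit-learn is available; a Gaussian mixture both clusters the observations and fits a per-cluster distribution in one step, and all simulated depths and parameters here are hypothetical:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Simulated windowed read depths: normal, deletion, and duplication regions.
depths = np.concatenate([
    rng.normal(40, 6, 500),   # normal copy number
    rng.normal(20, 5, 100),   # single-copy deletion
    rng.normal(60, 8, 100),   # duplication
]).reshape(-1, 1)

# One mixture component per intended HMM state.
gmm = GaussianMixture(n_components=3, random_state=0).fit(depths)

# Each fitted component (mean, variance, weight) is a candidate emission
# distribution for one state; the component densities give
# P(observation | state) for the HMM's emission model.
print("component means:", gmm.means_.ravel())
print("component variances:", gmm.covariances_.ravel())
```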

@pockerman added the story-telling label on Mar 27, 2020
@colinveal (Collaborator, Author)

Sure, any starting point to get the model working will be good; we can always change the distributions.
We could keep it even simpler and base it on the normal copy number as the only distribution, with each window assessed against that, i.e. probably normal, probably above normal, probably below normal, and then combine the two sets of probabilities to calculate the likelihood of each state, e.g. significantly below and significantly below = 0.90 deletion, 0.05 TUF, 0.04 normal, 0.01 duplication; normal and normal = 0.75 normal, 0.10 deletion, 0.10 duplication, 0.05 TUF, and so on.
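
A tiny illustrative sketch of that lookup scheme; the normal-depth parameters, the z cut-off, and any table entries beyond the two examples above are placeholders, and "the two sets" stands for whichever pair of per-window signals we produce:

```python
NORMAL_MEAN, NORMAL_SD = 40.0, 8.0  # assumed normal-copy-number depth model

def bucket(depth, z_cut=1.96):
    """Label a window's depth relative to the normal distribution."""
    z = (depth - NORMAL_MEAN) / NORMAL_SD
    return "below" if z < -z_cut else "above" if z > z_cut else "within"

# (bucket from set 1, bucket from set 2) -> state probabilities; the
# remaining bucket pairs would be filled in the same way.
STATE_PROBS = {
    ("below", "below"):   {"deletion": 0.90, "tuf": 0.05, "normal": 0.04, "duplication": 0.01},
    ("within", "within"): {"normal": 0.75, "deletion": 0.10, "duplication": 0.10, "tuf": 0.05},
}

def state_probabilities(depth_1, depth_2):
    """Combine the two windows' buckets into per-state probabilities."""
    return STATE_PROBS.get((bucket(depth_1), bucket(depth_2)))

print(state_probabilities(18.0, 16.0))  # both significantly below -> deletion-heavy
```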

@pockerman (Owner)

OK, cool. I will start looking into the clustering approach and see what we get. I will add sklearn to our requirements so we can use their clustering algorithms, although we may have to implement others ourselves.
