
probability distribution for the states #17

Open · colinveal opened this issue Mar 26, 2020 · 4 comments
Labels: story-telling (Discuss on a new feature)

@colinveal (Collaborator)

We need to think about how we model the expected distribution for each state:

e.g. normal read depth can be modelled as normal, Poisson, or negative binomial. With high enough read depth it approximates a normal distribution, except that it can't take negative values. Previously we were performing a negative binomial transformation to give a normal distribution; that was so we could use the distribution of the difference between two normal distributions. However, we could model it directly as negative binomial.

Duplications and a single-copy deletion will be similar but have different parameters.

Two-copy deletions will be near uniform at 0.

Alternatively, we could use the distribution for the normal copy number as the only distribution and base the probabilities on distance from its mean, i.e. well above the mean with low p under that distribution = high probability of duplication.
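
For concreteness, a minimal sketch of modelling each state's read depth directly as a negative binomial, assuming scipy is available; every mean/variance below is a hypothetical placeholder:

```python
# scipy.stats.nbinom is parameterised by (n, p) with mean n*(1-p)/p and
# variance n*(1-p)/p**2, so we convert from a (mean, variance) pair.
from scipy import stats

def nbinom_params(mean, var):
    """Convert (mean, variance) to scipy's (n, p) parameterisation."""
    assert var > mean, "negative binomial requires over-dispersion"
    p = mean / var
    n = mean * p / (1.0 - p)
    return n, p

# Hypothetical per-state parameters: duplications and single-copy deletions
# reuse the normal-depth shape with shifted mean/variance.
states = {
    "normal":      nbinom_params(40.0, 80.0),
    "deletion":    nbinom_params(20.0, 40.0),   # single-copy loss
    "duplication": nbinom_params(60.0, 120.0),
}

depth = 55  # observed read depth in one window
for state, (n, p) in states.items():
    print(f"P(depth={depth} | {state}) = {stats.nbinom.pmf(depth, n, p):.4g}")
```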

@pockerman (Owner)

You mean in terms of the observations, if I understand correctly? In other words, how to model the emission probabilities for each state?

@pockerman (Owner)

How far astray does the following approach sound: cluster the observations into as many clusters as we need states, then fit a distribution to each cluster and use that fitted distribution as the emission probability distribution for the corresponding state in the HMM?
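
If it helps, a rough sketch of that idea, assuming scikit-learn is available; a Gaussian mixture both clusters the observations and fits a per-cluster distribution in one step, and all simulated depths and parameters here are hypothetical:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Simulated windowed read depths: normal, deletion, and duplication regions.
depths = np.concatenate([
    rng.normal(40, 6, 500),   # normal copy number
    rng.normal(20, 5, 100),   # single-copy deletion
    rng.normal(60, 8, 100),   # duplication
]).reshape(-1, 1)

# One mixture component per intended HMM state.
gmm = GaussianMixture(n_components=3, random_state=0).fit(depths)

# Each fitted component (mean, variance, weight) is a candidate emission
# distribution for one state; the component densities give
# P(observation | state) for the HMM's emission model.
print("component means:", gmm.means_.ravel())
print("component variances:", gmm.covariances_.ravel())
```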

@pockerman added the story-telling label on Mar 27, 2020
@colinveal (Collaborator, Author)

Sure, any starting point to get the model working will be good; we can always change the distributions.
We could keep it even simpler and base it on the normal copy number as the only distribution, with each window assessed against that, i.e. probably normal, probably above normal, probably below normal, and then combine the two sets of probabilities to calculate the likelihood of each state, e.g. significantly below and significantly below = 0.90 deletion, 0.05 TUF, 0.04 normal, 0.01 duplication; normal and normal = 0.75 normal, 0.10 deletion, 0.10 duplication, 0.05 TUF, and so on.
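
A tiny illustrative sketch of that lookup scheme; the normal-depth parameters, the z cut-off, and any table entries beyond the two examples above are placeholders, and "the two sets" stands for whichever pair of per-window signals we produce:

```python
NORMAL_MEAN, NORMAL_SD = 40.0, 8.0  # assumed normal-copy-number depth model

def bucket(depth, z_cut=1.96):
    """Label a window's depth relative to the normal distribution."""
    z = (depth - NORMAL_MEAN) / NORMAL_SD
    return "below" if z < -z_cut else "above" if z > z_cut else "within"

# (bucket from set 1, bucket from set 2) -> state probabilities; the
# remaining bucket pairs would be filled in the same way.
STATE_PROBS = {
    ("below", "below"):   {"deletion": 0.90, "tuf": 0.05, "normal": 0.04, "duplication": 0.01},
    ("within", "within"): {"normal": 0.75, "deletion": 0.10, "duplication": 0.10, "tuf": 0.05},
}

def state_probabilities(depth_1, depth_2):
    """Combine the two windows' buckets into per-state probabilities."""
    return STATE_PROBS.get((bucket(depth_1), bucket(depth_2)))

print(state_probabilities(18.0, 16.0))  # both significantly below -> deletion-heavy
```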

@pockerman (Owner)

OK, cool. I will start looking into the clustering approach and see what we get. I will add sklearn to our requirements so we can use their clustering algorithms, although we may have to implement others ourselves.
