Missing data for tetraploid multiparenting population #25

Open
PaulaEB opened this issue Dec 1, 2022 · 6 comments

@PaulaEB

PaulaEB commented Dec 1, 2022

Hello David,
Thanks for developing updog!
My project goal is to identify QTLs for pest resistance. We have a multiparent population similar to a NAM population (4 pollen recipients and one pollen donor), so we have four half-sib families. We are treating each family separately, but I'd like to know your thoughts on whether it's possible to use the whole population for the genotype calling.

One last question, about missing data in the geno field: in the multidog$inddf output we don't see any missing data. Is this normal?

Thank you very much!
Paula E

@dcgerard
Owner

dcgerard commented Dec 9, 2022

Hey @PaulaEB,

Thanks for trying out {updog}!

I haven't gotten around to allowing for multiparent populations yet. Some things you can look into:

  1. Are the genotypes estimated to be the same for the same parent for runs on different populations?
  2. Are the sequencing error rates, allele biases, and overdispersions estimated to be about the same at the same SNP?

If the answer is yes to both, then combining the different populations would not help much. Estimating the parent genotypes and those parameters is the benefit of using a larger sample size.

As for the missing data, if an individual has NA listed, then {updog} should provide NA in the output. If it has 0 listed for the read-depth, then {updog} will impute the genotype from the prior distribution (which is the best you can do if you aren't using information from other SNPs). E.g. consider:

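library(updog)
# Individual 3 has zero reads, so its genotype is imputed from the prior
refvec <- c(3, 4, 0, 8, 3)
sizevec <- c(10, 10, 0, 10, 10)
fout <- flexdog(refvec = refvec, sizevec = sizevec, ploidy = 4)
fout$geno
# The posterior for individual 3 should match the estimated genotype distribution
plot(fout$postmat[3, ], fout$gene_dist)
abline(0, 1)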

# Individual 3 is NA in both vectors, so its genotype is returned as NA
refvec <- c(3, 4, NA, 8, 3)
sizevec <- c(10, 10, NA, 10, 10)
fout <- flexdog(refvec = refvec, sizevec = sizevec, ploidy = 4)
fout$geno

Best,
David

@PaulaEB
Author

PaulaEB commented Nov 3, 2023

Hello @dcgerard, many thanks for your clarification! I am going back to this data, but I would like to keep the missing values (read depth 0) as missing, since GATK marks missing values in DP as DP=0 (https://gatk.broadinstitute.org/hc/en-us/articles/6012243429531-GenotypeGVCFs-and-the-death-of-the-dot)

Is it possible to change that from updog, or should I do that in the VCF with another tool?

Thanks again
Paula

@dcgerard
Owner

dcgerard commented Nov 6, 2023

Hey @PaulaEB,

You can do that in R really easily.

E.g., suppose this is the matrix containing the read-depths:

sizemat <- matrix(c(0, 1, 2, 1,
                    1, 0, 1, 1,
                    1, 2, 1, 0), ncol = 4, byrow = TRUE)

Then we can convert those 0's to NA's via:

sizemat[sizemat == 0] <- NA
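
Since the earlier flexdog() example sets NA in both refvec and sizevec, it is probably safest to mask the reference counts at the same positions. A minimal base R sketch, where refmat is a hypothetical matrix of reference read counts with the same dimensions as sizemat (both matrices are made-up illustration data):

```r
# Hypothetical read-depth matrix (rows = SNPs, columns = individuals)
sizemat <- matrix(c(0, 1, 2, 1,
                    1, 0, 1, 1,
                    1, 2, 1, 0), ncol = 4, byrow = TRUE)
# Hypothetical reference-count matrix with matching dimensions
refmat <- matrix(c(0, 1, 1, 0,
                   1, 0, 0, 1,
                   0, 2, 1, 0), ncol = 4, byrow = TRUE)

# Record the zero-depth cells before modifying anything
zero_depth <- sizemat == 0

# Mask both matrices at the same positions
sizemat[zero_depth] <- NA
refmat[zero_depth] <- NA

sum(zero_depth)  # number of masked genotype calls
```

Computing the mask once and reusing it keeps the two matrices aligned, even after sizemat itself has been modified.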

Cheers,
David

@TrineAalborg

Hello David,

Has updog been updated to support multiparent populations? The manual states that it now supports more general populations, but I cannot find arguments for specifying population structures other than S1 and biparental F1.

Thank you,
Trine

@dcgerard
Owner

Hey @TrineAalborg,

No, not yet. I'll get around to it when I find an interested graduate student 😂. You can either

  1. Fit each family separately or
  2. Use model = "norm".

If you fit each family separately, you can check if the od, seq, and bias parameters are all estimated about the same. If they are, then there isn't really a benefit of combining the families since a larger sample size mostly gets you better estimates of those quantities.

If they are pretty different, then you can try model = "norm", but this doesn't directly take into account the parent genotype information.
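
The per-family comparison can be sketched in base R. The two data frames below are made-up stand-ins for the snpdf element that multidog() returns for each family; the column names snp, bias, seq, and od are assumed to match that output, and the 0.5 cutoff on the log scale is an arbitrary illustration, not a recommendation from {updog}:

```r
# Made-up per-SNP estimates for two families (stand-ins for multidog()$snpdf)
fam1 <- data.frame(snp  = c("s1", "s2", "s3"),
                   bias = c(1.00, 0.90, 1.10),
                   seq  = c(0.005, 0.004, 0.006),
                   od   = c(0.010, 0.020, 0.010))
fam2 <- data.frame(snp  = c("s1", "s2", "s3"),
                   bias = c(1.05, 0.50, 1.10),
                   seq  = c(0.005, 0.004, 0.005),
                   od   = c(0.010, 0.020, 0.012))

# Line up the same SNPs across families
both <- merge(fam1, fam2, by = "snp", suffixes = c("_fam1", "_fam2"))

# Allele bias is a ratio (1 means no bias), so compare it on the log scale
both$log_bias_ratio <- log(both$bias_fam1 / both$bias_fam2)

# Flag SNPs whose bias estimates disagree by more than the arbitrary cutoff
both$bias_differs <- abs(both$log_bias_ratio) > 0.5

both[, c("snp", "bias_fam1", "bias_fam2", "bias_differs")]
```

The same pattern applies to the seq and od columns. If few SNPs are flagged, fitting each family separately is likely fine.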

Cheers,
David

@TrineAalborg

Okay thanks, I'll look into those options :)
