Missing data for tetraploid multiparenting population #25

PaulaEB · 2022-12-01T15:46:47Z

Hello David,
Thanks for developing updog!
My project goal is to identify QTLs for pest resistance. We have a multiparent population similar to a NAM population (4 pollen recipients and one pollen donor), so we have four half-sib families. We are treating each family separately, but I'd like to know your thoughts on whether it's possible to use the whole population for genotype calling.

A last question is about missing data in the geno field. In the multidog$inddf output we don't see any missing data. Is this normal?

Thank you very much!
Paula E

dcgerard · 2022-12-09T15:59:39Z

Hey @PaulaEB,

Thanks for trying out {updog}!

I haven't gotten around to allowing for multiparent populations yet. Some things you can look into:

Are the genotypes estimated to be the same for the same parent for runs on different populations?
Are the sequencing error rates, allele biases, and overdispersions estimated to be about the same at the same SNP?

If the answer is yes to both, then combining the different populations would not help much: the main benefit of a larger sample size is better estimates of the parent genotypes and those parameters.
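
One way to run the two checks above, sketched in base R. The two data frames are toy stand-ins for the `snpdf` element of two per-family `multidog()` fits; the numbers are made up, and `donor_geno` is a hypothetical column name for the shared parent's estimated genotype (the actual column name depends on the model and the updog version).

```r
# Toy stand-ins for fit_fam1$snpdf and fit_fam2$snpdf (made-up values).
# `donor_geno` is a hypothetical name for the shared parent's genotype column.
snpdf_fam1 <- data.frame(
  snp        = c("snp1", "snp2", "snp3"),
  bias       = c(1.02, 0.85, 1.40),
  seq        = c(0.004, 0.006, 0.005),
  od         = c(0.003, 0.010, 0.004),
  donor_geno = c(2, 1, 3)
)
snpdf_fam2 <- data.frame(
  snp        = c("snp1", "snp2", "snp3"),
  bias       = c(0.98, 0.90, 1.10),
  seq        = c(0.005, 0.006, 0.004),
  od         = c(0.004, 0.008, 0.012),
  donor_geno = c(2, 1, 2)
)

# Line the two families up SNP by SNP
cmp <- merge(snpdf_fam1, snpdf_fam2, by = "snp", suffixes = c("_fam1", "_fam2"))

# Check 1: is the shared parent called the same in both runs?
cmp$same_donor_geno <- cmp$donor_geno_fam1 == cmp$donor_geno_fam2

# Check 2: are the SNP-level parameters about the same?
cmp$log2_bias_ratio <- log2(cmp$bias_fam1 / cmp$bias_fam2)  # 0 means identical bias
cmp$seq_diff        <- cmp$seq_fam1 - cmp$seq_fam2
cmp$od_diff         <- cmp$od_fam1 - cmp$od_fam2

mean(cmp$same_donor_geno)  # proportion of SNPs with a concordant parent call
cmp[, c("snp", "same_donor_geno", "log2_bias_ratio", "seq_diff", "od_diff")]
```

What counts as "about the same" is a judgment call; plotting one family's estimates against the other's is probably more informative than any fixed threshold.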

As for the missing data, if an individual has NA listed for its counts, then {updog} should return NA for that individual in the output. If it has 0 listed for the read-depth, then {updog} will impute the genotype from the prior distribution (which is the best you can do if you aren't using information from other SNPs). E.g. consider:

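library(updog)

# Individual 3 has a read-depth of 0 (not NA)
refvec <- c(3, 4, 0, 8, 3)
sizevec <- c(10, 10, 0, 10, 10)
fout <- flexdog(refvec = refvec, sizevec = sizevec, ploidy = 4)
fout$geno

# Individual 3's posterior against the estimated genotype distribution (the prior)
plot(fout$postmat[3, ], fout$gene_dist)
abline(0, 1)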

# Individual 3 is NA in both vectors, so its genotype should come back as NA
refvec <- c(3, 4, NA, 8, 3)
sizevec <- c(10, 10, NA, 10, 10)
fout <- flexdog(refvec = refvec, sizevec = sizevec, ploidy = 4)
fout$geno
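
To see why a read-depth of 0 falls back to the prior, here is a minimal base-R sketch. It uses a plain binomial likelihood, which is a simplification and not {updog}'s actual model (no allele bias, sequencing error, or overdispersion), and the prior values are made up for illustration.

```r
ploidy <- 4
prior  <- c(0.1, 0.2, 0.4, 0.2, 0.1)  # made-up prior over genotypes 0..4
p_ref  <- (0:ploidy) / ploidy         # expected reference-read proportion per genotype

# Posterior over genotypes under a simple binomial likelihood (a simplification)
posterior <- function(ref, size) {
  lik <- dbinom(ref, size, p_ref)
  lik * prior / sum(lik * prior)
}

posterior(0, 0)   # zero reads: likelihood is flat, so the posterior equals the prior
posterior(3, 10)  # with reads, the data move the posterior away from the prior
```

With zero reads every genotype has likelihood 1, so the data carry no information and the best available guess is the prior mode.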

Best,
David

PaulaEB · 2023-11-03T16:33:54Z

Hello @dcgerard, many thanks for your clarification! I am going back to this data, and I would like to keep the zero-depth (0) entries as missing, since GATK marks missing values in DP as DP=0 (https://gatk.broadinstitute.org/hc/en-us/articles/6012243429531-GenotypeGVCFs-and-the-death-of-the-dot).

Is it possible to change that from within updog, or should I do that in the VCF with another tool?

Thanks again
Paula

dcgerard · 2023-11-06T20:14:38Z

Hey @PaulaEB,

You can do that in R really easily.

E.g., suppose this is the matrix containing the read-depths:

sizemat <- matrix(c(0, 1, 2, 1,
                    1, 0, 1, 1,
                    1, 2, 1, 0), ncol = 4, byrow = TRUE)

Then we can convert those 0's to NA's via:

sizemat[sizemat == 0] <- NA
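
A slightly fuller sketch of the same idea, masking the reference-count matrix at the same positions so that both inputs agree on which entries are missing (as in the earlier flexdog example, where both vectors were NA). The `refmat` values are hypothetical and only there for illustration.

```r
sizemat <- matrix(c(0, 1, 2, 1,
                    1, 0, 1, 1,
                    1, 2, 1, 0), ncol = 4, byrow = TRUE)

# Hypothetical reference-read counts matching sizemat (illustration only)
refmat <- matrix(c(0, 1, 1, 0,
                   1, 0, 0, 1,
                   0, 2, 1, 0), ncol = 4, byrow = TRUE)

# Record the zero-depth positions first, then mask both matrices with them
zero_depth <- sizemat == 0
sizemat[zero_depth] <- NA
refmat[zero_depth]  <- NA

sum(zero_depth)  # number of entries treated as missing
```

Computing `zero_depth` before modifying anything keeps the mask identical for both matrices.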

Cheers,
David

TrineAalborg · 2024-11-21T09:20:16Z

Hello David,

Has updog been updated to support multiparent populations? The manual states that it now supports more general populations, but I cannot find arguments for specifying population structures other than S1 and biparental F1.

Thank you,
Trine

dcgerard · 2024-11-21T14:14:33Z

Hey @TrineAalborg,

No, not yet. I'll get around to it when I find an interested graduate student 😂. You can either

Fit each family separately or
Use model = "norm".

If you fit each family separately, you can check if the od, seq, and bias parameters are all estimated about the same. If they are, then there isn't really a benefit of combining the families since a larger sample size mostly gets you better estimates of those quantities.

If they are pretty different, then you can try model = "norm", but this doesn't directly take into account the parent genotype information.
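
A minimal sketch of the second option for a single SNP, pooling individuals from two families into one call. The read counts are simulated and purely illustrative (the genotype dosages and read-depth are assumptions), and the fit only runs if {updog} is installed.

```r
set.seed(1)
ploidy <- 4

# Simulated dosages for two pooled families (illustrative values only)
geno_fam1 <- c(1, 2, 2, 3, 2, 1, 2, 3)
geno_fam2 <- c(0, 1, 1, 2, 1, 0, 1, 2)
geno_all  <- c(geno_fam1, geno_fam2)

# Simulated read counts at a read-depth of 30 per individual
sizevec <- rep(30, length(geno_all))
refvec  <- rbinom(length(geno_all), size = sizevec, prob = geno_all / ploidy)

# model = "norm" puts a flexible prior on the genotypes and
# does not use parent information
if (requireNamespace("updog", quietly = TRUE)) {
  fit <- updog::flexdog(refvec = refvec, sizevec = sizevec,
                        ploidy = ploidy, model = "norm")
  print(fit$geno)
}
```

Because the pooled families share one genotype prior here, it is worth comparing the resulting calls against the per-family fits before relying on them.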

Cheers,
David

TrineAalborg · 2024-11-27T07:21:08Z

Okay thanks, I'll look into those options :)

dcgerard added the question label Dec 9, 2022
