get the list of ancestry sites? #143
Comments
Hi, that's not really possible. I guess you could recreate the simple filters used in somalier and then iterate over the sites file.
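As a rough illustration of iterating over the sites file, here is a minimal Python sketch that keeps only the records of a sample VCF whose chromosome, position, REF and ALT also appear in the sites file. It assumes the sites file is itself a VCF (plain or gzip-compressed) and that both files use the same genome build. The function names (`load_site_keys`, `filter_vcf`) are hypothetical, and the sketch does not reproduce somalier's own filters, so the number of records kept may differ from the "found N sites" message.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def norm_chrom(chrom):
    """Drop a leading 'chr' so that 'chr1' and '1' compare equal."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def load_site_keys(sites_vcf):
    """Collect (chrom, pos, ref, alt) for every record in the sites VCF."""
    keys = set()
    with open_text(sites_vcf) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                continue  # skip malformed or blank lines
            chrom, pos, _id, ref, alt = fields[:5]
            keys.add((norm_chrom(chrom), int(pos), ref, alt))
    return keys


def filter_vcf(sample_vcf, site_keys, out_vcf):
    """Write header lines plus records matching a site; return the match count.

    Assumption: ALT must match as an exact string, so a multi-allelic
    record such as 'G,T' will not match a site listed with ALT 'G'.
    """
    kept = 0
    with open_text(sample_vcf) as src, open(out_vcf, "w") as dst:
        for line in src:
            if line.startswith("#"):
                dst.write(line)
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                continue
            key = (norm_chrom(fields[0]), int(fields[1]), fields[3], fields[4])
            if key in site_keys:
                dst.write(line)
                kept += 1
    return kept
```

Usage would look like `filter_vcf("sample.vcf.gz", load_site_keys("sites.vcf.gz"), "sample.sites.vcf")`, where the file names are placeholders for your own paths.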
Ok, thanks for the reply. I am new to genetics and am finding it difficult to get lists of probes that are related to certain categories (e.g., ancestry, cancer, etc.). So you are saying even if I forked your repo, edited the code, and recompiled it, there is nowhere in your codebase where these probe names/numbers or some IDs could be extracted? I was assuming that there was a comparison somewhere, and at each site, upon a "match" (or filter pass), there would be some ID I would have access to that I could save as a text file somewhere. I could edit the code and rebuild the Docker image, but I just wanted you to point me to which piece of code would have that info. If my thinking is wrong on this, then no worries, but I thought this shouldn’t be too hard in theory since I would assume the probes that pass the filter must be somewhere in the code.
ah, ok. so if you're willing to do that, you can look here: if you get to line 158, the variant passed filters and you can write/save
note that these sites won't be the usual "ancestry-informative" sites. they are just sites likely to be assayed in exome (and genome) that are relatively common in the population.
ok thanks! Both comments are really helpful. I think I can dig into it if I need to, but I might find another way to get the list of sites. I am using the Illumina Infinium Global Screening Array-24 v3.0 BeadChip, and they advertise all these categories of markers like "Ancestry-informative markers" and "Somatic mutations in cancer", but nowhere can I find the relevant lists/data for these haha, which seems so strange to me. I would have thought it was standard practice to provide this data, but like I said this is all new to me. If you have any thoughts on this I would appreciate it, but no worries considering it is off topic here. thanks for all your help :)
for ancestry, maybe this: https://pmc.ncbi.nlm.nih.gov/articles/PMC3073397/#SD1 and for somatic mutations, check cosmic.
wow super kind of you, thanks Brent! have a nice data :)
Hey, can you help me find a reference list of the sites you compare against, or the sites identified by somalier? For example, when I run the Docker container I get the message:
[somalier] found 2937 sites
and I just want to be able to pull these sites from my VCF file and do some custom analysis.