get the list of ancestry sites? #143

Open
PhillipMaire opened this issue Dec 19, 2024 · 7 comments

Comments

@PhillipMaire

PhillipMaire commented Dec 19, 2024

Hey, can you help me find a reference list of the sites you compare against, or the sites identified by somalier? For example, when I run the Docker container I get the message
[somalier] found 2937 sites
and I just want to be able to pull these from my VCF file and do some custom analysis.

@brentp
Owner

brentp commented Dec 19, 2024

Hi, that's not really possible. I guess you could recreate the simple filters used in somalier and then iterate over the sites file.
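A minimal sketch of the "iterate over the sites file" idea, in Python and using only the standard library. It assumes the only filter is an exact match on chromosome, position, REF, and ALT between the sites VCF and the sample VCF; that is an illustrative stand-in, not somalier's actual filter set, so the resulting count may differ from the number somalier reports. All function names here (`open_vcf`, `load_sites`, `matching_records`, `write_matches`) are hypothetical.

```python
import gzip
from typing import Iterable, Iterator, Set, TextIO, Tuple

SiteKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def open_vcf(path: str) -> TextIO:
    """Open a plain-text or gzip/bgzip-compressed VCF as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def normalize_chrom(chrom: str) -> str:
    """Drop a leading 'chr' so that 'chr1' and '1' compare equal."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def iter_records(lines: Iterable[str]) -> Iterator[Tuple[SiteKey, str]]:
    """Yield one (key, raw line) pair per ALT allele of each data line."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            yield (normalize_chrom(chrom), int(pos), ref, alt), line


def load_sites(lines: Iterable[str]) -> Set[SiteKey]:
    """Collect the keys of every record in the sites file."""
    return {key for key, _ in iter_records(lines)}


def matching_records(
    sites: Set[SiteKey], vcf_lines: Iterable[str]
) -> Iterator[Tuple[SiteKey, str]]:
    """Yield sample-VCF records whose key is in the sites set.

    ASSUMPTION: an exact chrom/pos/ref/alt match is the only filter applied.
    """
    seen: Set[SiteKey] = set()
    for key, line in iter_records(vcf_lines):
        if key in sites and key not in seen:
            seen.add(key)
            yield key, line


def write_matches(sites_path: str, vcf_path: str, out_path: str) -> int:
    """Write matched sites as a tab-separated text file; return the count."""
    with open_vcf(sites_path) as fh:
        sites = load_sites(fh)
    count = 0
    with open_vcf(vcf_path) as vcf, open(out_path, "w") as out:
        for (chrom, pos, ref, alt), _ in matching_records(sites, vcf):
            out.write(f"{chrom}\t{pos}\t{ref}\t{alt}\n")
            count += 1
    return count
```

Calling `write_matches` with the path to the sites VCF, the sample VCF, and an output file would produce a plain-text list of the shared sites that can be used for custom analysis.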

@PhillipMaire
Author

Ok, thanks for the reply. I am new to genetics and am finding it difficult to get lists of probes that are related to certain categories (e.g., ancestry, cancer, etc.). So you are saying even if I forked your repo, edited the code, and recompiled it, there is nowhere in your codebase where these probe names/numbers or some IDs could be extracted? I was assuming that there was a comparison somewhere, and at each site, upon a "match" (or filter pass), there would be some ID I would have access to that I could save as a text file somewhere.

I could edit the code and rebuild the Docker image, but I just wanted you to point me to which piece of code would have that info. If my thinking is wrong on this, then no worries, but I thought this shouldn’t be too hard in theory since I would assume the probes that pass the filter must be somewhere in the code.

@brentp
Owner

brentp commented Dec 19, 2024

ah, ok. so if you're willing to do that, you can look here:
https://github.com/brentp/somalier/blob/master/src/somalier.nim#L109-L159

if you get to line 158, the variant passed filters and you can write/save v or site
let me know if any other questions, might be other details i can point you to.

@brentp
Owner

brentp commented Dec 19, 2024

note that these sites won't be the usual "ancestry-informative" sites. they are just sites likely to be assayed in exome (and genome) that are relatively common in the population.

@PhillipMaire
Author

Ok, thanks! Both comments are really helpful. I think I can dig into it if I need to, but I might find another way to get the list of sites.

I am using the Illumina Infinium Global Screening Array-24 v3.0 BeadChip, and they provide all these categories of different markers like "Ancestry-informative markers" and "Somatic mutations in cancer", but nowhere can I find the relevant lists/data for these, haha. It seems so strange to me. I would have thought it would be standard practice to provide data on this, but like I said, this is all new to me. If you have any thoughts on this I would appreciate it, but no worries, considering that it is off topic here.

thanks for all your help :)

@brentp
Owner

brentp commented Dec 19, 2024

for ancestry, maybe this: https://pmc.ncbi.nlm.nih.gov/articles/PMC3073397/#SD1

and for somatic mutations, check cosmic.

@PhillipMaire
Author

wow super kind of you, thanks Brent! have a nice data :)
