[Genotype Harmonizer] Bug Report: Confusing Problems on SNPs excluding #667

Open
jacklin9703 opened this issue Nov 22, 2023 · 0 comments

Hi! I'm applying Genotype Harmonizer (GH) to several PLINK BED datasets. It's an excellent tool, but I'm confused by some of its output: it seems that some valid SNPs are being excluded.
Here is my command:
java -Xmx4g -jar GenotypeHarmonizer.jar --input mydata.b37 --inputType PLINK_BED --ref ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz --refType VCF --chrFilter 22 --update-reference-allele --debug --outputType PLINK_BED --output mydata.b37.chr22.GH

My input PLINK *.bim file looks like this (according to the PLINK 2.0 documentation, column 5 is the ALT allele and column 6 is the REF allele):

22	rs5994159	0	16848573	T	C
22	rs9606483	0	16852708	C	A
22	rs10048902	0	16854058	G	C

Here is the log file along with my GH output:

chr	pos	id    	alleles    	action    	message
22	16848573	rs5994159	T\C	Excluded	Found variant with same ID but alleles are not comparable
22	16852708	rs9606483	C\A	Excluded	Found variant with same ID but alleles are not comparable
22	16854058	rs10048902	G\C	Excluded	Found variant with same ID but alleles are not comparable

GH reported these SNPs as "Found variant with same ID but alleles are not comparable". However, I looked up the excluded SNPs above in the 1000G reference VCF file and confirmed that they are indeed aligned with the reference panel:

#CHROM	POS	ID	REF	ALT	QUAL	FILTER
22	16848573	rs5994159	C	G,T	100	PASS
22	16852708	rs9606483	A	C,T	100	PASS
22	16854058	rs10048902	C	G,T	100	PASS
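
The manual comparison above can be sketched as a small script. This is only an illustration, not part of GH: the function names (`parse_bim_line`, `parse_vcf_line`, `compare`, `check_all`) are hypothetical, and the rows are copied from this report. It checks that both alleles from the .bim row appear among the REF/ALT alleles of the VCF record with the same ID, and also notes whether the reference record lists more than one ALT allele. All three records here do, but whether that is related to the exclusion is an assumption, not something confirmed in this report.

```python
# Minimal sketch: compare PLINK .bim alleles against reference VCF records.
# Helper names are hypothetical; the data rows are copied from this issue.

def parse_bim_line(line):
    """Split one .bim row: chrom, ID, cM, position, allele (col 5), allele (col 6)."""
    chrom, snp_id, _cm, pos, a1, a2 = line.split()
    return {"chrom": chrom, "id": snp_id, "pos": int(pos), "alleles": {a1, a2}}

def parse_vcf_line(line):
    """Split the leading fields of a VCF data row; ALT may be comma-separated."""
    chrom, pos, snp_id, ref, alt = line.split()[:5]
    return {"chrom": chrom, "id": snp_id, "pos": int(pos),
            "ref": ref, "alts": alt.split(",")}

def compare(bim, vcf):
    """Report whether the .bim alleles are all present in the reference record."""
    reference_alleles = {vcf["ref"], *vcf["alts"]}
    return {
        "id": bim["id"],
        "same_position": (bim["chrom"], bim["pos"]) == (vcf["chrom"], vcf["pos"]),
        "alleles_in_reference": bim["alleles"] <= reference_alleles,
        "reference_is_multiallelic": len(vcf["alts"]) > 1,
    }

def check_all(bim_rows, vcf_rows):
    """Match rows by variant ID and compare each pair."""
    vcf_by_id = {v["id"]: v for v in map(parse_vcf_line, vcf_rows)}
    return [compare(b, vcf_by_id[b["id"]]) for b in map(parse_bim_line, bim_rows)]

BIM_ROWS = [
    "22\trs5994159\t0\t16848573\tT\tC",
    "22\trs9606483\t0\t16852708\tC\tA",
    "22\trs10048902\t0\t16854058\tG\tC",
]
VCF_ROWS = [
    "22\t16848573\trs5994159\tC\tG,T\t100\tPASS",
    "22\t16852708\trs9606483\tA\tC,T\t100\tPASS",
    "22\t16854058\trs10048902\tC\tG,T\t100\tPASS",
]

if __name__ == "__main__":
    for result in check_all(BIM_ROWS, VCF_ROWS):
        print(result)
```

For the three rows shown, every check comes back true: positions match, both .bim alleles are present in the reference record, and each reference record has two ALT alleles.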

It seems that GH doesn't read the REF/ALT alleles correctly from the PLINK .bim file. I'd like to retain these SNPs for downstream analysis. Any help is appreciated.

Best Regards,
Jack Lin
