Hi! I'm applying Genotype Harmonizer (GH) to several PLINK BED datasets. It's an excellent tool, but I'm confused by some of its output: some SNPs that appear valid are being excluded by GH.
Here is my command:
```
java -Xmx4g -jar GenotypeHarmonizer.jar --input mydata.b37 --inputType PLINK_BED --ref ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz --refType VCF --chrFilter 22 --update-reference-allele --debug --outputType PLINK_BED --output mydata.b37.chr22.GH
```
My input PLINK .bim file looks like this (according to the PLINK 2.0 documentation, column 5 is the ALT allele and column 6 is the REF allele):
```
22 rs5994159 0 16848573 T C
22 rs9606483 0 16852708 C A
22 rs10048902 0 16854058 G C
```
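To make explicit how I'm reading these rows, here is a minimal Python sketch that parses them under the column convention above (column 5 = ALT, column 6 = REF). `BimRecord` and `parse_bim_line` are hypothetical illustration names, not part of GH or PLINK.

```python
from typing import NamedTuple


class BimRecord(NamedTuple):
    chrom: str
    snp_id: str
    cm: float
    pos: int
    alt: str  # column 5, read as ALT (assumed PLINK 2.0 convention)
    ref: str  # column 6, read as REF (assumed PLINK 2.0 convention)


def parse_bim_line(line: str) -> BimRecord:
    """Split one whitespace-delimited .bim line into its six columns."""
    chrom, snp_id, cm, pos, a1, a2 = line.split()
    return BimRecord(chrom, snp_id, float(cm), int(pos), alt=a1, ref=a2)


# The three rows from my .bim file
bim_lines = [
    "22 rs5994159 0 16848573 T C",
    "22 rs9606483 0 16852708 C A",
    "22 rs10048902 0 16854058 G C",
]
records = [parse_bim_line(line) for line in bim_lines]
for r in records:
    print(r.snp_id, "REF=" + r.ref, "ALT=" + r.alt)
```

Read this way, rs5994159 is REF=C/ALT=T, rs9606483 is REF=A/ALT=C, and rs10048902 is REF=C/ALT=G.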
Here is the log file along with my GH output:
```
chr pos id alleles action message
22 16848573 rs5994159 T\C Excluded Found variant with same ID but alleles are not comparable
22 16852708 rs9606483 C\A Excluded Found variant with same ID but alleles are not comparable
22 16854058 rs10048902 G\C Excluded Found variant with same ID but alleles are not comparable
```
GH excluded these SNPs with the message "Found variant with same ID but alleles are not comparable". However, I checked the excluded SNPs in the 1000G reference VCF file and confirmed that they are aligned with the reference panel:
```
#CHROM POS ID REF ALT QUAL FILTER
22 16848573 rs5994159 C G,T 100 PASS
22 16852708 rs9606483 A C,T 100 PASS
22 16854058 rs10048902 C G,T 100 PASS
```
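The manual check above can be written out as a minimal Python sketch, again assuming column 5 = ALT and column 6 = REF in the .bim file. This is not Genotype Harmonizer's actual comparison logic; the `compare` helper is hypothetical. For each SNP it checks whether the REF alleles match, whether the .bim ALT is one of the VCF's ALT alleles, and whether both sites carry exactly the same set of alleles.

```python
# .bim rows above: snp_id -> (column 5 read as ALT, column 6 read as REF)
bim = {
    "rs5994159": ("T", "C"),
    "rs9606483": ("C", "A"),
    "rs10048902": ("G", "C"),
}
# Reference VCF rows above: snp_id -> (REF, comma-separated ALT field)
vcf = {
    "rs5994159": ("C", "G,T"),
    "rs9606483": ("A", "C,T"),
    "rs10048902": ("C", "G,T"),
}


def compare(snp_id):
    """Hypothetical helper: compare one .bim SNP with its reference row."""
    bim_alt, bim_ref = bim[snp_id]
    vcf_ref, vcf_alt_field = vcf[snp_id]
    vcf_alts = vcf_alt_field.split(",")
    return {
        "ref_matches": bim_ref == vcf_ref,
        "alt_listed": bim_alt in vcf_alts,
        # strict check: do both sites carry exactly the same alleles?
        "same_allele_set": {bim_ref, bim_alt} == {vcf_ref, *vcf_alts},
        "n_ref_alleles": 1 + len(vcf_alts),
    }


results = {snp: compare(snp) for snp in bim}
for snp, res in results.items():
    print(snp, res)
```

For all three SNPs, REF matches and the .bim ALT is listed in the VCF, but the allele sets are not identical, because each reference row lists two ALT alleles (three alleles in total).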
It seems that GH doesn't read the REF/ALT alleles from PLINK .bim files correctly. I'd like to retain these SNPs for downstream analysis. Any help is appreciated.
Best Regards,
Jack Lin