Errors extracting reads covering heterozygous SNPs? #7

yjqiu opened this issue May 15, 2017 · 1 comment

@yjqiu

yjqiu commented May 15, 2017

I am using extractHAIRS to get reads covering heterozygous SNPs. However, a SNP seems to be ignored when the two ends of a read pair overlap, and the quality score reported for the base is not correct. In the attached example, the first read pair is not reported even though it covers the SNP, and the second read is reported but with the quality score < instead of D.

Commands

extractHAIRS --bam test.bam --VCF test.vcf --singlereads 1

BAM Files

7001113:798:HGCT5BCXY:1:2107:10688:87752 163 chr22 16056034 27 100M = 16056083 148 CACTCAGCCAGTTCACCCCACCCACATTCCACAGGCTGCTTTAGGCTTTAGGACAGTGGCAAACATGGCCTCTGCCATCCCGGTCTGTGAGCGCCCCTTC DDDDDIHIIIIIIIIIIIIIIIIIIIIIEHHIIIIHIIIIIIIIHHIIHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGHIIIH MD:Z:100 PG:Z:MarkDuplicates RG:Z:JY333 NM:i:0 AS:i:100 XS:i:100
7001113:798:HGCT5BCXY:1:2107:10688:87752 83 chr22 16056083 27 99M = 16056034 -148 AGGACAGTGGCAAACATGGCCTCTGCCATCCCGGTCTGTGAGCGCCCCTTCTTACACCAAGGTCAGTTGCTAACCAATGAGCTGCTGGGGGCCTCCTTC IIIIIIIIIIIIIIIIHIIIGIIIIHIIIHIIIIIIHIIIIGIIIIIIIGIIIHE1IIIIIIIIIIIIIIIIIIIIIIHIIIIIIIIIGIHEFD<0DD< MD:Z:99 PG:Z:MarkDuplicates RG:Z:JY333 NM:i:0 AS:i:99 XS:i:94
7001113:798:HGCT5BCXY:2:1213:5687:10232 163 chr22 16056108 27 98M = 16056333 325 CCATCCCGGTCTGTGAGCGCCCCTTCTTACACCAAGGTCAGTTGCTAACCAATGAGCTGCTGGGGGCCTCCTTCTCCCACTCCCACTGCACTGTGTCC 0<D@DEEHHCDCG<CGFHD<HHHHIIIHEHHIEH@DCHEEHHIIIIIIIE?HHIIFHIEHFH?GHDHEGHHIIHHIIIEHHHHG?HGHHHIIIEHHHH MD:Z:98 PG:Z:MarkDuplicates RG:Z:JY333 NM:i:0 AS:i:98 XS:i:93

VCF File

chr22 16056126 . G A 198.77 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=1.519;DP=23;Dels=0.00;FS=1.848;HaplotypeScore=0.6651;MLEAC=1;MLEAF=0.500;MQ=32.97;MQ0=3;MQRankSum=-0.228;QD=8.64;ReadPosRankSum=-0.076;SOR=0.605;VQSLOD=1.97;culprit=FS GT:AD:DP:GQ:PL 0/1:13,10:23:99:227,0,219
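For reference, a minimal Python sketch that checks which of the three BAM records above overlap the SNP at chr22:16056126. It assumes the CIGAR is a single match block (100M, 99M, 98M, as in this example), so the read offset is just the position difference. The helper name `snp_offset` is made up for illustration and is not part of extractHAIRS.

```python
from typing import Optional

def snp_offset(read_start: int, match_len: int, snp_pos: int) -> Optional[int]:
    """Return the 0-based offset of snp_pos within the read, or None if not covered.

    Only valid for a CIGAR consisting of one M block (no indels or clipping).
    Positions are 1-based, as in SAM and VCF.
    """
    offset = snp_pos - read_start
    return offset if 0 <= offset < match_len else None

SNP_POS = 16056126

# (label, POS, length of the M block) copied from the SAM records in this issue
records = [
    ("pair 1, first mate", 16056034, 100),
    ("pair 1, second mate", 16056083, 99),
    ("second read", 16056108, 98),
]

for label, start, length in records:
    print(label, snp_offset(start, length, SNP_POS))
```

Under that assumption all three records cover the SNP, and both mates of the first pair overlap it.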

@vibansal
Owner

For the second read, the quality value is '<' (27) instead of 'D' (35) since extractHAIRS outputs the minimum of the read mapping quality and base quality as the quality score. This behavior can be changed easily if needed.
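The rule described above can be sketched in a few lines of Python. This is only an illustration of the min(mapping quality, base quality) behavior, assuming Phred+33 encoding; the function name `reported_quality` is hypothetical and this is not the extractHAIRS source code.

```python
def reported_quality(mapq: int, base_qual_char: str, phred_offset: int = 33) -> str:
    """Return the quality character that would be reported for one base.

    mapq is the read's mapping quality (SAM column 5); base_qual_char is the
    base's quality character from the SAM QUAL string (Phred+33 assumed).
    """
    base_q = ord(base_qual_char) - phred_offset
    return chr(min(mapq, base_q) + phred_offset)

# Second read in the example: MAPQ 27, base quality 'D' (35) at the SNP
print(reported_quality(27, "D"))  # prints "<"
```

With a mapping quality above 35, the same base would be reported as D.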

The failure to report the first read is a bug. I have pushed a change to the code that should fix this.
