fail to include germline variant information #7

alex84425 · 2019-09-08T11:53:55Z

Hi, I have recently been trying to generate neoepitopes with this pipeline, but I have run into some difficulties.

First, I successfully ran this pipeline with both somatic and germline variants called by VarScan2.

The commands are shown below:

```bash
# Merge the VarScan SNV and indel calls (I am not sure whether it is correct to merge the indel calls this way)
cat ../../varscan_result/varscan_file_somatic_fetch_exon/001.vcf.*.Somatic.hc | grep -v '^#' > 001.vcf.snp_indel.Somatic.hc
cat ../../varscan_result/varscan_file_Germline/001T.*.vcf > 001.vcf.snp_indel.Germline.hc
neoepiscope swap -i 001.vcf.snp_indel.Somatic.hc -o 001.vcf.snp_indel.Somatic.hc.sw
neoepiscope merge -g 001.vcf.snp_indel.Germline.hc -s 001.vcf.snp_indel.Somatic.hc.sw -o 001.merge.vcf

# HapCUT2 needs the variants sorted; sort only the data records so the header
# lines (##fileformat must be the first line) stay at the top in their original order
(grep '^#' 001.merge.vcf; grep -v '^#' 001.merge.vcf | sort -k1,1V -k2,2n) > 001.merge.sorted.vcf
mv 001.merge.sorted.vcf 001.merge.vcf

# Phase the variants. I am not sure about the difference between Illumina reads
# and 10X Genomics data, but I can only get it to run correctly with the latter (--10X 1).
/home/alex2/git_file/HapCUT2/build/extractHAIRS --bam ../../GATK_Recalibrator/001T.recal.bam --VCF 001.merge.vcf --out 001.merge.vcf.unlinked --10X 1 --indels 1
python3.6 /home/alex2/git_file/HapCUT2/utilities/LinkFragments.py --bam ../../GATK_Recalibrator/001T.recal.bam --VCF 001.merge.vcf --fragments 001.merge.vcf.unlinked --out 001.merge.vcf.linked
/home/alex2/git_file/HapCUT2/build/HAPCUT2 --nf 1 --fragments 001.merge.vcf.linked --VCF 001.merge.vcf --output 001.merge.vcf.hp

# Cannot include germline information
neoepiscope prep -v 001.merge.vcf -c 001.merge.vcf.hp -o 001.merge.vcf.adhp
# Quote the HLA list so the shell does not treat '*' as a glob
neoepiscope call -b hg19 -c 001.merge.vcf.adhp -o 001.merge.vcf.out -p netMHCpan 4 affinity -a 'HLA-A*24:02,HLA-A*24:02,HLA-B*54:01,HLA-B*40:02,HLA-C*03:04,HLA-C*01:02'
```

However, it fails when I use somatic and germline variants called by the GATK pipeline. An error occurs at the "neoepiscope call" step:

```text
Traceback (most recent call last):
  File "/home/alex2/anaconda3/envs/exome-seq/bin/neoepiscope", line 10, in <module>
    sys.exit(main())
  File "/home/alex2/anaconda3/envs/exome-seq/lib/python3.6/site-packages/neoepiscope/__init__.py", line 765, in main
    protein_fasta=args.fasta,
  File "/home/alex2/anaconda3/envs/exome-seq/lib/python3.6/site-packages/neoepiscope/transcript.py", line 3149, in get_peptides_from_transcripts
    return_protein=True,
  File "/home/alex2/anaconda3/envs/exome-seq/lib/python3.6/site-packages/neoepiscope/transcript.py", line 2360, in neopeptides
    protein = seq_to_peptide(sequence[start_codon[0] :], reverse_strand=False)
  File "/home/alex2/anaconda3/envs/exome-seq/lib/python3.6/site-packages/neoepiscope/transcript.py", line 265, in seq_to_peptide
    codon = _codon_table[seq[i : i + 3]]
KeyError: 'G*A'
```

It seems that a weird character, "*", appears in the sequence, but I have no idea how to solve this.
I am happy to share my BAM and VCF files with you.
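
To see why the traceback ends in `KeyError: 'G*A'`: the failing line looks each codon up in a dictionary (`_codon_table[seq[i : i + 3]]`), and a table keyed only on A/C/G/T triplets has no entry for a triplet containing `*`. The Python sketch below reproduces that failure mode with a hypothetical, truncated codon table (not neoepiscope's actual table) and a hypothetical `translate` helper that reports the offending codon instead of raising a bare `KeyError`.

```python
# Hypothetical, truncated codon table for illustration only;
# a real table covers all 64 codons.
CODON_TABLE = {
    "ATG": "M",
    "GAA": "E",
    "GCA": "A",
    "TAA": "*",  # stop codon: here '*' is an amino-acid symbol, not a base
}
VALID_BASES = set("ACGT")


def translate(seq):
    """Translate codon by codon, stopping at the first stop codon.

    Raises ValueError if a codon contains anything other than A/C/G/T,
    e.g. a symbolic '*' allele spliced into the sequence as if it were a base.
    """
    protein = []
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        codon = seq[i : i + 3]
        if not set(codon) <= VALID_BASES:
            raise ValueError(f"non-nucleotide codon {codon!r} at offset {i}")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

Here `translate("ATGGAATAA")` returns `"ME"`, while `translate("ATGG*A")` raises `ValueError: non-nucleotide codon 'G*A' at offset 3`, the same triplet that appears in the traceback.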

data link:
https://drive.google.com/drive/folders/1O6PdYwImV0fEXDHPOemixV6cbVPhxatL?usp=sharing

maryawood · 2019-09-09T15:54:44Z

Hello, I'm sorry that you are having trouble! I have requested access to your google drive folder so I can take a look at the data and try to see what the problem is.

alex84425 · 2019-09-09T15:57:35Z

Hello, I got your access request email and pressed the button. Maybe you can try again.

-- Best, 陸建利 Po-Yuan Chen, Master's Student, Institute of Bioinformatics and Systems Biology, National Chiao Tung University

maryawood · 2019-09-09T16:04:21Z

Thank you, I am able to access the folder now! The file names in that directory do not match the ones you use in the commands. Could you tell me which file names in the google drive folder correspond to the file names in your commands?

alex84425 · 2019-09-09T16:15:06Z

OK, I will rerun the commands with the new file names, so please wait for me.

alex84425 · 2019-09-09T17:57:51Z

Well, I think I found the problem after rerunning the commands. First, I had actually mixed somatic indels/SNVs called by VarScan with germline indels/SNVs called by Mutect2, rather than using only the GATK pipeline as I said on GitHub. As you saw, an error occurred. The problem is caused by variants called by Mutect2 or HaplotypeCaller: the ALT column of the VCF file contains the "," symbol, e.g. "G,GAA". So I removed the records containing this case and kept testing, and it seems to work. Sometimes the problem results in an error in the HapCUT2 step or in "neoepiscope call", but not with GATK ReadBackedPhasing. By the way, the commands I ran are stored in the cmd.sh file.
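
Before discarding every record with a comma in ALT, a quick tally of what those records contain can narrow things down. In GATK output, the spanning-deletion allele `*` typically appears next to another alternate allele (for example `C,*`), so removing all multi-allelic records would also remove most `*` records as a side effect. The sketch below is a hypothetical helper (`classify_alt_alleles` is not part of neoepiscope) and assumes a plain, uncompressed, tab-separated VCF.

```python
from collections import Counter


def classify_alt_alleles(vcf_lines):
    """Count VCF data records by the shape of their ALT column (5th field)."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        alleles = line.rstrip("\n").split("\t")[4].split(",")
        is_multi = len(alleles) > 1
        counts["multi_allelic" if is_multi else "biallelic"] += 1
        if "*" in alleles:
            # Track whether the '*' allele shares the record with other ALTs
            counts["star_in_multi_allelic" if is_multi else "star_alone"] += 1
    return counts


# Example usage (hypothetical file name):
# with open("001.merge.vcf") as vcf:
#     print(classify_alt_alleles(vcf))
```

If `star_alone` is zero or small, dropping the comma-containing records removes essentially all `*` records, which would explain why that workaround appeared to work.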

maryawood · 2019-09-09T21:02:44Z

Which version of neoepiscope did you use for your analysis? The software should be able to handle cases where there are multiple alternate alleles as you described, but I would like to test the commands using the same version as you so I can better see what happened.

alex84425 · 2019-09-10T01:44:55Z

Well, I think the version number is "0.3.5", based on "/home/alex2/anaconda3/envs/exome-seq/lib/python3.6/site-packages/neoepiscope/version.py".

alex84425 · 2019-09-13T10:22:09Z

So, is there a better way to deal with alternate alleles? It seems that "0.3.5" is the newest version.

maryawood · 2019-09-13T22:24:12Z

Sorry for the delay, I had not had a chance to work on this yet! I just took a look today, and as suspected I got a similar error whether or not I retained variants with multiple alternate alleles. It appears that the issue is actually variants with '*' as the alternate allele, representing spanning deletions, which neoepiscope does not currently support. Thank you for bringing this to our attention! We will plan to incorporate a fix for this into an upcoming release of neoepiscope to increase the flexibility of the tool.
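
Until an updated release is installed, a narrower workaround than deleting every multi-allelic record is to drop only the records whose ALT alleles include `*`. This is a minimal sketch assuming an uncompressed VCF; `drop_spanning_deletions` is a hypothetical helper. It removes whole records rather than editing the ALT list, because deleting one allele would also require renumbering the GT indices and any per-allele FORMAT/INFO values.

```python
def drop_spanning_deletions(vcf_lines):
    """Yield header lines unchanged, plus every data record whose
    ALT column does not contain the '*' (spanning deletion) allele."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 4 and "*" in fields[4].split(","):
            continue  # skip records carrying a spanning-deletion allele
        yield line


# Example usage (hypothetical file names):
# with open("001.merge.vcf") as src, open("001.merge.nostar.vcf", "w") as dst:
#     dst.writelines(drop_spanning_deletions(src))
```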

alex84425 · 2019-09-15T17:52:43Z

Sorry to bother you again. I am trying to reproduce the results of this paper and have run into some problems. Paper link: (https://www.nature.com/articles/nature14426) Before posting an issue: does neoepiscope support a mouse model if I simply change the bowtie1_index and gtf_file that the program requires?

maryawood · 2019-09-16T21:30:43Z

No bother at all! We don't currently have built-in support for genomes other than human hg19/hg38, but it's pretty easy to get things set up on your own to use a different genome/species. Instead of using the --build option when running neoepiscope call, you will use the --dicts and --bowtie-index options with some data you download (and process a bit) yourself.

If you'd like to use a mouse model instead of a human model, you can download the mouse GTF file for your genome build of choice from GENCODE: https://www.gencodegenes.org/mouse/

Using neoepiscope index, you can then index the GTF file to create pickled dictionaries necessary for predicting neoepitopes. This only needs to be done once. Then whenever you run neoepiscope call, you can use the --dicts option on the command line to specify the directory containing those pickled dictionaries.

Additionally, you will need a bowtie index for your mouse genome, which you can download from the bowtie website: ftp://ftp.ccb.jhu.edu/pub/data/bowtie_indexes/

You can use this index with the --bowtie-index option when running neoepiscope call.

Hope this helps!
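
When switching to a non-human annotation, one thing worth checking is that the GTF and the bowtie index use the same chromosome names: GENCODE GTFs use UCSC-style names such as `chr1`, whereas Ensembl GTFs use `1`. The sketch below assumes you already have the index's contig names as a list (for example, the first column of the `.fai` file for the FASTA the index was built from); both function names are hypothetical.

```python
def gtf_seqnames(gtf_lines):
    """Return the set of sequence names (first column) used in a GTF file."""
    names = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        names.add(line.split("\t", 1)[0])
    return names


def naming_mismatch(gtf_lines, index_contigs):
    """Return GTF sequence names that are absent from the index contigs."""
    return sorted(gtf_seqnames(gtf_lines) - set(index_contigs))


# Example usage (hypothetical file names):
# with open("gencode.vM23.annotation.gtf") as gtf, open("mm10.fa.fai") as fai:
#     contigs = [row.split("\t", 1)[0] for row in fai]
#     print(naming_mismatch(gtf, contigs))
```

An empty list means every GTF sequence name has a matching contig in the index.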

alex84425 · 2019-10-01T02:33:39Z

Me again XDD. Your suggestions work well. It seems that the reference files from GENCODE are necessary, rather than other versions of the reference such as Ensembl. In addition, I recently attended a seminar and found that most speakers and studies include the normal (wild-type) epitope affinity in their results and compare the two. I think this feature is worth adding; you would just need to run the affinity prediction tools twice.
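
Comparing mutant and wild-type binding, as suggested here, means running the affinity predictor on both peptides of each pair and then joining the results. One simple summary is the ratio of wild-type to mutant predicted IC50; a ratio above 1 means the mutant peptide is predicted to bind more tightly than its wild-type counterpart. This sketch assumes both prediction runs have already been parsed into one dictionary mapping peptide to IC50 (nM); `affinity_ratios` and its input format are hypothetical, not part of neoepiscope.

```python
def affinity_ratios(pairs, ic50_nm):
    """Return (mutant, wild_type, wt_ic50 / mut_ic50) tuples, largest ratio first.

    pairs   -- iterable of (mutant_peptide, wild_type_peptide)
    ic50_nm -- dict mapping peptide -> predicted IC50 in nM (lower = stronger)
    """
    results = []
    for mutant, wild_type in pairs:
        if mutant not in ic50_nm or wild_type not in ic50_nm:
            continue  # skip pairs lacking a prediction for either peptide
        ratio = ic50_nm[wild_type] / ic50_nm[mutant]
        results.append((mutant, wild_type, ratio))
    return sorted(results, key=lambda r: r[2], reverse=True)
```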

maryawood · 2019-10-01T16:12:38Z

Thank you for the suggestion! That is something that we will probably add in the future. Also, the latest release of neoepiscope should be able to handle the spanning deletions you had issues with before, so hopefully that will no longer cause problems for you if you update your installation of neoepiscope!
