fail to include germline variant information #7

Open
alex84425 opened this issue Sep 8, 2019 · 13 comments

@alex84425

alex84425 commented Sep 8, 2019

Hi, I have recently been trying to generate neopeptides with this pipeline, but I have run into some difficulties.

First, I successfully ran the pipeline with both somatic and germline variants called by VarScan2.

The commands are shown below:

# Merge the VarScan SNP and indel variants (I am not sure whether it is correct to merge in the indel variants)
cat ../../varscan_result/varscan_file_somatic_fetch_exon/001.vcf.*.Somatic.hc  |  grep -v ^\#  > 001.vcf.snp_indel.Somatic.hc
cat ../../varscan_result/varscan_file_Germline/001T.*.vcf > 001.vcf.snp_indel.Germline.hc
neoepiscope swap -i 001.vcf.snp_indel.Somatic.hc -o 001.vcf.snp_indel.Somatic.hc.sw
neoepiscope merge -g 001.vcf.snp_indel.Germline.hc  -s 001.vcf.snp_indel.Somatic.hc.sw  -o 001.merge.vcf


# HapCUT2 needs the variants to be sorted
cat 001.merge.vcf   | sort -k1,1V -k2,2n  > 001.merge.sorted.vcf
mv 001.merge.sorted.vcf 001.merge.vcf

# Phase the variants. I am not sure what the difference is between the Illumina-read and 10X Genomics modes, but it only runs correctly for me with the latter.
/home/alex2/git_file/HapCUT2/build/extractHAIRS  --bam ../../GATK_Recalibrator/001T.recal.bam  --VCF 001.merge.vcf  --out  001.merge.vcf.unlinked --10X 1 --indels 1
python3.6 /home/alex2/git_file/HapCUT2/utilities/LinkFragments.py  --bam ../../GATK_Recalibrator/001T.recal.bam   --VCF 001.merge.vcf --fragments 001.merge.vcf.unlinked  --out 001.merge.vcf.linked
/home/alex2/git_file/HapCUT2/build/HAPCUT2 --nf 1 --fragments 001.merge.vcf.linked  --VCF 001.merge.vcf  --output 001.merge.vcf.hp
# can not include germline info
neoepiscope prep -v 001.merge.vcf -c 001.merge.vcf.hp -o 001.merge.vcf.adhp
neoepiscope call -b hg19 -c 001.merge.vcf.adhp  -o 001.merge.vcf.out  -p netMHCpan 4 affinity -a HLA-A*24:02,HLA-A*24:02,HLA-B*54:01,HLA-B*40:02,HLA-C*03:04,HLA-C*01:02

However, the run failed with somatic and germline variants called by the GATK pipeline. It shows an error in the "neoepiscope call" step:

Traceback (most recent call last):
  File "/home/alex2/anaconda3/envs/exome-seq/bin/neoepiscope", line 10, in <module>
    sys.exit(main())
  File "/home/alex2/anaconda3/envs/exome-seq/lib/python3.6/site-packages/neoepiscope/__init__.py", line 765, in main
    protein_fasta=args.fasta,
  File "/home/alex2/anaconda3/envs/exome-seq/lib/python3.6/site-packages/neoepiscope/transcript.py", line 3149, in get_peptides_from_transcripts
    return_protein=True,
  File "/home/alex2/anaconda3/envs/exome-seq/lib/python3.6/site-packages/neoepiscope/transcript.py", line 2360, in neopeptides
    protein = seq_to_peptide(sequence[start_codon[0] :], reverse_strand=False)
  File "/home/alex2/anaconda3/envs/exome-seq/lib/python3.6/site-packages/neoepiscope/transcript.py", line 265, in seq_to_peptide
    codon = _codon_table[seq[i : i + 3]]
KeyError: 'G*A'

It seems that a weird character, "*", appears in the sequence, but I have no idea how to solve this.
I am willing to share my .bam and VCF files with you.

data link:
https://drive.google.com/drive/folders/1O6PdYwImV0fEXDHPOemixV6cbVPhxatL?usp=sharing
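
The traceback above ends in a dictionary lookup on a three-base slice of the transcript sequence. As a minimal sketch of why that lookup fails (this is not neoepiscope's code: `CODON_KEYS` and `first_invalid_codon` are hypothetical names, and the example sequence is made up), a table keyed only by A/C/G/T triplets has no entry for a triplet containing `*`:

```python
from itertools import product

# Hypothetical stand-in for a codon table: only the 64 A/C/G/T triplets are keys.
# The amino-acid values are omitted because only the keys matter for this failure.
CODON_KEYS = {"".join(triplet) for triplet in product("ACGT", repeat=3)}


def first_invalid_codon(seq):
    """Return (offset, codon) for the first triplet missing from the table, else None."""
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        codon = seq[i:i + 3]
        if codon not in CODON_KEYS:
            return i, codon
    return None


# Made-up sequence in which a '*' allele was written into the second codon.
print(first_invalid_codon("ATGG*ATAA"))  # (3, 'G*A')
```

A check like this only locates the offending triplet; it does not say which variant in the VCF introduced the `*`.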

@maryawood
Collaborator

Hello, I'm sorry that you are having trouble! I have requested access to your google drive folder so I can take a look at the data and try to see what the problem is.

@alex84425
Author

alex84425 commented Sep 9, 2019 via email

@maryawood
Collaborator

Thank you, I am able to access the folder now! The file names in that directory do not match the ones you use in the commands. Could you tell me which file names in the google drive folder correspond to the file names in your commands?

@alex84425
Author

alex84425 commented Sep 9, 2019 via email

@alex84425
Author

alex84425 commented Sep 9, 2019 via email

@maryawood
Collaborator

Which version of neoepiscope did you use for your analysis? The software should be able to handle cases where there are multiple alternate alleles as you described, but I would like to test out the commands using the same version as you so I can better see what happened.

@alex84425
Author

alex84425 commented Sep 10, 2019 via email

@alex84425
Author

alex84425 commented Sep 13, 2019 via email

@maryawood
Collaborator

Sorry for the delay, I had not had a chance to work on this yet! I just took a look today, and as suspected I got a similar error whether or not I retained variants with multiple alternate alleles. It appears that the issue is actually variants with '*' as the alternate allele, representing spanning deletions, which neoepiscope does not currently support. Thank you for bringing this to our attention! We will plan to incorporate a fix for this into an upcoming release of neoepiscope to increase the flexibility of the tool.
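
For a neoepiscope version without spanning-deletion support, one possible workaround is to drop the affected records from the VCF before running the pipeline. The following is a minimal sketch under that assumption, not a fix from the neoepiscope authors; `drop_spanning_deletions` is a hypothetical helper and the example records are made up. It removes every data line whose ALT column lists `*` as an allele:

```python
def drop_spanning_deletions(vcf_lines):
    """Yield header lines unchanged and data lines whose ALT column has no '*' allele."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 5 (index 4) of a VCF data line is ALT, a comma-separated allele list.
        if len(fields) > 4 and "*" in fields[4].split(","):
            continue  # skip records that carry a spanning-deletion allele
        yield line


# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n",
    "chr1\t101\t.\tC\t*\t50\tPASS\t.\n",
    "chr1\t102\t.\tG\tT,*\t50\tPASS\t.\n",
]
kept = list(drop_spanning_deletions(example))
print("".join(kept), end="")
```

Dropping the whole record is the simplest choice, but it also discards any ordinary allele that shares the line with `*` (the `T` at position 102 in the example). Keeping that allele would require rewriting the genotype indices as well, which this sketch does not attempt.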

@alex84425
Author

alex84425 commented Sep 15, 2019 via email

@maryawood
Collaborator

No bother at all! We don't currently have built-in support for genomes other than human hg19/hg38, but it's pretty easy to get things set up on your own to use a different genome/species. Instead of using the --build option when running neoepiscope call, you will use the --dicts and --bowtie-index options with some data you download (and process a bit) yourself.

If you'd like to use a mouse model instead of a human model, you can download the mouse GTF file for your genome build of choice from GENCODE: https://www.gencodegenes.org/mouse/

Using neoepiscope index, you can then index the GTF file to create pickled dictionaries necessary for predicting neoepitopes. This only needs to be done once. Then whenever you run neoepiscope call, you can use the --dicts option on the command line to specify the directory containing those pickled dictionaries.

Additionally, you will need a bowtie index for your mouse genome, which you can download from the bowtie website: ftp://ftp.ccb.jhu.edu/pub/data/bowtie_indexes/

You can use this index with the --bowtie-index option when running neoepiscope call.

Hope this helps!
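
The setup described above can be sketched as two command lines, assembled here in Python so the arguments are easy to inspect before running anything. Treat this as an assumption-laden sketch: all paths are hypothetical placeholders, `mouse_commands` is a hypothetical helper, and the `-g`/`-d` flags of `neoepiscope index` are written from memory of the README and should be checked against `neoepiscope index --help`. Only `--dicts`, `--bowtie-index`, `-c`, and `-o` for `neoepiscope call` appear in this thread.

```python
import shlex


def mouse_commands(gtf, dicts_dir, bowtie_index, prepped_haplotypes, out_file):
    """Build argument lists for a one-time index step and a per-sample call step."""
    # Assumed flags for `neoepiscope index`: -g for the GTF, -d for the output directory.
    index_cmd = ["neoepiscope", "index", "-g", gtf, "-d", dicts_dir]
    # --dicts and --bowtie-index take the place of the --build option used for hg19/hg38.
    call_cmd = [
        "neoepiscope", "call",
        "--dicts", dicts_dir,
        "--bowtie-index", bowtie_index,
        "-c", prepped_haplotypes,
        "-o", out_file,
    ]
    return index_cmd, call_cmd


# Hypothetical paths; substitute the GENCODE GTF and bowtie index you downloaded.
index_cmd, call_cmd = mouse_commands(
    gtf="mouse_annotation.gtf",
    dicts_dir="mouse_dicts",
    bowtie_index="mouse_bowtie_index/genome",
    prepped_haplotypes="sample.merge.vcf.adhp",
    out_file="sample.merge.vcf.out",
)
for cmd in (index_cmd, call_cmd):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The index command only needs to run once per GTF, while the call command is repeated for each sample. Binding-prediction options such as `-p` and `-a` from the original command are left out here and would be appended to the call.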

@alex84425
Author

alex84425 commented Oct 1, 2019 via email

@maryawood
Collaborator

Thank you for the suggestion! That is something that we will probably add in the future. Also, the latest release of neoepiscope should be able to handle the spanning deletions you had issues with before, so hopefully that will no longer cause problems for you if you update your installation of neoepiscope!
