
Shared flows

Cristina Yenyxe Gonzalez Garcia edited this page Jan 11, 2017 · 6 revisions

This section illustrates sequences of steps that are shared by multiple jobs.

The following flows are currently defined:

  • Generate VEP annotation
  • Calculate population statistics (work in progress)

Please note this section is a work in progress and more details about the structure of each flow will be added in the future.

Generate VEP annotation

Variant annotations are generated using Ensembl VEP (Variant Effect Predictor), a tool that runs completely independently of the EVA pipeline. As a result, each study could be annotated with a different version of VEP.

To annotate variants that have already been loaded, the pipeline traverses the database looking for variants that lack an annotation, and writes them to a tab-separated file following the format described here.
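The selection and formatting step could look roughly like the sketch below. It is not the pipeline's actual code. It assumes VEP's default input format (tab-separated chromosome, start, end, REF/ALT allele and strand, with "-" for an empty allele and end = start - 1 for insertions), and it uses plain dictionaries with hypothetical fields `chr`, `start`, `ref`, `alt` and `annot` in place of the real database documents.

```python
from typing import Iterable, List, Mapping


def vep_input_line(variant: Mapping) -> str:
    """Format one variant as a line of VEP's default input format (assumed)."""
    # Empty alleles (insertions/deletions) are written as "-".
    ref = variant["ref"] or "-"
    alt = variant["alt"] or "-"
    start = variant["start"]
    # An insertion is encoded with end = start - 1; otherwise end spans the REF allele.
    end = start - 1 if ref == "-" else start + len(ref) - 1
    return "\t".join([variant["chr"], str(start), str(end), f"{ref}/{alt}", "+"])


def vep_input_lines(variants: Iterable[Mapping]) -> List[str]:
    """Return input lines only for variants that lack an annotation."""
    # "annot" is a hypothetical field standing in for an existing annotation.
    return [vep_input_line(v) for v in variants if not v.get("annot")]
```

The returned lines can then be joined with newlines and written to the file that is passed to VEP.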

In addition to this tab-separated file, VEP needs:

  • A FASTA file containing the reference sequence that the VCF coordinates refer to
  • A VEP cache containing transcript locations, regulatory regions, SIFT/PolyPhen scores, etc.

FASTA files and VEP caches that are meant to be used together can be downloaded from the Ensembl FTP site, here and here.
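As an illustration only, a VEP run using a local cache and FASTA file could be assembled as follows. The flags (`-i`, `-o`, `--offline`, `--cache`, `--dir_cache`, `--cache_version`, `--species`, `--fasta`, `--assembly`) are standard VEP options, but the exact set the EVA pipeline passes is not documented here, and the function name and parameters are hypothetical. Recent VEP releases install a script called `vep`, while older ones shipped it as `variant_effect_predictor.pl`. The executable is therefore a parameter, which also allows a different VEP version per study.

```python
from typing import List, Optional


def build_vep_command(input_path: str, output_path: str, fasta_path: str,
                      cache_dir: str, species: str, cache_version: str,
                      assembly: Optional[str] = None,
                      vep_executable: str = "vep") -> List[str]:
    """Return an argument list for running VEP against a local cache and FASTA file."""
    cmd = [
        vep_executable,
        "-i", input_path,             # tab-separated variants to annotate
        "-o", output_path,            # plain-text annotation output
        "--offline",                  # do not connect to the Ensembl databases
        "--cache",
        "--dir_cache", cache_dir,     # directory holding the downloaded VEP cache
        "--cache_version", cache_version,
        "--species", species,
        "--fasta", fasta_path,        # reference sequence downloaded with the cache
    ]
    if assembly:
        cmd += ["--assembly", assembly]
    return cmd
```

The resulting list can be executed with `subprocess.run(cmd, check=True)`, so a non-zero VEP exit status raises an exception instead of silently producing an incomplete annotation file.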

VEP creates a plain text file with the annotations, which is then read and loaded into the database along with the variants.
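A minimal sketch of the reading step is shown below, assuming VEP's default tab-separated output: metadata lines starting with `##`, a column header starting with `#Uploaded_variation`, and an `Extra` column of `KEY=VALUE` pairs separated by `;`. It groups annotation rows by the `Uploaded_variation` column. The function name is hypothetical, and the database loading that follows depends on the storage layer, so it is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def parse_vep_output(lines: Iterable[str]) -> Dict[str, List[Dict[str, str]]]:
    """Group VEP default-format annotation rows by their Uploaded_variation column."""
    columns: List[str] = []
    annotations: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and metadata
        if line.startswith("#"):
            columns = line[1:].split("\t")  # column header line
            continue
        if not columns:
            raise ValueError("annotation row found before the column header")
        row = dict(zip(columns, line.split("\t")))
        # The Extra column holds optional KEY=VALUE pairs separated by ';'.
        extra = row.pop("Extra", "-")
        if extra != "-":
            for item in extra.split(";"):
                key, _, value = item.partition("=")
                row[key] = value
        annotations[row["Uploaded_variation"]].append(row)
    return dict(annotations)
```

Grouping by variant keeps all consequences for the same variant together (one row per overlapping transcript or feature), so each variant can be updated once when the annotations are loaded.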

Figure: Annotation flow

Calculate population statistics (work in progress)

TODO
