Shared flows

This section illustrates sequences of steps that are shared by multiple jobs.

The following flows are currently defined:

Generate VEP annotation
Calculate population statistics (work in progress)

Please note this section is a work in progress and more details about the structure of each flow will be added in the future.

Generate VEP annotation

Variant annotations are generated using Ensembl VEP, a standalone tool that runs independently of the EVA pipeline. As a consequence, each study could be annotated with a different version of VEP.

To annotate variants that have already been loaded, the pipeline traverses the database looking for those that lack an annotation. The output of this traversal is a tab-separated file following the format described here.
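The traversal step can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the variant records are plain dictionaries with hypothetical field names (chromosome, start, reference, alternate, annotation), and the row layout assumes VEP's default input format (chromosome, start, end, alleles, strand), which may differ from the format the pipeline really writes.

```python
import csv
import io


def unannotated(variants):
    """Yield only the variants that lack an annotation."""
    for variant in variants:
        if not variant.get("annotation"):
            yield variant


def to_vep_row(variant):
    """Build one row in VEP's default input layout (an assumption here)."""
    ref = variant["reference"] or "-"
    alt = variant["alternate"] or "-"
    start = variant["start"]
    # An insertion has no reference bases, so its end is start - 1.
    ref_length = 0 if ref == "-" else len(ref)
    end = start + ref_length - 1
    return [variant["chromosome"], start, end, f"{ref}/{alt}", "+"]


def write_vep_input(variants, handle):
    """Write the unannotated variants as tab-separated rows; return the count."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    count = 0
    for variant in unannotated(variants):
        writer.writerow(to_vep_row(variant))
        count += 1
    return count


# Hypothetical in-memory records standing in for the database traversal.
example_variants = [
    {"chromosome": "5", "start": 140532, "reference": "T", "alternate": "C",
     "annotation": None},
    {"chromosome": "1", "start": 881907, "reference": "", "alternate": "C"},
    {"chromosome": "2", "start": 946507, "reference": "G", "alternate": "C",
     "annotation": {"consequence": "already annotated"}},
]

buffer = io.StringIO()
written = write_vep_input(example_variants, buffer)
print(written, "variants written")
print(buffer.getvalue(), end="")
```

In a real run the generator would be fed by a database cursor rather than a list, so that the whole collection never has to be held in memory.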

In addition to this tab-separated file, the following are also necessary to run VEP:

A FASTA file containing the reference sequence that the VCF coordinates refer to
A VEP cache containing transcript locations, regulatory regions, SIFT/PolyPhen scores, etc.

FASTA files and VEP caches that are ready to be used together can be found on the Ensembl FTP site, here and here.
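To show how the three inputs fit together, the sketch below assembles a VEP command line. The function name, the file paths and the cache version are hypothetical, and the executable name is a parameter because it depends on the VEP release in use. The flags themselves (--input_file, --output_file, --cache, --offline, --dir_cache, --cache_version, --species, --fasta, --force_overwrite) are standard VEP options, but the exact set the pipeline passes is not stated here.

```python
import shlex


def build_vep_command(input_tsv, output_path, fasta_path, cache_dir,
                      cache_version, species="homo_sapiens",
                      vep_executable="vep"):
    """Return the VEP invocation as an argument list (nothing is executed)."""
    return [
        vep_executable,
        "--input_file", str(input_tsv),
        "--output_file", str(output_path),
        # Use the local cache only, without contacting the Ensembl database.
        "--cache", "--offline",
        "--dir_cache", str(cache_dir),
        "--cache_version", str(cache_version),
        "--species", species,
        # The FASTA file must match the assembly of the cache.
        "--fasta", str(fasta_path),
        "--force_overwrite",
    ]


# Hypothetical paths and version, for illustration only.
command = build_vep_command(
    input_tsv="variants_to_annotate.tsv",
    output_path="vep_annotation.txt",
    fasta_path="reference.fa",
    cache_dir="vep_cache",
    cache_version=82,
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True). Building it as a list rather than a single string avoids shell quoting problems with paths that contain spaces.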

VEP creates a plain text file with the annotations, which is then read and loaded into the database along with the variants.
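Reading that file back could look like the sketch below. It assumes VEP's default tab-delimited output, where lines starting with ## are metadata, the line starting with a single # names the columns, and the Extra column holds key=value pairs separated by semicolons. The sample line, including its gene and transcript identifiers, is made up, and loading the records into the database is left out.

```python
import io


def parse_vep_output(handle):
    """Yield one dictionary per annotation line of a VEP output file."""
    columns = None
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # blank line or metadata header
        if line.startswith("#"):
            columns = line[1:].split("\t")  # column header line
            continue
        if columns is None:
            raise ValueError("data line found before the column header")
        record = dict(zip(columns, line.split("\t")))
        extra = record.get("Extra", "-")
        # Expand "KEY=value;KEY=value" into a nested dictionary.
        record["Extra"] = {} if extra in ("", "-") else {
            key: value
            for key, _, value in (pair.partition("=") for pair in extra.split(";"))
        }
        yield record


# Made-up sample in the assumed layout; identifiers are not real.
SAMPLE = (
    "## ENSEMBL VARIANT EFFECT PREDICTOR\n"
    "#Uploaded_variation\tLocation\tAllele\tGene\tFeature\tFeature_type\t"
    "Consequence\tcDNA_position\tCDS_position\tProtein_position\t"
    "Amino_acids\tCodons\tExisting_variation\tExtra\n"
    "5_140532_T/C\t5:140532\tC\tENSG00000000001\tENST00000000001\t"
    "Transcript\tmissense_variant\t100\t90\t30\tA/V\tgCc/gTc\t-\t"
    "IMPACT=MODERATE;STRAND=1\n"
)

records = list(parse_vep_output(io.StringIO(SAMPLE)))
for record in records:
    print(record["Location"], record["Consequence"], record["Extra"])
```

Because the parser is a generator, annotations can be streamed into the database in batches instead of being loaded all at once.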


Calculate population statistics (work in progress)

TODO
