gff3_to_fasta

Extract sequences from specific regions of a genome based on a GFF3 file.

Usage

gff3_to_fasta.py [-h] [-g GFF] [-f FASTA] [-st SEQUENCE_TYPE] [-d DEFLINE] [-o OUTPUT_PREFIX] [-noQC] [-v]

GFF3: specify the file name with the -g argument
FASTA file: specify the file name with the -f argument. This must be the FASTA file that the GFF3 seqids and coordinates refer to. For more information, refer to the GFF3 specification.
Output prefix: specify with the -o argument. The names of all resulting FASTA files will start with this prefix.
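
Because the FASTA file must be the one the GFF3 seqids refer to, it can help to check that every seqid in column 1 of the GFF3 has a matching FASTA record before running the extraction. The following is a minimal sketch, not part of gff3_to_fasta itself; the function names are hypothetical, and it assumes the FASTA record ID is the first whitespace-delimited token after ">".

```python
import io


def fasta_ids(handle):
    """Collect record IDs: the first whitespace-delimited token after '>'."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def gff3_seqids(handle):
    """Collect the seqid (column 1) of every feature line in a GFF3 file."""
    ids = set()
    for line in handle:
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section follows; no more features
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        ids.add(line.split("\t", 1)[0])
    return ids


def missing_seqids(gff_handle, fasta_handle):
    """Return the sorted GFF3 seqids that have no record in the FASTA file."""
    return sorted(gff3_seqids(gff_handle) - fasta_ids(fasta_handle))


# Small in-memory demonstration; with real files, pass open() handles instead.
demo_gff = io.StringIO(
    "##gff-version 3\n"
    "chr1\tsrc\tgene\t1\t10\t.\t+\t.\tID=g1\n"
    "chr2\tsrc\tgene\t1\t10\t.\t+\t.\tID=g2\n"
)
demo_fasta = io.StringIO(">chr1 toy chromosome\nACGTACGTAC\n")
demo_missing = missing_seqids(demo_gff, demo_fasta)
print(demo_missing)
```

An empty list means every seqid was found; any names that are printed point to a mismatch between the two input files.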

Specify the input and output file names and the options using short arguments:
- python2.7 bin/gff3_to_fasta.py -g example_file/example.gff3 -f example_file/reference.fa -st all -d simple -o test_sequences
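
When the command is launched from another script, the argument list can be assembled and validated first. This is a hedged sketch that uses only the flags documented in this README; the helper names build_command and run_extraction are hypothetical, and the interpreter and script path are taken from the example above and may differ in your installation.

```python
import subprocess

# Accepted values as listed under -st and -d in this README.
SEQUENCE_TYPES = {"all", "gene", "exon", "pre_trans", "trans", "cds", "pep"}
DEFLINES = {"simple", "complete"}


def build_command(gff, fasta, sequence_type, defline, output_prefix,
                  skip_qc=False, interpreter="python2.7",
                  script="bin/gff3_to_fasta.py"):
    """Assemble the gff3_to_fasta.py argument list (hypothetical helper)."""
    if sequence_type not in SEQUENCE_TYPES:
        raise ValueError("unknown sequence type: %r" % sequence_type)
    if defline not in DEFLINES:
        raise ValueError("unknown defline format: %r" % defline)
    cmd = [interpreter, script,
           "-g", gff, "-f", fasta,
           "-st", sequence_type, "-d", defline,
           "-o", output_prefix]
    if skip_qc:
        cmd.append("-noQC")  # skip the GFF3 quality-control step
    return cmd


def run_extraction(**kwargs):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(**kwargs), check=True)


example_cmd = build_command(
    gff="example_file/example.gff3",
    fasta="example_file/reference.fa",
    sequence_type="all",
    defline="simple",
    output_prefix="test_sequences",
)
print(" ".join(example_cmd))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.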

-h, --help
- show this help message and exit
-g GFF, --gff GFF
- Genome annotation file in GFF3 format
-f FASTA, --fasta FASTA
- Genome sequences in FASTA format
-st SEQUENCE_TYPE, --sequence_type SEQUENCE_TYPE
- Type of sequences you would like to extract:
  - "all" - FASTA files for all types of sequences listed below;
  - "gene" - gene sequence for each record;
  - "exon" - exon sequence for each record;
  - "pre_trans" - genomic region of a transcript model (premature transcript);
  - "trans" - spliced transcripts (only exons included);
  - "cds" - coding sequences;
  - "pep" - peptide sequences.
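
The relationship between these sequence types can be illustrated on a toy example. The sketch below is not the tool's implementation; it assumes GFF3 coordinates (1-based, inclusive), the standard genetic code, a CDS that starts in phase 0, and translation that stops at the first stop codon. All names and the toy genome are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def extract(seq, start, end, strand):
    """One contiguous region, e.g. "gene", "exon", or "pre_trans"."""
    sub = seq[start - 1:end]  # 1-based inclusive -> Python slice
    return reverse_complement(sub) if strand == "-" else sub


def splice(seq, segments, strand):
    """Join several regions in genomic order, e.g. "trans" or "cds"."""
    joined = "".join(seq[s - 1:e] for s, e in sorted(segments))
    return reverse_complement(joined) if strand == "-" else joined


def translate(cds):
    """Translate a phase-0 CDS ("pep"); stops at the first stop codon."""
    peptide = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3].upper(), "X")  # X = unknown codon
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Toy plus-strand transcript: two exons separated by a GT...AG intron.
genome = "CCATGGCTGTAAGTCCAGAAATAGGG"
pre_trans = extract(genome, 1, 26, "+")            # exons plus intron
trans = splice(genome, [(1, 8), (19, 26)], "+")    # exons only
cds = splice(genome, [(3, 8), (19, 24)], "+")      # coding part of the exons
pep = translate(cds)
print(pre_trans, trans, cds, pep)
```

For minus-strand features the segments are joined in genomic order first and the result is then reverse-complemented, which yields the transcript in its 5' to 3' orientation.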
-d DEFLINE, --defline DEFLINE
- Defline format in the output FASTA file:
  - "simple" - only ID is shown in the defline;
  - "complete" - complete information of the feature is shown in the defline.
-o OUTPUT_PREFIX, --output_prefix OUTPUT_PREFIX
- Prefix of output file name
-noQC, --quality_control
- Specify this option if you do not want to execute quality control on the GFF3 file (default: QC is executed).
-v, --version
- Show program version number and exit