Database free

aast242 edited this page Jul 21, 2021 · 2 revisions

Module Overview

The database_free module performs similar functions to the database module, but uses BLAST's -subject flag instead of the -db flag. This bypasses database creation and causes the program to generate fewer files. This module is most useful for small searches where a database is unnecessary (e.g., a fungal genome as the subject and a few transposons as the query).
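To illustrate the difference between the two flags, here is a minimal sketch that builds (but does not run) a blastn argument list. The helper name `blastn_command` is hypothetical, and the exact command FACET assembles internally is not shown on this page, so treat this only as an approximation of the idea.

```python
import shlex


def blastn_command(subject, query, use_database=False):
    """Build a blastn argument list without running it (hypothetical helper).

    With -subject, blastn reads the subject FASTA file directly.
    With -db, the value would instead be the name of a BLAST database
    that has to be created beforehand.
    """
    target_flag = "-db" if use_database else "-subject"
    return ["blastn", "-query", query, target_flag, subject]


# File names are placeholders.
print(shlex.join(blastn_command("genome.fasta", "transposons.fasta")))
```

Because no database has to be built for the `-subject` form, no database files are left behind next to the subject FASTA file.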

Usage: FACET db_free <subject.fasta> <query.fasta> [options]

Acceptable Module Aliases: db_free, dbf, database_free
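As a rough illustration of the usage line, the sketch below assembles a FACET db_free command as an argument list. It assumes FACET is callable as `FACET`, as in the usage line above; the helper `facet_db_free_args` and the file names are hypothetical.

```python
import shlex

# Aliases listed on this page.
MODULE_ALIASES = ("db_free", "dbf", "database_free")


def facet_db_free_args(subject, query, *options, alias="db_free"):
    """Return the argument list for a db_free run (hypothetical helper)."""
    if alias not in MODULE_ALIASES:
        raise ValueError(f"unknown module alias: {alias}")
    # Positional arguments come first: subject, then query.
    return ["FACET", alias, subject, query, *options]


args = facet_db_free_args(
    "genome.fasta", "transposons.fasta", "--writegff", "--evalue", "1e-10"
)
print(shlex.join(args))
```

The resulting list could be passed to `subprocess.run(args, check=True)` on a system where FACET is installed.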

Positional Arguments

subject

The subject FASTA file is the sequence that is used as the subject in BLASTn searches. If you are searching for transposable elements in a genome, this file should contain the genome sequence.

query

The query FASTA file is the sequence that is used to query the subject file. If you are searching for transposable elements in a genome, this file should contain transposable element sequences.

Optional arguments

--help\-h

Prints a help message with brief descriptions of each option and exits the program

--verbose\-v

Prints more information to stdout while FACET is running

--nocat\-c

Maintains redundancy between query IDs, but culls redundant alignments within the same query ID. This option only modifies the output if there is more than one query sequence. When this option is used, "_nocat" is included in output file names. See the About page for a more thorough explanation.

--noclean\-n

Does not cull any alignments from BLASTn output. This option essentially turns FACET into a file conversion program. When this option is used, "_noclean" is included in output file names. See the About page for a more thorough explanation.

--writegff\-g

Writes a GFF file containing alignment information. See more about FACET's GFF output here.

--writesam\-s

Writes a SAM file containing alignment information. See more about FACET's SAM/BAM output here.

--writebam\-b

Writes a BAM file containing alignment information. See more about FACET's SAM/BAM output here.

--verboseoutput\-q

Writes BTOP information to SAM/BAM files and CSV files. This can be very useful for visualizing RIP (repeat-induced point mutation) in fungal genomes (see this section of the wiki for an example of what that looks like).

--writefasta <int>

Writes a FASTA file containing subject sequences with alignments longer than <int>

All subject sequences are written to the same FASTA file, and location information is written in the FASTA comment line after the ID.
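The length filter can be sketched as follows. This is only an approximation: it assumes length is measured on the subject coordinates, the header layout (`ID start-end`) is a guess at what "location information in the comment line" might look like, and minus-strand hits are not reverse-complemented. The function and variable names are hypothetical.

```python
def fasta_records(alignments, subject_seqs, min_length):
    """Yield FASTA text for aligned subject regions longer than min_length."""
    for aln in alignments:
        # Minus-strand BLAST hits report sstart > send, so sort the pair.
        start, end = sorted((aln["sstart"], aln["send"]))
        length = end - start + 1  # BLAST coordinates are 1-based, inclusive
        if length > min_length:
            seq = subject_seqs[aln["sseqid"]][start - 1:end]
            # Hypothetical header: ID first, location in the comment field.
            yield f">{aln['sseqid']} {start}-{end}\n{seq}\n"


subject_seqs = {"contig1": "ACGTACGTAC"}
alignments = [
    {"sseqid": "contig1", "sstart": 2, "send": 7},  # length 6, kept
    {"sseqid": "contig1", "sstart": 9, "send": 8},  # length 2, dropped
]
print("".join(fasta_records(alignments, subject_seqs, min_length=3)))
```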

--nocsv\-x

Does not write a CSV file containing alignment information.

--writetic\-i

Writes a tic file containing alignment information. This file format was created by me and can be used to store coverage information from a genome x genome comparison. See tic files to learn more about the file format.

--buffer <int> [default: 25]

'Buffers' alignments to aid in finding the most biologically relevant alignments in a given BLAST report. The default value of 25 should be fine for most applications. If you are still seeing lots of overlapping alignments, there may be some sequence polymorphisms between the query and the subject.

See this section of the About page for a detailed explanation of the buffer

--force

By default, FACET exits if running the given command will overwrite existing user files. Using this flag forces the program to overwrite those files.
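The overwrite guard described above can be pictured with a short sketch. The function `check_outputs` is hypothetical and is not FACET's own code; it only mirrors the described behavior of refusing to continue unless forced.

```python
from pathlib import Path


def check_outputs(paths, force=False):
    """Return the output files that already exist.

    Raises FileExistsError if any exist and force is False, mirroring the
    described default of exiting rather than overwriting user files.
    """
    existing = [str(p) for p in map(Path, paths) if p.exists()]
    if existing and not force:
        raise FileExistsError("would overwrite: " + ", ".join(existing))
    return existing
```

A caller would run this check before writing anything, passing `force=True` only when the user supplied `--force`.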

--outfmt <str>

This option allows users to specify an output format (outfmt) for CSV files. It functions similarly to BLASTn's -outfmt flag (see more here). Any outfmt fields can be used in any order, as long as the string contains the fields FACET needs to run (see --outfmt facet below). The two predefined outfmts are facet and 6.

Specifying --outfmt 6 in FACET is the same as specifying -outfmt 6 in BLASTn

Specifying --outfmt facet in FACET (this is the default behavior) is the same as specifying -outfmt "6 sseqid sstart send qseqid qstart qend sstrand pident" in BLASTn

If the user is defining a custom outfmt, the flag is used like so: --outfmt "6 sseqid sstart send qseqid qstart qend". The quotation marks and 6 must be present for FACET to properly interpret the outfmt string!
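To make the outfmt string concrete, the sketch below splits it into field names and uses them to label the columns of one tab-separated BLASTn line. It illustrates BLASTn's tabular format rather than FACET's own parser; the function names are hypothetical, and the field list used for a bare `6` is BLASTn's standard 12-column default.

```python
FACET_OUTFMT = "6 sseqid sstart send qseqid qstart qend sstrand pident"

# Columns BLASTn writes when -outfmt 6 is given without field names.
BLAST_DEFAULT_FIELDS = (
    "qseqid sseqid pident length mismatch gapopen "
    "qstart qend sstart send evalue bitscore"
).split()


def outfmt_fields(outfmt):
    """Return the field names encoded in an outfmt string."""
    code, *fields = outfmt.split()
    if code != "6":
        raise ValueError("outfmt string must start with 6")
    return fields or BLAST_DEFAULT_FIELDS


def parse_hit(line, outfmt=FACET_OUTFMT):
    """Map one tab-separated BLASTn line onto its field names."""
    values = line.rstrip("\n").split("\t")
    return dict(zip(outfmt_fields(outfmt), values))


hit = parse_hit("contig1\t100\t250\tTE1\t1\t151\tplus\t98.5")
print(hit["sseqid"], hit["sstart"], hit["send"])
```

All values are kept as strings here; a real parser would convert coordinates and percent identity to numbers.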

--task <str> [default: megablast]

Allows the user to run different BLASTn tasks. The task name is appended to output file names so users can keep track of what commands were used to generate output files. This flag is similar to BLASTn's -task option. BLASTn's default task is megablast, but the flag can accept the following values: blastn, blastn-short, megablast, dc-megablast

--evalue <float> [default: 1e-15]

Sets the E-value cutoff for BLASTn alignment consideration. Any alignment with an E-value less than or equal to this cutoff will be considered by FACET.
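The cutoff comparison is inclusive, which a small sketch makes explicit. The hit names and E-values below are made up for illustration, and `passes_evalue` is a hypothetical helper.

```python
def passes_evalue(hit_evalue, cutoff=1e-15):
    """True when a hit's E-value is at or below the cutoff (inclusive)."""
    return hit_evalue <= cutoff


# Made-up (query ID, E-value) pairs.
hits = [("TE1", 1e-40), ("TE2", 1e-15), ("TE3", 1e-5)]
kept = [name for name, evalue in hits if passes_evalue(evalue)]
print(kept)  # ['TE1', 'TE2']
```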

--notigcov\-t

Removes alignments covering the entire contig. This option is particularly useful when performing genome self-comparisons, as that would be one of the only times an alignment would span an entire contig (see this section of the wiki).
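One way to picture this filter is shown below. It assumes "covering the entire contig" means the alignment runs from position 1 to the contig's last base on the subject; FACET's exact criterion may differ, and both function names are hypothetical.

```python
def spans_whole_contig(sstart, send, contig_length):
    """True if the subject coordinates cover the contig end to end."""
    # Minus-strand hits report sstart > send, so sort the pair first.
    start, end = sorted((sstart, send))
    return start == 1 and end == contig_length


def drop_full_contig_hits(hits, contig_lengths):
    """Keep only hits that do not cover their whole subject contig."""
    return [
        h for h in hits
        if not spans_whole_contig(h["sstart"], h["send"], contig_lengths[h["sseqid"]])
    ]


contig_lengths = {"contig1": 1000}
hits = [
    {"sseqid": "contig1", "sstart": 1, "send": 1000},   # self-hit, removed
    {"sseqid": "contig1", "sstart": 200, "send": 450},  # kept
]
print(drop_full_contig_hits(hits, contig_lengths))
```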

--large

This option speeds up larger searches by breaking up the query file into individual sequences, querying the subject with each individual FASTA file, and concatenating the results after the subject has been searched with all query sequences. The final output of FACET is not modified by using this method. This method is not the default behavior because the file operations involved can actually slow down smaller searches (e.g., searching for transposable elements in a fungal genome).

The --large flag really shines when used to perform large genome-genome comparisons. Using the flag sped up a self-comparison of the C. elegans genome from ~17 hours to ~8 minutes. That's over 120x faster!
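The splitting step can be sketched in a few lines. This is a simplified illustration of breaking a multi-sequence FASTA into one record per sequence, not FACET's implementation; it works on strings rather than files, and `split_fasta` is a hypothetical name.

```python
def split_fasta(text):
    """Split multi-sequence FASTA text into one string per record."""
    records, current = [], []
    for line in text.splitlines():
        # A new header closes the record collected so far.
        if line.startswith(">") and current:
            records.append("\n".join(current) + "\n")
            current = []
        if line.strip():  # skip blank lines
            current.append(line)
    if current:
        records.append("\n".join(current) + "\n")
    return records


query = ">TE1\nACGT\nAC\n>TE2\nGG\n"
for record in split_fasta(query):
    print(record, end="")
```

Each record would then be searched on its own and the per-record results concatenated, which is why the final output is unchanged.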