aast242 edited this page Jul 21, 2021 · 2 revisions

Module Overview

The masker module performs a genome self-comparison using BLASTn (or reads alignments from an existing output file) and uses those alignments to mask repetitive regions in the provided genome. The masked genome is written to the directory the program is called from.

Usage: FACET masker <genome.fasta> [options]

Acceptable Module Aliases: masker, m
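For scripted runs, the usage line above can be assembled programmatically. The following is a minimal sketch, assuming FACET is installed and reachable as `FACET` on the PATH. The helper name `build_masker_command` and the file name `hits.csv` are hypothetical; only the flags themselves come from this page.

```python
import shlex


def build_masker_command(genome, cov_depth=2, mask_char="n", evalue=1e-15,
                         num_threads=2, outfile=None, outfmt=None, force=False):
    """Assemble an argv list for `FACET masker` (hypothetical helper).

    Defaults mirror the defaults documented on this page.
    """
    cmd = ["FACET", "masker", genome,
           "--cov_depth", str(cov_depth),
           "--mask_char", mask_char,
           "--evalue", str(evalue),
           "--num_threads", str(num_threads)]
    # Only add the optional flags when the caller asked for them.
    if outfile is not None:
        cmd += ["--outfile", outfile]
    if outfmt is not None:
        cmd += ["--outfmt", outfmt]
    if force:
        cmd.append("--force")
    return cmd


if __name__ == "__main__":
    # Mask a diploid genome with a pre-computed alignment file (made-up name).
    cmd = build_masker_command("genome.fasta", cov_depth=3, outfile="hits.csv",
                               outfmt="6 sseqid sstart send qseqid qstart qend")
    # shlex.join quotes the outfmt string so it survives a shell as one argument.
    print(shlex.join(cmd))
```

The returned list can be handed to `subprocess.run(cmd, check=True)`. Passing a list rather than a single string keeps the quoted outfmt string intact as one argument.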

Positional Arguments

genome

This is a FASTA file that will be masked either by performing a self-comparison or using alignments in an outfile.

Optional Arguments

--help, -h

Prints a help message with brief descriptions of each option and exits the program.

--outfile <str>

To mask the given genome with an existing BLASTn/FACET output file, provide the file using this option. The --outfmt option must match the format of the provided file!

--outfmt <str>

This option allows users to specify an output format for CSV files. It functions similarly to BLASTn's -outfmt flag (see the BLASTn documentation for details). Any outfmt fields can be used, in any order, as long as the string contains the fields FACET needs to run (see --outfmt facet below). The two pre-defined outfmts are facet and 6.

Specifying --outfmt 6 in FACET is the same as specifying -outfmt 6 in BLASTn.

Specifying --outfmt facet in FACET (this is the default behavior) is the same as specifying -outfmt "6 sseqid sstart send qseqid qstart qend sstrand pident" in BLASTn.

If the user is defining a custom outfmt, the flag is used like so: --outfmt "6 sseqid sstart send qseqid qstart qend". The quotation marks and 6 must be present for FACET to properly interpret the outfmt string!
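The rules above for outfmt strings can be sketched as a small parser. This is an illustration, not FACET's actual code. It assumes the six coordinate fields from the custom example are the required ones (the page does not list them explicitly), and the names `parse_outfmt` and `REQUIRED_FIELDS` are hypothetical. The column order for a bare `6` is BLASTn's standard tabular default.

```python
# Column order BLASTn writes for a bare "-outfmt 6".
BLAST_DEFAULT_6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                   "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

# Columns of the "facet" preset, as documented on this page.
FACET_PRESET = ["sseqid", "sstart", "send", "qseqid", "qstart", "qend",
                "sstrand", "pident"]

# ASSUMPTION: these six fields are the ones FACET needs; taken from the
# custom-outfmt example, not from FACET's source.
REQUIRED_FIELDS = {"sseqid", "sstart", "send", "qseqid", "qstart", "qend"}


def parse_outfmt(outfmt="facet"):
    """Return the ordered column names described by an outfmt string."""
    tokens = outfmt.split()
    if tokens == ["facet"]:
        columns = list(FACET_PRESET)
    elif tokens == ["6"]:
        columns = list(BLAST_DEFAULT_6)
    elif tokens and tokens[0] == "6":
        # Custom format: everything after the leading 6 is a column name.
        columns = tokens[1:]
    else:
        raise ValueError("outfmt must be 'facet', '6', or start with '6 '")
    missing = REQUIRED_FIELDS - set(columns)
    if missing:
        raise ValueError("outfmt is missing fields: " + ", ".join(sorted(missing)))
    return columns
```

Because the function returns the columns in the order given, a caller can look up each field's position with `columns.index("sstart")` regardless of how the user ordered the string.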

--cov_depth <int> [default: 2]

The depth of coverage needed for a base to be considered repetitive. Any base that has a coverage >= the provided value will be masked with the mask character. The default value is 2 because a self-comparison will always generate an alignment that covers each contig (from the contig matching to itself).

Some important notes:

  • If you are trying to mask known repetitive elements in a genome (e.g., searching a genome for known repetitive elements using FACET and using the resulting CSV file as an input), this value should be set to 1.
  • If the organism you are performing a self-comparison with is diploid, this value should be set to 3, because each chromosome will align to itself and to its homologous chromosome.
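The effect of the coverage threshold can be shown with a short sketch. It assumes alignments on one contig are given as 1-based, inclusive (start, end) pairs, as in BLAST tabular output, where start may exceed end for minus-strand hits. The function name `mask_by_coverage` and the toy sequence are made up for illustration, and FACET's own implementation may differ.

```python
def mask_by_coverage(sequence, alignments, cov_depth=2, mask_char="n"):
    """Mask every base covered by at least `cov_depth` alignments.

    alignments: iterable of (start, end) pairs, 1-based and inclusive.
    """
    n = len(sequence)
    # Difference array: +1 where an alignment begins, -1 just past its end.
    diff = [0] * (n + 1)
    for start, end in alignments:
        lo, hi = sorted((start, end))      # normalise minus-strand hits
        lo, hi = max(lo, 1), min(hi, n)    # clip to the contig
        if lo > hi:
            continue
        diff[lo - 1] += 1
        diff[hi] -= 1

    masked = []
    depth = 0
    for i, base in enumerate(sequence):
        depth += diff[i]                   # running sum = coverage at base i
        masked.append(mask_char if depth >= cov_depth else base)
    return "".join(masked)


if __name__ == "__main__":
    contig = "ACGTACGTAC"
    # Self-hit over the whole contig plus one repeat hit on bases 3-6.
    print(mask_by_coverage(contig, [(1, 10), (3, 6)], cov_depth=2))
```

With the default depth of 2, the whole-contig self-hit alone masks nothing, and only bases covered by at least one additional alignment are replaced. With a depth of 1, a single alignment of a known repeat is enough to mask its span.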

--mask_char <char> [default: n]

The character that is used to mask a repetitive base.

--evalue <float> [default: 1e-15]

Sets the E-value cutoff for BLASTn alignments. Only alignments with an E-value <= this value will be considered by FACET.
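The cutoff is inclusive, which a small filter makes concrete. This sketch assumes rows in BLASTn's bare `-outfmt 6` layout, where the E-value is the eleventh column. The rows are made-up toy data, `filter_by_evalue` is a hypothetical name, and this page does not say whether FACET applies the cutoff itself or passes it on to BLASTn.

```python
import csv
import io

# Position of "evalue" in a bare BLASTn "-outfmt 6" row (0-based).
EVALUE_COLUMN = 10


def filter_by_evalue(rows, cutoff=1e-15, evalue_index=EVALUE_COLUMN):
    """Keep rows whose E-value is less than or equal to the cutoff."""
    return [row for row in rows if float(row[evalue_index]) <= cutoff]


# Toy tab-separated alignments (invented values, for illustration only).
TOY_HITS = (
    "ctg1\tctg1\t100.0\t500\t0\t0\t1\t500\t1\t500\t0.0\t924\n"
    "ctg1\tctg2\t97.5\t200\t5\t0\t10\t209\t40\t239\t1e-15\t340\n"
    "ctg1\tctg3\t88.0\t50\t6\t0\t300\t349\t7\t56\t1e-05\t52.8\n"
)

if __name__ == "__main__":
    rows = list(csv.reader(io.StringIO(TOY_HITS), delimiter="\t"))
    for row in filter_by_evalue(rows):
        print(row[0], row[1], row[EVALUE_COLUMN])
```

A smaller cutoff is stricter: lowering it from 1e-15 to 1e-30 would drop the second toy row as well.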

--num_threads <int> [default: 2]

The number of threads used for the BLASTn process. This flag is the same as BLASTn's -num_threads option. Using more threads speeds up BLASTn searches.

--force

By default, FACET exits if running the given command will overwrite existing user files. Using this flag forces the program to overwrite those files.

--overwritedb

FACET's default behavior is to create a database using the subject FASTA file. If a database with the same name is present, FACET will not overwrite it unless this option is specified. This reduces the number of times FACET has to create a database and decreases runtime in future runs.

--verbose, -v

Prints more information to stdout while FACET is running.