Task: clean
This removes small contigs, and also contigs completely contained in another contig.
The general usage is
circlator clean [options] <in.fasta> <outprefix>
The following options are available:
- --min_contig_length INT: contigs shorter than this are discarded (unless specified using --keep). Default: 2000.
- --min_contig_percent FLOAT: if the length of a nucmer hit is at least this percentage of the contig's length, then the contig is removed (unless specified using --keep). Default: 95.
- --diagdiff INT: nucmer diagdiff option. Default: 25.
- --min_nucmer_id FLOAT: nucmer minimum percent identity. Default: 95.
- --min_nucmer_length INT: minimum length of hit for nucmer to report. Default: 500.
- --breaklen INT: breaklen option used by nucmer. Default: 500.
- --keep FILENAME: file of contig names to keep in the output file, one name per line. Contigs named in this file will be kept, regardless of whether or not they are contained in another contig.
- --verbose: be verbose.
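The way --min_contig_percent and --keep interact can be sketched in Python. This is only an illustration of the rule as documented above, not Circlator's actual code: the function name `is_removed_as_contained` and its arguments are hypothetical, and it assumes the contig length and the nucmer hit length are already known.

```python
def is_removed_as_contained(contig_name, contig_length, hit_length,
                            keep_names=frozenset(), min_contig_percent=95.0):
    """Return True if a contig would be dropped as contained in another contig.

    Hypothetical helper that mirrors the documented behaviour of
    --min_contig_percent and --keep; it is not taken from Circlator.
    """
    if contig_name in keep_names:
        # Contigs named in the --keep file are never removed.
        return False
    # Percentage of the contig's length that is covered by the nucmer hit.
    percent_covered = 100.0 * hit_length / contig_length
    return percent_covered >= min_contig_percent
```

With the default of 95, a 10000 bp contig with a 9600 bp hit would be removed, while the same contig with a 9000 bp hit would be kept.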
The final cleaned FASTA file is called outprefix.fasta and logging information is written to outprefix.log. An example log file is:
[clean] contig1 user_kept
[clean] contig2 kept
[clean] contig3 small_removed
[clean] contig4 contained in contig2
In this example, contig1 was kept because it was specified in the file given by --keep. Contig2 was not contained in any other contig, so it was kept. Contig3 was too short and was therefore removed. Contig4 was removed because it was contained in contig2.
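To get a quick overview of a log like the one above, the outcomes can be tallied with a few lines of Python. This is a minimal sketch: the function name `summarise_clean_log` is hypothetical, and it assumes every per-contig line has the same shape as the example (a `[clean]` tag, the contig name, then the outcome), with fields separated by whitespace.

```python
from collections import Counter


def summarise_clean_log(lines):
    """Count contigs per outcome in a clean log.

    Assumes the line layout shown in the example log; any line that does
    not start with the "[clean]" tag is ignored.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] != "[clean]":
            continue  # not a per-contig record
        # "contained in contigX" lines are all grouped under "contained".
        counts[fields[2]] += 1
    return counts


example_log = [
    "[clean] contig1 user_kept",
    "[clean] contig2 kept",
    "[clean] contig3 small_removed",
    "[clean] contig4 contained in contig2",
]
summary = summarise_clean_log(example_log)
```

For a real run, the lines could be read with `open("outprefix.log")` instead of the hard-coded list.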
The other files are intermediate files made as part of the cleaning process. First, small contigs are removed and the remaining contigs are written to outprefix.remove_small.fa. The nucmer show-coords output of running nucmer on this file against itself is outprefix.coords.
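The first intermediate step, removing small contigs, can be approximated as follows. This is a sketch under stated assumptions, not Circlator's implementation: the function name `remove_small_contigs` is hypothetical, the FASTA input is passed as a string, and the contig name is assumed to be the first word of each header line.

```python
def remove_small_contigs(fasta_text, min_contig_length=2000, keep_names=frozenset()):
    """Return {name: sequence} for contigs that pass the length filter.

    Contigs shorter than min_contig_length are dropped unless their name
    is in keep_names, following the documented --min_contig_length and
    --keep behaviour.
    """
    contigs = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the contig name is the first word of the header.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            contigs[name] = []
        elif name is not None:
            contigs[name].append(line)

    kept = {}
    for contig_name, chunks in contigs.items():
        sequence = "".join(chunks)
        if len(sequence) >= min_contig_length or contig_name in keep_names:
            kept[contig_name] = sequence
    return kept
```

The surviving contigs correspond to what would be written to outprefix.remove_small.fa before the nucmer self-comparison.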