Tutorial

nterhoeven edited this page Jan 12, 2018 · 7 revisions

This tutorial walks through the process of analyzing the repeat content of the sugar beet, Beta vulgaris.

Setting up

First, we need to get reper and the test data.

git clone https://github.com/nterhoeven/reper.git
docker pull nterhoeven/reper
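# Single quotes defer $(pwd) until reper is invoked, so the directory you run it from is mounted as /data.
alias reper='docker run --user="$(id -u):$(id -g)" -it --rm -v "$(pwd)":/data nterhoeven/reper'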
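
Before running the commands above, it can help to confirm that the required tools are installed. The following is a minimal sketch, not part of reper: the helper name `require_cmd` is hypothetical, and the list of tools simply mirrors the commands used in this section.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of reper): report whether a command is on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Check the tools used in this section; keep going so every missing tool is reported.
for tool in git docker; do
    require_cmd "$tool" || true
done
```

Any tool reported as missing needs to be installed before continuing.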

Move to the tutorial directory:

cd reper/tutorial/

Here we find a prepared reper config file (reper.conf) and bash scripts containing the needed commands.

We now have to download the sequencing data from the SRA. The easiest way to do this is with fastq-dump from the sra-tools package (01_download-data.sh):

fastq-dump --defline-seq '@$sn[_$rn]/$ri' --split-files SRR952972
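
Once the download has finished, a quick sanity check is to count the reads in each output file. The sketch below assumes that the run is paired-end, so that --split-files produces SRR952972_1.fastq and SRR952972_2.fastq in the current directory, and that the files are uncompressed with four lines per record. The helper name `count_reads` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reads in an uncompressed FASTQ file,
# assuming the usual four lines per record.
count_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Assumed output names for a paired-end run downloaded with --split-files.
for f in SRR952972_1.fastq SRR952972_2.fastq; do
    if [ -f "$f" ]; then
        echo "$f: $(count_reads "$f") reads"
    else
        echo "$f: not found (has the download finished?)"
    fi
done
```

For paired-end data, both files should report the same number of reads.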

The next step is to configure the reference databases used to classify the detected repeats. For our data set we can use REdat and RefSeq. reper comes with scripts that download and configure these databases automatically (02_prepare-databases.sh):

reper configure-refseq
reper configure-REdat

Starting reper

Everything is now set up and we can start reper. With the settings specified in the config file, the analysis requires 24 threads and 100 GB of memory and runs for about 5-6 hours (03_run-reper.sh):

reper kmerCount
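
Because the run takes several hours, it may be worth checking beforehand that the machine offers the resources mentioned above. The following is a rough sketch for Linux hosts, not part of reper: it assumes that `nproc` and /proc/meminfo are available, it treats the 100 GB requirement as 100 GiB for simplicity, and the helper name `has_at_least` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the available amount ($1) is at least the required amount ($2).
has_at_least() {
    [ "$1" -ge "$2" ]
}

# Number of processing units; falls back to 0 if nproc is not installed.
threads=$(nproc 2>/dev/null || echo 0)

# MemTotal in /proc/meminfo is given in kB (Linux only); convert to whole GiB.
mem_gib=0
if [ -r /proc/meminfo ]; then
    mem_gib=$(awk '/^MemTotal:/ { printf "%d", $2 / 1024 / 1024 }' /proc/meminfo)
fi

# 24 threads and 100G are the requirements stated for the tutorial config.
has_at_least "$threads" 24 || echo "warning: $threads threads detected, the config expects 24"
has_at_least "$mem_gib" 100 || echo "warning: ${mem_gib} GiB memory detected, the config expects about 100"
```

If a warning appears, the thread and memory settings in reper.conf probably need to be lowered to match the machine.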

Now it is time to grab a coffee and let reper do the work.

Analyzing the results

Once reper has finished successfully, you will find the following important files:

  • repeat-landscape-by-*: These files contain an overview of the repeats found, their assigned classes, and their estimated share of the genome.
  • Trinity.fasta: This file contains all repeat sequences found.
  • Trinity.fasta.exemplars.classified: This file contains the representative exemplar sequence for each cluster along with its classification.
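
A quick way to get a first impression of these files is to count the sequences they contain. The sketch below assumes that the output files are in the current directory and that both Trinity files are in FASTA format, with one header line starting with ">" per sequence. The helper name `count_fasta_records` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count FASTA records by counting header lines.
# grep -c exits non-zero when it finds no match, hence the "|| true".
count_fasta_records() {
    grep -c '^>' "$1" || true
}

# Assumed locations: both files in the current directory.
for f in Trinity.fasta Trinity.fasta.exemplars.classified; do
    if [ -f "$f" ]; then
        echo "$f: $(count_fasta_records "$f") sequences"
    else
        echo "$f: not found"
    fi
done
```

The exemplar file should contain at most as many sequences as Trinity.fasta, since it holds one representative per cluster.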

With the R script plot-landscape.R, provided in the scripts subdirectory of reper, you can generate visualizations of the repeat landscape. Note that this script requires the R library ggplot (04_generate-plots.sh).
