diff --git a/algo/vidjil.cpp b/algo/vidjil.cpp
index e8a5e0dbb..3be2db1ce 100644
--- a/algo/vidjil.cpp
+++ b/algo/vidjil.cpp
@@ -198,10 +198,10 @@ void usage(char *progname, bool advanced)
        << " -t trim V and J genes (resp. 5' and 3' regions) to keep at most nt (default: " << DEFAULT_TRIM << ") (0: no trim)" << endl
        << endl

-       << "Labeled windows (these windows will be kept even if -r/-% thresholds are not reached)" << endl
-       << " -W label the given window" << endl
-       << " -l label a set of windows given in " << endl
-       << " -F filter -- keep only the labeled windows" << endl
+       << "Labeled sequences (windows related to these sequences will be kept even if -r/-% thresholds are not reached)" << endl
+       << " -W label the given sequence" << endl
+       << " -l label a set of sequences given in " << endl
+       << " -F filter -- keep only the windows related to the labeled sequences" << endl
        << endl ;

   cerr << "Limits to report a clone (or a window)" << endl
diff --git a/doc/algo.org b/doc/algo.org
index 67168faf9..8963d7d44 100644
--- a/doc/algo.org
+++ b/doc/algo.org
@@ -365,32 +365,36 @@ used only for test and debug purposes, on very small datasets, and produce
 large file and takes huge computation times.


-** Labeled windows and sequences of interest
+** Sequences of interest

-Vidjil allows to indicate that specific windows must be followed
-(even if those windows are 'rare', below the =-r/-%= thresholds).
-
-Such windows can be provided either with =-W =, or with =-l =.
-The file given by =-l= should have one window by line, as in the following example:
+Vidjil allows to indicate that specific sequences should be followed and output,
+even if those sequences are 'rare' (below the =-r/-%= thresholds).
+Such sequences can be provided either with =-W =, or with =-l =.
+The file given by =-l= should have one sequence by line, as in the following example:

 #+BEGIN_EXAMPLE
 GAGAGATGGACGGGATACGTAAAACGACATATGGTTCGGGGTTTGGTGCT my-clone-1
 GAGAGATGGACGGAATACGTTAAACGACATATGGTTCGGGGTATGGTGCT my-clone-2 foo
 #+END_EXAMPLE

-Windows and labels must be separed by one space.
-The first column of the file is the window to be followed
-while the remaining columns consist of the window's label.
-In Vidjil output, the labels are output alongside their windows.
-
-With the =-F= option, /only/ the labeld windows are kept. This allows
-to quickly filter a set of reads, looking for a known window,
-with the =-FaW = options:
-All the reads with this windows will be extracted to =out/seq/clone.fa-1=.
+Sequences and labels must be separed by one space.
+The first column of the file is the sequence to be followed
+while the remaining columns consist of the sequence's label.
+In Vidjil output, the labels are output alongside their sequences.

-More generally when the provided sequence differs in length with the windows
+A sequence given =-W = or with =-l = can be exactly the size
+of the window (=-w=, that is 50 by default). In this case, it is guaranteed that
+such a window will be output if it is detected in the reads.
+More generally, when the provided sequence differs in length with the windows
 we will keep any windows that contain the sequence of interest or, conversely,
 we will keep any window that is contained in the sequence of interest.
+This filtering will work as expected when the provided sequence overlaps
+(at least partially) the CDR3 or its close neighborhood.
+
+With the =-F= option, /only/ the windows related to the given sequences are kept.
+This allows to quickly filter a set of reads, looking for a known sequence or window,
+with the =-FaW = options:
+All the reads with the windows related to the sequence will be extracted to =out/seq/clone.fa-1=.

 ** Clone analysis: VDJ assignation and CDR3 detection
