FAQ
A minority of the TCR alignments may result in non-productive TCRs. A TCR is considered non-productive if it has an out-of-frame junction sequence, an early stop codon, and/or any defect in the initiation codon, splicing sites, or regulatory elements.
By default, we do not perform any specific filtering for non-productive TCRs, considering that these TCRs make up a small minority of the results. In addition, we have found it is more common to see non-productive rearrangements in alpha chains, but we often observe that these are clonal, so they may be the result of dual alpha-chain-expressing clonotypes rather than poor quality sequencing data.
In any case, the results table (barcode_UMI_results.csv) includes a column named productive (see output tables wiki), by which the user can filter non-productive TCRs.
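As a minimal sketch of that filtering step, the Python snippet below keeps only rows flagged as productive. It uses a small in-memory table in place of barcode_UMI_results.csv. Only the `productive` column name comes from the text above; the other column names (`BC`, `UMI`, `chain`) and the `T`/`F` encoding of the flag are assumptions for illustration, so check the output tables wiki for the actual layout.

```python
import pandas as pd

# Toy stand-in for barcode_UMI_results.csv.
# Only the `productive` column name is taken from the FAQ; the other
# columns and the "T"/"F" encoding are hypothetical.
df = pd.DataFrame({
    "BC": ["AAAC", "AAAC", "CCGT", "GGTA"],
    "UMI": ["u1", "u2", "u3", "u4"],
    "chain": ["TRA", "TRB", "TRA", "TRB"],
    "productive": ["T", "T", "F", "T"],
})


def keep_productive(table, column="productive"):
    """Return rows flagged productive, accepting booleans or 'T'/'TRUE' strings."""
    flags = table[column].astype(str).str.strip().str.upper()
    return table[flags.isin({"T", "TRUE"})]


productive_only = keep_productive(df)
print(productive_only)
```

With a real results file, the toy table would be replaced by something like `pd.read_csv("barcode_UMI_results.csv")`, assuming the file is a standard comma-separated table.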
Some T cells can have dual TCRs, i.e., they express two rearranged TCR alleles for the alpha and/or beta chains. In principle, this event is prevented by the process of allelic exclusion, by which only one allele is expressed and the other is silenced. However, allelic exclusion is not 100% efficient, resulting in T cells with dual TCRs. Note that it is more common to find dual TCR α-chains than dual TCR β-chains, due to different mechanisms of allelic exclusion.
In WAT3R, we consider the possibility of dual TCR α-chain expression. For this reason, we include the two most common TRA alleles per cell barcode in the final results table (barcode_results.csv), ranking them by UMI counts and read counts. However, we only report the single most common allele for TRB, as we consider dual TCR β-chain expression to be a rarer event.
If the user is interested in exploring dual TCR expression in both the alpha and beta chains, it is possible to do so using the barcode_UMI_results.csv table. This results table contains the TCR alignments at the transcript level. That is, it registers every TRA and TRB alignment for a cell, even if there are multiple (see output tables description). By default, we select the most common TRB alignment and the two most common TRA alignments to create barcode_results.csv.
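The selection described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the WAT3R implementation: the column names (`BC`, `UMI`, `chain`, `CDR3`, `reads`) are hypothetical, and each distinct `CDR3` value is treated as one rearranged allele. Alleles are ranked per cell barcode and chain by UMI count, then by read count, and the number kept per chain is configurable.

```python
import pandas as pd

# Toy transcript-level table; all column names are hypothetical.
rows = [
    ("cell1", "u1", "TRA", "CAVA", 10),
    ("cell1", "u2", "TRA", "CAVA", 8),
    ("cell1", "u3", "TRA", "CAVB", 5),
    ("cell1", "u4", "TRA", "CAVC", 2),
    ("cell1", "u5", "TRB", "CASA", 9),
    ("cell1", "u6", "TRB", "CASB", 12),
    ("cell1", "u7", "TRB", "CASB", 3),
    ("cell1", "u8", "TRB", "CASC", 1),
]
df = pd.DataFrame(rows, columns=["BC", "UMI", "chain", "CDR3", "reads"])


def top_alleles(table, n_per_chain):
    """Keep the top-ranked alleles per barcode and chain.

    Alleles are ranked by number of distinct UMIs, with total reads as the
    tie-breaker. `n_per_chain` maps a chain name to how many alleles to keep.
    """
    counts = (
        table.groupby(["BC", "chain", "CDR3"], as_index=False)
        .agg(n_umi=("UMI", "nunique"), n_reads=("reads", "sum"))
        .sort_values(
            ["BC", "chain", "n_umi", "n_reads"],
            ascending=[True, True, False, False],
        )
    )
    # Rank within each barcode/chain group, following the sorted order.
    counts["rank"] = counts.groupby(["BC", "chain"]).cumcount() + 1
    limit = counts["chain"].map(n_per_chain)
    return counts[counts["rank"] <= limit].reset_index(drop=True)


# Default-style selection: two TRA alleles, one TRB allele.
default_view = top_alleles(df, {"TRA": 2, "TRB": 1})
# Exploring dual beta chains: keep two alleles for both chains.
dual_view = top_alleles(df, {"TRA": 2, "TRB": 2})
print(dual_view)
```

Raising the TRB limit to 2, as in `dual_view`, is one way to inspect candidate dual β-chain cells from the transcript-level table.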
For more information on dual TCRs, see the review by Schuldt et al.
When analyzing a mixture of cells that includes non-T cells, some TCRs may be reported in non-T cells, which is most likely a technical artifact. In our PBMC analysis, we found that 3% of non-T cells had TCR results in the TREK-seq data analyzed by WAT3R (bioRxiv preprint). We found that these non-T cells also expressed TCR transcripts in the whole transcriptome scRNA-seq data analyzed by Cell Ranger, indicating that some TCR transcripts were already present in the 10x Genomics 3’ barcoded cDNA from which both modalities originate (paper under review at Bioinformatics). We discuss five potential sources of such artifacts.
- Ambient mRNAs present in the cell suspension can be aberrantly barcoded; this contamination has been estimated to make up ~2-10% of transcripts (Young 2020, Yang 2020). While tools such as SoupX and DecontX can be used to eliminate ambient RNAs, these are not designed to work on enriched data such as the input for WAT3R.
- Doublets could cause incorrect detection of TCRs. Users can apply doublet removal tools to the scRNA-seq data (but not to the TREK-seq/WAT3R data) in a way that is consistent with the experimental goals (DePasquale 2019, Wolock 2019, McGinnis 2019). Removing these cell barcodes from consideration should also minimize false positive TCRs from T cell-containing doublets.
- Erroneous cell type classification could cause TCR detection in non-T cells. In our data, cell types were assigned to each cell barcode using standard bioinformatic pipelines (i.e. Seurat), but all methods of classification are prone to erroneous labeling in a small proportion of cells.
- Barcode swapping can occur due to a process called PCR recombination, PCR crossover, or chimera formation (Holcomb 2014). If a primer only extends halfway through a template, in the next cycle, this half product can hybridize to another DNA fragment, and be extended. In single-cell RNA-seq, the resulting fusion product can cause the association of a transcript sequence with an erroneous cell barcode. PCR recombination is more likely to occur during later PCR cycles when primer concentrations are low and PCR product concentrations are high, or due to short extension times, and can thus be minimized experimentally. By default, WAT3R selects UMI-cell barcode combinations with the most supporting reads, which also reduces the impact of PCR recombination.
- Barcode swapping can also occur on patterned Illumina flowcells such as those of the NovaSeq (Griffiths 2018, Larsson 2018, Sinha 2017). Currently, TREK-seq libraries are sequenced on the MiSeq, which does not have this issue. In future versions of TREK-seq that can be sequenced on patterned flowcells, we plan to minimize the chance of barcode swapping by including dual library indices. As with PCR recombination, WAT3R's default selection of UMI-cell barcode combinations with the most supporting reads should reduce the impact of barcode swapping.
Given that TCR transcripts in non-T cells occur in both whole transcriptome and TCR-enriched libraries, coupled with these possibilities for technical artifacts, we believe that false positive non-T cells are a byproduct of upstream processes and not of the WAT3R pipeline.
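To illustrate the general idea of read-support filtering mentioned in the points on PCR recombination and flowcell barcode swapping, here is a minimal Python sketch. It shows one possible reading of "selecting UMI-cell barcode combinations with the most supporting reads": when the same UMI is observed with several cell barcodes, keep only the pairing with the most reads. The column names (`BC`, `UMI`, `n_reads`) are hypothetical, and this is not taken from the WAT3R source code.

```python
import pandas as pd

# Toy read-count table; column names are hypothetical.
# UMI "u1" appears with two barcodes: a well-supported pairing (cellA)
# and a weakly supported one (cellB) that could stem from barcode swapping.
reads = pd.DataFrame({
    "BC": ["cellA", "cellB", "cellA", "cellC"],
    "UMI": ["u1", "u1", "u2", "u3"],
    "n_reads": [40, 3, 25, 12],
})

# For each UMI, keep the row (barcode pairing) with the highest read count.
best = reads.loc[reads.groupby("UMI")["n_reads"].idxmax()].reset_index(drop=True)
print(best)
```

In this toy example the low-support `cellB`/`u1` pairing is discarded, while every UMI keeps its best-supported cell barcode.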