404
+ +Page not found
+ + +diff --git a/.nojekyll b/.nojekyll new file mode 100644 index 0000000..e69de29 diff --git a/404.html b/404.html new file mode 100644 index 0000000..eaeefe8 --- /dev/null +++ b/404.html @@ -0,0 +1,137 @@ + + +
+ + + + +Page not found
+ + +./scMaestro atac -h
+usage: scMaestro atac [-h] -f [FASTQPATH [FASTQPATH ...]] -g GENOME -r
+ REFERENCE FOLDER
+
+optional arguments:
+ -h, --help show this help message and exit
+ -f [FASTQPATH [FASTQPATH ...]], --fastqs [FASTQPATH [FASTQPATH ...]]
+ Path(s) to fastq files, multiple paths can be provided
+ together. eg. "-f path1 path2"
+ -g GENOME, --genome GENOME
+ Genome build, e.g. "hg38", "mm10"
+ -r REFERENCE FOLDER, --reference REFERENCE FOLDER
+ Reference folder for alignment. VDJ reference is
+ needed if subcommand "vdj" is used.
+
+
+ Cell Ranger is very resource intensive, so resource allocation is critical for count jobs.
+By default (or when --jobmode=local is specified), Cell Ranger runs locally, using 90% of available memory and all of the available cores.
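To cap those defaults, Cell Ranger accepts the `--localcores` and `--localmem` options. The following is a minimal sketch of how such options could be assembled; the helper name `cellranger_resource_args` is hypothetical, and how scMaestro itself passes resource limits to Cell Ranger is not shown in this documentation.

```python
import os


def cellranger_resource_args(max_cores: int, max_mem_gb: int) -> list[str]:
    """Build Cell Ranger options that cap local CPU and memory usage.

    Hypothetical helper for illustration; not part of scMaestro.
    """
    if max_cores < 1 or max_mem_gb < 1:
        raise ValueError("cores and memory must be positive")
    # Never request more cores than the machine reports.
    available = os.cpu_count() or 1
    cores = min(max_cores, available)
    return [f"--localcores={cores}", f"--localmem={max_mem_gb}"]


args = cellranger_resource_args(max_cores=1, max_mem_gb=64)
print(" ".join(args))
```

The memory value is given in GB and is passed through unchanged, so it should be chosen to fit the node the job runs on.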
+ +scMaestro
is a workflow wrapper of a set of snakemake workflows. The snakemake workflow component SF_sc-smk-wl
is used as a submodule of the workflow wrapper repo. With the Snakemake workflow kept in a separate GitHub repo, the wrappers we designed for internal use (SF_scMaestro, a private repo) and external use (scMaestro) can share the same Snakemake workflow. This separation allows us to open-source our Snakemake workflow together with the wrapper for external users.
One feature of scMaestro
is that the Snakemake workflow will be copied to the analysis folder. The benefit is that this provides a self-contained and reproducible environment for the analysis.
Another feature is that scMaestro
relies on Singularity. Singularity images will be created at the beginning of a Snakemake workflow run. This feature again improves the reproducibility of the analysis.
The singularity-related arguments use-singularity
and singularity-args
for snakemake workflow can be found here.
fixedrna
Reference/Probe_set
+Singleplex or multiplex library
+./scMaestro fixedrna --help
+usage: scMaestro fixedrna [-h] -f [FASTQPATH [FASTQPATH ...]] -g GENOME -l
+ LIBRARY_CONFIG --probe_set PROBE_SET [--singleplex]
+ [--multiplex MULTIPLEX]
+
+optional arguments:
+ -h, --help show this help message and exit
+ -f [FASTQPATH [FASTQPATH ...]], --fastqs [FASTQPATH [FASTQPATH ...]]
+ Path(s) to fastq files, multiple paths can be provided
+ together. eg. "-f path1 path2"
+ -g GENOME, --genome GENOME
+ Genome build, e.g. "hg38", "mm10"
+ -l LIBRARY_CONFIG, --library_config LIBRARY_CONFIG
+ CSV file with the library configration.
+ --probe_set PROBE_SET
+ Probe set for FRP data
+ --singleplex Singleplex FRP
+ --multiplex MULTIPLEX
+ Mutiplexing information for FRP
+
+The parameters --singleplex
and --multiplex
are mutually exclusive, and exactly one of them has to be specified.
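The "one or the other, never both" rule can be expressed with a required mutually exclusive group in `argparse`. This is only a sketch under that assumption: the usage line above lists the two options separately, which suggests scMaestro checks the rule itself rather than through an argparse group. The program name `fixedrna-sketch` is made up for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser; not the actual scMaestro parser."""
    parser = argparse.ArgumentParser(prog="fixedrna-sketch")
    # required=True rejects the case where neither option is given;
    # the group itself rejects the case where both are given.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--singleplex", action="store_true",
                       help="Singleplex FRP")
    group.add_argument("--multiplex", metavar="MULTIPLEX",
                       help="Multiplexing information for FRP")
    return parser


ns = build_parser().parse_args(["--multiplex", "multiplex.csv"])
print(ns.singleplex, ns.multiplex)
```

Passing both options, or neither, makes `argparse` print an error and exit.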
Here is an example of a config.py
file:
unaligned=["/mnt/ccrsf-static/Analysis/xies4/github_repos/pipeline_dev_test/raw_fastq/fixedrna/221102_VH00271_218_AAC3K5WM5/AAC3K5WM5/outs/fastq_path/AAC3K5WM5"]
+analysis="/mnt/ccrsf-static/Analysis/xies4/github_repos/pipeline_dev_test/test_fixedscRNA_singleplex/BaoTran_CS033083_1scRNAseq_FixedRNA_101422/Analysis"
+ref="hg38"
+projectname="BaoTran_CS033083_1scRNAseq_FixedRNA_101422"
+libraries="libraries.csv"
+probe_set="/mnt/ccrsf-ifx/Software/tools/GemCode/cellranger-7.0.1/probe_sets/Chromium_Human_Transcriptome_Probe_Set_v1.0_GRCh38-2020-A.csv"
+pipeline="fixedrna"
+aggregate=False # Default value for fixedrna
+
+Here is libraries.csv
:
Name,Flowcell,Sample,Type
+LIB1,AAC3K5WM5,LIB1,Gene Expression
+LIB2,AAC3K5WM5,LIB2,Gene Expression
+LIB3,AAC3K5WM5,LIB3,Gene Expression
+For multiplexed libraries, here is the config file:
+unaligned=["/mnt/ccrsf-static/Analysis/xies4/github_repos/pipeline_dev_test/test_fixedscRNA/223LH3LT1/outs/fastq_path/223LH3LT1"]
+analysis="/mnt/ccrsf-static/Analysis/xies4/github_repos/pipeline_dev_test/test_fixedscRNA/Analysis"
+ref="hg38"
+projectname="DanielMcVicar_CS036898_4fixedscRNA_07302024"
+pipeline="fixedrna"
+probe_set="/mnt/ccrsf-ifx/Software/tools/GemCode/cellranger-8.0.1/probe_sets/Chromium_Human_Transcriptome_Probe_Set_v1.0.1_GRCh38-2020-A.csv"
+libraries="libraries.csv"
+multiplex="multiplex.csv"
+aggregate=False # Default value for fixedrna
+
+libraries.csv
is the same as for the singleplex library.
Here is an example for multiplex.csv
Name,sample_id,probe_barcode_ids,description
+pool_1,Young_RepSox_3nM,BC001,Young_RepSox_3nM
+pool_1,Young_PC3EVNeut,BC002,Young_PC3EVNeut
+pool_2,Young_Enza_20uM,BC001,Young_Enza_20uM
+pool_2,Aged_PC3EV,BC002,Aged_PC3EV
+pool_3,Young_Combo,BC001,Young_Combo
+pool_3,MDA_PCa2b,BC004,MDA_PCa2b
+pool_4,Young_PC3EV,BC001,Young_PC3EV
+pool_4,Young_PC3Stat5,BC002,Young_PC3Stat5
+
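Before starting a run, it can help to sanity-check a multiplex.csv like the one above. The sketch below assumes that a probe barcode should be assigned to at most one sample within a pool; that rule and the helper name `duplicated_barcodes` are assumptions made for illustration, not checks documented for scMaestro.

```python
import csv
import io
from collections import defaultdict


def duplicated_barcodes(csv_text: str) -> dict[str, list[str]]:
    """Return, per pool, the probe barcodes that appear more than once."""
    seen = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        seen[row["Name"]].append(row["probe_barcode_ids"])
    return {
        pool: sorted({bc for bc in barcodes if barcodes.count(bc) > 1})
        for pool, barcodes in seen.items()
        if len(set(barcodes)) != len(barcodes)
    }


example = """Name,sample_id,probe_barcode_ids,description
pool_1,Young_RepSox_3nM,BC001,Young_RepSox_3nM
pool_1,Young_PC3EVNeut,BC002,Young_PC3EVNeut
pool_2,Young_Enza_20uM,BC001,Young_Enza_20uM
pool_2,Aged_PC3EV,BC002,Aged_PC3EV
"""
print(duplicated_barcodes(example))
```

An empty dictionary means no barcode is reused inside any pool; the same barcode may still appear in different pools, as in the example.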
+
+ rna
rna
usage
./scMaestro rna -h
+usage: scMaestro rna [-h] -f [FASTQPATH [FASTQPATH ...]] -g GENOME -r
+ REFERENCE FOLDER [--fullanalysis]
+ [--force FORCE | --expect EXPECT] [--exclude-introns]
+
+optional arguments:
+ -h, --help show this help message and exit
+ -f [FASTQPATH [FASTQPATH ...]], --fastqs [FASTQPATH [FASTQPATH ...]]
+ Path(s) to fastq files, multiple paths can be provided
+ together. eg. "-f path1 path2"
+ -g GENOME, --genome GENOME
+ Genome build, e.g. "hg38", "mm10"
+ -r REFERENCE FOLDER, --reference REFERENCE FOLDER
+ Reference folder for alignment. VDJ reference is
+ needed if subcommand "vdj" is used.
+ --fullanalysis Run full analysis pipeline
+ --force FORCE Run Cell Ranger with --force-cell
+ --expect EXPECT Run Cell Ranger with --expect-cells
+ --exclude-introns Exclude intronic reads in count. To maximize
+ sensitivity for whole transcriptome 3’/5’ Single Cell
+ Gene Expression and 3’ Cell Multiplexing experiments,
+ introns will be included in the analysis by default
+ for cellranger (>v7.0.0) count and multi.
+
+If --fullanalysis
is enabled, quality control, PCA, clustering, and annotation will be performed for each sample separately using Seurat
and SingleR
. An html report will be generated. An example can be found here.
scMaestro
, developed by CCRSF IFX, aims to provide an end-to-end solution for single-cell sequencing data.
To run scMaestro
, activation of a conda environment and singularity (>v4.1.5)
are required.
git clone --recurse-submodules https://github.com/CCRSF-IFX/scMaestro.git
+cd scMaestro/
+
+conda env create -n scMastro -f environment.yml
+
+You only need to run the command above once.
+conda activate scMastro
+
+scMaestro
./scMaestro
+
+usage: scMaestro [-h]
+ {rna,multi,vdj,atac,multiome,fixedrna,rerun,dryrun,unlock}
+ ...
+
+scMastro: Comprehensive workflow for processing single cell sequencing data
+
+positional arguments:
+ {rna,multi,vdj,atac,multiome,fixedrna,rerun,dryrun,unlock}
+ sub-command help
+ rna Snakemake pipeline for 10x scRNA-seq data analysis
+ multi Snakemake pipeline for 10x CellRanger multi data
+ analysis
+ vdj Snakemake pipeline for 10x VDJ data
+ atac Snakemake pipeline for 10x scATAC-seq data analysis
+ multiome Snakemake pipeline for 10x Multiome ATAC + GEX data
+ fixedrna Snakemake pipeline for 10x Fixed RNA profiling (FRP)
+ data
+ rerun Equavalent of snakemake --rerun
+ dryrun Equavalent of snakemake --dryrun
+ unlock Equavalent of snakemake --unlock
+
+optional arguments:
+ -h, --help show this help message and exit
+
+
+ multi
This pipeline is used when analyzing cell multiplexing data or projects with samples that have both gene expression and VDJ captures. More information about this analysis and when to use it can be found on the 10x support webpages:
+Cell Multiplexing with cellranger multi
+Gene Expression, V(D)J & Feature Barcode Analysis with cellranger multi
+./scMaestro multi -h
+usage: scMaestro multi [-h] -f [FASTQPATH [FASTQPATH ...]] -g GENOME -r
+ REFERENCE FOLDER [--chain {auto,TR,IG}]
+ [--fullanalysis] -l LIBRARY_CONFIG [--vdj_ref VDJ_REF]
+ [--cmo] [--count]
+
+optional arguments:
+ -h, --help show this help message and exit
+ -f [FASTQPATH [FASTQPATH ...]], --fastqs [FASTQPATH [FASTQPATH ...]]
+ Path(s) to fastq files, multiple paths can be provided
+ together. eg. "-f path1 path2"
+ -g GENOME, --genome GENOME
+ Genome build, e.g. "hg38", "mm10"
+ -r REFERENCE FOLDER, --reference REFERENCE FOLDER
+ Reference folder for alignment. VDJ reference is
+ needed if subcommand "vdj" is used.
+ --chain {auto,TR,IG} Force the analysis to be carried out for a particular
+ chain type.
+ --fullanalysis Run full analysis pipeline
+ -l LIBRARY_CONFIG, --library_config LIBRARY_CONFIG
+ CSV file with the library configration.
+ --vdj_ref VDJ_REF Reference folder for VDJ data
+ --cmo CMO information will be used for multi analysis
+ --count Run cellranger count for projects with HTO libraries
+ with more 10 individuals mixed.
+
+The libraries csv file would also need to be supplied in config.py
. For example:
libraries="libraries.csv"
+
+The libraries file would contain the final sample name, flowcell, demultiplex sample name, and library type for each sample. For example:
+Name,Flowcell,Sample,Type
+IL15_LNs,H7CNNBGXG,IL15_LNs,Gene Expression
+IL15_LNs,H7CT7BGXG,IL15_LNs_BC,Antibody Capture
+IL15_TUMOR_CD11,H7CNNBGXG,IL15_TUMOR_CD11,Gene Expression
+IL15_TUMOR_CD11,H7CT7BGXG,IL15_TUMOR_CD11_BC,Antibody Capture
+
+cellranger mkfastq
, and so should match the FASTQ files. The pipeline will use the provided libraries file to make a library file for each individual sample. This will then be processed again to create the configuration file in the format expected by CellRanger. Due to the lack of CRISPR projects at the facility, the pipeline was never tested for this technology. If this pipeline is run on a sample with CRISPR capture, check the generated sample configuration file to ensure that it matches what is expected.
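The per-sample split described above can be pictured as grouping the rows of the libraries file by the Name column. This is a rough sketch of that idea only; the function `split_by_sample` is hypothetical and the actual Snakemake rules may organize the step differently.

```python
import csv
import io
from collections import defaultdict


def split_by_sample(csv_text: str) -> dict[str, list[dict[str, str]]]:
    """Group libraries.csv rows by final sample name (the Name column)."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["Name"]].append(row)
    return dict(grouped)


example = """Name,Flowcell,Sample,Type
IL15_LNs,H7CNNBGXG,IL15_LNs,Gene Expression
IL15_LNs,H7CT7BGXG,IL15_LNs_BC,Antibody Capture
IL15_TUMOR_CD11,H7CNNBGXG,IL15_TUMOR_CD11,Gene Expression
IL15_TUMOR_CD11,H7CT7BGXG,IL15_TUMOR_CD11_BC,Antibody Capture
"""
for name, rows in split_by_sample(example).items():
    # Each sample ends up with its own list of libraries.
    print(name, [r["Type"] for r in rows])
```

Each group then holds everything needed for one sample: the demultiplexed sample names, their flowcells, and their library types.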
+For multi task with VDJ libraries, “donor” and “origin” are required in "libraries.csv" for aggregation. The link (here) shows how the donor and origin information will be used in the data analysis:
+Name,Flowcell,Sample,Type,Donor,Origin
+
+The chain information is specified in feature_types
column. Valid specifications include VDJ
, VDJ-T
, VDJ-B
, or VDJ-T-GD
, and the combinations:
For multi
task with CRISPR Guide Capture libraries, a feature_reference
column is required as the fifth column of libraries.csv
.
Name,Flowcell,Sample,Type,Feature
+F1Test,AACCCHVM5,F1CRISPR_Library,Gene Expression,crispr_feature_reference1.csv
+F1Test,AACCCHVM5,F1GE_Library,CRISPR Guide Capture,crispr_feature_reference1.csv
+F2Test,AACCCHVM5,F2CRISPR_Library,Gene Expression,crispr_feature_reference2.csv
+F2Test,AACCCHVM5,F2GE_Library,CRISPR Guide Capture,crispr_feature_reference2.csv
+F3Test,AACCCHVM5,F3CRISPR_Library,Gene Expression,crispr_feature_reference3.csv
+F3Test,AACCCHVM5,F3GE_Library,CRISPR Guide Capture,crispr_feature_reference3.csv
+
+The descriptions of the feature reference CSV file can be found here. In case of varying feature references for different GEX/CRISPR pairs, the fifth column can be used to provide different references.
+If an antibody capture was used, then a feature reference file will need to be provided in the config.py
file with a features entry. There is currently no pipeline flag to add this, and it will need to be provided manually. For example:
features="features.csv"
+
+The feature reference file would contain (at minimum) a unique ID for the feature, human readable name, read, pattern, sequence, and feature type. For example:
+id,name,sequence,feature_type,read,pattern
+CITE_CD64,CD64,AGCAATTAACGGGAG,Antibody Capture,R2,5PNNNNNNNNNN(BC)
+CITE_F4_80,F4_80,TTAACTTCAGCCCGT,Antibody Capture,R2,5PNNNNNNNNNN(BC)
+CITE_CD8a,CD8a,TACCCGTAATAGCGT,Antibody Capture,R2,5PNNNNNNNNNN(BC)
+CITE_XCR1,XCR1,TCCATTACCCACGTT,Antibody Capture,R2,5PNNNNNNNNNN(BC)
+
+++No space is allowed for the
+id
field and the id
must be unique.
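The two constraints on the id field (no spaces, no duplicates) are easy to check before a run. The following is a small sketch with a hypothetical helper, `check_feature_ids`; it is not part of scMaestro and only inspects the id column.

```python
import csv
import io


def check_feature_ids(csv_text: str) -> list[str]:
    """Report ids that contain whitespace or occur more than once."""
    problems = []
    seen = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        fid = row["id"]
        if any(ch.isspace() for ch in fid):
            problems.append(f"id contains whitespace: {fid!r}")
        if fid in seen:
            problems.append(f"duplicate id: {fid!r}")
        seen.add(fid)
    return problems


example = """id,name,sequence,feature_type,read,pattern
CITE_CD64,CD64,AGCAATTAACGGGAG,Antibody Capture,R2,5PNNNNNNNNNN(BC)
CITE_F4_80,F4_80,TTAACTTCAGCCCGT,Antibody Capture,R2,5PNNNNNNNNNN(BC)
"""
print(check_feature_ids(example))
```

An empty list means the id column passes both checks.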
Unless the information is provided, it is probably easiest to determine the pattern by checking the FASTQ file for the sequence location. More information can be found at: https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/feature-bc-analysis
+If a multiplexing capture was used, then the pipeline can be called with the cmo flag. This will add a cmo entry into the config.py file, which can then be edited to provide the cmo (cell multiplexing oligo) reference file. For example:
+cmo="cmo.csv"
+
+The cmo reference file has a very similar format to the feature reference file. The difference is that the feature type would be Multiplexing Capture. For example:
+id,name,sequence,feature_type,read,pattern
+HTO_1,HTO_1,GTCAACTCTTTAGCG,Multiplexing Capture,R2,5P(BC)
+HTO_2,HTO_2,TGATGGCCTATTGGG,Multiplexing Capture,R2,5P(BC)
+HTO_3,HTO_3,TTCCGCCTCTCTTTG,Multiplexing Capture,R2,5P(BC)
+HTO_4,HTO_4,AGTAAGTTCAGCGTA,Multiplexing Capture,R2,5P(BC)
+
+When the cmo information is filled into the sample configuration file, the cmo ID is used directly as the multiplexing sample ID. Edit the file manually if a different sample ID is wanted.
+Once all the supplemental information is filled, use the rerun option in the wrapper to start the pipeline.
+If the CellRanger only analysis was requested, then for 15 samples the pipeline will run the following jobs for single cell multi analysis.
+++If there are multiple libraries with different features used, you can set
+cmo
as a dictionary with the library Name as key and cmo csv file as value:
cmo={"CD8_d8_1": "cmo1.csv", "CD8_d8_2": "cmo2.csv", "CD8_d8_3": "cmo3.csv"}
+
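Since `cmo` may be either a single file name or a dictionary keyed by library Name, code that reads config.py has to handle both forms. A minimal sketch of that lookup follows; the function `cmo_for_library` is hypothetical and the real workflow may resolve the value differently.

```python
def cmo_for_library(cmo, library_name: str) -> str:
    """Return the cmo reference file for one library.

    `cmo` is either a single file name shared by all libraries
    or a dict mapping library Name to its own cmo csv file.
    """
    if isinstance(cmo, dict):
        if library_name not in cmo:
            raise KeyError(f"no cmo file configured for {library_name}")
        return cmo[library_name]
    return cmo


shared = "cmo.csv"
per_library = {"CD8_d8_1": "cmo1.csv", "CD8_d8_2": "cmo2.csv", "CD8_d8_3": "cmo3.csv"}
print(cmo_for_library(shared, "CD8_d8_2"))
print(cmo_for_library(per_library, "CD8_d8_2"))
```

Raising an error for a missing library name makes a typo in config.py visible early instead of silently falling back to a wrong file.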
+There is currently no downstream analysis pipeline, and so if it is requested the CellRanger only pipeline would still be run.
+++Note: The
+premrna
flag is deprecated and no longer used. The exclude-introns flag can be used to disable the include-introns option in CellRanger.
++Note: The
+force
flag and expect flag can be used to apply the force-cells
flag and expect-cells
flag when running the CellRanger multi analysis. The force
flag and expect
flag are mutually exclusive.
++A Special case (multi with VDJ analysis for selective samples only):
++
+- There may be a project where some samples require GEX and ADT but not VDJ analysis, while other samples from the same project require all three analyses. For example: GEX1 and GEX2 are T-cell populations, which is why they have GEX and ADT only, whereas only GEX3 and GEX4 have VDJ.
+
Cellranger multi does not support HTO libraries with > 12 multiplexing tags. For such projects, the --count flag can be used to make the snakemake pipeline run ‘cellranger count’.
+Please see the link below for an example of the command line and config.py:
+https://github.com/CCRSF-IFX/SF-project-tracking/issues/1#issuecomment-1934959534
+An example of ‘libraries.csv’ can be found at the link below:
+https://github.com/CCRSF-IFX/SF-project-tracking/issues/1#issuecomment-1934963263
+The type of the HTO libraries is set to “Custom”.
+An example of the features.csv file can be found at the link below:
+https://github.com/CCRSF-IFX/SF-project-tracking/issues/1#issuecomment-1934964902
+The feature_type of the HTO barcodes is set to “Custom”.
+ +config.py
file
unaligned=["/scratch/ccrsf_scratch/scratch/Illumina_Demultiplex/NovaSeq/20240816_LH00584_0065_B22NK2FLT3/22NK2FLT3/outs/fastq_path/22NK2FLT3"]
+analysis="/mnt/ccrsf-static/singlecell_projects/PamelaSchwartzberg_CS037516_9sclibs_08072024/Analysis_demultiplex"
+ref="mm10"
+projectname="PamelaSchwartzberg_CS037516_9sclibs_08072024"
+yields="1411633.0"
+archive=True
+runs="20240816_LH00584_0065_B22NK2FLT3"
+libraries="libraries.csv"
+pipeline="multi"
+cmo="cmo.csv"
+aggregate=False
+
+libraries.csv
In libraries.csv
, only Gene Expression
and Multiplexing Capture
samples are included to run multi
pipeline. The purpose of this step is demultiplexing.
Name,Flowcell,Sample,Type
+CD8_d8_1,22NK2FLT3,1_cDNA_CD8_d8_1,Gene Expression
+CD8_d8_2,22NK2FLT3,2_cDNA_CD8_d8_2,Gene Expression
+CD8_d8_3,22NK2FLT3,3_cDNA_CD8_d8_3,Gene Expression
+CD8_d8_1,22NK2FLT3,4_HTO_CD8_d8_1,Multiplexing Capture
+CD8_d8_2,22NK2FLT3,5_HTO_CD8_d8_2,Multiplexing Capture
+CD8_d8_3,22NK2FLT3,6_HTO_CD8_d8_3,Multiplexing Capture
+
+cmo.csv
id,name,sequence,feature_type,read,pattern
+A0301,A0301,ACCCACCAGTAAGAC,Multiplexing Capture,R2,5PNNNNNNNNNN(BC)
+A0302,A0302,GGTCGAGAGCATTCA,Multiplexing Capture,R2,5PNNNNNNNNNN(BC)
+A0303,A0303,CTTGCCGCATGTCAT,Multiplexing Capture,R2,5PNNNNNNNNNN(BC)
+A0304,A0304,AAAGCATTCTTCACG,Multiplexing Capture,R2,5PNNNNNNNNNN(BC)
+A0305,A0305,CTTTGTCTTTGTGAG,Multiplexing Capture,R2,5PNNNNNNNNNN(BC)
+A0306,A0306,TATGCTGCCACGGTA,Multiplexing Capture,R2,5PNNNNNNNNNN(BC)
+A0307,A0307,GAGTCTGCCAGTATC,Multiplexing Capture,R2,5PNNNNNNNNNN(BC)
+A0308,A0308,TATAGAACGCCAGGC,Multiplexing Capture,R2,5PNNNNNNNNNN(BC)
+A0309,A0309,TGCCTATGAAACAAG,Multiplexing Capture,R2,5PNNNNNNNNNN(BC)
+A0310,A0310,CCGATTGTAACAGAC,Multiplexing Capture,R2,5PNNNNNNNNNN(BC)
+
+In this step, you will run the snakemake pipeline to:
+1. Convert per-sample BAM file to FASTQ files
+2. Prepare the folder for next step
+conda activate /mnt/ccrsf-ifx/Software/tools/conda_env4scwf
+module load samtools
+snakemake -s multi_5p_bam2fastq.py --configfile config.yaml --profile ./slurm/ -e cluster-generic -j 20
+
+indir: "/mnt/ccrsf-static/singlecell_projects/PamelaSchwartzberg_CS037516_9sclibs_08072024/Analysis_demultiplex/"
+outdir: "outdir"
+
+Job stats:
+job count
+------------------ -------
+all 1
+bamtofastq 30
+prep_raw_fq_folder 30
+total 61
+
+Only GEX and VDJ data is used in the example project.
+Prepare fastq files so that the folder structure meets what cellranger mkfastq
outputs.
Prepare libraries.csv
In the example we used, there are 3 GEX libraries and each GEX library has 10 samples multiplexed together, so in total there are 30 samples. There are 3 corresponding VDJ libraries. Since we don't perform demultiplexing using HTO on VDJ data, the same VDJ data were provided for all 10 samples from a particular GEX library.
+In the libraries.csv
file below, only samples from CD8_d8_1
are shown.
Name,Flowcell,Sample,Type,force-cells
+CD8_d8_1_A0301,22NK2FLT3,CD8_d8_1_A0301,Gene Expression,1049
+CD8_d8_1_A0302,22NK2FLT3,CD8_d8_1_A0302,Gene Expression,857
+CD8_d8_1_A0303,22NK2FLT3,CD8_d8_1_A0303,Gene Expression,1042
+CD8_d8_1_A0304,22NK2FLT3,CD8_d8_1_A0304,Gene Expression,957
+CD8_d8_1_A0305,22NK2FLT3,CD8_d8_1_A0305,Gene Expression,986
+CD8_d8_1_A0306,22NK2FLT3,CD8_d8_1_A0306,Gene Expression,824
+CD8_d8_1_A0307,22NK2FLT3,CD8_d8_1_A0307,Gene Expression,1062
+CD8_d8_1_A0308,22NK2FLT3,CD8_d8_1_A0308,Gene Expression,897
+CD8_d8_1_A0309,22NK2FLT3,CD8_d8_1_A0309,Gene Expression,825
+CD8_d8_1_A0310,22NK2FLT3,CD8_d8_1_A0310,Gene Expression,1179
+CD8_d8_1_A0301,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,1049
+CD8_d8_1_A0302,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,857
+CD8_d8_1_A0303,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,1042
+CD8_d8_1_A0304,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,957
+CD8_d8_1_A0305,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,986
+CD8_d8_1_A0306,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,824
+CD8_d8_1_A0307,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,1062
+CD8_d8_1_A0308,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,897
+CD8_d8_1_A0309,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,825
+CD8_d8_1_A0310,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,1179
+
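Writing these rows by hand for 30 samples is error-prone, so a short script can generate them. The sketch below assumes the layout shown in the example above (one Gene Expression row and one VDJ-T row per hashtag, sharing the same force-cells value); the function name `demultiplexed_rows` and its parameters are hypothetical.

```python
def demultiplexed_rows(library, flowcell, vdj_sample, cells_per_tag, vdj_type="VDJ-T"):
    """Build libraries.csv rows for one multiplexed GEX library.

    cells_per_tag maps a hashtag id (e.g. "A0301") to its force-cells value.
    Every demultiplexed sample reuses the same VDJ sample.
    """
    gex_rows, vdj_rows = [], []
    for tag, n_cells in cells_per_tag.items():
        name = f"{library}_{tag}"
        gex_rows.append(f"{name},{flowcell},{name},Gene Expression,{n_cells}")
        vdj_rows.append(f"{name},{flowcell},{vdj_sample},{vdj_type},{n_cells}")
    return gex_rows + vdj_rows


rows = demultiplexed_rows(
    library="CD8_d8_1",
    flowcell="22NK2FLT3",
    vdj_sample="7_VDJ_CD8_d8_1",
    cells_per_tag={"A0301": 1049, "A0302": 857},
)
print("Name,Flowcell,Sample,Type,force-cells")
print("\n".join(rows))
```

Calling the function once per GEX library and concatenating the results gives the full file for all three libraries.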
+Demultiplexing and Analyzing 5’ Immune Profiling Libraries Pooled with Hashtags
+ +' + escapeHtml(summary) +'
' + noResultsText + '
'); + } +} + +function doSearch () { + var query = document.getElementById('mkdocs-search-query').value; + if (query.length > min_search_length) { + if (!window.Worker) { + displayResults(search(query)); + } else { + searchWorker.postMessage({query: query}); + } + } else { + // Clear results for short queries + displayResults([]); + } +} + +function initSearch () { + var search_input = document.getElementById('mkdocs-search-query'); + if (search_input) { + search_input.addEventListener("keyup", doSearch); + } + var term = getSearchTermFromLocation(); + if (term) { + search_input.value = term; + doSearch(); + } +} + +function onWorkerMessage (e) { + if (e.data.allowSearch) { + initSearch(); + } else if (e.data.results) { + var results = e.data.results; + displayResults(results); + } else if (e.data.config) { + min_search_length = e.data.config.min_search_length-1; + } +} + +if (!window.Worker) { + console.log('Web Worker API not supported'); + // load index in main thread + $.getScript(joinUrl(base_url, "search/worker.js")).done(function () { + console.log('Loaded worker'); + init(); + window.postMessage = function (msg) { + onWorkerMessage({data: msg}); + }; + }).fail(function (jqxhr, settings, exception) { + console.error('Could not load worker.js'); + }); +} else { + // Wrap search in a web worker + var searchWorker = new Worker(joinUrl(base_url, "search/worker.js")); + searchWorker.postMessage({init: true}); + searchWorker.onmessage = onWorkerMessage; +} diff --git a/search/search_index.json b/search/search_index.json new file mode 100644 index 0000000..c08b33e --- /dev/null +++ b/search/search_index.json @@ -0,0 +1 @@ +{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"scMaestro scMaestro , developed by CCRSF IFX, is aim to provide an end-to-end solution for single cell sequencing data. 
Quick start To run scMastro , activation of an conda environment singularity (>v4.1.5) are required. Clone git repo git clone --recurse-submodules https://github.com/CCRSF-IFX/scMaestro.git cd scMaestro/ Install conda environment: conda env create -n scMastro -f environment.yml You only need to run the command above once. Activate conda environment conda activate scMastro Run scMastro ./scMastro usage: scMaestro [-h] {rna,multi,vdj,atac,multiome,fixedrna,rerun,dryrun,unlock} ... scMastro: Comprehensive workflow for processing single cell sequencing data positional arguments: {rna,multi,vdj,atac,multiome,fixedrna,rerun,dryrun,unlock} sub-command help rna Snakemake pipeline for 10x scRNA-seq data analysis multi Snakemake pipeline for 10x CellRanger multi data analysis vdj Snakemake pipeline for 10x VDJ data atac Snakemake pipeline for 10x scATAC-seq data analysis multiome Snakemake pipeline for 10x Multiome ATAC + GEX data fixedrna Snakemake pipeline for 10x Fixed RNA profiling (FRP) data rerun Equavalent of snakemake --rerun dryrun Equavalent of snakemake --dryrun unlock Equavalent of snakemake --unlock optional arguments: -h, --help show this help message and exit","title":"About"},{"location":"#scmaestro","text":"scMaestro , developed by CCRSF IFX, is aim to provide an end-to-end solution for single cell sequencing data.","title":"scMaestro"},{"location":"#quick-start","text":"To run scMastro , activation of an conda environment singularity (>v4.1.5) are required. Clone git repo git clone --recurse-submodules https://github.com/CCRSF-IFX/scMaestro.git cd scMaestro/ Install conda environment: conda env create -n scMastro -f environment.yml You only need to run the command above once. Activate conda environment conda activate scMastro Run scMastro ./scMastro usage: scMaestro [-h] {rna,multi,vdj,atac,multiome,fixedrna,rerun,dryrun,unlock} ... 
scMastro: Comprehensive workflow for processing single cell sequencing data positional arguments: {rna,multi,vdj,atac,multiome,fixedrna,rerun,dryrun,unlock} sub-command help rna Snakemake pipeline for 10x scRNA-seq data analysis multi Snakemake pipeline for 10x CellRanger multi data analysis vdj Snakemake pipeline for 10x VDJ data atac Snakemake pipeline for 10x scATAC-seq data analysis multiome Snakemake pipeline for 10x Multiome ATAC + GEX data fixedrna Snakemake pipeline for 10x Fixed RNA profiling (FRP) data rerun Equavalent of snakemake --rerun dryrun Equavalent of snakemake --dryrun unlock Equavalent of snakemake --unlock optional arguments: -h, --help show this help message and exit","title":"Quick start"},{"location":"atac/","text":"./scMaestro atac -h usage: scMaestro atac [-h] -f [FASTQPATH [FASTQPATH ...]] -g GENOME -r REFERENCE FOLDER optional arguments: -h, --help show this help message and exit -f [FASTQPATH [FASTQPATH ...]], --fastqs [FASTQPATH [FASTQPATH ...]] Path(s) to fastq files, multiple paths can be provided together. eg. \"-f path1 path2\" -g GENOME, --genome GENOME Genome build, e.g. \"hg38\", \"mm10\" -r REFERENCE FOLDER, --reference REFERENCE FOLDER Reference folder for alignment. VDJ reference is needed if subcommand \"vdj\" is used.","title":"Atac"},{"location":"atac/#_1","text":"","title":""},{"location":"atac/#_2","text":"./scMaestro atac -h usage: scMaestro atac [-h] -f [FASTQPATH [FASTQPATH ...]] -g GENOME -r REFERENCE FOLDER optional arguments: -h, --help show this help message and exit -f [FASTQPATH [FASTQPATH ...]], --fastqs [FASTQPATH [FASTQPATH ...]] Path(s) to fastq files, multiple paths can be provided together. eg. \"-f path1 path2\" -g GENOME, --genome GENOME Genome build, e.g. \"hg38\", \"mm10\" -r REFERENCE FOLDER, --reference REFERENCE FOLDER Reference folder for alignment. 
VDJ reference is needed if subcommand \"vdj\" is used.","title":""},{"location":"custom_setting/","text":"Resource allocation Cellranger is very resouce intensive. So resource allocation is critical for counts. By default, Cell Ranger runs locally by default (or when specified as --jobmode=local), using 90% of available memory and all of the available cores.","title":"Custom setting"},{"location":"custom_setting/#_1","text":"","title":""},{"location":"custom_setting/#resource-allocation","text":"Cellranger is very resouce intensive. So resource allocation is critical for counts. By default, Cell Ranger runs locally by default (or when specified as --jobmode=local), using 90% of available memory and all of the available cores.","title":"Resource allocation"},{"location":"design/","text":"Design of Single Cell Pipeline scMaestro is a workflow wrapper of a set of snakemake workflows. The snakemake workflow component SF_sc-smk-wl is used as a submodule of the repo for workflow wrapper . With the snakemake workflow as a seperate github repo, the wrappers we designed for internal use ( SF_scMaestro )( private repo ) and external use ( scMaetro ) can share the same snakemake workflow. This separation allows us to open-source our Snakemake workflow together with the wrapper for external users. One feature of scMaestro is that the snakemake workflow will be copied to the analysis folder. The benefit that this provides a self-contained and reproducible environment for the analysis. Another feature is that cMaestro relies on singularity. Singularity images will be created at the beginning of snakemake workflow run. This feature again improves the reproducibility of the analysis. The singularity-related arguments use-singularity and singularity-args for snakemake workflow can be found here .","title":"Design"},{"location":"design/#design-of-single-cell-pipeline","text":"scMaestro is a workflow wrapper of a set of snakemake workflows. 
The snakemake workflow component SF_sc-smk-wl is used as a submodule of the repo for workflow wrapper . With the snakemake workflow as a seperate github repo, the wrappers we designed for internal use ( SF_scMaestro )( private repo ) and external use ( scMaetro ) can share the same snakemake workflow. This separation allows us to open-source our Snakemake workflow together with the wrapper for external users. One feature of scMaestro is that the snakemake workflow will be copied to the analysis folder. The benefit that this provides a self-contained and reproducible environment for the analysis. Another feature is that cMaestro relies on singularity. Singularity images will be created at the beginning of snakemake workflow run. This feature again improves the reproducibility of the analysis. The singularity-related arguments use-singularity and singularity-args for snakemake workflow can be found here .","title":"Design of Single Cell Pipeline"},{"location":"fixedrna/","text":"Single Cell Fixed RNA Profiling workflow: fixedrna Information required before running the pipeline Reference/Probe_set Singleplex or multiplex library How to run ./scMaestro fixedrna --help usage: scMaestro fixedrna [-h] -f [FASTQPATH [FASTQPATH ...]] -g GENOME -l LIBRARY_CONFIG --probe_set PROBE_SET [--singleplex] [--multiplex MULTIPLEX] optional arguments: -h, --help show this help message and exit -f [FASTQPATH [FASTQPATH ...]], --fastqs [FASTQPATH [FASTQPATH ...]] Path(s) to fastq files, multiple paths can be provided together. eg. \"-f path1 path2\" -g GENOME, --genome GENOME Genome build, e.g. \"hg38\", \"mm10\" -l LIBRARY_CONFIG, --library_config LIBRARY_CONFIG CSV file with the library configration. --probe_set PROBE_SET Probe set for FRP data --singleplex Singleplex FRP --multiplex MULTIPLEX Mutiplexing information for FRP The parameters --singleplex and --multiplex are mutually excluesive and at least one of them has to be specified. 
Here is an exmaple of config.py file: unaligned=[\"/mnt/ccrsf-static/Analysis/xies4/github_repos/pipeline_dev_test/raw_fastq/fixedrna/221102_VH00271_218_AAC3K5WM5/AAC3K5WM5/outs/fastq_path/AAC3K5WM5\"] analysis=\"/mnt/ccrsf-static/Analysis/xies4/github_repos/pipeline_dev_test/test_fixedscRNA_singleplex/BaoTran_CS033083_1scRNAseq_FixedRNA_101422/Analysis\" ref=\"hg38\" projectname=\"BaoTran_CS033083_1scRNAseq_FixedRNA_101422\" libraries=\"libraries.csv\" probe_set=\"/mnt/ccrsf-ifx/Software/tools/GemCode/cellranger-7.0.1/probe_sets/Chromium_Human_Transcriptome_Probe_Set_v1.0_GRCh38-2020-A.csv\" pipeline=\"fixedrna\" aggregate=False # Default value for fixedrna Here is libraries.csv : Name,Flowcell,Sample,Type LIB1,AAC3K5WM5,LIB1,Gene Expression LIB2,AAC3K5WM5,LIB2,Gene Expression LIB3,AAC3K5WM5,LIB3,Gene Expression For multiplexed libraries, here is the config file: unaligned=[\"/mnt/ccrsf-static/Analysis/xies4/github_repos/pipeline_dev_test/test_fixedscRNA/223LH3LT1/outs/fastq_path/223LH3LT1\"] analysis=\"/mnt/ccrsf-static/Analysis/xies4/github_repos/pipeline_dev_test/test_fixedscRNA/Analysis\" ref=\"hg38\" projectname=\"DanielMcVicar_CS036898_4fixedscRNA_07302024\" pipeline=\"fixedrna\" probe_set=\"/mnt/ccrsf-ifx/Software/tools/GemCode/cellranger-8.0.1/probe_sets/Chromium_Human_Transcriptome_Probe_Set_v1.0.1_GRCh38-2020-A.csv\" libraries=\"libraries.csv\" multiplex=\"multiplex.csv\" aggregate=False # Default value for fixedrna libraries.csv is the sample as singleplex library. 
Here is an example for multiplex.csv Name,sample_id,probe_barcode_ids,description pool_1,Young_RepSox_3nM,BC001,Young_RepSox_3nM pool_1,Young_PC3EVNeut,BC002,Young_PC3EVNeut pool_2,Young_Enza_20uM,BC001,Young_Enza_20uM pool_2,Aged_PC3EV,BC002,Aged_PC3EV pool_3,Young_Combo,BC001,Young_Combo pool_3,MDA_PCa2b,BC004,MDA_PCa2b pool_4,Young_PC3EV,BC001,Young_PC3EV pool_4,Young_PC3Stat5,BC002,Young_PC3Stat5","title":"Subcommand `fixedrna`"},{"location":"fixedrna/#single-cell-fixed-rna-profiling-workflow-fixedrna","text":"","title":"Single Cell Fixed RNA Profiling workflow: fixedrna"},{"location":"fixedrna/#information-required-before-running-the-pipeline","text":"Reference/Probe_set Singleplex or multiplex library","title":"Information required before running the pipeline"},{"location":"fixedrna/#how-to-run","text":"./scMaestro fixedrna --help usage: scMaestro fixedrna [-h] -f [FASTQPATH [FASTQPATH ...]] -g GENOME -l LIBRARY_CONFIG --probe_set PROBE_SET [--singleplex] [--multiplex MULTIPLEX] optional arguments: -h, --help show this help message and exit -f [FASTQPATH [FASTQPATH ...]], --fastqs [FASTQPATH [FASTQPATH ...]] Path(s) to fastq files, multiple paths can be provided together. eg. \"-f path1 path2\" -g GENOME, --genome GENOME Genome build, e.g. \"hg38\", \"mm10\" -l LIBRARY_CONFIG, --library_config LIBRARY_CONFIG CSV file with the library configration. --probe_set PROBE_SET Probe set for FRP data --singleplex Singleplex FRP --multiplex MULTIPLEX Mutiplexing information for FRP The parameters --singleplex and --multiplex are mutually excluesive and at least one of them has to be specified. 
Here is an example of config.py file: unaligned=[\"/mnt/ccrsf-static/Analysis/xies4/github_repos/pipeline_dev_test/raw_fastq/fixedrna/221102_VH00271_218_AAC3K5WM5/AAC3K5WM5/outs/fastq_path/AAC3K5WM5\"] analysis=\"/mnt/ccrsf-static/Analysis/xies4/github_repos/pipeline_dev_test/test_fixedscRNA_singleplex/BaoTran_CS033083_1scRNAseq_FixedRNA_101422/Analysis\" ref=\"hg38\" projectname=\"BaoTran_CS033083_1scRNAseq_FixedRNA_101422\" libraries=\"libraries.csv\" probe_set=\"/mnt/ccrsf-ifx/Software/tools/GemCode/cellranger-7.0.1/probe_sets/Chromium_Human_Transcriptome_Probe_Set_v1.0_GRCh38-2020-A.csv\" pipeline=\"fixedrna\" aggregate=False # Default value for fixedrna Here is libraries.csv : Name,Flowcell,Sample,Type LIB1,AAC3K5WM5,LIB1,Gene Expression LIB2,AAC3K5WM5,LIB2,Gene Expression LIB3,AAC3K5WM5,LIB3,Gene Expression For multiplexed libraries, here is the config file: unaligned=[\"/mnt/ccrsf-static/Analysis/xies4/github_repos/pipeline_dev_test/test_fixedscRNA/223LH3LT1/outs/fastq_path/223LH3LT1\"] analysis=\"/mnt/ccrsf-static/Analysis/xies4/github_repos/pipeline_dev_test/test_fixedscRNA/Analysis\" ref=\"hg38\" projectname=\"DanielMcVicar_CS036898_4fixedscRNA_07302024\" pipeline=\"fixedrna\" probe_set=\"/mnt/ccrsf-ifx/Software/tools/GemCode/cellranger-8.0.1/probe_sets/Chromium_Human_Transcriptome_Probe_Set_v1.0.1_GRCh38-2020-A.csv\" libraries=\"libraries.csv\" multiplex=\"multiplex.csv\" aggregate=False # Default value for fixedrna libraries.csv is the same as for the singleplex library. 
Here is an example for multiplex.csv Name,sample_id,probe_barcode_ids,description pool_1,Young_RepSox_3nM,BC001,Young_RepSox_3nM pool_1,Young_PC3EVNeut,BC002,Young_PC3EVNeut pool_2,Young_Enza_20uM,BC001,Young_Enza_20uM pool_2,Aged_PC3EV,BC002,Aged_PC3EV pool_3,Young_Combo,BC001,Young_Combo pool_3,MDA_PCa2b,BC004,MDA_PCa2b pool_4,Young_PC3EV,BC001,Young_PC3EV pool_4,Young_PC3Stat5,BC002,Young_PC3Stat5","title":"How to run"},{"location":"gex/","text":"Subcommand rna Workflow rna usage ./scMaestro rna -h usage: scMaestro rna [-h] -f [FASTQPATH [FASTQPATH ...]] -g GENOME -r REFERENCE FOLDER [--fullanalysis] [--force FORCE | --expect EXPECT] [--exclude-introns] optional arguments: -h, --help show this help message and exit -f [FASTQPATH [FASTQPATH ...]], --fastqs [FASTQPATH [FASTQPATH ...]] Path(s) to fastq files, multiple paths can be provided together. eg. \"-f path1 path2\" -g GENOME, --genome GENOME Genome build, e.g. \"hg38\", \"mm10\" -r REFERENCE FOLDER, --reference REFERENCE FOLDER Reference folder for alignment. VDJ reference is needed if subcommand \"vdj\" is used. --fullanalysis Run full analysis pipeline --force FORCE Run Cell Ranger with --force-cell --expect EXPECT Run Cell Ranger with --expect-cells --exclude-introns Exclude intronic reads in count. To maximize sensitivity for whole transcriptome 3\u2019/5\u2019 Single Cell Gene Expression and 3\u2019 Cell Multiplexing experiments, introns will be included in the analysis by default for cellranger (>v7.0.0) count and multi. Option '--fullanalysis' If --fullanalysis is enabled, quality control, PCA, clustering, and annotation will be performed for each sample separately using Seurat and SingleR . An HTML report will be generated. 
An example can be found here .","title":"Subcommand `rna`"},{"location":"gex/#subcommand-rna","text":"","title":"Subcommand rna"},{"location":"gex/#workflow-rna-usage","text":"./scMaestro rna -h usage: scMaestro rna [-h] -f [FASTQPATH [FASTQPATH ...]] -g GENOME -r REFERENCE FOLDER [--fullanalysis] [--force FORCE | --expect EXPECT] [--exclude-introns] optional arguments: -h, --help show this help message and exit -f [FASTQPATH [FASTQPATH ...]], --fastqs [FASTQPATH [FASTQPATH ...]] Path(s) to fastq files, multiple paths can be provided together. eg. \"-f path1 path2\" -g GENOME, --genome GENOME Genome build, e.g. \"hg38\", \"mm10\" -r REFERENCE FOLDER, --reference REFERENCE FOLDER Reference folder for alignment. VDJ reference is needed if subcommand \"vdj\" is used. --fullanalysis Run full analysis pipeline --force FORCE Run Cell Ranger with --force-cell --expect EXPECT Run Cell Ranger with --expect-cells --exclude-introns Exclude intronic reads in count. To maximize sensitivity for whole transcriptome 3\u2019/5\u2019 Single Cell Gene Expression and 3\u2019 Cell Multiplexing experiments, introns will be included in the analysis by default for cellranger (>v7.0.0) count and multi.","title":"Workflow rna usage"},{"location":"gex/#option-fullanalysis","text":"If --fullanalysis is enabled, quality control, PCA, clustering, and annotation will be performed for each sample separately using Seurat and SingleR . An HTML report will be generated. An example can be found here .","title":"Option '--fullanalysis'"},{"location":"license/","text":"","title":"License"},{"location":"multi/","text":"multi This pipeline is used to analyze cell multiplexing data or projects whose samples have both gene expression and VDJ captures. 
More information about this analysis and when to use it can be found at 10x support webpages: Cell Multiplexing with cellranger multi Gene Expression, V(D)J & Feature Barcode Analysis with cellranger multi ./scMaestro multi -h usage: scMaestro multi [-h] -f [FASTQPATH [FASTQPATH ...]] -g GENOME -r REFERENCE FOLDER [--chain {auto,TR,IG}] [--fullanalysis] -l LIBRARY_CONFIG [--vdj_ref VDJ_REF] [--cmo] [--count] optional arguments: -h, --help show this help message and exit -f [FASTQPATH [FASTQPATH ...]], --fastqs [FASTQPATH [FASTQPATH ...]] Path(s) to fastq files, multiple paths can be provided together. eg. \"-f path1 path2\" -g GENOME, --genome GENOME Genome build, e.g. \"hg38\", \"mm10\" -r REFERENCE FOLDER, --reference REFERENCE FOLDER Reference folder for alignment. VDJ reference is needed if subcommand \"vdj\" is used. --chain {auto,TR,IG} Force the analysis to be carried out for a particular chain type. --fullanalysis Run full analysis pipeline -l LIBRARY_CONFIG, --library_config LIBRARY_CONFIG CSV file with the library configration. --vdj_ref VDJ_REF Reference folder for VDJ data --cmo CMO information will be used for multi analysis --count Run cellranger count for projects with HTO libraries with more 10 individuals mixed. The libraries csv file would also need to be supplied in config.py . For example: libraries=\"libraries.csv\" The libraries file would contain the final sample name, flowcell, demultiplex sample name, and library type for each sample. For example: Name,Flowcell,Sample,Type IL15_LNs,H7CNNBGXG,IL15_LNs,Gene Expression IL15_LNs,H7CT7BGXG,IL15_LNs_BC,Antibody Capture IL15_TUMOR_CD11,H7CNNBGXG,IL15_TUMOR_CD11,Gene Expression IL15_TUMOR_CD11,H7CT7BGXG,IL15_TUMOR_CD11_BC,Antibody Capture Final sample name will be the name that is given to CellRanger as the name of the sample. Flowcell is the flowcell that contains the FASTQ files for this set of data. 
This can be the full path or just a unique identifier since the script will pull the full path from the config.py file. Sample is the sample name that was used when demultiplexing using cellranger mkfastq , and so should match the FASTQ files. Type is the library type for each sample. Current supported options are: Gene Expression, VDJ, CRISPR Guide Capture, Antibody Capture, and Multiplexing Capture The pipeline will use the provided libraries file to make a library file for each individual sample. This will then be processed again to create the configuration file in the format expected by CellRanger. Due to the lack of CRISPR projects at the facility, the pipeline was never tested for this technology. If this pipeline is run on a sample with CRISPR capture, check the generated sample configuration file to ensure that it matches what is expected. For multi task with VDJ libraries, \u201cdonor\u201d and \u201corigin\u201d are required in \"libraries.csv\" for aggregation. The link (here) shows how the donor and origin information will be used in the data analysis: Donor: An individual from whom adaptive immune cells (T cells, B cells) are collected (e.g. a sister and a brother would each be considered unique donors for the purposes of V(D)J aggregation). Origin: The specific source from which a dataset of cells is derived. Name,Flowcell,Sample,Type,Donor,Origin The chain information is specified in feature_types column. Valid specifications include VDJ , VDJ-T , VDJ-B , or VDJ-T-GD , and the combinations: VDJ-T & VDJ-B VDJ-T-GD & VDJ-B VDJ-T & VDJ-T-GD & VDJ-B For multi task with CRISPR Guide Capture libraries, feature_reference column is required to be put in the fifth column in libraries.csv . 
Name,Flowcell,Sample,Type,Feature F1Test,AACCCHVM5,F1CRISPR_Library,Gene Expression,crispr_feature_reference1.csv F1Test,AACCCHVM5,F1GE_Library,CRISPR Guide Capture,crispr_feature_reference1.csv F2Test,AACCCHVM5,F2CRISPR_Library,Gene Expression,crispr_feature_reference2.csv F2Test,AACCCHVM5,F2GE_Library,CRISPR Guide Capture,crispr_feature_reference2.csv F3Test,AACCCHVM5,F3CRISPR_Library,Gene Expression,crispr_feature_reference3.csv F3Test,AACCCHVM5,F3GE_Library,CRISPR Guide Capture,crispr_feature_reference3.csv The descriptions of the feature reference CSV file can be found here . In case of varying feature references for different GEX/CRISPR pairs, the fifth column can be used to provide different references. If an antibody capture was used, then a feature reference file will need to be provided in the config.py file with a features entry. There is currently no pipeline flag to add this, and it will need to be provided manually. For example: features=\"features.csv\" The feature reference file would contain (at minimum) a unique ID for the feature, human readable name, read, pattern, sequence, and feature type. For example: id,name,sequence,feature_type,read,pattern CITE_CD64,CD64,AGCAATTAACGGGAG,Antibody Capture,R2,5PNNNNNNNNNN(BC) CITE_F4_80,F4_80,TTAACTTCAGCCCGT,Antibody Capture,R2,5PNNNNNNNNNN(BC) CITE_CD8a,CD8a,TACCCGTAATAGCGT,Antibody Capture,R2,5PNNNNNNNNNN(BC) CITE_XCR1,XCR1,TCCATTACCCACGTT,Antibody Capture,R2,5PNNNNNNNNNN(BC) No space is allowed for the id field and the id must be unique. Unless the information is provided, it is probably easiest to determine the pattern by checking the FASTQ file for the sequence location. More information can be found at: https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/feature-bc-analysis If a multiplexing capture was used, then the pipeline can be called with the cmo flag. 
This will add a cmo entry into the config.py file, which can then be edited to provide the cmo (cell multiplexing oligo) reference file. For example: cmo=\"cmo.csv\" The cmo reference file has a very similar format to the feature reference file. The difference is that the feature type would be Multiplexing Capture. For example: id,name,sequence,feature_type,read,pattern HTO_1,HTO_1,GTCAACTCTTTAGCG,Multiplexing Capture,R2,5P(BC) HTO_2,HTO_2,TGATGGCCTATTGGG,Multiplexing Capture,R2,5P(BC) HTO_3,HTO_3,TTCCGCCTCTCTTTG,Multiplexing Capture,R2,5P(BC) HTO_4,HTO_4,AGTAAGTTCAGCGTA,Multiplexing Capture,R2,5P(BC) When the cmo information is filled into the sample configuration file, it will directly use the cmo ID as the multiplexing sample ID. This would need to be manually edited if a different entry would want to be included. Once all the supplemental information is filled, use the rerun option in the wrapper to start the pipeline. If the CellRanger only analysis was requested, then for 15 samples the pipeline will run the following jobs for single cell multi analysis. If there are multiple libraries with different features used, you can set cmo as a dictionary with the library Name as key and cmo csv file as value: cmo={\"CD8_d8_1\": \"cmo1.csv\", \"CD8_d8_2\": \"cmo2.csv\", \"CD8_d8_3\": \"cmo3.csv\"} There is currently no downstream analysis pipeline, and so if it is requested the CellRanger only pipeline would still be run. Note: The premrna flag is deprecated and not used anymore. The exclude-introns can be used to disable include-introns flag in CellRanger. Note: The force flag and expect flag can be used to use the force-cells flag and expect-cells when running the CellRanger multi analysis. The force flag and expect flag are mutually exclusive. 
A Special case (multi with VDJ analysis for selective samples only): There may be a project where some samples would require GEX, ADT, but not VDJ analysis while other samples from the same project would require all three analyses. For example: GEX1 and GEX2 are T-cell populations and that is the reason they have GEX and ADT only. While, only GEX3 and GEX4 have VDJ. Cellranger multi does not support HTO libraries with > 12 multiplexing tags. For projects with this case, we can use --count flag to enable the snakemake pipeline to run \u2018cellranger count\u2019. Please see the link below for an example of the command line and config.py: https://github.com/CCRSF-IFX/SF-project-tracking/issues/1#issuecomment-1934959534 An example of \u2018libraries.csv\u2019 can be obtained in the link below: https://github.com/CCRSF-IFX/SF-project-tracking/issues/1#issuecomment-1934963263 The type of HTO libraries are set to \u201cCustom\u201d. An example of features.csv file can be obtained in the link below: https://github.com/CCRSF-IFX/SF-project-tracking/issues/1#issuecomment-1934964902 The feature_type of HTO barcodes are set to \u201cCustom\u201d.","title":"Subcommand `multi`"},{"location":"multi/#multi","text":"This pipeline is used in situations when analyzing cell multiplexing data or projects with samples that have both gene expression and VDJ captures. More information about this analysis and when to use it can be found at 10x support webpages: Cell Multiplexing with cellranger multi Gene Expression, V(D)J & Feature Barcode Analysis with cellranger multi ./scMaestro multi -h usage: scMaestro multi [-h] -f [FASTQPATH [FASTQPATH ...]] -g GENOME -r REFERENCE FOLDER [--chain {auto,TR,IG}] [--fullanalysis] -l LIBRARY_CONFIG [--vdj_ref VDJ_REF] [--cmo] [--count] optional arguments: -h, --help show this help message and exit -f [FASTQPATH [FASTQPATH ...]], --fastqs [FASTQPATH [FASTQPATH ...]] Path(s) to fastq files, multiple paths can be provided together. eg. 
\"-f path1 path2\" -g GENOME, --genome GENOME Genome build, e.g. \"hg38\", \"mm10\" -r REFERENCE FOLDER, --reference REFERENCE FOLDER Reference folder for alignment. VDJ reference is needed if subcommand \"vdj\" is used. --chain {auto,TR,IG} Force the analysis to be carried out for a particular chain type. --fullanalysis Run full analysis pipeline -l LIBRARY_CONFIG, --library_config LIBRARY_CONFIG CSV file with the library configration. --vdj_ref VDJ_REF Reference folder for VDJ data --cmo CMO information will be used for multi analysis --count Run cellranger count for projects with HTO libraries with more 10 individuals mixed. The libraries csv file would also need to be supplied in config.py . For example: libraries=\"libraries.csv\" The libraries file would contain the final sample name, flowcell, demultiplex sample name, and library type for each sample. For example: Name,Flowcell,Sample,Type IL15_LNs,H7CNNBGXG,IL15_LNs,Gene Expression IL15_LNs,H7CT7BGXG,IL15_LNs_BC,Antibody Capture IL15_TUMOR_CD11,H7CNNBGXG,IL15_TUMOR_CD11,Gene Expression IL15_TUMOR_CD11,H7CT7BGXG,IL15_TUMOR_CD11_BC,Antibody Capture Final sample name will be the name that is given to CellRanger as the name of the sample. Flowcell is the flowcell that contains the FASTQ files for this set of data. This can be the full path or just a unique identifier since the script will pull the full path from the config.py file. Sample is the sample name that was used when demultiplexing using cellranger mkfastq , and so should match the FASTQ files. Type is the library type for each sample. Current supported options are: Gene Expression, VDJ, CRISPR Guide Capture, Antibody Capture, and Multiplexing Capture The pipeline will use the provided libraries file to make a library file for each individual sample. This will then be processed again to create the configuration file in the format expected by CellRanger. 
Due to the lack of CRISPR projects at the facility, the pipeline was never tested for this technology. If this pipeline is run on a sample with CRISPR capture, check the generated sample configuration file to ensure that it matches what is expected. For multi task with VDJ libraries, \u201cdonor\u201d and \u201corigin\u201d are required in \"libraries.csv\" for aggregation. The link (here) shows how the donor and origin information will be used in the data analysis: Donor: An individual from whom adaptive immune cells (T cells, B cells) are collected (e.g. a sister and a brother would each be considered unique donors for the purposes of V(D)J aggregation). Origin: The specific source from which a dataset of cells is derived. Name,Flowcell,Sample,Type,Donor,Origin The chain information is specified in feature_types column. Valid specifications include VDJ , VDJ-T , VDJ-B , or VDJ-T-GD , and the combinations: VDJ-T & VDJ-B VDJ-T-GD & VDJ-B VDJ-T & VDJ-T-GD & VDJ-B For multi task with CRISPR Guide Capture libraries, feature_reference column is required to be put in the fifth column in libraries.csv . Name,Flowcell,Sample,Type,Feature F1Test,AACCCHVM5,F1CRISPR_Library,Gene Expression,crispr_feature_reference1.csv F1Test,AACCCHVM5,F1GE_Library,CRISPR Guide Capture,crispr_feature_reference1.csv F2Test,AACCCHVM5,F2CRISPR_Library,Gene Expression,crispr_feature_reference2.csv F2Test,AACCCHVM5,F2GE_Library,CRISPR Guide Capture,crispr_feature_reference2.csv F3Test,AACCCHVM5,F3CRISPR_Library,Gene Expression,crispr_feature_reference3.csv F3Test,AACCCHVM5,F3GE_Library,CRISPR Guide Capture,crispr_feature_reference3.csv The descriptions of the feature reference CSV file can be found here . In case of varying feature references for different GEX/CRISPR pairs, the fifth column can be used to provide different references. If an antibody capture was used, then a feature reference file will need to be provided in the config.py file with a features entry. 
There is currently no pipeline flag to add this, and it will need to be provided manually. For example: features=\"features.csv\" The feature reference file would contain (at minimum) a unique ID for the feature, human readable name, read, pattern, sequence, and feature type. For example: id,name,sequence,feature_type,read,pattern CITE_CD64,CD64,AGCAATTAACGGGAG,Antibody Capture,R2,5PNNNNNNNNNN(BC) CITE_F4_80,F4_80,TTAACTTCAGCCCGT,Antibody Capture,R2,5PNNNNNNNNNN(BC) CITE_CD8a,CD8a,TACCCGTAATAGCGT,Antibody Capture,R2,5PNNNNNNNNNN(BC) CITE_XCR1,XCR1,TCCATTACCCACGTT,Antibody Capture,R2,5PNNNNNNNNNN(BC) No space is allowed for the id field and the id must be unique. Unless the information is provided, it is probably easiest to determine the pattern by checking the FASTQ file for the sequence location. More information can be found at: https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/feature-bc-analysis If a multiplexing capture was used, then the pipeline can be called with the cmo flag. This will add a cmo entry into the config.py file, which can then be edited to provide the cmo (cell multiplexing oligo) reference file. For example: cmo=\"cmo.csv\" The cmo reference file has a very similar format to the feature reference file. The difference is that the feature type would be Multiplexing Capture. For example: id,name,sequence,feature_type,read,pattern HTO_1,HTO_1,GTCAACTCTTTAGCG,Multiplexing Capture,R2,5P(BC) HTO_2,HTO_2,TGATGGCCTATTGGG,Multiplexing Capture,R2,5P(BC) HTO_3,HTO_3,TTCCGCCTCTCTTTG,Multiplexing Capture,R2,5P(BC) HTO_4,HTO_4,AGTAAGTTCAGCGTA,Multiplexing Capture,R2,5P(BC) When the cmo information is filled into the sample configuration file, it will directly use the cmo ID as the multiplexing sample ID. This would need to be manually edited if a different entry would want to be included. Once all the supplemental information is filled, use the rerun option in the wrapper to start the pipeline. 
If the CellRanger only analysis was requested, then for 15 samples the pipeline will run the following jobs for single cell multi analysis. If there are multiple libraries with different features used, you can set cmo as a dictionary with the library Name as key and cmo csv file as value: cmo={\"CD8_d8_1\": \"cmo1.csv\", \"CD8_d8_2\": \"cmo2.csv\", \"CD8_d8_3\": \"cmo3.csv\"} There is currently no downstream analysis pipeline, and so if it is requested the CellRanger only pipeline would still be run. Note: The premrna flag is deprecated and not used anymore. The exclude-introns can be used to disable include-introns flag in CellRanger. Note: The force flag and expect flag can be used to use the force-cells flag and expect-cells when running the CellRanger multi analysis. The force flag and expect flag are mutually exclusive. A Special case (multi with VDJ analysis for selective samples only): There may be a project where some samples would require GEX, ADT, but not VDJ analysis while other samples from the same project would require all three analyses. For example: GEX1 and GEX2 are T-cell populations and that is the reason they have GEX and ADT only. While, only GEX3 and GEX4 have VDJ. Cellranger multi does not support HTO libraries with > 12 multiplexing tags. For projects with this case, we can use --count flag to enable the snakemake pipeline to run \u2018cellranger count\u2019. Please see the link below for an example of the command line and config.py: https://github.com/CCRSF-IFX/SF-project-tracking/issues/1#issuecomment-1934959534 An example of \u2018libraries.csv\u2019 can be obtained in the link below: https://github.com/CCRSF-IFX/SF-project-tracking/issues/1#issuecomment-1934963263 The type of HTO libraries are set to \u201cCustom\u201d. 
An example of features.csv file can be obtained in the link below: https://github.com/CCRSF-IFX/SF-project-tracking/issues/1#issuecomment-1934964902 The feature_type of HTO barcodes is set to \u201cCustom\u201d.","title":"multi"},{"location":"multi_5p/","text":"Demultiplexing and Analyzing 5\u2019 Immune Profiling Libraries Pooled with Hashtags Step1: demultiplexing Prepare config.py file unaligned=[\"/scratch/ccrsf_scratch/scratch/Illumina_Demultiplex/NovaSeq/20240816_LH00584_0065_B22NK2FLT3/22NK2FLT3/outs/fastq_path/22NK2FLT3\"] analysis=\"/mnt/ccrsf-static/singlecell_projects/PamelaSchwartzberg_CS037516_9sclibs_08072024/Analysis_demultiplex\" ref=\"mm10\" projectname=\"PamelaSchwartzberg_CS037516_9sclibs_08072024\" yields=\"1411633.0\" archive=True runs=\"20240816_LH00584_0065_B22NK2FLT3\" libraries=\"libraries.csv\" pipeline=\"multi\" cmo=\"cmo.csv\" aggregate=False Prepare libraries.csv In libraries.csv , only Gene Expression and Multiplexing Capture samples are included to run the multi pipeline. The purpose of this step is demultiplexing. 
Name,Flowcell,Sample,Type CD8_d8_1,22NK2FLT3,1_cDNA_CD8_d8_1,Gene Expression CD8_d8_2,22NK2FLT3,2_cDNA_CD8_d8_2,Gene Expression CD8_d8_3,22NK2FLT3,3_cDNA_CD8_d8_3,Gene Expression CD8_d8_1,22NK2FLT3,4_HTO_CD8_d8_1,Multiplexing Capture CD8_d8_2,22NK2FLT3,5_HTO_CD8_d8_2,Multiplexing Capture CD8_d8_3,22NK2FLT3,6_HTO_CD8_d8_3,Multiplexing Capture Prepare cmo.csv id,name,sequence,feature_type,read,pattern A0301,A0301,ACCCACCAGTAAGAC,Multiplexing Capture,R2,5PNNNNNNNNNN(BC) A0302,A0302,GGTCGAGAGCATTCA,Multiplexing Capture,R2,5PNNNNNNNNNN(BC) A0303,A0303,CTTGCCGCATGTCAT,Multiplexing Capture,R2,5PNNNNNNNNNN(BC) A0304,A0304,AAAGCATTCTTCACG,Multiplexing Capture,R2,5PNNNNNNNNNN(BC) A0305,A0305,CTTTGTCTTTGTGAG,Multiplexing Capture,R2,5PNNNNNNNNNN(BC) A0306,A0306,TATGCTGCCACGGTA,Multiplexing Capture,R2,5PNNNNNNNNNN(BC) A0307,A0307,GAGTCTGCCAGTATC,Multiplexing Capture,R2,5PNNNNNNNNNN(BC) A0308,A0308,TATAGAACGCCAGGC,Multiplexing Capture,R2,5PNNNNNNNNNN(BC) A0309,A0309,TGCCTATGAAACAAG,Multiplexing Capture,R2,5PNNNNNNNNNN(BC) A0310,A0310,CCGATTGTAACAGAC,Multiplexing Capture,R2,5PNNNNNNNNNN(BC) Step 2: Convert per sample bam files to FASTQs for the GEX data In this step, you will run the snakemake pipeline to: 1. Convert per-sample BAM file to FASTQ files 2. Prepare the folder for next step conda activate /mnt/ccrsf-ifx/Software/tools/conda_env4scwf module load samtools snakemake -s multi_5p_bam2fastq.py --configfile config.yaml --profile ./slurm/ -e cluster-generic -j 20 indir: \"/mnt/ccrsf-static/singlecell_projects/PamelaSchwartzberg_CS037516_9sclibs_08072024/Analysis_demultiplex/\" outdir: \"outdir\" Job stats: job count ------------------ ------- all 1 bamtofastq 30 prep_raw_fq_folder 30 total 61 Step3: Final: Run cellranger multi for GEX, FB, TCR, and BCR data Only GEX and VDJ data is used in the example project. Prepare fastq files so that the folder structure meets what cellranger mkfastq outputs. 
Prepare libraries.csv In the example we used, there are 3 GEX libraries and each GEX library has 10 samples multiplexed together. So in total, there are 30 samples. There are 3 corresponding VDJ libraries. Because demultiplexing with HTOs is not performed on the VDJ data, the same VDJ data are provided for all 10 samples from a given GEX library. In the libraries.csv file below, only samples from CD8_d8_1 are shown. Name,Flowcell,Sample,Type,force-cells CD8_d8_1_A0301,22NK2FLT3,CD8_d8_1_A0301,Gene Expression,1049 CD8_d8_1_A0302,22NK2FLT3,CD8_d8_1_A0302,Gene Expression,857 CD8_d8_1_A0303,22NK2FLT3,CD8_d8_1_A0303,Gene Expression,1042 CD8_d8_1_A0304,22NK2FLT3,CD8_d8_1_A0304,Gene Expression,957 CD8_d8_1_A0305,22NK2FLT3,CD8_d8_1_A0305,Gene Expression,986 CD8_d8_1_A0306,22NK2FLT3,CD8_d8_1_A0306,Gene Expression,824 CD8_d8_1_A0307,22NK2FLT3,CD8_d8_1_A0307,Gene Expression,1062 CD8_d8_1_A0308,22NK2FLT3,CD8_d8_1_A0308,Gene Expression,897 CD8_d8_1_A0309,22NK2FLT3,CD8_d8_1_A0309,Gene Expression,825 CD8_d8_1_A0310,22NK2FLT3,CD8_d8_1_A0310,Gene Expression,1179 CD8_d8_1_A0301,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,1049 CD8_d8_1_A0302,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,857 CD8_d8_1_A0303,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,1042 CD8_d8_1_A0304,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,957 CD8_d8_1_A0305,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,986 CD8_d8_1_A0306,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,824 CD8_d8_1_A0307,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,1062 CD8_d8_1_A0308,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,897 CD8_d8_1_A0309,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,825 CD8_d8_1_A0310,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,1179 Reference Demultiplexing and Analyzing 5\u2019 Immune Profiling Libraries Pooled with Hashtags","title":"Demultiplexing and Analyzing 5\u2019 Immune Profiling Libraries Pooled with Hashtags"},{"location":"multi_5p/#demultiplexing-and-analyzing-5-immune-profiling-libraries-pooled-with-hashtags","text":"","title":"Demultiplexing and Analyzing 5\u2019 Immune Profiling Libraries Pooled with 
Hashtags"},{"location":"multi_5p/#step1-demultiplexing","text":"","title":"Step1: demultiplexing"},{"location":"multi_5p/#prepare-configpy-file","text":"unaligned=[\"/scratch/ccrsf_scratch/scratch/Illumina_Demultiplex/NovaSeq/20240816_LH00584_0065_B22NK2FLT3/22NK2FLT3/outs/fastq_path/22NK2FLT3\"] analysis=\"/mnt/ccrsf-static/singlecell_projects/PamelaSchwartzberg_CS037516_9sclibs_08072024/Analysis_demultiplex\" ref=\"mm10\" projectname=\"PamelaSchwartzberg_CS037516_9sclibs_08072024\" yields=\"1411633.0\" archive=True runs=\"20240816_LH00584_0065_B22NK2FLT3\" libraries=\"libraries.csv\" pipeline=\"multi\" cmo=\"cmo.csv\" aggregate=False","title":"Prepare config.py file"},{"location":"multi_5p/#prepare-librariescsv","text":"In libraries.csv , only Gene Expression and Multiplexing Capture samples are included to run the multi pipeline. The purpose of this step is demultiplexing. Name,Flowcell,Sample,Type CD8_d8_1,22NK2FLT3,1_cDNA_CD8_d8_1,Gene Expression CD8_d8_2,22NK2FLT3,2_cDNA_CD8_d8_2,Gene Expression CD8_d8_3,22NK2FLT3,3_cDNA_CD8_d8_3,Gene Expression CD8_d8_1,22NK2FLT3,4_HTO_CD8_d8_1,Multiplexing Capture CD8_d8_2,22NK2FLT3,5_HTO_CD8_d8_2,Multiplexing Capture CD8_d8_3,22NK2FLT3,6_HTO_CD8_d8_3,Multiplexing Capture","title":"Prepare libraries.csv"},{"location":"multi_5p/#prepare-cmocsv","text":"id,name,sequence,feature_type,read,pattern A0301,A0301,ACCCACCAGTAAGAC,Multiplexing Capture,R2,5PNNNNNNNNNN(BC) A0302,A0302,GGTCGAGAGCATTCA,Multiplexing Capture,R2,5PNNNNNNNNNN(BC) A0303,A0303,CTTGCCGCATGTCAT,Multiplexing Capture,R2,5PNNNNNNNNNN(BC) A0304,A0304,AAAGCATTCTTCACG,Multiplexing Capture,R2,5PNNNNNNNNNN(BC) A0305,A0305,CTTTGTCTTTGTGAG,Multiplexing Capture,R2,5PNNNNNNNNNN(BC) A0306,A0306,TATGCTGCCACGGTA,Multiplexing Capture,R2,5PNNNNNNNNNN(BC) A0307,A0307,GAGTCTGCCAGTATC,Multiplexing Capture,R2,5PNNNNNNNNNN(BC) A0308,A0308,TATAGAACGCCAGGC,Multiplexing Capture,R2,5PNNNNNNNNNN(BC) A0309,A0309,TGCCTATGAAACAAG,Multiplexing Capture,R2,5PNNNNNNNNNN(BC) 
A0310,A0310,CCGATTGTAACAGAC,Multiplexing Capture,R2,5PNNNNNNNNNN(BC)","title":"Prepare cmo.csv"},{"location":"multi_5p/#step-2-convert-per-sample-bam-files-to-fastqs-for-the-gex-data","text":"In this step, you will run the snakemake pipeline to: 1. Convert per-sample BAM file to FASTQ files 2. Prepare the folder for next step conda activate /mnt/ccrsf-ifx/Software/tools/conda_env4scwf module load samtools snakemake -s multi_5p_bam2fastq.py --configfile config.yaml --profile ./slurm/ -e cluster-generic -j 20 indir: \"/mnt/ccrsf-static/singlecell_projects/PamelaSchwartzberg_CS037516_9sclibs_08072024/Analysis_demultiplex/\" outdir: \"outdir\" Job stats: job count ------------------ ------- all 1 bamtofastq 30 prep_raw_fq_folder 30 total 61","title":"Step 2: Convert per sample bam files to FASTQs for the GEX data"},{"location":"multi_5p/#step3-final-run-cellranger-multi-for-gex-fb-tcr-and-bcr-data","text":"Only GEX and VDJ data are used in the example project. Prepare fastq files so that the folder structure matches what cellranger mkfastq outputs. Prepare libraries.csv In the example we used, there are 3 GEX libraries and each GEX library has 10 samples multiplexed together. So in total, there are 30 samples. There are 3 corresponding VDJ libraries. Because demultiplexing with HTOs is not performed on the VDJ data, the same VDJ data are provided for all 10 samples from a given GEX library. In the libraries.csv file below, only samples from CD8_d8_1 are shown. 
Name,Flowcell,Sample,Type,force-cells CD8_d8_1_A0301,22NK2FLT3,CD8_d8_1_A0301,Gene Expression,1049 CD8_d8_1_A0302,22NK2FLT3,CD8_d8_1_A0302,Gene Expression,857 CD8_d8_1_A0303,22NK2FLT3,CD8_d8_1_A0303,Gene Expression,1042 CD8_d8_1_A0304,22NK2FLT3,CD8_d8_1_A0304,Gene Expression,957 CD8_d8_1_A0305,22NK2FLT3,CD8_d8_1_A0305,Gene Expression,986 CD8_d8_1_A0306,22NK2FLT3,CD8_d8_1_A0306,Gene Expression,824 CD8_d8_1_A0307,22NK2FLT3,CD8_d8_1_A0307,Gene Expression,1062 CD8_d8_1_A0308,22NK2FLT3,CD8_d8_1_A0308,Gene Expression,897 CD8_d8_1_A0309,22NK2FLT3,CD8_d8_1_A0309,Gene Expression,825 CD8_d8_1_A0310,22NK2FLT3,CD8_d8_1_A0310,Gene Expression,1179 CD8_d8_1_A0301,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,1049 CD8_d8_1_A0302,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,857 CD8_d8_1_A0303,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,1042 CD8_d8_1_A0304,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,957 CD8_d8_1_A0305,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,986 CD8_d8_1_A0306,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,824 CD8_d8_1_A0307,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,1062 CD8_d8_1_A0308,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,897 CD8_d8_1_A0309,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,825 CD8_d8_1_A0310,22NK2FLT3,7_VDJ_CD8_d8_1,VDJ-T,1179","title":"Step3: Final: Run cellranger multi for GEX, FB, TCR, and BCR data"},{"location":"multi_5p/#reference","text":"Demultiplexing and Analyzing 5\u2019 Immune Profiling Libraries Pooled with Hashtags","title":"Reference"},{"location":"todo/","text":"Todo list [ ] Add --fullanalysis ability for atac , vdj , and multiome . [ ] Improve documentation","title":"todo"},{"location":"todo/#todo-list","text":"[ ] Add --fullanalysis ability for atac , vdj , and multiome . [ ] Improve documentation","title":"Todo list"},{"location":"vdj/","text":"vdj pipeline usage: scMaestro vdj [-h] -f [FASTQPATH ...] -r REFERENCE FOLDER -g GENOME [-n] [--rerun] [--unlock] [--chain {auto,TR,IG}] options: -h, --help show this help message and exit -f [FASTQPATH ...], --fastqs [FASTQPATH ...] Path(s) to fastq files, multiple paths can be provided together. eg. 
\"-f path1 path2\" -r REFERENCE FOLDER, --reference REFERENCE FOLDER Reference genome folder for alignment -g GENOME, --genome GENOME Genome build, e.g. \"hg38\", \"mm10\" -n, --dryrun dry run --rerun dry run then prompt for submitting jobs that need to be rerun --unlock unlock working directory --chain {auto,TR,IG} Force the analysis to be carried out for a particular chain type. By default, Cell Ranger will try to automatically determine the chain type from the data. When this fails, the complete error message will look something like the following. In order for Cell Ranger to automatically determine chain type, the sample library must meet the listed conditions. V(D)J Chain detection failed for Sample foo in \"/mnt/scratch/inputs/x/y\". Total Reads = 1000000 Reads mapped to TR = 49211 Reads mapped to IG = 3 In order to distinguish between the TR and the IG chain the following conditions need to be satisfied: - A minimum of 10000 total reads - A minimum of 5.0% of the total reads needs to map to TR or IG - The number of reads mapped to TR should be at least 3.0x compared to the number of reads mapped to IG or vice versa Please check the input data and/or specify the chain via the --chain argument. To overcome this error on the command line, for cellranger vdj explicitly set the --chain parameter to either IG for BCRs or TR for TCRs. Documentation is at https://support.10xgenomics.com/single-cell-vdj/software/pipelines/latest/using/vdj#opt-arg-exp. Reference https://kb.10xgenomics.com/hc/en-us/articles/10840673911693-How-to-solve-V-D-J-Chain-detection-failed-error-on-10x-Cloud","title":"Subcommand `vdj`"},{"location":"vdj/#vdj-pipeline","text":"usage: scMaestro vdj [-h] -f [FASTQPATH ...] -r REFERENCE FOLDER -g GENOME [-n] [--rerun] [--unlock] [--chain {auto,TR,IG}] options: -h, --help show this help message and exit -f [FASTQPATH ...], --fastqs [FASTQPATH ...] Path(s) to fastq files, multiple paths can be provided together. eg. 
\"-f path1 path2\" -r REFERENCE FOLDER, --reference REFERENCE FOLDER Reference genome folder for alignment -g GENOME, --genome GENOME Genome build, e.g. \"hg38\", \"mm10\" -n, --dryrun dry run --rerun dry run then prompt for submitting jobs that need to be rerun --unlock unlock working directory --chain {auto,TR,IG} Force the analysis to be carried out for a particular chain type. By default, Cell Ranger will try to automatically determine the chain type from the data. When this fails, the complete error message will look something like the following. In order for Cell Ranger to automatically determine chain type, the sample library must meet the listed conditions. V(D)J Chain detection failed for Sample foo in \"/mnt/scratch/inputs/x/y\". Total Reads = 1000000 Reads mapped to TR = 49211 Reads mapped to IG = 3 In order to distinguish between the TR and the IG chain the following conditions need to be satisfied: - A minimum of 10000 total reads - A minimum of 5.0% of the total reads needs to map to TR or IG - The number of reads mapped to TR should be at least 3.0x compared to the number of reads mapped to IG or vice versa Please check the input data and/or specify the chain via the --chain argument. To overcome this error on the command line, for cellranger vdj explicitly set the --chain parameter to either IG for BCRs or TR for TCRs. Documentation is at https://support.10xgenomics.com/single-cell-vdj/software/pipelines/latest/using/vdj#opt-arg-exp.","title":"vdj pipeline"},{"location":"vdj/#reference","text":"https://kb.10xgenomics.com/hc/en-us/articles/10840673911693-How-to-solve-V-D-J-Chain-detection-failed-error-on-10x-Cloud","title":"Reference"}]} \ No newline at end of file diff --git a/search/worker.js b/search/worker.js new file mode 100644 index 0000000..8628dbc --- /dev/null +++ b/search/worker.js @@ -0,0 +1,133 @@ +var base_path = 'function' === typeof importScripts ? '.' 
: '/search/'; +var allowSearch = false; +var index; +var documents = {}; +var lang = ['en']; +var data; + +function getScript(script, callback) { + console.log('Loading script: ' + script); + $.getScript(base_path + script).done(function () { + callback(); + }).fail(function (jqxhr, settings, exception) { + console.log('Error: ' + exception); + }); +} + +function getScriptsInOrder(scripts, callback) { + if (scripts.length === 0) { + callback(); + return; + } + getScript(scripts[0], function() { + getScriptsInOrder(scripts.slice(1), callback); + }); +} + +function loadScripts(urls, callback) { + if( 'function' === typeof importScripts ) { + importScripts.apply(null, urls); + callback(); + } else { + getScriptsInOrder(urls, callback); + } +} + +function onJSONLoaded () { + data = JSON.parse(this.responseText); + var scriptsToLoad = ['lunr.js']; + if (data.config && data.config.lang && data.config.lang.length) { + lang = data.config.lang; + } + if (lang.length > 1 || lang[0] !== "en") { + scriptsToLoad.push('lunr.stemmer.support.js'); + if (lang.length > 1) { + scriptsToLoad.push('lunr.multi.js'); + } + if (lang.includes("ja") || lang.includes("jp")) { + scriptsToLoad.push('tinyseg.js'); + } + for (var i=0; i < lang.length; i++) { + if (lang[i] != 'en') { + scriptsToLoad.push(['lunr', lang[i], 'js'].join('.')); + } + } + } + loadScripts(scriptsToLoad, onScriptsLoaded); +} + +function onScriptsLoaded () { + console.log('All search scripts loaded, building Lunr index...'); + if (data.config && data.config.separator && data.config.separator.length) { + lunr.tokenizer.separator = new RegExp(data.config.separator); + } + + if (data.index) { + index = lunr.Index.load(data.index); + data.docs.forEach(function (doc) { + documents[doc.location] = doc; + }); + console.log('Lunr pre-built index loaded, search ready'); + } else { + index = lunr(function () { + if (lang.length === 1 && lang[0] !== "en" && lunr[lang[0]]) { + this.use(lunr[lang[0]]); + } else if (lang.length > 1) { 
+ this.use(lunr.multiLanguage.apply(null, lang)); // spread operator not supported in all browsers: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Spread_operator#Browser_compatibility + } + this.field('title'); + this.field('text'); + this.ref('location'); + + for (var i=0; i < data.docs.length; i++) { + var doc = data.docs[i]; + this.add(doc); + documents[doc.location] = doc; + } + }); + console.log('Lunr index built, search ready'); + } + allowSearch = true; + postMessage({config: data.config}); + postMessage({allowSearch: allowSearch}); +} + +function init () { + var oReq = new XMLHttpRequest(); + oReq.addEventListener("load", onJSONLoaded); + var index_path = base_path + '/search_index.json'; + if( 'function' === typeof importScripts ){ + index_path = 'search_index.json'; + } + oReq.open("GET", index_path); + oReq.send(); +} + +function search (query) { + if (!allowSearch) { + console.error('Assets for search still loading'); + return; + } + + var resultDocuments = []; + var results = index.search(query); + for (var i=0; i < results.length; i++){ + var result = results[i]; + doc = documents[result.ref]; + doc.summary = doc.text.substring(0, 200); + resultDocuments.push(doc); + } + return resultDocuments; +} + +if( 'function' === typeof importScripts ) { + onmessage = function (e) { + if (e.data.init) { + init(); + } else if (e.data.query) { + postMessage({ results: search(e.data.query) }); + } else { + console.error("Worker - Unrecognized message: " + e); + } + }; +} diff --git a/sitemap.xml b/sitemap.xml new file mode 100644 index 0000000..0f8724e --- /dev/null +++ b/sitemap.xml @@ -0,0 +1,3 @@ + +[ ] Add --fullanalysis
ability for atac, vdj, and multiome.
[ ] Improve documentation
+vdj pipeline
+usage: scMaestro vdj [-h] -f [FASTQPATH ...] -r REFERENCE FOLDER -g GENOME [-n] [--rerun] [--unlock]
+ [--chain {auto,TR,IG}]
+
+options:
+ -h, --help show this help message and exit
+ -f [FASTQPATH ...], --fastqs [FASTQPATH ...]
+ Path(s) to fastq files, multiple paths can be provided together. eg. "-f path1
+ path2"
+ -r REFERENCE FOLDER, --reference REFERENCE FOLDER
+ Reference genome folder for alignment
+ -g GENOME, --genome GENOME
+ Genome build, e.g. "hg38", "mm10"
+ -n, --dryrun dry run
+ --rerun dry run then prompt for submitting jobs that need to be rerun
+ --unlock unlock working directory
+ --chain {auto,TR,IG} Force the analysis to be carried out for a particular chain type.
+
+By default, Cell Ranger tries to determine the chain type automatically from the data. When this fails, the complete error message looks something like the following. For automatic detection to succeed, the sample library must meet the conditions listed in the message.
+V(D)J Chain detection failed for Sample foo in "/mnt/scratch/inputs/x/y".
+
+Total Reads = 1000000
+Reads mapped to TR = 49211
+Reads mapped to IG = 3
+
+In order to distinguish between the TR and the IG chain the following conditions need to be satisfied:
+- A minimum of 10000 total reads
+- A minimum of 5.0% of the total reads needs to map to TR or IG
+- The number of reads mapped to TR should be at least 3.0x compared to the number of reads mapped to IG or vice versa
+Please check the input data and/or specify the chain via the --chain argument.
+
+To overcome this error on the command line, explicitly set the --chain parameter of cellranger vdj to either IG (for BCRs) or TR (for TCRs). scMaestro vdj accepts a --chain option with the values auto, TR, and IG. Documentation is at https://support.10xgenomics.com/single-cell-vdj/software/pipelines/latest/using/vdj#opt-arg-exp.
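The three conditions quoted in the error message can be checked by hand before rerunning. The following is a minimal Python sketch, not part of scMaestro or Cell Ranger: the names `ChainReadCounts`, `failed_conditions`, and `suggested_chain` are hypothetical, and it assumes the 5.0% threshold applies to TR and IG reads combined, which the message does not state explicitly.

```python
from dataclasses import dataclass

# Thresholds quoted in the Cell Ranger error message above.
MIN_TOTAL_READS = 10_000
MIN_MAPPED_FRACTION = 0.05
MIN_CHAIN_RATIO = 3.0


@dataclass
class ChainReadCounts:
    """Read counts as reported in the error message (hypothetical container)."""
    total: int
    tr: int
    ig: int


def failed_conditions(counts):
    """Return the conditions from the error message that `counts` does not meet."""
    failures = []
    if counts.total < MIN_TOTAL_READS:
        failures.append("fewer than 10000 total reads")
    # Assumption: the 5.0% threshold applies to TR and IG reads combined.
    if counts.tr + counts.ig < MIN_MAPPED_FRACTION * counts.total:
        failures.append("less than 5.0% of reads map to TR or IG")
    # One chain must have at least 3.0x the reads of the other.
    high, low = max(counts.tr, counts.ig), min(counts.tr, counts.ig)
    if high < MIN_CHAIN_RATIO * low:
        failures.append("neither chain has 3.0x the reads of the other")
    return failures


def suggested_chain(counts):
    """Chain with more mapped reads: a starting point for --chain, not a verdict."""
    return "TR" if counts.tr >= counts.ig else "IG"


# Counts taken from the example error message above.
example = ChainReadCounts(total=1_000_000, tr=49_211, ig=3)
print(failed_conditions(example))
print(suggested_chain(example))
```

For the counts in the example message, only the 5.0% condition fails: 49,211 + 3 = 49,214 mapped reads is about 4.92% of 1,000,000. TR reads far outnumber IG reads, so `--chain TR` would be the natural value to try, provided the library is in fact a TCR library.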
+Reference
+https://kb.10xgenomics.com/hc/en-us/articles/10840673911693-How-to-solve-V-D-J-Chain-detection-failed-error-on-10x-Cloud
+ +