From 214edcc7b44a7b0db911901d35968ca60f5d6310 Mon Sep 17 00:00:00 2001
From: "Stephanie J. Spielman"
Date: Tue, 27 Feb 2024 11:34:27 -0500
Subject: [PATCH 1/6] add figure 3 caption

---
 content/100.figure-table-legends.md | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/content/100.figure-table-legends.md b/content/100.figure-table-legends.md
index 929854c..892668b 100644
--- a/content/100.figure-table-legends.md
+++ b/content/100.figure-table-legends.md
@@ -38,3 +38,24 @@ Points are colored by whether the cell was kept or removed, as determined by bot
 F. UMAP embeddings of log-normalized RNA expression values where each cell is colored by the number of genes detected.
 G. UMAP embeddings of log-normalized RNA expression values for the top four most variable genes, colored by the given gene's expression.
 In the actual summary QC report, the top 12 most highly variable genes are shown.
+
+
+![**Figure 3. ScPCA Portal project download file structure and merged object workflow.**](https://raw.githubusercontent.com/AlexsLemonade/scpca-paper-figures/main/figures/compiled_figures/pngs/figure_3.png?sanitize=true){#fig:fig3 width="7in"}
+A. File download structure for an ScPCA Portal project download in `SCE` format.
+The download folder is named according to both the project ID and the date it was downloaded.
+Download folders contain one folder for each sample ID, each containing the three versions (unfiltered, filtered, and processed) of the expression data as well as the summary QC report and cell type report all named according to the ScPCA library ID.
+The `single_cell_metadata.tsv` file contains sample metadata for all samples included in the download.
+The `README.md` file provides information about the contents of each download file, additional contact and citation information, and terms of use for data downloaded from the ScPCA Portal.
+The files `bulk_quant.tsv` and `bulk_metadata.tsv` are only present for projects that also have bulk RNA-Seq data and contain, respectively, a gene by sample matrix of raw gene expression as quantified by `salmon`, and associated metadata for all samples with bulk RNA-Seq data.
+B. File download structure for an ScPCA Portal merged project download in `SCE` format.
+The download folder is named according to both the project ID and the date it was downloaded.
+Download folders contain a single merged object containing all samples in the given project as well as a summary report briefly detailing the contents of the merged object.
+All summary QC and cell type reports for each individual library are also provided in the `individual_reports` folder arranged by their sample ID.
+As in panel (A), additional files `single_cell_metadata.tsv`, `bulk_quant.tsv`, `bulk_metadata.tsv`, and `README.md` are also included.
+C. Overview of the merged workflow.
+Processed `SCE` objects associated with a given project are merged into a single object, including ADT counts from CITE-seq data if present, and a merged summary report is generated.
+Merged objects are available for download either in `SCE` or `AnnData` format.
+D. Example of UMAPs as shown in the merged summary report.
+A grid of UMAPs is shown for each library in the merged object, with cells in the library of interest shown in red and all other cells belonging to other libraries shown in gray.
+The UMAP is constructed from the merged object such that all libraries contribute an equal weight, but no batch correction was performed.
+The libraries pictured are a subset of libraries in the ScPCA project `SCPCP000003`.

From e3d0e4b3382d6b7f22735e15a7681c10b83bf177 Mon Sep 17 00:00:00 2001
From: "Stephanie J. Spielman"
Date: Tue, 27 Feb 2024 11:40:15 -0500
Subject: [PATCH 2/6] fig4 caption

---
 content/100.figure-table-legends.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/content/100.figure-table-legends.md b/content/100.figure-table-legends.md
index 18a7f39..de62605 100644
--- a/content/100.figure-table-legends.md
+++ b/content/100.figure-table-legends.md
@@ -41,6 +41,7 @@ In the actual summary QC report, the top 12 most highly variable genes are shown
 
 
 ![**Figure 3. ScPCA Portal project download file structure and merged object workflow.**](https://raw.githubusercontent.com/AlexsLemonade/scpca-paper-figures/main/figures/compiled_figures/pngs/figure_3.png?sanitize=true){#fig:fig3 width="7in"}
+
 A. File download structure for an ScPCA Portal project download in `SCE` format.
 The download folder is named according to both the project ID and the date it was downloaded.
 Download folders contain one folder for each sample ID, each containing the three versions (unfiltered, filtered, and processed) of the expression data as well as the summary QC report and cell type report all named according to the ScPCA library ID.
@@ -59,3 +60,17 @@ D. Example of UMAPs as shown in the merged summary report.
 A grid of UMAPs is shown for each library in the merged object, with cells in the library of interest shown in red and all other cells belonging to other libraries shown in gray.
 The UMAP is constructed from the merged object such that all libraries contribute an equal weight, but no batch correction was performed.
 The libraries pictured are a subset of libraries in the ScPCA project `SCPCP000003`.
+
+
+![**Figure 4. Cell type annotation `scpca-nf` subworkflow.**](https://raw.githubusercontent.com/AlexsLemonade/scpca-paper-figures/main/figures/compiled_figures/pngs/figure_4.png?sanitize=true){#fig:fig4 width="7in"}
+
+A. Expanded view of the cell type annotation subworkflow in `scpca-nf`, as introduced in Figure 2A.
+Cell type annotation is performed on the `Processed SCE Object`.
+A `celldex` [@doi:10.1038/s41590-018-0276-y] reference dataset with ontology labels is used as input for annotation with `SingleR` [@doi:10.1038/s41590-018-0276-y], and a list of marker genes compiled from `PanglaoDB` [@doi:10.1093/database/baz046] is used as input for annotation with `CellAssign` [@doi:10.1038/s41592-019-0529-1].
+Results from cell type annotation are then added to the `Processed SCE Object`, and a cell type summary report with information about reference sources, comparisons among cell type annotation methods, and diagnostic plots is created.
+Although not shown in this panel, cell type annotations are also included in the `Processed AnnData Object` created from the `Processed SCE Object` (Figure 2A).
+
+B. Example heatmap as shown in the cell type summary report comparing annotations with `SingleR` and `CellAssign`.
+Heatmap cells are colored by the Jacard similarity index.
+A value of 1 means that there is complete overlap between which cells are annotated with the two labels being compared, and a value of 0 means that there is no overlap between which cells are annotated with the two labels being compared.
+The heatmap shown is from library `SCPCL000498`.

From d10700b5702d50f85296676b09a96c1109fb13fe Mon Sep 17 00:00:00 2001
From: Stephanie Spielman
Date: Wed, 28 Feb 2024 09:20:51 -0500
Subject: [PATCH 3/6] Apply suggestions from code review

Co-authored-by: Ally Hawkins <54039191+allyhawkins@users.noreply.github.com>
---
 content/100.figure-table-legends.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/content/100.figure-table-legends.md b/content/100.figure-table-legends.md
index 5873c90..bd5e0ba 100644
--- a/content/100.figure-table-legends.md
+++ b/content/100.figure-table-legends.md
@@ -61,15 +61,15 @@ The UMAP is constructed from the merged object such that all libraries contribut
 The libraries pictured are a subset of libraries in the ScPCA project `SCPCP000003`.
 
 
-![**Figure 4. Cell type annotation `scpca-nf` subworkflow.**](https://raw.githubusercontent.com/AlexsLemonade/scpca-paper-figures/main/figures/compiled_figures/pngs/figure_4.png?sanitize=true){#fig:fig4 width="7in"}
+![**Figure 4. Cell type annotation with `scpca-nf`.**](https://raw.githubusercontent.com/AlexsLemonade/scpca-paper-figures/main/figures/compiled_figures/pngs/figure_4.png?sanitize=true){#fig:fig4 width="7in"}
 
-A. Expanded view of the cell type annotation subworkflow in `scpca-nf`, as introduced in Figure 2A.
+A. Expanded view of the process for adding cell type annotations within `scpca-nf`, as introduced in Figure 2A.
 Cell type annotation is performed on the `Processed SCE Object`.
 A `celldex` [@doi:10.1038/s41590-018-0276-y] reference dataset with ontology labels is used as input for annotation with `SingleR` [@doi:10.1038/s41590-018-0276-y], and a list of marker genes compiled from `PanglaoDB` [@doi:10.1093/database/baz046] is used as input for annotation with `CellAssign` [@doi:10.1038/s41592-019-0529-1].
 Results from cell type annotation are then added to the `Processed SCE Object`, and a cell type summary report with information about reference sources, comparisons among cell type annotation methods, and diagnostic plots is created.
 Although not shown in this panel, cell type annotations are also included in the `Processed AnnData Object` created from the `Processed SCE Object` (Figure 2A).
 
 B. Example heatmap as shown in the cell type summary report comparing annotations with `SingleR` and `CellAssign`.
-Heatmap cells are colored by the Jacard similarity index.
+Heatmap cells are colored by the Jaccard similarity index.
 A value of 1 means that there is complete overlap between which cells are annotated with the two labels being compared, and a value of 0 means that there is no overlap between which cells are annotated with the two labels being compared.
 The heatmap shown is from library `SCPCL000498`.

From 0667fdf017b21b593d6e873019bb0692e682171a Mon Sep 17 00:00:00 2001
From: "Stephanie J. Spielman"
Date: Wed, 28 Feb 2024 09:22:12 -0500
Subject: [PATCH 4/6] singlecellexperiment (sce)

---
 content/100.figure-table-legends.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/100.figure-table-legends.md b/content/100.figure-table-legends.md
index bd5e0ba..a8a1643 100644
--- a/content/100.figure-table-legends.md
+++ b/content/100.figure-table-legends.md
@@ -41,7 +41,7 @@ In the actual summary QC report, the top 12 most highly variable genes are shown
 
 ![**Figure 3. ScPCA Portal project download file structure and merged object workflow.**](https://raw.githubusercontent.com/AlexsLemonade/scpca-paper-figures/main/figures/compiled_figures/pngs/figure_3.png?sanitize=true){#fig:fig3 width="7in"}
 
-A. File download structure for an ScPCA Portal project download in `SCE` format.
+A. File download structure for an ScPCA Portal project download in `SingleCellExperiment` (`SCE`) format.
 The download folder is named according to both the project ID and the date it was downloaded.
 Download folders contain one folder for each sample ID, each containing the three versions (unfiltered, filtered, and processed) of the expression data as well as the summary QC report and cell type report all named according to the ScPCA library ID.
 The `single_cell_metadata.tsv` file contains sample metadata for all samples included in the download.
@@ -61,7 +61,7 @@ The UMAP is constructed from the merged object such that all libraries contribut
 The libraries pictured are a subset of libraries in the ScPCA project `SCPCP000003`.
 
 
-![**Figure 4. Cell type annotation with `scpca-nf`.**](https://raw.githubusercontent.com/AlexsLemonade/scpca-paper-figures/main/figures/compiled_figures/pngs/figure_4.png?sanitize=true){#fig:fig4 width="7in"}
+![**Figure 4. Cell type annotation in `scpca-nf`.**](https://raw.githubusercontent.com/AlexsLemonade/scpca-paper-figures/main/figures/compiled_figures/pngs/figure_4.png?sanitize=true){#fig:fig4 width="7in"}
 
 A. Expanded view of the process for adding cell type annotations within `scpca-nf`, as introduced in Figure 2A.
 Cell type annotation is performed on the `Processed SCE Object`.

From 8e8d7faa89bcfbb28da58bd1ad9ce823f6520e65 Mon Sep 17 00:00:00 2001
From: "Stephanie J. Spielman"
Date: Wed, 28 Feb 2024 12:14:24 -0500
Subject: [PATCH 5/6] new lines between panels for legibility, at least for now

---
 content/100.figure-table-legends.md | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/content/100.figure-table-legends.md b/content/100.figure-table-legends.md
index a8a1643..febda80 100644
--- a/content/100.figure-table-legends.md
+++ b/content/100.figure-table-legends.md
@@ -4,11 +4,13 @@
 
 A. Barplots showing sample counts across four main cancer groupings in the ScPCA Portal, with each bar displaying the number of samples for each cancer type.
 Each bar is shaded based on the number of samples with each disease timing, and total sample counts for each cancer type are shown to the right of each bar.
+
 B. Barplot showing sample counts across types of modalities present in the ScPCA Portal.
 All samples in the portal are shown under the "All Samples" heading.
 Samples under the "Samples with additional modalities" heading represent a subset of the total samples with the given additional modality.
 Colors shown for each additional modality indicate the suspension type that the single-cell or single-nuclei sample is associated with.
 For example, 75 single-cell samples and 43 single-nuclei samples have accompanying Bulk RNA-seq data.
+
 C. Example of a project card as displayed on the "Browse" page of the ScPCA Portal.
 This project card is associated with project `SCPCP000009`.
 Project cards include information about the number of samples, technologies and modalities, additional sample metadata information, submitter-provided diagnoses, as well as submitter-provided abstract.
@@ -26,15 +28,21 @@ The object undergoes cell type annotation and is exported as the `Processed SCE
 A summary QC report and a supplemental cell type report are prepared and exported.
 Finally, all `SCE` files are converted to `AnnData` format and exported.
 Panels B-G show example figures that appear in the summary QC report, shown here for `SCPCL000001`, as follows.
+
 B. The total UMI count for each cell in the `Filtered SCE Object`, ordered by rank.
 Points are colored by the percentage of cells that pass the empty droplets filter.
+
 C. The number of genes detected in each cell passing the empty droplets filter against the total UMI count.
 Points are colored by the percentage of mitochondrial reads in the cell.
+
 D. `miQC` model diagnostic plot showing the percent of mitochondrial reads in each cell against the number of genes detected in the `Filtered SCE Object`.
 Points are colored by the probability that the cell is compromised as determined by `miQC`.
+
 E. The percent of mitochondrial reads in each cell against the number of genes detected in each cell.
 Points are colored by whether the cell was kept or removed, as determined by both `miQC` and a minimum unique gene count cutoff, prior to normalization and dimensionality reduction.
+
 F. UMAP embeddings of log-normalized RNA expression values where each cell is colored by the number of genes detected.
+
 G. UMAP embeddings of log-normalized RNA expression values for the top four most variable genes, colored by the given gene's expression.
 In the actual summary QC report, the top 12 most highly variable genes are shown.
 
@@ -47,14 +55,17 @@ Download folders contain one folder for each sample ID, each containing the thre
 The `single_cell_metadata.tsv` file contains sample metadata for all samples included in the download.
 The `README.md` file provides information about the contents of each download file, additional contact and citation information, and terms of use for data downloaded from the ScPCA Portal.
 The files `bulk_quant.tsv` and `bulk_metadata.tsv` are only present for projects that also have bulk RNA-Seq data and contain, respectively, a gene by sample matrix of raw gene expression as quantified by `salmon`, and associated metadata for all samples with bulk RNA-Seq data.
+
 B. File download structure for an ScPCA Portal merged project download in `SCE` format.
 The download folder is named according to both the project ID and the date it was downloaded.
 Download folders contain a single merged object containing all samples in the given project as well as a summary report briefly detailing the contents of the merged object.
 All summary QC and cell type reports for each individual library are also provided in the `individual_reports` folder arranged by their sample ID.
 As in panel (A), additional files `single_cell_metadata.tsv`, `bulk_quant.tsv`, `bulk_metadata.tsv`, and `README.md` are also included.
+
 C. Overview of the merged workflow.
 Processed `SCE` objects associated with a given project are merged into a single object, including ADT counts from CITE-seq data if present, and a merged summary report is generated.
 Merged objects are available for download either in `SCE` or `AnnData` format.
+
 D. Example of UMAPs as shown in the merged summary report.
 A grid of UMAPs is shown for each library in the merged object, with cells in the library of interest shown in red and all other cells belonging to other libraries shown in gray.
 The UMAP is constructed from the merged object such that all libraries contribute an equal weight, but no batch correction was performed.

From bc2e28f19cac24c226db3028bc262d2f544ec2e2 Mon Sep 17 00:00:00 2001
From: "Stephanie J. Spielman"
Date: Wed, 28 Feb 2024 12:16:10 -0500
Subject: [PATCH 6/6] actually make use of the figure tags

---
 content/100.figure-table-legends.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/100.figure-table-legends.md b/content/100.figure-table-legends.md
index febda80..d895c19 100644
--- a/content/100.figure-table-legends.md
+++ b/content/100.figure-table-legends.md
@@ -74,11 +74,11 @@ The libraries pictured are a subset of libraries in the ScPCA project `SCPCP0000
 
 ![**Figure 4. Cell type annotation in `scpca-nf`.**](https://raw.githubusercontent.com/AlexsLemonade/scpca-paper-figures/main/figures/compiled_figures/pngs/figure_4.png?sanitize=true){#fig:fig4 width="7in"}
 
-A. Expanded view of the process for adding cell type annotations within `scpca-nf`, as introduced in Figure 2A.
+A. Expanded view of the process for adding cell type annotations within `scpca-nf`, as introduced in Figure {@fig:fig2}A.
 Cell type annotation is performed on the `Processed SCE Object`.
 A `celldex` [@doi:10.1038/s41590-018-0276-y] reference dataset with ontology labels is used as input for annotation with `SingleR` [@doi:10.1038/s41590-018-0276-y], and a list of marker genes compiled from `PanglaoDB` [@doi:10.1093/database/baz046] is used as input for annotation with `CellAssign` [@doi:10.1038/s41592-019-0529-1].
 Results from cell type annotation are then added to the `Processed SCE Object`, and a cell type summary report with information about reference sources, comparisons among cell type annotation methods, and diagnostic plots is created.
-Although not shown in this panel, cell type annotations are also included in the `Processed AnnData Object` created from the `Processed SCE Object` (Figure 2A).
+Although not shown in this panel, cell type annotations are also included in the `Processed AnnData Object` created from the `Processed SCE Object` (Figure {@fig:fig2}A).
 
 B. Example heatmap as shown in the cell type summary report comparing annotations with `SingleR` and `CellAssign`.
 Heatmap cells are colored by the Jaccard similarity index.