Merge pull request #60 from AlexsLemonade/allyhawkins/outline-celltype-results

Outline cell type annotations results section
allyhawkins authored Mar 4, 2024
2 parents c72242e + 72f3d1a commit 402ffd4
Showing 1 changed file with 52 additions and 0 deletions.
52 changes: 52 additions & 0 deletions content/03.results.md
@@ -177,5 +177,57 @@ If the merged object contains an `altExp` with merged ADT data, two `AnnData` ob
In the UMAP, each panel represents a different library included in the merged object, with all cells from the specified library shown in color, while all other cells are gray.
An example of this UMAP showing a subset of libraries from a ScPCA project is available in Figure 3D.

## Annotating cell types

1. Why including cell type annotations is helpful to users
- Cell typing is often difficult and can require specific domain expertise
- Sometimes, we have cell type annotations from submitters -- this is the ideal case
- Briefly, where are the cell type annotations included in downloads
- We can save users time even if there are limitations to the annotations we include
- What we looked for in methods
- We include two methods because consistent cell type annotations across methods can indicate higher confidence in the annotation.

2. Methods we used
- `SingleR` requires a trained model from an existing bulk or single-cell RNA-seq dataset.
- We used the `BlueprintEncodeData` dataset from `celldex` as the reference for all ScPCA samples.
- This dataset is publicly available, contains various normal cell types, and includes both human-readable cell type names and cell ontology labels. This reference dataset does not include tumor cells.
- `CellAssign` requires a marker gene by cell type matrix that includes associated marker genes for all cell types in the reference.
- We built organ-specific references using the publicly available marker gene list from `PanglaoDB`.
- References were unique to each project based on the disease type and tissue type from which the sample was obtained, e.g., for all leukemia samples we used a blood-specific reference and for all brain cancers we used a brain-specific reference.
- Each of these references includes the normal cell types that are listed in `PanglaoDB` for that organ. Similar to the reference used with `SingleR`, these references do not contain any tumor cells.
- Since many cancers may have infiltrating immune cells, all immune cells were included in each organ-specific reference.
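
As a rough illustration of the marker gene by cell type matrix described above, the following sketch builds a binary matrix from an organ-specific marker list combined with immune cell markers. It is not the code used in `scpca-nf`; the function name `build_marker_matrix` and all gene and cell type names are hypothetical placeholders, not values taken from `PanglaoDB`.

```python
def build_marker_matrix(organ_markers, immune_markers):
    """Build a binary marker-gene-by-cell-type matrix (hypothetical helper).

    Both inputs map a cell type name to a set of marker gene names.
    Returns (genes, cell_types, matrix), where matrix[i][j] is 1 if
    genes[i] is a marker for cell_types[j] and 0 otherwise.
    """
    # Combine organ-specific cell types with immune cell types,
    # taking the union of markers if a cell type appears in both.
    combined = {}
    for source in (organ_markers, immune_markers):
        for cell_type, markers in source.items():
            combined.setdefault(cell_type, set()).update(markers)

    cell_types = sorted(combined)
    genes = sorted({gene for markers in combined.values() for gene in markers})
    matrix = [
        [1 if gene in combined[cell_type] else 0 for cell_type in cell_types]
        for gene in genes
    ]
    return genes, cell_types, matrix


# Toy input with placeholder names (not real PanglaoDB entries).
organ_markers = {
    "organ_cell_A": {"GENE1", "GENE2"},
    "organ_cell_B": {"GENE2", "GENE3"},
}
immune_markers = {"immune_cell_C": {"GENE4"}}

genes, cell_types, matrix = build_marker_matrix(organ_markers, immune_markers)
```

Each row corresponds to a gene and each column to a cell type, so a gene shared by two cell types (here `GENE2`) has a 1 in both columns.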


3. Cell type workflow
- As the last step in `scpca-nf`, cell type annotations will be added to all processed objects (Fig. 4A).
- Briefly explain how `SingleR` is used in the pipeline
- Briefly explain how `CellAssign` is used in the pipeline
- The cell type annotations from each method, along with any associated statistics, are added to the processed `SingleCellExperiment` object output by `scpca-nf`.
- These objects are then converted to `AnnData` objects, so cell type annotations are included in both data formats provided by `scpca-nf`.


4. Report
- An additional cell type report with information about reference sources, comparisons among cell type annotation methods, and diagnostic plots is also output by `scpca-nf`.
- Tables summarizing the number of cells assigned to each cell type for each method are shown alongside UMAPs coloring cells by the assigned cell type.
- As methods can provide different cell type annotations, a comparison between the two methods, `SingleR` and `CellAssign`, is included in the report.
- To compare cell type annotation methods, a Jaccard similarity index is calculated between pairs of labels from each method.
- This index ranges from 0 to 1, with values close to 1 indicating high agreement and a high proportion of overlapping cells, and values close to 0 indicating a low proportion of overlapping cells.
- The Jaccard similarity index is displayed in a heatmap, an example of which is shown in Fig. 4A.
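
The pairwise Jaccard comparison described above can be sketched as follows. This is a minimal illustration, not the implementation in `scpca-nf`: it assumes each method yields one label per cell in the same cell order, and the function name `jaccard_matrix` and the labels are hypothetical.

```python
from collections import defaultdict


def jaccard_matrix(labels_a, labels_b):
    """Jaccard index for every pair of labels from two annotation methods.

    labels_a and labels_b hold one label per cell, in the same cell order.
    Returns a dict mapping (label_a, label_b) to
    |cells with label_a AND label_b| / |cells with label_a OR label_b|.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("Both methods must label the same set of cells.")

    # Collect the set of cell indices assigned to each label by each method.
    cells_a, cells_b = defaultdict(set), defaultdict(set)
    for index, (a, b) in enumerate(zip(labels_a, labels_b)):
        cells_a[a].add(index)
        cells_b[b].add(index)

    return {
        (a, b): len(set_a & set_b) / len(set_a | set_b)
        for a, set_a in cells_a.items()
        for b, set_b in cells_b.items()
    }


# Toy labels for four cells (placeholder names).
singler_labels = ["type_1", "type_1", "type_2", "type_2"]
cellassign_labels = ["type_1", "type_2", "type_2", "type_2"]

similarity = jaccard_matrix(singler_labels, cellassign_labels)
```

In this toy input, `type_2` from the first method and `type_1` from the second share no cells, so their index is 0, while the two `type_2` labels share two of three cells. The resulting values are what a heatmap would display, with one axis per method.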

5. Report diagnostic plots
- The report also includes a diagnostic plot evaluating the confidence of cell type annotations determined by each method.
- `SingleR` assigns a score to each cell for all possible cell types in the reference. The final cell type annotation is associated with the label that has the highest score for that cell.
- To evaluate confidence in `SingleR` cell type annotations, `scpca-nf` calculates a delta median statistic for each cell as the difference between the top score and the median of the scores across all cell types in the reference.
- A higher delta median statistic for a cell indicates higher confidence in the final cell type annotation.
- An example plot that summarizes this statistic across all cell types identified with `SingleR` is shown in Supplemental Figure 4A.
- `CellAssign` assigns a probability or likelihood to each cell type label for each cell. The cell type label with the highest probability is assigned as the cell type for that cell.
- These values range from 0 to 1, with larger values indicating greater confidence in a given cell type label, so reliable labels should have most values close to 1.
- An example of a plot displaying the distribution of all probabilities for each cell type is shown in Supplemental Figure 4B.
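
The two per-cell confidence measures described above can be sketched as follows, assuming the per-cell scores and probabilities are available as a mapping from cell type label to value. The function names and the numbers are hypothetical and only illustrate the arithmetic; they are not output from `SingleR` or `CellAssign`.

```python
from statistics import median


def delta_median(scores):
    """Difference between the top score and the median score for one cell.

    scores maps each reference cell type label to that cell's score.
    """
    values = list(scores.values())
    return max(values) - median(values)


def top_probability_label(probabilities):
    """Return the label with the highest probability and that probability."""
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]


# Toy values for a single cell (placeholder labels and numbers).
cell_scores = {"type_1": 0.8, "type_2": 0.5, "type_3": 0.2}
cell_probabilities = {"type_1": 0.10, "type_2": 0.85, "type_3": 0.05}

delta = delta_median(cell_scores)
label, probability = top_probability_label(cell_probabilities)
```

Here the top score is 0.8 and the median is 0.5, so the delta median is about 0.3. The second function picks `type_2`, whose probability of 0.85 is close to 1 and would suggest a fairly confident label.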

6. Submitter cell types
- We compare the automated methods to submitter cell types
- Included in the cell type report is a table summarizing the submitter cell type annotations, a UMAP coloring each cell by the submitter annotation, and a plot comparing submitter annotations to both `SingleR` and `CellAssign`.
- The same Jaccard similarity index used when comparing `SingleR` to `CellAssign` is calculated between submitter annotations and `SingleR` annotations, and between submitter annotations and `CellAssign` annotations.
- A heatmap displaying the index is included in the report and an example is shown in Supplemental Figure 5.

