Methods for cell type annotation and AnnData conversion #61

Merged Mar 7, 2024 (10 commits); `content/04.methods.md`: 25 additions, 4 deletions.
We created a decoy-aware reference created from spliced cDNA sequences with the …
The trimmed reads were then provided as input to `salmon quant` for selective alignment.
In addition to using the default parameters for `salmon quant`, we applied the `--seqBias` and `--gcBias` flags to correct for sequence-specific biases due to random hexamer priming and fragment-level GC biases, respectively.

### Cell type annotation

If cell type annotations were provided by the dataset submitter, these submitter-provided annotations were incorporated into all `SingleCellExperiment` objects (unfiltered, filtered, and processed).
Cell type labels assigned by `SingleR` [@doi:10.1038/s41590-018-0276-y] and by `CellAssign` [@doi:10.1038/s41592-019-0529-1] were added to the processed `SingleCellExperiment` objects.

The references used for cell type assignment were built with a separate workflow within `scpca-nf`, `build-celltype-index.nf`.
For `SingleR`, we used the `BlueprintEncodeData` reference from the `celldex` package [@doi:10.3324/haematol.2013.094243; @doi:10.1038/nature11247; @doi:10.18129/B9.bioc.celldex] to train the `SingleR` classification model with `SingleR::trainSingleR()`.
The trained model and the processed `SingleCellExperiment` object were then passed to `SingleR::classifySingleR()`.
The resulting `SingleR` cell type annotations were added to the processed `SingleCellExperiment` object, along with the matrix of scores for each cell against every cell type in the reference.
To evaluate confidence in `SingleR` cell type assignments, we also calculated a delta median statistic for each cell by subtracting the median of that cell's scores across all cell types from its maximum score [@url:https://bioconductor.org/books/release/SingleRBook/annotation-diagnostics.html#based-on-the-deltas-across-cells].
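
`SingleR` and `celldex` are R/Bioconductor packages, and the workflow called `SingleR::trainSingleR()` and `SingleR::classifySingleR()` directly. The Python sketch below only illustrates where the per-cell scores and the delta median come from, under simplifying assumptions. It follows the scoring approach described in the `SingleR` documentation: each cell is correlated (Spearman) with every reference sample over a shared set of marker genes, and each label's score is the 0.8 quantile of those correlations. Marker selection and `SingleR`'s fine-tuning step are omitted. The toy reference, expression values, and function names are all hypothetical.

```python
from statistics import median
from typing import Dict, List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            result[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation as the Pearson correlation of ranks (inputs must not be constant)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile by linear interpolation between order statistics (R's default, type 7)."""
    s = sorted(values)
    h = (len(s) - 1) * q
    lo = int(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


def score_cell(cell: Sequence[float], reference: Dict[str, List[List[float]]],
               q: float = 0.8) -> Dict[str, float]:
    """Score one cell against each label: the q-quantile of its correlations to that label's samples."""
    return {
        label: quantile([spearman(cell, sample) for sample in samples], q)
        for label, samples in reference.items()
    }


def delta_median(scores: Dict[str, float]) -> float:
    """Maximum label score minus the median label score for one cell."""
    values = list(scores.values())
    return max(values) - median(values)


# Hypothetical toy reference: three labels, two samples each, four marker genes.
REFERENCE = {
    "T cell": [[9, 8, 1, 0], [10, 7, 2, 1]],
    "B cell": [[0, 1, 8, 9], [1, 2, 9, 10]],
    "NK cell": [[5, 1, 9, 0], [6, 2, 10, 1]],
}
scores = score_cell([8, 9, 1, 2], REFERENCE)  # one row of the cell x label score matrix
best_label = max(scores, key=scores.get)      # "T cell" for this toy cell
```

Repeating `score_cell` for every cell yields the cell-by-cell-type score matrix. A large delta median means one label clearly outscores the rest. Values near zero flag cells whose assignment is ambiguous.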

For `CellAssign`, marker gene references were created using the marker gene lists available on `PanglaoDB` [@doi:10.1093/database/baz046].
Organ-specific references were built using all cell types listed in `PanglaoDB` for a given organ, to accommodate the variety of disease and tissue types across ScPCA projects.
If the disease types in a given project could include cells from multiple organ groups, the cell types from those organs were combined into one reference. For example, sarcomas can arise in bone or soft tissue, so for sarcomas we created a reference containing bone, connective tissue, smooth muscle, and immune cells.
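
The organ-based selection can be sketched in Python as building a binary gene-by-cell-type marker matrix, which is the form of marker input that `CellAssign` expects. This is a minimal sketch. The three-column rows, organ names, and marker genes below are illustrative stand-ins for the `PanglaoDB` marker list, not the reference actually used. `build_marker_matrix` is a hypothetical helper, not part of `scpca-nf`.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical (organ, cell type, marker gene) rows standing in for PanglaoDB entries.
ROWS: List[Tuple[str, str, str]] = [
    ("Bone", "Osteoblasts", "RUNX2"),
    ("Bone", "Osteoblasts", "SP7"),
    ("Bone", "Osteoclasts", "ACP5"),
    ("Connective tissue", "Fibroblasts", "COL1A1"),
    ("Smooth muscle", "Smooth muscle cells", "ACTA2"),
    ("Immune system", "T cells", "CD3E"),
    ("Immune system", "B cells", "MS4A1"),
    ("Brain", "Astrocytes", "GFAP"),
]


def build_marker_matrix(
    rows: Iterable[Tuple[str, str, str]], organs: Set[str]
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (genes, cell_types, matrix) where matrix[i][j] is 1 if genes[i] marks cell_types[j]."""
    markers: Dict[str, Set[str]] = {}
    for organ, cell_type, gene in rows:
        if organ in organs:  # keep every cell type listed for the selected organs
            markers.setdefault(cell_type, set()).add(gene)
    cell_types = sorted(markers)
    genes = sorted(set().union(*markers.values()))
    matrix = [[int(gene in markers[ct]) for ct in cell_types] for gene in genes]
    return genes, cell_types, matrix


# Combine several organs into one reference, as described for sarcomas.
SARCOMA_ORGANS = {"Bone", "Connective tissue", "Smooth muscle", "Immune system"}
genes, cell_types, matrix = build_marker_matrix(ROWS, SARCOMA_ORGANS)
print(cell_types)
# ['B cells', 'Fibroblasts', 'Osteoblasts', 'Osteoclasts', 'Smooth muscle cells', 'T cells']
```

Selecting by organ, rather than by individual cell type, keeps every cell type that `PanglaoDB` lists for those organs. Brain cell types such as astrocytes are therefore dropped from the sarcoma reference.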

Using the processed `SingleCellExperiment` object and the organ-specific reference as input, `scvi.external.CellAssign` was used to train a model and predict the cell type of each cell.
For each cell, `CellAssign` calculates a probability of assignment to each cell type in the reference.
The probability matrix and a prediction based on the most probable cell type were added as cell type annotations to the processed `SingleCellExperiment` object.
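
Turning the `CellAssign` probability matrix into one prediction per cell is a row-wise arg max. It can be sketched in plain Python as follows. For a pandas cells-by-cell-types table, `DataFrame.idxmax(axis=1)` does the same. The probability values and cell type names are made up, and `predict_cell_types` is a hypothetical helper.

```python
from typing import List, Tuple


def predict_cell_types(probabilities: List[List[float]],
                       cell_types: List[str]) -> List[Tuple[str, float]]:
    """For each cell (row), return the most probable cell type and its probability.

    Ties are broken by taking the first cell type in `cell_types` order.
    """
    predictions = []
    for row in probabilities:
        best = max(range(len(row)), key=row.__getitem__)  # column with the highest probability
        predictions.append((cell_types[best], row[best]))
    return predictions


# Hypothetical cells x cell types probability matrix (each row sums to 1).
CELL_TYPES = ["B cells", "T cells", "Fibroblasts"]
PROBS = [
    [0.05, 0.90, 0.05],
    [0.70, 0.20, 0.10],
]
print(predict_cell_types(PROBS, CELL_TYPES))  # [('T cells', 0.9), ('B cells', 0.7)]
```

Keeping the winning probability alongside the label makes it easy to spot low-confidence predictions later.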

### Generating merged data

If any libraries included in the ScPCA project contain additional ADT data, the …
By contrast, if any libraries included in the ScPCA project are multiplexed and contain HTO data, the HTO data is not merged and will not be present in the merged `SingleCellExperiment` object.

### Converting SingleCellExperiment objects to AnnData objects

`zellkonverter::writeH5AD()` was used to convert `SingleCellExperiment` objects to `AnnData` format and export the objects as `.hdf5` files.
For any `SingleCellExperiment` objects containing an `altExp` (e.g., ADT data), the RNA and ADT data were exported and saved separately as RNA (`_rna.hdf5`) and ADT (`_adt.hdf5`) files.
Multiplexed libraries were not converted to `AnnData` objects because the sample-of-origin assignments for their cells may be ambiguous.

All merged `SingleCellExperiment` objects were converted to `AnnData` objects and saved as `.hdf5` files.
If a merged `SingleCellExperiment` object contained any ADT data, the RNA and ADT data were exported and saved separately as RNA (`_rna.hdf5`) and ADT (`_adt.hdf5`) files.
In contrast, if a merged `SingleCellExperiment` object contained HTO data because it included multiplexed libraries, the HTO data were removed before conversion and are not included in the exported `AnnData` object.
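
The export rules for individual and merged objects can be summarised as a small decision function. This is a sketch under stated assumptions:

- The `adt`/`hto` names stand in for the `altExp` contents.
- `prefix` is a hypothetical file name stem.
- The RNA file is assumed to always carry the `_rna.hdf5` suffix. The text states this explicitly only for objects with ADT data.

It only plans file names and does not call `zellkonverter`, which is an R package.

```python
from typing import List, Sequence


def planned_h5ad_outputs(prefix: str, alt_exps: Sequence[str],
                         multiplexed: bool, merged: bool) -> List[str]:
    """List the AnnData files expected for one SingleCellExperiment object.

    `prefix` and the "adt"/"hto" altExp names are hypothetical placeholders.
    """
    if multiplexed and not merged:
        # Individual multiplexed libraries are not converted at all.
        return []
    outputs = [f"{prefix}_rna.hdf5"]  # assumption: RNA file always uses the _rna suffix
    if "adt" in alt_exps:
        # ADT data are exported to their own file alongside the RNA file.
        outputs.append(f"{prefix}_adt.hdf5")
    # HTO data (present only in merged objects that include multiplexed libraries)
    # are dropped before export, so no HTO file is ever produced.
    return outputs


print(planned_h5ad_outputs("project_merged", ["adt", "hto"], multiplexed=True, merged=True))
# ['project_merged_rna.hdf5', 'project_merged_adt.hdf5']
```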

### Code and data availability