Methods for cell type annotation and AnnData conversion #61
Changes from 6 commits
dabfc49
21dac65
7ecb13f
f871952
7b73643
e766aa4
ab59eca
6d812df
91099b7
1876413
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -103,8 +103,26 @@ In addition to using the default parameters for `salmon quant`, we applied the ` | |
|
||
### Cell type annotation | ||
|
||
- Implementation of SingleR and CellAssign | ||
- Description of metrics used (e.g., what is the delta median and where does the probability come from) | ||
If cell types were obtained from the submitter of the dataset, the submitter-provided annotations were incorporated into all `SingleCellExperiment` objects (unfiltered, filtered, and processed). | ||
Cell type labels determined by both `SingleR`[@doi:10.1038/s41590-018-0276-y] and `CellAssign`[@doi:10.1038/s41592-019-0529-1] were added to processed `SingleCellExperiment` objects. | ||
|
||
To build the references used for assigning cell types, we ran a separate workflow within `scpca-nf`, `build-celltype-index.nf`. | ||
For `SingleR`, we used the `BlueprintEncodeData` reference from the `celldex` package [@doi:10.3324/haematol.2013.094243;@doi:10.1038/nature11247;@doi:10.18129/B9.bioc.celldex] to train the `SingleR` classification model with `SingleR::trainSingleR()`. | ||
The model and the processed `SingleCellExperiment` object were input to `SingleR::classifySingleR()`. | ||
The `SingleR` output of cell type annotations and a score matrix for each cell and all possible cell types were added to the processed `SingleCellExperiment` object output. | ||
To evaluate confidence in `SingleR` cell type assignments, we also calculated a delta median statistic for each cell by subtracting the median cell type score from the maximum score for that cell [@url:https://bioconductor.org/books/release/SingleRBook/annotation-diagnostics.html#based-on-the-deltas-across-cells]. | ||
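As an illustration of the delta median calculation described above, here is a minimal sketch in Python with NumPy. It is not the `scpca-nf` implementation: the function name `delta_median` and the toy score values are made up for this example, and it assumes the scores are arranged as a cells-by-cell-types matrix.

```python
import numpy as np


def delta_median(scores):
    """Per-cell delta median: maximum score minus median score across cell types.

    `scores` is assumed to be a (cells x cell types) matrix, analogous to the
    score matrix that SingleR reports for each cell and candidate cell type.
    """
    scores = np.asarray(scores, dtype=float)
    return scores.max(axis=1) - np.median(scores, axis=1)


# Toy scores for 2 cells and 3 hypothetical cell types (made-up values).
toy_scores = np.array([
    [0.90, 0.20, 0.10],  # one cell type clearly stands out
    [0.50, 0.45, 0.40],  # all cell types score similarly
])
deltas = delta_median(toy_scores)
```

In this toy example the first cell has a much larger delta median than the second, because only the first cell has one cell type that scores well above the rest.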
|
||
The delta median statistic is helpful in evaluating how confident `SingleR` is in assigning each cell to a specific cell type, where low delta median values indicate ambiguous assignments and high delta median values indicate confident assignments. | ||
To identify the most appropriate reference to use with `SingleR`, we annotated a handful of samples across multiple disease types with all human-specific references available in the `celldex` package. | ||
`BlueprintEncodeData` had the most consistently high delta median statistic distribution across samples from multiple disease types and was chosen as the reference to use for all ScPCA samples. | ||
I recommend taking this out for now and filing an issue, blocked by AlexsLemonade/scpca-paper-figures#41, to move talking about picking a cell type annotation method and references into results. |
||
|
||
For `CellAssign`, marker gene references were created using the marker gene lists available on `PanglaoDB` [@doi:10.1093/database/baz046]. | ||
Organ-specific references were built using all cell types listed in `PanglaoDB` for a specified organ, to accommodate ScPCA projects that encompass a variety of disease and tissue types. | ||
If the disease types in a given project encompassed cells that may be present in multiple organ groups, multiple organs were combined; for example, for sarcomas that appear in bone or soft tissue, we created a reference containing bone, connective tissue, smooth muscle, and immune cells. | ||
I think this is a good level of detail for the text, but we might want a supplemental table of the organ sets we used.
I think that's a good idea so I'm going to file an issue regarding this in the figures repo. |
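The organ-specific marker gene references described in the manuscript text above could be represented as a binary genes-by-cell-types matrix, which is the kind of input `CellAssign` works with. The sketch below is an assumption-laden illustration and not the `build-celltype-index.nf` code: the function name `build_marker_reference`, the column names, the organ labels, and the toy rows are all hypothetical and are not taken from the actual `PanglaoDB` file.

```python
def build_marker_reference(marker_table, organs):
    """Build a binary marker matrix for all cell types in the chosen organs.

    `marker_table` is a list of dicts with hypothetical keys "organ",
    "cell_type", and "gene". Returns (genes, cell_types, matrix), where
    matrix[i][j] is 1 if genes[i] is a marker for cell_types[j], else 0.
    """
    # Keep only marker rows from the requested organ(s); several organs can
    # be combined by passing more than one organ name.
    selected = [row for row in marker_table if row["organ"] in organs]
    cell_types = sorted({row["cell_type"] for row in selected})
    genes = sorted({row["gene"] for row in selected})
    pairs = {(row["gene"], row["cell_type"]) for row in selected}
    matrix = [[1 if (g, c) in pairs else 0 for c in cell_types] for g in genes]
    return genes, cell_types, matrix


# Toy marker table (illustrative rows only).
toy_markers = [
    {"organ": "Bone", "cell_type": "Osteoblasts", "gene": "RUNX2"},
    {"organ": "Immune system", "cell_type": "T cells", "gene": "CD3E"},
    {"organ": "Immune system", "cell_type": "B cells", "gene": "CD79A"},
    {"organ": "Brain", "cell_type": "Neurons", "gene": "RBFOX3"},
]

# Combine two organs into one reference, as for a project spanning tissues.
genes, cell_types, matrix = build_marker_reference(
    toy_markers, organs={"Bone", "Immune system"}
)
```

Cell types from organs that were not requested (here, the toy "Brain" rows) are left out of the combined reference.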
||
|
||
Given the processed `SingleCellExperiment` object and organ-specific reference, `scvi.external.CellAssign` was used to train the model and predict the assigned cell type. | ||
For each cell type in the reference, `CellAssign` calculates the probability that each cell is assigned to that cell type. | ||
The probability matrix and a prediction based on the most likely cell type were added as cell type annotations to the processed `SingleCellExperiment` object output by `scpca-nf`. | ||
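To make the "prediction based on the most likely cell type" concrete, the following sketch takes a cells-by-cell-types probability matrix and picks the highest-probability cell type for each cell. It is a simplified stand-in and not the workflow's code: the function name `predict_cell_types`, the cell type names, and the probabilities are invented for illustration.

```python
import numpy as np


def predict_cell_types(probabilities, cell_types):
    """Return the most probable cell type for each cell.

    `probabilities` is assumed to be a (cells x cell types) matrix whose
    columns are in the same order as `cell_types`.
    """
    probabilities = np.asarray(probabilities, dtype=float)
    best_column = probabilities.argmax(axis=1)  # index of the top cell type
    return [cell_types[i] for i in best_column]


# Toy probabilities for 2 cells and 3 hypothetical cell types.
toy_cell_types = ["T cells", "B cells", "Fibroblasts"]
toy_probabilities = [
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
]
predictions = predict_cell_types(toy_probabilities, toy_cell_types)
```

Keeping the full probability matrix alongside the prediction, as the text describes, lets downstream users see how strongly each assignment was supported rather than only the top label.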
allyhawkins marked this conversation as resolved.
|
||
|
||
### Generating merged data | ||
|
||
|
@@ -124,6 +142,13 @@ If any libraries included in the ScPCA project contain additional ADT data, the | |
By contrast, if any libraries included in the ScPCA project are multiplexed and contain HTO data, the HTO data is not merged and will not be present in the merged `SingleCellExperiment` object. | ||
|
||
### Converting SingleCellExperiment objects to AnnData objects | ||
- use of zellkonverter | ||
|
||
`zellkonverter::writeH5AD()` was used to convert `SingleCellExperiment` objects to `AnnData` format and export the objects as `.hdf5` files. | ||
For any `SingleCellExperiment` objects containing an `altExp` (e.g., ADT data), the RNA and ADT data were exported and saved separately as RNA (`_rna.hdf5`) and ADT (`_adt.hdf5`) files. | ||
Multiplexed libraries were not converted to `AnnData` objects, due to the potential for ambiguity in sample origin assignments. | ||
|
||
All merged `SingleCellExperiment` objects were converted to `AnnData` objects and saved as `.hdf5` files. | ||
If a merged `SingleCellExperiment` object contained any ADT data, the RNA and ADT data were exported and saved separately as RNA (`_rna.hdf5`) and ADT (`_adt.hdf5`). | ||
In contrast, if a merged `SingleCellExperiment` object contained HTO data due to the presence of any multiplexed libraries in the merged object, the HTO data was removed from the `SingleCellExperiment` object and not included in the exported `AnnData` object. | ||
Looking at this, I kind of feel like we probably should just not merge the multiplexed data... The logic written out like this seems very strange.
Do you mean not include multiplexed libraries in any merged objects? Because we have both regular libraries and multiplexed libraries in the same project so we would need to adjust the workflow to remove any multiplexed libraries before merging.
It is really only one project, right? I kind of feel like we could just skip the whole thing in that case. (This is a discussion largely for somewhere else)
Noting that I filed https://github.com/AlexsLemonade/ScPCA-admin/issues/832 to walk through some options around this. I think we leave it for now. |
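The export rules in the manuscript text above can be summarized as a small decision function. This is only a sketch of the logic as written, not the workflow's code: the function name `anndata_outputs` and the `prefix` argument are hypothetical, and it assumes the RNA file always uses the `_rna.hdf5` suffix, which the text states only for objects that also contain ADT data.

```python
def anndata_outputs(prefix, has_adt=False, is_multiplexed=False, merged=False):
    """List the AnnData files expected for one SingleCellExperiment object.

    Assumption: RNA data is always written to `<prefix>_rna.hdf5`.
    """
    if is_multiplexed and not merged:
        # Individual multiplexed libraries are not converted to AnnData.
        return []
    outputs = [f"{prefix}_rna.hdf5"]
    if has_adt:
        # ADT data (stored as an altExp) is exported to its own file.
        outputs.append(f"{prefix}_adt.hdf5")
    # For merged objects containing multiplexed libraries, HTO data is
    # removed before export, so no HTO file is ever listed.
    return outputs


# Hypothetical prefixes, for illustration only.
single_with_adt = anndata_outputs("library01", has_adt=True)
single_multiplexed = anndata_outputs("library02", is_multiplexed=True)
merged_multiplexed = anndata_outputs("project01_merged", is_multiplexed=True, merged=True)
```

Writing the rules this way shows the asymmetry raised in the review thread: multiplexed data is skipped entirely for individual libraries but is exported, minus HTO data, when it is part of a merged object.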
||
|
||
### Code and data availability |
I too am not sure how much detail we want here! I think this is okay as far as it goes: I'm not sure if this maybe should actually be a result though? Probably doesn't need to be, but in some ways I think evaluating the applicability of cell typing methods to compendia is something worth talking a bit about.
You mean including a figure around this? I already had this thought, but wanted to wait until we wrote up the text to decide what exactly to include. See AlexsLemonade/scpca-paper-figures#41.
So maybe we do want to include a supplemental figure that looks at the delta median statistic across a few samples and a few references.
I'm going to tag in @jaclyn-taroni to take a look at this and see what she thinks about including a figure showing reference comparisons and about the level of detail presented here. Just noting that this figure might look a bit messy, but we could make one and then make a decision on if it will help prove a point or just bring more questions.
Yes, I think it makes sense to try a supplemental figure showing reference comparisons, which is discussed in the cell type annotation section of the results. I propose that we split the "Annotation cell types" section into two subsections: evaluating the methods themselves and the workflow part.
@jaclyn-taroni do you mean creating two sections in the results or here in the methods?
The one in results that uses this header
Without the typo, so "Annotating cell types"