Methods for cell type annotation and AnnData conversion #61
…methods' into allyhawkins/cell-type-methods
This looks good overall, but I think we need a bit more detail about the references, if that does not appear elsewhere. I also have a few smaller comments, and a clarification about CellAssign scores.
content/04.methods.md
Outdated
Organ-specific references were built using all cell types in a specified organ listed in `PanglaoDB`.
References for each ScPCA project were assigned based on the tissue from which the sample was obtained.
I feel like we might want a bit more detail here about how we made some of our decisions, and about the fact that we were often combining organs?
All merged `SingleCellExperiment` objects were converted to `AnnData` objects and saved as `.hdf5` files.
If a merged `SingleCellExperiment` object contains any ADT data, the RNA and ADT data was exported and saved separately as RNA (`_rna.hdf5`) and ADT (`_adt.hdf5`).
In contrast, if a merged `SingleCellExperiment` object contained HTO data due to the presence of any multiplexed libraries in the merged object, the HTO data was removed from the `SingleCellExperiment` object and not included in the exported `AnnData` object.
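For readers following this thread, the export rules quoted above can be written out as a small sketch. This is only an illustration under stated assumptions: `plan_anndata_export` and the modality labels `"rna"`, `"adt"`, and `"hto"` are hypothetical names, not part of the actual workflow, and the quoted text does not say which suffix an RNA-only object gets, so using `_rna.hdf5` in that case is an assumption.

```python
def plan_anndata_export(prefix, modalities):
    """Return (output files, dropped modalities) for one merged object.

    Hypothetical helper that only mirrors the rules in the quoted text:
    RNA is always exported, ADT goes to its own file, HTO is dropped.
    """
    modalities = set(modalities)

    # Assumption: RNA-only objects also use the `_rna.hdf5` suffix.
    outputs = {"rna": f"{prefix}_rna.hdf5"}

    # ADT data, when present, is saved separately from the RNA data.
    if "adt" in modalities:
        outputs["adt"] = f"{prefix}_adt.hdf5"

    # HTO data from multiplexed libraries is removed before export.
    dropped = sorted(modalities & {"hto"})
    return outputs, dropped


# Example: a merged object holding RNA, ADT, and HTO data.
outputs, dropped = plan_anndata_export("merged_project", ["rna", "adt", "hto"])
```

Written this way, the asymmetry is easy to see: ADT gets its own file while HTO is silently discarded, which may be why the rule reads oddly in prose.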
Looking at this, I kind of feel like we probably should just not merge the multiplexed data... The logic written out like this seems very strange.
Do you mean not include multiplexed libraries in any merged objects? Because we have both regular libraries and multiplexed libraries in the same project so we would need to adjust the workflow to remove any multiplexed libraries before merging.
It is really only one project, right? I kind of feel like we could just skip the whole thing in that case. (This is a discussion largely for somewhere else)
Noting that I filed https://github.com/AlexsLemonade/ScPCA-admin/issues/832 to walk through some options around this. I think we leave it for now.
Co-authored-by: Joshua Shapiro <[email protected]>
Click the link below to download the manuscript build as a ZIP file. |
@jashapiro I added some wording regarding why we picked the I'm also not sure how much detail you want on the celldex reference. I added a little bit and related it back to the delta median statistic.
These updates look good, but I think I probably want someone else to weigh in on my first comment here. In particular, how much of the cell typing journey do we want to present in this paper? Do we want to comment on how difficult it is at the compendium level, and particularly for cancer cells? I think this is probably something worth highlighting at the very least in the discussion, but we also might want a bit of our little benchmarks in this paper.
That said, it opens an avenue for critique/suggestions of more experiments that maybe we don't want to highlight.
content/04.methods.md
Outdated
The delta median statistic is helpful in evaluating how confident `SingleR` is in assigning each cell to a specific cell type, where low delta median values indicate ambiguous assignments and high delta median values indicate confident assignments.
To identify the most appropriate reference to use with `SingleR`, we annotated a handful of samples across multiple disease types with all human-specific references available in the `celldex` package.
`BlueprintEncodeData` had the most consistently high delta median statistic distribution across samples from multiple disease types and was chosen as the reference to use for all ScPCA samples.
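As a rough illustration of the statistic discussed here, the sketch below computes a delta median per cell from a matrix of per-label scores. It assumes one common definition, the score of the top-scoring label minus the median score across all labels, which the quoted text does not spell out. The function name `delta_median` and the example scores are made up for illustration and are not `SingleR` output.

```python
import numpy as np


def delta_median(scores):
    """Per-cell delta median from a (cells x labels) score matrix.

    Assumption: the assigned label is the top-scoring one, and the
    median is taken across all labels for that cell.
    """
    scores = np.asarray(scores, dtype=float)
    return scores.max(axis=1) - np.median(scores, axis=1)


# Made-up scores for two cells and three candidate labels.
example_scores = [
    [0.9, 0.1, 0.2],  # one label clearly stands out: large delta median
    [0.5, 0.5, 0.4],  # labels are nearly tied: delta median near zero
]
deltas = delta_median(example_scores)
```

Comparing references then comes down to comparing the distribution of these per-cell values across samples, where a reference with consistently higher values would be preferred.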
I too am not sure how much detail we want here! I think this is okay as far as it goes: I'm not sure if this maybe should actually be a result though? Probably doesn't need to be, but in some ways I think evaluating the applicability of cell typing methods to compendia is something worth talking a bit about.
You mean including a figure around this? I already had this thought, but wanted to wait until we wrote up the text to decide what exactly to include. See AlexsLemonade/scpca-paper-figures#41.
So maybe we do want to include a supplemental figure that looks at the delta median statistic across a few samples and a few references.
I'm going to tag in @jaclyn-taroni to take a look at this and see what she thinks about including a figure showing reference comparisons and about the level of detail presented here. Just noting that this figure might look a bit messy, but we could make one and then make a decision on if it will help prove a point or just bring more questions.
Yes, I think it makes sense to try a supplemental figure showing reference comparisons, which is discussed in the cell type annotation section of the results. I propose that we split the "Annotation cell types" section into two subsections: evaluating the methods themselves and the workflow part.
@jaclyn-taroni do you mean creating two sections in the results or here in the methods?
The one in results that uses this header
Without the typo, so "Annotating cell types"
For `CellAssign`, marker gene references were created using the marker gene lists available on `PanglaoDB` [@doi:10.1093/database/baz046].
Organ-specific references were built using all cell types in a specified organ listed in `PanglaoDB` to accommodate all ScPCA projects encompassing a variety of disease and tissue type.
If a set of disease types in a given project encompassed cells that may be present in multiple organ groups, multiple organs were combined - e.g., for sarcomas that appear in bone or soft tissue, we created a reference containing bone, connective tissue, smooth muscle, and immune cells.
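To make the organ-combining step concrete, here is a minimal sketch of how marker lists from several organs could be merged into one binary gene-by-cell-type matrix. Everything in it is hypothetical: `build_marker_matrix`, the nested-dictionary layout, the organ keys, and the placeholder gene and cell type names are illustrative only and do not reflect the real `PanglaoDB` contents or the workflow's code.

```python
def build_marker_matrix(marker_lists, organs):
    """Combine marker genes for all cell types in the requested organs.

    marker_lists: {organ: {cell_type: [marker genes]}} (assumed layout).
    Returns {gene: {cell_type: 0 or 1}} covering the combined organs.
    """
    cell_types = {}
    for organ in organs:
        for cell_type, genes in marker_lists[organ].items():
            # A cell type listed under several organs keeps the union of its genes.
            cell_types.setdefault(cell_type, set()).update(genes)

    all_genes = sorted(set().union(*cell_types.values()))
    return {
        gene: {ct: int(gene in genes) for ct, genes in cell_types.items()}
        for gene in all_genes
    }


# Placeholder data only; not real marker genes or cell types.
toy_lists = {
    "Bone": {"cell_type_1": ["GENE_A", "GENE_B"]},
    "Immune system": {"cell_type_2": ["GENE_B", "GENE_C"]},
    "Brain": {"cell_type_3": ["GENE_D"]},
}
combined = build_marker_matrix(toy_lists, ["Bone", "Immune system"])
```

Only the organs passed in contribute rows and columns, so a combined sarcoma reference would simply list bone, connective tissue, smooth muscle, and immune organs together.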
I think this is a good level of detail for the text, but we might want a supplemental table of the organ sets we used.
I think that's a good idea so I'm going to file an issue regarding this in the figures repo.
I have already returned my comment about doing AlexsLemonade/scpca-paper-figures#41 and moving text around choices to the results. Let's remove that from this PR for now and file a ticket.
I agree with leaving the merged section as is for now.
content/04.methods.md
Outdated
The delta median statistic is helpful in evaluating how confident `SingleR` is in assigning each cell to a specific cell type, where low delta median values indicate ambiguous assignments and high delta median values indicate confident assignments.
To identify the most appropriate reference to use with `SingleR`, we annotated a handful of samples across multiple disease types with all human-specific references available in the `celldex` package.
`BlueprintEncodeData` had the most consistently high delta median statistic distribution across samples from multiple disease types and was chosen as the reference to use for all ScPCA samples.
I recommend taking this out for now and filing an issue, blocked by AlexsLemonade/scpca-paper-figures#41, to move talking about picking a cell type annotation method and references into results.
I removed the delta median discussion and filed #70. @jashapiro did you want to take another look at this or are we good to go?
Closes #42
Closes #44
Stacked on #58
This PR adds in the methods section for cell type annotation and then conversion of all objects to AnnData objects.
For the cell type annotation section, what do we think about this level of detail? I included information about the delta median statistic since that's something we calculate. Are there other details regarding either building the references or running cell typing that I'm missing?