Methods for cell type annotation and AnnData conversion #61

allyhawkins · 2024-03-01T19:41:30Z

Closes #42
Closes #44
Stacked on #58

This PR adds in the methods section for cell type annotation and then conversion of all objects to AnnData objects.
For the cell type annotation section, what do we think about this level of detail? I included information about the delta median statistic since that's something we calculate. Are there other details regarding either building the references or running cell typing that I'm missing?

jashapiro

This looks good overall, but I think we need a bit more detail about the references, if that does not appear elsewhere. I also have a few smaller comments, and a clarification about CellAssign scores.

content/04.methods.md

jashapiro · 2024-03-04T19:18:54Z

content/04.methods.md

+Organ-specific references were built using all cell types in a specified organ listed in `PanglaoDB`.
+References for each ScPCA project were assigned based on the tissue from which the sample was obtained. 


I feel like we might want a bit more detail here about how we made some of our decisions, and the fact that we were often combining organs?

content/04.methods.md

jashapiro · 2024-03-04T19:44:29Z

content/04.methods.md

+
+All merged `SingleCellExperiment` objects were converted to `AnnData` objects and saved as `.hdf5` files.
+If a merged `SingleCellExperiment` object contains any ADT data, the RNA and ADT data was exported and saved separately as RNA (`_rna.hdf5`) and ADT (`_adt.hdf5`). 
+In contrast, if a merged `SingleCellExperiment` object contained HTO data due to the presence of any multiplexed libraries in the merged object, the HTO data was removed from the `SingleCellExperiment` object and not included in the exported `AnnData` object. 


Looking at this, I kind of feel like we probably should just not merge the multiplexed data... The logic written out like this seems very strange.

Do you mean not including multiplexed libraries in any merged objects? We have both regular libraries and multiplexed libraries in the same project, so we would need to adjust the workflow to remove any multiplexed libraries before merging.

It is really only one project, right? I kind of feel like we could just skip the whole thing in that case. (This is a discussion largely for somewhere else)

Noting that I filed https://github.com/AlexsLemonade/ScPCA-admin/issues/832 to walk through some options around this. I think we leave it for now.
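For readers following the export logic debated in this thread, the rules in the diff hunk above can be sketched as a small planning function. This is only an illustration under stated assumptions: `plan_anndata_exports`, the `prefix` argument, and the modality names `"adt"` and `"hto"` are hypothetical and not taken from the workflow, and the file name used when no ADT data is present (`<prefix>.hdf5`) is a literal reading of the methods text rather than a confirmed convention. The function does no conversion itself; it only decides which files would be written and which modalities would be dropped.

```python
def plan_anndata_exports(prefix, alt_experiments):
    """Decide which AnnData files to write for one merged object.

    prefix: hypothetical output file prefix.
    alt_experiments: names of additional modalities in the merged object.
    Returns (exports, dropped): a dict of modality -> file name, and a
    sorted list of modalities that are removed before export.
    """
    alt = {name.lower() for name in alt_experiments}

    # HTO data is removed and never exported.
    dropped = sorted(alt & {"hto"})

    if "adt" in alt:
        # RNA and ADT are saved as two separate files.
        exports = {
            "rna": f"{prefix}_rna.hdf5",
            "adt": f"{prefix}_adt.hdf5",
        }
    else:
        # Assumption: a single file without a modality suffix.
        exports = {"rna": f"{prefix}.hdf5"}

    return exports, dropped


# Example calls with hypothetical prefixes.
with_adt = plan_anndata_exports("merged_a", ["adt"])
with_hto = plan_anndata_exports("merged_b", ["hto"])
```

Writing the rules out this way makes the asymmetry raised in the review visible: ADT adds an output file, while HTO silently disappears from the export.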

content/04.methods.md

Co-authored-by: Joshua Shapiro <[email protected]>

github-actions · 2024-03-04T20:04:55Z

Click the link below to download the manuscript build as a ZIP file.
This build is associated with commit f871952.

Manuscript build

github-actions · 2024-03-04T20:36:08Z

Click the link below to download the manuscript build as a ZIP file.
This build is associated with commit e766aa4.

Manuscript build

allyhawkins · 2024-03-04T20:38:02Z

@jashapiro I added some wording regarding why we picked the `BlueprintEncodeData` reference and some more information on building the organ-specific references. It was a little hard without explaining the organs used in every reference. We could be really specific and include a table with all references used and all organs that were used to create each reference?

I'm also not sure how much detail you want on the celldex reference. I added a little bit and related it back to the delta median statistic.

jashapiro

These updates look good, but I think I probably want someone else to weigh in on my first comment here. In particular, how much of the cell typing journey do we want to present in this paper? Do we want to comment about how difficult it is on a compendia-level basis, and particularly for cancer cells? I think this is probably something worth highlighting at the very least in the discussion, but we also might want a bit of our little benchmarks in this paper.

That said, it opens an avenue for critique/suggestions of more experiments that maybe we don't want to highlight.

jashapiro · 2024-03-04T21:04:18Z

content/04.methods.md

+The delta median statistic is helpful in evaluating how confident `SingleR` is in assigning each cell to a specific cell type, where low delta median values indicate ambiguous assignments and high delta median values indicate confident assignments. 
+To identify the most appropriate reference to use with `SingleR`, we annotated a handful of samples across multiple disease types with all human-specific references available in the `celldex` package. 
+`BlueprintEncodeData` had the most consistently high delta median statistic distribution across samples from multiple disease types and was chosen as the reference to use for all ScPCA samples.


I too am not sure how much detail we want here! I think this is okay as far as it goes: I'm not sure if this maybe should actually be a result though? Probably doesn't need to be, but in some ways I think evaluating the applicability of cell typing methods to compendia is something worth talking a bit about.

You mean including a figure around this? I already had this thought, but wanted to wait until we wrote up the text to decide what exactly to include. See AlexsLemonade/scpca-paper-figures#41.

So maybe we do want to include a supplemental figure that looks at the delta median statistic across a few samples and a few references.

I'm going to tag in @jaclyn-taroni to take a look at this and see what she thinks about including a figure showing reference comparisons and about the level of detail presented here. Just noting that this figure might look a bit messy, but we could make one and then make a decision on if it will help prove a point or just bring more questions.

Yes, I think it makes sense to try a supplemental figure showing reference comparisons, which is discussed in the cell type annotation section of the results. I propose that we split the "Annotation cell types" section into two subsections: evaluating the methods themselves and the workflow part.

@jaclyn-taroni do you mean creating two sections in the results or here in the methods?

The one in results that uses this header

Without the typo, so "Annotating cell types"
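For context on the statistic discussed in this thread, here is a minimal sketch of a delta median calculation and of how references might be compared with it. It assumes the usual `SingleR` definition, where the delta median for a cell is the score of its assigned label minus the median score across all labels. The function names, the toy scores, and the use of a per-reference median as the summary are all hypothetical; the thread itself describes comparing the full distributions, not a single number.

```python
from statistics import median


def delta_median(scores):
    """Return (assigned_label, delta) for one cell.

    scores: dict mapping each reference label to that cell's score.
    delta is the assigned label's score minus the median of all scores.
    """
    assigned = max(scores, key=scores.get)
    return assigned, scores[assigned] - median(scores.values())


def reference_summary(per_cell_scores):
    """Summarize one reference by the median of its per-cell deltas.

    Assumption: a single median stands in for the whole distribution.
    """
    return median(delta_median(s)[1] for s in per_cell_scores)


# Toy scores for two hypothetical references (not real data).
toy_scores = {
    "reference_a": [
        {"T cell": 0.9, "B cell": 0.5, "NK cell": 0.4},
        {"T cell": 0.3, "B cell": 0.8, "NK cell": 0.35},
    ],
    "reference_b": [
        {"Lymphocyte": 0.6, "Myeloid": 0.55, "Stromal": 0.5},
        {"Lymphocyte": 0.5, "Myeloid": 0.52, "Stromal": 0.48},
    ],
}

# Pick the reference whose cells have the highest typical delta median.
best_reference = max(toy_scores, key=lambda r: reference_summary(toy_scores[r]))
```

A low delta means the assigned label barely stands out from the other labels, which is why a reference with consistently higher values is read as giving less ambiguous assignments.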

jashapiro · 2024-03-04T21:05:07Z

content/04.methods.md

+
+For `CellAssign`, marker gene references were created using the marker gene lists available on `PanglaoDB` [@doi:10.1093/database/baz046]. 
+Organ-specific references were built using all cell types in a specified organ listed in `PanglaoDB` to accommodate all ScPCA projects encompassing a variety of disease and tissue type. 
+If a set of disease types in a given project encompassed cells that may be present in multiple organ groups, multiple organs were combined - e.g., for sarcomas that appear in bone or soft tissue, we created a reference containing bone, connective tissue, smooth muscle, and immune cells.


I think this is a good level of detail for the text, but we might want a supplemental table of the organ sets we used.

I think that's a good idea so I'm going to file an issue regarding this in the figures repo.
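To make the organ-combining step in the hunk above concrete, the following sketch builds a binary gene-by-cell-type marker matrix from a marker table restricted to a chosen set of organs. Everything named here is an assumption for illustration: `build_marker_matrix`, the `(organ, cell_type, gene)` row layout, the organ labels, and the toy rows are hypothetical and are not taken from `PanglaoDB` or from the workflow code.

```python
def build_marker_matrix(marker_rows, organs):
    """Build a binary marker matrix for the selected organs.

    marker_rows: iterable of (organ, cell_type, gene) tuples.
    organs: organ names to combine into one reference.
    Returns (genes, cell_types, matrix), where matrix[i][j] is 1 when
    gene i is listed as a marker for cell type j, and 0 otherwise.
    """
    selected = set(organs)
    markers = {}
    for organ, cell_type, gene in marker_rows:
        if organ in selected:
            markers.setdefault(cell_type, set()).add(gene)

    cell_types = sorted(markers)
    genes = sorted({g for gene_set in markers.values() for g in gene_set})
    matrix = [[int(g in markers[ct]) for ct in cell_types] for g in genes]
    return genes, cell_types, matrix


# Toy rows for illustration only (not an excerpt of the real marker lists).
TOY_ROWS = [
    ("Bone", "Osteoblasts", "RUNX2"),
    ("Connective tissue", "Fibroblasts", "COL1A1"),
    ("Smooth muscle", "Smooth muscle cells", "ACTA2"),
    ("Immune system", "T cells", "CD3E"),
    ("Brain", "Neurons", "RBFOX3"),
]

# Hypothetical organ set for a sarcoma project, following the example in the text.
SARCOMA_ORGANS = ["Bone", "Connective tissue", "Smooth muscle", "Immune system"]

genes, cell_types, matrix = build_marker_matrix(TOY_ROWS, SARCOMA_ORGANS)
```

A supplemental table of organ sets, as suggested in this thread, would amount to listing the `organs` argument used for each project.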

jashapiro · 2024-03-04T21:06:42Z

content/04.methods.md

+
+All merged `SingleCellExperiment` objects were converted to `AnnData` objects and saved as `.hdf5` files.
+If a merged `SingleCellExperiment` object contains any ADT data, the RNA and ADT data was exported and saved separately as RNA (`_rna.hdf5`) and ADT (`_adt.hdf5`). 
+In contrast, if a merged `SingleCellExperiment` object contained HTO data due to the presence of any multiplexed libraries in the merged object, the HTO data was removed from the `SingleCellExperiment` object and not included in the exported `AnnData` object. 


It is really only one project, right? I kind of feel like we could just skip the whole thing in that case. (This is a discussion largely for somewhere else)

content/04.methods.md

Co-authored-by: Joshua Shapiro <[email protected]>

github-actions · 2024-03-05T17:27:55Z

Click the link below to download the manuscript build as a ZIP file.
This build is associated with commit ab59eca.

Manuscript build

jaclyn-taroni

I have already returned my comment about doing AlexsLemonade/scpca-paper-figures#41 and moving text around choices to the results. Let's remove that from this PR for now and file a ticket.

I agree with leaving the merged section as is for now.

jaclyn-taroni · 2024-03-06T13:55:51Z

content/04.methods.md

+The delta median statistic is helpful in evaluating how confident `SingleR` is in assigning each cell to a specific cell type, where low delta median values indicate ambiguous assignments and high delta median values indicate confident assignments. 
+To identify the most appropriate reference to use with `SingleR`, we annotated a handful of samples across multiple disease types with all human-specific references available in the `celldex` package. 
+`BlueprintEncodeData` had the most consistently high delta median statistic distribution across samples from multiple disease types and was chosen as the reference to use for all ScPCA samples.


I recommend taking this out for now and filing an issue, blocked by AlexsLemonade/scpca-paper-figures#41, to move talking about picking a cell type annotation method and references into results.

allyhawkins · 2024-03-06T16:15:11Z

I removed the delta median discussion and filed #70, @jashapiro did you want to take another look at this or are we good to go?
We can revisit the merged objects after our discussion today.

github-actions · 2024-03-06T16:17:46Z

Click the link below to download the manuscript build as a ZIP file.
This build is associated with commit 91099b7.

Manuscript build

github-actions · 2024-03-07T14:47:19Z

Click the link below to download the manuscript build as a ZIP file.
This build is associated with commit 1876413.

Manuscript build

allyhawkins and others added 3 commits March 1, 2024 09:08

start cell typing methods

dabfc49

fill out cell typing and add zellkonverter

21dac65

Merge remote-tracking branch 'origin/allyhawkins/spatial-bulk-merged-…

7ecb13f

…methods' into allyhawkins/cell-type-methods

allyhawkins requested a review from jashapiro March 1, 2024 19:41

allyhawkins mentioned this pull request Mar 4, 2024

Methods section for metadata and ontologies #65

Merged

jashapiro reviewed Mar 4, 2024

View reviewed changes

Base automatically changed from allyhawkins/spatial-bulk-merged-methods to main March 4, 2024 19:50

allyhawkins and others added 2 commits March 4, 2024 14:02

Apply suggestions from code review

f871952

Co-authored-by: Joshua Shapiro <[email protected]>

Merge remote-tracking branch 'origin/main' into allyhawkins/cell-type…

7b73643

…-methods

justify singler and panglao references

e766aa4

allyhawkins requested a review from jashapiro March 4, 2024 20:38

jashapiro reviewed Mar 4, 2024

View reviewed changes

allyhawkins mentioned this pull request Mar 5, 2024

Include supplemental table with organs used to build PanglaoDB refs AlexsLemonade/scpca-paper-figures#75

Closed

rewording

ab59eca

Co-authored-by: Joshua Shapiro <[email protected]>

allyhawkins requested a review from jaclyn-taroni March 5, 2024 17:46

jaclyn-taroni approved these changes Mar 6, 2024

View reviewed changes

allyhawkins mentioned this pull request Mar 6, 2024

Supplemental figure for how we chose cell type annotation methods AlexsLemonade/scpca-paper-figures#41

Closed

allyhawkins added 2 commits March 6, 2024 10:14

remove section about references going to results

6d812df

Merge remote-tracking branch 'origin/main' into allyhawkins/cell-type…

91099b7

…-methods

Merge branch 'main' into allyhawkins/cell-type-methods

1876413

allyhawkins merged commit 1f262a0 into main Mar 7, 2024
1 check passed

allyhawkins deleted the allyhawkins/cell-type-methods branch March 7, 2024 14:59
