Add scAdvanced GSEA notebook that uses pseudobulk RMS DE results #808

Merged: 8 commits, Nov 25, 2024

jaclyn-taroni
Member

Closes #711

Here, I am updating the GSEA notebook in the advanced single-cell module to use the output of the 03-differential_expression notebook in that module: pseudobulk DE results comparing expression in myoblasts between ERMS and ARMS samples.

I've written this using the Hallmark gene sets, which are fairly comprehensive and designed to work with GSEA. There are also only 50 of them.

I'm also deleting the ORA and old GSEA notebooks.

My rationale for teaching GSEA and not ORA is that many pathway methods that work at the individual cell level are rank-based (including AUCell – #806), so teaching a seminal functional class scoring (FCS) method, run in a way that is fairly quick, makes sense to me. Accordingly, I plan to revise the pathway analysis slides to talk about FCS methods more generally (like ssGSEA).


Normalized enrichment scores (NES) are enrichment scores that are scaled to make gene sets that contain different numbers of genes comparable.
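
To make the scaling concrete, here is a minimal Python sketch rather than code from the notebook. It assumes the permutation-based scheme described for the original GSEA method, in which the observed enrichment score is divided by the mean magnitude of the null enrichment scores that share its sign. The function name and every number are hypothetical, and the software used in the notebook may differ in its details.

```python
from statistics import mean


def normalized_enrichment_score(es: float, null_es: list[float]) -> float:
    """Scale an enrichment score by the mean of same-sign null scores.

    Assumption: null_es holds enrichment scores for the same gene set
    computed on permuted data (hypothetical values in this sketch).
    """
    # Keep only nonzero null scores on the same side of zero as the observed score.
    same_sign = [x for x in null_es if x != 0 and (x > 0) == (es > 0)]
    if not same_sign:
        raise ValueError("no null scores share the sign of the observed score")
    # Dividing by a positive magnitude preserves the sign of the observed score.
    return es / abs(mean(same_sign))


# Made-up null distribution for illustration only.
null_scores = [0.25, 0.5, 0.75, -0.25, -0.75]
positive_nes = normalized_enrichment_score(0.75, null_scores)
negative_nes = normalized_enrichment_score(-0.5, null_scores)
print(positive_nes, negative_nes)
```

Because each gene set is scaled by its own null distribution, scores from sets of different sizes end up on a comparable scale.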

Pathways with significant, highly positive NES are enriched in ERMS myoblasts, whereas pathways with significant, highly negative NES are enriched in ARMS myoblasts.
Member Author

Before figuring out who to request for a full review, I'm going to ask @jashapiro to take a look at this interpretation as the instructor of the 03 notebook in the upcoming workshop.

Member

This is correct. The scores are relative to ARMS, so:

  • Positive values ---> ERMS is higher than ARMS ---> Enriched for ERMS.
  • Negative values ---> ERMS is lower than ARMS ---> Enriched for ARMS.
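
The sign convention above can be sketched in a few lines of Python. This is an illustration only, not code from the notebook: `enriched_group` is a hypothetical helper, and the pathway names, NES values, adjusted p-values, and the 0.05 cutoff are all made up.

```python
def enriched_group(nes: float, padj: float, alpha: float = 0.05) -> str:
    """Map a GSEA result to the group it is enriched in.

    Assumption: the DE contrast is ERMS relative to ARMS, so a positive
    NES means higher in ERMS. The alpha cutoff is a hypothetical choice.
    """
    if padj >= alpha or nes == 0:
        return "not significant"
    return "ERMS" if nes > 0 else "ARMS"


# Made-up values for illustration only; these are not results from the notebook.
example_results = [
    ("PATHWAY_A", 2.1, 0.001),
    ("PATHWAY_B", -1.8, 0.010),
    ("PATHWAY_C", 0.9, 0.400),
]

for pathway, nes, padj in example_results:
    print(f"{pathway}: {enriched_group(nes, padj)}")
```

Checking significance before reading the sign avoids over-interpreting the direction of a pathway whose enrichment is not supported.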

Member

@sjspielman left a comment

This seems fine to me; I didn't really have any code comments.

The main thing I'm not 100% sure about with the lesson swap overall is the fact that we lose teaching gene identifier conversion, since it was only in the ORA notebook. I think this is important to touch on since it comes up a lot! I wonder if we can still at least introduce the concept in this notebook, where we say "no need to do gene conversion!" Maybe we also say, "but if we had to, we might use AnnotationDbi, and here's a nice vignette about that: https://hbctraining.github.io/DGE_workshop_salmon_online/lessons/AnnotationDbi_lesson.html".

scRNA-seq-advanced/04-gene_set_enrichment_analysis.Rmd

#### Other resources

* For another example using `clusterProfiler` for GSEA, see [_Intro to DGE: Functional Analysis._ from Harvard Chan Bioinformatics Core Training.](https://hbctraining.github.io/DGE_workshop/lessons/09_functional_analysis.html)
Member

I realized this training has been read-only for a little while.

I think this is probably their more maintained version? https://hbctraining.github.io/Training-modules/DGE-functional-analysis/lessons/02_functional_analysis.html

@jaclyn-taroni
Member Author

> The main thing I'm not 100% sure about with the lesson swap overall is the fact that we lose teaching gene identifier conversion, since it was only in the ORA notebook. I think this is important to touch on since it comes up a lot!

If this were the scRNA-seq training, I would have retained it. However, these pathway instruction notebooks need to be on the shorter side, and I think it's fair to assume that some participants in this offering will be familiar with gene identifier conversion. I think we can revisit this in https://github.com/AlexsLemonade/exercise-notebook-answers/issues/227.

@jaclyn-taroni
Member Author

Thank you, @sjspielman. This is ready for another look!

Member

@sjspielman left a comment

LGTM assuming the test passes, and I have no reason to think it won't!

@jaclyn-taroni merged commit 6d4bf3c into master on Nov 25, 2024
2 checks passed
@jaclyn-taroni deleted the jaclyn-taroni/711-use-rms-de branch on November 25, 2024 18:19
Linked issue that merging this pull request may close: Update pathway analysis to use DE results?