Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expression Level of Target and Perturbation Scores Positively Correlated? #9

Open
MujeebQadiri opened this issue Sep 3, 2024 · 5 comments

Comments

@MujeebQadiri
Copy link

Hi!

First off, thank you for this analysis tool!

I am analyzing some Perturb-Seq data and have followed the vignette. I ran Mixscale_Scatterplot and interestingly the perturbation score seems positively correlated to the expression of my knocked-down transcription factors. I would expect a the highest perturbation score to be correlated with very low expression of the target gene.

I'm interested to know what you believe may be happening or whether I have made an error in my code. Thank you!

plot_zoom_png

Here is my code:

library(Seurat)
library(ggridges)
library(ggplot2)
library(Mixscale)
library(dplyr)

file <- "experimental"
load(paste0("data/",file,"_scMAGeCK_meta_filtered.RData"))

[email protected]$perturb_status <- case_when(is.na([email protected]$gene) ~ "NP",
                                                      [email protected]$gene == "Non-Targeting" ~ "NT",
                                                      .default = "KO")

# standard pre-processing
filtered_seurat = NormalizeData(filtered_seurat)
filtered_seurat = FindVariableFeatures(filtered_seurat, nfeatures = 2000)
filtered_seurat = ScaleData(filtered_seurat)
filtered_seurat = RunPCA(filtered_seurat)
Idents(filtered_seurat) <- "gene"

# calculate Perturbation signatures 
seurat_obj <- CalcPerturbSig(
  object = filtered_seurat, 
  assay = "RNA", 
  slot = "data", 
  gd.class ="gene", 
  nt.cell.class = "Non-Targeting", 
  reduction = "pca", 
  ndims = 40, 
  num.neighbors = 20, 
  new.assay.name = "PRTB", 
  split.by = NULL) 

# run mixscale
seurat_obj = RunMixscale(
  object = seurat_obj, 
  assay = "PRTB", 
  slot = "scale.data", 
  labels = "gene", 
  nt.class.name = "Non-Targeting", 
  min.de.genes = 5, 
  logfc.threshold = 0.2,
  de.assay = "RNA",
  max.de.genes = 100, 
  new.class.name = "mixscale_score", 
  fine.mode = F, 
  verbose = F, 
  split.by = NULL)

# Visualize
RidgePlot(
  seurat_obj,
  features = "mixscale_score",
  group.by = "gene") + NoLegend()

# Check if the scores correlate with the expression level of the target gene itself
Mixscale_ScatterPlot(object = seurat_obj, 
                     nt.class.name = "Non-Targeting", 
                     slct.ident = unique(seurat_obj$gene)[unique(seurat_obj$gene) != "Non-Targeting"][1:8], 
                     nbin = 10, 
                     facet_wrap = "gene") + NoLegend()

Best,
Mujeeb

@longmanz
Copy link
Collaborator

longmanz commented Sep 4, 2024

Hi Mujeeb,
Thank you for using our tool. The positive correlation is indeed unexpected. Based on your results, I notice a few points that might worth looking into:

  1. the expression of these transcription factors (TFs) seem relatively low (less than 0.5). Is that as expected? Have you checked their expression level in the NT group? You may do a quick check by running DoHeatmap() for each TFs.
  2. Do you expect to observe strong batch effect in your data? You can check this by plotting a standard UMAP. Our tool currently does not work well when there is a strong batch effect.
  3. Have you checked the RidgePlot of the Mixscale scores for each perturbation? Do the distributions look normal (bi-modal)?
  4. When running RunMixscale, you may consider using a higher logfc.threshold (e.g., 0.5). This will reduce the number of small/moderate DE genes that are used for Mixscale score calculation. I suspect that these small effect DE genes mask the true signals.

@MujeebQadiri
Copy link
Author

MujeebQadiri commented Sep 10, 2024

Hi Mujeeb, Thank you for using our tool. The positive correlation is indeed unexpected. Based on your results, I notice a few points that might worth looking into:

  1. the expression of these transcription factors (TFs) seem relatively low (less than 0.5). Is that as expected? Have you checked their expression level in the NT group? You may do a quick check by running DoHeatmap() for each TFs.
  2. Do you expect to observe strong batch effect in your data? You can check this by plotting a standard UMAP. Our tool currently does not work well when there is a strong batch effect.
  3. Have you checked the RidgePlot of the Mixscale scores for each perturbation? Do the distributions look normal (bi-modal)?
  4. When running RunMixscale, you may consider using a higher logfc.threshold (e.g., 0.5). This will reduce the number of small/moderate DE genes that are used for Mixscale score calculation. I suspect that these small effect DE genes mask the true signals.

Hi, thank you for your reply!

  1. It seems as though the expression of some of these transcription factors is low in both the NT and perturbed cells. I will check with the scientist I am working with to see if this is expected for their experiment.

image

image

  1. I have two conditions, stimulated and control -- however, these are being processed separately and in both cases this trend can be observed.

  2. I have checked the RidgePlot and the distribution looks bimodal for most of the sgRNAs (Stat92E, foxo, REPTOR) although very weakly.

image

  1. I tried using a logfc threshold of 0.5 however the trend remains the same.

image
image

Best,
Mujeeb

@MujeebQadiri
Copy link
Author

Perhaps a bit off-topic but thought I should note: for guide RNA calling, is it sufficient to use the Cell Ranger algorithm to determine guide-cell assignment?

My workflow for pre-processing includes filtering the CRISPR Gene Expression matrix produced by Cell Ranger by the UMI threshold (for each sgRNA to remove ambient guide RNA) and assigning each cell's identity according to whether a sgRNA passes the threshold (and accordingly, identifying multiplets).

Best,
Mujeeb

@longmanz
Copy link
Collaborator

Hi Mujeeb,
Thank you for your information. Based on your results, the positive correlation seems to be real, but it is unclear to me why that would be the case.

  1. Have you run Run_wmvRegDE() and visualize the DEGs for each perturbation (Mixscale_DoHeatmap()) ? It is possible that the expression of these TFs is not a good indicator of perturbation strength (since they are lowly expressed), and so the correlation between the mixscale scores and the expression is weak.

  2. Another possibility is that there is some batch effect between your perturbed cells and NT cells. In that case, the shift of expression is not driven by perturbation, so there is weak correlation between the mixscale score and perturbation strength.

@MujeebQadiri
Copy link
Author

Hi Mujeeb, Thank you for your information. Based on your results, the positive correlation seems to be real, but it is unclear to me why that would be the case.

  1. Have you run Run_wmvRegDE() and visualize the DEGs for each perturbation (Mixscale_DoHeatmap()) ? It is possible that the expression of these TFs is not a good indicator of perturbation strength (since they are lowly expressed), and so the correlation between the mixscale scores and the expression is weak.
  2. Another possibility is that there is some batch effect between your perturbed cells and NT cells. In that case, the shift of expression is not driven by perturbation, so there is weak correlation between the mixscale score and perturbation strength.

I agree with you, this may be the case because the TFs are so lowly expressed. Looking at the wmvRegDE, many of the downstream targets we expect to be affected are significantly down-regulated.

Thank you for your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants