Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add plots to QC report #57

Merged
merged 10 commits into from
Oct 4, 2021
Merged

Add plots to QC report #57

merged 10 commits into from
Oct 4, 2021

Conversation

jashapiro
Copy link
Member

@jashapiro jashapiro commented Sep 30, 2021

Here I am mostly adding the Total vs. gene count plot to the QC report. I also fixed the knee plot and made the session info hidden by default, so closing all three of those issues here.

I was also playing around with AlexsLemonade/scpca-nf#508, which I ended up abandoning, but in that process I added a bit of formatting to the tables, including making them not full-width, and striping the rows.

A few things to focus on that I was thinking about as I did them:

  • Please also look at all axis labels and legend titles! (Should I make the knee plot use non-scientific notation?)

  • For consistency among reports, I made the mitochondrial percentage go from 0-100% for all reports. This may make lower numbers harder to see in mostly-good plots (where no cells are above 50%, say). I thought the consistency was more important, but we could tweak the color scale if we think it matters.

  • The other potentially controversial aesthetic decision I made was to move the figure legends into the figure panels, at least for the knee & UMI/gene plots. In both of these plots, the location of the data is predictable, so in the vast majority of cases the positions I chose will not overlap data. Still, there is a chance they will, so if there is worry, I can move them back out. (when we add the miQC plot, the top right ought to work as a clear space)

I am attaching an example report (zipped, b/c github), and the two main figures I worked on below, for reference:
image
image

closes #39
closes #40
closes #19
closes #55

@jashapiro
Copy link
Member Author

Adding in the miQC plot as well, which looks like this (not included in the example report above):

image

Same story here where I moved the legend inside the plot. In most cases the upper right should have no genes, so it ought to be safe, but I can move back outside if that is preferred.

@jashapiro jashapiro requested a review from allyhawkins October 1, 2021 21:03
@jashapiro jashapiro changed the title 39 Add plot to QC: Total count vs. gene counts by mito content Add plots to QC report Oct 1, 2021
Copy link
Member

@allyhawkins allyhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this is coming together really nicely! I just wanted to comment on a few minor things:

  1. I like the placement of the legends and am not overly concerned about if they overlap data. I think for the majority of data where you've placed them should be outside of where the data lie.
  2. We should be consistent with the type of case we choose to use for the axis titles. I think we could use sentence case as we do most everywhere else, and I think you are mostly using that now so I suggested a few changes to make that consistent throughout.
  3. You mentioned in the comment that you made the legend go from 0-100% for mito content. I think we should do the same thing with plotting the miQC plot. Since we always anticipate that axis to be on the same scale.

Comment on lines 137 to 153
ggplot(unfiltered_celldata, aes(x = rank, y = sum, color = filter_status))+
geom_point(aes(shape = filter_status)) +
scale_y_log10() +
scale_color_manual(values = c("navyblue", "grey")) +
geom_point(aes(shape = filter_status),alpha = 0.2, size = 2) +
scale_y_log10() +
scale_x_log10() +
scale_color_manual(values = c("navyblue", "grey40")) +
scale_shape_manual(values = c(16, 1)) + # filled and unfilled circles
labs(
x = "Rank",
y = "UMI count",
color = "Filter status",
shape = "Filter status"
)
) +
guides(color = guide_legend(override.aes = list(alpha = 1))) +
theme(legend.position = c(0, 0),
legend.justification = c(0,0),
legend.background = element_rect(color = "grey20", size = 0.25),
legend.box.margin = margin(rep(5, 4)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think changing the axis from scientific notation would be ideal if we could do that. For both the x and y axis. Also I'm not crazy about the colors because I personally think the purple and grey are hard to see the transition. What if we did red and grey?

geom_point(aes(shape = filter_status),alpha = 0.2, size = 2) +
scale_y_log10() +
scale_x_log10() +
scale_color_manual(values = c("navyblue", "grey40")) +
scale_shape_manual(values = c(16, 1)) + # filled and unfilled circles
labs(
x = "Rank",
y = "UMI count",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
y = "UMI count",
y = "Total UMI count",

Can we change this to say total? I know its minor but its slightly more descriptive.


## UMI - expressed gene count plot

```{r, fig.cap="Total UMI x genes expressed"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a huge fan of this title. What about changing to something simple like Cell Metrics. I also don't know if we need the line after that says "Total UMI x genes expressed." I'm not sure it adds anymore information than the plot does.

Comment on lines 168 to 170
labs(x = "Total UMI Count",
y = "Number of Genes Expressed",
color = "Mitochondrial\nPercent") +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
labs(x = "Total UMI Count",
y = "Number of Genes Expressed",
color = "Mitochondrial\nPercent") +
labs(x = "Total UMI count",
y = "Number of genes expressed",
color = "Mitochondrial\npercent") +

If you like my suggestion of keeping all axis titles in sentence case then this should be changed.

Comment on lines 185 to 189
miQC::plotModel(filtered_sce, model = metadata(filtered_sce)$miQC_model) +
theme(legend.position = c(1, 1),
legend.justification = c(1,1),
legend.background = element_rect(color = "grey20", size = 0.25),
legend.box.margin = margin(rep(5, 4)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
miQC::plotModel(filtered_sce, model = metadata(filtered_sce)$miQC_model) +
theme(legend.position = c(1, 1),
legend.justification = c(1,1),
legend.background = element_rect(color = "grey20", size = 0.25),
legend.box.margin = margin(rep(5, 4)))
miQC::plotModel(filtered_sce, model = metadata(filtered_sce)$miQC_model) +
theme(legend.position = c(1, 1),
legend.justification = c(1,1),
legend.background = element_rect(color = "grey20", size = 0.25),
legend.box.margin = margin(rep(5, 4))) +
coord_cartesian(ylim = c(0,100)) +
labs(x = "Number of genes expressed",
y = "Mitochondrial percent")

To keep things consistent for mitochondrial percentages, I think it would be nice to always plot with the same y-axis range. Also to keep labels consistent, seem they are the same values being plotted maybe we should keep the same titles across the two plots?

@jashapiro
Copy link
Member Author

I updated legends and titles as suggested, using sentence case everywhere, and unifying labels.

I moved what I had put as figure captions to alt text because it was laregely redundant, as noted, but I thought we should have some kind of figure label beyond the title.

For the knee plot, I chose a more contrasting color scheme, but I didn't want to go to red, because that would imply (to me) that those points failed some QC. So I went with a dark green, and lightened up the grey. I also went with smaller points for the filtered cells. Let me know what you think.
image

Here is a full report file...
SCPCL000001_qc.html.zip

Copy link
Member

@allyhawkins allyhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the changes and think the green in the knee plot looks a lot better! I don't necessarily think we need the change in size and have previously been taught not to double encode variables where we have both size and color showing if a cell is considered passed or excluded but I'll leave that final call up to you. I do think the changes highlight the passed cells which is what we want so I can see an argument for either one.

I had one other minor axis wording change to keep the labels consistent between the Cell read metrics and miQC plot, but other than that I think this looks good!

inst/rmd/qc_report.rmd Outdated Show resolved Hide resolved
@jashapiro
Copy link
Member Author

I don't necessarily think we need the change in size and have previously been taught not to double encode variables where we have both size and color showing if a cell is considered passed or excluded but I'll leave that final call up to you.

Yeah, I don't usually do this, but I thought it was worth it in this case, as I think it solves a problem where the two kinds of points overlap.

@allyhawkins
Copy link
Member

Yeah, I don't usually do this, but I thought it was worth it in this case, as I think it solves a problem where the two kinds of points overlap.

I would agree, it does help show the separation a lot more. I think for the purposes of this plot its okay to keep it in.

@jashapiro jashapiro merged commit f76a63c into main Oct 4, 2021
@jashapiro jashapiro deleted the jashapiro/39-umi-gene-plot branch September 14, 2023 18:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants