-
Notifications
You must be signed in to change notification settings - Fork 188
plot_ordination
The operation of this function also depends a lot on the
and
functions. See their wiki-pages for further details and examples.
Load the necessary packages and data.
library("phyloseq"); library("ggplot2")
data(GlobalPatterns)
"Trim" data. This is useful for plotting, and in this case, also useful for making examples that run in a short amount of time. Your reasoning and decisions in trimming are extremely important, and up to you. I am using several different methods of trimming here, for illustration and because the extent of data reduction is useful for my purposes. However, I make no assertion that these are the "right" approach(es) for your data, but rather, I highly recommend that you think hard about any trimming you do, and only commit to including it in your final analysis pipeline if you can defend the choices and have checked that they are robust.
Need to clean the zeros from GlobalPatterns. While we're at it, let's remove any OTUs observed less than 5 times, cumulative for all samples
GP <- prune_species(speciesSums(GlobalPatterns)>5, GlobalPatterns)
Still >13,000 OTUs. Now let's filter taxa that don't show up at least twice in 5 or more samples
wh0 <- genefilterSample(GP, filterfunSample(function(x){x > 2}), A=5)
GP <- prune_species(wh0, GP)
Do an additional trimming by cumulative abundance of phyla
top.TaxaGroup <- sort(
tapply(speciesSums(GP), taxTab(GP)[, "Phylum"], sum, na.rm = TRUE),
decreasing = TRUE
)
top.TaxaGroup <- top.TaxaGroup[top.TaxaGroup > 1*70000]
Now prune further, to just the most-abundant phyla
GP1 <- subset_species(GP, Phylum %in% names(top.TaxaGroup))
We will want to investigate a major prior among the samples, which is that some are human-associated microbiomes, and some are not. Define a human-associated versus non-human categorical variable:
human <- getVariable(GP1, "SampleType") %in% c("Feces", "Mock", "Skin", "Tongue")
Add new human variable to sample data:
sampleData(GP1)$human <- factor(human)
Now perform an unconstrained correspondence analysis on the abundance table of OTUs in samples...
GP.ca <- ordinate(GP1, "CCA")
Let's start by plotting just the species/taxa, and shading the points
by Phylum. Note that even in our "trimmed" dataset there are
nspecies(GP1)
=4,133 OTUs.
(p1 <- plot_ordination(GP1, GP.ca, type="taxa", color="Phylum"))
This is a big complicated looking plot, but that's not necessarily good.
There is actually a lot of overplotting/occlusion, which means that
the high number of points is getting in the way of our visual
understanding of the data. There are several additional examples for
dealing with this in the last 5 or so graphics examples below.
Next, let's plot only the samples, and shade the points by "SampleType"
while also modifying the shape according to whether they are human-associated.
There are a few additional ggplot2 layers added to make the plot even nicer...
```R
plot_ordination(GP1, GP.ca, type="samples", color="SampleType", shape="human") + geom_line() + geom_point(size=5)
p2 <- last_plot()
Now let's try combining both the samples and species together in one "biplot".
plot_ordination(GP1, GP.ca, type="biplot", color="Phylum")
p3 <- last_plot()
Hmmm, the overlap problem is affecting this again, let's try the "split" organization of the biplot, in which the samples/species are separated on two panels...
(p4 <- plot_ordination(GP1, GP.ca, type="split", color="Phylum", shape="human", label="SampleType") )
Let's manually assign a new color-scheme, so that we can set "samples" to black These are relatively general guidelines for manually assigning one (or more) colors to a ggplot-graphic. Most of the time manual assignment of the colors/color-scheme is not necessary.
my.cols <- rainbow(length(levels(p4$data$Phylum)))
names(my.cols) <- levels(p4$data$Phylum)
my.cols["samples"] <- "black"
p5 <- p4 + scale_colour_manual(values=my.cols)
(p5 <- p5 + opts(panel.background = theme_rect(colour="gray76", fill="gray76")) )
Let's try labelling the species-name of the OTUs, if assigned...
plot_ordination(GP1, GP.ca, type="split", color="Phylum", label="Species")
p6
Let's return to the observation that the first two axes are explained by some difference in the microbiome communities of human and non-human samples. Which taxa contribute to this? Are there major differences at the Phylum level (for instance)?
plot_ordination(GP1, GP.ca, type="taxa", color="Phylum") + facet_wrap(~Phylum, nrow=3)
p1b <- last_plot()
ggplot(p1$data, aes(x=CA1, y=CA2, color=Phylum)) + geom_density2d() + facet_wrap(~Phylum, nrow=3)
p1c <- last_plot()
Can use hexagon-binning as another alternative for dealing with overplotting.
ggplot(p1$data, aes(x=CA1, y=CA2)) + stat_binhex() + facet_wrap(~Phylum, nrow=3)
p1d <- last_plot()
Here is an example using boxplots to summarize/compare the Phylum-level groups of OTUs on each axis.
ggplot(p1$data, aes(x=Phylum, y=CA1, color=Phylum)) + geom_boxplot() +
opts( axis.text.x = theme_text(angle = -90, hjust = 0) )
p1e <- last_plot()
ggplot(p1$data, aes(x=Phylum, y=CA2, color=Phylum)) + geom_boxplot() +
opts( axis.text.x = theme_text(angle = -90, hjust = 0) )