Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Differential heat trees give conflicting relative abundance results #369

Open
fionnualabulman opened this issue Jan 10, 2025 · 0 comments
Open

Comments

@fionnualabulman
Copy link

fionnualabulman commented Jan 10, 2025

Hello and thanks for the package.

I have been using differential heat trees to compare the relative abundance of AMF taxa between three land uses, combining samples taken at multiple sites from each land use. I have noticed a strange result on the heat tree matrix which does not match the relative abundance data contained in the dataframe created by psmelt(physeq).

This barplot below is created with the psmelt df and shows that native forest has a higher mean relative abundance of the genus Gigaspora (28%) than maize fields (4%). Kiwifruit orchard samples have 0% mean relative abundance of Gigaspora.
metacoder_inquiry

However when I use the same physeq object to plot a differential heat tree with metacoder, maize is shown to have a higher relative abundance of gigaspora than native forest. I am also surprised that no difference is shown in the abundance of gigaspora between native forest and kiwifruit orchards (coloured grey). All differences are found non-significant by the Wilcoxon rank-sum test.
metacoder_amf_soil

This is the diff_table entry for the comparison in question. Is the "Inf" value for log2_median_ratio to blame?

# A tibble: 231 × 7
taxon_id | treatment_1| treatment_2 | log2_median_ratio | median_diff | mean_diff | wilcox_p_value
 bi | maize | native | Inf | 0.0302 | -0.0532 | 0.767  

Do you have any idea what could be causing these inconsistencies? Once zero abundance OTUs are removed from the dataframe, Gigaspora is represented by 5 OTUs in 2 maize field sites and only 1 OTU from one site in native forest, but I believed I was plotting the taxon abundance not the OTUs so this shouldn't be an issue. All Gigaspora OTUs represent a single species. Below is the code I have used to create the metacoder plot:

#CREATE` OBJ ----
{otu_table <- (otu_table(physeq))
otu_table <- t(otu_table)
sample_data <- sample_data(physeq)
sample_data <- as.data.frame(sample_data)
names(sample_data)[names(sample_data) == "sample_name"] <- "SampleID"

# Ensure `otu_table` and `tax_table` are dataframes
otu_table_df <- as.data.frame(otu_table)
tax_table_df <- as.data.frame(tax_table)

# Add row names as a new column named 'OTU'
otu_table_df <- otu_table_df %>%
  rownames_to_column(var = "OTU")
tax_table_df <- tax_table_df %>%
  rownames_to_column(var = "OTU")

#add phyla to tax table
tax_table_df <- tax_table_df %>%
  mutate(Phylum = "Glomeromycota")

#create taxonomy column
tax_table_df$taxonomy <- with(tax_table_df, 
                          paste(Phylum, Class, Order, Family, Genus, Species, sep = ";"))

otu_data <- otu_table_df
tax_data <- tax_table_df
tax_data

#remove NA from all ranks except Genus and Species
tax_data <- tax_data %>%
  filter(!is.na(Class) & !is.na(Order) & !is.na(Family))
tax_data

# Join
otu_data <- left_join(otu_data, tax_data, by = "OTU")

# Create a sequential OTU_ID column and place first
otu_data$OTU_ID <- seq_along(otu_data[, 1])
otu_data <- otu_data %>%
  select(OTU_ID, everything())

#remove OTU sequences
otu_data <- otu_data[, !colnames(otu_data) %in% "OTU"]

#make taxmap
obj <- parse_tax_data(otu_data,
                      class_cols = "taxonomy",
                      class_sep = ";")

names(obj$data) <- "otu_counts"

#rarefy (physeq has already been rarefied)
obj$data$otu_rarefied <- rarefy_obs(obj, "otu_counts", other_cols = TRUE)

#remove 0 read OTUs
no_reads <- rowSums(obj$data$otu_rarefied[, sample_data$SampleID]) == 0
obj <- filter_obs(obj, "otu_rarefied", ! no_reads)

#convert read counts to proportions
obj$data$otu_props <- calc_obs_props(obj, "otu_counts", other_cols = TRUE)
obj$data$tax_abund <- calc_taxon_abund(obj, "otu_props")

obj$data$type_abund <- calc_group_mean(obj, "tax_abund",
                                       cols = sample_data$SampleID,
                                       groups = sample_data$system)

obj$data$diff_table <- compare_groups(obj, data = "tax_abund",
                                      cols = sample_data$SampleID,
                                      groups = sample_data$system)
print(obj$data$diff_table)

#correct for multiple comparisons
obj <- mutate_obs(obj, "diff_table",
                  wilcox_p_value = p.adjust(wilcox_p_value, method = "fdr"))

range(obj$data$diff_table$wilcox_p_value, finite = TRUE) 

dont_print <- c("NA")

jpeg(filename = "heat_tree_matrix_system_soil.jpg")
obj %>%
  metacoder::filter_taxa(supertaxa = TRUE, reassign_obs = c(diff_table = FALSE)) %>%
  heat_tree_matrix(data = "diff_table",
                   node_label = ifelse(taxon_names %in% dont_print, "", taxon_names),
                   node_size = n_obs,
                   node_color = log2_median_ratio, # difference between groups
                   node_color_trans = "linear",
                   node_color_interval = c(-3, 3), # symmetric interval
                   edge_color_interval = c(-3, 3), # symmetric interval
                   node_color_range = diverging_palette(), # diverging colors
                   node_color_axis_label = "Log 2 ratio of median counts",
                   node_size_axis_label = "Number of OTUs",
                   layout = "da", initial_layout = "re",
                   key_size = 0.6,
                   seed = 2)
dev.off()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant