Density scaling and bandwidth in type_ridge() #271

zeileis · 2024-11-27T22:33:35Z

In get_density() within type_ridge() the density estimates are scaled to the same maximum which I believe is incorrect. Also the bandwidths are by default computed separately whereas ggridges employs a joint bandwidth estimate which typically seems to be better. But maybe I'm overlooking something here, Vincent @vincentarelbundock ?

For illustration let's consider a sample with two groups with very different variances:

set.seed(0)
d <- data.frame(
  y = rep(1, 100),
  x = c(rnorm(50), rnorm(50, mean = 3, sd = 0.2)),
  z = factor(rep(1:2, each = 50))
)

Using ggridges, a joint bandwidth of 0.218 is selected and the resulting densities have very different maxima, as you would expect if both are to have area 1.

ggplot(d, aes(x = x, y = z)) + geom_density_ridges() + theme_minimal()
ggplot(d, aes(x = x, y = y, group = z)) + geom_density_ridges() + theme_minimal()

In contrast, because get_density() scales the density to have the same maximum, the second group has a much smaller area than the first in tinyplot(..., type = "ridge").

tinyplot(z ~ x, data = d, type = "ridge", grid = TRUE)
tinyplot(y ~ x | z, data = d, type = "ridge", grid = TRUE)
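The area argument can be checked numerically with base R alone. The following is a minimal sketch, not the actual get_density() code: it assumes that the scaling amounts to dividing each density by its own maximum, and the helper area() is a hypothetical name introduced just for this illustration.

```r
set.seed(0)
x1 <- rnorm(50)                       # group 1: large variance
x2 <- rnorm(50, mean = 3, sd = 0.2)   # group 2: small variance

# Common bandwidth for both groups (the value reported for ggridges above)
bw <- 0.218
d1 <- density(x1, bw = bw)
d2 <- density(x2, bw = bw)

# Hypothetical helper: trapezoidal approximation of the area under a curve
area <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

# Unscaled densities: both areas are close to 1
areas_raw <- c(area(d1$x, d1$y), area(d2$x, d2$y))

# Each density divided by its own maximum (assumed form of the scaling):
# the maxima now agree but the areas no longer do
areas_scaled <- c(area(d1$x, d1$y / max(d1$y)), area(d2$x, d2$y / max(d2$y)))

print(rbind(raw = areas_raw, scaled = areas_scaled))
```

After the rescaling the narrow group ends up with the smaller area, which is the distortion visible in the ridge plots.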

However, if I drop the scaling in line https://github.com/grantmcdermott/tinyplot/blob/main/R/type_ridge.R#L99 and select the same bandwidth of 0.218, then I get virtually the same result as in ggridges.

tinyplot(z ~ x, data = d, type = type_ridge(bw = 0.218), grid = TRUE)
tinyplot(y ~ x | z, data = d, type = type_ridge(bw = 0.218), grid = TRUE)

Note that type = "density" agrees here and chooses the same joint bandwidth without applying any scaling of the maximum:

tinyplot(~ x | z, data = d, type = "density", grid = TRUE)


vincentarelbundock · 2024-11-27T22:58:08Z

Oh that's interesting. I chose this purely for visual reasons, and had no principled reason. Sorry!

I'll defer to you on best defaults.

zeileis · 2024-11-28T02:45:09Z

OK, good, thanks for the quick feedback. Removing the scaling to the same maximum is straightforward.

Regarding the bandwidth: Both ggridges and type_density() seem to use the average of the individual bandwidths per group, see: https://github.com/wilkelab/ggridges/blob/master/R/stats.R#L109-L112 and https://github.com/grantmcdermott/tinyplot/blob/main/R/type_density.R#L83-L86
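For reference, the averaging rule can be sketched in a few lines of base R. This assumes that the per-group bandwidths come from bw.nrd0(), the default rule of thumb in density(); the linked sources should be checked for the exact selector, and the result is not guaranteed to match the 0.218 reported above.

```r
set.seed(0)
d <- data.frame(
  x = c(rnorm(50), rnorm(50, mean = 3, sd = 0.2)),
  z = factor(rep(1:2, each = 50))
)

# Per-group bandwidths (assumption: bw.nrd0 is the selector in use)
bws <- tapply(d$x, d$z, bw.nrd0)

# Joint bandwidth: unweighted mean of the per-group values
bw_joint <- mean(bws)

# One density estimate per group, all sharing the joint bandwidth
dens <- lapply(split(d$x, d$z), density, bw = bw_joint)

print(bws)
print(bw_joint)
```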

As the code is so similar: Is the type_density code inspired by ggridges or are they both inspired by something else? Why is this simply using the mean rather than the weighted mean (so that larger groups would receive more weight)?

Should we then always enforce that the same bandwidth is used throughout? Or should we allow a different bandwidth per ridgeline? If the latter, how should we specify this: with an additional argument, or does anyone have a better idea?

grantmcdermott · 2024-12-20T16:27:10Z

> As the code is so similar: Is the type_density code inspired by ggridges or are they both inspired by something else?

I wrote (what eventually became) the type_density code a long time ago, so I can't recall my exact thinking. But I think you may be correct in that I compared what I was doing with the ggridges code to make sure they were consistent. IIRC that was prompted by your suggestion to use a common bandwidth across groups. See point 2 here.

> Why is this simply using the mean rather than the weighted mean (so that larger groups would receive more weight)?

I don't have a good reason. I think I just did what was expedient. Should we switch to weighted.mean (both for type_density and type_ridge)?

zeileis · 2024-12-20T22:26:57Z

OK, thanks for the explanation. In any case, I think that type_density and type_ridge should use the same default bandwidth. And I lean towards weighted.mean but I haven't explored this more systematically to check whether it really works better in unbalanced data.
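To make the difference concrete, here is a small sketch on a hypothetical unbalanced sample (500 vs. 20 observations; the data and group labels are made up for illustration, and bw.nrd0 is again assumed as the per-group selector). It only shows how the two averages differ, not which one gives better density estimates.

```r
set.seed(1)
# Hypothetical unbalanced sample: 500 observations in "a", 20 in "b"
x <- c(rnorm(500), rnorm(20, mean = 3, sd = 0.2))
g <- factor(rep(c("a", "b"), times = c(500, 20)))

bws <- tapply(x, g, bw.nrd0)   # per-group bandwidths
n   <- tapply(x, g, length)    # group sizes

bw_mean  <- mean(bws)                   # equal weight for every group
bw_wmean <- weighted.mean(bws, w = n)   # weight proportional to group size

print(c(mean = bw_mean, weighted = bw_wmean))
```

With weights proportional to group size, the joint bandwidth is pulled towards the value of the large group, whereas the plain mean sits halfway between the two.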

grantmcdermott · 2025-01-09T05:46:12Z

Note: with #284 we ultimately opted for individual bandwidths in (regular) grouped density plots, albeit with the possibility for users to override this via the joint.bw argument.

Should we update the type_ridge code to do so as well?

zeileis · 2025-01-09T12:14:13Z

Yes, type_density() and type_ridge() should be consistent in their handling of the bandwidths, I think.

I'll try to have a closer look at the details in type_density() later tonight and then follow-up again.

zeileis linked a pull request Nov 28, 2024 that will close this issue

ylevels reordering in type_ridge() #270

Open

grantmcdermott mentioned this issue Dec 20, 2024

CRAN 0.3.0 release #280

Open

grantmcdermott mentioned this issue Dec 21, 2024

type_density() using the new type_data/type_draw system #243

Closed
