[FEAT] Better handling of integers distribution in TableReport #1164
Comments
thanks @Vincent-Maladiere. to help look for a solution, here is a minimal reproducer of the issue that does not require generating a report:

from matplotlib import pyplot as plt
import numpy as np

x = np.arange(9)
fig, ax = plt.subplots()
ax.hist(x)
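To make the cause visible without a plot, here is a small sketch that inspects the bins directly. It assumes the default of 10 equal-width bins, which is what `ax.hist` uses when no `bins` argument is given; the binning itself is done by `np.histogram`.

```python
import numpy as np

x = np.arange(9)

# Default binning: 10 equal-width bins spanning [x.min(), x.max()].
counts, edges = np.histogram(x)

# Every bin is 0.8 wide, so the bin edges drift away from the integers.
print(np.diff(edges))

# 9 distinct values spread over 10 bins: at least one bin has to be empty,
# which shows up as an irregular gap between the bars.
print(counts)
```

With one more bin than there are distinct integers, a gap somewhere in the middle is unavoidable, whatever the tick locations are.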
also to try out solutions, could you share the "Year" column you used above?
I think maybe when there are few unique values we shouldn't plot a histogram but a stem plot instead: https://matplotlib.org/stable/plot_types/basic/stem.html#sphx-glr-plot-types-basic-stem-py or treat the variable as categorical and do a bar plot 🤔 if there was some way to detect that the actual values don't matter too much besides their ordering
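A rough sketch of the two options, side by side, for a column with few unique values. The year data is made up for illustration, and the `Agg` backend is only selected so the snippet runs without a display; none of this is how TableReport currently draws its plots.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Made-up "Year"-like column with a gap at 2022.
x = np.array([2019, 2019, 2020, 2021, 2021, 2021, 2023])
values, counts = np.unique(x, return_counts=True)

fig, (ax_stem, ax_bar) = plt.subplots(1, 2, figsize=(6, 2))

# Option 1: stem plot, one stem per unique value at its true position.
ax_stem.stem(values, counts)
ax_stem.set_xticks(values)

# Option 2: treat the values as ordered categories, one evenly spaced bar each.
positions = np.arange(len(values))
ax_bar.bar(positions, counts)
ax_bar.set_xticks(positions)
ax_bar.set_xticklabels([str(v) for v in values])
```

The stem plot keeps the gap at 2022 visible, while the categorical bar plot hides it and keeps only the ordering, which is the trade-off mentioned above.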
FWIW, the misalignment between bins and labels is something I've seen in general matplotlib use, so I don't know how it could be addressed specifically in the TableReport
I like the idea of using stem plots
Maybe we could derive good heuristics using np.histogram and plt.bar instead of plt.hist directly
sure, I don't think it will make much of a difference -- plt.hist just forwards all arguments to np.histogram
What I meant is that we might have better control of the bins by decoupling the histogram computation from the bar plot. I don't have anything against stem plots though, as long as they are easy to see on small plots
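One way the decoupling could look, as a minimal sketch: compute the counts with `np.histogram` using bin edges placed halfway between integers, then draw them with `ax.bar`. The helper name `integer_histogram` is hypothetical, and the approach assumes integer-valued data with a small range (one bin per integer).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np


def integer_histogram(x):
    """Hypothetical helper: one bin per integer, centered on that integer."""
    x = np.asarray(x)
    low, high = int(x.min()), int(x.max())
    # Edges at low - 0.5, low + 0.5, ..., high + 0.5.
    edges = np.arange(low, high + 2) - 0.5
    counts, _ = np.histogram(x, bins=edges)
    centers = np.arange(low, high + 1)
    return centers, counts


x = np.arange(9)
centers, counts = integer_histogram(x)

fig, ax = plt.subplots()
ax.bar(centers, counts, width=0.9)  # bars sit exactly on the integers
ax.set_xticks(centers)
```

On the reproducer this gives nine bars of equal height with no gap, since each integer gets its own bin.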
one question with the stem plots is how to handle outliers -- add a red stem on the side of the axis?
btw here's another example in the "day of the week" column in this other issue
This is where I prefer bars as well, although a red stem thingy looks fine I guess
Problem Description
The xticks locations of integer distributions are often off, spacing the bars irregularly, which looks visually inconsistent.
The years in the plot above are floats, but converting to integers doesn't help.
Feature Description
We could display the bars with regularity for integers (and floats?), especially when the number of bins is < 10. We can come up with a simple heuristic/fix at first
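As a starting point for such a heuristic, here is a sketch of a check that a column is integer-valued (even if stored as floats, like the years above) and spans fewer than 10 integers. The function name and the `max_bins` parameter are hypothetical; only the threshold of 10 comes from the description above.

```python
import numpy as np


def looks_like_small_integer_range(x, max_bins=10):
    """Hypothetical check: integer-valued data spanning fewer than max_bins integers."""
    x = np.asarray(x, dtype=float)
    x = x[np.isfinite(x)]  # ignore NaN and infinite entries
    if x.size == 0:
        return False
    is_integral = np.all(x == np.round(x))
    span = x.max() - x.min() + 1  # number of integers between min and max
    return bool(is_integral and span < max_bins)


print(looks_like_small_integer_range(np.arange(9)))        # integer column
print(looks_like_small_integer_range([2019.0, 2021.0]))    # floats holding years
print(looks_like_small_integer_range([0.5, 1.5, 2.5]))     # genuinely fractional
```

Columns that pass the check could get one regularly spaced bar per integer, and everything else could keep the current histogram.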
Alternative Solutions
.
Additional Context
skrub 0.4.0 :))