-
Notifications
You must be signed in to change notification settings - Fork 795
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs(example): Adds Confidence Interval Ellipses #3747
base: main
Are you sure you want to change the base?
Conversation
example showing bivariate deviation ellipses of petal length and width of three iris species
Happy with the end result, but not comfortable merging so much complexity I don't understand yet #3715
`scipy` is only used for one example in the user guide, but this will be the second https://docs.scipy.org/doc/scipy/release/1.15.0-notes.html#other-changes
Cannot express how relieved I am to see the CI finally green 😅 |
def pd_ellipse( | ||
df: pd.DataFrame, col_x: str, col_y: str, col_group: str | ||
) -> pd.DataFrame: | ||
cols = col_x, col_y | ||
groups = [] | ||
# TODO: Rewrite in a more readable way | ||
categories = df[col_group].unique() | ||
for category in categories: | ||
sliced = df.loc[df[col_group] == category, cols] | ||
ell_df = pd.DataFrame(np_ellipse(sliced.to_numpy()), columns=cols) # type: ignore | ||
ell_df[col_group] = category | ||
groups.append(ell_df) | ||
return pd.concat(groups).reset_index() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO
- Figure out a more ergonomic way of applying the function to each group
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarcoGorelli not an urgent one.
Do you know of a more idiomatic way to write this pandas
code?
Based on this from 7 years ago:
altair/tests/examples_arguments_syntax/deviation_ellipses.py
Lines 37 to 45 in a22e8dc
columns = ['petalLength', 'petalWidth'] | |
petal_ellipse = [] | |
for species in iris.species.unique(): | |
ell_df = pd.DataFrame(ellipse(X=iris.loc[iris.species == species, columns].as_matrix()), | |
columns = columns) | |
ell_df['species'] = species | |
petal_ellipse.append(ell_df) | |
petal_ellipse = pd.concat(petal_ellipse, axis=0).reset_index() |
Personally I'd rather use polars
, but the pandas
dependency is already there due to https://github.com/altair-viz/vega_datasets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
polars
version
Could probably be reduced a bit further.
- Using
pl.DataFrame.partition_by
works- But needing to handle
dict[tuple[str, ...], pl.DataFrame]
for a single key seems like a code smell
- But needing to handle
def pl_ellipse(
df: pl.DataFrame, col_x: str, col_y: str, col_group: str
) -> pl.DataFrame:
parts = df.select(col_x, col_y, col_group).partition_by(
col_group, as_dict=True, include_key=False
)
return pl.concat(
pl.DataFrame(np_ellipse(group.to_numpy()), [col_x, col_y])
.with_columns(pl.lit(k[0]).alias(col_group))
.with_row_index()
for k, group in parts.items()
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey - i haven't looked into ellipse
, but the pattern of creating a list of dataframes and then concatenating is what pandas recommends (as opposed to continuously concatenating in the for loop)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey - i haven't looked into
ellipse
, but the pattern of creating a list of dataframes and then concatenating is what pandas recommends (as opposed to continuously concatenating in the for loop)
Thanks @MarcoGorelli
I mean - if nothing jumped out at you as a pandas
anti-pattern - then that's a good sign at least 🙂
Re: (ellipse
|np_ellipse
) the only relevant parts would be the signature:
from typing import TypeAlias
import numpy as np
_2DArray: TypeAlias = np.ndarray[tuple[int, int], np.dtype[np.float64]]
def np_ellipse(arr: _2DArray, segments: int = 50) -> _2DArray: ...
The segments
parameter controls the number of rows (elements) returned.
So there's a potential for changing shape before/after numpy
Observed no visible reduction in quality. Slightly visible at `<=40`
Previously returned `segments+1` rows, but this isn't specified in `ggplot2 https://github.com/tidyverse/ggplot2/blob/efc53cc000e7d86e3db22e1f43089d366fe24f2e/R/stat-ellipse.R#L122
I forgot that the only requirement was that the import is the **first statement**. Partially reverts (7cd2a77)
Also resolves #3747 (comment)
groups = [] | ||
# TODO: Rewrite in a more readable way | ||
categories = df[col_group].unique() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
groups = [] | |
# TODO: Rewrite in a more readable way | |
categories = df[col_group].unique() | |
groups = [] | |
categories = df[col_group].unique() |
Will close #3715
Description
Adds an example inspired by (
ggplot2
|plotnine
).stat_ellipse()
.As can be seen in the first commit, this PR began by rebasing a closed PR from almost 7 years ago.
Relevant info from (#3715 (comment)):
Example
Tasks
sphinxext.altairgallery
parsing (CI Run)numpy
andscipy
pandas
(see thread)examples_methods_syntax
versionalt.(X|Y)
changed to chain.scale(zero=False)
plotnine.stat_ellipse
Future Work
I think a more generalized version of this would be a good fit for https://github.com/vega/altair_ally.
An issue might be the
scipy
dependency, which I really was hoping to be able to avoid here.The dendrogram example shows some kind of inlining from
scipy
- but I have no idea if that is possible for: