docs(example): Adds Confidence Interval Ellipses #3747

dangotbanned · 2025-01-04T17:42:46Z

Will close #3715

Description

Adds an example inspired by (ggplot2|plotnine).stat_ellipse().

As can be seen in the first commit, this PR began by rebasing a closed PR from almost 7 years ago.

Deviation ellipses example #514

I believe plotnine.stat_ellipse would be an example of implementing this with numpy, scipy.
Source code

I also found an old closed PR (#514 by @essicolo) that would have added an example for this.
The blocker at the time is no longer an issue as (#3202 by @joelostblom) added scipy as a docs dependency.
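
For readers unfamiliar with what such an ellipse involves, here is a minimal numpy-only sketch. It is not the code in this PR: the function name `ellipse_sketch` and the `level` parameter are hypothetical. It also assumes a chi-square radius with 2 degrees of freedom, which has a closed form and so needs no scipy. As far as I understand, `stat_ellipse` derives its radius from an F-distribution quantile instead.

```python
import numpy as np


def ellipse_sketch(arr: np.ndarray, level: float = 0.95, segments: int = 50) -> np.ndarray:
    """Return `segments` points on a confidence ellipse for 2D data (rows are observations)."""
    center = arr.mean(axis=0)
    cov = np.cov(arr, rowvar=False)
    # Assumption: chi-square quantile with 2 degrees of freedom, which is -2 * ln(1 - level).
    radius = np.sqrt(-2.0 * np.log(1.0 - level))
    # The Cholesky factor maps the unit circle onto the shape of the covariance matrix.
    chol = np.linalg.cholesky(cov)
    angles = np.linspace(0.0, 2.0 * np.pi, segments, endpoint=False)
    unit_circle = np.column_stack((np.cos(angles), np.sin(angles)))
    return center + radius * unit_circle @ chol.T
```

Every returned point sits at the same Mahalanobis distance from the group mean, which is what makes the outline an ellipse aligned with the data's covariance.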

Example

Tasks

- Fix sphinxext.altairgallery parsing (CI Run)
  - (4944b80)
  - (7cd2a77)
- Finish tidying up from (1983ede)
  - numpy and scipy
  - pandas (see thread)
- Add examples_methods_syntax version
  - Only alt.(X|Y) changed to chain .scale(zero=False)
- docs: rename to 'Confidence Interval Ellipses'
- Provide more detail in the description (62927af)
  - Reference plotnine.stat_ellipse
  - Attributing author of original PR Serge-Étienne Parent (@essicolo)

Future Work

I think a more generalized version of this would be a good fit for https://github.com/vega/altair_ally.
An issue might be the scipy dependency, which I really was hoping to be able to avoid here.
The dendrogram example shows some kind of inlining from scipy - but I have no idea if that is possible for:


dangotbanned · 2025-01-04T19:32:14Z

Cannot express how relieved I am to see the CI finally green 😅
be087d2

dangotbanned · 2025-01-04T19:44:55Z

tests/examples_arguments_syntax/deviation_ellipses.py

+def pd_ellipse(
+    df: pd.DataFrame, col_x: str, col_y: str, col_group: str
+) -> pd.DataFrame:
+    cols = col_x, col_y
+    groups = []
+    # TODO: Rewrite in a more readable way
+    categories = df[col_group].unique()
+    for category in categories:
+        sliced = df.loc[df[col_group] == category, cols]
+        ell_df = pd.DataFrame(np_ellipse(sliced.to_numpy()), columns=cols) # type: ignore
+        ell_df[col_group] = category
+        groups.append(ell_df)
+    return pd.concat(groups).reset_index()


TODO

Figure out a more ergonomic way of applying the function to each group

@MarcoGorelli not an urgent one.

Do you know of a more idiomatic way to write this pandas code?

Based on this from 7 years ago:

altair/tests/examples_arguments_syntax/deviation_ellipses.py

Lines 37 to 45 in a22e8dc

    columns = ['petalLength', 'petalWidth']
    petal_ellipse = []
    for species in iris.species.unique():
        ell_df = pd.DataFrame(ellipse(X=iris.loc[iris.species == species, columns].as_matrix()),
                              columns = columns)
        ell_df['species'] = species
        petal_ellipse.append(ell_df)

    petal_ellipse = pd.concat(petal_ellipse, axis=0).reset_index()

Personally I'd rather use polars, but the pandas dependency is already there due to https://github.com/altair-viz/vega_datasets

polars version

Could probably be reduced a bit further.

Using pl.DataFrame.partition_by works

But needing to handle dict[tuple[str, ...], pl.DataFrame] for a single key seems like a code smell

    def pl_ellipse(
        df: pl.DataFrame, col_x: str, col_y: str, col_group: str
    ) -> pl.DataFrame:
        parts = df.select(col_x, col_y, col_group).partition_by(
            col_group, as_dict=True, include_key=False
        )
        return pl.concat(
            pl.DataFrame(np_ellipse(group.to_numpy()), [col_x, col_y])
            .with_columns(pl.lit(k[0]).alias(col_group))
            .with_row_index()
            for k, group in parts.items()
        )

hey - i haven't looked into ellipse, but the pattern of creating a list of dataframes and then concatenating is what pandas recommends (as opposed to continuously concatenating in the for loop)

Thanks @MarcoGorelli
I mean - if nothing jumped out at you as a pandas anti-pattern - then that's a good sign at least 🙂
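
For comparison, one possible variation on the list-then-concatenate pattern uses `DataFrame.groupby` to produce the slices. This is only a sketch: `np_ellipse_stub` is a hypothetical placeholder for `np_ellipse`, and `pd_ellipse_groupby` is not a function from this PR.

```python
import numpy as np
import pandas as pd


def np_ellipse_stub(arr: np.ndarray, segments: int = 50) -> np.ndarray:
    # Hypothetical placeholder for np_ellipse: repeats the group mean `segments` times.
    return np.tile(arr.mean(axis=0), (segments, 1))


def pd_ellipse_groupby(
    df: pd.DataFrame, col_x: str, col_y: str, col_group: str
) -> pd.DataFrame:
    cols = [col_x, col_y]
    groups = []
    # Iterating over a groupby yields (key, sub-frame) pairs, replacing the boolean mask.
    # sort=False keeps groups in order of first appearance, like .unique() does.
    for category, sliced in df.groupby(col_group, sort=False):
        ell_df = pd.DataFrame(np_ellipse_stub(sliced[cols].to_numpy()), columns=cols)
        ell_df[col_group] = category
        groups.append(ell_df)
    # Build the full list first, then concatenate once.
    return pd.concat(groups).reset_index()
```

The structure is unchanged from the loop in the diff; only the slicing step differs, so whether it reads better is a matter of taste.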

Re: (ellipse|np_ellipse) the only relevant parts would be the signature:

    from typing import TypeAlias

    import numpy as np

    _2DArray: TypeAlias = np.ndarray[tuple[int, int], np.dtype[np.float64]]

    def np_ellipse(arr: _2DArray, segments: int = 50) -> _2DArray: ...

The `segments` parameter controls the number of rows (elements) returned.
So the shape can differ before and after the numpy step.
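
A small sketch of that shape change, using a hypothetical stand-in (`outline_stub`) rather than the real `np_ellipse`. It assumes the points are generated from evenly spaced angles with the stop value excluded.

```python
import numpy as np


def outline_stub(arr: np.ndarray, segments: int = 50) -> np.ndarray:
    # Hypothetical stand-in for np_ellipse: a unit circle around the column means.
    # endpoint=False avoids repeating the first point, so exactly `segments` rows result.
    angles = np.linspace(0.0, 2.0 * np.pi, segments, endpoint=False)
    return arr.mean(axis=0) + np.column_stack((np.cos(angles), np.sin(angles)))


small = outline_stub(np.zeros((3, 2)))
large = outline_stub(np.ones((150, 2)))
print(small.shape, large.shape)  # (50, 2) (50, 2)
```

Because the output row count depends only on `segments`, the result cannot be assigned back column-wise onto the input frame, which is why each group gets its own new frame.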


dangotbanned · 2025-01-06T21:00:58Z

tests/examples_arguments_syntax/deviation_ellipses.py

+    groups = []
+    # TODO: Rewrite in a more readable way
+    categories = df[col_group].unique()


Suggested change

-    groups = []
-    # TODO: Rewrite in a more readable way
-    categories = df[col_group].unique()
+    groups = []
+    categories = df[col_group].unique()

Serge-Étienne Parent and others added 9 commits January 4, 2025 16:16

Create deviation_ellipses.py

a22e8dc

example showing bivariate deviation ellipses of petal length and width of three iris species

docs: Initial rewrite of (#514)

1983ede

Happy with the end result, but not comfortable merging so much complexity I don't understand yet #3715

ci(typing): Adds scipy-stubs to altair[doc]

dc7639d

`scipy` is only used for one example in the user guide, but this will be the second https://docs.scipy.org/doc/scipy/release/1.15.0-notes.html#other-changes

fix: Only install scipy-stubs on >=3.10

a296b82

chore(typing): Ignore incorrect pandas stubs

eb25871

ci(typing): ignore scipy on 3.9

279fca5

https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-library-stubs-or-py-typed-marker https://github.com/vega/altair/actions/runs/12612565960/job/35149436953?pr=3747

docs: Add missing category

4944b80

fix: Add missing support for from __future__ import annotations

7cd2a77

Fixes https://github.com/vega/altair/actions/runs/12612637008/job/35149593128?pr=3747#step:6:25

test: skip example when scipy not installed

be087d2

Temporary fix for https://github.com/vega/altair/actions/runs/12612997919/job/35150338097?pr=3747
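
As a rough illustration of the idea behind skipping an example when an optional dependency is missing, here is a standard-library sketch. The names `is_installed`, `OPTIONAL_DEPS`, and `should_skip` are hypothetical, and the actual change in be087d2 may work quite differently.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Hypothetical mapping from example file to the optional modules it needs.
OPTIONAL_DEPS = {"deviation_ellipses.py": ("scipy",)}


def should_skip(filename: str) -> bool:
    # Skip only when at least one required optional module is unavailable.
    return not all(is_installed(name) for name in OPTIONAL_DEPS.get(filename, ()))
```

A test runner could consult `should_skip` before executing each example file, so environments without scipy still pass.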

dangotbanned added 7 commits January 4, 2025 22:19

docs: reduce segments 100 -> 50

1623629

Observed no visible reduction in quality. Slightly visible at `<=40`

docs: Clean up numpy, scipy docs/comments

a357668

refactor: Simplify numpy transforms

ac34139

docs: add tooltip, increase size

e0e276b

fix: Remove incorrect range stop

dc0ae52

Previously returned `segments+1` rows, but this isn't specified in `ggplot2`: https://github.com/tidyverse/ggplot2/blob/efc53cc000e7d86e3db22e1f43089d366fe24f2e/R/stat-ellipse.R#L122

refactor: Remove special casing __future__ import

4969a98

I forgot that the only requirement was that the import is the **first statement**. Partially reverts (7cd2a77)

docs: Remove unused method code

dcb9fa5

Also resolves #3747 (comment)

dangotbanned added the documentation label Jan 5, 2025

dangotbanned changed the title ~~docs(DRAFT): Add Confidence Interval Ellipse example~~ docs: Add Confidence Interval Ellipse example Jan 5, 2025

dangotbanned changed the title ~~docs: Add Confidence Interval Ellipse example~~ docs(example): Adds Confidence Interval Ellipses Jan 5, 2025

dangotbanned added 4 commits January 5, 2025 17:18

docs: rename to 'Confidence Interval Ellipses'

bd4e30f

docs: add references to description

62927af

Merge branch 'main' into vegalite_v2_examples

250d2b4

Merge branch 'main' into vegalite_v2_examples

fadb2e3

dangotbanned requested a review from joelostblom January 6, 2025 21:05

Merge branch 'main' into vegalite_v2_examples

5886aae
