feat: Use HTML to present warnings of train_test_split when in notebooks #1060

rouk1 · 2025-01-08T18:41:34Z

This PR modifies the behavior of train_test_split.
From now on, in a notebook context (VS Code or Jupyter), messages are no longer emitted as warnings but rendered as HTML, in order to avoid horizontal scrolling.
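
For readers who want a concrete picture of that dispatch, here is a minimal sketch. It is not the code of this PR: `in_notebook`, `render_warning_html`, `notify` and the HTML markup are hypothetical names chosen for illustration, and the kernel check is a common heuristic rather than skore's own `is_environment_html_capable`.

```python
import html
import warnings


def in_notebook():
    """Best-effort check for a Jupyter-style kernel (hypothetical helper)."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False
    shell = get_ipython()
    # Jupyter and VS Code notebooks typically run a ZMQ-based IPython kernel.
    return shell is not None and type(shell).__name__ == "ZMQInteractiveShell"


def render_warning_html(title, message):
    """Build a small HTML fragment; the markup is illustrative only."""
    return (
        '<div class="tts-warning">'
        f"<strong>{html.escape(title)}</strong>"
        f"<p>{html.escape(message)}</p>"
        "</div>"
    )


def notify(title, message, category=UserWarning):
    """Show HTML in a notebook, otherwise fall back to a regular warning."""
    if in_notebook():
        from IPython.display import HTML, display

        display(HTML(render_warning_html(title, message)))
    else:
        warnings.warn(message, category=category, stacklevel=2)
```

Outside a notebook, `notify("Shuffle True Warning", "...")` behaves like a plain `warnings.warn` call, so existing warning filters keep working there.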

Why do this?

The train_test_split feature is dedicated to the experimentation phase and has an educational purpose. We anticipate that most of our users will use notebooks or an interactive context. Some will disable warnings...
Therefore, I believe that warning messages are not the right communication medium during the experimentation phase. Additionally, they are not very visually appealing. While waiting to create a proper TrainTestSplitReport, this PR proposes an alternative.

UI preview

jupyter.mp4

vscode.mp4

… or jupyter

github-actions · 2025-01-08T18:43:47Z

pytest coverage report

File	Stmts	Miss	Cover	Missing
src/skore
__init__.py	12	0	100%
__main__.py	8	1	1	80%
exceptions.py	3	0	100%
src/skore/cli
__init__.py	5	0	100%
cli.py	33	0	100%
color_format.py	43	3	2	90%
launch_dashboard.py	26	15	0	39%
quickstart_command.py	14	0	100%
src/skore/externals
__init__.py	0	0	100%
_sklearn_compat.py	220	183	4	15%
src/skore/item
__init__.py	21	0	100%
cross_validation_item.py	137	10	2	93%
item.py	41	13	0	68%
item_repository.py	42	2	1	93%
media_item.py	70	4	1	94%
numpy_array_item.py	25	1	1	93%
pandas_dataframe_item.py	34	1	1	95%
pandas_series_item.py	34	1	1	95%
polars_dataframe_item.py	32	1	1	94%
polars_series_item.py	27	1	1	94%
primitive_item.py	27	2	1	92%
sklearn_base_estimator_item.py	33	1	1	95%
skrub_table_report_item.py	10	1	1	86%
src/skore/persistence
__init__.py	0	0	100%
abstract_storage.py	22	1	0	95%
disk_cache_storage.py	33	1	1	95%
in_memory_storage.py	20	0	100%
src/skore/project
__init__.py	3	0	100%
create.py	52	8	0	88%
load.py	23	3	0	89%
open.py	14	0	100%
project.py	64	4	4	91%
src/skore/sklearn
__init__.py	3	0	100%
find_ml_task.py	35	3	2	89%
types.py	2	0	100%
src/skore/sklearn/cross_validation
__init__.py	2	0	100%
cross_validation_helpers.py	47	4	1	90%
cross_validation_reporter.py	35	1	1	95%
src/skore/sklearn/cross_validation/plots
__init__.py	0	0	100%
compare_scores_plot.py	29	1	2	92%
timing_plot.py	29	1	1	94%
src/skore/sklearn/train_test_split
__init__.py	0	0	100%
train_test_split.py	48	9	2	81%
src/skore/sklearn/train_test_split/warning
__init__.py	8	0	100%
high_class_imbalance_too_few_examples_warning.py	17	3	2	78%
high_class_imbalance_warning.py	18	2	1	88%
random_state_unset_warning.py	11	1	1	87%
shuffle_true_warning.py	9	0	1	91%
stratify_is_set_warning.py	11	1	1	87%
time_based_column_warning.py	22	1	2	89%
train_test_split_warning.py	5	1	0	80%
src/skore/ui
__init__.py	0	0	100%
app.py	25	5	2	71%
dependencies.py	7	1	0	86%
project_routes.py	50	0	100%
src/skore/utils
__init__.py	0	0	100%
_environment.py	26	10	1	50%
_logger.py	21	4	0	84%
_show_versions.py	31	0	100%
src/skore/view
__init__.py	0	0	100%
view.py	5	0	100%
view_repository.py	16	2	1	83%
TOTAL	1640	307	80%

Tests	Skipped	Failures	Errors	Time
259	0 💤	0 ❌	0 🔥	34.873s ⏱️

glemaitre · 2025-01-08T21:50:16Z

It might be fine temporarily. My short-term vision is that those warnings should be collected and added to the EstimatorReport, then accessed through an accessor, so that the user pulls the information instead of us throwing stuff at them.

sylvaincom · 2025-01-09T15:20:50Z

Thanks, could we also display the title of the warning such as ShuffleTrueWarning? To kind of have a key takeaway for the user

github-actions · 2025-01-09T17:13:36Z

Coverage Report for backend

File	Stmts	Miss	Cover	Missing
venv/lib/python3.12/site-packages/skore
__init__.py	12	0	100%
__main__.py	8	1	80%	19
exceptions.py	3	0	100%
venv/lib/python3.12/site-packages/skore/cli
__init__.py	5	0	100%
cli.py	33	3	85%	104, 111, 117
color_format.py	43	3	90%	35–>40, 41–43
launch_dashboard.py	26	15	39%	36–57
quickstart_command.py	14	7	50%	37–51
venv/lib/python3.12/site-packages/skore/item
__init__.py	21	0	100%
cross_validation_item.py	137	10	93%	27–42, 370
item.py	41	13	68%	85, 88, 92–112
item_repository.py	42	2	93%	12–13
media_item.py	70	4	94%	15–18
numpy_array_item.py	25	1	93%	15
pandas_dataframe_item.py	34	1	95%	15
pandas_series_item.py	34	1	95%	15
polars_dataframe_item.py	32	1	94%	15
polars_series_item.py	27	1	94%	15
primitive_item.py	27	2	92%	13–15
sklearn_base_estimator_item.py	33	1	95%	15
skrub_table_report_item.py	10	1	86%	11
venv/lib/python3.12/site-packages/skore/persistence
__init__.py	0	0	100%
abstract_storage.py	22	1	95%	130
disk_cache_storage.py	33	1	95%	44
in_memory_storage.py	20	0	100%
venv/lib/python3.12/site-packages/skore/project
__init__.py	3	0	100%
create.py	52	8	88%	116–122, 132–133, 140–141
load.py	23	3	89%	43–45
open.py	14	0	100%
project.py	64	4	91%	135, 149, 183, 187
venv/lib/python3.12/site-packages/skore/sklearn
__init__.py	4	0	100%
find_ml_task.py	35	1	95%	41–>49, 50
types.py	2	0	100%
venv/lib/python3.12/site-packages/skore/sklearn/_estimator
__init__.py	10	0	100%
base.py	76	2	98%	87–88
metrics_accessor.py	198	2	98%	131, 266
report.py	165	1	97%	145–>151, 147–>149, 150, 153–>155, 159–>163, 408–>413
utils.py	11	11	0%	1–19
venv/lib/python3.12/site-packages/skore/sklearn/_plot
__init__.py	4	0	100%
precision_recall_curve.py	126	2	97%	200–>203, 313–314
prediction_error.py	75	0	99%	289–>297
roc_curve.py	95	3	94%	156, 167–>170, 223–224
utils.py	77	0	100%
venv/lib/python3.12/site-packages/skore/sklearn/cross_validation
__init__.py	2	0	100%
cross_validation_helpers.py	47	4	90%	104–>136, 123–126
cross_validation_reporter.py	35	1	95%	177
venv/lib/python3.12/site-packages/skore/sklearn/cross_validation/plots
__init__.py	0	0	100%
compare_scores_plot.py	29	1	92%	10, 45–>48
timing_plot.py	29	1	94%	10
venv/lib/python3.12/site-packages/skore/sklearn/train_test_split
__init__.py	0	0	100%
train_test_split.py	49	9	82%	18–19, 215–234
venv/lib/python3.12/site-packages/skore/sklearn/train_test_split/warning
__init__.py	8	0	100%
high_class_imbalance_too_few_examples_warning.py	17	3	78%	16–18, 80
high_class_imbalance_warning.py	18	2	88%	16–18
random_state_unset_warning.py	11	1	87%	15
shuffle_true_warning.py	9	0	91%	44–>exit
stratify_is_set_warning.py	11	1	87%	15
time_based_column_warning.py	22	1	89%	17, 69–>exit
train_test_split_warning.py	5	1	80%	21
venv/lib/python3.12/site-packages/skore/ui
__init__.py	0	0	100%
app.py	25	5	71%	24, 53–58
dependencies.py	7	1	86%	12
project_routes.py	50	0	100%
venv/lib/python3.12/site-packages/skore/utils
__init__.py	0	0	100%
_accessor.py	7	0	100%
_environment.py	26	10	50%	24–30, 36–40
_logger.py	21	4	84%	14–18
_show_versions.py	31	0	100%
venv/lib/python3.12/site-packages/skore/view
__init__.py	0	0	100%
view.py	5	0	100%
view_repository.py	16	2	83%	8–9
TOTAL	2266	153	92%

Tests	Skipped	Failures	Errors	Time
349	0 💤	0 ❌	0 🔥	44.338s ⏱️

github-actions · 2025-01-09T17:14:31Z

Documentation preview @ a7bf2d3

rouk1 · 2025-01-09T17:15:19Z

> Thanks, could we also display the title of the warning such as ShuffleTrueWarning? To kind of have a key takeaway for the user

What do you think?

vscode

jupyter

sylvaincom · 2025-01-09T18:25:46Z

Thanks! LGTM

What if there are several warnings?

glemaitre · 2025-01-09T23:20:31Z

skore/src/skore/sklearn/train_test_split/train_test_split.py

+    if is_environment_html_capable():
+        with contextlib.suppress(ImportError):
+            from IPython.core.interactiveshell import InteractiveShell
+            from IPython.display import HTML, display
+            from markdown import markdown
+
+            if InteractiveShell.initialized():
+                markup = "".join(
+                    [
+                        HTML_WARNING_TEMPLATE.format(
+                            warning=markdown(warning),
+                            warning_class=re.sub(
+                                "([A-Z][a-z]+)",
+                                r" \1",
+                                re.sub("([A-Z]+)", r" \1", warning_class.__name__),
+                            ).lstrip(),
+                        )
+                        for warning, warning_class in to_display
+                    ]
+                )
+                display(HTML(markup))
+    else:
+        for warning, warning_class in to_display:
            warnings.warn(message=warning, category=warning_class, stacklevel=1)
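
As a side note on the two nested `re.sub` calls in the diff above, the following sketch shows what they do to a class name. `humanize_class_name` is a hypothetical helper, and `"HTMLWarning"` is a made-up name used only to show how a run of capitals is handled. Applied as in the diff, the two passes leave doubled spaces between words (harmless in HTML, which collapses whitespace); the sketch normalizes them so the result is also usable in plain text.

```python
import re


def humanize_class_name(name):
    """Turn a CamelCase class name into space-separated words."""
    # First pass: put a space before every run of capital letters.
    spaced = re.sub("([A-Z]+)", r" \1", name)
    # Second pass: put a space before every capitalized word, which
    # detaches the last capital of an acronym from the acronym itself.
    spaced = re.sub("([A-Z][a-z]+)", r" \1", spaced)
    # The two passes can stack spaces, so collapse them.
    return " ".join(spaced.split())


print(humanize_class_name("ShuffleTrueWarning"))  # Shuffle True Warning
print(humanize_class_name("HTMLWarning"))  # HTML Warning
```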


An alternative that would work in both HTML and in prompt would be to leverage rich:

    from skore import console  # avoid circular import
    from rich.panel import Panel

    for warning_class in TRAIN_TEST_SPLIT_WARNINGS:
        warning = warning_class.check(**kwargs)

        if warning is not None:
            # Only check if warning is filtered/ignored
            if not warnings.filters or not any(
                f[0] == "ignore" and f[2] == warning_class for f in warnings.filters
            ):
                console.print(
                    Panel(
                        title=warning_class.__name__,
                        renderable=warning,
                        style="orange1",
                        border_style="cyan",
                    )
                )

It would provide the following outputs:

And here it is configured so that if someone filters the warnings with ignore, we don't show them.

It might be a bit less invasive than checking for the environment (while this code could be useful for some other stuff along the road).
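
One detail of the filter check in the snippet above may be worth illustrating. It compares the filter category with `==`, so a broader filter such as `ignore` on `UserWarning` would not silence a subclass. The sketch below contrasts that with a subclass-aware lookup. Both function names are hypothetical, the local `ShuffleTrueWarning` class is a stand-in for the real one, and the lookup deliberately ignores the message and module patterns that real filters can also carry.

```python
import warnings


class ShuffleTrueWarning(UserWarning):
    """Stand-in for the warning class named in this thread."""


def ignored_exact(warning_class):
    """Equality-based check, mirroring the condition in the snippet above."""
    return any(
        f[0] == "ignore" and f[2] == warning_class for f in warnings.filters
    )


def ignored_subclass(warning_class):
    """First-match lookup that also honors filters set on a parent class."""
    # Each filter is (action, message, category, module, lineno).
    for action, _message, category, _module, _lineno in warnings.filters:
        if issubclass(warning_class, category):
            return action == "ignore"
    return False


with warnings.catch_warnings():
    warnings.resetwarnings()
    warnings.simplefilter("ignore", category=UserWarning)
    print(ignored_exact(ShuffleTrueWarning))  # False
    print(ignored_subclass(ShuffleTrueWarning))  # True
```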

Using rich everywhere does provide coherence (with the EstimatorReport of #997)

Can rich display False with code style for example?

> Can rich display False with code style for example?

Yes, rich is capable of displaying markdown.

I hadn't considered going purely textual; it's cool and lightweight. The only downside is that we can't match hosting styles.
I'm happy to refactor that way. @MarieS-WiMLDS can you please settle this?

What do you mean by matching hosting style? Is it light/dark theme, or is it about the font?
I really don't mind if we don't have the default font of the user, it's more troublesome if it's about the theme, because it might affect readability.

And note that we have a similar limitation with the logging when creating a project, etc., where we use rich.

Thanks for your insights, anyway an orange background seems anti-UX, let's move on with rich. Should we close this PR and open a new one for rich? How do you wish to proceed with rich?

Agree with Sylvain, it's very unlikely for a user to have an orange background!
The "we can do it later" stuff tends to never get done, especially when it's about having clean code, so I'd prefer that we go for rich before merging, if it's compatible with your deadline of Tuesday evening.

You're the best ⚡️

sylvaincom · 2025-01-10T16:23:34Z

Superseded by ~~#1060~~ #1086

closes #1060. It is the alternative to #1060 using `rich`. I added a test to check that we can filter the warning, since we are not using the usual `warnings` module. In the future, we could factor out the code into a utils module to make sure that we can also turn the warnings into errors.
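
A possible shape for such a utility, sketched under assumptions: `resolve_action` and `emit` are hypothetical names, `print` stands in for the `rich` console, and the local `RandomStateUnsetWarning` class is a stand-in for the real one. The idea is to look up the action the `warnings` filters would apply and to honor `"ignore"` and `"error"` even though the message is printed rather than passed to `warnings.warn`.

```python
import warnings


class RandomStateUnsetWarning(UserWarning):
    """Stand-in for one of the train_test_split warning classes."""


def resolve_action(warning_class):
    """Return the action of the first filter matching the category."""
    # Simplification: message and module patterns are not taken into account.
    for action, _message, category, _module, _lineno in warnings.filters:
        if issubclass(warning_class, category):
            return action
    return "default"


def emit(message, warning_class, printer=print):
    """Print the message unless filters say to ignore it or to raise."""
    action = resolve_action(warning_class)
    if action == "ignore":
        return False
    if action == "error":
        # Same effect as warnings.simplefilter("error") on a regular warning.
        raise warning_class(message)
    printer(f"{warning_class.__name__}: {message}")
    return True
```

With `warnings.simplefilter("error", category=RandomStateUnsetWarning)` in effect, `emit` raises instead of printing, which is the behavior a test could assert on.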

feat: Use HTML to present warnings of train_test_split when in vscode…

7490f19

… or jupyter

github-actions bot assigned rouk1 Jan 8, 2025

rouk1 changed the title ~~feat: Use HTML to present warnings of train_test_split when in vscode…~~ feat: Use HTML to present warnings of train_test_split when in notebooks Jan 8, 2025

rouk1 added 2 commits January 9, 2025 11:20

use environement css vars as much as possible

cb20147

Merge branch 'main' into train-test-split-html-warnings

c5101dd

rouk1 marked this pull request as ready for review January 9, 2025 10:33

add warning title

9e94136

sylvaincom self-requested a review January 9, 2025 18:29

glemaitre reviewed Jan 9, 2025

rouk1 added 2 commits January 10, 2025 10:34

fix bottom padding in jupyter

2195aad

Merge branch 'main' into train-test-split-html-warnings

a7bf2d3

glemaitre mentioned this pull request Jan 10, 2025

feat: Use rich Panel for showing warning in train_test_split #1086

Merged

sylvaincom closed this Jan 10, 2025

rouk1 deleted the train-test-split-html-warnings branch January 13, 2025 09:34
