feat: Use HTML to present warnings of train_test_split when in notebooks #1060

Closed
wants to merge 6 commits into from

rouk1
Contributor

rouk1 commented Jan 8, 2025

This PR modifies the behavior of train_test_split.
From now on, in a notebook context (VS Code or Jupyter), messages are displayed as HTML instead of being emitted as warnings, in order to avoid horizontal scrolling.
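A minimal sketch of the behavior described above, assuming a Jupyter kernel (which VS Code notebooks also use) can be recognized by its `ZMQInteractiveShell` class name. `in_notebook` and `report` are hypothetical helpers, not skore's `is_environment_html_capable`, and the markup is a placeholder rather than the PR's HTML template.

```python
import html
import warnings


def in_notebook() -> bool:
    """Heuristic: True inside a Jupyter kernel, which VS Code notebooks also use."""
    try:
        from IPython import get_ipython
    except ImportError:  # IPython is not installed
        return False
    shell = get_ipython()  # None when no IPython shell is running
    return shell is not None and type(shell).__name__ == "ZMQInteractiveShell"


def report(messages: list[tuple[str, type[Warning]]]) -> None:
    """Show (message, category) pairs as HTML in a notebook, as warnings elsewhere."""
    if in_notebook():
        from IPython.display import HTML, display

        markup = "".join(
            f"<p><strong>{html.escape(category.__name__)}</strong>: "
            f"{html.escape(message)}</p>"
            for message, category in messages
        )
        display(HTML(markup))
    else:
        for message, category in messages:
            warnings.warn(message, category=category, stacklevel=2)
```

Outside a notebook this falls back to the regular `warnings` machinery, so users who filter warnings keep that control in scripts.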

Why do this?

The train_test_split feature is dedicated to an experimentation phase and has an educational purpose. We anticipate that most of our users will work in notebooks or another interactive context. Some will simply disable warnings.
Therefore, I believe that warning messages are not the right communication medium during the experimentation phase. Additionally, they are not very visually appealing. Until we create a proper TrainTestSplitReport, this PR proposes an alternative.

UI preview

jupyter.mp4
vscode.mp4

rouk1 changed the title from "feat: Use HTML to present warnings of train_test_split when in vscode…" to "feat: Use HTML to present warnings of train_test_split when in notebooks" on Jan 8, 2025
Contributor

github-actions bot commented Jan 8, 2025

Coverage

pytest coverage report
FileStmtsMissCoverMissing
src/skore
   __init__.py120100% 
   __main__.py811 80%
   exceptions.py30100% 
src/skore/cli
   __init__.py50100% 
   cli.py330100% 
   color_format.py4332 90%
   launch_dashboard.py26150 39%
   quickstart_command.py140100% 
src/skore/externals
   __init__.py00100% 
   _sklearn_compat.py2201834 15%
src/skore/item
   __init__.py210100% 
   cross_validation_item.py137102 93%
   item.py41130 68%
   item_repository.py4221 93%
   media_item.py7041 94%
   numpy_array_item.py2511 93%
   pandas_dataframe_item.py3411 95%
   pandas_series_item.py3411 95%
   polars_dataframe_item.py3211 94%
   polars_series_item.py2711 94%
   primitive_item.py2721 92%
   sklearn_base_estimator_item.py3311 95%
   skrub_table_report_item.py1011 86%
src/skore/persistence
   __init__.py00100% 
   abstract_storage.py2210 95%
   disk_cache_storage.py3311 95%
   in_memory_storage.py200100% 
src/skore/project
   __init__.py30100% 
   create.py5280 88%
   load.py2330 89%
   open.py140100% 
   project.py6444 91%
src/skore/sklearn
   __init__.py30100% 
   find_ml_task.py3532 89%
   types.py20100% 
src/skore/sklearn/cross_validation
   __init__.py20100% 
   cross_validation_helpers.py4741 90%
   cross_validation_reporter.py3511 95%
src/skore/sklearn/cross_validation/plots
   __init__.py00100% 
   compare_scores_plot.py2912 92%
   timing_plot.py2911 94%
src/skore/sklearn/train_test_split
   __init__.py00100% 
   train_test_split.py4892 81%
src/skore/sklearn/train_test_split/warning
   __init__.py80100% 
   high_class_imbalance_too_few_examples_warning.py1732 78%
   high_class_imbalance_warning.py1821 88%
   random_state_unset_warning.py1111 87%
   shuffle_true_warning.py901 91%
   stratify_is_set_warning.py1111 87%
   time_based_column_warning.py2212 89%
   train_test_split_warning.py510 80%
src/skore/ui
   __init__.py00100% 
   app.py2552 71%
   dependencies.py710 86%
   project_routes.py500100% 
src/skore/utils
   __init__.py00100% 
   _environment.py26101 50%
   _logger.py2140 84%
   _show_versions.py310100% 
src/skore/view
   __init__.py00100% 
   view.py50100% 
   view_repository.py1621 83%
TOTAL164030780% 

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 259 | 0 💤 | 0 ❌ | 0 🔥 | 34.873s ⏱️ |

@glemaitre
Member

glemaitre commented Jan 8, 2025

It might be fine temporarily. My short-term vision is that these warnings should be collected, added to the EstimatorReport, and accessed through an accessor, so that the user pulls the information instead of us throwing it at them.
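One way to picture this proposal, as a hypothetical sketch: warnings raised while building a report are captured with `warnings.catch_warnings(record=True)` and stored on the report, and the user reads them through an attribute when they choose to. `ReportSketch`, `messages`, and `run_and_collect` are illustrative names, not skore's `EstimatorReport` API.

```python
import warnings
from dataclasses import dataclass, field


@dataclass
class ReportSketch:
    """Hypothetical report that stores warnings instead of emitting them."""

    result: object = None
    collected: list = field(default_factory=list)

    @property
    def messages(self) -> list:
        # The user pulls the warnings when they want them.
        return list(self.collected)


def run_and_collect(func, *args, **kwargs) -> ReportSketch:
    """Call `func` and attach every warning it raises to the returned report."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record repeated warnings too
        result = func(*args, **kwargs)
    return ReportSketch(
        result=result,
        collected=[(w.category.__name__, str(w.message)) for w in caught],
    )


def noisy_step():
    warnings.warn("example message", UserWarning)
    return "done"


report = run_and_collect(noisy_step)
print(report.result)    # done
print(report.messages)  # [('UserWarning', 'example message')]
```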

rouk1 marked this pull request as ready for review January 9, 2025 10:33
@sylvaincom
Contributor

Thanks! Could we also display the title of the warning, such as ShuffleTrueWarning, to give the user a key takeaway?

Contributor

github-actions bot commented Jan 9, 2025

Coverage

Coverage Report for backend
| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| venv/lib/python3.12/site-packages/skore | | | | |
| __init__.py | 12 | 0 | 100% | |
| __main__.py | 8 | 1 | 80% | 19 |
| exceptions.py | 3 | 0 | 100% | |
| venv/lib/python3.12/site-packages/skore/cli | | | | |
| __init__.py | 5 | 0 | 100% | |
| cli.py | 33 | 3 | 85% | 104, 111, 117 |
| color_format.py | 43 | 3 | 90% | 35–>40, 41–43 |
| launch_dashboard.py | 26 | 15 | 39% | 36–57 |
| quickstart_command.py | 14 | 7 | 50% | 37–51 |
| venv/lib/python3.12/site-packages/skore/item | | | | |
| __init__.py | 21 | 0 | 100% | |
| cross_validation_item.py | 137 | 10 | 93% | 27–42, 370 |
| item.py | 41 | 13 | 68% | 85, 88, 92–112 |
| item_repository.py | 42 | 2 | 93% | 12–13 |
| media_item.py | 70 | 4 | 94% | 15–18 |
| numpy_array_item.py | 25 | 1 | 93% | 15 |
| pandas_dataframe_item.py | 34 | 1 | 95% | 15 |
| pandas_series_item.py | 34 | 1 | 95% | 15 |
| polars_dataframe_item.py | 32 | 1 | 94% | 15 |
| polars_series_item.py | 27 | 1 | 94% | 15 |
| primitive_item.py | 27 | 2 | 92% | 13–15 |
| sklearn_base_estimator_item.py | 33 | 1 | 95% | 15 |
| skrub_table_report_item.py | 10 | 1 | 86% | 11 |
| venv/lib/python3.12/site-packages/skore/persistence | | | | |
| __init__.py | 0 | 0 | 100% | |
| abstract_storage.py | 22 | 1 | 95% | 130 |
| disk_cache_storage.py | 33 | 1 | 95% | 44 |
| in_memory_storage.py | 20 | 0 | 100% | |
| venv/lib/python3.12/site-packages/skore/project | | | | |
| __init__.py | 3 | 0 | 100% | |
| create.py | 52 | 8 | 88% | 116–122, 132–133, 140–141 |
| load.py | 23 | 3 | 89% | 43–45 |
| open.py | 14 | 0 | 100% | |
| project.py | 64 | 4 | 91% | 135, 149, 183, 187 |
| venv/lib/python3.12/site-packages/skore/sklearn | | | | |
| __init__.py | 4 | 0 | 100% | |
| find_ml_task.py | 35 | 1 | 95% | 41–>49, 50 |
| types.py | 2 | 0 | 100% | |
| venv/lib/python3.12/site-packages/skore/sklearn/_estimator | | | | |
| __init__.py | 10 | 0 | 100% | |
| base.py | 76 | 2 | 98% | 87–88 |
| metrics_accessor.py | 198 | 2 | 98% | 131, 266 |
| report.py | 165 | 1 | 97% | 145–>151, 147–>149, 150, 153–>155, 159–>163, 408–>413 |
| utils.py | 11 | 11 | 0% | 1–19 |
| venv/lib/python3.12/site-packages/skore/sklearn/_plot | | | | |
| __init__.py | 4 | 0 | 100% | |
| precision_recall_curve.py | 126 | 2 | 97% | 200–>203, 313–314 |
| prediction_error.py | 75 | 0 | 99% | 289–>297 |
| roc_curve.py | 95 | 3 | 94% | 156, 167–>170, 223–224 |
| utils.py | 77 | 0 | 100% | |
| venv/lib/python3.12/site-packages/skore/sklearn/cross_validation | | | | |
| __init__.py | 2 | 0 | 100% | |
| cross_validation_helpers.py | 47 | 4 | 90% | 104–>136, 123–126 |
| cross_validation_reporter.py | 35 | 1 | 95% | 177 |
| venv/lib/python3.12/site-packages/skore/sklearn/cross_validation/plots | | | | |
| __init__.py | 0 | 0 | 100% | |
| compare_scores_plot.py | 29 | 1 | 92% | 10, 45–>48 |
| timing_plot.py | 29 | 1 | 94% | 10 |
| venv/lib/python3.12/site-packages/skore/sklearn/train_test_split | | | | |
| __init__.py | 0 | 0 | 100% | |
| train_test_split.py | 49 | 9 | 82% | 18–19, 215–234 |
| venv/lib/python3.12/site-packages/skore/sklearn/train_test_split/warning | | | | |
| __init__.py | 8 | 0 | 100% | |
| high_class_imbalance_too_few_examples_warning.py | 17 | 3 | 78% | 16–18, 80 |
| high_class_imbalance_warning.py | 18 | 2 | 88% | 16–18 |
| random_state_unset_warning.py | 11 | 1 | 87% | 15 |
| shuffle_true_warning.py | 9 | 0 | 91% | 44–>exit |
| stratify_is_set_warning.py | 11 | 1 | 87% | 15 |
| time_based_column_warning.py | 22 | 1 | 89% | 17, 69–>exit |
| train_test_split_warning.py | 5 | 1 | 80% | 21 |
| venv/lib/python3.12/site-packages/skore/ui | | | | |
| __init__.py | 0 | 0 | 100% | |
| app.py | 25 | 5 | 71% | 24, 53–58 |
| dependencies.py | 7 | 1 | 86% | 12 |
| project_routes.py | 50 | 0 | 100% | |
| venv/lib/python3.12/site-packages/skore/utils | | | | |
| __init__.py | 0 | 0 | 100% | |
| _accessor.py | 7 | 0 | 100% | |
| _environment.py | 26 | 10 | 50% | 24–30, 36–40 |
| _logger.py | 21 | 4 | 84% | 14–18 |
| _show_versions.py | 31 | 0 | 100% | |
| venv/lib/python3.12/site-packages/skore/view | | | | |
| __init__.py | 0 | 0 | 100% | |
| view.py | 5 | 0 | 100% | |
| view_repository.py | 16 | 2 | 83% | 8–9 |
| TOTAL | 2266 | 153 | 92% | |

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 349 | 0 💤 | 0 ❌ | 0 🔥 | 44.338s ⏱️ |

Contributor

github-actions bot commented Jan 9, 2025

Documentation preview @ a7bf2d3

@rouk1
Contributor Author

rouk1 commented Jan 9, 2025

> Thanks! Could we also display the title of the warning, such as ShuffleTrueWarning, to give the user a key takeaway?

What do you think?

vscode

Screenshot 2025-01-09 at 18 09 49

jupyter

Screenshot 2025-01-09 at 18 14 56
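The title shown in these screenshots is derived from the warning class name, which the diff under review converts with two nested `re.sub` calls. The sketch below compares that conversion with a single-pass alternative (an assumption, not what the PR ships). Note that the two-step version leaves double spaces between words, which is invisible once the string is rendered as HTML. `HTMLWarning` is a hypothetical name used only to show acronym handling.

```python
import re


def title_two_step(name: str) -> str:
    # Same nested substitutions as in the diff under review.
    return re.sub(
        "([A-Z][a-z]+)", r" \1", re.sub("([A-Z]+)", r" \1", name)
    ).lstrip()


# Single-pass alternative: insert a space between a lowercase and an uppercase
# letter, or before the last capital of an acronym followed by lowercase letters.
_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def title_one_pass(name: str) -> str:
    return _BOUNDARY.sub(" ", name)


print(repr(title_two_step("ShuffleTrueWarning")))  # 'Shuffle  True  Warning'
print(repr(title_one_pass("ShuffleTrueWarning")))  # 'Shuffle True Warning'
print(title_one_pass("HTMLWarning"))               # HTML Warning
```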

@sylvaincom
Contributor

sylvaincom commented Jan 9, 2025

Thanks! LGTM

What if there are several warnings?

sylvaincom self-requested a review January 9, 2025 18:29
Comment on lines +214 to 237
```python
if is_environment_html_capable():
    with contextlib.suppress(ImportError):
        from IPython.core.interactiveshell import InteractiveShell
        from IPython.display import HTML, display
        from markdown import markdown

        if InteractiveShell.initialized():
            markup = "".join(
                [
                    HTML_WARNING_TEMPLATE.format(
                        warning=markdown(warning),
                        warning_class=re.sub(
                            "([A-Z][a-z]+)",
                            r" \1",
                            re.sub("([A-Z]+)", r" \1", warning_class.__name__),
                        ).lstrip(),
                    )
                    for warning, warning_class in to_display
                ]
            )
            display(HTML(markup))
else:
    for warning, warning_class in to_display:
        warnings.warn(message=warning, category=warning_class, stacklevel=1)
```

Member

An alternative that would work both in HTML and in a terminal prompt would be to leverage rich:

```python
import warnings

from rich.panel import Panel

# Runs inside the train_test_split code; `kwargs` and
# TRAIN_TEST_SPLIT_WARNINGS come from the surrounding code.
from skore import console  # avoid circular import

for warning_class in TRAIN_TEST_SPLIT_WARNINGS:
    warning = warning_class.check(**kwargs)

    if warning is not None:
        # Only show the warning if it is not filtered with an "ignore" action
        if not warnings.filters or not any(
            f[0] == "ignore" and f[2] == warning_class for f in warnings.filters
        ):
            console.print(
                Panel(
                    title=warning_class.__name__,
                    renderable=warning,
                    style="orange1",
                    border_style="cyan",
                )
            )
```

It would provide the following outputs:

image

It is also configured so that if someone filters the warnings with "ignore", we don't show them.
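A caveat on that filter check, shown with a stand-in `ShuffleTrueWarning` defined here as a `UserWarning` subclass (hypothetical, not skore's class): the suggestion compares `f[2] == warning_class`, which only matches filters registered for exactly that class. A blanket `warnings.simplefilter("ignore")` registers its filter for `Warning`, so it is not detected; `issubclass` covers that case. Neither version models the first-match ordering or the message and module patterns that the `warnings` module applies.

```python
import warnings


class ShuffleTrueWarning(UserWarning):
    """Hypothetical stand-in for the skore warning class of the same name."""


def ignored_by_equality(category):
    # Mirrors the check in the suggestion: only an exact category match counts.
    return any(f[0] == "ignore" and f[2] == category for f in warnings.filters)


def ignored_by_subclass(category):
    # Also honours filters registered for a parent class such as Warning.
    return any(
        f[0] == "ignore" and issubclass(category, f[2]) for f in warnings.filters
    )


with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silences every Warning subclass
    print(ignored_by_equality(ShuffleTrueWarning))  # False
    print(ignored_by_subclass(ShuffleTrueWarning))  # True
```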

It might be a bit less invasive than checking the environment (although that code could be useful for other things down the road).

Contributor

Using rich everywhere does provide coherence (with the EstimatorReport of #997)

Contributor

Can rich display False with code style for example?

Contributor Author

> Can rich display False with code style for example?

Yes, rich is capable of displaying Markdown.

I hadn't considered using plain text output; it's cool and lightweight. The only downside is that we can't match the hosting styles.
I'm happy to refactor that way. @MarieS-WiMLDS, can you please settle this?
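A minimal sketch of the Markdown answer, assuming `rich` is installed: wrapping the message in `rich.markdown.Markdown` renders inline code such as `False` in code style inside a `Panel`. The message text and title are placeholders, and `record=True` is only there so the rendered output can be exported as plain text afterwards.

```python
from rich.console import Console
from rich.markdown import Markdown
from rich.panel import Panel

# record=True keeps the rendered output so it can be exported as plain text.
console = Console(record=True)
console.print(
    Panel(
        Markdown("Inline code such as `False` is rendered in code style."),
        title="ShuffleTrueWarning",
        border_style="cyan",
    )
)
```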

Contributor

What do you mean by matching the hosting style? Is it about the light/dark theme, or about the font?
I really don't mind if we don't use the user's default font; it's more troublesome if it's about the theme, because it might affect readability.

Member

And note that we have a similar limitation with the logging (when creating a project, etc.), where we use rich.

Contributor

sylvaincom commented Jan 10, 2025

Thanks for your insights. In any case, an orange background seems bad for UX, so let's move on with rich. Should we close this PR and open a new one for rich? How do you wish to proceed?

Contributor

I agree with Sylvain: it's very unlikely for a user to have an orange background!
"We can do it later" tasks tend to never get done, especially when they are about clean code, so I'd prefer that we switch to rich before merging, if that's compatible with your Tuesday evening deadline.

Member

cf. #1086

Contributor

You're the best ⚡️

@sylvaincom
Contributor

sylvaincom commented Jan 10, 2025

Superseded by #1086

sylvaincom closed this Jan 10, 2025
sylvaincom pushed a commit that referenced this pull request Jan 10, 2025
closes #1060 

It is the alternative to #1060 using `rich`. I added a test to check
that the warning can still be filtered, since we are not using the usual
`warnings` module.

In the future, we could factor this code out into a utility to make sure
that warnings can also be turned into errors.
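A hypothetical sketch of the utility this commit message mentions: it resolves which action the active `warnings` filters would apply to a category and message, then ignores, raises, or displays accordingly. `resolve_action`, `emit`, and `TrainTestSplitWarningSketch` are illustrative names, not skore code. Skipping module- and line-specific filters is a simplifying assumption, since no calling module is known at this point.

```python
import re
import warnings


class TrainTestSplitWarningSketch(UserWarning):
    """Hypothetical warning category used only for this example."""


def resolve_action(category, message):
    """Return the action the active filters would apply (sketch).

    Walks `warnings.filters` in order and returns the action of the first
    entry whose category and message pattern match, as `warnings` does.
    Module- and line-specific entries are skipped (simplifying assumption).
    """
    for entry in warnings.filters:
        action, msg_pattern, filter_category, module_pattern, lineno = entry
        if module_pattern is not None or lineno:
            continue  # cannot be evaluated without a calling module and line
        if msg_pattern is not None and not re.match(msg_pattern, message):
            continue
        if issubclass(category, filter_category):
            return action
    return warnings.defaultaction


def emit(message, category, display):
    """Ignore, raise, or display `message` depending on the resolved action."""
    action = resolve_action(category, message)
    if action == "ignore":
        return
    if action == "error":
        raise category(message)
    display(f"{category.__name__}: {message}")  # e.g. print or console.print


with warnings.catch_warnings():
    warnings.simplefilter("error", TrainTestSplitWarningSketch)
    try:
        emit("example message", TrainTestSplitWarningSketch, print)
    except TrainTestSplitWarningSketch as exc:
        print("raised:", exc)  # raised: example message
```

Passing `display` in as a callable keeps the filtering logic independent of how the message is rendered (plain `print` here, a rich console in the PR).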
rouk1 deleted the train-test-split-html-warnings branch January 13, 2025 09:34
thomass-dev pushed a commit that referenced this pull request Jan 13, 2025
rouk1 pushed a commit that referenced this pull request Jan 14, 2025
Labels
None yet
Projects
None yet
4 participants