FIX roc_auc_curve: Return np.nan instead of 0.0 for single class #30103

Merged
5 commits merged into scikit-learn:main on Oct 29, 2024

Conversation

janezd
Contributor

@janezd janezd commented Oct 18, 2024

Reference Issues/PRs

Fixes #30079.

What does this implement/fix? Explain your changes.

As discussed in #30079, it may be more appropriate to return np.nan when all data comes from a single class.
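For reference, a minimal sketch of the behaviour proposed here (illustrative only, not part of the PR diff), assuming a scikit-learn build that includes this change:

    import math
    import warnings

    from sklearn.exceptions import UndefinedMetricWarning
    from sklearn.metrics import roc_auc_score

    y_true = [1, 1, 1]              # only one class present
    y_score = [0.3, 0.4, 0.5]

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        auc = roc_auc_score(y_true, y_score)

    assert math.isnan(auc)          # previously this was a hard-coded 0.0
    assert any(issubclass(w.category, UndefinedMetricWarning) for w in caught)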


github-actions bot commented Oct 18, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 9ea9001.

Member

@adrinjalali adrinjalali left a comment


Otherwise LGTM.

 with pytest.warns(UndefinedMetricWarning):
-    assert metric(y1_row, y2_row) == pytest.approx(0.0)
+    assert np.isnan(metric(y1_row, y2_row))
Contributor


math.isnan is preferable when working with numbers
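For illustration (not part of the diff), the difference the suggestion is about, on a plain Python float:

    import math
    import numpy as np

    value = float("nan")
    assert math.isnan(value)   # works on plain Python numbers directly
    assert np.isnan(value)     # also works, but routes through NumPy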

@@ -1,3 +1,3 @@
-:func:`metrics.roc_auc_score` will now correctly return 0.0 and
+:func:`metrics.roc_auc_score` will now correctly return np.nan and
 warn user if only one class is present in the labels.
 By :user:`Gleb Levitski <glevv>`
Contributor


I'm almost sure you should add yourself to the changelog.

Contributor Author


My "contribution" is rather trivial, but OK, I've added myself.

@@ -388,7 +389,8 @@ def test_roc_curve_toydata():
         "ROC AUC score is not defined in that case."
     )
     with pytest.warns(UndefinedMetricWarning, match=expected_message):
-        roc_auc_score(y_true, y_score)
+        auc = roc_auc_score(y_true, y_score)
+        assert_almost_equal(auc, np.nan)
Contributor


Same as above: math.isnan is preferable.

Contributor


You missed one on line 375.

Contributor Author


Thanks. I amended the last commit.

Member

@adrinjalali adrinjalali left a comment


Otherwise LGTM.

@@ -370,7 +371,8 @@ def test_roc_curve_toydata():
         "ROC AUC score is not defined in that case."
     )
     with pytest.warns(UndefinedMetricWarning, match=expected_message):
-        roc_auc_score(y_true, y_score)
+        auc = roc_auc_score(y_true, y_score)
+        assert_almost_equal(auc, np.nan)
Member


Same here: math.isnan is preferable.

@adrinjalali adrinjalali added this to the 1.6 milestone Oct 23, 2024
 warn user if only one class is present in the labels.
-By :user:`Gleb Levitski <glevv>`
+By :user:`Gleb Levitski <glevv>` and :user:`Janez Demšar <janezd>`
Member


@lesteve this PR number is not gonna show up here. What do we do?

Member


I think we can duplicate the changelog entry into two files with different PR numbers in their name but the same contents.

Then both entries will be merged by towncrier when we aggregate the changelog of a given release.

Member

@lesteve lesteve Oct 23, 2024


Yeah, that's towncrier's way of doing it: creating two files with the same content that only differ by the PR number in the filename.

I originally thought this was a hidden towncrier feature, but it is mentioned in the tutorial:

$ towncrier create --content 'Can also be ``rst`` as well!' 3456.doc.rst
# You can associate multiple issue numbers with a news fragment by giving them the same contents.
$ towncrier create --content 'Can also be ``rst`` as well!' 7890.doc.rst

I discovered this by looking at the towncrier source code. Note that it only works if both fragments have the same type (fix in this case) IIRC.

There is twisted/towncrier#599 to try to make it a bit more convenient.

Member


@glemaitre you've enabled automerge before resolving this though.

Member


I duplicated the changelog.

Member

@ogrisel ogrisel left a comment


LGTM as well once the review comments have all been taken care of.

@@ -375,12 +375,11 @@ def _binary_roc_auc_score(y_true, y_score, sample_weight=None, max_fpr=None):
         warnings.warn(
             (
                 "Only one class is present in y_true. ROC AUC score "
-                "is not defined in that case. The score is set to "
-                "0.0."
+                "is not defined in that case."
Member


It would be helpful to store np.unique(y_true) in a local variable and then display the value of that class label in the warning message.
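A rough, self-contained sketch of what that suggestion could look like (illustrative only, not the code that was merged; the reply below explains why the displayed label can be misleading):

    import warnings

    import numpy as np
    from sklearn.exceptions import UndefinedMetricWarning

    y_true = np.asarray([0, 0, 0])   # toy single-class input
    classes = np.unique(y_true)      # store the unique labels once

    if classes.size != 2:
        warnings.warn(
            f"Only one class ({classes[0]!r}) is present in y_true. "
            "ROC AUC score is not defined in that case.",
            UndefinedMetricWarning,
        )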

Contributor Author


To my understanding, from exploring the code, y_true can already be re-encoded at this point, so the warning would be misleading (see #30079 (comment)).

I would prefer to keep this as it is, but if anybody wants to change it - go ahead. :)

Contributor


I think the idea here is different: we are not going to say whether it is the positive or the negative class, we will just show the value that we found, i.e.:

f"Only one class is present in y_true: {y_true[0]!r}. "
f"ROC AUC score is not defined in that case."

Or something like that.

Contributor Author

@janezd janezd Oct 23, 2024


Please let us stop this.

As I wrote, classes are re-encoded. I know next to nothing about sklearn's code, but if the caller gives y_true=[5, 2, 5] to roc_auc_score, the y_true that is passed to this internal function will be [1, 0, 1]. If the caller gives [1, 1, 1], this function gets [0, 0, 0]. Any y_true composed of equal values is re-encoded to a y_true composed of 0s.

If I change the function to

def _binary_roc_auc_score(y_true, y_score, sample_weight=None, max_fpr=None):
    if len(np.unique(y_true)) != 2:
        print("all values equal", y_true[0])
    return

I get the following

>>> import sklearn.metrics
>>> sklearn.metrics.roc_auc_score([0, 0, 0], [0.3, 0.4, 0.5])
all values equal 0
>>> sklearn.metrics.roc_auc_score([1, 1, 1], [0.3, 0.4, 0.5])
all values equal 0
>>> sklearn.metrics.roc_auc_score([5, 5, 5], [0.3, 0.4, 0.5])
all values equal 0

To my possibly incomplete understanding of the code, len(np.unique(y_true)) != 2 may be equivalent to not np.any(y_true).

This function cannot give a more descriptive warning, because it doesn't have sufficient information for it. A better warning could be given by the function that calls it, but it would require a big refactoring because a (curried) _binary_roc_auc_score is given as an argument to _average_binary_score.

Changing a single 0.0 into np.nan and adding two lines of tests has taken me several hours spread across five days, mostly because of these warnings that could, imho, stay as they were. Not a great experience that I would want to repeat. :)
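The re-encoding described above can be reproduced with label_binarize, which, as far as one can tell from the code, is what the binary path of roc_auc_score applies to y_true before it reaches _binary_roc_auc_score (this snippet is only an illustration of that behaviour):

    import numpy as np
    from sklearn.preprocessing import label_binarize

    for y in ([5, 2, 5], [1, 1, 1], [5, 5, 5]):
        encoded = label_binarize(y, classes=np.unique(y)).ravel()
        print(y, "->", encoded)

    # [5, 2, 5] -> [1 0 1]
    # [1, 1, 1] -> [0 0 0]
    # [5, 5, 5] -> [0 0 0]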


Member

@glemaitre glemaitre left a comment


LGTM.

@glemaitre glemaitre enabled auto-merge (squash) October 29, 2024 10:29
@glemaitre glemaitre disabled auto-merge October 29, 2024 10:57
@glemaitre glemaitre self-requested a review October 29, 2024 16:14
@glemaitre glemaitre enabled auto-merge (squash) October 29, 2024 16:26
@glemaitre glemaitre merged commit fff920e into scikit-learn:main Oct 29, 2024
28 checks passed
Successfully merging this pull request may close these issues.

roc_auc_score: incorrect result after merging #27412
6 participants