[BUG] Colab issue with 01_the_machine_learning_landscape.ipynb #179

Open
olliefr opened this issue Jan 11, 2025 · 1 comment

olliefr commented Jan 11, 2025

Describe the bug

The first notebook throws an error in Colab. I think this is due to a more recent sklearn version used there.

I researched the issue and found a fix, but I am not sure how it should be implemented so that it works across all supported environments.

To Reproduce
Run the first notebook in Colab.

File name: 01_the_machine_learning_landscape.ipynb

The offending line of code is:

t0ridge, t1ridge = ridge.intercept_[0], ridge.coef_[0][0]

The error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
[<ipython-input-42-51c3ff846895>](https://localhost:8080/#) in <cell line: 13>()
     11 y_sample = country_stats[[lifesat_col]]
     12 ridge.fit(X_sample, y_sample)
---> 13 t0ridge, t1ridge = ridge.intercept_[0], ridge.coef_[0][0]
     14 plt.plot(X, t0ridge + t1ridge * X, "b--",
     15          label="Regularized linear model on partial data")

IndexError: invalid index to scalar variable.

To make this code work in Colab I had to replace 2D indexing with 1D for ridge.coef_:

t0ridge, t1ridge = ridge.intercept_[0], ridge.coef_[0]

Then everything worked as expected.
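
A possible version-independent variant of this fix, offered as a sketch rather than as the repo's chosen approach: flatten whatever shape `coef_` or `intercept_` happens to have and take the first element. The helper name `first_scalar` is hypothetical, and the arrays below only stand in for the attribute shapes reported in this issue.

```python
import numpy as np

def first_scalar(value):
    """Return the first element of a scalar, 1-D, or 2-D array-like as a float."""
    # np.ravel turns a scalar, a (1,) array, or a (1, 1) array into a 1-D array.
    return float(np.ravel(value)[0])

# Stand-ins for the shapes seen in this issue (hypothetical values):
coef_2d = np.array([[0.5]])         # shape (1, 1), as LinearRegression reports
coef_1d = np.array([0.5])           # shape (1,), as Ridge reports on sklearn 1.6.0
intercept_scalar = np.float64(3.0)  # a 0-d scalar

print(first_scalar(coef_2d), first_scalar(coef_1d), first_scalar(intercept_scalar))
```

In the notebook cell, the offending line would then read `t0ridge, t1ridge = first_scalar(ridge.intercept_), first_scalar(ridge.coef_)`.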

Versions
Colab with the default provisioned runtime:

  • OS: N/A
  • Python: 3.10.12
  • TensorFlow: N/A
  • Scikit-Learn: 1.6.0
  • Numpy: 1.26.4

Additional context

A cursory search of the scikit-learn GitHub issues, plus some experimentation, has led me to conclude that LinearRegression and Ridge handle the dimensions of coef_ differently in recent versions of sklearn.

Here's a Python example demonstrating the difference:

import sys
print(sys.version_info)
# Colab outputs: sys.version_info(major=3, minor=10, micro=12, releaselevel='final', serial=0)

import numpy as np
print(np.__version__) # Colab outputs: 1.26.4

import sklearn
print(sklearn.__version__) # Colab outputs: 1.6.0

X = np.array([1.0, 2.0, 3.0]).reshape(-1, 1)
y = np.array([2.0, 4.0, 6.0]).reshape(-1, 1)

print(X.shape, y.shape) # Expected output: (3, 1) (3, 1)

from sklearn import linear_model

linX = linear_model.LinearRegression()
linX.fit(X, y)

print(linX.coef_.shape) # outputs (1,1)

ridgeX = linear_model.Ridge(alpha=10**9.5)
ridgeX.fit(X, y)

print(ridgeX.coef_.shape) # outputs (1,)

Clearly, (1,1) is not the same as (1,). So it seems that when the same input containing a single target is given to both LinearRegression() and Ridge, they handle the coef_ dimensions differently.
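
One way to sidestep the discrepancy, assuming the target can be passed as a 1-D array, is sketched below. With a 1-D `y`, scikit-learn documents `coef_` as having shape `(n_features,)`, so both estimators should agree. The `alpha=1.0` value is an arbitrary choice for illustration, not the notebook's setting.

```python
import numpy as np
from sklearn import linear_model

X = np.array([1.0, 2.0, 3.0]).reshape(-1, 1)
y = np.array([2.0, 4.0, 6.0])  # 1-D target instead of shape (3, 1)

lin = linear_model.LinearRegression().fit(X, y)
ridge = linear_model.Ridge(alpha=1.0).fit(X, y)

# Both coef_ arrays are 1-D here, so coef_[0] works for either model.
print(lin.coef_.shape, ridge.coef_.shape)
```

Note that with a 1-D target `intercept_` is a plain float, so `intercept_[0]` would no longer be valid and the notebook line would need adjusting accordingly.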

I understand that the repo provides an Anaconda environment definition in environment.yml, as well as a requirements.txt file with the right package versions. To the best of my knowledge, though, neither can easily be used to modify the environment in Colab. Since Colab is the environment recommended in the README file, I thought it was worth mentioning the issue here.


olliefr commented Jan 11, 2025

I also filed scikit-learn/scikit-learn#30624 in the scikit-learn repo to confirm whether this difference in behaviour between the LinearRegression and Ridge classes is a feature or a bug.

Logically, since the y passed to fit is 2D, I would expect Ridge.coef_ to be 2D as well (with n_targets = 1, of course), but this is not what happens.
