Large dataset error #53

Open · ZeonlungPun opened this issue Jan 24, 2024 · 19 comments
@ZeonlungPun

My feature count is 30,000, and I get an error:
Loss is 511581280.0
Did you normalize input?
Choosing lambda with cross-validation: 0%| | 0/5 [01:12<?, ?it/s]
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3553, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 3, in
path = model.fit( x, y)
File "/opt/conda/lib/python3.10/site-packages/lassonet/interfaces.py", line 744, in fit
self.path(X, y, return_state_dicts=False)
File "/opt/conda/lib/python3.10/site-packages/lassonet/interfaces.py", line 679, in path
path = super().path(
File "/opt/conda/lib/python3.10/site-packages/lassonet/interfaces.py", line 472, in path
last = self._train(
File "/opt/conda/lib/python3.10/site-packages/lassonet/interfaces.py", line 331, in _train
optimizer.step(closure)
File "/opt/conda/lib/python3.10/site-packages/torch/optim/optimizer.py", line 373, in wrapper
out = func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/optim/optimizer.py", line 76, in _use_grad
ret = func(self, *args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/optim/sgd.py", line 66, in step
loss = closure()
File "/opt/conda/lib/python3.10/site-packages/lassonet/interfaces.py", line 326, in closure
assert False
AssertionError

However, when the feature count is 1,000, I do not get this error.

@louisabraham
Collaborator

This is because of a wrong condition I used in a previous version of lassonet: I used to check loss == loss + 1 to detect infinite numbers instead of torch.isfinite(loss). In float32, any value above 2^24 already satisfies loss == loss + 1, so a large but finite loss was wrongly flagged as infinite.
Could you test with the latest version I just uploaded to PyPI?
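To illustrate the old check's failure mode (a minimal sketch, not the library's actual code):

import torch

loss = torch.tensor(511581280.0)  # large but finite, like the loss reported above
print(loss == loss + 1)           # tensor(True): the +1 is lost to float32 rounding
print(torch.isfinite(loss))       # tensor(True): the value is in fact finite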

That being said, your loss still looks very large. Did you actually normalize inputs?

@ZeonlungPun
Author

Of course I have normalized the inputs, and I use this code:

from lassonet import LassoNetRegressorCV
model = LassoNetRegressorCV() # LassoNetRegressorCV
path = model.fit(X_train, y_train)
print("Best model scored", model.score(X_test, y_test))
print("Lambda =", model.best_lambda_)

However, my input's shape is (20000, 30000).
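For reference, feature-wise standardization is usually done like this (a sketch assuming sklearn and X_train / X_test arrays matching the snippet above):

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()                # zero mean, unit variance per feature
X_train = scaler.fit_transform(X_train)  # fit scaling statistics on train only
X_test = scaler.transform(X_test)        # reuse the same statistics on test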

@louisabraham
Collaborator

The number of samples is irrelevant as the MSE has reduction="mean".
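For example (a small sketch, not lassonet's internals):

import torch
import torch.nn.functional as F

pred, target = torch.zeros(20000), torch.ones(20000)
print(F.mse_loss(pred, target, reduction="mean"))              # tensor(1.)
print(F.mse_loss(pred[:100], target[:100], reduction="mean"))  # tensor(1.), same scale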

Did you test with the latest version?

@ZeonlungPun
Author

Yes, I have tried the latest version. At the beginning, the loss is normal; when the new fitting begins, the loss explodes:
……
epoch: 850
loss: 0.017978345975279808
epoch: 851
loss: 0.017944464460015297
epoch: 852
loss: 0.0179106704890728
epoch: 853
loss: 0.017876965925097466
epoch: 854
loss: 0.017843332141637802
epoch: 855
loss: 0.017809787765145302
epoch: 0
loss: 0.017776312306523323
epoch: 1
loss: 5.919191360473633
epoch: 2
loss: 245.20724487304688
epoch: 3
loss: 37423.44140625
epoch: 4
loss: 10632257.0
Loss is 3204740096.0
Did you normalize input?
Loss: 3204740096.0
l2_regularization: 0.3105020225048065
l2_regularization_skip: 575.5364379882812

@louisabraham
Collaborator

I think you are using an older version, because the epoch: and loss: lines were removed as of the previous version on PyPI. I just added some additional logging for the automatically selected value of lambda_start. Could you test again with:

pip install git+https://github.com/lasso-net/lassonet

and use verbose=2 as a parameter?

@ZeonlungPun
Author

I have followed your tips:
[screenshot]
But the same error happened:
[screenshot]

@louisabraham
Collaborator

Could you try to manually set lambda_start to some larger value, like 100?
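A sketch of that suggestion, reusing the estimator from the snippets above:

model = LassoNetRegressorCV(lambda_start=100)
path = model.fit(X_train, y_train)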

@ZeonlungPun
Author

The same error happened… I think it may be something related to the huge shape of the dataset; I have tested that when the shape is (2000, 3000), everything is normal.

@louisabraham
Collaborator

Can you post the logging output?

@ElrondL

ElrondL commented Apr 28, 2024

Hey @louisabraham, what else was changed in 0.0.15? After 0.0.15, LassoNetRegressor keeps returning None for the regressor model's state_dict, even though with the exact same settings 0.0.14 returns the model fine. What were the updates between 0.0.14 and 0.0.15, in addition to the auto logging, that could have caused this?

@ZeonlungPun
Author

Loss is 15310032732160.0
Did you normalize input?
Traceback (most recent call last):
File "D:\anaconda\envs\newtorch\lib\site-packages\IPython\core\interactiveshell.py", line 3397, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in <cell line: 1>
runfile('D:/paper/npmcm2021d/read_select.py', wdir='D:/paper/npmcm2021d')
File "D:\pycharm\PyCharm Community Edition 2021.3.2\plugins\python-ce\helpers\pydev_pydev_bundle\pydev_umd.py", line 198, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "D:\pycharm\PyCharm Community Edition 2021.3.2\plugins\python-ce\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "D:/paper/npmcm2021d/read_select.py", line 30, in
path = model.path(X_train, y_train)
File "D:\anaconda\envs\newtorch\lib\site-packages\lassonet\interfaces.py", line 472, in path
last = self._train(
File "D:\anaconda\envs\newtorch\lib\site-packages\lassonet\interfaces.py", line 331, in _train
optimizer.step(closure)
File "D:\anaconda\envs\newtorch\lib\site-packages\torch\optim\optimizer.py", line 88, in wrapper
return func(*args, **kwargs)
File "D:\anaconda\envs\newtorch\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "D:\anaconda\envs\newtorch\lib\site-packages\torch\optim\sgd.py", line 120, in step
loss = closure()
File "D:\anaconda\envs\newtorch\lib\site-packages\lassonet\interfaces.py", line 326, in closure
assert False
AssertionError

@laladeegui

I am having the same problem. My dataset shape is (74, 201376). I also tried the suggestions you gave above (installing the latest version, setting lambda_start larger, verbose=2), but I still get the error.

model = LassoNetRegressorCV(lambda_start=500,verbose=2)
path = model.fit(x_train, y_train)
Choosing lambda with cross-validation: 0%| | 0/5 [00:00<?, ?it/s]
Initialized dense model
100 epochs, val_objective 9.32e+01, val_loss 9.32e+01, regularization 3.11e+02, l2_regularization 1.83e-01
Loss is nan
Did you normalize input?
Loss:: inf
l2_regularization: tensor(inf)
l2_regularization_skip: tensor(4.4145e+17)
Choosing lambda with cross-validation: 0%| | 0/5 [29:49<?, ?it/s]
AssertionError Traceback (most recent call last)
Cell In[8], line 5
2 x_train, x_test, y_train, y_test = train_test_split(x_sd,y_sd,train_size=0.8,random_state=1)
4 model = LassoNetRegressorCV(lambda_start=500,verbose=2)
----> 5 path = model.fit(x_train, y_train)

File D:\Anaconda\envs\lassonet8262\Lib\site-packages\lassonet\interfaces.py:935, in BaseLassoNetCV.fit(self, X, y)
926 def fit(
927 self,
928 X,
929 y,
930 ):
931 """Train the model.
932 Note that if lambda_ is not given, the trained model
933 will most likely not use any feature.
934 """
--> 935 self.path(X, y, return_state_dicts=False)
936 return self

File D:\Anaconda\envs\lassonet8262\Lib\site-packages\lassonet\interfaces.py:870, in BaseLassoNetCV.path(self, X, y, return_state_dicts)
867 split_lambdas.append(hist[-1].lambda_)
868 split_scores.append(model.score(X[test_index], y[test_index]))
--> 870 path = super().path(
871 X[train_index],
872 y[train_index],
873 return_state_dicts=False, # avoid memory cost
874 callback=callback,
875 )
876 self.raw_paths_.append(path)
878 # build final path

File D:\Anaconda\envs\lassonet8262\Lib\site-packages\lassonet\interfaces.py:471, in BaseLassoNet.path(self, X, y, X_val, y_val, lambda_seq, lambda_max, return_state_dicts, callback, disable_lambda_warning)
469 if self.model.selected_count() == 0:
470 break
--> 471 last = self._train(
472 X_train,
473 y_train,
474 X_val,
475 y_val,
476 batch_size=self.batch_size,
477 lambda_=current_lambda,
478 epochs=self.n_iters_path,
479 optimizer=optimizer,
480 patience=self.patience_path,
481 return_state_dict=return_state_dicts,
482 )
483 if is_dense and self.model.selected_count() < X_train.shape[1]:
484 is_dense = False

File D:\Anaconda\envs\lassonet8262\Lib\site-packages\lassonet\interfaces.py:317, in BaseLassoNet._train(self, X_train, y_train, X_val, y_val, batch_size, epochs, lambda_, optimizer, return_state_dict, patience)
314 loss += ans.item() * batch_size / n_train
315 return ans
--> 317 optimizer.step(closure)
318 model.prox(
319 lambda_=lambda_ * optimizer.param_groups[0]["lr"],
320 M=self.M,
321 )
323 if epoch == 0:
324 # fallback to running loss of first epoch

File D:\Anaconda\envs\lassonet8262\Lib\site-packages\torch\optim\optimizer.py:484, in Optimizer.profile_hook_step.<locals>.wrapper(*args, **kwargs)
479 else:
480 raise RuntimeError(
481 f"{func} must return None or a tuple of (new_args, new_kwargs), but got {result}."
482 )
--> 484 out = func(*args, **kwargs)
485 self._optimizer_step_code()
487 # call optimizer step post hooks

File D:\Anaconda\envs\lassonet8262\Lib\site-packages\torch\optim\optimizer.py:89, in _use_grad_for_differentiable.<locals>._use_grad(self, *args, **kwargs)
87 torch.set_grad_enabled(self.defaults["differentiable"])
88 torch._dynamo.graph_break()
---> 89 ret = func(self, *args, **kwargs)
90 finally:
91 torch._dynamo.graph_break()

File D:\Anaconda\envs\lassonet8262\Lib\site-packages\torch\optim\sgd.py:112, in SGD.step(self, closure)
110 if closure is not None:
111 with torch.enable_grad():
--> 112 loss = closure()
114 for group in self.param_groups:
115 params: List[Tensor] = []

File D:\Anaconda\envs\lassonet8262\Lib\site-packages\lassonet\interfaces.py:312, in BaseLassoNet._train.<locals>.closure()
304 print(
305 "l2_regularization:",
306 model.l2_regularization(),
307 )
308 print(
309 "l2_regularization_skip:",
310 model.l2_regularization_skip(),
311 )
--> 312 assert False
313 ans.backward()
314 loss += ans.item() * batch_size / n_train

AssertionError:

@louisabraham
Collaborator

louisabraham commented Aug 28, 2024

Are you able to share this dataset? Or reproduce the error on a public dataset?

Also, just a hunch, but is your data in float64?
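A quick way to check and convert (a sketch assuming numpy arrays named x_train / y_train as in the snippets above; PyTorch defaults to float32):

import numpy as np

print(x_train.dtype, y_train.dtype)   # float64 here would be worth ruling out
x_train = x_train.astype(np.float32)  # match PyTorch's default dtype
y_train = y_train.astype(np.float32)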

@laladeegui

laladeegui commented Aug 29, 2024 via email

@louisabraham
Collaborator

louisabraham commented Aug 29, 2024

Thank you very much. Do you maybe have an MRE (minimal reproducible example)?

How do you get 201376 features?
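For instance, an MRE could look like this (synthetic stand-in data with the reported shape, not the actual dataset):

import numpy as np
from lassonet import LassoNetRegressorCV

rng = np.random.default_rng(0)
X = rng.standard_normal((74, 201376)).astype(np.float32)  # 74 samples, 201376 features
y = rng.standard_normal(74).astype(np.float32)

model = LassoNetRegressorCV(lambda_start=500, verbose=2)
model.fit(X, y)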

@Sharayu0711

Sharayu0711 commented Nov 14, 2024

Hi @louisabraham,

I am facing the same problem. My dataset shape is (149, 9000), and I have tried the suggestions you gave above (installing the latest version):
model = LassoNetRegressorCV(hidden_dims=(128, 32), lambda_start=0.050, path_multiplier=1.00, random_state=42, verbose=2)
model.fit(X_train, y_train)

But I am getting the same error, given below:

Loss is inf
Did you normalize input?
Loss:: inf
l2_regularization: tensor(1.4971e+25)
l2_regularization_skip: tensor(1.5108e+15)

AssertionError Traceback (most recent call last)
Cell In[103], line 1
----> 1 model.fit(X, y)

File ~//.venv/lib64/python3.9/site-packages/lassonet/interfaces.py:231, in BaseLassoNet.fit(self, X, y, X_val, y_val, dense_only)
225 """Train the model.
226 Note that if lambda_ is not given, the trained model
227 will most likely not use any feature.
228 If dense_only is True, will only train a dense model.
229 """
230 lambda_seq = [] if dense_only else None
--> 231 self.path_ = self.path(
232 X,
233 y,
234 X_val=X_val,
235 y_val=y_val,
236 return_state_dicts=False,
237 lambda_seq=lambda_seq,
238 )
239 return self

File ~/.venv/lib64/python3.9/site-packages/lassonet/interfaces.py:471, in BaseLassoNet.path(self, X, y, X_val, y_val, lambda_seq, lambda_max, return_state_dicts, callback, disable_lambda_warning)
469 if self.model.selected_count() == 0:
470 break
--> 471 last = self._train(
472 X_train,
473 y_train,
474 X_val,
475 y_val,
476 batch_size=self.batch_size,
477 lambda_=current_lambda,
478 epochs=self.n_iters_path,
479 optimizer=optimizer,
480 patience=self.patience_path,
481 return_state_dict=return_state_dicts,
482 )
483 if is_dense and self.model.selected_count() < X_train.shape[1]:
484 is_dense = False

File ~//.venv/lib64/python3.9/site-packages/lassonet/interfaces.py:317, in BaseLassoNet._train(self, X_train, y_train, X_val, y_val, batch_size, epochs, lambda_, optimizer, return_state_dict, patience)
314 loss += ans.item() * batch_size / n_train
315 return ans
--> 317 optimizer.step(closure)
318 model.prox(
319 lambda_=lambda_ * optimizer.param_groups[0]["lr"],
320 M=self.M,
321 )
323 if epoch == 0:
324 # fallback to running loss of first epoch

File ~/.venv/lib64/python3.9/site-packages/torch/optim/optimizer.py:487, in Optimizer.profile_hook_step.<locals>.wrapper(*args, **kwargs)
482 else:
483 raise RuntimeError(
484 f"{func} must return None or a tuple of (new_args, new_kwargs), but got {result}."
485 )
--> 487 out = func(*args, **kwargs)
488 self._optimizer_step_code()
490 # call optimizer step post hooks

File ~/.venv/lib64/python3.9/site-packages/torch/optim/optimizer.py:91, in _use_grad_for_differentiable.<locals>._use_grad(self, *args, **kwargs)
89 torch.set_grad_enabled(self.defaults["differentiable"])
90 torch._dynamo.graph_break()
---> 91 ret = func(self, *args, **kwargs)
92 finally:
93 torch._dynamo.graph_break()

File ~/.venv/lib64/python3.9/site-packages/torch/optim/sgd.py:112, in SGD.step(self, closure)
110 if closure is not None:
111 with torch.enable_grad():
--> 112 loss = closure()
114 for group in self.param_groups:
115 params: List[Tensor] = []

File ~/.venv/lib64/python3.9/site-packages/lassonet/interfaces.py:312, in BaseLassoNet._train.<locals>.closure()
304 print(
305 "l2_regularization:",
306 model.l2_regularization(),
307 )
308 print(
309 "l2_regularization_skip:",
310 model.l2_regularization_skip(),
311 )
--> 312 assert False
313 ans.backward()
314 loss += ans.item() * batch_size / n_train

AssertionError:

@louisabraham
Collaborator

Would you be able to share a (possibly mock) dataset where I can reproduce the error?

@louisabraham
Collaborator

Just give me code I can run with the data, and I should be able to fix the issue.

@Sharayu0711

Sorry, I can't share the dataset or the code due to security reasons.
Could you please help me understand why this AssertionError is occurring?

Also, for a smaller dataset (149, 5370), after changing the lambda_start and path_multiplier values, it is working, but it is taking very long to train the model: 12+ hours and still running. Could you please tell me whether this model normally takes that much time, or whether it's abnormal in my case?
For reference,
device = torch.device("cuda:3" if torch.cuda.is_available() else "cpu")
model = LassoNetRegressorCV(hidden_dims=(128, 32), lambda_start=0.050, path_multiplier=1.00, random_state=42, verbose=2, device=device)
model.fit(X_train, y_train)

And what values of "hidden_dims", "lambda_start", and "path_multiplier" would you suggest as ideal for training on the dataset with shape (149, 9238)?

Thank you!
