[QST] is there any feature to undersampling or oversampling like scikit-learn-contrib/imbalanced-learn #4362

rrfaria · 2021-11-14T04:52:47Z

I could not find anything in cuML equivalent to imbalanced-learn.
I'm currently using something like this:

from imblearn.under_sampling import NearMiss
from imblearn.over_sampling import SMOTE
...
# x holds the features; y holds the class labels
x_resampled, y_resampled = SMOTE().fit_resample(x, y)
# or, for undersampling
x_resampled, y_resampled = NearMiss().fit_resample(x, y)

But I would like to use cuML to speed this up, because resampling takes a long time on large datasets.

Is there any method I could use to do it?

beckernick · 2021-11-15T14:56:58Z

Hi @rrfaria. Today, there isn't a simple way to do this.

We're excited about this use case, as we've also seen that nuanced oversampling and undersampling on CPUs can be very time consuming.

We're currently working with the imbalanced-learn maintainers on a pull request that would allow you to use cuML estimators with imbalanced-learn, like this:

import cuml
from imblearn.over_sampling import SMOTE
...
# Pass a cuML estimator so the neighbor search runs on the GPU
nn = cuml.neighbors.NearestNeighbors()
x_resampled, y_resampled = SMOTE(k_neighbors=nn).fit_resample(x, y)

If accelerated imbalanced-learn is important for your work, it would be great if you could comment on this imbalanced-learn issue to indicate your interest in this effort.
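To make the idea behind the snippet above concrete: SMOTE creates each synthetic sample by interpolating between a minority-class point and one of its nearest minority-class neighbors, and the neighbor search is typically the expensive part on large data. The following is a minimal NumPy sketch of that split, not imbalanced-learn's actual implementation. `BruteForceNN` and `smote_sketch` are hypothetical names introduced only for illustration; `BruteForceNN` mimics the `fit`/`kneighbors` subset of the scikit-learn `NearestNeighbors` interface, which is the role a cuML estimator would play.

```python
import numpy as np


class BruteForceNN:
    """Hypothetical stand-in with a scikit-learn-style fit/kneighbors
    interface. It is not part of cuML or imbalanced-learn."""

    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors

    def fit(self, X):
        self._X = np.asarray(X, dtype=float)
        return self

    def kneighbors(self, X, return_distance=True):
        X = np.asarray(X, dtype=float)
        # Pairwise squared Euclidean distances, shape (n_queries, n_fitted).
        d2 = ((X[:, None, :] - self._X[None, :, :]) ** 2).sum(axis=-1)
        idx = np.argsort(d2, axis=1)[:, : self.n_neighbors]
        if return_distance:
            return np.sqrt(np.take_along_axis(d2, idx, axis=1)), idx
        return idx


def smote_sketch(X_min, n_new, nn, seed=0):
    """Create n_new synthetic rows from the minority-class rows X_min."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Neighbor search (the step a GPU estimator would take over).
    # Column 0 is the query point itself, assuming no duplicate rows, so drop it.
    neighbors = nn.fit(X_min).kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)          # starting samples
    pick = rng.integers(0, neighbors.shape[1], size=n_new)  # which neighbor to use
    gap = rng.random((n_new, 1))                            # interpolation factor in [0, 1)
    target = X_min[neighbors[base, pick]]
    return X_min[base] + gap * (target - X_min[base])


rng = np.random.default_rng(42)
X_min = rng.normal(size=(20, 2))
# n_neighbors=6 yields 5 usable neighbors once the self-match is dropped.
X_new = smote_sketch(X_min, n_new=30, nn=BruteForceNN(n_neighbors=6))
print(X_new.shape)  # (30, 2)
```

Because `smote_sketch` only calls `fit` and `kneighbors`, any object exposing those two methods could be passed as `nn`, which is the same design that lets a GPU-backed estimator replace the CPU one.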

rrfaria · 2021-11-16T11:23:13Z

Thank you so much.
It will help a lot.
Let me know if I can help with anything.

github-actions · 2021-12-16T12:02:57Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions · 2022-03-16T12:07:10Z

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

antno1000 · 2022-03-31T16:06:11Z

Any update?

beckernick · 2022-04-04T13:42:59Z

The relevant code has been merged into imbalanced-learn, so the code snippet above now works when using imbalanced-learn built from source. It's not yet available in pip/conda installations of imbalanced-learn, but will be in the next release.

Based on initial testing, it's possible to achieve large speedups on samplers as data sizes grow.

I'm going to close this issue. If you build imbalanced-learn from source and run into any issues using it with cuML, please feel free to re-open this issue.

rrfaria added the "? - Needs Triage" and "question" labels Nov 14, 2021

github-actions bot added the inactive-30d label Dec 16, 2021

github-actions bot added the inactive-90d label Mar 16, 2022

github-actions bot removed inactive-90d inactive-30d labels Mar 31, 2022

beckernick closed this as completed Apr 4, 2022
