Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial sklearnex support #102

Merged
merged 6 commits into from
Oct 6, 2023
Merged

Conversation

ethanglaser
Copy link
Contributor

@ethanglaser ethanglaser commented Oct 5, 2023

[please review the Contribution Guidelines prior to submitting your pull request. go ahead and delete this line if you've already reviewed said guidelines.]

What does this PR do?

Adds configs for and enables use of the Intel(R) Extension for Scikit-learn in tpot2.

How should this PR be tested?

Ensure backwards compatibility - make sure previous configurations run without issue.
Test usage of the config files both using TPOTEstimator and TPOTRegressor/TPOTClassifier with root_config_dict set to classifiers_sklearnex or regressors_sklearnex respectively.

Any background context you want to provide?

https://github.com/intel/scikit-learn-intelex/
Notebook with demonstration of speedup capabilities in tpot: https://colab.research.google.com/drive/1yeqbKg2Sv8affoHrOCsluunscjQnmqy5#scrollTo=h_TKDu-FnAFZ&line=5&uniqifier=1

What are the relevant issues?

#1316

Questions:

  • Should a requirements-optional.txt be created?
  • I would like to add an example, I see notebooks are primarily being used here which would be nice to demonstrate opportunities for speedups - any reservations about this?

@perib
Copy link
Collaborator

perib commented Oct 5, 2023

Thanks for the contribution! I did a quick test on my machine, and it works correctly.

Some thoughts:

  1. I think we can remove the use_sklearnex parameter. Instead, users can use root_config_dict='classifiers_sklearnex', which is already correctly working. This param might be confusing because the actual set of classifiers/regressors differs in the two configuration dictionaries. (I'm considering removing the "Auto" as an option for root_config_dict for consistency with the other methods and instead explicitly setting it to classifiers/regressors in the TPOTClassifier/TPOTRegressor. )

  2. I also noticed that xgboost was included even though it is not part of sklearnex. Perhaps this works for now, but when allow individual modules to be specified by string for config dicts. For example ["LogisticRegression", "KNN"] etc. #60 is addressed, perhaps we should separate it from the group. Users could then include it manually if they want.

@perib
Copy link
Collaborator

perib commented Oct 5, 2023

params_ElasticNet throws an error because log=True requires the search space to be >0.

The solution could be to either:

'alpha': 1 - trial.suggest_float(f'alpha_{name}', 0.0, 1.0),

or

'alpha': 1 - trial.suggest_float(f'alpha_{name}', 1e-4, 1.0, log=True),

@perib
Copy link
Collaborator

perib commented Oct 6, 2023

Realized I could simply make the commit. I think this is functional now and should be ready to merge

@perib perib requested a review from nickotto October 6, 2023 20:38
@nickotto nickotto changed the base branch from main to dev October 6, 2023 20:54
@nickotto nickotto marked this pull request as ready for review October 6, 2023 21:31
@nickotto nickotto merged commit 5ebb0ab into EpistasisLab:dev Oct 6, 2023
@ethanglaser
Copy link
Contributor Author

ethanglaser commented Oct 9, 2023

Thanks for the review and merge @perib @nickotto - a couple other things we were thinking:

  1. Addition of an example with use of sklearnex config - could be used to demonstrate potential for speedups over stock scikit-learn library (something along the lines of this notebook, but more concise: https://colab.research.google.com/drive/1yeqbKg2Sv8affoHrOCsluunscjQnmqy5?usp=sharing)
  2. Update to readme/docs mentioning option for sklearnex config usage
  3. Some sort of optional requirements listing sklearnex as optional dependency

Please let me know if any of these additions would be of interest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants