Call predict_quantile by default if quantiles is set to True, list or np.ndarray #133

Open
wants to merge 1 commit into
base: master

@r3v1 r3v1 commented Jan 4, 2022

When using RangerForestRegressor with quantiles=True in hyperparameter optimization software (e.g. tune-sklearn) to optimize probabilistic metrics such as the Continuous Ranked Probability Score (CRPS), the model needs to output the 2D array produced by the predict_quantiles method. However, when CRPS is wrapped as a scorer with sklearn's make_scorer function, the scorer always ends up calling Ranger's predict method, so it never predicts quantiles.

Here is a brief example of what I am trying to explain:

from sklearn.metrics import make_scorer
from skranger.ensemble import RangerForestRegressor
from tune_sklearn import TuneSearchCV
from solarforecastarbiter.metrics.probabilistic import continuous_ranked_probability_score as crps

param_dists = {
    'max_depth': (0, 50),
    'min_node_size': (10, 100),
    'n_estimators': (100, 1000),
    'split_rule': ['variance', 'extratrees', 'maxstat'],
}

m = RangerForestRegressor(quantiles=True)
gs = TuneSearchCV(m,
                  param_distributions=param_dists,
                  scoring=make_scorer(crps, greater_is_better=False),
)
gs.fit(X, y)  # X, y: training data (not shown). Raises an error: forecasts must be 2D arrays
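
To make the failure mode concrete, here is a minimal sketch of what a predict-based scorer does. It is a simplified stand-in, not scikit-learn's actual code: `make_predict_scorer`, `ToyQuantileModel`, and `needs_2d` are hypothetical names used only for illustration.

```python
# Simplified stand-in for a predict-based scorer; NOT scikit-learn's actual code.
def make_predict_scorer(metric, greater_is_better=True):
    sign = 1 if greater_is_better else -1

    def scorer(estimator, X, y_true):
        # Only `predict` is ever called, so quantile output is never requested.
        y_pred = estimator.predict(X)
        return sign * metric(y_true, y_pred)

    return scorer


class ToyQuantileModel:
    """Hypothetical model with separate point and quantile prediction methods."""

    def predict(self, X):
        return [0.0 for _ in X]  # 1D: one point forecast per sample

    def predict_quantiles(self, X, quantiles=(0.1, 0.5, 0.9)):
        return [list(quantiles) for _ in X]  # 2D: one row per sample


def needs_2d(y_true, y_pred):
    # Toy metric that rejects 1D forecasts, mirroring the error reported above.
    if not all(isinstance(row, list) for row in y_pred):
        raise ValueError("forecasts must be 2D arrays")
    return 0.0
```

Calling `make_predict_scorer(needs_2d)(ToyQuantileModel(), X, y)` raises the ValueError, because the scorer has no way to reach `predict_quantiles`.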

I think the sklearn API is correct. To work around this problem, I made some changes in skranger:

  • First, RangerForestRegressor now accepts quantiles: Union[bool, list, np.ndarray]. If quantiles is True, a list, or an np.ndarray, the model is in quantile mode.
  • Second, if the model is in quantile mode, predict calls predict_quantiles by default.

NOTE: Additional logic should be implemented if a non-quantile prediction is required and quantile mode is enabled.
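
The two changes above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the code in this PR: `ToyForestRegressor`, `predict_mean`, the default quantile levels, and the placeholder return values are all hypothetical.

```python
class ToyForestRegressor:
    """Hypothetical stand-in; not skranger's actual implementation."""

    def __init__(self, quantiles=False):
        # False: point mode. True: default levels. List/array: explicit levels.
        self.quantiles = quantiles

    def _in_quantile_mode(self):
        # Identity checks avoid ambiguous truth values for array inputs.
        return self.quantiles is not False and self.quantiles is not None

    def _quantile_levels(self):
        if self.quantiles is True:
            return [0.1, 0.5, 0.9]  # assumed defaults, for illustration only
        return list(self.quantiles)

    def predict_quantiles(self, X):
        levels = self._quantile_levels()
        return [list(levels) for _ in X]  # placeholder 2D output

    def predict_mean(self, X):
        return [0.0 for _ in X]  # placeholder 1D output

    def predict(self, X):
        # Dispatch on the mode so predict-based scorers receive quantiles.
        if self._in_quantile_mode():
            return self.predict_quantiles(X)
        return self.predict_mean(X)
```

Keeping a separate point-forecast method, such as the hypothetical `predict_mean` here, is one possible way to cover the case in the NOTE where a non-quantile prediction is needed while quantile mode is enabled.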

@crflynn
Owner

crflynn commented Jan 6, 2022

I see what you're doing and it makes sense. I'm wondering if we should just break out the quantile regression to a separate estimator. Does that make sense to do here?

FWIW R's grf does this and I followed this pattern when writing skgrf.
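
As a rough sketch of the separate-estimator idea, assuming nothing about how it would actually be implemented in skranger: each class has a single, fixed output shape from predict. The class names and the crude empirical-quantile logic below are hypothetical and only illustrate the API split.

```python
class ToyPointRegressor:
    """Hypothetical point-forecast estimator; predict returns 1D output."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class ToyQuantileRegressor:
    """Hypothetical quantile estimator; predict always returns 2D output."""

    def __init__(self, quantiles=(0.1, 0.5, 0.9)):
        self.quantiles = quantiles

    def fit(self, X, y):
        self.sorted_y_ = sorted(y)
        return self

    def predict(self, X):
        # Crude empirical quantiles of the training targets, repeated per sample.
        n = len(self.sorted_y_)
        row = [self.sorted_y_[min(int(q * n), n - 1)] for q in self.quantiles]
        return [list(row) for _ in X]
```

With this split, a predict-based scorer works with the quantile estimator without any mode flag, and the point estimator keeps its usual 1D output.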

@crflynn
Owner

crflynn commented Jan 6, 2022

Looks like builds are broken due to a bug in setuptools, and a fix is in progress: pypa/setuptools#3002

@r3v1
Author

r3v1 commented Jan 9, 2022

> I'm wondering if we should just break out the quantile regression to a separate estimator. Does that make sense to do here?

Well, I wouldn't know which would be better. I think you know the overall structure of the project better than I do.
