Call `predict_quantile` by default if `quantiles` is set to `True`, `list` or `np.ndarray` #133

r3v1 · 2022-01-04T15:59:53Z

When using `RangerForestRegressor` with `quantiles=True` in a hyperparameter optimization tool (e.g. tune-sklearn) to optimize probabilistic metrics such as the Continuous Ranked Probability Score (CRPS), the model is required to output the 2D array returned by the `predict_quantiles` method. However, when CRPS is turned into a score with scikit-learn's `make_scorer` function, the resulting scorer always calls the model's `predict` method in its final step, so quantiles are never predicted.

Here is a brief example of what I am trying to explain:

```python
from sklearn.metrics import make_scorer
from skranger.ensemble import RangerForestRegressor
from tune_sklearn import TuneSearchCV
from solarforecastarbiter.metrics.probabilistic import continuous_ranked_probability_score as crps

param_dists = {
    'max_depth': (0, 50),
    'min_node_size': (10, 100),
    'n_estimators': (100, 1000),
    'split_rule': ['variance', 'extratrees', 'maxstat'],
}

m = RangerForestRegressor(quantiles=True)
gs = TuneSearchCV(m,
                  param_distributions=param_dists,
                  scoring=make_scorer(crps, greater_is_better=False),
)
# X, y: training features and target (not shown)
gs.fit(X, y)  # Raises an error: forecasts must be 2D arrays
```
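To see why the search fails, here is a simplified, hypothetical model of what a scorer built by `make_scorer` does (it is not scikit-learn's actual code): it calls the estimator's `predict` and hands only that output to the metric, as `score_func(y_true, y_pred)`. The names `simple_make_scorer`, `PointForecaster` and `needs_2d` are illustrative only; `needs_2d` stands in for a probabilistic metric that rejects 1D forecasts.

```python
import numpy as np


def simple_make_scorer(score_func, greater_is_better=True):
    """Hypothetical, simplified stand-in for sklearn.metrics.make_scorer."""
    sign = 1 if greater_is_better else -1

    def scorer(estimator, X, y_true):
        # Only the output of predict() ever reaches score_func.
        y_pred = estimator.predict(X)
        return sign * score_func(y_true, y_pred)

    return scorer


class PointForecaster:
    """Hypothetical estimator whose predict() returns one value per sample."""

    def predict(self, X):
        return np.zeros(len(X))


def needs_2d(y_true, y_pred):
    # Mimics a probabilistic metric that requires one column per quantile.
    if np.ndim(y_pred) != 2:
        raise ValueError("forecasts must be 2D arrays")
    return 0.0


scorer = simple_make_scorer(needs_2d, greater_is_better=False)
try:
    scorer(PointForecaster(), np.ones((4, 3)), np.ones(4))
except ValueError as exc:
    print(exc)  # forecasts must be 2D arrays
```

Nothing in this path gives the scorer a chance to call `predict_quantiles`, which is why the estimator itself has to change (or the scorer has to be written by hand).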

I think the scikit-learn API is behaving correctly here. To work around this problem, I made some changes in skranger:

First, `RangerForestRegressor` now accepts `quantiles: Union[bool, list, np.ndarray]`. If `quantiles` is `True` or a list/array of quantile levels, the model is in quantile mode.
Second, if the model is in quantile mode, `predict` calls `predict_quantiles` by default.
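The two changes above can be sketched as follows. This is a hypothetical toy class, not the code in this PR and not skranger's implementation: `ToyQuantileRegressor` "predicts" empirical quantiles of the training target and ignores `X`, purely so the `predict` dispatch can run on its own. `quantile_mode` and `DEFAULT_LEVELS` are also illustrative assumptions; in particular, the levels used when `quantiles=True` are made up here.

```python
from typing import Union

import numpy as np

QuantilesArg = Union[bool, list, np.ndarray]

# Hypothetical levels used when quantiles=True; not a skranger default.
DEFAULT_LEVELS = [0.1, 0.5, 0.9]


def quantile_mode(quantiles: QuantilesArg) -> bool:
    """Quantile mode is on for True or an explicit list/array of levels."""
    # Check list/array first so an ndarray is never used in a truth test.
    if isinstance(quantiles, (list, np.ndarray)):
        return True
    return quantiles is True


class ToyQuantileRegressor:
    """Hypothetical stand-in: predicts empirical quantiles of the training
    target (X is ignored), only to illustrate the dispatch in predict()."""

    def __init__(self, quantiles: QuantilesArg = False):
        self.quantiles = quantiles

    def fit(self, X, y):
        self.y_train_ = np.asarray(y, dtype=float)
        return self

    def predict_quantiles(self, X, quantiles):
        q = np.quantile(self.y_train_, quantiles)
        # One row per sample, one column per quantile level.
        return np.tile(q, (len(X), 1))

    def predict(self, X):
        if quantile_mode(self.quantiles):
            levels = DEFAULT_LEVELS if self.quantiles is True else self.quantiles
            return self.predict_quantiles(X, levels)
        # Point prediction: the mean of the training target.
        return np.full(len(X), self.y_train_.mean())
```

With `quantiles=[0.25, 0.75]`, `predict` returns an `(n_samples, 2)` array, so a scorer built with `make_scorer` receives 2D forecasts; with the default `quantiles=False` it still returns one value per sample.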

NOTE: Additional logic would be needed to get a non-quantile (point) prediction while quantile mode is enabled.
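An alternative that leaves `predict` as a point prediction is to skip `make_scorer` and pass a plain callable with the `(estimator, X, y)` signature, which scikit-learn's `scoring` parameter accepts. The sketch below is hypothetical: it assumes the estimator's `predict_quantiles` accepts a `quantiles` keyword and returns an `(n_samples, n_levels)` array, and `crps_scorer` and `LEVELS` are illustrative names. It does not use solarforecastarbiter; instead it approximates CRPS as twice the pinball (quantile) loss averaged over evenly spaced quantile levels, which approximates the integral form of CRPS.

```python
import numpy as np

# Hypothetical evenly spaced quantile levels used for the approximation.
LEVELS = np.linspace(0.05, 0.95, 19)


def crps_scorer(estimator, X, y):
    """Scorer with the (estimator, X, y) signature scikit-learn accepts.

    Assumes estimator.predict_quantiles(X, quantiles=...) returns an
    (n_samples, n_levels) array.
    """
    q_pred = np.asarray(estimator.predict_quantiles(X, quantiles=LEVELS))
    # Residual of each observation against each predicted quantile.
    u = np.asarray(y, dtype=float)[:, None] - q_pred
    # Pinball loss per sample and level.
    pinball = np.maximum(LEVELS * u, (LEVELS - 1) * u)
    # CRPS ~ 2 * mean pinball loss over the levels, averaged over samples.
    crps = np.mean(2 * pinball.mean(axis=1))
    # Negate: search tools maximize the score and lower CRPS is better.
    return -float(crps)
```

Usage would be `scoring=crps_scorer` instead of `scoring=make_scorer(crps, greater_is_better=False)`; whether tune-sklearn forwards a callable scorer exactly like scikit-learn does is an assumption worth checking.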


crflynn · 2022-01-06T02:56:44Z

I see what you're doing and it makes sense. I'm wondering if we should just break out the quantile regression to a separate estimator. Does that make sense to do here?

FWIW R's grf does this and I followed this pattern when writing skgrf.
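A separate estimator, in the spirit of grf's `quantile_forest`, could look roughly like this hypothetical adapter. `QuantileForestAdapter` is an illustrative name, not an existing skranger or skgrf class; `base` is assumed to expose `fit(X, y)` and `predict_quantiles(X, quantiles=...)`, e.g. a `RangerForestRegressor` created with `quantiles=True`. Because `predict` always returns quantiles, no mode switch is needed inside the original regressor.

```python
import numpy as np


class QuantileForestAdapter:
    """Hypothetical dedicated quantile estimator: predict() is always 2D.

    `base` is assumed to provide fit(X, y) and
    predict_quantiles(X, quantiles=...).
    """

    def __init__(self, base, quantiles=(0.1, 0.5, 0.9)):
        self.base = base
        self.quantiles = quantiles

    def fit(self, X, y):
        self.base.fit(X, y)
        return self

    def predict(self, X):
        # Always shape (n_samples, n_quantiles), so any scorer that calls
        # predict() receives quantile forecasts.
        return np.asarray(self.base.predict_quantiles(X, quantiles=list(self.quantiles)))
```

For use inside a hyperparameter search, a real version would also need to subclass `sklearn.base.BaseEstimator` (or implement `get_params`/`set_params`) so the search tool can clone it.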

crflynn · 2022-01-06T03:47:37Z

Builds are currently broken due to a bug in setuptools; it looks like a fix is in progress: pypa/setuptools#3002

r3v1 · 2022-01-09T16:58:39Z

> I'm wondering if we should just break out the quantile regression to a separate estimator. Does that make sense to do here?

Well, I'm not sure which would be better; you know the overall structure of the project better than I do.

Commit 250b36e: Call predict_quantile by default if quantiles is set to True or… `np.ndarray`
Signed-off-by: David <[email protected]>
