
Ss test mapie #41

Open · wants to merge 31 commits into base: dev

Conversation

ssorou1
Collaborator

@ssorou1 ssorou1 commented Jan 31, 2025

Implementation of MAPIE for the RF and MLP models.

Additions

Removals

Changes

Testing

Screenshots

Notes

Todos

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Visually tested in supported browsers and devices (see checklist below 👇)
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist

Target Environment support

  • Windows
  • Linux
  • Browser

Accessibility

  • Keyboard friendly
  • Screen reader friendly

Other

  • Is useable without CSS
  • Is useable without JS
  • Flexible from small to large screens
  • No linting errors or warnings
  • JavaScript tests are passing

@ssorou1 ssorou1 requested a review from glitt13 January 31, 2025 21:26
@ssorou1
Collaborator Author

ssorou1 commented Jan 31, 2025

Changes are made in fs_algo_train_eval.py

@glitt13 glitt13 changed the base branch from main to dev February 3, 2025 23:31
Collaborator

@glitt13 glitt13 left a comment


Hi Soroush,

These comments relate to some overall changes to design that won't be quick fixes. Given that, I'd suggest working on these revisions first in this same branch, then once pushed back into this PR I'll take a closer look in reviewing.

Collaborator


I think this snuck in before the .gitignore changes. Delete the file, then `git add .`, `git commit`, and `git push`.

Collaborator Author


Done!

mean_pred = predictions.mean(axis=0)
std_pred = predictions.std(axis=0)

ci_factors = {90: 1.645, 95: 1.96, 99: 2.576}
Collaborator


Rather than manually specifying these factors, let's use a function that can compute them automatically, for any confidence interval of interest:
https://stackoverflow.com/questions/55857722/how-to-calculate-a-confidence-interval-using-numpy-percentile-in-python
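Following the approach in the linked answer, the interval bounds can be read directly off the ensemble predictions with `numpy.percentile` instead of a hard-coded z-factor table. A minimal sketch (the function name and array shapes here are illustrative, not the project's API):

```python
import numpy as np

def confidence_interval(predictions, level):
    """Empirical confidence interval across ensemble predictions.

    predictions: array of shape (n_estimators, n_samples)
    level: confidence level in percent (e.g. 90, 95, 99); any value works
    """
    tail = (100 - level) / 2  # probability mass left in each tail
    lower = np.percentile(predictions, tail, axis=0)
    upper = np.percentile(predictions, 100 - tail, axis=0)
    return lower, upper

# Toy ensemble: 4 estimators, 2 samples
preds = np.array([[1.0, 2.0], [1.2, 2.2], [0.8, 1.8], [1.1, 2.1]])
lo, hi = confidence_interval(preds, 90)
```

Unlike the fixed `ci_factors` dict, this makes no normality assumption and accepts any confidence level the user asks for.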

std_pred = predictions.std(axis=0)

ci_factors = {90: 1.645, 95: 1.96, 99: 2.576}
confidence_intervals = {
Collaborator


Some design requirements commentary after reflecting on what this multiple ci_factor will mean:

We'll want a way to communicate the confidence intervals that the user cares about. The user should specify the confidence interval in the algo configuration file. This could be a single value or multiple values that ultimately get passed into the algorithms via Retr_Params; perhaps we can add a section to this dict named uncertainty.

We'll also want to keep track of the confidence intervals of interest for communicating results. This could mean tabular data and plots. Accommodating multiple confidence intervals makes this a little more challenging, but we could probably standardize how CI data look in a table, e.g. column names of ci_90, ci_95, etc. We'll also need to make sure the different plots that are generated have distinct filenames and titles specifying the confidence intervals.

In summary for this PR (prior to worrying about tables/plots) we first need a way to handle confidence intervals via config file, Retr_Params, and how we track it in the data objects we generate.
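To make the suggestion concrete, here is a minimal sketch of what that config plumbing could look like. The key names (`uncertainty`, `confidence_intervals`) and the Retr_Params layout are assumptions for illustration, not the project's actual schema:

```python
# Hypothetical layout: key names are placeholders, not fs_algo's real schema.
retr_params = {
    "algo_config": {},  # existing algorithm settings would live here
    "uncertainty": {"confidence_intervals": [90, 95]},
}

# Fall back to a single default level if the config section is absent.
levels = retr_params.get("uncertainty", {}).get("confidence_intervals", [95])
if isinstance(levels, (int, float)):  # allow a single value in the config
    levels = [levels]

# Standardized column names for tabular output, e.g. ci_90_low / ci_90_high,
# which plots could also reuse in filenames and titles.
columns = [f"ci_{int(lvl)}_{bound}" for lvl in levels for bound in ("low", "high")]
```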

# --- Calculate prediction intervals using MAPIE ---
# mapie = MapieRegressor(rf, cv="prefit", agg_function="median")
# mapie.fit(self.X_train, self.y_train)
mapie = self.calculate_mapie(rf)
Collaborator


Let's try to generalize mapie further. Take it out of the individual algorithms and create a generalized function for prediction uncertainty that is called after the algorithm training. Here:
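To sketch what such a post-training function might look like, here is a model-agnostic split-conformal version in plain numpy (the same idea MAPIE implements; in practice the function would wrap MapieRegressor as the PR already does, and all names below are illustrative):

```python
import numpy as np

def prediction_uncertainty(fitted_model, X_calib, y_calib, X_test, levels=(90, 95)):
    """Split-conformal prediction intervals for any fitted regressor.

    Model-agnostic: called once after training, regardless of the
    algorithm (rf, mlp, ...), instead of living inside each one.
    """
    # Absolute residuals on a held-out calibration set
    residuals = np.abs(y_calib - fitted_model.predict(X_calib))
    y_pred = fitted_model.predict(X_test)
    intervals = {}
    for lvl in levels:
        q = np.quantile(residuals, lvl / 100)  # residual quantile sets the half-width
        intervals[lvl] = (y_pred - q, y_pred + q)
    return y_pred, intervals
```

Because the function only needs a fitted model with a `predict` method, it can be called from one place after training rather than being duplicated inside each algorithm's code path.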

@glitt13 glitt13 mentioned this pull request Feb 7, 2025