
Ss test mapie #41

Open · wants to merge 31 commits into base: dev

Conversation

ssorou1
Collaborator

@ssorou1 ssorou1 commented Jan 31, 2025

Implementation of MAPIE for the RF and MLP models.

Additions

Removals

Changes

Testing

Screenshots

Notes

Todos

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Visually tested in supported browsers and devices (see checklist below 👇)
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist

Target Environment support

  • Windows
  • Linux
  • Browser

Accessibility

  • Keyboard friendly
  • Screen reader friendly

Other

  • Is useable without CSS
  • Is useable without JS
  • Flexible from small to large screens
  • No linting errors or warnings
  • JavaScript tests are passing

@ssorou1 ssorou1 requested a review from glitt13 January 31, 2025 21:26
@ssorou1
Collaborator Author

ssorou1 commented Jan 31, 2025

Changes are made in fs_algo_train_eval.py

@glitt13 glitt13 changed the base branch from main to dev February 3, 2025 23:31
Collaborator

@glitt13 glitt13 left a comment


Hi Soroush,

These comments relate to some overall changes to design that won't be quick fixes. Given that, I'd suggest working on these revisions first in this same branch, then once pushed back into this PR I'll take a closer look in reviewing.

Collaborator


I think this snuck in before the .gitignore changes. Delete the file, then `git add .`, `git commit`, and `git push`.

Collaborator Author


Done!

mean_pred = predictions.mean(axis=0)
std_pred = predictions.std(axis=0)

ci_factors = {90: 1.645, 95: 1.96, 99: 2.576}
Collaborator


Rather than manually specifying these factors, let's use a function that can compute them automatically, for any confidence interval of interest:
https://stackoverflow.com/questions/55857722/how-to-calculate-a-confidence-interval-using-numpy-percentile-in-python
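Following the approach in the linked answer, the interval bounds can be read directly off the ensemble predictions with `numpy.percentile` instead of a hard-coded z-factor table. A minimal sketch (the function name and array shapes here are illustrative, not the project's API):

```python
import numpy as np

def confidence_interval(predictions, level):
    """Empirical confidence interval across ensemble predictions.

    predictions: array of shape (n_estimators, n_samples)
    level: confidence level in percent (e.g. 90, 95, 99); any value works
    """
    tail = (100 - level) / 2  # probability mass left in each tail
    lower = np.percentile(predictions, tail, axis=0)
    upper = np.percentile(predictions, 100 - tail, axis=0)
    return lower, upper

# Toy ensemble: 4 estimators, 2 samples
preds = np.array([[1.0, 2.0], [1.2, 2.2], [0.8, 1.8], [1.1, 2.1]])
lo, hi = confidence_interval(preds, 90)
```

Unlike the fixed `ci_factors` dict, this makes no normality assumption and accepts any confidence level the user asks for.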

std_pred = predictions.std(axis=0)

ci_factors = {90: 1.645, 95: 1.96, 99: 2.576}
confidence_intervals = {
Collaborator


Some design requirements commentary after reflecting on what this multiple ci_factor will mean:

We'll want a way to communicate the confidence intervals that the user cares about. The user should specify the confidence interval in the algo configuration file. This could be a single value or multiple values that ultimately get passed into the algorithms via Retr_Params; perhaps we can add a section to this dict named uncertainty.

We'll also want to keep track of the confidence intervals of interest for communicating results. This could mean tabular data and plots. Accommodating multiple confidence intervals makes this a little more challenging, but we could probably standardize how CI data look in a table, e.g. column names of ci_90, ci_95, etc. We'll also need to make sure the different plots that are generated have distinct filenames and titles specifying the confidence intervals.

In summary for this PR (prior to worrying about tables/plots) we first need a way to handle confidence intervals via config file, Retr_Params, and how we track it in the data objects we generate.
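To make the suggestion concrete, here is a minimal sketch of what that config plumbing could look like. The key names (`uncertainty`, `confidence_intervals`) and the Retr_Params layout are assumptions for illustration, not the project's actual schema:

```python
# Hypothetical layout: key names are placeholders, not fs_algo's real schema.
retr_params = {
    "algo_config": {},  # existing algorithm settings would live here
    "uncertainty": {"confidence_intervals": [90, 95]},
}

# Fall back to a single default level if the config section is absent.
levels = retr_params.get("uncertainty", {}).get("confidence_intervals", [95])
if isinstance(levels, (int, float)):  # allow a single value in the config
    levels = [levels]

# Standardized column names for tabular output, e.g. ci_90_low / ci_90_high,
# which plots could also reuse in filenames and titles.
columns = [f"ci_{int(lvl)}_{bound}" for lvl in levels for bound in ("low", "high")]
```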

# --- Calculate prediction intervals using MAPIE ---
# mapie = MapieRegressor(rf, cv="prefit", agg_function="median")
# mapie.fit(self.X_train, self.y_train)
mapie = self.calculate_mapie(rf)
Collaborator


Let's try to generalize mapie further. Take it out of the individual algorithms and create a generalized function for prediction uncertainty that is called after the algorithm training. Here:
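To sketch what such a post-training function might look like, here is a model-agnostic split-conformal version in plain numpy (the same idea MAPIE implements; in practice the function would wrap MapieRegressor as the PR already does, and all names below are illustrative):

```python
import numpy as np

def prediction_uncertainty(fitted_model, X_calib, y_calib, X_test, levels=(90, 95)):
    """Split-conformal prediction intervals for any fitted regressor.

    Model-agnostic: called once after training, regardless of the
    algorithm (rf, mlp, ...), instead of living inside each one.
    """
    # Absolute residuals on a held-out calibration set
    residuals = np.abs(y_calib - fitted_model.predict(X_calib))
    y_pred = fitted_model.predict(X_test)
    intervals = {}
    for lvl in levels:
        q = np.quantile(residuals, lvl / 100)  # residual quantile sets the half-width
        intervals[lvl] = (y_pred - q, y_pred + q)
    return y_pred, intervals
```

Because the function only needs a fitted model with a `predict` method, it can be called from one place after training rather than being duplicated inside each algorithm's code path.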

@glitt13 glitt13 mentioned this pull request Feb 7, 2025