
[AnomalyDetection] Add threshold and aggregation functions. #34018

Open · wants to merge 3 commits into master

Conversation

shunping
Collaborator

Also include the following changes:

  • Change prediction to predictions (iterable) in AnomalyResult, per reviews.
  • Fix some tests that contaminate _KNOWN_SPECIFIABLE.

This is a follow-up PR of #33845 and #33994.

@shunping
Collaborator Author

r: @damccorm

shunping marked this pull request as ready for review on February 19, 2025, 13:22
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment assign set of reviewers

Comment on lines +39 to +40
include_history (bool): If True, include the input predictions in the
`agg_history` of the output. Defaults to False.
Contributor

Is history a standard term here or something we came up with? If it's not already a standard, it seems a little odd to me; it implies including historical predictions (e.g., what did I predict for the last data point).

Collaborator Author

I use that to store the pre-aggregated predictions so that users can always go back and check what led to the aggregated result. It is kind of a history to me, but I agree that it may be misleading in other contexts. If you have a better term, I am more than happy to hear it.

Contributor

Maybe just include_input_predictions? And input_predictions instead of agg_history?

Collaborator Author

How about include_source_predictions and source_predictions?
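For illustration, with that naming the aggregated result could carry its inputs along these lines; a minimal sketch with hypothetical field names, not the PR's actual dataclasses:

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Prediction:
  score: Optional[float] = None
  label: Optional[int] = None

@dataclass
class AggregatedPrediction:
  label: Optional[int] = None
  # Populated only when include_source_predictions=True, so users can trace
  # which per-detector predictions led to the aggregated label.
  source_predictions: List[Prediction] = field(default_factory=list)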

if prediction.label is not None
]

if len(labels) == 0:
Contributor

I'm having a hard time reasoning about this - does this basically mean all subpredictors were inconclusive?

Contributor

I wonder if inconclusive/None is still a useful input to the aggregation function - I could see someone wanting to treat inconclusive as an anomaly.
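For illustration, the aggregation function could expose that as an option; a minimal sketch assuming hypothetical NORMAL/OUTLIER label values, not the PR's actual API:

from typing import Iterable, Optional

NORMAL, OUTLIER = 0, 1  # hypothetical label values for this sketch

def any_outlier(labels: Iterable[Optional[int]],
                treat_inconclusive_as_outlier: bool = False) -> Optional[int]:
  # Flag an outlier if any sub-prediction does; None labels are configurable.
  labels = list(labels)
  if treat_inconclusive_as_outlier and any(l is None for l in labels):
    return OUTLIER
  decided = [l for l in labels if l is not None]
  if not decided:
    return None  # every sub-predictor was inconclusive
  return OUTLIER if OUTLIER in decided else NORMAL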

Collaborator Author

"""
scores = [
prediction.score for prediction in predictions
if prediction.score is not None and not math.isnan(prediction.score)
Contributor

I think it might make sense to define a score for predictions which have no score but have a label (and vice versa). Or maybe we can just throw? I guess this relates to my question above: when would we expect this scoreless condition to happen?

Collaborator Author
@shunping Feb 20, 2025

Thanks for raising this important point.

Let me take a step back and clarify the workflow we implemented here:
[input] -> detector -> [score] -> threshold_fn -> [label] -> aggregation_fn -> [aggregated label]

  • First, we are trying to address scenarios where a detector generates a score of None or NaN. In my opinion, we can distinguish between two cases:

    • The detector is NOT ready to give a prediction. This could imply that the detector needs some warm-up time before the first inference can be made.
    • The detector is ready to predict, but something goes wrong during the prediction process. For example, the input data could be ill-formatted, or the detector is simply not able to make a prediction on this input.

    We can use None to represent the first case, and NaN for the second one. The rationale is that a None value is something we don't know yet but can still recover (if we feed the input into the detector once it is ready to score), whereas NaN comes from an error during prediction and can never be recovered.

  • After we have None and NaN scores, the threshold_fn needs to decide how to assign labels to them (a rough sketch of this is below).

    • In the current implementation, I only consider None and assign a normal label to it, which may be OK because we don't want to flag outliers when the detector is still warming up. Alternatively, we can set the label to None, which means we defer the decision to other detectors.
    • For the irrecoverable NaN score, I think we can assign an outlier label.
  • When multiple labels are ready for aggregation, it is reasonable to apply the aggregation_fn on all the non-None labels. If they are all None (the situation you mentioned in your previous comment), maybe we can expose another parameter in the aggregation function as an undecided default.

WDYT?
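A rough sketch of that mapping, assuming hypothetical NORMAL/OUTLIER label values and a plain function instead of the PR's ThresholdFn class:

import math
from typing import Optional

NORMAL, OUTLIER = 0, 1  # hypothetical label values

def label_from_score(score: Optional[float], cutoff: float = 0.5) -> Optional[int]:
  if score is None:
    return NORMAL   # detector still warming up: don't flag (or return None to defer)
  if math.isnan(score):
    return OUTLIER  # irrecoverable prediction error: flag it
  return OUTLIER if score > cutoff else NORMAL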

Contributor

Generally, I like this approach and think that using NaN/None for weird failure/intentional no output is reasonable.

> We can use None to represent the first case, and NaN for the second one. The rationale is that a None value is something we don't know yet but can still recover (if we feed the input into the detector once it is ready to score), whereas NaN comes from an error during prediction and can never be recovered.

I'd actually flip these. I think None is more likely to happen because of some user mistake (e.g. I'm using a predictor that outputs a label in a situation that expects a score or vice versa), whereas NaN is an intentional choice.

> When multiple labels are ready for aggregation, it is reasonable to apply the aggregation_fn on all the non-None labels. If they are all None (the situation you mentioned in your previous comment), maybe we can expose another parameter in the aggregation function as an undecided default.

I think if all the detectors agree (whether that is None or NaN), it makes sense to match whatever they are outputting. If they're all inconclusive, an inconclusive result makes sense. If they're all errors, an error result makes sense. Thoughts?
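For illustration, that fallback could look roughly like this; the label values here are hypothetical stand-ins, not the PR's implementation:

import math
from typing import Iterable

NORMAL, OUTLIER = 0, 1      # hypothetical decided labels
INCONCLUSIVE = None         # e.g. when the score was None
ERROR = float("nan")        # e.g. when the score was NaN

def aggregate(labels: Iterable):
  labels = list(labels)
  if labels and all(l is None for l in labels):
    return INCONCLUSIVE     # all inconclusive -> inconclusive
  if labels and all(isinstance(l, float) and math.isnan(l) for l in labels):
    return ERROR            # all errors -> error
  decided = [l for l in labels if l in (NORMAL, OUTLIER)]
  if not decided:
    return INCONCLUSIVE     # mixed None/NaN without any real votes
  return OUTLIER if OUTLIER in decided else NORMAL  # simple "any outlier" vote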

Collaborator Author

> I'd actually flip these. I think None is more likely to happen because of some user mistake (e.g. I'm using a predictor that outputs a label in a situation that expects a score or vice versa), whereas NaN is an intentional choice.

I am fine with that.

> I think if all the detectors agree (whether that is None or NaN), it makes sense to match whatever they are outputting. If they're all inconclusive, an inconclusive result makes sense. If they're all errors, an error result makes sense. Thoughts?

Hmmm... Are you saying that, other than the normal and outlier labels, we will have to add two more labels for the cases of score=None and score=NaN, respectively? This could get a bit complicated; for example, one model says None while another says NaN.

Comment on lines +95 to +96
self._threshold_fn = cast(
ThresholdFn, Specifiable.from_spec(threshold_fn_spec))
Contributor

Why do we need this cast? Is this a requirement for instantiating specs?

Collaborator Author

It seems this cast makes my IDE happy, but we can remove it without any lint problems.
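For reference, a self-contained sketch of what the cast was doing; the classes below are stand-ins for the real ones, and cast() is a typing-only no-op, so removing it does not change runtime behavior:

from typing import Any, cast

class Specifiable:
  @classmethod
  def from_spec(cls, spec: Any) -> "Specifiable":
    return cls()  # stand-in for the real spec-based construction

class ThresholdFn(Specifiable):
  pass

with_cast = cast(ThresholdFn, Specifiable.from_spec({}))  # static type: ThresholdFn
without_cast = Specifiable.from_spec({})                  # static type: Specifiable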

Contributor

OK, I think if we can remove it, let's do so.
