[AnomalyDetection] Add threshold and aggregation functions. #34018
Conversation
This is a follow-up PR of #33845 and #33994.

Also include the following changes:
* Change `prediction` to `predictions` (an iterable) in `AnomalyResult`.
* Fix some tests that contaminate `_KNOWN_SPECIFIABLE`.
r: @damccorm

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment
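For context, a minimal sketch of the `AnomalyResult` change mentioned in the description. Only the `prediction` → `predictions` rename comes from the PR text; the field set and types here are assumptions:

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional

@dataclass
class AnomalyPrediction:
    # Simplified stand-in for the per-detector prediction type.
    score: Optional[float] = None
    label: Optional[int] = None

@dataclass
class AnomalyResult:
    example: Any
    # Before this PR: a single `prediction` field.
    # After this PR: an iterable, e.g. one prediction per sub-detector.
    predictions: Iterable[AnomalyPrediction]
```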
```
include_history (bool): If True, include the input predictions in the
  `agg_history` of the output. Defaults to False.
```
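For illustration, a hedged sketch of the semantics this docstring describes; `AnomalyPrediction` and `agg_history` follow the diff, while the aggregation wrapper itself is hypothetical:

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List, Optional

@dataclass
class AnomalyPrediction:
    score: Optional[float] = None
    label: Optional[int] = None
    # Pre-aggregation inputs, retained only when include_history=True.
    agg_history: List["AnomalyPrediction"] = field(default_factory=list)

def apply_aggregation(
    predictions: List[AnomalyPrediction],
    agg_fn: Callable[[List[AnomalyPrediction]], AnomalyPrediction],
    include_history: bool = False,
) -> AnomalyPrediction:
    result = agg_fn(predictions)
    if include_history:
        # Keep the raw inputs so users can trace the aggregated result.
        result = replace(result, agg_history=list(predictions))
    return result
```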
Is history a standard term here or something we came up with? It seems a little odd to me if it's not already a standard; to me it implies including historical predictions (e.g. what did I predict for the last data point).
I use that to store pre-aggregated predictions so that users can always go back to check what led to the aggregated result. It is kind of a history to me, but I agree that it may be misleading in some other context. If you have a better term, I am more than happy to hear it.
Maybe just `include_input_predictions`? And `input_predictions` instead of `agg_history`?
How about `include_source_predictions` and `source_predictions`?
```python
    if prediction.label is not None
]

if len(labels) == 0:
```
I'm having a hard time reasoning about this - does this basically mean all subpredictors were inconclusive?
I wonder if inconclusive/None is still a useful input to the aggregation function - I could see someone wanting to treat inconclusive as an anomaly.
Please see my comment in https://github.com/apache/beam/pull/34018/files/7ede19cde81ab5ed889b92e04a1feb92fc239c1d#r1962833576.
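As a concrete illustration of the snippet above, a minimal sketch (not the PR's implementation) of a vote-style aggregation that drops inconclusive labels and falls back to a caller-chosen default when every sub-predictor was inconclusive; the label constants and the `undecided_default` parameter are assumptions:

```python
from collections import Counter
from typing import Iterable, Optional

NORMAL, OUTLIER = 0, 1  # hypothetical label encoding

def majority_vote(
    input_labels: Iterable[Optional[int]],
    undecided_default: Optional[int] = None,
) -> Optional[int]:
    # Drop inconclusive (None) labels before voting.
    labels = [l for l in input_labels if l is not None]
    if len(labels) == 0:
        # Every sub-predictor was inconclusive: defer to a default
        # rather than guessing normal or outlier.
        return undecided_default
    return Counter(labels).most_common(1)[0][0]
```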
""" | ||
scores = [ | ||
prediction.score for prediction in predictions | ||
if prediction.score is not None and not math.isnan(prediction.score) |
I think it might make sense to define a score for predictions which have no score but have a label (and vice versa). Or maybe we can just throw? I guess this relates to my above question: when would we expect this scoreless condition to happen?
Thanks for raising this important point.

Let me take a step back and clarify the workflow we implemented here:

```
[input] -> detector -> [score] -> threshold_fn -> [label] -> aggregation_fn -> [aggregated label]
```

- First, we are trying to address scenarios where a detector generates a score of `None` or `NaN`. In my opinion, we can distinguish between these two cases:
  - The detector is NOT ready to give a prediction. This could imply that the detector needs some warm-up time before the first inference can be made.
  - The detector is ready to predict, but something goes wrong during the prediction process. For example, the input data could be ill-formatted, or the detector is simply not able to make a prediction on this input.

  We can use `None` to represent the first case and `NaN` for the second. The rationale is that a `None` value is something we don't know yet but is recoverable (if we feed the input into a detector that is ready to score), whereas `NaN` comes from an error during prediction and can never be recovered.

- After we have `None` and `NaN` scores, the threshold_fn needs to decide how to assign labels for them:
  - In the current implementation, I only consider `None` and assign a normal label to it, which may be OK because we don't want to flag outliers while the detector is still warming up. Alternatively, we can also set the label to `None`, which means we defer the decision to other detectors.
  - For the irrecoverable `NaN` score, I think we can assign an outlier label.

- When multiple labels are ready for aggregation, it is reasonable to apply the aggregation_fn on all the non-`None` labels. If they are all `None` (the situation you mentioned in the previous comment), maybe we can expose another parameter in the aggregation function for an undecided default.

A sketch of this policy follows below. WDYT?
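A minimal sketch of the proposal above, assuming hypothetical `NORMAL`/`OUTLIER` label constants and a simple score cutoff (this is not the PR's code):

```python
import math
from typing import Optional

NORMAL, OUTLIER = 0, 1  # hypothetical label encoding

def threshold_label(score: Optional[float], cutoff: float) -> Optional[int]:
    if score is None:
        # Detector still warming up: treat as normal (the current choice);
        # returning None here instead would defer to other detectors.
        return NORMAL
    if math.isnan(score):
        # Irrecoverable prediction error: flag as an outlier.
        return OUTLIER
    return OUTLIER if score >= cutoff else NORMAL
```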
Generally, I like this approach and think that using NaN/None for weird failure/intentional no output is reasonable.
> We can use None to represent the first case, and NaN for the second one. The rationale is that None value is something we don't know yet, but recoverable (if we feed the input into the detector that is ready to score), but NaN is coming from an error during prediction and can never be recovered.

I'd actually flip these. I think `None` is more likely to happen because of some user mistake (e.g. I'm using a predictor that outputs a label in a situation that expects a score, or vice versa), whereas `NaN` is an intentional choice.
> When multiple labels are ready for aggregation, it is reasonable to apply the aggregation_fn on all the non-None labels. If they are all None (the situation you mentioned in the previous comment), maybe we can expose another parameter in the aggregation function for undecided default.

I think if all the detectors agree (whether that is a None or a NaN), it makes sense to match whatever they are outputting. If they're all inconclusive, an inconclusive result makes sense. If they're all errors, an error result makes sense. Thoughts?
> I'd actually flip these. I think None is more likely to happen because of some user mistake (e.g. I'm using a predictor that outputs a label in a situation that expects a score, or vice versa), whereas NaN is an intentional choice.

I am fine with that.

> I think if all the detectors agree (whether that is a None or a NaN), it makes sense to match whatever they are outputting. If they're all inconclusive, an inconclusive result makes sense. If they're all errors, an error result makes sense. Thoughts?

Hmmm... Are you saying that, in addition to the normal and outlier labels, we will have to add two labels for the cases of score=None and score=NaN, respectively? This will get a bit complicated; for example, one model could say None while another says NaN.
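To make the open question concrete, a hypothetical combiner following the "match what they all output" rule, with the mixed None/NaN case deliberately left unresolved (none of these names come from the PR):

```python
import math
from typing import Iterable, Optional

def combine_scores(input_scores: Iterable[Optional[float]]) -> Optional[float]:
    scores = list(input_scores)
    if all(s is None for s in scores):
        return None            # all inconclusive -> inconclusive
    if all(s is not None and math.isnan(s) for s in scores):
        return float("nan")    # all errors -> error
    clean = [s for s in scores if s is not None and not math.isnan(s)]
    if not clean:
        # Mixed None/NaN with no usable score: the policy debated above.
        raise NotImplementedError("mixed None/NaN inputs: policy undecided")
    return max(clean)          # placeholder combine over usable scores
```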
```python
self._threshold_fn = cast(
    ThresholdFn, Specifiable.from_spec(threshold_fn_spec))
```
Why do we need this cast? Is this a requirement for instantiating specs?
This cast keeps my IDE happy, but we can remove it without any lint problems.
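For reference, `typing.cast` is a no-op at runtime; it only narrows the type for static checkers, so removing it cannot change behavior. A self-contained sketch (the `from_spec` shape here is a stand-in, not Beam's actual signature):

```python
from typing import Any, cast

class ThresholdFn:
    # Stand-in for the ThresholdFn base class in this PR.
    pass

def from_spec(spec: Any) -> object:
    # Stand-in for Specifiable.from_spec, which is typed to return a
    # broader type than the concrete ThresholdFn it constructs.
    return ThresholdFn()

fn = cast(ThresholdFn, from_spec({}))  # same object at runtime, narrowed type
assert isinstance(fn, ThresholdFn)
```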
OK, I think if we can remove it, let's do so.