Minor Refactoring
Updated README.md
OliverHennhoefer committed Dec 30, 2024
1 parent 7a2f71a commit d88dc7e
Showing 60 changed files with 128 additions and 228 deletions.
71 changes: 41 additions & 30 deletions README.md
Original file line number Diff line number Diff line change
@@ -3,36 +3,35 @@
[![License](https://img.shields.io/badge/License-BSD_3--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause) ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/unquad)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

**unquad** is a wrapper applicable for most [*PyOD*](https://pyod.readthedocs.io/en/latest/) detectors (see [Supported Estimators](#supported-estimators)) for
**unquad** is a wrapper applicable for most [*PyOD*](https://pyod.readthedocs.io/en/latest/) detectors (see [Supported Estimators](#supported-estimators)) enabling
**uncertainty-quantified anomaly detection** based on one-class classification and the principles of **conformal inference**.

```sh
pip install unquad
```

Mind the **optional dependencies** for using deep learning models or the built-in datasets (see [pyproject.toml](https://github.com/OliverHennhoefer/unquad/blob/main/pyproject.toml)).

## What is *Conformal Anomaly Detection*?

[*Conformal Anomaly Detection*](https://www.diva-portal.org/smash/get/diva2:690997/FULLTEXT02.pdf) (CAD) is based on the
model-agnostic and non-parametric framework of [*conformal prediction*](https://en.wikipedia.org/wiki/Conformal_prediction#:~:text=Conformal%20prediction%20(CP)%20is%20a,assuming%20exchangeability%20of%20the%20data.) (CP).
While CP aims to produce statistically valid prediction regions (*prediction intervals* or *prediction sets*) for any
given point predictor or classifier, CAD aims to control statistical metrics, like the [*false discovery rate*](https://en.wikipedia.org/wiki/False_discovery_rate),
for a given anomaly detector suitable for one-class classification – without overly compromising on its
[*statistical power*](https://en.wikipedia.org/wiki/Power_of_a_test).
[![start with why](https://img.shields.io/badge/start%20with-why%3F-brightgreen.svg?style=flat)](https://www.diva-portal.org/smash/get/diva2:690997/FULLTEXT02.pdf)

In essence, CAD translates anomaly scores into statistical p-values by comparing anomaly scores observed on test data to a retained set of calibration
scores as previously obtained for normal data during the model training stage.
The larger the discrepancy between *normal* scores and observed test scores, the lower the obtained (and **statistically valid**) p-value.
The p-values, instead of the usual anomaly estimates, allow, e.g., for FDR control by statistical procedures like *Benjamini-Hochberg*.
[*Conformal Anomaly Detection*](https://www.diva-portal.org/smash/get/diva2:690997/FULLTEXT02.pdf) applies the principles of conformal inference ([*conformal prediction*](https://en.wikipedia.org/wiki/Conformal_prediction#:~:text=Conformal%20prediction%20(CP)%20is%20a,assuming%20exchangeability%20of%20the%20data.)) to anomaly detection.
*Conformal Anomaly Detection* focuses on controlling error metrics like the [*false discovery rate*](https://en.wikipedia.org/wiki/False_discovery_rate), while maintaining [*statistical power*](https://en.wikipedia.org/wiki/Power_of_a_test).

CAD converts anomaly scores to _p_-values by comparing test data scores against calibration scores from normal training data.
The resulting _p_-value of the test score(s) is computed as the normalized rank among the calibration scores.
These **statistically valid** _p_-values enable error control through methods like *Benjamini-Hochberg*, replacing traditional anomaly estimates that lack any kind of statistical guarantee.
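
As a rough illustration of the rank-based conversion described above, here is a minimal sketch in plain Python. The function name `conformal_p_values` and the `(1 + count) / (n + 1)` convention are assumptions of this sketch (a common choice in the conformal inference literature) and not necessarily what unquad does internally. It also assumes that higher scores mean "more anomalous".

```python
from bisect import bisect_left


def conformal_p_values(calibration_scores, test_scores):
    """Return one conformal p-value per test score (hypothetical helper)."""
    calib = sorted(calibration_scores)
    n = len(calib)
    p_values = []
    for score in test_scores:
        # Number of calibration scores that are at least as large as the test score.
        n_greater_equal = n - bisect_left(calib, score)
        # The +1 terms count the test point itself, keeping the p-value in (0, 1].
        p_values.append((1 + n_greater_equal) / (n + 1))
    return p_values


calib = [0.1, 0.2, 0.3, 0.4]
print(conformal_p_values(calib, [0.05, 0.35, 0.9]))  # [1.0, 0.4, 0.2]
```

A test score that exceeds every calibration score receives the smallest attainable p-value, `1 / (n + 1)`, so a larger calibration set allows smaller p-values.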

### Usage: Split-Conformal (Inductive Approach)

Using the default behavior of `ConformalDetector()` with default `DetectorConfig()`.

```python
from pyod.models.gmm import GMM

from unquad.utils.data.loader import DataLoader
from unquad.utils.enums.dataset import Dataset
from unquad.estimator.configuration import DetectorConfig
from unquad.utils.enums import Dataset
from unquad.data.loader import DataLoader
from unquad.estimator.detector import ConformalDetector
from unquad.strategy.split import SplitConformal
from unquad.utils.metrics import false_discovery_rate, statistical_power
@@ -42,8 +41,7 @@ x_train, x_test, y_test = dl.get_example_setup(random_state=1)

ce = ConformalDetector(
detector=GMM(),
strategy=SplitConformal(calib_size=1_000),
config=DetectorConfig(alpha=0.05),
strategy=SplitConformal(calib_size=1_000)
)

ce.fit(x_train)
@@ -54,23 +52,38 @@ print(f"Empirical Power: {statistical_power(y=y_test, y_hat=estimates)}")
```

Output:
```text
Empirical FDR: 0.108
Empirical Power: 0.892
```

The behavior can be customized by changing the `DetectorConfig()`:

```python
Empirical FDR: 0.03
Empirical Power: 0.97
@dataclass
class DetectorConfig:
alpha: float = 0.2 # Nominal FDR value
adjustment: Adjustment = Adjustment.BH # Multiple Testing Procedure
aggregation: Aggregation = Aggregation.MEDIAN # Score Aggregation (if necessary)
seed: int = 1
silent: bool = True
```
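
To give an idea of what a multiple testing procedure such as the `adjustment` setting does with the p-values, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure in plain Python. The function `benjamini_hochberg` is a hypothetical helper written for illustration. It is not part of unquad, and the package may implement the adjustment differently.

```python
def benjamini_hochberg(p_values, alpha):
    """Return a list of booleans, True where a point is flagged (hypothetical helper)."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k / m * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject the k_max smallest p-values, even those above their own threshold.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5], alpha=0.1))
# [True, True, True, False]
```

Because the procedure is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value meets a later threshold.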

### Usage: Bootstrap-after-Jackknife+ (JaB+)

Using `ConformalDetector()` with customized `DetectorConfig()`.
The `BootstrapConformal()` strategy lets you set two of the three parameters `resampling_ratio`, `n_bootstraps`, and `n_calib`.
For any such combination, the remaining parameter is derived automatically. This allows exact control of the
calibration procedure when using a bootstrap strategy.

```python
from pyod.models.iforest import IForest

from unquad.utils.data.loader import DataLoader
from unquad.utils.enums.dataset import Dataset
from unquad.data.loader import DataLoader
from unquad.estimator.configuration import DetectorConfig
from unquad.estimator.detector import ConformalDetector
from unquad.strategy.bootstrap import BootstrapConformal
from unquad.utils.enums.aggregation import Aggregation
from unquad.utils.enums.adjustment import Adjustment
from unquad.utils.enums import Aggregation, Adjustment, Dataset
from unquad.utils.metrics import false_discovery_rate, statistical_power

dl = DataLoader(dataset=Dataset.SHUTTLE)
@@ -79,9 +92,7 @@ x_train, x_test, y_test = dl.get_example_setup(random_state=1)
ce = ConformalDetector(
detector=IForest(behaviour="new"),
strategy=BootstrapConformal(resampling_ratio=0.99, n_bootstraps=20, plus=True),
config=DetectorConfig(alpha=0.1,
adjustment=Adjustment.BENJAMINI_HOCHBERG,
aggregation=Aggregation.MEAN),
config=DetectorConfig(alpha=0.1, adjustment=Adjustment.BY, aggregation=Aggregation.MEAN),
)

ce.fit(x_train)
@@ -92,15 +103,15 @@ print(f"Empirical Power: {statistical_power(y=y_test, y_hat=estimates)}")
```

Output:
```python
Empirical FDR: 0.067
Empirical Power: 0.933
```text
Empirical FDR: 0.0
Empirical Power: 1.0
```
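
For readers unfamiliar with the two reported metrics, here is a minimal sketch of how an empirical FDR and empirical power are commonly computed from binary labels (1 = anomaly, 0 = normal). The helpers `empirical_fdr` and `empirical_power` are hypothetical and follow the standard textbook definitions. They are not the implementation of `false_discovery_rate` and `statistical_power` in `unquad.utils.metrics`, and returning `0.0` for empty denominators is a convention of this sketch.

```python
def empirical_fdr(y_true, y_pred):
    """Share of flagged points that are actually normal (false discoveries)."""
    flagged_truth = [t for t, p in zip(y_true, y_pred) if p == 1]
    if not flagged_truth:
        return 0.0  # Sketch convention: no discoveries means no false discoveries.
    return sum(1 for t in flagged_truth if t == 0) / len(flagged_truth)


def empirical_power(y_true, y_pred):
    """Share of true anomalies that were flagged."""
    preds_for_anomalies = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_for_anomalies:
        return 0.0  # Sketch convention: no anomalies present.
    return sum(preds_for_anomalies) / len(preds_for_anomalies)


y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(empirical_fdr(y_true, y_pred))    # one of three flagged points is normal
print(empirical_power(y_true, y_pred))  # two of three anomalies are flagged
```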

### Supported Estimators

The package currently supports anomaly estimators that are suitable for unsupervised one-class classification. As respective
detectors are therefore exclusively fitted on *normal* (or *non-anomalous*) data, parameters like *threshold* are therefore internally
The package only supports anomaly estimators that are suitable for unsupervised one-class classification. Because these
detectors are fitted exclusively on *normal* (or *non-anomalous*) data, parameters like *threshold* are internally
set to the smallest possible values.

Models that are **currently supported** include:
4 changes: 2 additions & 2 deletions examples/abod.py
@@ -1,9 +1,9 @@
from pyod.models.abod import ABOD

from unquad.utils.data.loader import DataLoader
from unquad.utils.enums import Dataset
from unquad.data.loader import DataLoader
from unquad.estimator.detector import ConformalDetector
from unquad.strategy.cross_val import CrossValidationConformal
from unquad.utils.enums.dataset import Dataset
from unquad.utils.metrics import false_discovery_rate, statistical_power

if __name__ == "__main__":
4 changes: 2 additions & 2 deletions examples/autoencoder.py
@@ -1,10 +1,10 @@
from pyod.models.auto_encoder_torch import AutoEncoder

from unquad.utils.data.loader import DataLoader
from unquad.utils.enums import Dataset
from unquad.data.loader import DataLoader
from unquad.estimator.configuration import DetectorConfig
from unquad.estimator.detector import ConformalDetector
from unquad.strategy.split import SplitConformal
from unquad.utils.enums.dataset import Dataset
from unquad.utils.metrics import false_discovery_rate, statistical_power

if __name__ == "__main__":
4 changes: 2 additions & 2 deletions examples/cd.py
@@ -1,9 +1,9 @@
from pyod.models.cd import CD

from unquad.utils.data.loader import DataLoader
from unquad.utils.enums import Dataset
from unquad.data.loader import DataLoader
from unquad.estimator.detector import ConformalDetector
from unquad.strategy.cross_val import CrossValidationConformal
from unquad.utils.enums.dataset import Dataset
from unquad.utils.metrics import false_discovery_rate, statistical_power

if __name__ == "__main__":
4 changes: 2 additions & 2 deletions examples/copod.py
@@ -1,9 +1,9 @@
from pyod.models.copod import COPOD

from unquad.utils.data.loader import DataLoader
from unquad.utils.enums import Dataset
from unquad.data.loader import DataLoader
from unquad.estimator.detector import ConformalDetector
from unquad.strategy.jackknife import JackknifeConformal
from unquad.utils.enums.dataset import Dataset
from unquad.utils.metrics import false_discovery_rate, statistical_power

if __name__ == "__main__":
4 changes: 2 additions & 2 deletions examples/dif.py
@@ -1,9 +1,9 @@
from pyod.models.dif import DIF

from unquad.utils.data.loader import DataLoader
from unquad.utils.enums import Dataset
from unquad.data.loader import DataLoader
from unquad.estimator.detector import ConformalDetector
from unquad.strategy.bootstrap import BootstrapConformal
from unquad.utils.enums.dataset import Dataset
from unquad.utils.metrics import false_discovery_rate, statistical_power

if __name__ == "__main__":
4 changes: 2 additions & 2 deletions examples/ecod.py
@@ -1,9 +1,9 @@
from pyod.models.ecod import ECOD

from unquad.utils.data.loader import DataLoader
from unquad.utils.enums import Dataset
from unquad.data.loader import DataLoader
from unquad.estimator.detector import ConformalDetector
from unquad.strategy.jackknife import JackknifeConformal
from unquad.utils.enums.dataset import Dataset
from unquad.utils.metrics import false_discovery_rate, statistical_power

if __name__ == "__main__":
11 changes: 3 additions & 8 deletions examples/gmm.py
@@ -1,8 +1,7 @@
from pyod.models.gmm import GMM

from unquad.utils.data.loader import DataLoader
from unquad.utils.enums.dataset import Dataset
from unquad.estimator.configuration import DetectorConfig
from unquad.utils.enums import Dataset
from unquad.data.loader import DataLoader
from unquad.estimator.detector import ConformalDetector
from unquad.strategy.split import SplitConformal
from unquad.utils.metrics import false_discovery_rate, statistical_power
@@ -11,11 +10,7 @@
dl = DataLoader(dataset=Dataset.SHUTTLE)
x_train, x_test, y_test = dl.get_example_setup(random_state=1)

ce = ConformalDetector(
detector=GMM(),
strategy=SplitConformal(calib_size=1_000),
config=DetectorConfig(alpha=0.05),
)
ce = ConformalDetector(detector=GMM(), strategy=SplitConformal(calib_size=1_000))

ce.fit(x_train)
estimates = ce.predict(x_test)
4 changes: 2 additions & 2 deletions examples/hbos.py
@@ -1,9 +1,9 @@
from pyod.models.hbos import HBOS

from unquad.utils.data.loader import DataLoader
from unquad.utils.enums import Dataset
from unquad.data.loader import DataLoader
from unquad.estimator.detector import ConformalDetector
from unquad.strategy.cross_val import CrossValidationConformal
from unquad.utils.enums.dataset import Dataset
from unquad.utils.metrics import false_discovery_rate, statistical_power

if __name__ == "__main__":
12 changes: 5 additions & 7 deletions examples/iforest.py
@@ -1,12 +1,12 @@
from pyod.models.iforest import IForest

from unquad.utils.data.loader import DataLoader
from unquad.utils.enums import Dataset
from unquad.data.loader import DataLoader
from unquad.estimator.configuration import DetectorConfig
from unquad.estimator.detector import ConformalDetector
from unquad.strategy.bootstrap import BootstrapConformal
from unquad.utils.enums.aggregation import Aggregation
from unquad.utils.enums.adjustment import Adjustment
from unquad.utils.enums.dataset import Dataset
from unquad.utils.enums import Aggregation
from unquad.utils.enums import Adjustment
from unquad.utils.metrics import false_discovery_rate, statistical_power

if __name__ == "__main__":
@@ -17,9 +17,7 @@
detector=IForest(behaviour="new"),
strategy=BootstrapConformal(resampling_ratio=0.99, n_bootstraps=20, plus=True),
config=DetectorConfig(
alpha=0.1,
adjustment=Adjustment.BENJAMINI_HOCHBERG,
aggregation=Aggregation.MEAN,
alpha=0.1, adjustment=Adjustment.BY, aggregation=Aggregation.MEAN
),
)

4 changes: 2 additions & 2 deletions examples/inne.py
@@ -1,9 +1,9 @@
from pyod.models.inne import INNE

from unquad.utils.data.loader import DataLoader
from unquad.utils.enums import Dataset
from unquad.data.loader import DataLoader
from unquad.estimator.detector import ConformalDetector
from unquad.strategy.bootstrap import BootstrapConformal
from unquad.utils.enums.dataset import Dataset
from unquad.utils.metrics import false_discovery_rate, statistical_power

if __name__ == "__main__":
4 changes: 2 additions & 2 deletions examples/kde.py
@@ -1,10 +1,10 @@
from pyod.models.kde import KDE

from unquad.utils.data.loader import DataLoader
from unquad.utils.enums import Dataset
from unquad.data.loader import DataLoader
from unquad.estimator.configuration import DetectorConfig
from unquad.estimator.detector import ConformalDetector
from unquad.strategy.split import SplitConformal
from unquad.utils.enums.dataset import Dataset
from unquad.utils.metrics import false_discovery_rate, statistical_power

if __name__ == "__main__":
4 changes: 2 additions & 2 deletions examples/knn.py
@@ -1,10 +1,10 @@
from pyod.models.knn import KNN

from unquad.utils.data.loader import DataLoader
from unquad.utils.enums import Dataset
from unquad.data.loader import DataLoader
from unquad.estimator.configuration import DetectorConfig
from unquad.estimator.detector import ConformalDetector
from unquad.strategy.split import SplitConformal
from unquad.utils.enums.dataset import Dataset
from unquad.utils.metrics import false_discovery_rate, statistical_power

if __name__ == "__main__":
4 changes: 2 additions & 2 deletions examples/knn_mahalanobis.py
@@ -2,11 +2,11 @@

from pyod.models.knn import KNN

from unquad.utils.data.loader import DataLoader
from unquad.utils.enums import Dataset
from unquad.data.loader import DataLoader
from unquad.estimator.configuration import DetectorConfig
from unquad.estimator.detector import ConformalDetector
from unquad.strategy.bootstrap import BootstrapConformal
from unquad.utils.enums.dataset import Dataset
from unquad.utils.metrics import false_discovery_rate, statistical_power

if __name__ == "__main__":
4 changes: 2 additions & 2 deletions examples/kpca.py
@@ -1,9 +1,9 @@
from pyod.models.kpca import KPCA

from unquad.utils.data.loader import DataLoader
from unquad.utils.enums import Dataset
from unquad.data.loader import DataLoader
from unquad.estimator.detector import ConformalDetector
from unquad.strategy.cross_val import CrossValidationConformal
from unquad.utils.enums.dataset import Dataset
from unquad.utils.metrics import false_discovery_rate, statistical_power

if __name__ == "__main__":
4 changes: 2 additions & 2 deletions examples/lmdd.py
@@ -1,10 +1,10 @@
from pyod.models.lmdd import LMDD

from unquad.utils.data.loader import DataLoader
from unquad.utils.enums import Dataset
from unquad.data.loader import DataLoader
from unquad.estimator.configuration import DetectorConfig
from unquad.estimator.detector import ConformalDetector
from unquad.strategy.split import SplitConformal
from unquad.utils.enums.dataset import Dataset
from unquad.utils.metrics import false_discovery_rate, statistical_power

if __name__ == "__main__":
4 changes: 2 additions & 2 deletions examples/loci.py
@@ -1,10 +1,10 @@
from pyod.models.loci import LOCI

from unquad.utils.data.loader import DataLoader
from unquad.utils.enums import Dataset
from unquad.data.loader import DataLoader
from unquad.estimator.configuration import DetectorConfig
from unquad.estimator.detector import ConformalDetector
from unquad.strategy.split import SplitConformal
from unquad.utils.enums.dataset import Dataset
from unquad.utils.metrics import false_discovery_rate, statistical_power

if __name__ == "__main__":
4 changes: 2 additions & 2 deletions examples/loda.py
@@ -1,9 +1,9 @@
from pyod.models.loda import LODA

from unquad.utils.data.loader import DataLoader
from unquad.utils.enums import Dataset
from unquad.data.loader import DataLoader
from unquad.estimator.detector import ConformalDetector
from unquad.strategy.cross_val import CrossValidationConformal
from unquad.utils.enums.dataset import Dataset
from unquad.utils.metrics import false_discovery_rate, statistical_power

if __name__ == "__main__":
4 changes: 2 additions & 2 deletions examples/lof.py
@@ -1,9 +1,9 @@
from pyod.models.lof import LOF

from unquad.utils.data.loader import DataLoader
from unquad.utils.enums import Dataset
from unquad.data.loader import DataLoader
from unquad.estimator.detector import ConformalDetector
from unquad.strategy.jackknife import JackknifeConformal
from unquad.utils.enums.dataset import Dataset
from unquad.utils.metrics import false_discovery_rate, statistical_power

if __name__ == "__main__":