
Implementation of an out-of-distribution detection method for geospatial deployments and its related experiments.


microsoft/geospatial-ood-detection


Distribution Shifts at Scale: Out-of-distribution Detection in Earth Observation

TARDIS Pipeline
TARDIS consists of four key steps:
1. Sampling in-distribution (ID) and WILD samples.
2. Extracting internal activations from a pre-trained model (f) for both ID and WILD samples.
3. Clustering the combined feature space and labeling WILD samples as surrogate-ID or surrogate-OOD.
4. Fitting a binary classifier (g) on the labeled feature representations to distinguish between ID and OOD samples. During deployment, this classifier flags out-of-distribution inputs.
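To make step 3 concrete, here is a minimal sketch of how surrogate labels could be derived once every sample has a cluster index. It is not the repository's implementation: the function name `assign_surrogate_labels` is hypothetical, and the decision rule (a cluster counts as ID when its fraction of known ID samples is at least `id_ratio`) is an assumption based on the parameter description in this README.

```python
from collections import defaultdict


def assign_surrogate_labels(cluster_ids, is_id, id_ratio):
    """Label samples as ID (0) or surrogate-OOD (1).

    cluster_ids: cluster index per sample (ID and WILD samples combined).
    is_id: True for known ID samples, False for WILD samples.
    id_ratio: assumed rule, a cluster whose fraction of known ID samples
        is at least this value is treated as an ID cluster.
    Known ID samples always keep label 0.
    """
    totals = defaultdict(int)
    id_counts = defaultdict(int)
    for cluster, known_id in zip(cluster_ids, is_id):
        totals[cluster] += 1
        if known_id:
            id_counts[cluster] += 1

    labels = []
    for cluster, known_id in zip(cluster_ids, is_id):
        if known_id:
            labels.append(0)
        else:
            id_fraction = id_counts[cluster] / totals[cluster]
            labels.append(0 if id_fraction >= id_ratio else 1)
    return labels


# Toy example: cluster 0 is mostly ID, cluster 1 is mostly WILD.
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]
is_id = [True, True, True, False, True, False, False, False]
labels = assign_surrogate_labels(cluster_ids, is_id, id_ratio=0.5)
print(labels)
```

In this toy example the single WILD sample in cluster 0 becomes surrogate-ID, while the three WILD samples in cluster 1 become surrogate-OOD.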

Approach

TARDIS is a post-hoc OOD detection method designed for scalable geospatial deployments. It works by extracting internal activations from a pre-trained model, clustering the feature space, and assigning surrogate labels to WILD samples as either in-distribution (ID) or out-of-distribution (OOD). These surrogate labels are then used to train a binary classifier, enabling OOD detection during inference without compromising the model's primary task performance. The method is computationally efficient, making it practical for large-scale real-world deployments. For more details, check out our paper.
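The last stage, training a binary classifier on surrogate-labeled features, can be illustrated with a small stand-in. The sketch below uses a plain logistic regression fitted by gradient descent in pure Python. This is an assumption for illustration only: this README does not specify which classifier type g is, and the names `fit_logistic` and `predict_proba` and the toy feature values are hypothetical.

```python
import math


def predict_proba(x, w, b):
    """Return the estimated probability that feature vector x is OOD."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression with batch gradient descent.

    X: list of feature vectors (stand-ins for extracted activations).
    y: surrogate labels, 0 for ID and 1 for OOD.
    """
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, target in zip(X, y):
            err = predict_proba(x, w, b) - target
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n_samples for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n_samples
    return w, b


# Toy features: ID samples near the origin, surrogate-OOD samples near (3, 3).
X = [[0.0, 0.2], [0.3, 0.1], [0.1, 0.4], [3.0, 2.8], [2.7, 3.1], [3.2, 3.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)

p_id = predict_proba([0.1, 0.1], w, b)   # expected to be below 0.5
p_ood = predict_proba([3.0, 3.0], w, b)  # expected to be above 0.5
```

Because the classifier only sees stored activations, the pre-trained model itself is never retrained, which is what makes the approach post-hoc.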

Overview

We first demonstrate our method on two datasets, EuroSAT and xBD, under 17 experimental setups involving covariate and semantic shifts. This is implemented in notebooks/eurosat_exp.ipynb and notebooks/xbd_exp.ipynb, with the corresponding code in src/tardis/.

Then, we scale up the method for real-world deployment using a model trained on the Fields of the World (FTW) dataset. This is demonstrated in notebooks/tardis_FTW.ipynb, with the corresponding code in src/tardis_ftw/.

We assume access to a pre-trained model, its training set, and a collection of data with an unknown distribution (either ID or OOD). Here is how our method works:

from src.tardis_ftw.tardis_wrapper import TARDISWrapper

ood_model = TARDISWrapper(
    base_model,              # The pre-trained model to investigate
    hook_layer_name,         # The layer name from which activations are extracted
    id_loader,               # DataLoader for in-distribution (ID) samples
    wild_loader,             # DataLoader for WILD samples with unknown distributions (ID or OOD)
    num_clusters,            # Number of clusters for K-Means clustering in activation space
    id_ratio,                # Threshold for determining surrogate-ID or surrogate-OOD labels based on ID sample ratio in a cluster
    classifier_save_path,    # Path to save the trained binary classifier
)

# Step 1: Extract internal features
X, y = ood_model.compute_features()

# Step 2: Cluster the feature space
y_clustered = ood_model.feature_space_clustering(X, y)

# Step 3: Train the binary OOD classifier
metrics = ood_model.g_classification(X, y_clustered)

# Step 4: Deploy the binary OOD classifier as a wrapper around the model
f_preds, g_pred_probs = ood_model.f_g_prediction(inference_images)

# `f_preds`: Predictions of the pre-trained model (`base_model`).
# `g_pred_probs`: Probability scores from the binary classifier; values near 0 indicate ID characteristics and values near 1 indicate OOD characteristics.
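At deployment time, the probability scores still need to be turned into a decision. The sketch below shows one possible way to do that. The helper `flag_ood`, the 0.5 threshold, and the placeholder prediction values are all hypothetical. The sketch also assumes the scores are available as plain Python floats, whereas the wrapper may return arrays or tensors.

```python
def flag_ood(f_preds, g_pred_probs, threshold=0.5):
    """Pair each model prediction with its OOD score and a boolean flag.

    threshold is a hypothetical cut-off; scores at or above it are
    flagged as OOD, following the convention that 1 means OOD.
    """
    if len(f_preds) != len(g_pred_probs):
        raise ValueError("f_preds and g_pred_probs must have the same length")
    return [
        (pred, prob, prob >= threshold)
        for pred, prob in zip(f_preds, g_pred_probs)
    ]


# Placeholder values standing in for real model outputs.
example_preds = ["pred_a", "pred_b", "pred_c"]
example_probs = [0.05, 0.92, 0.40]

results = flag_ood(example_preds, example_probs)
for pred, prob, is_ood in results:
    status = "OOD, treat prediction with caution" if is_ood else "ID"
    print(f"{pred}: score={prob:.2f} -> {status}")
```

In practice the threshold would be tuned on held-out data, trading missed OOD inputs against false alarms.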

OOD Detection Goes Global: TARDIS in Action

Figure: TARDIS in Action. The figure illustrates the geographical distribution of ID and WILD samples, where WILD samples are classified by the domain shift classifier as either ID or OOD. For randomly sampled Sentinel-2 input pairs, the model predictions and classifier outputs are shown. Notably, poor model predictions often correspond to high OOD detection performance. A geographical pattern also emerges: samples from arid biomes (e.g., the Sahara, Patagonia, Inner Australia) and polar regions (e.g., Icelandic glaciers, South Pole) are frequently flagged as OOD due to their ecological dissimilarity to the mesic environments represented in the ID samples.

Citation

@misc{ekim2024distributionshiftsscaleoutofdistribution,
      title={Distribution Shifts at Scale: Out-of-distribution Detection in Earth Observation}, 
      author={Burak Ekim and Girmaw Abebe Tadesse and Caleb Robinson and Gilles Hacheme and Michael Schmitt and Rahul Dodhia and Juan M. Lavista Ferres},
      year={2024},
      eprint={2412.13394},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.13394}, 
}

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
