Noisy Test-Time Adaptation in Vision-Language Models

Chentao Cao, Zhun Zhong, Zhanke Zhou, Tongliang Liu, Yang Liu, Kun Zhang, Bo Han

Left: comparison between TTA, noisy TTA, zero-shot OOD detection, and the proposed zero-shot noisy TTA (ZS-NTTA). Right: performance ranking distribution of five TTA methods across 44 ID-OOD dataset pairs.

Abstract

Test-time adaptation (TTA) aims to address distribution shifts between source and target data by relying solely on target data during testing. In open-world scenarios, models often encounter noisy samples, i.e., samples outside the in-distribution (ID) label space. Leveraging the zero-shot capability of pre-trained vision-language models (VLMs), this paper introduces Zero-Shot Noisy TTA (ZS-NTTA), which focuses on adapting the model at test time, in a zero-shot manner, to target data that contains noisy samples. In a preliminary study, we reveal that existing TTA methods suffer a severe performance decline under ZS-NTTA, often lagging behind even the frozen model. We conduct comprehensive experiments to analyze this phenomenon, revealing that the negative impact of unfiltered noisy data outweighs the benefits of clean data during model updating. In addition, because these methods use the same adapting classifier for both the ID classification and noise detection sub-tasks, the model's ability in both sub-tasks is largely hampered. Based on this analysis, we propose a novel framework that decouples the classifier and the detector, focusing on developing an individual detector while keeping the classifier (including the backbone) frozen. Technically, we introduce the Adaptive Noise Detector (AdaND), which uses the frozen model's outputs as pseudo-labels to train a noise detector that detects noisy samples effectively. To handle clean data streams (i.e., streams without noisy samples), we further inject Gaussian noise during adaptation, preventing the detector from misclassifying clean samples as noisy. Beyond ZS-NTTA, AdaND also improves the zero-shot out-of-distribution (ZS-OOD) detection ability of VLMs. Extensive experiments show that our method outperforms existing methods in both ZS-NTTA and ZS-OOD detection. On ImageNet, AdaND achieves a notable improvement of $8.32\%$ in harmonic mean accuracy ($\text{Acc}_\text{H}$) for ZS-NTTA and $9.40\%$ in FPR95 for ZS-OOD detection, compared to state-of-the-art methods.
Importantly, AdaND is computationally efficient, with a cost comparable to that of the model-frozen method.
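The decoupled design described above can be illustrated with a small, self-contained sketch. This is not the repository's AdaND implementation: it assumes the frozen model already provides a feature vector and an ID-confidence score (for example, the maximum softmax probability over the ID class prompts) for each test sample. The names `LinearNoiseDetector`, `pseudo_label`, `adapt_step`, and `demo`, the thresholds `low`/`high`, and the choice to inject Gaussian noise in feature space are all hypothetical. Only the detector's weights are updated; the classifier and backbone stay frozen.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LinearNoiseDetector:
    """Hypothetical logistic-regression detector on frozen features.

    prob_noisy(x) estimates P(noisy | x). Only these weights change at
    test time; the zero-shot classifier and backbone stay frozen.
    """

    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def prob_noisy(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return sigmoid(z)

    def update(self, xs, ys):
        # Plain SGD on binary cross-entropy; label 1 = noisy, 0 = clean.
        for x, y in zip(xs, ys):
            g = self.prob_noisy(x) - y
            self.w = [wi - self.lr * g * xi for wi, xi in zip(self.w, x)]
            self.b -= self.lr * g


def pseudo_label(id_score, low=0.3, high=0.7):
    """Map a frozen-model ID score to a pseudo-label (thresholds are assumptions)."""
    if id_score >= high:
        return 0  # confidently clean
    if id_score <= low:
        return 1  # confidently noisy
    return None  # ambiguous: excluded from the update


def adapt_step(detector, feats, id_scores, n_gaussian=2, noise_std=1.0, rng=random):
    """One test-time detector update on a batch of frozen features."""
    xs, ys = [], []
    for x, s in zip(feats, id_scores):
        y = pseudo_label(s)
        if y is not None:
            xs.append(x)
            ys.append(y)
    # Gaussian-noise injection: synthetic inputs are always labeled noisy, so the
    # detector keeps a noisy reference even when a batch is entirely clean.
    dim = len(detector.w)
    for _ in range(n_gaussian):
        xs.append([rng.gauss(0.0, noise_std) for _ in range(dim)])
        ys.append(1)
    detector.update(xs, ys)


def demo(seed=0, steps=50):
    """Toy stream: clean features near (2, 2), noisy features near (-2, -2)."""
    rng = random.Random(seed)
    det = LinearNoiseDetector(dim=2)
    for _ in range(steps):
        clean = [[2 + rng.gauss(0, 0.3), 2 + rng.gauss(0, 0.3)] for _ in range(4)]
        noisy = [[-2 + rng.gauss(0, 0.3), -2 + rng.gauss(0, 0.3)] for _ in range(4)]
        scores = [0.9] * 4 + [0.1] * 4  # pretend frozen-model ID scores
        adapt_step(det, clean + noisy, scores, rng=rng)
    return det


if __name__ == "__main__":
    det = demo()
    print(det.prob_noisy([2.0, 2.0]), det.prob_noisy([-2.0, -2.0]))
```

Confident samples supply clean and noisy pseudo-labels, while the injected Gaussian samples always carry the noisy label, so the detector is not pushed toward calling everything clean (or, through drift, everything noisy) when a batch has no real noisy samples. Whether the actual method injects noise in image space or feature space is not stated here; feature space is used only to keep the sketch small.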
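The two headline metrics can be computed as in the sketch below. It assumes that $\text{Acc}_\text{H}$ is the harmonic mean of a clean-sample accuracy $\text{Acc}_\text{S}$ (a clean sample counts only if it is not flagged as noisy and is classified correctly) and a noise-detection accuracy $\text{Acc}_\text{N}$, and that FPR95 is the false positive rate on OOD samples at the threshold where 95% of ID samples are accepted, with higher scores meaning "more ID". The function names and the record layout are hypothetical and do not come from the repository's evaluation code.

```python
def harmonic_mean_acc(acc_s, acc_n):
    """Acc_H as the harmonic mean of clean accuracy (Acc_S) and noise accuracy (Acc_N)."""
    if acc_s + acc_n == 0:
        return 0.0
    return 2 * acc_s * acc_n / (acc_s + acc_n)


def zs_ntta_accuracies(records):
    """records: iterable of (is_noisy, flagged_noisy, pred_class, label).

    Assumed rules: a clean sample is correct only if it is not flagged and its
    predicted class matches the label; a noisy sample is correct if it is flagged.
    """
    clean_hits = clean_total = noisy_hits = noisy_total = 0
    for is_noisy, flagged, pred, label in records:
        if is_noisy:
            noisy_total += 1
            noisy_hits += int(flagged)
        else:
            clean_total += 1
            clean_hits += int(not flagged and pred == label)
    acc_s = clean_hits / clean_total if clean_total else 0.0
    acc_n = noisy_hits / noisy_total if noisy_total else 0.0
    return acc_s, acc_n, harmonic_mean_acc(acc_s, acc_n)


def fpr_at_95_tpr(id_scores, ood_scores):
    """FPR95 for scores where higher means 'more ID'.

    Uses the highest threshold that still accepts at least 95% of ID samples,
    then returns the fraction of OOD samples that are also accepted.
    """
    ranked = sorted(id_scores, reverse=True)
    k = (95 * len(ranked) + 99) // 100  # ceil(0.95 * n) with integer math
    threshold = ranked[k - 1]
    return sum(s >= threshold for s in ood_scores) / len(ood_scores)


if __name__ == "__main__":
    records = [
        (False, False, 1, 1),      # clean, kept, correct class
        (False, True, 1, 1),       # clean, wrongly flagged as noisy
        (True, True, None, None),  # noisy, flagged
        (True, False, 3, None),    # noisy, missed
    ]
    print(zs_ntta_accuracies(records))  # → (0.5, 0.5, 0.5)
    print(fpr_at_95_tpr(list(range(1, 21)), [0, 1, 2, 3]))  # → 0.5
```

The harmonic mean is low whenever either sub-task fails, so a method cannot score well on $\text{Acc}_\text{H}$ by, for example, flagging every sample as noisy.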

Setup

Dependencies

# make sure you have installed Anaconda
conda create -n zs_ntta python
conda activate zs_ntta
pip install -r requirements.txt

Dataset Preparation

Please set the base path of all datasets via config.data.path in configs/default_configs.py.

In-distribution (Clean) Datasets

We consider the following ID datasets: CIFAR-10/100, CUB-200-2011, STANFORD-CARS, Food-101, Oxford-IIIT Pet, ImageNet-1k, ImageNet-K, ImageNet-A, ImageNet-V2, and ImageNet-R.

Out-of-Distribution (Noisy) Datasets

We consider the following OOD datasets: iNaturalist, SUN, Places, Texture, SVHN, and LSUN.

For iNaturalist, SUN, Places, and Texture, please follow the preparation instructions of Huang et al. (2021).
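In ZS-NTTA, the test stream mixes samples from an ID (clean) dataset with samples from an OOD (noisy) dataset. The sketch below shows one way to build such a stream, assuming a target noise ratio and uniform shuffling; `build_noisy_stream` and its `noise_ratio` parameter are hypothetical and do not reflect how the repository actually samples or orders its streams.

```python
import random


def build_noisy_stream(id_samples, ood_samples, noise_ratio, seed=0):
    """Hypothetical helper: shuffle clean ID and noisy OOD samples into one stream.

    Returns a list of (sample, is_noisy) pairs in which roughly `noise_ratio`
    of the items are noisy. noise_ratio=0.0 gives a purely clean stream.
    """
    if not 0.0 <= noise_ratio < 1.0:
        raise ValueError("noise_ratio must be in [0, 1)")
    rng = random.Random(seed)
    n_clean = len(id_samples)
    # Solve n_noisy / (n_clean + n_noisy) = noise_ratio for n_noisy,
    # capped by how many OOD samples are available.
    n_noisy = min(len(ood_samples), round(noise_ratio * n_clean / (1.0 - noise_ratio)))
    stream = [(x, False) for x in id_samples]
    stream += [(x, True) for x in rng.sample(list(ood_samples), n_noisy)]
    rng.shuffle(stream)
    return stream


if __name__ == "__main__":
    stream = build_noisy_stream(range(100), range(1000, 1200), noise_ratio=0.5)
    print(len(stream), sum(is_noisy for _, is_noisy in stream))  # → 200 100
```

Keeping the `is_noisy` flag alongside each sample is what makes it possible to score both the classification and the noise-detection sub-tasks afterwards; the flag would of course be hidden from the adapting method itself.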

Quick Start

You can run the following script directly. This repository also provides implementations of ZS-CLIP, Tent, SoTTA, TPT, and TDA; all methods are run the same way, so just specify the corresponding configuration file in the bash script.

# Taking CIFAR-10/100 as an example
bash scripts/cifar.sh

Citation

If you find our work useful, please consider citing our paper:

@inproceedings{cao2025zsntta,
  title={Noisy Test-Time Adaptation in Vision-Language Models},
  author={Cao, Chentao and Zhong, Zhun and Zhou, Zhanke and Liu, Tongliang and Liu, Yang and Zhang, Kun and Han, Bo},
  booktitle={ICLR},
  year={2025}
}

Our implementation is based on TPT and OWTTT. Thanks for their great work!

Questions

If you have any questions, please feel free to contact [email protected]
