SSL-Backdoor is an academic research library for studying poisoning (backdoor) attacks on self-supervised learning (SSL). The project currently implements two attack algorithms: SSLBKD and CorruptEncoder. It is a rewrite of the original SSLBKD codebase that provides unified training code while keeping the hyperparameters and training settings consistent with SSLBKD, so the training results are directly comparable. The key features of this library are:
- Unified poisoning and training framework.
- Retains the hyperparameters of the original implementations, ensuring good comparability.
Future updates will support multimodal contrastive learning models.
Results on ImageNet-100 (target class: lorikeet):

Algorithm | Method | Clean Acc ↑ | Backdoor Acc ↓ | ASR ↑ |
---|---|---|---|---|
SSLBKD | BYOL | 66.38% | 23.82% | 70.2% |
SSLBKD | SimCLR | 70.9% | 49.1% | 33.9% |
SSLBKD | MoCo | 66.28% | 33.24% | 57.6% |
SSLBKD | SimSiam | 64.48% | 29.3% | 62.2% |
CorruptEncoder | BYOL | 65.48% | 25.3% | 9.66% |
CorruptEncoder | SimCLR | 70.14% | 45.38% | 36.9% |
CorruptEncoder | MoCo | 67.04% | 38.64% | 37.3% |
CorruptEncoder | SimSiam | 57.54% | 14.14% | 79.48% |
Results on CIFAR-10 (target class: airplane):

Algorithm | Method | Clean Acc ↑ | Backdoor Acc ↓ | ASR ↑ |
---|---|---|---|---|
CTRL | BYOL | 75.02% | 30.87% | 66.95% |
CTRL | SimCLR | 70.32% | 20.82% | 81.97% |
CTRL | MoCo | 71.01% | 54.5% | 34.34% |
CTRL | SimSiam | 71.04% | 50.36% | 41.43% |
- Results are computed with the 10% available-data evaluation protocol from the SSLBKD paper, targeting the lorikeet class of ImageNet-100 (first table) and the airplane class of CIFAR-10 (second table), respectively.
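For reference, the sketch below illustrates how the three reported metrics are typically computed for a backdoored encoder plus linear classifier. It is a minimal sketch and not part of this repository; the loaders, model interface, and metric names are assumptions.

```python
# Minimal sketch (assumption, not this repository's evaluation code).
# `clean_loader` yields clean test images with labels, `poisoned_loader` yields
# the same images with the trigger applied; `target_class` is the attack target.
import torch

@torch.no_grad()
def evaluate_backdoor(model, clean_loader, poisoned_loader, target_class, device="cuda"):
    model.eval()
    correct_clean = correct_poisoned = hits_target = total = total_nontarget = 0

    for (x_clean, y), (x_poisoned, _) in zip(clean_loader, poisoned_loader):
        x_clean, x_poisoned, y = x_clean.to(device), x_poisoned.to(device), y.to(device)
        pred_clean = model(x_clean).argmax(dim=1)
        pred_poisoned = model(x_poisoned).argmax(dim=1)

        correct_clean += (pred_clean == y).sum().item()        # clean accuracy
        correct_poisoned += (pred_poisoned == y).sum().item()  # backdoor (triggered) accuracy
        total += y.numel()

        # ASR: fraction of triggered non-target-class images classified as the target
        nontarget = y != target_class
        hits_target += (pred_poisoned[nontarget] == target_class).sum().item()
        total_nontarget += nontarget.sum().item()

    return {
        "clean_acc": correct_clean / total,
        "backdoor_acc": correct_poisoned / total,
        "asr": hits_target / max(total_nontarget, 1),
    }
```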
Algorithm | Paper |
---|---|
SSLBKD | Backdoor attacks on self-supervised learning, CVPR 2022 |
CTRL | An Embarrassingly Simple Backdoor Attack on Self-supervised Learning, CVPR 2023 |
CorruptEncoder | Data poisoning based backdoor attacks to contrastive learning, CVPR 2024 |
BLTO (inference only) | Backdoor Contrastive Learning via Bi-level Trigger Optimization, ICLR 2024 |
To set up the project, follow these steps:
- Clone the repository:

  ```bash
  git clone https://github.com/jsrdcht/SSL-Backdoor.git
  cd SSL-Backdoor
  ```

- [Optional] Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```
Taking CIFAR10 as an example, organize the dataset as follows:

- Store the dataset in the `data/CIFAR10/train` and `data/CIFAR10/test` directories.
- Each dataset should be organized in the `ImageFolder` format.
- Generate the required dataset configuration filelist under `data/CIFAR10`. An example can be found in `data/CIFAR10/sorted_trainset.txt`, and reference code for generating the configuration file is provided in `scripts/all_data.ipynb` (a minimal sketch is also given after this list).
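The sketch below shows one way such a filelist could be generated from an `ImageFolder`-style directory. It is an illustrative assumption, not the code in `scripts/all_data.ipynb`: here each line is written as `<image_path> <class_index>`, so check `data/CIFAR10/sorted_trainset.txt` for the exact format this library expects.

```python
# Hypothetical filelist generator (assumption: one "<image_path> <class_index>" pair per line).
# Verify the exact line format against data/CIFAR10/sorted_trainset.txt before using it.
import os

def write_filelist(imagefolder_root: str, output_path: str) -> None:
    # ImageFolder layout: one subdirectory per class, images inside.
    classes = sorted(
        d for d in os.listdir(imagefolder_root)
        if os.path.isdir(os.path.join(imagefolder_root, d))
    )
    with open(output_path, "w") as f:
        for class_index, class_name in enumerate(classes):
            class_dir = os.path.join(imagefolder_root, class_name)
            for filename in sorted(os.listdir(class_dir)):
                f.write(f"{os.path.join(class_dir, filename)} {class_index}\n")

if __name__ == "__main__":
    write_filelist("data/CIFAR10/train", "data/CIFAR10/sorted_trainset.txt")
```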
For ImageNet-100, follow this extra step to split the dataset based on SSLBKD's class list:

```bash
python scripts/create_imagenet_subset.py --subset utils/imagenet100_classes.txt --full_imagenet_path <path> --subset_imagenet_path <path>
```
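Conceptually, the split copies only the class folders named in the class-list file. The sketch below is an assumption about that behavior, not the contents of `scripts/create_imagenet_subset.py`; use the provided script for the actual split.

```python
# Illustrative sketch (assumption) of an ImageNet subset split: copy only the
# class folders listed in a class-list file. NOT scripts/create_imagenet_subset.py.
import shutil
from pathlib import Path

def create_subset(class_list_file: str, full_imagenet_path: str, subset_imagenet_path: str) -> None:
    # Assumption: the class list contains one WordNet ID (e.g. n01558993) per line.
    wnids = [line.split()[0] for line in Path(class_list_file).read_text().splitlines() if line.strip()]
    for split in ("train", "val"):  # assumption: standard train/val layout
        for wnid in wnids:
            src = Path(full_imagenet_path) / split / wnid
            dst = Path(subset_imagenet_path) / split / wnid
            if src.is_dir():
                shutil.copytree(src, dst, dirs_exist_ok=True)
```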
After organizing the data, modify the config file to specify the parameters for a single poisoning experiment. For example, in `sslbkd.yaml` you need to set the attack target, poisoning rate, and so on.

Regardless of whether the pre-training data and attack data come from the same filelist, you must specify the `reference_dataset_file_list` parameter. For CorruptEncoder attacks, a special config file for the reference set is located at `SSL-Backdoor/poison-generation/poisonencoder_utils/data_config.txt`.
Example config (`configs/poisoning/trigger_based/sslbkd.yaml`):

```yaml
data: /workspace/sync/SSL-Backdoor/data/ImageNet-100/ImageNet100_trainset.txt  # Path to dataset configuration file
dataset: imagenet-100                 # Dataset name
save_poisons: True                    # Whether to save poisons for persistence; the default path is the save_folder with /poisons appended
save_poisons_path:                    # Path to save poisons
poisons_save_path:                    # Path where poisons are saved; use it when restoring training from checkpoints
if_target_from_other_dataset: False   # Whether the reference set comes from another dataset; always true for CorruptEncoder

# The following list parameters correspond one-to-one
attack_target_list:
  - 6                                 # Attack target: int
trigger_path_list:
  - /workspace/sync/SSL-Backdoor/poison-generation/triggers/trigger_14.png    # Trigger path
reference_dataset_file_list:
  - /workspace/sync/SSL-Backdoor/data/ImageNet-100/ImageNet100_trainset.txt   # Reference set's dataset configuration file
num_poisons_list:
  - 650                               # Number of poisons

attack_target_word: n01558993         # Attack class name
trigger_insert: patch                 # Trigger type
trigger_size: 50                      # Trigger size
```
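To make the one-to-one correspondence of the list parameters concrete, the sketch below loads the YAML and zips the lists into per-target attack specifications. It is a minimal sketch, not the library's actual config loader.

```python
# Minimal sketch (assumption, not the library's loader) showing how the
# one-to-one list parameters in the YAML config line up per attack target.
import yaml  # pip install pyyaml

with open("configs/poisoning/trigger_based/sslbkd.yaml") as f:
    cfg = yaml.safe_load(f)

for target, trigger, ref_filelist, n_poisons in zip(
    cfg["attack_target_list"],
    cfg["trigger_path_list"],
    cfg["reference_dataset_file_list"],
    cfg["num_poisons_list"],
):
    print(f"target={target} trigger={trigger} reference={ref_filelist} num_poisons={n_poisons}")
```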
To train a model using the BYOL method with a specific attack algorithm, run the following command:

```bash
bash scripts/train_ssl.sh
```

Note: Most hyperparameters are hardcoded to match SSLBKD; modify the script if you need to change any of them. For CTRL and the adaptive attack, you must pass the `--no_gaussian` flag to disable Gaussian noise and use the ResNet-CIFAR backbone.
To evaluate a model using the linear probing method with a specific attack algorithm, run the following command:

```bash
bash scripts/linear_probe.sh
```
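For orientation, the sketch below shows the general idea of linear probing: the pre-trained encoder is frozen and only a linear classifier is trained on its features. It is a conceptual sketch under those assumptions, not the logic of `scripts/linear_probe.sh`.

```python
# Conceptual sketch of linear probing (assumption; the real evaluation lives in
# scripts/linear_probe.sh): freeze the encoder, train only a linear classifier.
import torch
import torch.nn as nn

def linear_probe(encoder: nn.Module, train_loader, num_classes: int, feat_dim: int,
                 epochs: int = 30, lr: float = 0.1, device: str = "cuda"):
    encoder.eval().to(device)
    for p in encoder.parameters():
        p.requires_grad_(False)                     # encoder stays frozen

    classifier = nn.Linear(feat_dim, num_classes).to(device)
    optimizer = torch.optim.SGD(classifier.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            with torch.no_grad():
                features = encoder(images)          # frozen features
            loss = criterion(classifier(features), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return classifier
```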
- Implement the adaptive attack