This repository contains our solution for the Backdoored Model Detection Challenge, part of the Rayan International AI Contest. The task is to develop a system that classifies models as either backdoored or clean.
The goal is to build a robust system that can:

- Process input consisting of:
  - A machine learning model (specifically, a PreActResNet18 model trained on an image classification task)
  - The number of classes in the dataset
  - A folder path to clean test images (representing 1% of the test dataset)
  - The image transformation function used in model preprocessing
- Identify and return whether the provided model contains a backdoor, with output:
  - `0` if the model is backdoored
  - `1` if the model is clean
For more details, check the problem_description.pdf file.
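For concreteness, a submission reduces to a single detection function of roughly the following shape. The name `detect_backdoor` and the exact signature below are our own illustrative assumptions, not the official contest API:

```python
def detect_backdoor(model, num_classes, clean_images_dir, transform):
    """Illustrative interface sketch; the name and signature are assumptions.

    Args:
        model: the PreActResNet18 classifier under inspection.
        num_classes: number of classes in the model's dataset.
        clean_images_dir: folder of clean test images (~1% of the test set).
        transform: the preprocessing transform used by the model.

    Returns:
        0 if the model is judged backdoored, 1 if it is judged clean.
    """
    raise NotImplementedError  # see the method description below
```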
The challenge also imposes the following submission constraints:

- 🗂️ Maximum Upload File Size: 1 MB
- 🔢 Constant Parameters Limitation: ❌ no more than one constant parameter (e.g., a threshold used to discriminate models trained on the evaluation set)
- 🌐 Internet Access: ❌ no internet access will be granted during the test process
- ⏱️ Computation Time Per Sample: ⏳ 60-second limit per data sample for answer computation
- 📁 File Permissions: ❌ reading or writing any files in your solution is strictly prohibited
Our solution extends the MM-BD method by introducing feature-space optimization, improved initialization, image resizing, and a refined statistical detection rule. We address the challenge of detecting backdoors without prior knowledge of the attack type or trigger structure by reverse-engineering the model's sensitivity in the feature space.
We attempt to expose latent triggers by optimizing activation patterns to maximize class confidence.
- Initialization: We sample clean images from the verification set and resize them to $128\times128$, as this resolution consistently improves accuracy.
- Feature Extraction: Inputs are passed through the model up to `model.layer1[0]` to produce an initial feature tensor. We found that initializing from real features yields better results than random initialization.
- Optimization: We freeze deeper layers and treat the extracted activations as trainable variables, optimizing them to maximize the model's confidence for a target class (a minimal sketch follows this list).
- Scoring: We compute a maximal confidence score for each class, penalized by the activation magnitudes of other classes to ensure specificity.
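The sketch below illustrates the core optimization step for one target class. It assumes a PreActResNet18-style module layout (`layer1`…`layer4` with two blocks per `layer1`, and a final `linear` head) and illustrative hyperparameters (`steps`, `lr`); the margin-style objective is one reasonable instantiation of the confidence score, not our exact implementation:

```python
import torch
import torch.nn.functional as F

def max_confidence_score(model, init_features, target_class, steps=100, lr=0.05):
    """Optimize activations taken after model.layer1[0] to maximize the
    model's confidence for `target_class`. Layout and hyperparameters are
    illustrative assumptions."""
    model.eval()
    for p in model.parameters():  # freeze all model weights
        p.requires_grad_(False)

    # Treat the extracted feature tensor as the trainable variable.
    feats = init_features.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([feats], lr=lr)

    for _ in range(steps):
        # Forward pass through the remaining (frozen) layers.
        # Assumes a PreActResNet18-like layout with two blocks in layer1.
        x = model.layer1[1](feats)
        x = model.layer2(x)
        x = model.layer3(x)
        x = model.layer4(x)
        x = F.adaptive_avg_pool2d(x, 1).flatten(1)
        logits = model.linear(x)

        # Margin objective: target logit minus the largest non-target logit.
        target = logits[:, target_class]
        others = logits.clone()
        others[:, target_class] = float("-inf")
        loss = -(target - others.max(dim=1).values).mean()

        opt.zero_grad()
        loss.backward()
        opt.step()

    return -loss.item()  # maximal margin achieved for this class
```

Running this once per class yields the per-class score vector consumed by the statistical test described next.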
We employ a statistical anomaly test to distinguish between clean and poisoned models.
- Hypothesis: Clean models exhibit balanced confidence across classes, whereas backdoored models display unusually high maximal confidence for a specific target class.
- Distribution Fitting: We fit the maximal confidence scores (excluding the highest one) to an exponential distribution, which provided the best separation in our experiments.
- Decision Rule: We treat the maximum score as a candidate outlier and calculate its p-value under the fitted distribution. If the p-value is below 0.08, the model is classified as backdoored; otherwise, it is classified as clean (a minimal sketch of this test follows).
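The test itself is a one-sided tail probability under the fitted exponential. A minimal sketch, assuming the per-class scores from the previous step are already computed; the shift-by-minimum normalization is an illustrative choice, and 0.08 is the single constant parameter permitted by the rules:

```python
import numpy as np

P_VALUE_THRESHOLD = 0.08  # the single allowed constant parameter

def classify_model(class_scores):
    """Return 0 (backdoored) or 1 (clean) from one maximal-confidence
    score per class. The shift is an illustrative normalization choice."""
    scores = np.asarray(class_scores, dtype=float)
    top = scores.max()
    rest = np.delete(scores, scores.argmax())  # exclude the candidate outlier

    # Shift so the fitted exponential is supported on [0, inf).
    shift = rest.min()
    rate = 1.0 / max(np.mean(rest - shift), 1e-12)  # exponential MLE

    # Right-tail p-value of the maximum under the fitted distribution.
    p_value = float(np.exp(-rate * (top - shift)))

    return 0 if p_value < P_VALUE_THRESHOLD else 1
```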
Our solution achieved strong results in this challenge. Submissions were evaluated by accuracy on a private test dataset, and our method achieved the second-highest score.
The table below presents a summary of the Top 🔟 teams and their respective accuracy scores:
| Rank | Team | Accuracy (%) |
|---|---|---|
| 🥇 | AUTs | 79 |
| 🥈 | No Trust Issues Here (Our Team) | 78 |
| 🥉 | Persistence | 74 |
| 4 | AI Guardians of Trust | 72 |
| 5 | AIUoK | 70 |
| 6 | Pileh | 67 |
| 7 | My Team | 66 |
| 8 | Unknown | 66 |
| 9 | red_serotonin | 65 |
| 10 | DevNull | 65 |
For a detailed explanation of our methodologies, experiments, and results, please refer to our full Technical Report.
Follow these instructions to set up your environment and execute the pipeline.
```bash
git clone git@github.com:safinal/backdoored-model-detection.git
cd backdoored-model-detection
```

We recommend using a virtual environment to manage dependencies.

Using venv:

```bash
python -m venv venv
source venv/bin/activate  # On macOS/Linux
venv\Scripts\activate     # On Windows
```

Using conda:

```bash
conda create --name backdoored-model-detection python=3.8 -y
conda activate backdoored-model-detection
```

Install all required libraries from the requirements.txt file:

```bash
pip install -r requirements.txt
```

Run the pipeline:

```bash
python run.py --config ./config/config.yaml
```

We welcome contributions from the community to make this repository better!