2024 IEEE Symposium on Security and Privacy (SP)

Abstract

Self-supervised learning is widely used across domains to build foundation models and has been demonstrated to achieve state-of-the-art performance on a range of tasks. In the computer vision domain, self-supervised learning is used to train an image feature extractor, called an encoder, so that a variety of downstream tasks can build classifiers on top of it with limited data and resources. Despite its impressive performance, self-supervised learning is susceptible to backdoor attacks, in which an attacker injects a backdoor into the unlabeled training data. A downstream classifier built on the backdoored encoder will misclassify any input stamped with the trigger as the target label. Existing backdoor attacks in self-supervised learning exhibit a key out-of-distribution property: the poisoned samples differ significantly from the clean data in the feature space. The poisoned distribution is also exceptionally concentrated, inducing high pairwise similarity among poisoned samples. As a result, these attacks can be detected by state-of-the-art defense techniques. We propose a novel distribution-preserving attack, which transforms the poisoned samples into in-distribution data by reducing their distributional distance to the clean data. We also spread the poisoned data over a wider region of the target-class distribution, mitigating the concentration problem. Our evaluation on five popular datasets demonstrates that our attack, Drupe, significantly reduces the distributional distance and concentration of the poisoned distribution compared to existing attacks. Drupe successfully evades two state-of-the-art backdoor defenses in self-supervised learning and is robust against knowledgeable defenders.
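To make the two ideas in the abstract concrete, below is a minimal, hypothetical sketch (not the paper's actual training code) of a poisoning objective with the two properties described: an alignment term that shrinks the distributional distance between poisoned features and clean target-class features, and a dispersion term that penalizes high pairwise similarity among poisoned samples. The choice of an RBF-kernel MMD for alignment and a cosine-similarity penalty for dispersion is an assumption for illustration; the paper may use different distance measures, loss weights, and training procedures.

```python
# Illustrative sketch only: assumes an `encoder` network, a batch of trigger-stamped
# inputs `poisoned_x`, and a batch of clean target-class inputs `clean_target_x`.
import torch
import torch.nn.functional as F


def rbf_mmd(x, y, sigma=1.0):
    """Maximum mean discrepancy with an RBF kernel between two feature batches."""
    def kernel(a, b):
        d2 = torch.cdist(a, b) ** 2
        return torch.exp(-d2 / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()


def distribution_preserving_loss(encoder, poisoned_x, clean_target_x,
                                 lambda_align=1.0, lambda_disperse=0.1):
    """Hypothetical objective combining distribution alignment and dispersion."""
    f_poison = F.normalize(encoder(poisoned_x), dim=1)
    f_clean = F.normalize(encoder(clean_target_x), dim=1)

    # (1) In-distribution term: reduce the distributional distance between
    #     poisoned features and clean target-class features.
    align = rbf_mmd(f_poison, f_clean)

    # (2) Anti-concentration term: penalize high pairwise cosine similarity
    #     among poisoned features so they occupy a wider region.
    sim = f_poison @ f_poison.t()
    off_diag = sim - torch.diag(torch.diag(sim))
    disperse = off_diag.mean()

    return lambda_align * align + lambda_disperse * disperse
```

In this sketch, minimizing the alignment term pulls the poisoned-feature distribution into the clean target-class distribution (countering out-of-distribution detection), while minimizing the dispersion term spreads poisoned samples apart (countering concentration-based detection); the weights `lambda_align` and `lambda_disperse` are placeholder hyperparameters.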
