Blacklight: Scalable Defense for Neural Networks against Query-Based Black-Box Attacks
| Introduction | Publication | Source Code | Cite Our Work |
Deep learning systems are known to be vulnerable to adversarial examples. In particular, query-based black-box attacks do not require knowledge of the deep learning model, but can compute adversarial examples over the network by submitting queries and inspecting the returned results. Recent work has greatly improved the efficiency of these attacks, demonstrating their practicality on today’s ML-as-a-service platforms.
We propose Blacklight, a new defense against query-based black-box adversarial attacks. Blacklight is driven by a fundamental insight: to compute adversarial examples, these attacks perform iterative optimization over the network, producing queries highly similar in the input space. Thus Blacklight detects query-based black-box attacks by detecting highly similar queries, using an efficient similarity engine operating on probabilistic content fingerprints. We evaluate Blacklight against eight state-of-the-art attacks, across a variety of models and image classification tasks. Blacklight identifies them all, often after only a handful of queries. By rejecting all detected queries, Blacklight prevents any attack from completing, even when persistent attackers continue to submit queries after their accounts are banned or their queries rejected. Blacklight is also robust against several powerful countermeasures, including an optimal black-box attack that approximates white-box attacks in efficiency. Finally, we illustrate how Blacklight generalizes to other domains like text classification.
Introduction
Figure 1: Attack scenario for black-box adversarial attacks.
The vulnerability of deep neural networks (DNNs) to a variety of adversarial examples is well documented. Adversarial attacks can be broadly divided by whether they assume white-box or black-box threat models. In the white-box setting, the attacker has total access to the target model, including its internal architecture, weights and parameters. Given a benign input, the attacker can directly compute adversarial examples by solving an optimization problem. In contrast, an attacker in the black-box setting can only interact with the model by submitting queries and inspecting the returned results. Black-box attacks assume a more realistic threat model, where attackers interact with models via a query interface such as ML-as-a-service platforms (see Figure 1).
Figure 2: Examples of attack query sequences produced by three black-box attacks (NES, Boundary, HSJA). While these attacks generate queries differently, the resulting query sequences all contain some highly similar images.
A common and effective class of attacks is query-based black-box attacks. An attacker queries the target model repeatedly, often remotely over a network, to implement the iterative optimization required to compute adversarial examples. Specifically, based on past query results, the attacker iteratively perturbs the current query to produce the next query, hoping to converge to a successful adversarial example. The fundamental insight driving our work is that, in order to compute adversarial examples, query-based black-box attacks perform iterative optimization over the network, an incremental process that produces queries highly similar in the input space. Figure 2 shows some visual examples from attack query sequences generated by three attacks.
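To see why successive attack queries are so similar, consider one iteration of an NES-style gradient-estimation attack. The sketch below is illustrative only (function and parameter names are our own, not from any attack's release): every query in an iteration lies within a tiny radius `sigma` of the current image, so the model sees a stream of near-duplicate inputs.

```python
import numpy as np

def nes_attack_step(query_fn, x, sigma=0.001, n_samples=50, lr=0.01):
    """One iteration of an NES-style query-based black-box attack (sketch).

    `query_fn(x)` returns the target model's loss for input `x`; the attacker
    only observes these scalar returns, never the model internals.
    """
    grad_est = np.zeros_like(x)
    for _ in range(n_samples):
        u = np.random.randn(*x.shape)  # random search direction
        # Antithetic sampling: two queries per direction, both within
        # sigma of x -- so consecutive queries are nearly identical images.
        grad_est += u * (query_fn(x + sigma * u) - query_fn(x - sigma * u))
    grad_est /= 2 * sigma * n_samples
    return x + lr * np.sign(grad_est)  # small signed step per iteration
```

Each call issues `2 * n_samples` queries that differ from `x` by at most a few thousandths per pixel; it is exactly this clustering in input space that Blacklight exploits.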
With this in mind, we propose Blacklight, a novel defense that detects query-based black-box attacks using an efficient content-similarity engine. Blacklight detects the highly similar queries produced by an attack's iterative optimization process, since benign queries rarely share this level of similarity. Blacklight’s query detection is account-oblivious, and is thus effective no matter how many accounts an attacker uses to submit queries.
Blacklight is highly scalable and lightweight. It detects highly similar queries generated by iterative optimization using probabilistic fingerprints, a compact hash representation computed for each input query. We design these fingerprints such that queries highly similar in the input space will have large overlap in their fingerprints. Blacklight then flags an incoming query as part of a query-based black-box attack if its fingerprint overlaps any prior fingerprint by more than a threshold. Since we use secure one-way hashes to compute fingerprints, even an attacker aware of our algorithm cannot optimize the content perturbation of a query to disrupt its fingerprint and avoid detection.
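The fingerprinting idea can be sketched as follows: hash overlapping windows of the query's bytes with a secure one-way hash, then keep a small, deterministic subset of the digests as the fingerprint. This is a minimal illustration of the concept (window size and fingerprint size here are arbitrary, not Blacklight's actual parameters, and the real system also quantizes pixel values first):

```python
import hashlib
import numpy as np

def fingerprint(image, window=20, top_s=50):
    """Probabilistic fingerprint sketch: SHA-256 every sliding byte window,
    keep the `top_s` lexicographically smallest digests as the fingerprint.

    A tiny perturbation only changes the few windows covering the perturbed
    bytes, so near-duplicate inputs keep near-identical fingerprints.
    """
    data = np.asarray(image, dtype=np.uint8).tobytes()
    digests = {
        hashlib.sha256(data[i:i + window]).hexdigest()
        for i in range(len(data) - window + 1)
    }
    return set(sorted(digests)[:top_s])
```

Because SHA-256 is one-way, an attacker cannot craft a perturbation that selectively scrambles the fingerprint without substantially changing the image itself.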
Figure 3: For each query, Blacklight computes a small set of hash entries as its probabilistic fingerprint. Blacklight detects attack images hidden inside a large stream of benign images by comparing fingerprints and flagging those that are highly similar.
Figure 3 illustrates Blacklight’s attack detection process. For an incoming query x, Blacklight extracts its probabilistic fingerprint and stores it in a database. Blacklight runs an efficient hash-matching algorithm to detect overlaps between x’s fingerprint and those already in the database. Upon detecting sufficient overlap between x’s fingerprint and that of a prior query y, it flags (x, y) as a pair of attack queries.
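Assuming each fingerprint is represented as a set of hash strings, the matching step reduces to set intersection against stored fingerprints. A minimal sketch (the linear scan and the threshold value are illustrative; the actual system indexes hash entries for constant-time lookup):

```python
def detect(fingerprint_db, fp_new, threshold=25):
    """Flag the new query if its fingerprint shares more than `threshold`
    hash entries with any previously stored fingerprint.

    Returns the id of the matched prior query, or None (and stores the
    new fingerprint) if no match is found.
    """
    for qid, fp_old in fingerprint_db.items():
        if len(fp_new & fp_old) > threshold:
            return qid  # (new query, prior query qid) is an attack pair
    fingerprint_db[len(fingerprint_db)] = fp_new
    return None
```

Matching on fingerprints rather than accounts is what makes detection account-oblivious: queries from different accounts still collide in the shared database.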
We experimentally evaluate Blacklight against eight SOTA black-box attacks on multiple datasets and image classification models. Not only does Blacklight detect all eight attacks, but it does so quickly, often after only a handful of queries, for attacks that would require several thousand queries to succeed. Full evaluation details can be found in our paper.
Publication
Blacklight: Scalable Defense for Neural Networks against Query-Based Black-Box Attacks. Huiying Li, Shawn Shan, Emily Wenger, Jiayun Zhang, Haitao Zheng, Ben Y. Zhao. 2022. In Proceedings of the 31st USENIX Security Symposium (USENIX Security 2022). Download paper here (Download extended version here)
Source Code
The source code for Blacklight is available on GitHub (code here). To generate the adversarial attacks used in the paper, please find the attack implementations as follows:
Cite Our Work
@inproceedings {281294,
author = {Huiying Li and Shawn Shan and Emily Wenger and Jiayun Zhang and Haitao Zheng and Ben Y. Zhao},
title = {Blacklight: Scalable Defense for Neural Networks against {Query-Based} {Black-Box} Attacks},
booktitle = {31st USENIX Security Symposium (USENIX Security 22)},
year = {2022},
isbn = {978-1-939133-31-1},
address = {Boston, MA},
pages = {2117--2134},
url = {https://www.usenix.org/conference/usenixsecurity22/presentation/li-huiying},
publisher = {USENIX Association},
month = {August},
}