This website contains additional information about our paper “Post-breach Recovery: Protection against White-box Adversarial Examples for Leaked DNN Models” from SANDLAB at University of Chicago. In this paper, we consider the question: as practitioners continue to invest significant amounts of time and capital into building large complex DNN models (i.e. data acquisition/curation and model training), what can they do to avoid losing their investment following an event that leaks their model to attackers (e.g. a server breach)? We refer to this as the post-breach recovery problem for DNN services.
Authors: Shawn Shan, Wenxin Ding, Emily Wenger, Heather Zheng, Ben Y. Zhao
For any inquiries about this project, please check out the GitHub issues. If they do not answer your question, please email Shawn at shawnshan@cs.uchicago.edu
Server breaches are an unfortunate reality on today's Internet. In the context of deep neural network (DNN) models, they are particularly harmful, because a leaked model gives an attacker “whitebox” access to generate adversarial examples, a threat model that has no practical robust defenses. For practitioners who have invested years and millions into proprietary DNNs, e.g. medical imaging, this seems like an inevitable disaster looming on the horizon.
In this paper, we consider the problem of post-breach recovery for DNN models. We propose Neo, a new system that creates new versions of leaked models, alongside an inference-time filter that detects and removes adversarial examples generated on previously leaked models. The classification surfaces of different model versions are slightly offset (by introducing hidden distributions), and Neo detects the overfitting of attacks to the leaked model used in their generation. We show that across a variety of tasks and attack methods, Neo is able to filter out attacks from leaked models with very high accuracy, and provides strong protection (7–10 recoveries) against attackers who repeatedly breach the server. Neo performs well against a variety of strong adaptive attacks, with only a slight drop in the number of recoverable breaches, and demonstrates potential as a complement to DNN defenses in the wild.

Figure 1: An overview of our recovery system. (a) Recovery from one model breach: the attacker breaches the server and gains access to model version 1 (𝐹1). Post-leak, the recovery system retires 𝐹1 and replaces it with model version 2 (𝐹2) paired with a recovery-specific defense 𝐷2. Together, 𝐹2 and 𝐷2 can resist adversarial examples generated using 𝐹1. (b) Recovery from multiple model breaches: upon the 𝑖-th server breach that leaks 𝐹𝑖 and 𝐷𝑖, the recovery system replaces them with a new version 𝐹𝑖+1 and 𝐷𝑖+1. This new pair resists adversarial examples generated using any subset of the previous versions (1 to 𝑖).
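The breach-and-replace cycle in Figure 1 can be sketched as a small version manager. This is an illustrative toy only: the names (`RecoverySystem`, `train_version`) are invented here, and in the real Neo system the "training" step retrains each model version with a hidden distribution and derives the matching inference-time filter, rather than simply tagging a version number.

```python
class RecoverySystem:
    """Toy sketch of the breach-recovery cycle in Figure 1.

    `train_version` stands in for Neo's (much more involved) retraining
    step, which produces a model version F_i whose classification surface
    is offset by a hidden distribution, plus a matching filter D_i.
    """

    def __init__(self, train_version):
        self.train_version = train_version
        self.version = 1
        self.model, self.defense = train_version(1)
        self.leaked_versions = []  # versions the attacker has obtained

    def handle_breach(self):
        # The deployed pair (F_i, D_i) is now known to the attacker:
        # retire it and deploy (F_{i+1}, D_{i+1}), which must resist
        # adversarial examples built on any previously leaked version.
        self.leaked_versions.append(self.version)
        self.version += 1
        self.model, self.defense = self.train_version(self.version)


# Placeholder "training" that just labels the pair with its version number.
def train_version(i):
    return (f"F_{i}", f"D_{i}")


system = RecoverySystem(train_version)
system.handle_breach()  # attacker leaks (F_1, D_1)
system.handle_breach()  # attacker leaks (F_2, D_2)
print(system.model, system.defense, system.leaked_versions)
# prints: F_3 D_3 [1, 2]
```

The state kept in `leaked_versions` matters because, as Figure 1(b) notes, each new pair must resist attacks generated from any subset of the previously leaked versions, not only the most recent one.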
If you want to find out more about this project, you can read our publicly available paper and presentation slides. For readers who want to extend our work, we also provide source code on Github.
To cite our paper, you can use the following BibTeX entry:
@inproceedings{shan2022post,
title={Post-breach Recovery: Protection against White-box Adversarial Examples for Leaked DNN Models},
author={Shan, Shawn and Ding, Wenxin and Wenger, Emily and Zheng, Haitao and Zhao, Ben Y},
booktitle={Proc. of CCS},
year={2022}
}