Black-box Adversarial Attacks: From Theory to Practice



Yinpeng Dong (Tsinghua University)

Yinpeng Dong received his BE and PhD degrees from Tsinghua University in 2017 and 2022, advised by Prof. Jun Zhu. His research interests include machine learning and deep learning, especially the adversarial robustness of deep learning. Yinpeng has published over 20 papers in prestigious conferences and journals, including TPAMI, IJCV, CVPR, and NeurIPS. He has served as a reviewer for ICML, NeurIPS, ICLR, CVPR, ICCV, TPAMI, etc. He has received several awards, including the China National Scholarship, the Tsinghua Future PhD Fellowship, the Microsoft Research Asia Fellowship, and the Baidu Fellowship. His team won all three tasks of the NeurIPS 2017 Adversarial Attacks and Defenses competition, with him serving as team leader for the two attack tasks.



Short Abstract: Adversarial machine learning is an emerging field that studies the vulnerability of ML approaches and detects malicious behaviors in adversarial environments. Black-box adversarial attacks have attracted a lot of attention since they can identify the vulnerability of deep learning models without access to the model gradients. Previous methods attempted to approximate the true gradient either by using the transfer gradient of a surrogate white-box model or based on the feedback of model queries. This talk will introduce the background and recent progress of black-box adversarial attacks. A gradient estimation framework is presented, which enables theoretical analyses of existing black-box attack methods. Then, various algorithms to improve black-box adversarial attacks will be introduced, including the momentum iterative method, the prior-guided random gradient-free method, etc.
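To give a flavor of the query-feedback approach mentioned above, the sketch below estimates a black-box model's gradient with random gradient-free (RGF) finite differences: it averages directional derivatives along random unit directions, using only scalar loss queries. This is a minimal illustration, not the exact algorithm presented in the talk; `loss_fn` stands in for a hypothetical query oracle, and the quadratic toy loss is chosen only so the estimate can be checked against a known true gradient.

```python
import numpy as np

def rgf_gradient_estimate(loss_fn, x, num_queries=50, sigma=1e-4, rng=None):
    """Estimate the gradient of a black-box loss via random gradient-free
    finite differences, averaging directional derivatives along random
    unit directions. Uses num_queries + 1 calls to the query oracle."""
    rng = np.random.default_rng(rng)
    grad = np.zeros_like(x, dtype=float)
    f0 = loss_fn(x)  # baseline loss at x
    for _ in range(num_queries):
        u = rng.standard_normal(x.shape)
        u /= np.linalg.norm(u)  # random unit direction
        # Forward finite difference along u gives a directional derivative.
        grad += (loss_fn(x + sigma * u) - f0) / sigma * u
    return grad / num_queries

# Toy check on a smooth quadratic loss (stand-in for a model's loss):
loss = lambda x: float(np.sum(x ** 2))
x = np.array([1.0, -2.0, 0.5])
g_est = rgf_gradient_estimate(loss, x, num_queries=500, rng=0)
g_true = 2 * x  # analytic gradient of the toy loss
cos = np.dot(g_est, g_true) / (np.linalg.norm(g_est) * np.linalg.norm(g_true))
```

With enough queries the estimate aligns closely with the true gradient direction, which is all an attacker needs to take a perturbation step; prior-guided variants reduce the query count by biasing the sampled directions toward a transfer gradient from a surrogate model.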