Research Article

Perturbation Augmentation for Adversarial Training with Diverse Attacks

Year 2024, Volume: 11, Issue: 2, 274-288, 29.06.2024
https://doi.org/10.54287/gujsa.1458880

Abstract

Adversarial Training (AT) aims to alleviate the vulnerability of deep neural networks to adversarial perturbations. However, AT techniques struggle to maintain performance on natural samples while improving the deep model's robustness. The lack of diversity among the perturbations generated during adversarial training degrades the generalizability of robust models, causing overfitting to particular perturbations and a decrease in natural performance. This study proposes an adversarial training framework that augments the adversarial directions obtained from a single-step attack to address the trade-off between robustness and generalization. Inspired by feature scattering adversarial training, the proposed framework computes a principal adversarial direction with a single-step attack that finds a perturbation disrupting the inter-sample relationships within the mini-batch. The principal direction obtained at each iteration is augmented by sampling new adversarial directions within a region spanning 45 degrees around it. The proposed approach requires no extra backpropagation steps for the direction augmentation, so the generalization of the robust model is improved without imposing an additional burden on feature scattering adversarial training. Experiments on CIFAR-10, CIFAR-100, SVHN, Tiny-ImageNet, and the German Traffic Sign Recognition Benchmark show consistent improvements in adversarial accuracy while natural performance remains almost pristine.
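
To illustrate the direction-augmentation step described above, the following PyTorch sketch shows one plausible way to sample a new perturbation within 45 degrees of a precomputed principal adversarial direction without any extra backpropagation. The function name augment_direction, the per-sample Gram-Schmidt step, and the uniform angle sampling are illustrative assumptions; the paper's exact sampling scheme and any subsequent projection onto the attack budget may differ.

    import torch
    import torch.nn.functional as F

    def augment_direction(delta, max_angle_deg=45.0):
        # delta: principal adversarial perturbation from a single-step
        # feature-scattering attack, shape [batch, channels, height, width].
        b = delta.size(0)
        d = delta.view(b, -1)
        d_unit = F.normalize(d, dim=1)  # unit principal direction per sample

        # Draw a random direction and make it orthogonal to the principal one.
        r = torch.randn_like(d)
        r = r - (r * d_unit).sum(dim=1, keepdim=True) * d_unit
        r_unit = F.normalize(r, dim=1)

        # Rotate the principal direction by a random angle in [0, max_angle_deg] degrees.
        theta = torch.rand(b, 1, device=delta.device) * max_angle_deg * torch.pi / 180.0
        new_unit = torch.cos(theta) * d_unit + torch.sin(theta) * r_unit

        # Keep the original perturbation magnitude; no gradients are computed here.
        new_delta = new_unit * d.norm(dim=1, keepdim=True)
        return new_delta.view_as(delta)

In a full training loop, the sampled perturbation would still be clipped to the attack budget (for example, an l-infinity ball) before the model update, as in standard adversarial training.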

References

  • Alzantot, M., Sharma, Y., Elgohary, A., Ho, B., Srivastava, M., & Chang, K. (2018). Generating natural language adversarial examples. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, (pp. 2890–2896).
  • Andriushchenko, M., & Flammarion, N. (2020). Understanding and improving fast adversarial training. In: Proceedings of Advances in Neural Information Processing Systems, 33, (pp. 16048-16059).
  • Athalye, A., Carlini, N., & Wagner, D. (2018, July). Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In: International Conference on Machine Learning (pp. 274-283).
  • Baytaş, İ. M., & Deb, D. (2023). Robustness-via-synthesis: Robust training with generative adversarial perturbations. Neurocomputing, 516, 49-60. https://doi.org/10.1016/j.neucom.2022.10.034
  • Carlini, N., Mishra, P., Vaidya, T., Zhang, Y., Sherr, M., Shields, C., ... & Zhou, W. (2016). Hidden voice commands. In: 25th USENIX security symposium (USENIX security 16), (pp. 513-530).
  • Carlini, N., & Wagner, D. (2017, May). Towards evaluating the robustness of neural networks. In: Proceedings of the IEEE Symposium on Security and Privacy. (pp. 39-57).
  • Cuturi, M. (2013). Sinkhorn distances: Lightspeed computation of optimal transport. Advances in neural information processing systems, 26.
  • Etmann, C., Lunz, S., Maass, P., & Schönlieb, C. B. (2019). On the connection between adversarial robustness and saliency map interpretability. In: Proceedings of the 36th International Conference on Machine Learning, 97, (pp. 1823-1832).
  • Finlayson, S. G., Bowers, J. D., Ito, J., Zittrain, J. L., Beam, A. L., & Kohane, I. S. (2019). Adversarial attacks on medical machine learning. Science, 363(6433), 1287-1289. https://doi.org/10.1126%2Fscience.aaw4399
  • Fursov, I., Morozov, M., Kaploukhaya, N., Kovtun, E., Rivera-Castro, R., Gusev, G., ... & Burnaev, E. (2021). Adversarial attacks on deep models for financial transaction records. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, (pp. 2868-2878).
  • Goodfellow, I. J., Shlens, J., & Szegedy, C. (2014). Explaining and harnessing adversarial examples. In: Proceedings of the 3rd International Conference on Learning Representations. https://arxiv.org/abs/1412.6572
  • Houben, S., Stallkamp, J., Salmen, J., Schlipsing, M., & Igel, C. (2013, August). Detection of traffic signs in real-world images: The German Traffic Sign Detection Benchmark. In: Proceedings of The 2013 International Joint Conference on Neural Networks. (pp. 1-8).
  • Ilyas, A., Santurkar, S., Tsipras, D., Engstrom, L., Tran, B., & Madry, A. (2019). Adversarial examples are not bugs, they are features. In: Proceedings of Advances in Neural Information Processing Systems, 32.
  • Jang, Y., Zhao, T., Hong, S., & Lee, H. (2019). Adversarial defense via learning to generate diverse attacks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, (pp. 2740-2749).
  • Kim, H., Lee, W., & Lee, J. (2021). Understanding catastrophic overfitting in single-step adversarial training. In: Proceedings of the AAAI Conference on Artificial Intelligence, (pp. 8119-8127).
  • Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images. University of Toronto.
  • Kurakin, A., Goodfellow, I. J. & Bengio, S. (2017). Adversarial machine learning at scale. In: Proceedings of the 5th International Conference on Learning Representations. https://arxiv.org/abs/1611.01236
  • Le, Y., & Yang, X. (2015). Tiny imagenet visual recognition challenge. CS 231N, 7(7), 3.
  • Lee, S., Lee, H., & Yoon, S. (2020). Adversarial vertex mixup: Toward better adversarially robust generalization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (pp. 272-281).
  • Madry, A., Makelov, A., Schmidt, L., Tsipras, D. & Vladu, A. (2018). Towards deep learning models resistant to adversarial attacks. In: Proceedings of the International Conference on Learning Representations. https://arxiv.org/abs/1706.06083
  • Moosavi-Dezfooli, S. M., Fawzi, A., & Frossard, P. (2016). Deepfool: A Simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 2574-2582).
  • Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z. B., & Swami, A. (2017, April). Practical black-box attacks against machine learning. In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security. (pp. 506-519).
  • Schmidt, L., Santurkar, S., Tsipras, D., Talwar, K., & Madry, A. (2018). Adversarially robust generalization requires more data. In: Proceedings of Advances in Neural Information Processing Systems, (pp. 5019-5031).
  • Shafahi, A., Najibi, M., Ghiasi, M. A., Xu, Z., Dickerson, J., Studer, C., ... & Goldstein, T. (2019). Adversarial training for free! In: Proceedings of Advances in Neural Information Processing Systems, 32.
  • Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2014). Intriguing properties of neural networks. In: Proceedings of International Conference on Learning Representations. http://arxiv.org/abs/1312.6199
  • Tramer, F., Kurakin, A., Papernot, N., Goodfellow, I. J., Boneh, D. & McDaniel, P. D. (2018). Ensemble adversarial training: Attacks and defenses. In: Proceedings of the 6th International Conference on Learning Representations. https://arxiv.org/abs/1705.07204
  • Verma, V., Lamb, A., Beckham, C., Najafi, A., Mitliagkas, I., Lopez-Paz, D., & Bengio, Y. (2019, May). Manifold mixup: Better representations by interpolating hidden states. In: International Conference on Machine Learning, (pp. 6438-6447).
  • Villani, C. (2009). Optimal transport: old and new (Vol. 338, p. 23). Berlin: Springer.
  • Wang, J., & Zhang, H. (2019). Bilateral adversarial training: Towards fast training of more robust models against adversarial attacks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, (pp. 6629-6638).
  • Wang, K., Li, F., Chen, C. M., Hassan, M. M., Long, J., & Kumar, N. (2021). Interpreting adversarial examples and robustness for deep learning-based auto-driving systems. IEEE Transactions on Intelligent Transportation Systems, 23(7), 9755-9764. https://doi.org/10.1109/TITS.2021.3108520
  • Wang, Y., Zou, D., Yi, J., Bailey, J., Ma, X., & Gu, Q. (2019, September). Improving adversarial robustness requires revisiting misclassified examples. In: Proceedings of International Conference on Learning Representations. https://openreview.net/forum?id=rklOg6EFwS
  • Wang, Z., Pang, T., Du, C., Lin, M., Liu, W., & Yan, S. (2023). Better diffusion models further improve adversarial training. In: Proceedings of the 40th International Conference on Machine Learning. (202:36246-36263) https://proceedings.mlr.press/v202/wang23ad.html
  • Wong, E., & Kolter, J. Z. (2018, July). Provable defenses against adversarial examples via the convex outer adversarial polytope. In: Proceedings of the International Conference on Machine Learning, (pp. 5286-5295).
  • Wong, E., Rice, L., Kolter, J. Z. (2020). Fast is better than free: Revisiting adversarial training. In: Proceedings of the 8th International Conference on Learning Representations. https://arxiv.org/abs/2001.03994
  • Xie, Y., Wang, X., Wang, R., & Zha, H. (2020, August). A fast proximal point method for computing exact Wasserstein distance. In: Proceedings of Uncertainty in Artificial Intelligence (pp. 433-453).
  • Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., & Ng, A. Y. (2011). Reading digits in natural images with unsupervised feature learning. In: Proceedings of the NIPS Workshop on Deep Learning and Unsupervised Feature Learning.
  • Zagoruyko, S., & Komodakis, N. (2016). Wide residual networks. In: Proceedings of the British Machine Vision Conference. (pp. 1-12).
  • Zhang, D., Zhang, T., Lu, Y., Zhu, Z., & Dong, B. (2019a). You only propagate once: Accelerating adversarial training via maximal principle. In: Proceedings of Advances in Neural Information Processing Systems, 32.
  • Zhang, H., Cisse, M., Dauphin, Y. N., Lopez-Paz, D. (2018). Mixup: Beyond empirical risk minimization. In: Proceedings of the 6th International Conference on Learning Representations. https://arxiv.org/abs/1710.09412
  • Zhang, H., & Xu, W. (2020). Adversarial interpolation training: A simple approach for improving model robustness. https://openreview.net/forum
  • Zhang, H., Yu, Y., Jiao, J., Xing, E., El Ghaoui, L., & Jordan, M. (2019b). Theoretically principled trade-off between robustness and accuracy. In: Proceedings of the International Conference on Machine Learning, (pp. 7472-7482).
  • Zhang, H., & Wang, J. (2019). Defense against adversarial attacks using feature scattering-based adversarial training. In: Proceedings of the Advances in Neural Information Processing Systems, 32.
  • Zhang, H. (2019). Feature Scattering Adversarial Training (NeurIPS 2019) (Accessed: 24/03/2024) https://github.com/Haichao-Zhang/FeatureScatter

Details

Primary Language English
Subjects Deep Learning
Section Computer Engineering
Authors

Duygu Serbes 0000-0003-1067-866X

İnci M. Baytaş 0000-0003-4765-2615

Early View Date June 4, 2024
Publication Date June 29, 2024
Submission Date March 26, 2024
Acceptance Date May 21, 2024
Published in Issue Year 2024, Volume: 11, Issue: 2

Cite

APA Serbes, D., & Baytaş, İ. M. (2024). Perturbation Augmentation for Adversarial Training with Diverse Attacks. Gazi University Journal of Science Part A: Engineering and Innovation, 11(2), 274-288. https://doi.org/10.54287/gujsa.1458880