Robust Audio Adversarial Example for a Physical Attack

Hiromu Yakura, Jun Sakuma

University of Tsukuba / RIKEN Center for Advanced Intelligence Project

Illustration of the proposed attack. Carlini et al. (2018) assumed that adversarial examples are provided directly to the recognition model. We propose a method that targets an over-the-air condition, which leads to a real threat.

The success of deep learning in recent years has raised concerns about adversarial examples, which allow attackers to force deep neural networks to output a specified target. Although a method for generating audio adversarial examples that target a state-of-the-art speech recognition model has been proposed, it cannot fool the model when the examples are played over the air, so the threat was considered limited. In this paper, we propose a method for generating adversarial examples that remain effective when played over the air in the physical world, by simulating the transformations caused by playback and recording and incorporating them into the generation process. An evaluation and a listening experiment demonstrated that audio adversarial examples generated by the proposed method may become a real threat.
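The idea of simulating playback and recording during generation can be illustrated with a minimal sketch. This page does not specify which transformations are simulated, so the two used here (convolution with a room impulse response and additive Gaussian noise) are assumptions, as are the names `simulate_over_the_air` and `expected_loss`. `loss_fn` is a hypothetical stand-in for the recognition model's loss with respect to the target phrase.

```python
import random


def convolve(signal, kernel):
    """Full discrete convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def simulate_over_the_air(signal, impulse_response, noise_std, rng):
    """Assumed model of playback and recording: reverberation, then noise."""
    # Reverberation: convolve with an impulse response, keep the original length.
    reverberated = convolve(signal, impulse_response)[:len(signal)]
    # Ambient and device noise: additive white Gaussian noise.
    return [x + rng.gauss(0.0, noise_std) for x in reverberated]


def expected_loss(signal, loss_fn, impulse_responses, noise_std, n_samples, seed=0):
    """Monte Carlo average of loss_fn over randomly sampled transformations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        ir = rng.choice(impulse_responses)
        total += loss_fn(simulate_over_the_air(signal, ir, noise_std, rng))
    return total / n_samples
```

Under these assumptions, a generation procedure would minimize `expected_loss` over the perturbation rather than the loss on the clean waveform, so that the result is less sensitive to any single playback environment.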

Generated Samples

We played and recorded each adversarial example 10 times using a JBL CLIP2 speaker and a Sony ECM-PCV80U microphone, and evaluated the transcriptions produced by the pretrained DeepSpeech model.
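The two reported metrics could be computed along the following lines. This is a sketch under assumptions: the page does not say whether the edit distance is measured over characters or words, so character-level Levenshtein distance is assumed, and a trial is counted as a success only when the transcription matches the target phrase exactly. The function names are illustrative.

```python
def edit_distance(a, b):
    """Character-level Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def evaluate(target, transcriptions):
    """Return (success rate, average edit distance) over all trials."""
    distances = [edit_distance(target, t) for t in transcriptions]
    success_rate = sum(d == 0 for d in distances) / len(distances)
    return success_rate, sum(distances) / len(distances)
```

For example, `evaluate("hello world", recorded)` with `recorded` holding the 10 transcriptions of one sample would yield the pair of values shown in one row of the results.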

                Target phrase    SNR      Success rate (10 trials)  Average edit distance  Audio
Original audio  N/A              N/A      N/A                       N/A
(A)             "hello world"    9.3 dB   100%                      0.0
(B)             "open the door"  -2.7 dB  100%                      0.0
(C)             "ok google"      7.5 dB   0%                        4.2
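The page does not define how the SNR is computed. A common convention, assumed here, treats the original audio as the signal and the added perturbation as the noise, so a higher value means a quieter perturbation. The function name `snr_db` is illustrative.

```python
import math


def snr_db(original, adversarial):
    """SNR in dB, assuming signal = original audio, noise = perturbation."""
    noise = [a - o for o, a in zip(original, adversarial)]
    signal_power = sum(o * o for o in original) / len(original)
    noise_power = sum(n * n for n in noise) / len(noise)
    return 10.0 * math.log10(signal_power / noise_power)
```

Under this convention, a negative value such as the one reported for sample (B) would mean that the perturbation carries more power than the original audio.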

Comparison with conventional methods

                       Attack on recurrent models  Attack over the air  Audio
Carlini et al. (2018)
Yuan et al. (2018)
Proposed (above)


Hiromu Yakura, Jun Sakuma.
Robust Audio Adversarial Example for a Physical Attack.
arXiv preprint arXiv:1810.11793, 2018.


This study was supported by JST CREST JPMJCR1302 and KAKENHI 16H02864.