Audio replay spoofing attack detection using deep learning feature and long-short-term memory recurrent neural network

Conference: AIIPCC 2021 - The Second International Conference on Artificial Intelligence, Information Processing and Cloud Computing
06/26/2021 - 06/28/2021 at Hangzhou, China

Proceedings: AIIPCC 2021

Pages: 5
Language: English
Type: PDF

Authors:
Huang, Lian; Zhao, Jinhong (Guangdong Mechanical & Electronical College of Technology, Guangzhou, China)

Abstract:
In recent years, as technology has advanced, the security risks facing automatic speaker verification (ASV) systems have increased. Notably, the audio replay spoofing attack is the easiest to perform and is hard to detect. We therefore explore a new method for detecting audio replay spoofing attacks using deep learning features and long short-term memory (LSTM) recurrent neural networks. First, we investigate whether combining raw wave data with other features, such as constant Q cepstral coefficients (CQCC), can improve attack detection accuracy. Second, we use a vanilla convolutional neural network to obtain deep learning features from the raw wave data. Third, we extract CQCC features from the same waveform. Finally, we feed these two types of features into the LSTM network, which classifies the audio as genuine or spoofed. To validate the proposed approach, we run experiments on two widely used datasets: BTAS 2016 and ASVspoof 2017. The results show that our model significantly outperforms the baseline system when evaluated on each database individually. We achieve a promising equal error rate (EER) of 7.73% on ASVspoof 2017 and 0.79% on BTAS 2016.
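
The equal error rate reported above is the operating point at which the false acceptance rate (spoofed audio accepted as genuine) equals the false rejection rate (genuine audio rejected). The following is a minimal sketch of how such a figure can be approximated from detection scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that a higher score means "more likely genuine", and the example scores are all assumptions made for illustration.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping a decision threshold.

    Assumption: a trial is accepted as genuine when score >= threshold,
    so higher scores mean "more likely genuine".
    """
    if not genuine_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every observed score, plus one value above
    # the maximum so that "reject everything" is also considered.
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    thresholds.append(thresholds[-1] + 1.0)

    best_gap = None
    eer = None
    for t in thresholds:
        # False acceptance rate: spoofed trials accepted as genuine.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: genuine trials rejected.
        frr = sum(g < t for g in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest,
        # and report their average as the EER estimate.
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Hypothetical scores, for illustration only (not data from the paper).
genuine = [0.9, 0.8, 0.4, 0.7]
spoof = [0.1, 0.2, 0.5, 0.3]
print(f"EER = {equal_error_rate(genuine, spoof):.2%}")
```

With a finite set of trials the two error rates rarely cross exactly, so this sketch averages them at the closest threshold. Evaluation toolkits often interpolate along the error curve instead, which can give slightly different values.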