Author Affiliations: State Key Laboratory of Media Convergence and Communication, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; Institute of Automation, Chinese Academy of Sciences, Beijing, China; Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China; Department of Automation, Tsinghua University, Beijing, China; School of Data Science and Intelligent Media, Communication University of China, Beijing, China; School of Cyberspace Science and Technology, Beijing Institute of Technology, Beijing, China
Publication: IEEE Transactions on Audio, Speech and Language Processing
Year/Volume: 2025, Vol. 33
Pages: 386-400
Funding: National Natural Science Foundation of China
Keywords: Deepfakes; Codecs; Vocoders; Acoustics; Training; Codes; Feature extraction; Vector quantization; Speech enhancement; Speech coding
Abstract: With the proliferation of deepfake audio generated by Audio Language Models (ALMs), there is an urgent need for generalized detection methods. ALM-based deepfake audio is currently widespread, highly deceptive, and versatile in type, posing a significant challenge to audio deepfake detection (ADD) models trained solely on vocoded data. To detect ALM-based deepfake audio effectively, we focus on the mechanism underlying ALM-based audio generation: the conversion from a neural codec to a waveform. We first constructed the Codecfake dataset, an open-source, large-scale collection comprising over 1 million audio samples in both English and Chinese, focused on ALM-based audio detection. As a countermeasure, to achieve universal detection of deepfake audio and to tackle the domain ascent bias issue of the original sharpness-aware minimization (SAM), we propose the CSAM strategy to learn domain-balanced and generalized minima. In our experiments, we first demonstrate that ADD models trained with the Codecfake dataset can effectively detect ALM-based audio. Furthermore, our proposed generalization countermeasure yields the lowest average equal error rate (EER) of 0.616% across all test conditions compared with baseline models. The dataset and associated code are available online.
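The countermeasure described in the abstract builds on sharpness-aware minimization. For orientation, the sketch below shows one generic SAM update step in PyTorch: ascend to the locally worst-case weights w + epsilon, take the gradient there, then descend from the original weights. This is the standard SAM procedure only; the names (sam_step, model, loss_fn, rho) are illustrative and not taken from the paper's released code, and CSAM's domain-balancing modification of the ascent step is described in the paper itself.

```python
# Minimal sketch of one sharpness-aware minimization (SAM) step,
# the optimizer the proposed CSAM strategy builds on (assumption:
# standard SAM; variable names are illustrative).
import torch

def sam_step(model, loss_fn, batch, base_optimizer, rho=0.05):
    """One SAM update: perturb weights toward the worst-case direction,
    then apply the base optimizer using the gradient at that point."""
    inputs, labels = batch

    # First pass: gradient g at the current weights w.
    loss = loss_fn(model(inputs), labels)
    loss.backward()

    # Ascent: epsilon = rho * g / ||g||, move to w + epsilon.
    with torch.no_grad():
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2)
        eps = []
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)            # w <- w + epsilon
            eps.append(e)

    # Second pass: gradient at the perturbed weights w + epsilon.
    model.zero_grad()
    loss_fn(model(inputs), labels).backward()

    # Descend: restore w, then step with the SAM gradient.
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)        # undo the perturbation
    base_optimizer.step()
    model.zero_grad()
    return loss.item()
```

The reported metric, equal error rate (EER), is the operating point at which the false acceptance rate equals the false rejection rate, so the 0.616% average EER means both error types are balanced at roughly 0.6% across all test conditions.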