IEEE Transactions on Audio, Speech and Language Processing

The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio

Authors: Yuankun Xie, Yi Lu, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Jianhua Tao, Xin Qi, Xiaopeng Wang, Yukun Liu, Haonan Cheng, Long Ye, Yi Sun

Affiliations: State Key Laboratory of Media Convergence and Communication, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; Institute of Automation, Chinese Academy of Sciences, Beijing, China; Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China; Department of Automation, Tsinghua University, Beijing, China; School of Data Science and Intelligent Media, Communication University of China, Beijing, China; School of Cyberspace Science and Technology, Beijing Institute of Technology, Beijing, China

Publication: IEEE Transactions on Audio, Speech and Language Processing

Year and volume: 2025, Vol. 33

Pages: 386-400

Funding: National Natural Science Foundation of China

Subjects: Deepfakes; Codecs; Vocoders; Acoustics; Training; Codes; Feature extraction; Vector quantization; Speech enhancement; Speech coding

Abstract: With the proliferation of deepfake audio generated by Audio Language Models (ALMs), there is an urgent need for generalized detection methods. ALM-based deepfake audio is currently widespread, highly deceptive, and versatile in type. This poses a significant challenge to current audio deepfake detection (ADD) models trained solely on vocoded data. To detect ALM-based deepfake audio effectively, we focus on the mechanism of ALM-based audio generation: the conversion from neural codec to waveform. We first constructed the Codecfake dataset, an open-source, large-scale collection of over 1 million audio samples in English and Chinese, focused on ALM-based audio detection. As a countermeasure, we propose the CSAM strategy to achieve universal detection of deepfake audio and to tackle the domain ascent bias issue of the original sharpness-aware minimization (SAM). CSAM learns a domain-balanced and generalized minimum. In our experiments, we first demonstrate that ADD models trained on the Codecfake dataset can effectively detect ALM-based audio. Furthermore, our proposed generalization countermeasure achieves an average equal error rate (EER) of 0.616% across all test conditions, the lowest among the compared baseline models. The dataset and associated code are available online.
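The abstract builds on sharpness-aware minimization (SAM) but does not describe how CSAM modifies it. As background only, here is a minimal sketch of the original two-step SAM update. It first takes an ascent step to a perturbed point inside an L2 ball of radius `rho`, then computes the gradient there and applies it to the original weights. The sketch uses a toy quadratic loss rather than an ADD model. The loss, the helper names `loss_and_grad` and `sam_step`, and the values of `lr` and `rho` are hypothetical choices for illustration. This is not the authors' CSAM.

```python
import numpy as np


def loss_and_grad(w, A, b):
    """Hypothetical toy quadratic loss L(w) = 0.5 * w^T A w - b^T w and its gradient A w - b."""
    loss = 0.5 * w @ A @ w - b @ w
    grad = A @ w - b
    return loss, grad


def sam_step(w, A, b, lr=0.1, rho=0.05):
    """One step of original SAM on the toy loss (illustrative; not the paper's CSAM)."""
    _, g = loss_and_grad(w, A, b)
    # Ascent: move to the first-order worst-case point in an L2 ball of radius rho.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # 1e-12 guards against a zero gradient
    # Descent: take the gradient at the perturbed point and apply it to the original weights.
    _, g_sharp = loss_and_grad(w + eps, A, b)
    return w - lr * g_sharp


# Hypothetical 2-D problem whose exact minimizer is A^{-1} b = [1/3, 2].
A = np.array([[3.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 2.0])
w = np.zeros(2)
for _ in range(200):
    w = sam_step(w, A, b)
print(w)  # ends up close to, but not exactly at, [1/3, 2]
```

The ascent radius stays fixed at `rho`, so SAM keeps jittering in a small neighbourhood of the minimizer instead of settling exactly on it. According to the abstract, CSAM targets a "domain ascent bias" in original SAM. The name suggests the issue lies in this ascent (perturbation) step, but the abstract does not say how CSAM changes it.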
