Convolution-based keyword spotting models have so far been studied predominantly in clean speech environments. However, their recognition accuracy drops in low signal-to-noise ratio conditions. To...
Automated speaker verification systems face a higher risk of replay attacks in practice. However, the existing studies face the problem of limited detection capabilities and insufficient use of shallow fine-grained in...
Encryption is an effective means of protecting images, and for color images, encryption should take into account the intrinsic correlation between the R, G, and B components. In this paper, we propose an im...
As a representation of speech information, acoustic word embedding can enable query-by-example keyword search with low-resource speech data. An acoustic word embedding model with Transformer encoder and multivariate j...
Cross-modality person re-identification between visible and infrared images has become a research hotspot in the image retrieval field due to its potential application scenarios. Existing research usually designs loss...
The DNA triple helix structure, as a highly specific gene targeting tool, enables gene regulation by precisely identifying and binding to target DNA sequences. However, the limits of design quality and efficiency affect th...
Deoxyribonucleic acid (DNA) has become an ideal medium for long-term storage and retrieval due to its extremely high storage density and long-term stability. However, access efficiency remains a bottleneck in DNA stor...
To improve the accuracy of speech emotion recognition, this paper proposes a speech emotion recognition method based on the channel attention mechanism. First, Mel Frequency Cepstral Coefficient (MFCC), speec...
Authors: Yan, Jing; Zhou, Shihua
Dalian University
Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian, China
Extractive summarization and abstractive summarization each have advantages and disadvantages, so how best to combine the two approaches remains a difficult problem. To address this challenge, this paper proposes a new fu...
ISBN (digital): 9798350376548
ISBN (print): 9798350376555
Automated speaker verification systems face a higher risk of replay attacks in practice. However, existing studies suffer from limited detection capability and make insufficient use of shallow fine-grained information. To address these issues, we propose the cross-stage mutual distillation (CS-MD) framework, in which two models learn from each other's deep-network outputs at different stages of training. This mutual learning approach enhances the ability of shallow networks to capture fine-grained speech information. Additionally, we use an attentional feature fusion module to integrate shallow information more effectively. The multi-scale attention mechanisms in this module combine local and global speech features while preserving detailed information. Experimental results on the ASVspoof 2019 physical access dataset show that the proposed method outperforms state-of-the-art methods in terms of EER and min t-DCF, validating the effectiveness of the CS-MD framework.
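The mutual-learning idea in the abstract can be illustrated with a minimal sketch of a generic mutual distillation loss for one training example. This is not the paper's CS-MD procedure: the abstract does not specify the cross-stage schedule, the network layers involved, or the loss weights. The function names, the temperature, the weight `alpha`, and the two-class bona fide/replay labeling below are all assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cross_entropy(probs, label, eps=1e-12):
    # Negative log-likelihood of the true class.
    return -math.log(probs[label] + eps)

def mutual_distillation_losses(logits_a, logits_b, label,
                               temperature=2.0, alpha=0.5):
    # Hypothetical per-example losses for two peer models A and B.
    # Each model minimizes its own cross-entropy plus a KL term that
    # pulls its softened prediction toward the peer's prediction.
    p_a = softmax(logits_a, temperature)
    p_b = softmax(logits_b, temperature)
    ce_a = cross_entropy(softmax(logits_a), label)
    ce_b = cross_entropy(softmax(logits_b), label)
    scale = alpha * temperature ** 2  # common T^2 scaling for distillation
    loss_a = ce_a + scale * kl_divergence(p_b, p_a)  # A learns from B
    loss_b = ce_b + scale * kl_divergence(p_a, p_b)  # B learns from A
    return loss_a, loss_b

# Assumed labeling: class 0 = bona fide, class 1 = replayed speech.
loss_a, loss_b = mutual_distillation_losses([2.0, 0.5], [0.5, 2.0], label=0)
```

When the two models agree exactly, both KL terms vanish and each loss reduces to plain cross-entropy; disagreement adds a penalty to both sides, which is what lets each model act as a teacher for the other.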