Electroencephalogram (EEG) signals provide an important pathway to reflect brain activations, from which auditory attention clues of the listener can be decoded, termed as auditory attention decoding (AAD). However, e...
详细信息
This paper proposes a novel neural audio codec, named APCodec+, which is an improved version of APCodec. The APCodec+ takes the audio amplitude and phase spectra as the coding object, and employs an adversarial traini...
详细信息
In our previous work, we have proposed a neural vocoder called APNet, which directly predicts speech amplitude and phase spectra with a 5 ms frame shift in parallel from the input acoustic features, and then reconstru...
详细信息
Using sarcasm on social media platforms to express negative opinions towards a person or object has become increasingly ***,detecting sarcasm in various forms of communication can be difficult due to conflicting *** t...
详细信息
Using sarcasm on social media platforms to express negative opinions towards a person or object has become increasingly ***,detecting sarcasm in various forms of communication can be difficult due to conflicting *** this paper,we introduce a contrasting sentiment-based model for multimodal sarcasm detection(CS4MSD),which identifies inconsistent emotions by leveraging the CLIP knowledge module to produce sentiment features in both text and ***,five external sentiments are introduced to prompt the model learning sentimental preferences among ***,we highlight the importance of verbal descriptions embedded in illustrations and incorporate additional knowledge-sharing modules to fuse such imagelike *** results demonstrate that our model achieves state-of-the-art performance on the public multimodal sarcasm dataset.
We describe a set of new methods to partially automate linguistic phylogenetic inference given (1) cognate sets with their respective protoforms and sound laws, (2) a mapping from phones to their articulatory features...
详细信息
This paper presents our machine translation system that was developed for the WAT2024 MultiIndic MT shared task. We built our system for the Sindhi-English language pair. We developed two MT systems. The first system ...
详细信息
In this paper, a Packet Loss Concealment (PLC) algorithm is proposed for G723.1 CELP-type speech coders in order to improve the quality of decoded speech in VoIP under burst packet loss. The original PLC method implem...
详细信息
Lip-to-speech (Lip2speech) synthesis, which predicts corresponding speech from talking face images, has witnessed significant progress with various models and training strategies in a series of independent studies. Ho...
详细信息
This paper presents a novel speech phase prediction model which predicts wrapped phase spectra directly from amplitude spectra by neural networks. The proposed model is a cascade of a residual convolutional network an...
详细信息
Modern speech recognition systems exhibit rapid performance degradation under domain shift. This issue is especially prevalent in data-scarce settings, such as low-resource languages, where the diversity of training d...
详细信息
暂无评论