Cross-lingual transfer of parsing models has been shown to work well for several closely related languages, but predicting its success in other cases remains hard. Our study is a comprehensive analysis of the impact of...
Mining high-quality bitexts for low-resource languages is challenging. This paper shows that sentence representations of language models fine-tuned with multiple negatives ranking loss, a contrastive objective, help r...
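As a rough illustration of the contrastive objective mentioned above, the following is a minimal sketch of multiple negatives ranking loss in PyTorch, assuming a batch of aligned sentence pairs with in-batch negatives; the function name and scale value are illustrative, not from the paper.

import torch
import torch.nn.functional as F

def multiple_negatives_ranking_loss(src_emb: torch.Tensor,
                                    tgt_emb: torch.Tensor,
                                    scale: float = 20.0) -> torch.Tensor:
    """src_emb, tgt_emb: (batch, dim) embeddings of aligned sentence pairs.

    Each source sentence treats its aligned target as the positive and
    every other target in the batch as a negative.
    """
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    scores = scale * src @ tgt.T                    # (batch, batch) cosine similarities
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)          # diagonal entries are the positives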
The Internet of Things (IoT) is a promising, relatively new technology that develops "smart" networks with a variety of uses and applications (e.g., smart cities, smart homes, and autonomous cars). The diversity o...
This paper describes an exploratory look at the Parallel Corpus Filtering Shared Task at WMT20. We submitted scores for both the Pashto-English and Khmer-English systems, combining multiple techniques like monolingual...
ISBN (digital): 9798350379815
ISBN (print): 9798350379822
Automatic lip-reading (ALR) aims to automatically transcribe spoken content from a speaker's silent lip motion captured in video. Current mainstream lip-reading approaches use only a single visual encoder to model input videos of a single scale. In this paper, we propose to enhance lip-reading by incorporating multi-scale video data and multiple encoders. Specifically, we first introduce a novel multi-scale lip motion extraction algorithm based on the size of the speaker's face and propose an Enhanced ResNet3D visual front-end (VFE) to extract lip features at different scales. For the multiple encoders, in addition to the mainstream Transformer and Conformer, we also incorporate the recently proposed Branchformer and E-Branchformer as visual encoders. In the experiments, we explore the influence of different video data scales and encoders on ALR system performance and fuse the texts transcribed by all ALR systems using recognizer output voting error reduction (ROVER). Finally, our proposed approach placed second in the ICME 2024 ChatCLR Challenge Task 2, with a 21.52% reduction in character error rate (CER) compared to the official baseline on the evaluation set.
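To make the multi-scale idea concrete, here is a hedged sketch of extracting lip crops at several scales proportional to the detected face size; the crop factors, helper name, and signature are assumptions for illustration, not the paper's exact extraction algorithm.

import numpy as np

def multi_scale_lip_crops(frame: np.ndarray,
                          mouth_center: tuple[int, int],
                          face_height: int,
                          scale_factors=(0.4, 0.6, 0.8)) -> list[np.ndarray]:
    """Return square lip crops whose side length is a fraction of face_height,
    so the crop adapts to how large the speaker's face appears in the video."""
    cx, cy = mouth_center
    h, w = frame.shape[:2]
    crops = []
    for s in scale_factors:
        half = int(s * face_height / 2)
        y0, y1 = max(0, cy - half), min(h, cy + half)
        x0, x1 = max(0, cx - half), min(w, cx + half)
        crops.append(frame[y0:y1, x0:x1])
    return crops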
ISBN (digital): 9798350368741
ISBN (print): 9798350368758
Code-switching automatic speech recognition (ASR) aims to accurately transcribe speech that contains two or more languages. To better capture language-specific speech representations and address language confusion in code-switching ASR, the mixture-of-experts (MoE) architecture and an additional language diarization (LD) decoder are commonly employed. However, most research still relies on simple operations such as weighted summation or concatenation to fuse language-specific speech representations, leaving significant room to improve how language bias information is integrated. In this paper, we introduce CAMEL, a cross-attention-based MoE and language-bias approach for code-switching ASR. Specifically, after each MoE layer we fuse language-specific speech representations with cross-attention, leveraging its strong contextual modeling abilities. Additionally, we design a source-attention-based mechanism to incorporate the language information from the LD decoder output into the text embeddings. Experimental results demonstrate that our approach achieves state-of-the-art performance on the SEAME, ASRU200, and ASRU700+Librispeech460 Mandarin-English code-switching ASR datasets.
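A minimal sketch of the core fusion step, assuming two language-specific expert outputs of shape (batch, time, d_model); the module layout, dimensions, and choice of which stream provides the queries are assumptions for illustration, not CAMEL's exact design.

import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, h_lang1: torch.Tensor, h_lang2: torch.Tensor) -> torch.Tensor:
        """h_lang1, h_lang2: (batch, time, d_model) language-specific expert outputs.

        One expert's representation queries the other's, so the fused output
        carries contextual information from both language streams.
        """
        fused, _ = self.attn(query=h_lang1, key=h_lang2, value=h_lang2)
        return self.norm(h_lang1 + fused)   # residual keeps the query stream intact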
We present LEMMING, a modular log-linear model that jointly models lemmatization and tagging and supports the integration of arbitrary global features. It is trainable on corpora annotated with gold standard tags and ...
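As a sketch of the joint log-linear idea, assuming a user-supplied feature function over (word, lemma, tag) triples; the names and shapes below are illustrative, not LEMMING's implementation.

import numpy as np

def joint_log_linear_probs(word, candidates, weights, featurize):
    """P(lemma, tag | word) ∝ exp(w · f(word, lemma, tag)), normalized
    over the candidate (lemma, tag) pairs for this word."""
    scores = np.array([weights @ featurize(word, lemma, tag)
                       for lemma, tag in candidates])
    exp = np.exp(scores - scores.max())   # numerically stable softmax
    return exp / exp.sum()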
Despite the success of the Universal Dependencies (UD) project exemplified by its impressive language breadth, there is still a lack of ‘within-language breadth’: most treebanks focus on standard languages. Even for...
We explore ways to use speech data to screen for indications of Alzheimer’s dementia (AD). In particular, we describe our approach to the ICASSP 2023 Signal Processing Grand Challenge, which involves extrapolating from models learned on English speech samples to Greek speech samples to determine which subjects have AD. By using acoustic and linguistic features inspired by clinical research on AD, our top-performing classification model achieves 69% accuracy in distinguishing AD patients from healthy controls, and our regression model attains an RMSE of 4.8 for inferring cognitive testing scores. These outcomes underscore the potential of our explainable model for detecting cognitive decline in AD patients via speech, and its applicability in clinical settings.
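An illustrative sketch of the overall recipe: hand-crafted acoustic and linguistic features feeding a simple classifier and regressor. The feature names and the choice of logistic regression and ridge regression are assumptions for illustration, not the paper's exact models.

import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

def make_feature_vector(pause_rate: float, speech_rate: float,
                        mean_pitch: float, type_token_ratio: float) -> np.ndarray:
    """Combine acoustic (pauses, rate, pitch) and linguistic (lexical
    diversity) cues of the kind used in clinical AD research."""
    return np.array([pause_rate, speech_rate, mean_pitch, type_token_ratio])

# Placeholder data: X is (n_subjects, n_features), y_cls is AD vs. control,
# y_score stands in for a cognitive testing score.
rng = np.random.default_rng(0)
X = rng.random((40, 4))
y_cls = np.repeat([0, 1], 20)
y_score = rng.uniform(10, 30, size=40)

clf = LogisticRegression().fit(X, y_cls)   # screen AD vs. healthy controls
reg = Ridge().fit(X, y_score)              # infer cognitive testing scores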
This article details a procedure for classifying service cases by priority level based on the service level agreement (SLA) between an organization and its customer. The main factor in the article's publicat...
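Purely as a hypothetical illustration of mapping an SLA deadline to a priority level (the thresholds and level names below are invented, since the article's actual procedure is truncated here):

from datetime import timedelta

def priority_from_sla(time_to_breach: timedelta) -> str:
    """Shorter time until the SLA is breached -> higher priority."""
    if time_to_breach <= timedelta(hours=4):
        return "critical"
    if time_to_breach <= timedelta(hours=24):
        return "high"
    if time_to_breach <= timedelta(days=3):
        return "medium"
    return "low"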