This paper addresses the robust speech recognition problem as an adaptation task. Specifically, we investigate the cumulative application of adaptation methods. A bidirectional Long Short-Term Memory (BLSTM) based neu...
详细信息
The recently proposed conformer architecture has been successfully used for end-to-end automatic speech recognition (ASR) architectures achieving state-of-the-art performance on different datasets. To our best knowled...
详细信息
We present a complete training pipeline to build a state-of-the-art hybrid HMM-based ASR system on the 2nd release of the TED-LIUM corpus. Data augmentation using SpecAugment is successfully applied to improve perform...
详细信息
Sequence-to-sequence attention-based models on subword units allow simple open-vocabulary end-to-end speech recognition. In this work, we show that such models can achieve competitive results on the Switchboard 300h a...
详细信息
The use of language Models (LMs) is a very important component in large and open vocabulary recognition systems. This paper presents an open-vocabulary approach for Arabic handwriting recognition. The proposed approac...
详细信息
ISBN:
(纸本)9781479901937
The use of language Models (LMs) is a very important component in large and open vocabulary recognition systems. This paper presents an open-vocabulary approach for Arabic handwriting recognition. The proposed approach makes use of Arabic word decomposition based on morphological analysis. The vocabulary is a combination of words and sub-words obtained by the decomposition process. Out Of Vocabulary (OOV) words can be recognized by combining different elements from the lexicon. The recognition system is based on Hidden Markov Models (HMMs) with position and context dependent character models. An n-gram LM trained on the decomposed text is used along with the HMMs during the search. The approach is evaluated using two Arabic handwriting datasets. The open vocabulary approach leads to a significant improvement in the system performance. Two different types experiments for two Arabic handwriting recognition tasks are conducted in this work. The proposed approach for open vocabulary allows to have an absolute improvement of up to 1% in the Word Error Rate (WER) for the constrained task and to keep the same performance of the baseline system for the unconstrained one.
We propose a method to generate linguistically meaningful subunits in a fully automated fashion for sign language corpora. The ability to automate the process of subunit annotation has profound effects on the data ava...
详细信息
ISBN:
(纸本)9781467355452
We propose a method to generate linguistically meaningful subunits in a fully automated fashion for sign language corpora. The ability to automate the process of subunit annotation has profound effects on the data available for training sign languagerecognition systems. The approach is based on the idea that subunits are shared among different signs. With sufficient data and knowledge of possible signing variants, accurate automatic subunit sequences are produced, matching the specific characteristics of given sign language data. Specifically we demonstrate how an iterative forced alignment algorithm can be used to transfer the knowledge of a user-edited open sign language dictionary to the task of annotating a challenging, large vocabulary, multi-signer corpus recorded from public TV. Existing approaches focus on labour intensive manual subunit annotations or on data-driven approaches. Our method yields an average precision and recall of 15% under the maximum achievable accuracy with little user intervention beyond providing a simple word gloss.
As one popular modeling approach for end-to-end speech recognition, attention-based encoder-decoder models are known to suffer the length bias and corresponding beam problem. Different approaches have been applied in ...
详细信息
Following the rationale of end-to-end modeling, CTC, RNN-T or encoder-decoder-attention models for automatic speech recognition (ASR) use graphemes or grapheme-based subword units based on e.g. byte-pair encoding (BPE...
详细信息
Recent publications on automatic-speech-recognition (ASR) have a strong focus on attention encoder-decoder (AED) architectures which tend to suffer from over-fitting in low resource scenarios. One solution to tackle t...
详细信息
Unsupervised representation learning has recently helped automatic speech recognition (ASR) to tackle tasks with limited labeled data. Following this, hardware limitations and applications give rise to the question ho...
详细信息
暂无评论