In this work, multiple hierarchical language modeling strategies for a zero OOV rate large vocabulary continuous speech recognition system are investigated. In our previously proposed hierarchical approach, a full-wor...
In this paper we present an investigation of sequence-discriminative training of deep neural networks for automatic speech recognition. We evaluate different sequence-discriminative training criteria (MMI and MPE) and...
We investigate sequence-discriminative training of long short-term memory recurrent neural networks using the maximum mutual information criterion. We show that although recurrent neural networks already make use of th...
German is a morphologically rich language having a high degree of word inflections, derivations and compounding. This leads to high out-of-vocabulary (OOV) rates and poor language model (LM) probabilities in the large...
Performing large vocabulary continuous speech recognition (LVCSR) for morphologically rich languages is considered a challenging task. The morphological richness of such languages leads to high out-of-vocabulary (OOV)...
We present a novel toolkit that implements the long short-term memory (LSTM) neural network concept for language modeling. The main goal is to provide software that is easy to use and that allows fast training of...
In the hybrid approach, neural network output directly serves as hidden Markov model (HMM) state posterior probability estimates. In contrast to this, in the tandem approach neural network output is used as input feat...
In this paper, three different voicing features are studied as additional acoustic features for continuous speech recognition. The harmonic product spectrum based feature is extracted in the frequency domain, while the autocorrelation and the average magnitude difference based methods work in the time domain. Each algorithm produces a measure of voicing for every time frame. The voicing measure was combined with the standard Mel Frequency Cepstral Coefficients (MFCC) using linear discriminant analysis to choose the most relevant features. Experiments were performed on small- and large-vocabulary tasks. The three voicing measures, each combined with MFCCs, yielded similar improvements in word error rate: up to 14% on the small-vocabulary task and up to 6% on the large-vocabulary task, relative to using MFCCs alone with the same overall number of parameters in the system.
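As a rough illustration of the two time-domain measures named in this abstract, the sketch below computes a per-frame voicing score from the normalized autocorrelation and from the average magnitude difference function (AMDF). It is not the paper's implementation: the function names, the lag range, the normalizations, and the synthetic test frames are all assumptions chosen for clarity.

```python
import math
import random


def autocorrelation_voicing(frame, min_lag, max_lag):
    """Peak of the energy-normalized autocorrelation over a lag range.

    Values near 1 suggest a periodic (voiced) frame. The normalization
    by frame energy is an assumption, not taken from the paper.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, min(max_lag, n - 1) + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        best = max(best, r / energy)
    return best


def amdf_voicing(frame, min_lag, max_lag):
    """Voicing score from the average magnitude difference function.

    The AMDF dips towards zero at the pitch period of a periodic frame.
    Mapping the dip to 1 - min/max is a hypothetical normalization.
    """
    n = len(frame)
    diffs = []
    for lag in range(min_lag, min(max_lag, n - 1) + 1):
        total = sum(abs(frame[i] - frame[i + lag]) for i in range(n - lag))
        diffs.append(total / (n - lag))
    if not diffs or max(diffs) == 0.0:
        return 0.0
    return 1.0 - min(diffs) / max(diffs)


# Synthetic frames (assumed 8 kHz sampling, 50 ms frame length):
# a 100 Hz sine stands in for voiced speech, Gaussian noise for unvoiced.
voiced_frame = [math.sin(2.0 * math.pi * i / 80.0) for i in range(400)]
rng = random.Random(0)
noise_frame = [rng.gauss(0.0, 1.0) for _ in range(400)]

for name, frame in (("voiced", voiced_frame), ("noise", noise_frame)):
    print(name,
          autocorrelation_voicing(frame, 40, 160),
          amdf_voicing(frame, 40, 160))
```

In a setup like the one described, such a score would be appended to each frame's MFCC vector before the linear discriminant analysis step; the lag range of 40 to 160 samples here simply corresponds to pitch values between 50 Hz and 200 Hz at the assumed sampling rate.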
Current neural translation networks are based on an effective attention mechanism that can be considered as an implicit probabilistic notion of alignment. Such architectures do not guarantee a high quality alignment, ...
Manual analysis and decryption of enciphered documents is tedious and error-prone work. Often, even after spending large amounts of time on a particular cipher, no decipherment can be found. Automating the decryption ...