Word posterior probabilities are a common approach for confidence estimation in automatic speech recognition and machine translation. We will generalize this idea and introduce n-gram posterior probabilities and show ...
We propose and study three different novel approaches for tackling the problem of development set selection in Statistical Machine Translation. We focus on a scenario where a machine translation system is leveraged fo...
ISBN:
(print) 9781479927579
We show that most search errors can be identified by aligning the results of a symmetric forward and a backward decoding pass. Based on this observation, we introduce an efficient high-level decoding architecture that yields virtually no search errors and requires virtually no manual tuning. We perform initial forward and backward decodings with tight beams, identify search errors, and then recursively increase the beam sizes and re-decode the erroneous intervals forward and backward until no more search errors are detected. Consequently, each utterance, and even each individual word, is decoded with the smallest beam size required to decode it correctly. On all tested systems we achieve an error rate equal to or very close to that of classical decoding with an ideally tuned beam size, but without any specific tuning and at roughly twice the speed. A further speedup by a factor of 2 can be achieved by running the forward and backward passes in separate threads.
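The decode-align-widen loop described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy `decode` stub and its per-word, per-direction "difficulties" are hypothetical stand-ins for a real forward/backward decoder, and for brevity the whole utterance is re-decoded instead of only the erroneous intervals.

```python
def decode(utt, beam, direction):
    """Toy decoder: a word is hypothesized correctly iff the beam is at
    least as wide as its (made-up) difficulty in that search direction;
    otherwise the two passes emit different wrong labels, so a search
    error shows up as a forward/backward disagreement."""
    out = []
    for word, fwd_diff, bwd_diff in utt:
        diff = fwd_diff if direction == "fwd" else bwd_diff
        out.append(word if beam >= diff else "<%s-err:%s>" % (direction, word))
    return out

def align_and_widen(utt, beam=1, max_beam=32):
    """Double the beam (here for the whole utterance; the paper widens
    only the erroneous intervals) until both passes agree."""
    while True:
        fwd = decode(utt, beam, "fwd")
        bwd = decode(utt, beam, "bwd")
        if fwd == bwd or beam >= max_beam:
            return fwd, beam
        beam *= 2

# each entry: (word, forward difficulty, backward difficulty) -- made up
utt = [("the", 1, 1), ("quick", 4, 2), ("fox", 2, 8)]
hyp, final_beam = align_and_widen(utt)
# agreement is reached once the beam covers the hardest word in either pass
```

The key property mirrored here is that symmetric forward and backward searches rarely make the *same* error, so agreement between the two passes is a usable proxy for the absence of search errors.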
We present discriminative reordering models for phrase-based statistical machine translation. The models are trained using the maximum entropy principle. We use several types of features: based on words, based on word...
ISBN:
(digital) 9781509066315
ISBN:
(print) 9781509066322
Training deep neural networks is often challenging in terms of stability; convergence frequently requires careful hyperparameter tuning or a pretraining scheme. Layer normalization (LN) has been shown to be a crucial ingredient in training deep encoder-decoder models. We explore several LN variants of long short-term memory (LSTM) recurrent neural networks (RNNs) by applying LN to different parts of the internal recurrency of the LSTM; no previous work investigates this. We carry out experiments on the Switchboard 300h task for both hybrid and end-to-end ASR models and show that LN improves the final word error rate (WER), stabilizes training, allows training even deeper models, requires less hyperparameter tuning, and works well even without pretraining. We find that applying LN jointly to the forward and recurrent inputs, a variant we denote the Global Joined Norm, gives a 10% relative improvement in WER.
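The "joined" LN placement described above can be illustrated with a single numpy LSTM step. This is a minimal sketch under assumptions: the LN gain/bias parameters are omitted, and the paper's exact parametrization of the recurrency may differ.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # normalize over the feature dimension (learned gain/bias omitted)
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ln_lstm_step(x, h, c, W, R, b):
    # LN applied once to the *summed* forward (W x) and recurrent (R h)
    # pre-activations of all four gates -- the joint placement
    z = layer_norm(W @ x + R @ h) + b
    i, f, o, g = np.split(z, 4)          # input, forget, output, cell gates
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
n = 4                                    # toy hidden size
W = rng.normal(size=(4 * n, n))          # forward weights
R = rng.normal(size=(4 * n, n))          # recurrent weights
b = np.zeros(4 * n)
h = c = np.zeros(n)
h, c = ln_lstm_step(rng.normal(size=n), h, c, W, R, b)
```

Other variants studied in this line of work would move the `layer_norm` call, e.g. normalizing `W @ x` and `R @ h` separately before summing.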
We propose several tracking adaptation approaches to recover from early tracking errors in sign language recognition by optimizing the obtained tracking paths w.r.t. the hypothesized word sequences of an automatic sign language recognition system. Most state-of-the-art systems treat tracking as a preprocessing feature-extraction step, so hand or head tracking is usually optimized only according to a tracking criterion; as a consequence, methods that depend on accurate detection and tracking of body parts lead to recognition errors in gesture and sign language processing. We analyze an integrated tracking and recognition approach addressing these problems and propose approximations over multiple hand hypotheses to reduce the time complexity of the integrated approach. Experiments on a publicly available benchmark database show that the proposed methods strongly improve the recognition accuracy of the system.
ISBN:
(digital) 9781509066315
ISBN:
(print) 9781509066322
Long short-term memory (LSTM) networks are the dominant architecture for large vocabulary continuous speech recognition (LVCSR) acoustic modeling due to their good performance. However, LSTMs are hard to tune and computationally expensive. To build a system with lower computational cost that also supports online streaming applications, we explore convolutional neural networks (CNNs). To the best of our knowledge there is no overview of CNN hyperparameter tuning for LVCSR in the literature, so we report our results explicitly. Apart from recognition performance, we focus on training and evaluation speed and provide a time-efficient setup for CNNs. We encountered overfitting during training and solved it with data augmentation, namely SpecAugment. The system achieves results competitive with the best LSTM results, and we significantly increased the training and decoding speed of the CNN, approaching that of the offline LSTM.
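The SpecAugment-style masking mentioned above can be sketched in a few lines of numpy. This is a simplified illustration (one frequency mask and one time mask, no time warping); the mask widths `max_f` and `max_t` are hypothetical values, not the paper's settings.

```python
import numpy as np

def spec_augment(spec, max_f=8, max_t=20, rng=None):
    """Toy SpecAugment: zero out one random frequency band and one
    random time band of a (time x freq) log-mel spectrogram."""
    if rng is None:
        rng = np.random.default_rng()
    spec = spec.copy()                       # leave the input untouched
    T, F = spec.shape
    f = rng.integers(0, max_f + 1)           # frequency mask width
    f0 = rng.integers(0, F - f + 1)
    spec[:, f0:f0 + f] = 0.0
    t = rng.integers(0, max_t + 1)           # time mask width
    t0 = rng.integers(0, T - t + 1)
    spec[t0:t0 + t, :] = 0.0
    return spec

spec = np.ones((100, 40))                    # dummy 100-frame, 40-bin input
masked = spec_augment(spec, rng=np.random.default_rng(0))
```

Because the masks are sampled anew each call, the same utterance yields a different augmented view every epoch, which is what combats the overfitting described above.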
We give an overview of the RWTH phrase-based statistical machine translation system that was used in the evaluation campaign of the International Workshop on Spoken language Translation 2005. We use a two pass approac...
One of the most difficult challenges in face recognition is the large variation in pose. One approach to handle this problem is to use a 2D-warping algorithm within a nearest-neighbor classifier. The 2D-warping algorithm optimizes an energy function that captures the cost of matching pixels between two images while respecting the 2D dependencies defined by local pixel neighborhoods. Optimizing this energy function is an NP-complete problem and is therefore approached with algorithms that approximate the optimal solution. In this paper we compare two such algorithms that do so without discarding any 2D dependencies, and we study the effect of the quality of the approximate solutions on the classification performance. Additionally, we propose a new algorithm that finds solutions with lower (better) energies than the other methods. The experimental evaluation on the CMU-MultiPIE database shows that the proposed algorithm also achieves state-of-the-art recognition accuracies.
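A typical energy function of the kind described above (pixel-matching cost plus a penalty on displacement differences of neighboring pixels) can be evaluated as follows. This is a generic sketch under assumptions, not the paper's exact cost terms; the squared-difference data term and quadratic smoothness penalty are illustrative choices.

```python
import numpy as np

def warping_energy(src, dst, warp, lam=1.0):
    """Energy of a candidate 2D warping from src to dst.
    warp[y, x] = (wy, wx) is the dst pixel matched to src pixel (y, x)."""
    H, W = src.shape
    data = 0.0                                   # pixel-matching cost
    for y in range(H):
        for x in range(W):
            wy, wx = warp[y, x]
            data += float((src[y, x] - dst[wy, wx]) ** 2)
    smooth = 0.0                                 # 2D neighborhood penalty
    for y in range(H):
        for x in range(W):
            for dy, dx in ((0, 1), (1, 0)):      # right and down neighbors
                ny, nx = y + dy, x + dx
                if ny < H and nx < W:
                    # difference of the two pixels' displacements
                    d = warp[ny, nx] - warp[y, x] - np.array((dy, dx))
                    smooth += float(d @ d)
    return data + lam * smooth

img = np.arange(9.0).reshape(3, 3)
# identity warp: every pixel maps to itself -> zero energy against itself
identity = np.dstack(np.meshgrid(np.arange(3), np.arange(3), indexing="ij"))
```

Evaluating this energy is cheap; the NP-complete part discussed in the abstract is *minimizing* it over all warpings while keeping both the horizontal and vertical dependency terms.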
ISBN:
(digital) 9781728165530
ISBN:
(print) 9781728165547
The task of fine-grained visual classification (FGVC) deals with classification problems that exhibit a small inter-class variance, such as distinguishing between different bird species or car models. State-of-the-art approaches typically tackle this problem by integrating an elaborate attention mechanism or (part-)localization method into a standard convolutional neural network (CNN). In this work we likewise aim to enhance the performance of a backbone CNN, such as ResNet, by adding three efficient and lightweight components specifically designed for FGVC: global k-max pooling, a discriminative embedding layer trained by optimizing class means, and an efficient localization module that estimates bounding boxes using only class labels for training. The resulting model achieves state-of-the-art recognition accuracies on multiple FGVC benchmark datasets.
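Of the three components above, global k-max pooling is the simplest to sketch: per channel, average the k largest spatial activations, interpolating between global max pooling (k = 1) and global average pooling (k = H·W). A minimal numpy version, with an illustrative k that is not necessarily the paper's setting:

```python
import numpy as np

def global_k_max_pool(fmap, k=4):
    """Global k-max pooling over a (C x H x W) feature map: for each
    channel, return the mean of its k largest spatial activations."""
    C = fmap.shape[0]
    flat = fmap.reshape(C, -1)              # flatten spatial dims
    topk = np.sort(flat, axis=1)[:, -k:]    # k largest values per channel
    return topk.mean(axis=1)                # (C,) pooled descriptor

fmap = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)
pooled = global_k_max_pool(fmap, k=2)
# channel 0 holds 0..8 -> mean of {7, 8} = 7.5; channel 1 -> 16.5
```

Compared to plain max pooling, averaging the top k activations keeps the focus on the most discriminative regions (useful for FGVC's small inter-class differences) while being less sensitive to a single noisy peak.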