Document-level context for neural machine translation (NMT) is crucial for improving translation consistency and cohesion, the translation of ambiguous inputs, and several other linguistic phenomena. Many work...
The encoder-decoder architecture is widely adopted for sequence-to-sequence modeling tasks. For machine translation, despite the evolution from long short-term memory networks to Transformer networks, plus the introductio...
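As background for the encoder-decoder idea mentioned above, here is a minimal sketch (invented for illustration, not taken from the paper) of a sequence-to-sequence model in PyTorch: the encoder compresses the source sentence into a hidden state from which the decoder generates the target sequence. All layer and vocabulary sizes are arbitrary placeholders.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: the encoder summarizes the source sequence
    into a hidden state; the decoder predicts target tokens conditioned on
    that state (no attention, for brevity)."""
    def __init__(self, src_vocab, tgt_vocab, dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt):
        _, state = self.encoder(self.src_emb(src))    # summarize source
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)                      # logits per target position

model = Seq2Seq(src_vocab=1000, tgt_vocab=1000)
logits = model(torch.randint(1000, (2, 7)), torch.randint(1000, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 1000])
```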
Prominently used in support vector machines and logistic regression, kernel functions (kernels) can implicitly map data points into high-dimensional spaces, making it easier to learn complex decision boundaries. In ...
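To make the kernel idea concrete, the following sketch (a standard illustration, not from the paper) shows an RBF kernel k(x, x') = exp(-gamma * ||x - x'||^2) separating a toy dataset that no linear boundary can separate in the input space; the dataset and the gamma value are invented.

```python
import numpy as np
from sklearn.svm import SVC

# Toy dataset: points on an inner disc vs. an outer ring
# (not linearly separable in the original 2D space).
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 200)
radii = np.concatenate([rng.uniform(0.0, 1.0, 100),   # class 0: inner disc
                        rng.uniform(2.0, 3.0, 100)])  # class 1: outer ring
X = np.stack([radii * np.cos(angles), radii * np.sin(angles)], axis=1)
y = np.concatenate([np.zeros(100), np.ones(100)])

# The RBF kernel implicitly maps the points into a high-dimensional space
# where a linear separator exists, without computing the mapping explicitly.
clf = SVC(kernel="rbf", gamma=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))  # close to 1.0
```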
This work investigates the alignment problem in state-of-the-art multi-head attention models based on the transformer architecture. We demonstrate that alignment extraction in transformer models can be improved by aug...
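The paper's own extraction method is cut off above; as a point of reference, a common baseline for extracting word alignments from multi-head attention is to average the cross-attention weights over heads and pick, for each target position, the source position with the highest weight. The tensor shapes below are assumptions.

```python
import numpy as np

def extract_alignment(attn, heads_axis=0):
    """Baseline word alignment from attention: attn has shape
    (heads, target_len, source_len); average over heads, then take
    the argmax source position for every target position."""
    avg = attn.mean(axis=heads_axis)      # (target_len, source_len)
    return avg.argmax(axis=-1)            # one source index per target word

# Toy cross-attention weights: 2 heads, 3 target and 4 source positions.
attn = np.random.default_rng(1).random((2, 3, 4))
attn /= attn.sum(axis=-1, keepdims=True)  # normalize each row to sum to 1
print(extract_alignment(attn))            # e.g. array([2, 0, 3])
```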
Egyptian Arabic (EA) is a colloquial variety of Arabic. It is a low-resource, morphologically rich language that causes problems in Large Vocabulary Continuous Speech Recognition (LVCSR). Building LMs on morpheme level...
Neural network language models (NNLMs) have recently become an important complement to conventional n-gram language models (LMs) in speech-to-text systems. However, little is known about the behavior of NNLMs. The analysis presented in this paper aims to understand which types of events are better modeled by NNLMs than by n-gram LMs, in which cases the improvements are most substantial, and why this is the case. Such an analysis is important for deriving further benefit from NNLMs used in combination with conventional n-gram models. The analysis is carried out for different types of neural network LMs (feed-forward and recurrent). The results showing for which types of events NNLMs provide better probability estimates are validated on two setups that differ in size and in degree of data homogeneity.
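Since the stated motivation is using NNLMs in combination with n-gram LMs, a standard combination scheme (a general technique, not necessarily the exact setup of this paper) is linear interpolation of the two probability estimates; the interpolation weight below is an assumed hyperparameter, normally tuned on held-out data.

```python
import math

def interpolated_logprob(p_nnlm, p_ngram, lam=0.5):
    """Linear interpolation of two LM probabilities for the same word:
    p(w | h) = lam * p_nnlm(w | h) + (1 - lam) * p_ngram(w | h).
    lam is typically chosen to minimize perplexity on held-out data."""
    return math.log(lam * p_nnlm + (1.0 - lam) * p_ngram)

# Example: the NNLM assigns 0.02 and the n-gram model 0.005 to the next word.
print(interpolated_logprob(0.02, 0.005, lam=0.7))
```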
This paper proposes improved context-dependent modeling for Arabic handwriting recognition. Since the number of parameters in context-dependent models is huge, CART trees are used for state tying. This work is based on a new set of questions for the CART tree construction, derived from a "lossy mapping" categorization of the Arabic shapes. The system used is a combination of Hidden Markov Models and Recurrent Neural Networks following the hybrid approach. A comparison is made between a neural network trained on the baseline labels and another trained on the CART tree labels. The experimental results show that using the CART labels for neural network training is beneficial. The lossy-mapping-based CART tree performed better than the baseline system. An absolute improvement of 2.9% in Word Error Rate is achieved on the test set of the Open Hart database.
ISBN:
(Print) 9781509041183
In this work we release our extensible and easily configurable neural network training software. It provides a rich set of functional layers with a particular focus on the efficient training of recurrent neural network topologies on multiple GPUs. The source of the software package is public and freely available for academic research purposes, and it can be used as a framework or as a standalone tool supporting flexible configuration. The software allows training of state-of-the-art deep bidirectional long short-term memory (LSTM) models on both one-dimensional data like speech and two-dimensional data like handwritten text, and it was used to develop successful submission systems in several evaluation campaigns.
ISBN:
(Print) 9781479903573
Log-linear models find a wide range of applications in pattern recognition. The training of log-linear models is a convex optimization problem. In this work, we compare the performance of stochastic and batch optimization algorithms. Stochastic algorithms are fast on large data sets but cannot be parallelized well. In our experiments on a broadcast conversations recognition task, stochastic methods yield competitive results after only a short training period, but when enough computational resources are spent on parallelization, batch algorithms are competitive with stochastic algorithms. We obtained slight improvements by using a stochastic second-order algorithm. Our best log-linear model outperforms the maximum likelihood trained Gaussian mixture model baseline while being ten times smaller.
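To make the stochastic-versus-batch comparison concrete, here is a minimal sketch (invented for illustration, not the paper's system) of stochastic gradient training for a log-linear model, i.e., multinomial logistic regression, whose negative log-likelihood is convex:

```python
import numpy as np

def sgd_loglinear(X, y, num_classes, lr=0.1, epochs=10, seed=0):
    """Stochastic gradient descent on the convex negative log-likelihood
    of a log-linear model p(c | x) = softmax(W x)_c, one example per update."""
    rng = np.random.default_rng(seed)
    W = np.zeros((num_classes, X.shape[1]))
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            scores = W @ X[i]
            probs = np.exp(scores - scores.max())
            probs /= probs.sum()
            probs[y[i]] -= 1.0                 # gradient of -log p(y_i | x_i)
            W -= lr * np.outer(probs, X[i])
    return W

# Toy two-class problem in two dimensions.
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([0, 0, 1, 1])
W = sgd_loglinear(X, y, num_classes=2)
print((W @ X.T).argmax(axis=0))  # predicted classes: [0 0 1 1]
```

A batch method would instead average this gradient over the whole training set before each update, which is what makes it easy to parallelize across machines.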
ISBN:
(Print) 9781457705380
Conditional Random Fields (CRFs) have proven to perform well on natural language processing tasks like name transliteration, concept tagging, or grapheme-to-phoneme (g2p) conversion. The aim of this paper is to propose extensions to state-of-the-art CRF systems for these tasks. Since the number of features can grow rapidly, a method for feature selection is very helpful for boosting performance. A combination of L1 and L2 regularization (elastic net) has been adopted and implemented within the Rprop optimization algorithm. Usually, dependencies on the target side are limited to bigram dependencies, since the computational complexity grows exponentially with the history length. We present a modified CRF decoding in which a conventional language model on the target side is integrated into the CRF search process, so that larger contexts can be taken into account. Besides these two main parts, the previously published margin extension to the CRF training criterion has been adopted.
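The elastic net mentioned above adds a weighted combination of L1 and L2 penalties to the CRF training criterion; the L1 part drives many feature weights to exactly zero, which is what makes it useful for feature selection. A minimal sketch of the regularized objective, with assumed notation and penalty weights:

```python
import numpy as np

def elastic_net_objective(neg_log_lik, w, l1=0.01, l2=0.001):
    """Training criterion with elastic-net regularization:
    F(w) = NLL(w) + l1 * ||w||_1 + (l2 / 2) * ||w||_2^2.
    The L1 term zeroes out many feature weights (feature selection);
    the L2 term keeps the remaining weights small."""
    return neg_log_lik + l1 * np.abs(w).sum() + 0.5 * l2 * np.dot(w, w)

# Example with a toy weight vector and an assumed data-term value.
w = np.array([0.5, 0.0, -1.2])
print(elastic_net_objective(neg_log_lik=42.0, w=w))
```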