We present NMT-Keras, a flexible toolkit for training deep learning models, which puts a particular emphasis on the development of advanced applications of neural machine translation systems, such as interactive-predic...
This paper introduces a new procedure to improve table header detection in handwritten text images by fusing the posterior probabilities provided by two baseline classifiers. Each classifier considers a different modality, namely visual or textual features. Both baseline classifiers implement convolutional neural networks, specifically the U-Net architecture. Four fusion methods are considered: the mean; linear discriminant analysis and random forest as meta-classifiers; and a recently developed method called alpha integration, which incorporates least mean square parameter optimization. The testing dataset consisted of 89 page images drawn from the Passau dataset. In these experiments the fusion methods improved performance, which is notable given the difficulty of the problem. In terms of area under the receiver operating characteristic curve, the best results were obtained by alpha integration. The improvement is relevant in the context of the targeted problem.
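The simplest of the four fusion methods, the mean, can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function name `mean_fusion`, the 0.5 decision threshold, and the posterior values are all hypothetical and do not come from the paper.

```python
def mean_fusion(p_visual, p_textual):
    """Average two lists of posterior probabilities element-wise."""
    if len(p_visual) != len(p_textual):
        raise ValueError("both classifiers must score the same samples")
    return [(v + t) / 2.0 for v, t in zip(p_visual, p_textual)]


# Hypothetical "table header" posteriors for four image regions,
# one list per modality (made-up values, not results from the paper).
p_visual = [0.90, 0.20, 0.60, 0.10]
p_textual = [0.70, 0.40, 0.30, 0.20]

fused = mean_fusion(p_visual, p_textual)

# Assumed decision rule: call a region a header if the fused posterior >= 0.5.
is_header = [p >= 0.5 for p in fused]
```

The meta-classifier variants (linear discriminant analysis, random forest) would instead treat each pair of posteriors as a two-dimensional feature vector and learn the combination from labeled data.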
We present our demonstration of two machine translation applications to historical documents. The first task consists in generating a new version of a historical document, written in the modern version of its original...
Accessibility to historical documents is mostly limited to scholars. This is due to the language barrier inherent in human language and the linguistic properties of these documents. Given a historical document, modern...
Neural machine translation systems require large amounts of training data and resources. Even with this, the quality of the translations may be insufficient for some users or domains. In such cases, the output of the ...
Automatic Speech Recognition (ASR) based on Recurrent Neural Network Transducers (RNN-T) is gaining interest in the speech community. We investigate data selection and preparation choices aiming for improved robustnes...
We propose a tracking adaptation to recover from early tracking errors in sign language recognition by optimizing the obtained tracking paths w.r.t. the hypothesized word sequences of an automatic sign language recognition system. Hand or head tracking is usually optimized only according to a tracking criterion. As a consequence, methods that depend on accurate detection and tracking of body parts lead to recognition errors in gesture and sign language processing. Similar to speaker-dependent feature adaptation methods in automatic speech recognition, we propose an automatic visual alignment of signers for vision-based sign language recognition. Furthermore, the generation of additional virtual training samples is proposed to mitigate the lack of data in sign language processing, which often leads to "one-shot" trained models. Most state-of-the-art systems are speaker dependent and treat tracking as a preprocessing feature extraction step. Experiments on a publicly available benchmark database show that the proposed methods strongly improve the recognition accuracy of the system.
Recent advances in Audio-Visual Speech Recognition (AVSR) have led to unprecedented achievements in the field, improving the robustness of this type of system in adverse, noisy environments. In most cases, this task h...
Current speech recognition systems mainly use features based on the Short-Time Fourier Transform, such as MFCC. Dropping the short-time stationarity assumption for voiced speech, this paper introduces non-stationary signal analysis into the ASR framework. We present new acoustic features extracted by a pitch-adaptive Gammatone filter bank. Noise robustness was demonstrated on the AURORA 2 and AURORA 4 tasks, where the proposed features outperform the standard MFCC. Furthermore, successful combination experiments via ROVER indicate that the new features carry information that differs from MFCC.
Multiple types of models are used in handwriting recognition; they can be broadly categorized into generative and discriminative models. Gaussian Hidden Markov Models are used successfully in most systems, and discriminative training can be applied to improve them further. Alternatively, Segmental Conditional Random Fields have the advantage of being discriminative as well as segmental. The novelty of this work is the investigation of Segmental Conditional Random Fields for handwriting recognition. In addition, Multi-Layer Perceptrons and Long Short-Term Memory Recurrent Neural Networks are compared for generating the observations in this framework. Various types of features are investigated in the segmental models for handwriting recognition. Furthermore, class-based language model features are proposed to extend this model. Visual features based on moments are extracted at the word level to make the model more robust. Experimental results on English handwriting show a relative reduction of 13.7% in word error rate w.r.t. the baseline system. The proposed system also outperforms Gaussian Hidden Markov Models trained discriminatively with the minimum phone error criterion, by a relative reduction of 6.9% in word error rate.