This work studies knowledge distillation (KD) and addresses its constraints for recurrent neural network transducer (RNNT) models. In hard distillation, a teacher model transcribes large amounts of unlabelled speech t...
MLP based front-ends have shown significant complementary properties to conventional spectral features. As part of the DARPA GALE program, different MLP features were developed for Mandarin ASR. In this paper, all the...
We present NMT-Keras, a flexible toolkit for training deep learning models, which puts a particular emphasis on the development of advanced applications of neural machine translation systems, such as interactive-predic...
This paper introduces a new procedure to improve table header detection in handwritten text images by fusing the posterior probabilities provided by two baseline classifiers. Each classifier considers a different modality, namely visual or textual features. Both baseline classifiers implement convolutional neural networks, specifically adopting the U-Net architecture. Four fusion methods are considered: the mean; linear discriminant analysis and random forest as meta-classifiers; and a recently developed method called alpha integration. The testing dataset consisted of 89 page images drawn from the Passau dataset. The fusion methods improved performance in these experiments, which is notable given the difficulty of the problem. In terms of area under the receiver operating characteristic curve, the best results were obtained by alpha integration, a method that incorporates least mean square parameter optimization. The improvement is relevant in the context of the targeted problem.
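As a rough illustration of the simplest of the four fusion methods named in this abstract (the mean), the sketch below averages two posterior probability vectors and scores each with the area under the ROC curve. The toy labels and probabilities, and the function names `mean_fusion` and `roc_auc`, are invented for illustration and are not taken from the paper; the meta-classifier and alpha integration methods are not shown.

```python
import numpy as np


def mean_fusion(p_visual, p_textual):
    """Late fusion by the mean: average two per-sample posterior probabilities."""
    p_visual = np.asarray(p_visual, dtype=float)
    p_textual = np.asarray(p_textual, dtype=float)
    return (p_visual + p_textual) / 2.0


def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive sample gets the
    higher score; tied pairs count as one half."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative sample")
    diff = pos[:, None] - neg[None, :]  # all pairwise score differences
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))


# Hypothetical toy data (not from the paper): 1 = header, 0 = not header.
labels = [1, 1, 0, 0, 1, 0]
p_visual = [0.9, 0.4, 0.5, 0.2, 0.7, 0.6]     # assumed visual-classifier posteriors
p_textual = [0.6, 0.8, 0.3, 0.4, 0.35, 0.1]   # assumed textual-classifier posteriors

p_fused = mean_fusion(p_visual, p_textual)
auc_visual = roc_auc(labels, p_visual)
auc_textual = roc_auc(labels, p_textual)
auc_fused = roc_auc(labels, p_fused)
print(auc_visual, auc_textual, auc_fused)
```

On this toy data each modality misranks a different sample, so the averaged score separates the classes better than either input alone. That is only a property of the invented numbers, not a reproduction of the paper's results.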
We present our demonstration of two machine translation applications to historical documents. The first task consists in generating a new version of a historical document, written in the modern version of its original...
Access to historical documents is mostly limited to scholars. This is due to the language barrier inherent in human language and the linguistic properties of these documents. Given a historical document, modern...
Neural machine translation systems require large amounts of training data and resources. Even with this, the quality of the translations may be insufficient for some users or domains. In such cases, the output of the ...
Automatic speech recognition (ASR) based on Recurrent Neural Network Transducers (RNN-T) is gaining interest in the speech community. We investigate data selection and preparation choices aiming for improved robustnes...
We propose a tracking adaptation to recover from early tracking errors in sign language recognition by optimizing the obtained tracking paths w.r.t. the hypothesized word sequences of an automatic sign language recognition system. Hand or head tracking is usually optimized only according to a tracking criterion. As a consequence, methods which depend on accurate detection and tracking of body parts lead to recognition errors in gesture and sign language processing. Similar to speaker dependent feature adaptation methods in automatic speech recognition, we propose an automatic visual alignment of signers for vision-based sign language recognition. Furthermore, the generation of additional virtual training samples is proposed to mitigate the lack-of-data problem in sign language processing, which often leads to "one-shot" trained models. Most state-of-the-art systems are speaker dependent, and consider tracking as a preprocessing feature extraction part. Experiments on a publicly available benchmark database show that the proposed methods strongly improve the recognition accuracy of the system.
Recent advances in Audio-Visual Speech Recognition (AVSR) have led to unprecedented achievements in the field, improving the robustness of this type of system in adverse, noisy environments. In most cases, this task h...