Search Results - Inner Mongolia University Library

Fusion of Visual and Textual Features for Table Header Detection in Handwritten Text Images

International Conference on Computational Science and Computational Intelligence (CSCI)

Authors: Addisson Salazar, Jose Ramón Prieto, Enrique Vidal, Gonzalo Safont, Luis Vergara. Institute of Telecommunications and Multimedia Applications (iTEAM), Universitat Politècnica de València, Valencia, Spain; Pattern Recognition and Human Language Technology (PRHLT), Universitat Politè

This paper introduces a new procedure to improve table header detection in handwritten text images by fusing the posterior probabilities provided by two baseline classifiers. Each classifier considers a different modality, namely visual or textual features. Both baseline classifiers implement convolutional neural networks, specifically adopting the U-Net architecture. Four fusion methods are considered: the mean; linear discriminant analysis and random forest as meta-classifiers; and a recently developed method called alpha integration. The testing dataset consisted of 89 page images drawn from the Passau dataset. The improved performance provided by the fusion methods in the specific experiments is interesting considering the complexity of the challenging problem approached. In terms of area under the receiver operating characteristic curve, the best results were obtained by alpha integration. This method incorporates least mean square parameter optimization. The improvement is relevant in the context of the targeted problem.
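The abstract names the fusion methods but gives no formulas. As a rough illustration only, the sketch below fuses two posterior probabilities with the plain mean and with a generic weighted alpha-mean, which is an assumed form of alpha integration and not necessarily the formulation used in the paper. The function names, the example posteriors, the weights, and the alpha values are all hypothetical, and the least mean square optimization of the parameters mentioned in the abstract is not reproduced here.

```python
import math


def mean_fusion(posteriors):
    """Unweighted arithmetic mean of the per-classifier posteriors."""
    return sum(posteriors) / len(posteriors)


def alpha_integration(posteriors, weights, alpha, eps=1e-12):
    """Weighted alpha-mean of posteriors (assumed form, not the paper's code).

    alpha = -1 gives the weighted arithmetic mean, alpha = 1 the weighted
    geometric mean, and alpha = 3 the weighted harmonic mean.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Clip away from zero so the logarithm and negative powers are defined.
    clipped = [max(p, eps) for p in posteriors]
    if abs(alpha - 1.0) < 1e-12:
        return math.exp(sum(w * math.log(p) for w, p in zip(weights, clipped)))
    k = (1.0 - alpha) / 2.0
    return sum(w * p ** k for w, p in zip(weights, clipped)) ** (1.0 / k)


# Hypothetical posteriors of the "table header" class for one pixel,
# as they might come from a visual and a textual classifier.
scores = [0.2, 0.8]
weights = [0.5, 0.5]  # fixed by hand here; the paper optimizes its parameters

print("mean:", mean_fusion(scores))
for alpha in (-1.0, 1.0, 3.0):
    print("alpha =", alpha, "->", alpha_integration(scores, weights, alpha))
```

In this form a single parameter moves the fused score between the arithmetic, geometric, and harmonic means, which is one way to see why such a rule can be tuned to the data where the plain mean cannot.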

Two demonstrations of the machine translation applications to historical documents

arXiv, 2021

Authors: Domingo, Miguel; Casacuberta, Francisco. Pattern Recognition and Human Language Technology Research Center, Universitat Politècnica de València, Camino de Vera s/n, Valencia 46022, Spain

We present our demonstration of two machine translation applications to historical documents. The first task consists in generating a new version of a historical document, written in the modern version of its original language. The second application is limited to a document's orthography. It adapts the document's spelling to modern standards in order to achieve orthographic consistency and account for the lack of spelling conventions. We followed an interactive, adaptive framework that allows the user to introduce corrections to the system's hypothesis. The system reacts to these corrections by generating a new hypothesis that takes them into account. Once the user is satisfied with the system's hypothesis and validates it, the system adapts its model following an online learning strategy. This system is implemented following a client-server architecture. We developed a website which communicates with the neural models. All code is open-source and publicly available. © 2021, CC BY-SA.

Keywords: History

Modernizing Historical Documents: a User Study

arXiv, 2019

Authors: Domingo, Miguel; Casacuberta, Francisco. Pattern Recognition and Human Language Technology Research Center, Universitat Politècnica de València, Camino de Vera s/n, Valencia 46022, Spain

Accessibility to historical documents is mostly limited to scholars. This is due to the language barrier inherent in human language and the linguistic properties of these documents. Given a historical document, modernization aims to generate a new version of it, written in the modern version of the document’s language. Its goal is to tackle the language barrier, decreasing the comprehension difficulty and making historical documents accessible to a broader audience. In this work, we proposed a new neural machine translation approach that profits from modern documents to enrich its systems. We tested this approach with both automatic and human evaluation, and conducted a user study. Results showed that modernization is successfully reaching its goal, although it still has room for improvement. Copyright © 2019, The Authors. All rights reserved.

Keywords: History

Online learning for effort reduction in interactive neural machine translation

arXiv, 2018

Authors: Peris, Álvaro; Casacuberta, Francisco. Pattern Recognition and Human Language Technology Research Center, Universitat Politècnica de València, Camino de Vera s/n, Valencia 46022, Spain

Neural machine translation systems require large amounts of training data and resources. Even with this, the quality of the translations may be insufficient for some users or domains. In such cases, the output of the system must be revised by a human agent. This can be done in a post-editing stage or following an interactive machine translation protocol. We explore the incremental update of neural machine translation systems during the post-editing or interactive translation processes. Such modifications aim to incorporate the new knowledge, from the edited sentences, into the translation system. Updates to the model are performed on-the-fly, as sentences are corrected, via online learning techniques. In addition, we implement a novel interactive, adaptive system, able to react to single-character interactions. This system greatly reduces the human effort required for obtaining high-quality translations. In order to stress our proposals, we conduct exhaustive experiments varying the amount and type of data available for training. Results show that online learning effectively achieves the objective of reducing the human effort required during the post-editing or the interactive machine translation stages. Moreover, these adaptive systems also perform well in scenarios with scarce resources. We show that a neural machine translation system can be rapidly adapted to a specific domain, exclusively by means of online learning techniques. Copyright © 2018, The Authors. All rights reserved.

Keywords: Deep learning

Improved robustness to disfluencies in RNN-transducer based speech recognition

arXiv, 2020

Authors: Mendelev, Valentin; Raissi, Tina; Camporese, Guglielmo; Giollo, Manuel. Amazon Alexa, United States; Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Germany; Department of Mathematics "Tullio Levi-Civita", University of Padova, Italy

Automatic Speech Recognition (ASR) based on Recurrent Neural Network Transducers (RNN-T) is gaining interest in the speech community. We investigate data selection and preparation choices aiming for improved robustness of RNN-T ASR to speech disfluencies with a focus on partial words. For evaluation we use clean data, data with disfluencies and a separate dataset with speech affected by stuttering. We show that after including a small amount of data with disfluencies in the training set the recognition accuracy on the tests with disfluencies and stuttering improves. Increasing the amount of training data with disfluencies gives additional gains without degradation on the clean data. We also show that replacing partial words with a dedicated token helps to get even better accuracy on utterances with disfluencies and stutter. The evaluation of our best model shows 22.5% and 16.4% relative WER reduction on those two evaluation sets. Copyright © 2020, The Authors. All rights reserved.
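The idea of replacing partial words with a dedicated token can be illustrated with a small transcript-preprocessing sketch. The abstract does not say how partial words are annotated or what the token looks like, so the trailing-hyphen marker, the `<partial>` token name, and the function name below are all assumptions made for illustration.

```python
PARTIAL_TOKEN = "<partial>"  # hypothetical token name, not taken from the paper


def replace_partial_words(transcript, marker="-"):
    """Replace tokens that look like partial words with a dedicated token.

    Assumption: a partial word is annotated with a trailing marker, for
    example "wa-" for an interrupted "want". A lone marker is left untouched.
    """
    tokens = transcript.split()
    replaced = [
        PARTIAL_TOKEN if len(token) > 1 and token.endswith(marker) else token
        for token in tokens
    ]
    return " ".join(replaced)


# Hypothetical training transcript containing one partial word.
print(replace_partial_words("i wa- want to go"))
```

Mapping every fragment to one token keeps the output vocabulary small and spares the model from having to spell out arbitrary word fragments.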

Keywords: Speech

Visual Modeling and Feature Adaptation in Sign Language Recognition

Informationstechnische Gesellschaft-Fachtagung Sprachkommunikation, Aachen, Germany

Authors: Philippe Dreuw, Hermann Ney. Human Language Technology and Pattern Recognition, RWTH Aachen University

We propose a tracking adaptation to recover from early tracking errors in sign language recognition by optimizing the obtained tracking paths w.r.t. the hypothesized word sequences of an automatic sign language recognition system. Hand or head tracking is usually only optimized according to a tracking criterion. As a consequence, methods which depend on accurate detection and tracking of body parts lead to recognition errors in gesture and sign language processing. Similar to speaker dependent feature adaptation methods in automatic speech recognition, we propose an automatic visual alignment of signers for vision-based sign language recognition. Furthermore, the generation of additional virtual training samples is proposed to reduce the lack of data problem in sign language processing, which often leads to "one-shot" trained models. Most state-of-the-art systems are speaker dependent, and consider tracking as a preprocessing feature extraction part. Experiments on a publicly available benchmark database show that the proposed methods strongly improve the recognition accuracy of the system.

Keywords: Levee; Adaptation model

Tailored Design of Audio-Visual Speech Recognition Models using Branchformers

arXiv, 2024

Authors: Gimeno-Gómez, David; Martínez-Hinarejos, Carlos D. Pattern Recognition and Human Language Technology Research Center, Universitat Politècnica de València, Camino de Vera s/n, València 46022, Spain

Recent advances in Audio-Visual Speech Recognition (AVSR) have led to unprecedented achievements in the field, improving the robustness of this type of system in adverse, noisy environments. In most cases, this task has been addressed through the design of models composed of two independent encoders, each dedicated to a specific modality. However, while recent works have explored unified audio-visual encoders, determining the optimal cross-modal architecture remains an ongoing challenge. Furthermore, such approaches often rely on models comprising vast amounts of parameters and high computational cost training processes. In this paper, we aim to bridge this research gap by introducing a novel audio-visual framework. Our proposed method constitutes, to the best of our knowledge, the first attempt to harness the flexibility and interpretability offered by encoder architectures, such as the Branchformer, in the design of parameter-efficient AVSR systems. To be more precise, the proposed framework consists of two steps: first, estimating audio- and video-only systems, and then designing a tailored audiovisual unified encoder based on the layer-level branch scores provided by the modality-specific models. Extensive experiments on English and Spanish AVSR benchmarks covering multiple data conditions and scenarios demonstrated the effectiveness of our proposed method. Even when trained on a moderate scale of data, our models achieve competitive word error rates (WER) of approximately 2.5% for English and surpass existing approaches for Spanish, establishing a new benchmark with an average WER of around 9.1%. These results reflect how our tailored AVSR system is able to reach state-of-the-art recognition rates while significantly reducing the model complexity w.r.t. the prevalent approach in the field. Code and pre-trained models are available at https://***/david-gimeno/tailored-avsr. © 2024, CC BY-NC-ND.

Keywords: Speech enhancement

Non-stationary feature extraction for automatic speech recognition

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Zoltán Tüske, Pavel Golik, Ralf Schlüter, Friedhelm R. Drepper. Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, Aachen, Germany; Zentralinstitut für Elektronik, Forschungszentrum Jülich (KFA), Jülich, Germany

Current speech recognition systems mainly use Short-Time Fourier Transform based features such as MFCC. Dropping the short-time stationarity assumption for voiced speech, this paper introduces non-stationary signal analysis into the ASR framework. We present new acoustic features extracted by a pitch-adaptive Gammatone filter bank. The noise robustness was demonstrated on the AURORA 2 and 4 tasks, where the proposed features outperform the standard MFCC. Furthermore, successful combination experiments via ROVER indicate the differences between the new features and MFCC.

Keywords: Mel frequency cepstral coefficient; Harmonic analysis; Feature extraction; Speech; Speech recognition; Time frequency analysis

Investigation of Segmental Conditional Random Fields for large vocabulary handwriting recognition

International Conference on Document Analysis and Recognition

Authors: Mahdi Hamdani, M. Ali Basha Shaik, Patrick Doetsch, Hermann Ney. Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Germany; Rheinisch-Westfälische Technische Hochschule Aachen, Aachen, Nordrhein-Westfalen, DE; Spoken Language Processing Group, LIMSI CNRS, Paris, France

Multiple types of models are used in handwriting recognition and can be broadly categorized into generative and discriminative models. Gaussian Hidden Markov Models are used successfully in most systems. Discriminative training can be applied to these models to improve them further. Alternatively, Segmental Conditional Random Fields have the advantage of being discriminative as well as segmental. The novelty of this work is the investigation of Segmental Conditional Random Fields for handwriting recognition. In addition, Multi-Layer Perceptrons and Long Short-Term Memory Recurrent Neural Networks are compared for generating the observations in this framework. Various types of features are investigated in the segmental models for handwriting recognition. Furthermore, class-based language model features are proposed to extend this model. Visual features based on moments are extracted at the word level to make the model more robust. Experimental results on English handwriting show a relative reduction of 13.7% in terms of word error rate w.r.t. the baseline system. The proposed system also outperforms the Gaussian Hidden Markov Models trained discriminatively using the minimum phone error criterion by a relative reduction of 6.9% in terms of word error rate.
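The results above are stated as relative reductions in word error rate. As a reminder of how such figures are usually computed, here is a minimal sketch assuming the standard definition of WER as word-level edit distance divided by the reference length. The example sentences and the absolute WER values are hypothetical, since the abstract reports only relative reductions.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping one row of the table at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer, new_wer):
    """Fraction of the baseline error removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical example: one substitution and one deletion against six words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
# Hypothetical absolute WERs chosen to give a 13.7% relative reduction.
print(relative_reduction(20.0, 17.26))
```

A relative reduction says nothing about the absolute error level, so the same 13.7% could describe a drop from 20.0% to 17.26% or from 10.0% to 8.63%.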

Keywords: Hidden Markov models; Adaptation models; Handwriting recognition; Computer architecture; Speech recognition; Computational modeling; Markov processes