Multiple types of models are used in handwriting recognition and can be broadly categorized into generative and discriminative models. Gaussian Hidden Markov Models are used successfully in most systems, and discriminative training can be applied to improve them further. Alternatively, Segmental Conditional Random Fields have the advantage of being both discriminative and segmental. The novelty of this work is the investigation of Segmental Conditional Random Fields for handwriting recognition. In addition, Multi-Layer Perceptrons and Long Short-Term Memory Recurrent Neural Networks are compared for generating the observations in this framework. Various types of features are investigated in the segmental models, class-based language model features are proposed to extend the model, and moment-based visual features are extracted at the word level to make it more robust. Experimental results on English handwriting show a relative reduction of 13.7% in word error rate with respect to the baseline system. The proposed system also outperforms Gaussian Hidden Markov Models trained discriminatively with the minimum phone error criterion, with a relative word error rate reduction of 6.9%.
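For context, a segmental CRF scores a word sequence together with a segmentation of the observation sequence; the formulation below is the standard one, into which the feature types named in the abstract (neural-network observations, class-based language model features, moment-based visual features) would plug as feature functions. The notation is generic, not necessarily the paper's exact model:

```latex
p(\mathbf{w} \mid \mathbf{x}) =
  \frac{\sum_{\mathbf{e}} \exp\!\Big( \sum_{i} \boldsymbol{\lambda}^{\top}
        \mathbf{f}(w_{i-1}, w_i, e_i, \mathbf{x}) \Big)}
       {\sum_{\mathbf{w}'} \sum_{\mathbf{e}'} \exp\!\Big( \sum_{i} \boldsymbol{\lambda}^{\top}
        \mathbf{f}(w'_{i-1}, w'_i, e'_i, \mathbf{x}) \Big)}
```

Here each segment e_i spans the frames of x assigned to word w_i, the feature functions f are evaluated over whole segments rather than single frames, and the weights λ are learned discriminatively.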
ISBN: 9781728199665 (digital), 9781728199672 (print)
Automated text recognition is a fundamental problem in Document Image Analysis. Optical models are used for modeling characters, while language models are used for composing sentences. Since scripts and linguistic context differ widely, it is mandatory to specialize the models by training on task-dependent ground truth. However, creating a sufficient amount of ground truth, at least for historical handwritten scripts, requires well-qualified persons to mark and transcribe text lines, which is very time-consuming. On the other hand, in many cases unassigned transcripts are already available at page level from another process chain, or at least transcripts from a similar linguistic context are available. In this work we present two approaches that make use of such transcripts: the first creates training data by automatically assigning page-dependent transcripts to text lines, while the second uses a task-specific language model to generate highly confident training data. Both approaches are successfully applied to a very challenging historical handwritten collection.
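One plausible realization of the first approach is to match a rough recognizer output for each text line against the unassigned page-level transcript lines. The sketch below is an assumption about how such an assignment could work, not the paper's method; the function name, threshold, and greedy strategy are all illustrative:

```python
# Hypothetical sketch: pair detected text lines with page-level transcript
# lines by greedy similarity matching between a rough recognizer hypothesis
# and the candidate transcripts; only confident pairs become training data.
from difflib import SequenceMatcher

def assign_transcripts(recognized_lines, page_transcript_lines, threshold=0.7):
    """recognized_lines: list of (line_id, hypothesis_text) pairs.
    Returns (line_id, transcript) pairs whose similarity clears the threshold."""
    assignments = []
    remaining = list(page_transcript_lines)
    for line_id, hyp in recognized_lines:
        best = max(remaining,
                   key=lambda ref: SequenceMatcher(None, hyp, ref).ratio(),
                   default=None)
        if best is None:
            break
        if SequenceMatcher(None, hyp, best).ratio() >= threshold:
            assignments.append((line_id, best))   # confident match: keep
            remaining.remove(best)                # each transcript used once
    return assignments
```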
In this work, we present a model for document-grounded response generation in dialog that is decomposed into two components according to Bayes' theorem. One component is a traditional ungrounded response generatio...
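The teaser above is cut off, but the decomposition it names is the standard Bayes factorization; assuming the two components are the ungrounded response model and a document likelihood (notation mine, not the paper's):

```latex
p(r \mid c, d) \;=\; \frac{p(d \mid r, c)\, p(r \mid c)}{p(d \mid c)}
\;\propto\; p(d \mid r, c)\, p(r \mid c)
```

where r is the response, c the dialog context, and d the grounding document; p(r | c) is the traditional ungrounded response generation model the abstract mentions.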
In this paper we present the handwriting recognition systems submitted by LIMSI to the HTRtS 2014 contest. The systems for both the restricted and unrestricted tracks consisted of combinations of several optical models. We extracted handcrafted features as well as pixel values with a sliding window. We trained Deep Neural Networks (DNNs) and Bidirectional Long Short-Term Memory Recurrent Neural Networks (BLSTM-RNNs), which were plugged in as the optical model in Hidden Markov Models (HMMs). We propose a novel method to build language models that can cope with hyphenation in the text. The combination was performed on lattices generated by the different systems. We were the only team participating in both tracks and ranked second in each. The final Word Error Rates were 15.0% and 11.0% for the restricted and unrestricted tracks, respectively. We studied the impact of adding data for optical and language modeling. After the evaluation, we also used the same corpus for the language model as the winning team and obtained comparable results.
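The abstract does not spell out the hyphenation method, but one common way to let an n-gram language model cope with line-end hyphenation is to mark broken word halves as paired tokens. The sketch below is an assumption in that spirit; the marker convention and splitting rule are illustrative, not the paper's scheme:

```python
# Hypothetical sketch: emit LM tokens where a word broken by a line-end
# hyphen becomes the marked pair 'pre-|' '|fix', so the n-gram model can
# assign probability to hyphenation points as well as to whole words.
def tokenize_with_hyphenation(lines):
    tokens = []
    carry = False                              # previous line ended mid-word
    for line in lines:
        words = line.split()
        for i, w in enumerate(words):
            if carry and i == 0:
                tokens.append("|" + w)         # continuation half
                carry = False
            elif i == len(words) - 1 and w.endswith("-"):
                tokens.append(w + "|")         # broken prefix half
                carry = True
            else:
                tokens.append(w)
    return tokens

# tokenize_with_hyphenation(["the hand-", "writing model"])
#   -> ["the", "hand-|", "|writing", "model"]
```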
In recent years, Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) trained with the Connectionist Temporal Classification (CTC) objective have won many international handwriting recognition evaluations. The CTC algorithm is based on a forward-backward procedure, avoiding the need for a segmentation of the input before training. The network outputs are character labels plus a special non-character label. On the other hand, in the hybrid Neural Network / Hidden Markov Model (NN/HMM) framework, networks are trained with framewise criteria to predict state labels. In this paper, we show that CTC training is close to forward-backward training of NN/HMMs and can be extended to more standard HMM topologies. We apply this method to Multi-Layer Perceptrons (MLPs) and investigate the properties of CTC, namely the modeling of characters by single labels and the role of the special label.
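For reference, the standard CTC objective (Graves et al., 2006) marginalizes over all framewise label paths π that collapse, by removing repeats and then blanks, to the target sequence l:

```latex
p(\mathbf{l} \mid \mathbf{x}) \;=\;
  \sum_{\pi \in \mathcal{B}^{-1}(\mathbf{l})} \; \prod_{t=1}^{T} y^{t}_{\pi_t},
\qquad
\mathcal{L}_{\mathrm{CTC}} \;=\; -\ln p(\mathbf{l} \mid \mathbf{x})
```

where y^t_k is the network's softmax output for label k at frame t. The sum is computed with a forward-backward recursion over the blank-expanded label sequence, which is exactly the structural parallel to forward-backward NN/HMM training that the paper exploits.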
ISBN: 9781728192741 (digital), 9781728192758 (print)
In recent years, Handwritten Text Recognition (HTR) has captured a lot of attention among researchers in the computer vision community. Current state-of-the-art approaches for offline HTR are based on Convolutional Recurrent Neural Networks (CRNNs), which also excel at scene text recognition. Unfortunately, deep models such as CRNNs and Recurrent Neural Networks (RNNs) are likely to suffer from vanishing/exploding gradient problems when processing long text images, which are commonly found in scanned documents. Besides, they usually have millions of parameters, which require huge amounts of data and computational resources. Recently, a new class of neural network architecture, called Gated Convolutional Neural Networks (Gated-CNN), has demonstrated potential to complement CRNN methods in modeling. Therefore, in this paper, we present a new architecture for HTR, based on Gated-CNN, with fewer parameters and fewer layers, which is able to outperform the current state-of-the-art architectures for HTR. The experiments validate that the proposed model achieves statistically significant recognition results, surpassing previous HTR systems by an average of 33% over five important handwritten benchmark datasets. Moreover, the proposed model achieves satisfactory recognition rates even with little training data. Finally, its compact architecture requires fewer computational resources, so it can be applied in real-world applications with hardware limitations, such as robots and smartphones.
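To make the gating idea concrete, here is a minimal sketch of a gated convolutional block in the GLU style (Dauphin et al.); the channel count and kernel width are illustrative, not the exact Gated-CNN architecture evaluated in the paper:

```python
# Minimal gated convolution: one conv produces both content and gate,
# and the sigmoid gate decides how much of each position passes through.
import torch
import torch.nn as nn

class GatedConv1d(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size, padding=pad)

    def forward(self, x):                        # x: (batch, channels, width)
        content, gate = self.conv(x).chunk(2, dim=1)
        return content * torch.sigmoid(gate)     # gated activations

# usage: features extracted along the width of a text-line image
x = torch.randn(8, 64, 128)                      # (batch, channels, width)
y = GatedConv1d(64)(x)
print(y.shape)                                   # torch.Size([8, 64, 128])
```

Because the gate replaces recurrence along the text line, gradients flow through a fixed, shallow path per position, which is the property that sidesteps the vanishing/exploding gradient issue on long images.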
In many applications of multi-microphone multi-device processing, the synchronization among different input channels can be affected by the lack of a common clock and isolated drops of samples. In this work, we addres...
ISBN: 9781538646595 (print)
We present a new architecture and a training strategy for an adaptive mixture of experts with applications to domain-robust language modeling. The proposed model is designed to benefit from the scenario where the training data are available in diverse domains, as is the case for YouTube speech recognition. The two core components of our model are an ensemble of parallel long short-term memory (LSTM) expert layers, one for each domain, and another LSTM-based network which generates state-dependent mixture weights for combining the expert LSTM states by linear interpolation. The resulting model is a recurrent adaptive mixture model (RADMM) of domain experts. We train our model on 4.4B words from YouTube speech recognition data and report results on the YouTube speech recognition test set. Compared with a background LSTM model, we obtain up to 12% relative improvement in perplexity and an improvement in word error rate from 12.3% to 12.1% when using lattice rescoring with strong pruning.
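A minimal sketch of the two-component idea follows, assuming state-level linear interpolation with weights produced by a gating LSTM; the class name, layer sizes, and vocabulary are illustrative, not the exact RADMM configuration:

```python
# Parallel LSTM experts plus a gating LSTM whose softmax output mixes the
# expert hidden states by linear interpolation at every time step.
import torch
import torch.nn as nn

class RecurrentMixtureOfExperts(nn.Module):
    def __init__(self, vocab, dim, n_experts):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.experts = nn.ModuleList(nn.LSTM(dim, dim, batch_first=True)
                                     for _ in range(n_experts))
        self.gate = nn.LSTM(dim, dim, batch_first=True)
        self.to_weights = nn.Linear(dim, n_experts)
        self.out = nn.Linear(dim, vocab)

    def forward(self, tokens):                       # tokens: (batch, time)
        e = self.embed(tokens)
        # expert states stacked along a new last axis: (batch, time, dim, E)
        states = torch.stack([lstm(e)[0] for lstm in self.experts], dim=-1)
        g, _ = self.gate(e)                          # state-dependent gating
        w = torch.softmax(self.to_weights(g), dim=-1)    # (batch, time, E)
        mixed = (states * w.unsqueeze(2)).sum(-1)    # linear interpolation
        return self.out(mixed)                       # next-word logits

model = RecurrentMixtureOfExperts(vocab=1000, dim=32, n_experts=4)
logits = model(torch.randint(0, 1000, (2, 16)))      # shape (2, 16, 1000)
```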