Search Results - Inner Mongolia University Library

arXiv, 2020

Authors: Hanselmann, Harald; Ney, Hermann. Affiliations: Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Aachen, Germany; AppTek GmbH, Aachen, Germany

The term fine-grained visual classification (FGVC) refers to classification tasks where the classes are very similar and the classification model needs to be able to find subtle differences to make the correct prediction. State-of-the-art approaches often include a localization step designed to help a classification network by localizing the relevant parts of the input images. However, this usually requires multiple iterations or passes through a full classification network or complex training schedules. In this work we present an efficient localization module that can be fused with a classification network in an end-to-end setup. On the one hand the module is trained by the gradient flowing back from the classification network. On the other hand, two self-supervised loss functions are introduced to increase the localization accuracy. We evaluate the new model on the three benchmark datasets CUB200-2011, Stanford Cars and FGVC-Aircraft and are able to achieve competitive recognition performance. Copyright © 2020, The Authors. All rights reserved.

Keywords: Benchmarking

Leave-One-Out Phrase Model Training for Large-Scale Deployment

Workshop on Statistical Machine Translation

Authors: Joern Wuebker; Mei-Yuh Hwang; Chris Quirk. Affiliations: Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Germany; Microsoft Corporation, Redmond, WA, USA

ISBN: (print) 9781622765928

Training the phrase table by force-aligning (FA) the training data with the reference translation has been shown to improve the phrasal translation quality while significantly reducing the phrase table size on medium-sized tasks. We apply this procedure to several large-scale tasks, with the primary goal of reducing model sizes without sacrificing translation quality. To deal with the noise in the automatically crawled parallel training data, we introduce on-demand word deletions, insertions, and backoffs to achieve a successful alignment rate of over 99%. We also add heuristics to avoid any increase in OOV rates. We are able to reduce already heavily pruned baseline phrase tables by more than 50% with little to no degradation in quality and occasionally a slight improvement, without any increase in OOVs. We further introduce two global scaling factors for re-estimation of the phrase table via posterior phrase alignment probabilities and a modified absolute discounting method that can be applied to fractional counts.

Keywords: reduced mass; Model trains; Heuristics; Tables

Combining handwriting and speech recognition for transcribing historical handwritten documents

International Conference on Document Analysis and Recognition

Authors: Emilio Granell; Carlos-D. Martínez-Hinarejos. Affiliation: Pattern Recognition and Human Language Technology Research Center, Universitat Politècnica de València, Valencia, Spain

Transcription of historical documents is an interesting task for libraries in order to make their holdings available. In recent years, the use of Handwritten Text Recognition has allowed paleographers to speed up the manual transcription process, since they can work by correcting a draft transcription. An alternative is to obtain the draft transcription by dictating the contents to an Automatic Speech Recognition system. When both sources (image and speech) are available, a multimodal combination is possible, and an iterative process can be used to refine the final hypothesis. In this work, a multimodal combination based on confusion networks is presented. Results on two data sets of different difficulty levels show that the proposed technique provides similar or better draft transcriptions than a previously proposed approach, allowing for a faster transcription process.

Keywords: Iterative decoding; Acoustics; Proposals; Laplace equations; Integrated optics; Optical imaging

Improved Robustness to Disfluencies in RNN-Transducer Based Speech Recognition

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Valentin Mendelev; Tina Raissi; Guglielmo Camporese; Manuel Giollo. Affiliations: Amazon Alexa; Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Germany; University of Padova, Italy

Automatic Speech Recognition (ASR) based on Recurrent Neural Network Transducers (RNN-T) is gaining interest in the speech community. We investigate data selection and preparation choices aiming for improved robustness of RNN-T ASR to speech disfluencies with a focus on partial words. For evaluation we use clean data, data with disfluencies and a separate dataset with speech affected by stuttering. We show that after including a small amount of data with disfluencies in the training set the recognition accuracy on the tests with disfluencies and stuttering improves. Increasing the amount of training data with disfluencies gives additional gains without degradation on the clean data. We also show that replacing partial words with a dedicated token helps to get even better accuracy on utterances with disfluencies and stutter. The evaluation of our best model shows 22.5% and 16.4% relative WER reduction on those two evaluation sets.

Keywords: Training; Degradation; Transducers; Recurrent neural networks; Training data; Speech recognition; Signal processing
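
For readers unfamiliar with the metric quoted in the abstract above, the following is a minimal sketch of how a word error rate (WER) and a relative WER reduction are commonly computed. It is not the authors' evaluation code. The function names `word_error_rate` and `relative_reduction` and all example numbers are hypothetical; the abstract reports only the relative reductions, not the underlying WER values.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical utterance: one reference word is dropped by the recognizer.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # Hypothetical WERs (in percent), chosen only to illustrate the formula.
    print(relative_reduction(20.0, 15.5))
```

A relative reduction is measured against the baseline WER, so a given relative figure corresponds to a smaller absolute gain when the baseline is already low.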

Does Joint Training Really Help Cascaded Speech Translation?

arXiv, 2022

Authors: Tran, Viet Anh Khoa; Thulke, David; Gao, Yingbo; Herold, Christian; Ney, Hermann. Affiliation: Human Language Technology and Pattern Recognition Group, Computer Science Department, RWTH Aachen University, Aachen D-52056, Germany

Currently, in speech translation, the straightforward approach - cascading a recognition system with a translation system - delivers state-of-the-art results. However, fundamental challenges such as error propagation from the automatic speech recognition system still remain. To mitigate these problems, recently, people turn their attention to direct data and propose various joint training methods. In this work, we seek to answer the question of whether joint training really helps cascaded speech translation. We review recent papers on the topic and also investigate a joint training criterion by marginalizing the transcription posterior probabilities. Our findings show that a strong cascaded baseline can diminish any improvements obtained using joint training, and we suggest alternatives to joint training. We hope this work can serve as a refresher of the current speech translation landscape, and motivate research in finding more efficient and creative ways to utilize the direct data for speech translation. Copyright © 2022, The Authors. All rights reserved.

Keywords: Speech recognition

Performance analysis of Neural Networks in combination with n-gram language models

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Ilya Oparin; Martin Sundermeyer; Hermann Ney; Jean-Luc Gauvain. Affiliations: LIMSI CNRS, Spoken Language Processing Group, France; Computer Science Department, Human Language Technology and Pattern Recognition, RWTH Aachen University, Germany

Neural Network language models (NNLMs) have recently become an important complement to conventional n-gram language models (LMs) in speech-to-text systems. However, little is known about the behavior of NNLMs. The analysis presented in this paper aims to understand which types of events are better modeled by NNLMs as compared to n-gram LMs, in what cases improvements are most substantial and why this is the case. Such an analysis is important to take further benefit from NNLMs used in combination with conventional n-gram models. The analysis is carried out for different types of neural network (feed-forward and recurrent) LMs. The results showing for which type of events NNLMs provide better probability estimates are validated on two setups that are different in their size and the degree of data homogeneity.

Keywords: Artificial neural networks; History; Analytical models; Training data; Vocabulary; Interpolation

ICFHR2014 Competition on Handwritten Text Recognition on Transcriptorium Datasets (HTRtS)

International Workshop on Frontiers in Handwriting Recognition

Authors: Joan Andreu Sánchez; Verónica Romero; Alejandro H. Toselli; Enrique Vidal. Affiliation: Pattern Recognition and Human Language Technology Research Center, Universitat Politècnica de València, València, Spain

A contest on Handwritten Text Recognition organised in the context of the ICFHR 2014 conference is described. Two tracks with increased freedom on the use of training data were proposed and three research groups participated in these two tracks. The handwritten images for this contest were drawn from an English data set which is currently being considered in the Transcriptorium project. The goal of this project is to develop innovative, efficient and cost-effective solutions for the transcription of historical handwritten document images, focusing on four languages: English, Spanish, German and Dutch. For the English language, the so-called "Bentham collection" is being considered in Transcriptorium. It encompasses a large set of manuscripts written by the renowned English philosopher and reformer Jeremy Bentham (1748-1832). A small subset of this collection has been chosen for the present HTR competition. The selected subset has been written by several hands (Bentham himself and his secretaries) and entails significant variabilities and difficulties regarding the quality of text images and writing styles. Training and test data were provided in the form of carefully segmented line images, along with the corresponding transcripts. The three participants achieved very good results, with transcription word error rates ranging from 15.0% down to 8.6%.

Keywords: Training; Hidden Markov models; Histograms; Artificial neural networks; Text recognition; Training data; Adaptive optics

Towards two-dimensional sequence to sequence model in neural machine translation

arXiv, 2018

Authors: Bahar, Parnia; Brix, Christopher; Ney, Hermann. Affiliation: Human Language Technology and Pattern Recognition Group, Computer Science Department, RWTH Aachen University, Aachen D-52056, Germany

This work investigates an alternative model for neural machine translation (NMT) and proposes a novel architecture, where we employ a multi-dimensional long short-term memory (MDLSTM) for translation modeling. In the state-of-the-art methods, source and target sentences are treated as one-dimensional sequences over time, while we view translation as a two-dimensional (2D) mapping using an MDLSTM layer to define the correspondence between source and target words. We extend beyond the current sequence to sequence backbone NMT models to a 2D structure in which the source and target sentences are aligned with each other in a 2D grid. Our proposed topology shows consistent improvements over attention-based sequence to sequence model on two WMT 2017 tasks, German↔English. Copyright © 2018, The Authors. All rights reserved.

Keywords: Neural machine translation

Successfully Applying the Stabilized Lottery Ticket Hypothesis to the Transformer Architecture

arXiv, 2020

Authors: Brix, Christopher; Bahar, Parnia; Ney, Hermann. Affiliation: Human Language Technology and Pattern Recognition Group, Computer Science Department, RWTH Aachen University, Aachen D-52056, Germany

Sparse models require less memory for storage and enable faster inference by reducing the necessary number of FLOPs. This is relevant both for time-critical and on-device computations using neural networks. The stabilized lottery ticket hypothesis states that networks can be pruned after no or only a few training iterations, using a mask computed based on the unpruned converged model. On the transformer architecture and the WMT 2014 English→German and English→French tasks, we show that stabilized lottery ticket pruning performs similarly to magnitude pruning for sparsity levels of up to 85%, and propose a new combination of pruning techniques that outperforms all other techniques for even higher levels of sparsity. Furthermore, we confirm that the parameter’s initial sign and not its specific value is the primary factor for successful training, and show that magnitude pruning cannot be used to find winning lottery tickets. Copyright © 2020, The Authors. All rights reserved.

Keywords: Network architecture
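
The abstract above describes pruning with a mask that is computed from the unpruned converged model and then applied to a network after no or only a few training iterations. The sketch below illustrates that idea on a flat list of weights. It is not the authors' implementation: the helper names `magnitude_mask` and `apply_mask`, the choice of weight magnitude as the criterion for building the mask, and all example values are assumptions made for illustration.

```python
from typing import List


def magnitude_mask(weights: List[float], sparsity: float) -> List[int]:
    """Return a 0/1 mask that removes the given fraction of weights
    with the smallest absolute value (assumed pruning criterion)."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by ascending magnitude; ties keep their original order.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def apply_mask(weights: List[float], mask: List[int]) -> List[float]:
    """Zero out every weight whose mask entry is 0."""
    if len(weights) != len(mask):
        raise ValueError("weights and mask must have the same length")
    return [w * m for w, m in zip(weights, mask)]


if __name__ == "__main__":
    # Hypothetical weights of a converged, unpruned model.
    converged = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2]
    # Hypothetical weights of the same parameters early in training.
    early = [0.1, 0.2, -0.3, 0.4, -0.5, 0.6]

    mask = magnitude_mask(converged, sparsity=0.5)
    sparse_start = apply_mask(early, mask)
    print(mask)
    print(sparse_start)
```

The mask is derived from the converged weights but applied to the early weights, so training would restart from a sparse network whose surviving parameters keep their early values and signs.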

Improvements in RWTH's System for Off-Line Handwriting Recognition

International Conference on Document Analysis and Recognition

Authors: Michał Kozielski; Patrick Doetsch; Hermann Ney. Affiliation: Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Aachen, Germany

ISBN: (print) 9780769549993

In this paper we describe a novel HMM-based system for off-line handwriting recognition. We adapt successful techniques from the domains of large vocabulary speech recognition and image object recognition: moment-based image normalization, writer adaptation, discriminative feature extraction and training, and open-vocabulary recognition. We evaluate those methods and examine their cumulative effect on the recognition performance. The final system outperforms current state-of-the-art approaches on two standard evaluation corpora for English and French handwriting.

Keywords: Hidden Markov models; Training; Handwriting recognition; Feature extraction; Databases; Error analysis; Standards
