Search Results - Inner Mongolia University Library

IEEE International Conference on Acoustics, Speech and Signal Processing

Authors: Parnia Bahar, Nikita Makarov, Albert Zeyer, Ralf Schluter, Hermann Ney (Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Aachen, Germany)

ISBN: (digital) 9781509066315

ISBN: (print) 9781509066322

In this paper, we study a simple yet elegant latent variable attention model for automatic speech recognition (ASR) which enables an integration of attention sequence modeling into the direct hidden Markov model (HMM) concept. We use a sequence of hidden variables that establishes a mapping from output labels to input frames. Inspired by the direct HMM model, we assume a decomposition of the label sequence posterior into emission and transition probabilities using a zero-order assumption and incorporate both Transformer and LSTM attention models into it. The method keeps the explicit alignment as part of the stochastic model and combines the ease of end-to-end training of the attention model with an efficient and simple beam search. To study the effect of the latent model, we qualitatively analyze the alignment behavior of the different approaches. Our experiments on three ASR tasks show promising results in WER with more focused alignments in comparison to the attention models.
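The decomposition mentioned in the abstract can be sketched as follows. The notation is an assumption made here for illustration and is not given in the abstract: x_1^T denotes the T input frames, w_1^N the N output labels, and t_n the hidden frame position aligned to label n. The first line is the exact chain rule over labels and alignments; the second line follows if, under a zero-order assumption, neither factor depends on the earlier alignment positions, so the sum over alignments can be taken per label.

```latex
\begin{align}
p(w_1^N \mid x_1^T)
  &= \sum_{t_1^N} \prod_{n=1}^{N}
     \underbrace{p(t_n \mid t_1^{n-1}, w_1^{n-1}, x_1^T)}_{\text{transition}}
     \cdot
     \underbrace{p(w_n \mid t_1^{n}, w_1^{n-1}, x_1^T)}_{\text{emission}} \\
  &= \prod_{n=1}^{N} \sum_{t_n=1}^{T}
     p(t_n \mid w_1^{n-1}, x_1^T)
     \cdot
     p(w_n \mid t_n, w_1^{n-1}, x_1^T)
\end{align}
```

In this form the per-label sum over t_n resembles an attention-weighted combination, which is presumably what makes the integration with attention models natural; the exact parameterization is described in the paper, not here.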

Fine-Grained Visual Categorization with 2D-Warping

International Conference on Pattern Recognition

Authors: Harald Hanselmann, Hermann Ney (Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Aachen, Germany)

The task of fine-grained visual categorization is related to both general object recognition and specialized tasks such as face recognition. Hence, we propose to combine two methods popular for general object recognition and face recognition to build a new model-free system for fine-grained visual categorization. Specifically, we use Local Naive-Bayes Nearest Neighbor as a pre-selection method and 2D-Warping as a refinement step. For the latter, we explore different ways to use the alignments computed by a 2D-Warping algorithm for classification. We demonstrate the performance of our approach on the CUB200-2011 database and show that our approach outperforms the recognition accuracy of current state-of-the-art methods.

Keywords: Training; Accuracy; Visualization; Face recognition; Databases; Birds; Feature extraction

ELoPE: Fine-Grained Visual Classification with Efficient Localization, Pooling and Embedding

arXiv, 2019

Authors: Harald Hanselmann, Hermann Ney (Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Aachen, Germany)

The task of fine-grained visual classification (FGVC) deals with classification problems that display a small inter-class variance, such as distinguishing between different bird species or car models. State-of-the-art approaches typically tackle this problem by integrating an elaborate attention mechanism or (part-) localization method into a standard convolutional neural network (CNN). In this work, too, the aim is to enhance the performance of a backbone CNN such as ResNet by including three efficient and lightweight components specifically designed for FGVC. This is achieved by using global k-max pooling, a discriminative embedding layer trained by optimizing class means, and an efficient bounding box estimator that only needs class labels for training. The resulting model achieves new state-of-the-art recognition accuracies on the Stanford cars and FGVC-Aircraft datasets. Copyright © 2019, The Authors. All rights reserved.

Keywords: Embeddings

Optimizing Energies for Pose-Invariant Face Recognition

International Conference on Pattern Recognition

Authors: Harald Hanselmann, Hermann Ney (Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Aachen, Germany)

One of the most difficult challenges in face recognition is the large variation in pose. One approach to handle this problem is to use a 2D-Warping algorithm in a nearest-neighbor classifier. The 2D-Warping algorithm optimizes an energy function that captures the cost of matching pixels between two images while respecting the 2D dependencies defined by local pixel neighborhoods. Optimizing this energy function is an NP-complete problem and is therefore approached with algorithms that aim to approximate the optimal solution. In this paper we compare two algorithms that do this without discarding any 2D dependencies and we study the effect of the quality of the approximate solutions on the classification performance. Additionally, we propose a new algorithm that is capable of finding better solutions and obtaining better energies than the other methods. The experimental evaluation on the CMU-MultiPIE database shows that the proposed algorithm also achieves state-of-the-art recognition accuracies.

Keywords: Two dimensional displays; Face recognition; Approximation algorithms; Dynamic programming; Face; Heuristic algorithms; Message passing

ELoPE: Fine-Grained Visual Classification with Efficient Localization, Pooling and Embedding

IEEE Workshop on Applications of Computer Vision (WACV)

Authors: Harald Hanselmann, Hermann Ney (Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Aachen, Germany)

ISBN: (digital) 9781728165530

ISBN: (print) 9781728165547

The task of fine-grained visual classification (FGVC) deals with classification problems that display a small inter-class variance, such as distinguishing between different bird species or car models. State-of-the-art approaches typically tackle this problem by integrating an elaborate attention mechanism or (part-) localization method into a standard convolutional neural network (CNN). In this work, too, the aim is to enhance the performance of a backbone CNN such as ResNet by including three efficient and lightweight components specifically designed for FGVC. This is achieved by using global k-max pooling, a discriminative embedding layer trained by optimizing class means, and an efficient localization module that estimates bounding boxes using only class labels for training. The resulting model achieves state-of-the-art recognition accuracies on multiple FGVC benchmark datasets.
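As an illustration of the pooling component named in the abstract, the following is a minimal sketch of one common formulation of global k-max pooling: for each channel of a feature map, the k largest activations over all spatial positions are averaged. The abstract does not specify the exact variant the authors use, so the function name `global_k_max_pool`, the nested-list input layout (channels, rows, columns), and the averaging step are assumptions made for this example.

```python
from heapq import nlargest
from typing import List


def global_k_max_pool(feature_map: List[List[List[float]]], k: int) -> List[float]:
    """Average the k largest activations of each channel.

    feature_map is indexed as [channel][row][column] (an assumed layout).
    Returns one pooled value per channel.
    """
    pooled = []
    for channel in feature_map:
        # Flatten the spatial grid of this channel into one list of activations.
        activations = [value for row in channel for value in row]
        if not 1 <= k <= len(activations):
            raise ValueError("k must be between 1 and the number of spatial positions")
        # Keep only the k strongest responses and average them.
        top_k = nlargest(k, activations)
        pooled.append(sum(top_k) / k)
    return pooled


# Hypothetical 2-channel, 2x2 feature map used only for demonstration.
example_map = [
    [[0.0, 1.0], [2.0, 3.0]],
    [[4.0, 8.0], [6.0, 2.0]],
]
pooled_values = global_k_max_pool(example_map, k=2)
```

With k = 1 this sketch reduces to global max pooling, and with k equal to the number of spatial positions it reduces to global average pooling, so k interpolates between the two.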

Keywords: Training; Task analysis; Automobiles; Standards; Visualization; Birds; Testing

A Comprehensive Study of Residual CNNs for Acoustic Modeling in ASR

IEEE International Conference on Acoustics, Speech and Signal Processing

Authors: Vitalii Bozheniuk, Albert Zeyer, Ralf Schluter, Hermann Ney (Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Aachen, Germany)

ISBN: (digital) 9781509066315

ISBN: (print) 9781509066322

Long short-term memory (LSTM) networks are the dominant architecture for large vocabulary continuous speech recognition (LVCSR) acoustic modeling due to their good performance. However, LSTMs are hard to tune and computationally expensive. To build a system that has lower computational costs and allows online streaming applications, we explore convolutional neural networks (CNNs). To the best of our knowledge, there is no overview of CNN hyper-parameter tuning for LVCSR in the literature, so we present our results explicitly. Apart from recognition performance, we focus on the training and evaluation speed and provide a time-efficient setup for CNNs. We faced an overfitting problem in training and solved it with data augmentation, namely SpecAugment. The system achieves results competitive with the top LSTM results. We significantly increased the speed of the CNN in training and decoding, approaching the speed of the offline LSTM.

Keywords: acoustic modeling; CNN; ResNet; LACE; dense prediction

Multilingual Off-Line Handwriting Recognition in Real-World Images

IAPR International Workshop on Document Analysis Systems, DAS

Authors: Michal Kozielski, Patrick Doetsch, Mahdi Hamdani, Hermann Ney (Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Aachen, Germany)

We propose a state-of-the-art system for recognizing real-world handwritten images exposing a huge degree of noise and a high out-of-vocabulary rate. We describe methods for successful image denoising, line removal, deskewing, deslanting, and text line segmentation. We demonstrate how to use an HMM-based recognition system to obtain competitive results, and how to further improve it using LSTM neural networks in the tandem approach. The final system outperforms other approaches on a new dataset for English and French handwriting. The presented framework scales well across other standard datasets.

Keywords: Hidden Markov models; Handwriting recognition; Standards; Training; Image segmentation; Vocabulary; Algorithm design and analysis

Layer-Normalized LSTM for Hybrid-HMM and End-to-End ASR

IEEE International Conference on Acoustics, Speech and Signal Processing

Authors: Mohammad Zeineldeen, Albert Zeyer, Ralf Schluter, Hermann Ney (Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Aachen, Germany)

ISBN: (digital) 9781509066315

ISBN: (print) 9781509066322

Training deep neural networks is often challenging in terms of training stability. It often requires careful hyperparameter tuning or a pretraining scheme to converge. Layer normalization (LN) has been shown to be a crucial ingredient in training deep encoder-decoder models. We explore several variants of LN long short-term memory (LSTM) recurrent neural networks (RNNs) by applying LN to different parts of the internal recurrency of LSTMs. There is no previous work that investigates this. We carry out experiments on the Switchboard 300h task for both hybrid and end-to-end ASR models and show that LN improves the final word error rate (WER) and the stability during training, allows training even deeper models, requires less hyperparameter tuning, and works well even without pretraining. We find that applying LN to both forward and recurrent inputs globally, which we denote as the Global Joined Norm variant, gives a 10% relative improvement in WER.
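For readers unfamiliar with the metric, the following minimal sketch shows how a word error rate and a relative improvement are commonly computed. It uses the standard definition of WER as word-level edit distance divided by the reference length. The function names and the example sentences and numbers are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref_words)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical example: one substitution and one deletion in six reference words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
# Hypothetical numbers: a drop from 10.0% to 9.0% WER is a 10% relative improvement.
example_gain = relative_improvement(10.0, 9.0)
```

A relative improvement is measured against the baseline error rate, so a 10% relative gain is a much smaller change than 10 absolute percentage points.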

Analysis of Preprocessing Techniques for Latin Handwriting Recognition

International Workshop on Frontiers in Handwriting Recognition

Authors: Hendrik Pesch, Mahdi Hamdani, Jens Forster, Hermann Ney (Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Aachen, Germany)

In this work, we analyze the contribution of preprocessing steps for Latin handwriting recognition. A preprocessing pipeline based on geometric heuristics and image statistics is used. This pipeline is applied to French and English handwriting recognition in an HMM-based framework. Results show that preprocessing improves recognition performance for the two tasks. The Maximum Likelihood (ML)-trained HMM system reaches a competitive WER of 16.7% and outperforms many sophisticated systems for the French handwriting recognition task. The results for English handwriting are comparable to other ML-trained HMM recognizers. Using MLP preprocessing, a WER of 35.3% is achieved.

Keywords: Hidden Markov models; Databases; Handwriting recognition; Pipelines; Text recognition; Noise

Phrase Model Training for Statistical Machine Translation with Word Lattices of Preprocessing Alternatives

Workshop on Statistical Machine Translation

Authors: Joern Wuebker, Hermann Ney (Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Aachen, Germany)

ISBN: (print) 9781622765928

In statistical machine translation, word lattices are used to represent the ambiguities in the preprocessing of the source sentence, such as word segmentation for Chinese or morphological analysis for German. Several approaches have been proposed to define the probability of different paths through the lattice with external tools like word segmenters, or by applying indicator features. We introduce a novel lattice design, which explicitly distinguishes between different preprocessing alternatives for the source sentence. It allows us to make use of specific features for each preprocessing type and to lexicalize the choice of lattice path directly in the phrase translation model. We argue that forced alignment training can be used to learn the lattice path and the phrase translation model simultaneously. On the news-commentary portion of the German→English WMT 2011 task, we show moderate improvements of up to 0.6% BLEU over a state-of-the-art baseline system.

Keywords: machine translation crystal lattices Word Pretreatment
