检索结果-内蒙古大学图书馆

IEEE International Conference on Acoustics, Speech and Signal Processing

作者： Wei Zhou Ralf Schluter Hermann Ney Human Language Technology and Pattern Recognition RWTH Aachen University Aachen Germany

ISBN: (数字)9781509066315

ISBN: (纸本)9781509066322

In hybrid HMM based speech recognition, LSTM language models have been widely applied and achieved large improvements. The theoretical capability of modeling any unlimited context suggests that no recombination should be applied in decoding. This motivates to reconsider full summation over the HMM-state sequences instead of Viterbi approximation in decoding. We explore the potential gain from more accurate probabilities in terms of decision making and apply the full-sum decoding with a modified prefix-tree search framework. The proposed full-sum decoding is evaluated on both Switchboard and Librispeech corpora. Different models using CE and sMBR training criteria are used. Additionally, both MAP and confusion network decoding as approximated variants of general Bayes decision rule are evaluated. Consistent improvements over strong baselines are achieved in almost all cases without extra cost. We also discuss tuning effort, efficiency and some limitations of full-sum decoding.

关键词：

来源：评论

学校读者我要写书评

暂无评论

When and Why is Unsupervised Neural Machine Translation Useless?

arXiv

引用

arXiv 2020年

作者： Kim, Yunsu Graça, Miguel Ney, Hermann Human Language Technology and Pattern Recognition Group RWTH Aachen University Aachen Germany

This paper studies the practicality of the current state-of-the-art unsupervised methods in neural machine translation (NMT). In ten translation tasks with various data settings, we analyze the conditions under which the unsupervised methods fail to produce reasonable translations. We show that their performance is severely affected by linguistic dissimilarity and domain mismatch between source and target monolingual data. Such conditions are common for low-resource language pairs, where unsupervised learning works poorly. In all of our experiments, supervised and semi-supervised baselines with 50k-sentence bilingual data outperform the best unsupervised results. Our analyses pinpoint the limits of the current unsupervised NMT and also suggest immediate research directions. Copyright © 2020, The Authors. All rights reserved.

关键词： Neural machine translation

来源：评论

学校读者我要写书评

暂无评论

How Much Self-Attention Do We Needƒ Trading Attention for Feed-Forward Layers

How Much Self-Attention Do We Needƒ Trading Attention for F...

引用

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

作者： Kazuki Irie Alexander Gerstenberger Ralf Schlüter Hermann Ney Human Language Technology and Pattern Recognition Group RWTH Aachen University Aachen Germany

ISBN: (数字)9781509066315

ISBN: (纸本)9781509066322

We propose simple architectural modifications in the standard Transformer with the goal to reduce its total state size (defined as the number of self-attention layers times the sum of the key and value dimensions, times position) without loss of performance. Large scale Transformer language models have been empirically proved to give very good performance. However, scaling up results in a model that needs to store large states at evaluation time. This can increase the memory requirement dramatically for search e.g., in speech recognition (first pass decoding, lattice rescoring, or shallow fusion). In order to efficiently increase the model capacity without increasing the state size, we replace the single-layer feed-forward module in the Transformer layer by a deeper network, and decrease the total number of layers. In addition, we also evaluate the effect of key-value tying which directly divides the state size in half. On TED-LIUM 2, we obtain a model of state size 4 times smaller than the standard Transformer, with only 2% relative loss in terms of perplexity, which makes the deployment of Transformer language models more convenient.

关键词：

来源：评论

学校读者我要写书评

暂无评论

Layer-Normalized LSTM for Hybrid-Hmm and End-To-End ASR

Layer-Normalized LSTM for Hybrid-Hmm and End-To-End ASR

引用

IEEE International Conference on Acoustics, Speech and Signal Processing

作者： Mohammad Zeineldeen Albert Zeyer Ralf Schluter Hermann Ney Human Language Technology and Pattern Recognition Group RWTH Aachen University Aachen Germany

ISBN: (数字)9781509066315

ISBN: (纸本)9781509066322

Training deep neural networks is often challenging in terms of training stability. It often requires careful hyperparameter tuning or a pretraining scheme to converge. Layer normalization (LN) has shown to be a crucial ingredient in training deep encoder-decoder models. We explore various LN long short-term memory (LSTM) recurrent neural networks (RNN) variants by applying LN to different parts of the internal recurrency of LSTMs. There is no previous work that investigates this. We carry out experiments on the Switchboard 300h task for both hybrid and end-to-end ASR models and we show that LN improves the final word error rate (WER), the stability during training, allows to train even deeper models, requires less hyperparameter tuning, and works well even without pre-training. We find that applying LN to both forward and recurrent inputs globally, which we denoted by Global Joined Norm variant, gives a 10% relative improvement in WER.

关键词：

来源：评论

学校读者我要写书评

暂无评论

When and why is document-level context useful in neural machine translation? 4

When and why is document-level context useful in neural mach...

引用

4th Workshop on Discourse in Machine Translation, DiscoMT@EMNLP 2019

作者： Kim, Yunsu Tran, Duc Thanh Ney, Hermann Human Language Technology and Pattern Recognition Group RWTH Aachen University Aachen Germany

ISBN: (纸本)9781950737741

Document-level context has received lots of attention for compensating neural machine translation (NMT) of isolated sentences. However, recent advances in document-level NMT focus on sophisticated integration of the context, explaining its improvement with only a few selected examples or targeted test sets. We extensively quantify the causes of improvements by a document-level model in general test sets, clarifying the limit of the usefulness of document-level context in NMT. We show that most of the improvements are not interpretable as utilizing the context. We also show that a minimal encoding is sufficient for the context modeling and very long context is not helpful for NMT. © 2019 Association for Computational Linguistics.

关键词： Neural machine translation

来源：评论

学校读者我要写书评

暂无评论

A Comprehensive Study of Residual CNNS for Acoustic Modeling in ASR

A Comprehensive Study of Residual CNNS for Acoustic Modeling...

引用

IEEE International Conference on Acoustics, Speech and Signal Processing

作者： Vitalii Bozheniuk Albert Zeyer Ralf Schluter Hermann Ney Human Language Technology and Pattern Recognition Group RWTH Aachen University Aachen Germany

ISBN: (数字)9781509066315

ISBN: (纸本)9781509066322

Long short-term memory (LSTM) networks are the dominant architecture for large vocabulary continuous speech recognition (LVCSR) acoustic modeling due to their good performance. However, LSTMs are hard to tune and computationally expensive. To build a system with lower computational costs and which allows online streaming applications, we explore convolutional neural networks (CNN). To the best of our knowledge there is no overview on CNN hyper-parameter tuning for LVCSR in the literature, so we present our results explicitly. Apart from recognition performance, we focus on the training and evaluation speed and provide a time-efficient setup for CNNs. We faced an overfitting problem in training and solved it with data augmentation, namely SpecAugment. The system achieves results competitive with the top LSTM results. We significantly increased the speed of CNN in training and decoding approaching the speed of the offline LSTM.

关键词： acoustic modeling CNN ResNet LACE dense prediction

来源：评论

学校读者我要写书评

暂无评论

ELoPE: Fine-Grained Visual Classification with Efficient Localization, Pooling and Embedding

ELoPE: Fine-Grained Visual Classification with Efficient Loc...

引用

IEEE Workshop on Applications of Computer Vision (WACV)

作者： Harald Hanselmann Hermann Ney Human Language Technology and Pattern Recognition Group RWTH Aachen University Aachen Germany

ISBN: (数字)9781728165530

ISBN: (纸本)9781728165547

The task of fine-grained visual classification (FGVC) deals with classification problems that display a small inter-class variance such as distinguishing between different bird species or car models. State-of-the-art approaches typically tackle this problem by integrating an elaborate attention mechanism or (part-) localization method into a standard convolutional neural network (CNN). Also in this work the aim is to enhance the performance of a backbone CNN such as ResNet by including three efficient and lightweight components specifically designed for FGVC. This is achieved by using global k-max pooling, a discriminative embedding layer trained by optimizing class means and an efficient localization module that estimates bounding boxes using only class labels for training. The resulting model achieves state-of-the-art recognition accuracies on multiple FGVC benchmark datasets.

关键词： Training Task analysis Automobiles Standards Visualization Birds Testing

来源：评论

学校读者我要写书评

暂无评论

Exploring A Zero-Order Direct Hmm Based on Latent Attention for Automatic Speech recognition

Exploring A Zero-Order Direct Hmm Based on Latent Attention ...

引用

IEEE International Conference on Acoustics, Speech and Signal Processing

作者： Parnia Bahar Nikita Makarov Albert Zeyer Ralf Schluter Hermann Ney Human Language Technology and Pattern Recognition Group RWTH Aachen University Aachen Germany

ISBN: (数字)9781509066315

ISBN: (纸本)9781509066322

In this paper, we study a simple yet elegant latent variable attention model for automatic speech recognition (ASR) which enables an integration of attention sequence modeling into the direct hidden Markov model (HMM) concept. We use a sequence of hidden variables that establishes a mapping from output labels to input frames. Inspired by the direct HMM model, we assume a decomposition of the label sequence posterior into emission and transition probabilities using zero-order assumption and incorporate both Transformer and LSTM attention models into it. The method keeps the explicit alignment as part of the stochastic model and combines the ease of the end-to-end training of the attention model as well as an efficient and simple beam search. To study the effect of the latent model, we qualitatively analyze the alignment behavior of the different approaches. Our experiments on three ASR tasks show promising results in WER with more focused alignments in comparison to the attention models.

关键词：

来源：评论

学校读者我要写书评

暂无评论

A neural, interactive-predictive system for multimodal sequence to sequence tasks 57

A neural, interactive-predictive system for multimodal seque...

引用

57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, ACL 2019

作者： Peris, Álvaro Casacuberta, Francisco Pattern Recognition and Human Language Technology Research Center Universitat Politcnica de Valncia Valncia Spain

ISBN: (纸本)9781950737499

We present a demonstration of a neural interactive-predictive system for tackling mul- timodal sequence to sequence tasks. The sys- tem generates text predictions to different se- quence to sequence tasks: machine translation, image and video captioning. These predictions are revised by a human agent, who introduces corrections in the form of characters. The sys- tem reacts to each correction, providing alter- native hypotheses, compelling with the feed- back provided by the user. The final objective is to reduce the human effort required during this correction process. This system is implemented following a client-server architecture. For accessing the system, we developed a website, which com- municates with the neural model, hosted in a local server. From this website, the dif- ferent tasks can be tackled following the interactive-predictive framework. We open- source all the code developed for building this system. The demonstration in hosted in http://***/ interactive-seq2seq. © 2019 All rights reserved.

关键词： Websites

来源：评论

学校读者我要写书评

暂无评论

Automatic semantic segmentation of structural elements related to the spinal cord in the lumbar region by using convolutional neural networks 25

Automatic semantic segmentation of structural elements relat...

引用

25th International Conference on pattern recognition, ICPR 2020

作者： Saenz-Gamboa, Jhon Jairo de la Iglesia-Vayá, Maria Gómez, Jon A. Biomedical Imaging Joint Unit Foundation for the Promotion of Health and Biomedical Research FISABIO-CIPF València Spain Pattern Recognition and Human Language Technology research center Universitat Politècnica de València València Spain

ISBN: (纸本)9781728188089

This work addresses the problem of automatically segmenting the MR images corresponding to the lumbar spine. The purpose is to detect and delimit the different structural elements like vertebrae, intervertebral discs, nerves, blood vessels, etc. This task is known as semantic segmentation. The approach proposed in this work is based on convolutional neural networks whose output is a mask where each pixel from the input image is classified into one of the possible classes. Classes were defined by radiologists and correspond to structural elements and tissues. The proposed network architectures are variants of the U-Net. Several complementary blocks were used to define the variants: spatial attention models, deep supervision and multi-kernels at input, this last block type is based on the idea of inception. Those architectures which got the best results are described in this paper, and their results are discussed. Two of the proposed architectures outperform the standard U-Net used as baseline. © 2020 IEEE

关键词： Semantic Segmentation

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：