Prominently used in support vector machines and logistic regression, kernel functions (kernels) can implicitly map data points into high-dimensional spaces and make it easier to learn complex decision boundaries. In ...
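As a minimal illustration of this implicit mapping (our own sketch, not code from the paper), the RBF kernel below evaluates inner products in a high-dimensional feature space without ever constructing that space:

```python
# Sketch of the kernel trick: k(x, y) = exp(-gamma * ||x - y||^2) equals an
# inner product in an implicit (infinite-dimensional) feature space.
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Pairwise RBF kernel values between the rows of X and Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * sq_dists)

# Two concentric rings: not linearly separable in the input space, but
# separable by a linear boundary in the implicit kernel feature space.
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, size=200)
radii = np.where(np.arange(200) < 100, 1.0, 3.0)
X = np.c_[radii * np.cos(angles), radii * np.sin(angles)]
K = rbf_kernel(X, X, gamma=0.5)  # 200x200 Gram matrix, as used by a kernel SVM
print(K.shape, K[0, 0])          # diagonal entries are exp(0) = 1
```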
Neural Network language models (NNLMs) have recently become an important complement to conventional n-gram language models (LMs) in speech-to-text systems. However, little is known about the behavior of NNLMs. The analysis presented in this paper aims to understand which types of events are better modeled by NNLMs as compared to n-gram LMs, in which cases improvements are most substantial, and why this is the case. Such an analysis is important for deriving further benefit from NNLMs used in combination with conventional n-gram models. The analysis is carried out for different types of neural network (feed-forward and recurrent) LMs. The results, showing for which types of events NNLMs provide better probability estimates, are validated on two setups that differ in size and degree of data homogeneity.
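The combination with n-gram models that the analysis targets is typically a linear interpolation of the two probability estimates. A hypothetical sketch; the weight `lam` and both per-word probabilities are assumed inputs, not values from the paper:

```python
# Linear interpolation of an NNLM and an n-gram LM (illustrative sketch).
# p_nnlm and p_ngram are the two models' probabilities for the same word
# given the same history; lam is tuned on held-out data.
def interpolate(p_nnlm, p_ngram, lam=0.5):
    """p(w|h) = lam * p_NN(w|h) + (1 - lam) * p_ngram(w|h)."""
    return lam * p_nnlm + (1.0 - lam) * p_ngram

# Example: a word the NNLM models well but the n-gram LM backs off on.
print(interpolate(p_nnlm=0.12, p_ngram=0.01, lam=0.6))  # 0.076
```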
End-to-end models reach state-of-the-art performance for speech recognition, but global soft attention is not monotonic, which might lead to convergence problems, instability, and bad generalisation, and cannot be used ...
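For context, a minimal sketch of the global soft attention criticised here (our own illustration, not the paper's model): the softmax spans all encoder frames, so nothing enforces a monotonic left-to-right alignment.

```python
# Global soft (dot-product) attention: weights over *every* encoder frame.
import numpy as np

def global_soft_attention(query, encoder_states):
    """Return the attention-weighted context vector for one decoder step."""
    scores = encoder_states @ query           # (T,) similarity per frame
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over all T frames
    return weights @ encoder_states           # (D,) context vector

T, D = 50, 8
enc = np.random.default_rng(1).normal(size=(T, D))
context = global_soft_attention(enc[0], enc)
print(context.shape)  # (8,)
```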
This work studies knowledge distillation (KD) and addresses its constraints for recurrent neural network transducer (RNN-T) models. In hard distillation, a teacher model transcribes large amounts of unlabelled speech to train a student model. Soft distillation is another popular KD method that distills the output logits of the teacher model. Due to the nature of RNN-T alignments, applying soft distillation between RNN-T architectures having different posterior distributions is challenging. In addition, bad teachers with a high word error rate (WER) reduce the efficacy of KD. We investigate how to effectively distill knowledge from variable-quality ASR teachers, which, to the best of our knowledge, has not been studied before. We show that a sequence-level KD method, full-sum distillation, outperforms other distillation methods for RNN-T models, especially for bad teachers. We also propose a variant of full-sum distillation that distills the sequence discriminative knowledge of the teacher, leading to further improvement in WER. We conduct experiments on the public SpeechStew and LibriSpeech datasets, and on in-house production data.
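A hedged sketch of the soft distillation step described above, assuming per-position teacher and student logits (shapes, names, and the temperature are illustrative, not the paper's implementation). For RNN-T, mismatched alignments make exactly this per-position matching difficult, which is what motivates the full-sum variant:

```python
# Soft (logit-level) distillation: train the student toward the teacher's
# output distribution with a KL-divergence term.
import torch
import torch.nn.functional as F

def soft_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) over the output vocabulary at each position."""
    t_logprobs = F.log_softmax(teacher_logits / temperature, dim=-1)
    s_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(s_logprobs, t_logprobs, log_target=True,
                    reduction="batchmean")

student = torch.randn(4, 100, 512)  # (batch, alignment positions, vocab)
teacher = torch.randn(4, 100, 512)
print(soft_distillation_loss(student, teacher).item())
```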
ASR can be improved by multi-task learning (MTL) with domain enhancing or domain adversarial training, two opposing objectives that aim to increase or decrease domain variance, targeting domain-aware or domain-agnostic ASR, respectively. In this work, we study how to best apply these two opposite objectives with speaker labels to improve conformer-based ASR. We also propose a novel adaptive gradient reversal layer for stable and effective adversarial training without tuning effort. Detailed analysis and experimental verification are conducted to show the optimal positions in the ASR neural network (NN) to apply speaker enhancing and adversarial training. We also explore their combination for further improvement, achieving the same performance as i-vectors plus adversarial training. Our best speaker-based MTL achieves 7% relative improvement on the Switchboard Hub5'00 set. We also investigate the effect of such speaker-based MTL with respect to cleaner datasets and weaker ASR NNs.
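The adversarial branch relies on a gradient reversal layer (GRL). Below is a sketch of the standard, non-adaptive GRL with a fixed scale `lam`; the paper's adaptive variant adjusts this scaling automatically, which the sketch does not attempt to reproduce:

```python
# Standard gradient reversal layer: identity in the forward pass, negated
# (and scaled) gradient in the backward pass, so the shared encoder learns
# features that *maximise* the speaker classifier's loss.
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)  # identity

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the gradient sign -> speaker-agnostic features upstream.
        return -ctx.lam * grad_output, None

features = torch.randn(8, 256, requires_grad=True)
reversed_feats = GradReverse.apply(features, 1.0)
reversed_feats.sum().backward()
print(features.grad[0, :3])  # gradients are negated
```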
In this paper, we extend the concept of moment-based normalization of images from digit recognition to the recognition of handwritten text. Image moments provide robust estimates for text characteristics such as the size and position of words within an image. For handwriting recognition, the normalization procedure is applied to image slices independently. Additionally, a novel moment-based algorithm for line-thickness normalization is presented. The proposed normalization methods are evaluated on the RIMES database of French handwriting and the IAM database of English handwriting. For RIMES, we achieve an improvement from 16.7% word error rate to 13.4%, and for IAM from 46.6% to 37.3%.
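As a rough illustration of the underlying idea (our own code, not the paper's), first-order image moments estimate position and second-order central moments estimate size; a normalization step can then translate and scale the image to remove both:

```python
# Moment-based position/size estimates for a grayscale text image.
import numpy as np

def moments(img):
    """Centroid and spread (std. dev.) of pixel mass in x and y."""
    y, x = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    m00 = img.sum()
    cx, cy = (x * img).sum() / m00, (y * img).sum() / m00  # centroid
    var_x = ((x - cx) ** 2 * img).sum() / m00              # horizontal spread
    var_y = ((y - cy) ** 2 * img).sum() / m00              # vertical spread
    return cx, cy, np.sqrt(var_x), np.sqrt(var_y)

img = np.zeros((32, 32))
img[10:20, 8:24] = 1.0    # a synthetic "word" blob
print(moments(img))        # estimates a normalizer would cancel out
```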
Recently, RNN-Transducers have achieved remarkable results on various automatic speech recognition tasks. However, lattice-free sequence discriminative training methods, which obtain superior performance in hybrid models, are rarely investigated in RNN-Transducers. In this work, we propose three lattice-free training objectives, namely lattice-free maximum mutual information, lattice-free segment-level minimum Bayes risk, and lattice-free minimum Bayes risk, which are applied to the final posterior output of a phoneme-based neural transducer with limited context dependency. Compared to criteria using N-best lists, lattice-free methods eliminate the decoding step for hypothesis generation during training, which leads to more efficient training. Experimental results show that lattice-free methods gain up to 6.5% relative improvement in word error rate compared to a sequence-level cross-entropy trained model. Compared to N-best-list based minimum Bayes risk objectives, lattice-free methods gain a 40% to 70% relative training time speedup with a small degradation in performance.
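For reference, the maximum mutual information criterion among the three objectives, written over the transducer posterior (standard definition; notation ours, not necessarily the paper's exact formulation). "Lattice-free" means the denominator sum over all competing sequences is computed in closed form rather than over decoded hypotheses:

```latex
% MMI criterion over the transducer posterior; x: acoustic input,
% y*: reference label sequence; the denominator sums over all sequences y.
\mathcal{L}_{\mathrm{MMI}} = -\log
  \frac{p_{\theta}(y^{*} \mid x)}
       {\sum_{y} p_{\theta}(y \mid x)}
```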
Internal language model (ILM) subtraction has been widely applied to improve the performance of the RNN-Transducer with external language model (LM) fusion for speech recognition. In this work, we show that sequence discriminative training has a strong correlation with ILM subtraction from both theoretical and empirical points of view. Theoretically, we derive that the global optimum of maximum mutual information (MMI) training shares a similar formula with ILM subtraction. Empirically, we show that ILM subtraction and sequence discriminative training achieve similar effects across a wide range of experiments on LibriSpeech, including both MMI and minimum Bayes risk (MBR) criteria, as well as neural transducers and LMs of both full and limited context. The benefit of ILM subtraction also becomes much smaller after sequence discriminative training. We also provide an in-depth study showing that sequence discriminative training has a minimal effect on the commonly used zero-encoder ILM estimation, but a joint effect on both the encoder and the prediction + joint network for posterior probability reshaping, including both ILM and blank suppression.
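The ILM subtraction the work analyses is the widely used decoding rule below, with scales tuned on held-out data (notation ours):

```latex
% Log-linear model combination during decoding: external-LM fusion with
% ILM subtraction; lambda_1 and lambda_2 are the tuned LM and ILM scales.
\hat{y} = \operatorname*{argmax}_{y} \;
  \log p_{\mathrm{RNN\text{-}T}}(y \mid x)
  + \lambda_{1} \log p_{\mathrm{LM}}(y)
  - \lambda_{2} \log p_{\mathrm{ILM}}(y)
```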
The peaky behavior of CTC models is well known experimentally. However, an understanding of why peaky behavior occurs, and whether it is a good property, is missing. We provide a formal analysis of the peaky behav...
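For context, the standard CTC objective under discussion (notation ours): the model marginalises over all frame-level alignments that collapse to the label sequence, and "peaky" behavior refers to the per-frame posteriors concentrating on blank with isolated label spikes.

```latex
% CTC marginalises over all alignments a of length T that the collapsing
% function B (removing blanks and label repetitions) maps to the labels y.
p(y \mid x) = \sum_{a \in \mathcal{B}^{-1}(y)} \prod_{t=1}^{T} p_{t}(a_{t} \mid x)
```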
In the last decade, the statistical approach has found widespread use in machine translation, both for written and spoken language, and has had a major impact on translation accuracy. The goal of this paper is to cover the state of the art in statistical machine translation. We revisit the underlying principles of the statistical approach to machine translation and summarize the progress that has been made over the last decade.
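The central underlying principle referred to is the Bayes decision rule of the source-channel approach: choose the target sentence that maximises the product of language model and translation model probabilities (notation ours):

```latex
% Bayes decision rule (source-channel model): p(e) is the target-language
% model, p(f|e) the translation model for the source sentence f.
\hat{e} = \operatorname*{argmax}_{e} \; p(e)\, p(f \mid e)
```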