As the vocabulary size of modern word-based language models becomes ever larger, many sampling-based training criteria are proposed and investigated. The essence of these sampling methods is that the softmax-related t...
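As an illustration of the kind of criterion such works study, below is a minimal sketch of a sampled-softmax loss with a uniform proposal distribution, written in PyTorch. The function name `sampled_softmax_loss`, the tensor shapes, and the default sample count are illustrative assumptions, not the exact setup of the paper.

```python
# Hypothetical sketch: sampled softmax with a uniform proposal distribution.
import torch

def sampled_softmax_loss(hidden, target, out_emb, out_bias, num_samples=1024):
    """hidden: (B, D) context vectors, target: (B,) word indices,
    out_emb: (V, D) output embeddings, out_bias: (V,) output biases."""
    V = out_emb.size(0)
    # Draw negative samples from a uniform proposal q(w) = 1/V over the vocabulary.
    neg = torch.randint(0, V, (num_samples,), device=hidden.device)
    # Score only the target word and the sampled words, never the full vocabulary.
    pos_logit = (hidden * out_emb[target]).sum(-1) + out_bias[target]   # (B,)
    neg_logit = hidden @ out_emb[neg].t() + out_bias[neg]               # (B, S)
    # Proposal correction: subtract log q(w) = -log V; a constant here, kept for clarity.
    log_corr = torch.log(torch.tensor(float(V), device=hidden.device))
    logits = torch.cat([(pos_logit + log_corr).unsqueeze(1), neg_logit + log_corr], dim=1)
    # Cross-entropy over the reduced set {target} ∪ samples; index 0 is the target.
    return torch.nn.functional.cross_entropy(logits, torch.zeros_like(target))
```

In practice, implementations also remove accidental hits of the target among the drawn samples and use more informative proposals (e.g. unigram or log-uniform) rather than a uniform one.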

To mitigate the problem of having to traverse over the full vocabulary in the softmax normalization of a neural language model, sampling-based training criteria are proposed and investigated in the context of large vo...
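A second family of sampling-based criteria replaces the multi-class softmax problem with a binary "data vs. noise" classification. The hedged sketch below shows noise contrastive estimation (NCE) with a uniform noise distribution; all names, shapes, and defaults are illustrative and not taken from the paper.

```python
# Hypothetical sketch: noise contrastive estimation (NCE) against uniform noise.
import torch
import torch.nn.functional as F

def nce_loss(hidden, target, out_emb, out_bias, num_noise=100):
    """hidden: (B, D), target: (B,), out_emb: (V, D), out_bias: (V,)."""
    V = out_emb.size(0)
    k = float(num_noise)
    # Uniform noise distribution p_n(w) = 1/V; log(k * p_n) is a constant shift.
    log_k_pn = torch.log(torch.tensor(k / V, device=hidden.device))
    noise = torch.randint(0, V, (num_noise,), device=hidden.device)
    # Unnormalized model scores; the partition function is treated as 1 (self-normalization).
    pos_score = (hidden * out_emb[target]).sum(-1) + out_bias[target]   # (B,)
    neg_score = hidden @ out_emb[noise].t() + out_bias[noise]           # (B, K)
    # Binary data-vs-noise classification with logits s(w) - log(k * p_n(w)).
    loss_pos = F.binary_cross_entropy_with_logits(
        pos_score - log_k_pn, torch.ones_like(pos_score), reduction="sum")
    loss_neg = F.binary_cross_entropy_with_logits(
        neg_score - log_k_pn, torch.zeros_like(neg_score), reduction="sum")
    return (loss_pos + loss_neg) / hidden.size(0)   # average loss per position
```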
Sparse models require less memory for storage and enable faster inference by reducing the number of FLOPs required. This is relevant both for time-critical and on-device computations using neural networks. The stab...
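To make the sparsity argument concrete, here is a hedged sketch of one-shot magnitude pruning of a weight matrix in PyTorch; `magnitude_prune_` and the default sparsity of 0.9 are illustrative choices, not the method evaluated in the paper.

```python
# Hypothetical sketch: one-shot magnitude pruning of a weight tensor to a given sparsity.
import torch

def magnitude_prune_(weight: torch.Tensor, sparsity: float = 0.9) -> torch.Tensor:
    """Zero out the smallest-magnitude entries in place; returns the binary mask."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return torch.ones_like(weight)
    # Threshold = k-th smallest absolute value; everything at or below it is removed.
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = (weight.abs() > threshold).to(weight.dtype)
    weight.mul_(mask)
    return mask
```

In sparse training, such a mask is typically recomputed or reapplied after each optimizer step so that pruned weights stay at zero.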
Recently, medical image compression has become essential for effectively handling large amounts of medical data for storage and communication purposes. Vector quantization (VQ) is a popular image compression technique, and ...
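As a generic reference point for VQ-based compression, the sketch below trains a small k-means codebook over flattened image blocks and encodes each block as a codeword index; it is a textbook VQ baseline under assumed block and codebook sizes, not the scheme proposed in the paper.

```python
# Hypothetical sketch: vector quantization of image blocks with a k-means codebook.
import numpy as np

def train_codebook(blocks, codebook_size=256, iters=20, seed=0):
    """blocks: (N, d) flattened image patches; returns a (codebook_size, d) codebook."""
    rng = np.random.default_rng(seed)
    codebook = blocks[rng.choice(len(blocks), codebook_size, replace=False)].astype(np.float64)
    for _ in range(iters):
        # Assign every block to its nearest codeword (Euclidean distance).
        dists = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        idx = dists.argmin(axis=1)
        # Move each codeword to the centroid of its assigned blocks.
        for j in range(codebook_size):
            members = blocks[idx == j]
            if len(members) > 0:
                codebook[j] = members.mean(axis=0)
    return codebook

def encode(blocks, codebook):
    """Replace each block by the index of its nearest codeword."""
    dists = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)
```

Compression comes from storing only the per-block indices (log2 of the codebook size in bits per block) plus the codebook itself.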
To join the advantages of classical and end-to-end approaches for speech recognition, we present a simple, novel and competitive approach for phoneme-based neural transducer modeling. Different alignment label topolog...
In hybrid HMM based speech recognition, LSTM language models have been widely applied and achieved large improvements. The theoretical capability of modeling any unlimited context suggests that no recombination should...
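The recombination question can be made concrete with a small toy: beam-search hypotheses are merged whenever they agree on a truncated context of the last n words, even though the LSTM state depends on the full history. The sketch below is purely illustrative and not the recombination scheme investigated in the paper.

```python
# Hypothetical sketch: hypothesis recombination keyed on a truncated LM context
# of the last n words, instead of the full (unbounded) LSTM history.
def recombine(hypotheses, context_size=4):
    """hypotheses: list of (word_sequence, score); keep the best hypothesis per truncated context."""
    best = {}
    for words, score in hypotheses:
        key = tuple(words[-context_size:])   # recombination key: last n words only
        if key not in best or score > best[key][1]:
            best[key] = (words, score)
    return list(best.values())
```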
Sequence-to-sequence models with an implicit alignment mechanism (e.g. attention) are closing the performance gap towards traditional hybrid hidden Markov models (HMM) for the task of automatic speech recognition. One...
The RNN transducer is a promising end-to-end model candidate. We compare the original training criterion with the full marginalization over all alignments, to the commonly used maximum approximation, which simplifies,...
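The difference between the two criteria can be shown on a toy alignment lattice: the same forward recursion is run once with log-sum-exp (full marginalization over alignments) and once with max (maximum approximation). The sketch below assumes per-sequence log posteriors with blank at index 0 and is written for clarity, not efficiency; it is not the paper's implementation.

```python
# Hypothetical sketch: full-sum vs. maximum-approximation objective on a toy
# RNN-T alignment lattice, using log-space dynamic programming.
import torch

def transducer_neg_log_like(log_probs, labels, use_max_approx=False):
    """
    log_probs: (T, U+1, V) log posteriors for one sequence, blank index 0.
    labels:    (U,) target label indices.
    Returns -log p(labels | input) with full marginalization over alignments,
    or -max_alignment log p(alignment) if use_max_approx is True.
    """
    T, U1, _ = log_probs.shape
    U = U1 - 1
    combine = torch.max if use_max_approx else torch.logaddexp
    neg_inf = torch.tensor(float("-inf"))
    alpha = torch.full((T, U1), float("-inf"))
    alpha[0, 0] = 0.0
    for t in range(T):
        for u in range(U1):
            if t == 0 and u == 0:
                continue
            # Arrive via blank (advance in time) or via emitting label u-1 (advance in labels).
            from_blank = alpha[t - 1, u] + log_probs[t - 1, u, 0] if t > 0 else neg_inf
            from_label = alpha[t, u - 1] + log_probs[t, u - 1, labels[u - 1]] if u > 0 else neg_inf
            alpha[t, u] = combine(from_blank, from_label)
    # Final blank transition out of the last lattice node.
    return -(alpha[T - 1, U] + log_probs[T - 1, U, 0])
```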
We present a complete training pipeline to build a state-of-the-art hybrid HMM-based ASR system on the 2nd release of the TED-LIUM corpus. Data augmentation using SpecAugment is successfully applied to improve perform...
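For reference, here is a hedged sketch of SpecAugment-style time and frequency masking on a log-mel spectrogram, roughly in the spirit of Park et al. 2019; the parameter values are illustrative, time warping is omitted, and this is not the exact configuration used in the pipeline.

```python
# Hypothetical sketch: SpecAugment-style masking of a log-mel spectrogram.
import numpy as np

def spec_augment(spec, num_time_masks=2, num_freq_masks=2, max_t=40, max_f=8, seed=None):
    """spec: (T, F) log-mel features; returns a masked copy."""
    rng = np.random.default_rng(seed)
    out = spec.copy()
    T, F = out.shape
    for _ in range(num_freq_masks):
        f = rng.integers(0, max_f + 1)
        f0 = rng.integers(0, max(1, F - f + 1))
        out[:, f0:f0 + f] = 0.0   # mask a band of consecutive frequency channels
    for _ in range(num_time_masks):
        t = rng.integers(0, max_t + 1)
        t0 = rng.integers(0, max(1, T - t + 1))
        out[t0:t0 + t, :] = 0.0   # mask a span of consecutive time frames
    return out
```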
Following the rationale of end-to-end modeling, CTC, RNN-T or encoder-decoder-attention models for automatic speech recognition (ASR) use graphemes or grapheme-based subword units based on e.g. byte-pair encoding (BPE...
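To make the BPE notion concrete, here is a hedged sketch of learning merge operations from a word-frequency dictionary, in the spirit of Sennrich et al. 2016; it is a minimal illustration, not the subword tooling used in the paper.

```python
# Hypothetical sketch: learning byte-pair encoding (BPE) merges from word frequencies.
from collections import Counter

def learn_bpe(word_freqs, num_merges=10):
    """word_freqs: {word: count}; returns the list of learned merge operations."""
    # Represent each word as a sequence of symbols, initially its characters.
    vocab = {tuple(w): c for w, c in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count all adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Merge the most frequent pair everywhere it occurs.
        new_vocab = {}
        for symbols, count in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] = count
        vocab = new_vocab
    return merges
```

For example, `learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=10)` would first merge one of the most frequent character pairs, e.g. ('e', 's').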