Search results - Inner Mongolia University Library

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Amr El-Desoky Mousa, Ralf Schlüter, Hermann Ney; Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, Aachen, Germany

A major challenge for Arabic Large Vocabulary Continuous Speech Recognition (LVCSR) is the rich morphology of Arabic, which leads to high out-of-vocabulary (OOV) rates and poor language model (LM) probabilities. In such cases, the use of morphemes rather than full words is considered a better choice for LMs. Thereby, higher lexical coverage and lower LM perplexities are achieved. On the other hand, an effective way to increase the robustness of LMs is to incorporate features of words into LMs. In this paper, we investigate the use of features derived for morphemes rather than words. Thus, we combine the benefits of both morpheme-level and feature-rich modeling. We compare the performance of stream-based, class-based and Factored LMs (FLMs) estimated over sequences of morphemes and their features for performing Arabic LVCSR. A relative reduction of 3.9% in Word Error Rate (WER) is achieved compared to a word-based system.

Keywords: Computational modeling Mathematical model Lattices humans Speech recognition USA Councils Interpolation

Comparison and combination of different CRBE based MLP features for LVCSR

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Zoltán Tüske, Ralf Schlüter, Hermann Ney; Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, Aachen, Germany

Multi Layer Perceptron (MLP) features extracted from different types of critical band energies (CRBE), derived from the MFCC, GT, and PLP pipelines, are compared on French broadcast news and conversational speech recognition tasks. Though the MLP structure is kept fixed, ROVER combination of different CRBE based systems leads to 4% relative improvement. Furthermore, aiming at the combination of state-of-the-art features based on various signal analysis methods into one single stream, a posterior feature space based combination technique is proposed. The speaker normalized features originating from different CRBEs are merged after additional MLP training by the Dempster-Shafer rule. The performance of these posterior features unifying the different CRBE based features is superior to the best single CRBE based posterior features by 6% relative. Further results reveal that the concatenated cepstral and unified posterior features perform nearly as well as the ROVER combination of the different CRBE based systems.

Keywords: Feature extraction Mel frequency cepstral coefficient Speech Training Hidden Markov models

Phase difference of filter-stable part-tones as acoustic feature

IEEE/SP Workshop on Statistical Signal Processing (SSP)

Authors: Zoltán Tüske, Friedhelm R. Drepper, Ralf Schlüter; Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, Aachen, Germany

A part-tone decomposition of voiced sections of speech is introduced, which is adapted with high accuracy to the frequency of the glottal oscillator of the speaker. The iterative replacement of the center filter frequency contours (chosen locally as linear chirps) of the non-stationary bandpass filters converges extremely fast and leads to the extraction of filter-stable part-tones with uncorrupted phases. In contrast to phases of frequency decomposition with a priori defined, constant filter frequencies, the phase differences of filter-stable part-tones promise to become a useful supplement to the amplitude based acoustic features used for conventional automatic speech recognition. The derived phase features are tested in vowel classification experiments based on the phonetically rich TIMIT database.

Keywords: Speech Harmonic analysis Equations Time frequency analysis Mel frequency cepstral coefficient Speech processing

Silence is golden: Modeling non-speech events in WFST-based dynamic network decoders

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: David Rybach, Ralf Schlüter, Hermann Ney; Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, Aachen, Germany

Models for silence are a fundamental part of continuous speech recognition systems. Depending on application requirements, audio data segmentation, and availability of detailed training data annotations, it may be necessary or beneficial to differentiate between other non-speech events, for example breath and background noise. The integration of multiple non-speech models in a WFST-based dynamic network decoder is not straightforward, because these models do not perfectly fit in the transducer framework. This paper describes several options for the transducer construction with multiple non-speech models, shows their considerably different characteristics in memory and runtime efficiency, and analyzes the impact on the recognition performance.

Keywords: Hidden Markov models Transducers Decoding Context Speech Noise Speech recognition

Moment-Based Image Normalization for Handwritten Text Recognition

International Workshop on Frontiers in Handwriting Recognition

Authors: Michal Kozielski, Jens Forster, Hermann Ney; Human Language Technology and Pattern Recognition Group, Chair of Computer Science 6, RWTH Aachen University, Aachen, Germany

In this paper, we extend the concept of moment-based normalization of images from digit recognition to the recognition of handwritten text. Image moments provide robust estimates for text characteristics such as size and position of words within an image. For handwriting recognition the normalization procedure is applied to image slices independently. Additionally, a novel moment-based algorithm for line-thickness normalization is presented. The proposed normalization methods are evaluated on the RIMES database of French handwriting and the IAM database of English handwriting. For RIMES we achieve an improvement from 16.7% word error rate to 13.4% and for IAM from 46.6% to 37.3%.
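
The absolute word error rates quoted in this abstract can be converted to relative reductions, which makes the two databases easier to compare. The following is a minimal sketch of that arithmetic only; the function name `relative_reduction` is a hypothetical helper introduced here and is not taken from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction in percent when going from baseline to improved."""
    return 100.0 * (baseline - improved) / baseline


# Word error rates (in percent) as quoted in the abstract above.
rimes = relative_reduction(16.7, 13.4)
iam = relative_reduction(46.6, 37.3)

print(f"RIMES: {rimes:.1f}% relative WER reduction")
print(f"IAM:   {iam:.1f}% relative WER reduction")
```

Under this calculation both databases show a relative reduction of roughly one fifth, even though the absolute error rates differ considerably.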

Keywords: Hidden Markov models Databases Shape Image segmentation Vectors Handwriting recognition Error analysis

The RWTH Aachen Machine Translation System for WMT 2012

Workshop on Statistical Machine Translation

Authors: Matthias Huck, Stephan Peitz, Markus Freitag, Malte Nuhn, Hermann Ney; Human Language Technology and Pattern Recognition Group, Computer Science Department, RWTH Aachen University, D-52056 Aachen, Germany

ISBN: (print) 9781622765928

This paper describes the statistical machine translation (SMT) systems developed at RWTH Aachen University for the translation task of the NAACL 2012 Seventh Workshop on Statistical Machine Translation (WMT 2012). We participated in the evaluation campaign for the French-English and German-English language pairs in both translation directions. Both hierarchical and phrase-based SMT systems are applied. A number of different techniques are evaluated, including an insertion model, different lexical smoothing methods, a discriminative reordering extension for the hierarchical system, reverse translation, and system combination. By application of these methods we achieve considerable improvements over the respective baseline systems.

Keywords: machine translation system machine translation Surface mount technology Hierarchical application methods Translations Translation Translation Process smoothing methods Hierarchical systems

Performance analysis of Neural Networks in combination with n-gram language models

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Ilya Oparin, Martin Sundermeyer, Hermann Ney, Jean-Luc Gauvain; LIMSI CNRS, Spoken Language Processing Group, France; Computer Science Department, Human Language Technology and Pattern Recognition, RWTH Aachen University, Germany

Neural Network language models (NNLMs) have recently become an important complement to conventional n-gram language models (LMs) in speech-to-text systems. However, little is known about the behavior of NNLMs. The analysis presented in this paper aims to understand which types of events are better modeled by NNLMs as compared to n-gram LMs, in what cases improvements are most substantial and why this is the case. Such an analysis is important to take further benefit from NNLMs used in combination with conventional n-gram models. The analysis is carried out for different types of neural network (feed-forward and recurrent) LMs. The results showing for which type of events NNLMs provide better probability estimates are validated on two setups that are different in their size and the degree of data homogeneity.
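
The abstract does not say how the NNLM and n-gram estimates are combined. A common choice, and the one assumed here, is per-word linear interpolation with a fixed weight. The sketch below illustrates that assumption on made-up toy probabilities; the names `interpolate` and `perplexity`, the weight `lam`, and all numeric values are hypothetical and not taken from the paper.

```python
import math
from typing import Sequence


def interpolate(p_nnlm: float, p_ngram: float, lam: float) -> float:
    """Linear interpolation of two probability estimates for the same word."""
    return lam * p_nnlm + (1.0 - lam) * p_ngram


def perplexity(probs: Sequence[float]) -> float:
    """Perplexity of a word sequence given the per-word probabilities."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))


# Toy per-word probabilities for a three-word sequence (illustrative only).
p_nnlm = [0.20, 0.05, 0.10]
p_ngram = [0.10, 0.10, 0.02]
lam = 0.5  # assumed interpolation weight; in practice tuned on held-out data

p_mix = [interpolate(a, b, lam) for a, b in zip(p_nnlm, p_ngram)]

print("NNLM perplexity:        ", perplexity(p_nnlm))
print("n-gram perplexity:      ", perplexity(p_ngram))
print("Interpolated perplexity:", perplexity(p_mix))
```

Comparing the per-word values in `p_nnlm`, `p_ngram`, and `p_mix` is one simple way to see which events benefit from each model, in the spirit of the analysis the abstract describes.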

Keywords: Artificial neural networks History Analytical models Training data Vocabulary Interpolation

Skin-color based videos categorization

International Journal of Computer Science Issues, 2012, vol. 9, no. 1 1-3, pp. 473-477

Authors: Khan, Rehanullah; Maqsood, Asad; Khan, Zeeshan; Ishaq, Muhammad; Arif, Arsalan; Sarhad University of Science and Information Technology, Peshawar, Pakistan; RWTH Aachen, Human Language Technology and Pattern Recognition, Peshawar, Pakistan; UET Mardan, Peshawar, Pakistan

On dedicated websites, people can upload videos and share them with the rest of the world. Currently these videos are categorized manually with the help of the user community. In this paper, we propose a combination of color spaces with the Bayesian network approach for robust detection of skin color followed by an automated video categorization. Experimental results show that our method can achieve satisfactory performance for categorizing videos based on skin color. © 2012 International Journal of Computer Science Issues.

Keywords: Bayesian networks

Mobile music modeling, analysis and recognition

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Pavel Golik, Boulos Harb, Ananya Misra, Michael Riley, Alex Rudnick, Eugene Weinstein; Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, Aachen, Germany; Google Inc., New York, NY, USA; School of Informatics and Computing, Indiana University, Bloomington, IN, USA

We present an analysis of music modeling and recognition techniques in the context of mobile music matching, substantially improving on the techniques presented in [1]. We accomplish this by adapting the features specifically to this task, and by introducing new modeling techniques that enable using a corpus of noisy and channel-distorted data to improve mobile music recognition quality. We report the results of an extensive empirical investigation of the system's robustness under realistic channel effects and distortions. We show an improvement of recognition accuracy by explicit duration modeling of music phonemes and by integrating the expected noise environment into the training process. Finally, we propose the use of frame-to-phoneme alignment for high-level structure analysis of polyphonic music.

Keywords: Training Accuracy Hidden Markov models Music Speech recognition USA Councils

The RWTH 2010 Quaero ASR evaluation system for English, French, and German

36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011

Authors: Sundermeyer, M.; Nussbaum-Thom, M.; Wiesler, S.; Plahl, C.; El-Desoky Mousa, A.; Hahn, S.; Nolden, D.; Schlüter, R.; Ney, H.; Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, Germany

ISBN: (print) 9781457705397

Keywords: Speech recognition
