Search Results - Inner Mongolia University Library

Authors: Valente, Fabio; Doss, Mathew Magimai; Plahl, Christian; Ravuri, Suman; Wang, Wen. Affiliations: IDIAP Research Institute, CH-1920 Martigny, Switzerland; Human Language Technology and Pattern Recognition, RWTH Aachen University, Germany; International Computer Science Institute, 1947 Center Street, Berkeley, CA 94704, United States; Speech Technology and Research Laboratory, SRI International, Menlo Park, CA, United States

MLP-based front-ends have been shown to be strongly complementary to conventional spectral features. As part of the DARPA GALE program, different MLP features were developed for Mandarin ASR. In this paper, all the proposed front-ends are compared in a systematic manner, and we extensively investigate the scalability of these features in terms of the amount of training data (from 100 hours to 1600 hours) and system complexity (maximum likelihood training, SAT, lattice-level combination, and discriminative training). Results on 5 hours of evaluation data from the GALE project reveal that the MLP features consistently produce relative improvements of 15% to 23% at the different steps of a multipass system when compared to conventional short-term spectral features such as MFCC and PLP. The largest improvement is obtained using a hierarchical MLP approach. © 2010 ISCA.

Keywords: Maximum likelihood

Face-sketch learning with human sketch-drawing order enforcement

Science China Information Sciences, 2020, Vol. 63, No. 11, pp. 298-311

Authors: Liang CHANG; Lihua JIN; Lifen WENG; Wentao CHAO; Xuguang WANG; Xiaoming DENG; Qiulei DONG. Affiliations: School of Artificial Intelligence, Beijing Normal University; Department of Design Art, Xiamen University of Technology; Department of Automation, North China Electric Power University; Beijing Key Laboratory of Human Computer Interactions, Institute of Software, Chinese Academy of Sciences; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences; Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences

Dear editor, Face-sketch synthesis, which automatically generates a sketch from a given face photo [1], remains an open research problem in computer vision [2–4]. Recently, several deep neural network (DNN) methods for...

Keywords: face sketch synthesis; deep neural network; order enforcement; image synthesis; generative adversarial network

Non-stationary feature extraction for automatic speech recognition

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Zoltán Tüske; Pavel Golik; Ralf Schlüter; Friedhelm R. Drepper. Affiliations: Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, Aachen, Germany; Zentralinstitut für Elektronik, Forschungszentrum Jülich (KFA), Jülich, Germany

Current speech recognition systems mainly use features based on the Short-Time Fourier Transform, such as MFCC. Dropping the short-time stationarity assumption for voiced speech, this paper introduces non-stationary signal analysis into the ASR framework. We present new acoustic features extracted by a pitch-adaptive Gammatone filter bank. Noise robustness was demonstrated on the AURORA 2 and 4 tasks, where the proposed features outperform standard MFCC. Furthermore, successful combination experiments via ROVER indicate that the new features carry information that differs from MFCC.

Keywords: Mel frequency cepstral coefficient; Harmonic analysis; Feature extraction; Speech; Speech recognition; Time frequency analysis

On the choice of modeling unit for sequence-to-sequence speech recognition

arXiv, 2019

Authors: Irie, Kazuki; Prabhavalkar, Rohit; Kannan, Anjuli; Bruguier, Antoine; Rybach, David; Nguyen, Patrick. Affiliations: Human Language Technology and Pattern Recognition Group, Computer Science Department, RWTH Aachen University, Aachen D-52056, Germany; Google, Mountain View, CA 94043, United States

Controllable Factuality in Document-Grounded Dialog Systems Using a Noisy Channel Model

arXiv, 2022

Authors: Daheim, Nico; Thulke, David; Dugast, Christian; Ney, Hermann. Affiliations: Ubiquitous Knowledge Processing Lab, Department of Computer Science, Technical University of Darmstadt, Germany; Human Language Technology and Pattern Recognition, RWTH Aachen University, Germany; AppTek GmbH, Germany

In this work, we present a model for document-grounded response generation in dialog that is decomposed into two components according to Bayes' theorem. One component is a traditional ungrounded response generation model, and the other models the reconstruction of the grounding document from the dialog context and the generated response. We propose different approximate decoding schemes and evaluate our approach on multiple open-domain and task-oriented document-grounded dialog datasets. Our experiments show that, according to automatic factuality metrics, the model is more factual than the baseline model. Furthermore, we outline how introducing scaling factors between the components allows for controlling the tradeoff between factuality and fluency in the model output. Finally, we compare our approach to a recently proposed method for controlling factuality in grounded dialog, CTRL (Rashkin et al., 2021), and show that both approaches can be combined to achieve additional improvements. © 2022, CC BY.
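The Bayes decomposition in this abstract can be illustrated with a small reranking sketch. By Bayes' theorem, p(u | d, c) ∝ p(u | c) · p(d | c, u), where c is the dialog context, d the grounding document and u a candidate response. The sketch assumes both log-probabilities have already been computed for each candidate and that a single hypothetical scaling factor `lam` weights the reconstruction term; the function names, the single-factor log-linear form and the toy numbers are illustrative assumptions, not the authors' decoding schemes.

```python
import math
from typing import Dict, List, Tuple


def noisy_channel_score(log_p_response: float, log_p_document: float, lam: float) -> float:
    """Log-linear score: log p(u | c) + lam * log p(d | c, u).

    lam = 1.0 is the unscaled Bayes decomposition. Larger values weight
    document reconstruction (factuality) more; smaller values weight the
    ungrounded response model (fluency) more. `lam` is a hypothetical knob.
    """
    return log_p_response + lam * log_p_document


def rerank(candidates: Dict[str, Tuple[float, float]], lam: float) -> List[str]:
    """Sort candidate responses by noisy-channel score, best first.

    Each value is (log p(u | c), log p(d | c, u)) for that candidate u.
    """
    return sorted(
        candidates,
        key=lambda u: noisy_channel_score(*candidates[u], lam),
        reverse=True,
    )


# Toy log-probabilities, illustrative numbers only.
candidates = {
    "fluent but generic": (math.log(0.30), math.log(0.01)),
    "grounded in the document": (math.log(0.10), math.log(0.20)),
}

print(rerank(candidates, lam=0.1)[0])  # fluent but generic
print(rerank(candidates, lam=1.0)[0])  # grounded in the document
```

With a small `lam`, the ungrounded response model dominates and the more likely but generic response wins; with `lam = 1.0`, the response that better explains the document ranks first, which mirrors the factuality/fluency tradeoff the abstract describes.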

Skin-color based videos categorization

International Journal of Computer Science Issues, 2012, Vol. 9, No. 1 1-3, pp. 473-477

Authors: Khan, Rehanullah; Maqsood, Asad; Khan, Zeeshan; Ishaq, Muhammad; Arif, Arsalan. Affiliations: Sarhad University of Science and Information Technology, Peshawar, Pakistan; RWTH Aachen, Human Language Technology and Pattern Recognition, Peshawar, Pakistan; UET Mardan, Peshawar, Pakistan

On dedicated websites, people can upload videos and share them with the rest of the world. Currently, these videos are categorized manually with the help of the user community. In this paper, we propose combining several color spaces with a Bayesian network approach for robust skin-color detection, followed by automated video categorization. Experimental results show that our method achieves satisfactory performance in categorizing videos based on skin color. © 2012 International Journal of Computer Science Issues.

Keywords: Bayesian networks

RADMM: Recurrent Adaptive Mixture Model with Applications to Domain Robust Language Modeling

IEEE International Conference on Acoustics, Speech and Signal Processing

Authors: Kazuki Irie; Shankar Kumar; Michael Nirschl; Hank Liao. Affiliations: Human Language Technology and Pattern Recognition Group, Computer Science Department, RWTH Aachen University, D-52056 Aachen, Germany; Google Inc., New York, NY 10011, USA

ISBN: (print) 9781538646595

We present a new architecture and training strategy for an adaptive mixture of experts, with applications to domain-robust language modeling. The proposed model is designed to benefit from scenarios where training data are available in diverse domains, as is the case for YouTube speech recognition. The two core components of our model are an ensemble of parallel long short-term memory (LSTM) expert layers, one for each domain, and another LSTM-based network that generates state-dependent mixture weights for combining the expert LSTM states by linear interpolation. The resulting model is a recurrent adaptive mixture model (RADMM) of domain experts. We train our model on 4.4B words from YouTube speech recognition data and report results on the YouTube speech recognition test set. Compared with a background LSTM model, we obtain up to 12% relative improvement in perplexity and a reduction in word error rate from 12.3% to 12.1% when using lattice rescoring with strong pruning.

Keywords: language modeling; neural networks; speech recognition; mixture of experts; domain adaptation; modelling languages; Speech recognition; Neural network; Professional personnel; YouTube; Mixture models; training policy; new buildings
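The RADMM combination step, in which state-dependent mixture weights linearly interpolate the expert LSTM states, can be sketched for a single time step. This minimal sketch assumes the expert hidden states and the mixer network's per-expert scores are already computed (the LSTMs themselves are omitted) and turns the scores into weights with a softmax; the softmax gating, the function names and the toy numbers are illustrative assumptions rather than the paper's exact formulation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the mixer network's scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_expert_states(expert_states: Sequence[Sequence[float]],
                      gate_logits: Sequence[float]) -> List[float]:
    """Linearly interpolate K expert hidden states with state-dependent weights.

    expert_states[k] is the hidden state of domain expert k at the current
    time step; gate_logits[k] is the (hypothetical) mixer score for expert k
    at that same step, so the weights can change from step to step.
    """
    if len(expert_states) != len(gate_logits):
        raise ValueError("need one gate logit per expert")
    weights = softmax(gate_logits)
    dim = len(expert_states[0])
    return [sum(w * h[i] for w, h in zip(weights, expert_states)) for i in range(dim)]


# Two hypothetical domain experts with 3-dimensional hidden states.
experts = [[1.0, 0.0, 2.0], [3.0, 4.0, 0.0]]
print(mix_expert_states(experts, [0.0, 0.0]))  # [2.0, 2.0, 1.0]
```

Equal scores give an even interpolation of the two experts; a much larger score for one expert makes the combined state approach that expert's state, which is how the weights can adapt to the domain at each step.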

Warp that smile on your face: Optimal and smooth deformations for face recognition

International Conference on Automatic Face and Gesture Recognition

Authors: Tobias Gass; Leonid Pishchulin; Philippe Dreuw; Hermann Ney. Affiliations: Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Germany; Computer Vision Laboratory, ETH Zurich, Switzerland; Computer Vision and Multimodal Computing, MPI Informatics, Saarbruecken, Germany

In this work, we present novel warping algorithms for full 2D pixel-grid deformations for face recognition. Because face appearance varies widely, face recognition is considered a very difficult task, especially if only a single reference image per face, for example a mug shot, is available. Model-based approaches with additional training data are usually used to cope with the various types of variation that occur in facial imaging. Image warping, in contrast, yields a distance measure that is invariant to several types of variation, which allows precise recognition even with very few reference observations. Because optimal 2D warping is computationally complex, past pseudo-2D warping approaches relied on strong approximations of the original problem and were mainly successful on data with low variability or on rectified images. We propose a novel 2D warping method that is globally optimal and makes no prior assumptions about data variability beyond two-dimensional smoothness constraints, which both avoid local mirroring and gaps and significantly speed up the optimization. Furthermore, we show that occlusion handling is imperative for obtaining smooth warpings in a variety of domains. We evaluate our algorithm on several well-known databases, such as the AR-Face and CMU-PIE databases, and provide a detailed comparison with existing warping approaches. We show that by using simple relative 2D constraints, strong local features, and a kernel that is robust to occlusions, our computationally complex approaches outperform state-of-the-art results for recognizing faces under varying expressions, occlusions, and poses. Most interestingly, we achieve higher accuracy using fewer training instances per class than methods that learn a model of the 3D shape.

Keywords: Pixel; Face; Face recognition; Optimization; Databases; Approximation methods; Hidden Markov models

Complementary fusion of multi-features and multi-modalities in sentiment analysis

arXiv, 2019

Authors: Chen, Feiyang; Luo, Ziqian; Xu, Yanyan; Ke, Dengfeng. Affiliations: Department of Computer Science and Technology, Beijing Forestry University; School of Computer Science, Language Technologies Institute, Carnegie Mellon University; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences

Sentiment analysis, mostly based on text, has developed rapidly over the last decade and has attracted widespread attention in both academia and industry. However, real-world information usually comes from multiple modalities, such as audio and text. Therefore, in this paper we consider the task of multimodal sentiment analysis based on audio and text, and propose a novel fusion strategy, comprising both multi-feature fusion and multi-modality fusion, to improve the accuracy of audio-text sentiment analysis. We call it the DFF-ATMF (Deep Feature Fusion - Audio and Text Modality Fusion) model; it consists of two parallel branches, an audio-modality branch and a text-modality branch. Its core mechanisms are the fusion of multiple feature vectors and multiple-modality attention. Experiments on the CMU-MOSI dataset and the recently released CMU-MOSEI dataset, both collected from YouTube for sentiment analysis, show that our DFF-ATMF model achieves very competitive results. Furthermore, using heatmaps of the attention weight distributions, we show that the deep features learned by DFF-ATMF are complementary to each other and robust. Surprisingly, DFF-ATMF also achieves new state-of-the-art results on the IEMOCAP dataset, indicating that the proposed fusion strategy also generalizes well to multimodal emotion recognition. Copyright © 2019, The Authors. All rights reserved.

Keywords: Sentiment analysis

Mobile music modeling, analysis and recognition

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Pavel Golik; Boulos Harb; Ananya Misra; Michael Riley; Alex Rudnick; Eugene Weinstein. Affiliations: Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, Aachen, Germany; Google Inc., New York, NY, USA; School of Informatics and Computing, Indiana University, Bloomington, IN, USA

We present an analysis of music modeling and recognition techniques in the context of mobile music matching, substantially improving on the techniques presented in [1]. We accomplish this by adapting the features specifically to this task and by introducing new modeling techniques that make it possible to use a corpus of noisy and channel-distorted data to improve mobile music recognition quality. We report the results of an extensive empirical investigation of the system's robustness under realistic channel effects and distortions. We show that recognition accuracy improves with explicit duration modeling of music phonemes and with integrating the expected noise environment into the training process. Finally, we propose the use of frame-to-phoneme alignment for high-level structure analysis of polyphonic music.

Keywords: Training; Accuracy; Hidden Markov models; Music; Speech recognition; USA Councils
