We address the memory problem of maximum entropy language models (MELMs) with very large feature sets. Randomized techniques are employed to remove all large, exact data structures from MELM implementations. To avoid the dictionary structure that maps each feature to its corresponding weight, the feature hashing trick [1], [2] can be used. We also replace the explicit storage of features with a Bloom filter. We show with extensive experiments that false-positive errors of Bloom filters and random hash collisions do not degrade model performance. Both perplexity and WER improvements are demonstrated by building MELMs that would otherwise be prohibitively large to estimate or store.
Authors:
Thomas Fang Zheng, Center for Speech and Language Technologies
Division of Technical Innovation and Development, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University
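To make the two randomized structures in the abstract above concrete, here is a minimal Python sketch that stores MaxEnt feature weights in a hashed array (no feature-to-index dictionary) and records observed features in a Bloom filter. The table sizes, the MD5-based hashing, and the n-gram feature string are illustrative assumptions, not the paper's implementation.

import hashlib

class HashedWeights:
    """Weight table indexed by a hash of the feature string (hashing trick)."""
    def __init__(self, num_buckets=1 << 20):
        self.num_buckets = num_buckets
        self.weights = [0.0] * num_buckets   # collisions are allowed by design

    def _bucket(self, feature):
        h = hashlib.md5(feature.encode("utf-8")).digest()
        return int.from_bytes(h[:8], "little") % self.num_buckets

    def add(self, feature, delta):
        self.weights[self._bucket(feature)] += delta

    def get(self, feature):
        return self.weights[self._bucket(feature)]

class BloomFilter:
    """Approximate set membership for the feature vocabulary."""
    def __init__(self, num_bits=1 << 22, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        for i in range(self.num_hashes):
            h = hashlib.md5(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(h[:8], "little") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

# Usage: a feature is only scored if the Bloom filter says it was seen in training.
weights = HashedWeights()
seen = BloomFilter()
feature = "w-2=we|w-1=address|w=the"   # hypothetical n-gram feature encoding
seen.add(feature)
weights.add(feature, 0.37)
if feature in seen:
    print(weights.get(feature))

A Bloom-filter false positive simply scores an unseen feature with whatever its hash bucket holds; the abstract reports that this kind of noise does not degrade model performance.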
Speaker clustering is an important step in multi-speaker detection tasks, and its performance directly affects speaker detection performance. It is observed that the shorter the average length of the single-speaker speech segments produced by segmentation, the worse the subsequent speaker recognition performance. A reasonable route to better multi-speaker detection is therefore to enlarge the average length of these single-speaker segments, which is equivalent to clustering as many segments from the same true speaker into one cluster as possible; in other words, the average class purity of each cluster should be as high as possible. Accordingly, a speaker-clustering algorithm based on a class purity criterion is proposed, in which a Reference Speaker Model (RSM) scheme is adopted to calculate the distance between speech segments, and maximal class purity, or equivalently minimal within-class dispersion, is taken as the clustering criterion. Experiments on the NIST SRE 2006 database showed that, compared with the conventional Hierarchical Agglomerative Clustering (HAC) algorithm, for speech segments with average lengths of 2, 5 and 8 seconds, the proposed algorithm increased the valid class speech length by 2.7%, 3.8% and 4.6%, respectively, and in turn increased the target speaker detection recall rate by 7.6%, 6.2% and 5.1%, respectively.
Authors:
Thomas Fang Zheng, Center for Speech and Language Technologies
Division of Technical Innovation and Development, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University
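The clustering criterion described above can be approximated with a short sketch: segments are represented by score vectors (for example, against a set of reference speaker models), and merges are chosen greedily to minimize the growth of within-class dispersion. The stopping threshold, the toy 2-D vectors, and the exact RSM scoring are assumptions for illustration only.

import numpy as np

def within_dispersion(cluster):
    """Sum of squared deviations from the cluster mean."""
    x = np.asarray(cluster)
    return float(((x - x.mean(axis=0)) ** 2).sum())

def cluster_segments(segment_vectors, max_dispersion_increase=1.0):
    clusters = [[v] for v in segment_vectors]   # start with one segment per cluster
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                merged = clusters[i] + clusters[j]
                # increase in within-class dispersion caused by this merge
                cost = (within_dispersion(merged)
                        - within_dispersion(clusters[i])
                        - within_dispersion(clusters[j]))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        if cost > max_dispersion_increase:   # stop before class purity degrades too much
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Toy usage: 2-D "RSM score" vectors for six short segments.
rng = np.random.default_rng(0)
segs = list(rng.normal(size=(6, 2)))
print(len(cluster_segments(segs)))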
Performance degradation over time is a generally acknowledged phenomenon in speaker recognition, and it is widely assumed that speaker models should be updated from time to time to maintain their representativeness. However, such updating is costly, user-unfriendly and sometimes unrealistic, which hinders the technology in practical applications. From a pattern recognition point of view, the time-varying issue in speaker recognition calls for features that are speaker-specific and as stable as possible across sessions separated in time. Therefore, after searching for and analyzing the most stable parts of the feature space, a Discrimination-emphasized Mel-frequency-warping method is proposed. In implementation, each frequency band is assigned a discrimination score that takes into account both speaker and session information, and Mel-frequency warping is performed during feature extraction to emphasize bands with higher scores. Experimental results on the time-varying voiceprint database show that this method not only improves speaker recognition performance, with an EER reduction of 19.1%, but also alleviates the performance degradation caused by time varying, with a reduction of 8.9%.
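A rough sketch of the warping idea, under the assumption that it can be approximated by reallocating filter-bank resolution according to per-band discrimination scores: filter centre frequencies are placed uniformly along the cumulative score curve, so higher-scoring bands receive more filters. The band layout and score values below are made up for illustration.

import numpy as np

def warped_centres(band_edges_hz, band_scores, num_filters):
    """Place filter centres uniformly in a score-warped frequency domain."""
    edges = np.asarray(band_edges_hz, dtype=float)           # len = num_bands + 1
    scores = np.asarray(band_scores, dtype=float)
    cum = np.concatenate([[0.0], np.cumsum(scores)])          # warping curve samples
    cum /= cum[-1]                                             # normalise to [0, 1]
    targets = np.linspace(0.0, 1.0, num_filters + 2)[1:-1]    # uniform in warped axis
    # invert the piecewise-linear warping to get centre frequencies in Hz
    return np.interp(targets, cum, edges)

# Toy usage: 8 bands spanning 0-4 kHz; bands 4-6 judged most speaker-discriminative.
edges = np.linspace(0, 4000, 9)
scores = [1, 1, 1, 3, 4, 3, 1, 1]
print(warped_centres(edges, scores, num_filters=12).round(1))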
The length of the test speech greatly influences the performance of GMM-UBM based text-independent speaker recognition systems; for example, when the length of valid speech is as short as 1-5 seconds, performance decreases significantly, because the GMM-UBM approach is a statistical one whose foundation is sufficient data. Considering that the use of text information can help speaker recognition, a multi-model method is proposed to improve short-utterance speaker recognition (SUSR) in Chinese. We build several phoneme-class models for each speaker to represent different parts of the characteristic space, and fuse the scores of the test data against these models so as to increase the matching degree between the training models and the test utterance. Experimental results showed that the proposed method achieved a relative EER reduction of about 26% compared with the traditional GMM-UBM method.
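A hedged sketch of the multi-model scoring idea: one GMM per (speaker, phoneme class), test frames routed by class, and the per-class average log-likelihoods fused into one speaker score. The class labels, model sizes, synthetic 13-dimensional frames and the equal-weight fusion rule are assumptions, not the paper's exact recipe.

import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_models(frames_by_class, n_components=4):
    """frames_by_class: dict phoneme_class -> (N, dim) array of training frames."""
    return {c: GaussianMixture(n_components=n_components, covariance_type="diag",
                               random_state=0).fit(x)
            for c, x in frames_by_class.items()}

def fused_score(models, test_frames_by_class):
    """Average the per-class mean log-likelihoods into a single speaker score."""
    scores = []
    for c, x in test_frames_by_class.items():
        if c in models and len(x):
            scores.append(models[c].score(x))   # mean log-likelihood per frame
    return float(np.mean(scores)) if scores else float("-inf")

# Toy usage with synthetic 13-dim "MFCC" frames split into two phoneme classes.
rng = np.random.default_rng(1)
train = {"vowel": rng.normal(size=(200, 13)), "nasal": rng.normal(1, 1, (150, 13))}
test = {"vowel": rng.normal(size=(30, 13)), "nasal": rng.normal(1, 1, (20, 13))}
spk = train_speaker_models(train)
print(fused_score(spk, test))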
In this paper, we present an in-car Chinese noise corpus that can be used to simulate complicated car environments for robust speech recognition research and experiments. The corpus was collected in mainland China in 2009 and 2010 and covers a diversity of car conditions, including different car speeds, open or closed windows, weather conditions and surrounding environments. In particular, rumble strips are also taken into account because of the characteristic noise generated as a car passes over them. To make the corpus easier to use, we performed acoustic analyses on the noise data, focusing mainly on stationarity and on the energy distribution in the frequency domain. We also performed ASR experiments using selected noise data from the corpus, adding the noise to clean speech to simulate the in-car environment. This is the first in-car Chinese noise corpus of its kind, providing abundant and diversified samples for in-car noisy speech recognition tasks.
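As an example of the simulation step mentioned above (adding corpus noise to clean speech), the following sketch mixes a noise signal into clean speech at a chosen SNR; the synthetic signals and the SNR value are placeholders for waveforms taken from the corpus.

import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db`, then add."""
    if len(noise) < len(clean):                       # loop short noise recordings
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10.0)))
    return clean + scale * noise

# Toy usage with synthetic signals (replace with waveforms read from the corpus).
rng = np.random.default_rng(2)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = rng.normal(scale=0.1, size=8000)
noisy = mix_at_snr(clean, noise, snr_db=10)
print(noisy.shape)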
With billions of triples in the Linked Open Data cloud, which continues to grow exponentially, very challenging tasks begin to emerge related to large-scale reasoning. A considerable amount of work has been done on using Information Retrieval methods to address these problems. However, although the applied models work at Web scale, they downgrade the semantics contained in an RDF graph by treating each resource as a 'bag of words' (URIs/literals). Distributional statistics methods can address this problem by capturing the structure of the graph more effectively, but they continually run into efficiency and scalability problems on serial computing architectures due to their computational complexity. In this paper, we describe a parallelization of one such method, Random Indexing, based on the Message Passing Interface (MPI), which enables efficient utilization of high-performance parallel computers. Our evaluation results show significant performance improvement.
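A sketch of the general scheme, not the paper's code: Random Indexing accumulates sparse ternary index vectors of neighbouring tokens into per-token context vectors, documents are split across MPI ranks, and an Allreduce sums the per-rank results. The use of mpi4py, the vector dimensionality and the toy corpus are assumptions.

import hashlib
import numpy as np
from mpi4py import MPI   # run with e.g. `mpiexec -n 4 python random_indexing.py`

DIM, NONZERO = 512, 8

def index_vector(token):
    """Deterministic sparse ternary index vector, identical on every rank."""
    seed = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:4], "little")
    rng = np.random.default_rng(seed)
    v = np.zeros(DIM)
    pos = rng.choice(DIM, NONZERO, replace=False)
    v[pos] = rng.choice([-1.0, 1.0], NONZERO)
    return v

def accumulate(docs, window=2):
    """Sum index vectors of neighbouring tokens into each token's context vector."""
    context = {}
    for doc in docs:
        toks = doc.split()
        for i, t in enumerate(toks):
            vec = context.setdefault(t, np.zeros(DIM))
            for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                if j != i:
                    vec += index_vector(toks[j])
    return context

comm = MPI.COMM_WORLD
corpus = ["subject predicate object", "resource has property value"]  # toy stand-in for RDF text
local = accumulate(corpus[comm.Get_rank()::comm.Get_size()])

# Sum per-rank context vectors over a shared vocabulary.
vocab = sorted({w for doc in corpus for w in doc.split()})
local_mat = np.stack([local.get(w, np.zeros(DIM)) for w in vocab])
global_mat = np.zeros_like(local_mat)
comm.Allreduce(local_mat, global_mat, op=MPI.SUM)
if comm.Get_rank() == 0:
    print(global_mat.shape)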
Multiple accents are often present in Mandarin speech, as most Chinese speakers have learned Mandarin as a second language. We propose generating reliable accent-specific units, together with dynamic Gaussian mixture selection, for multi-accent speech recognition. Time-alignment phoneme recognition is used to generate these units and to model accent variations explicitly and accurately. The dynamic Gaussian mixture selection scheme builds a dynamic observation density for each frame during decoding, leading to more efficient use of Gaussian mixture components. The method increases coverage of the diverse accent variations present in multi-accent speech, and alleviates the performance degradation caused by pruned beam search without increasing the model size. The effectiveness of this approach is evaluated on three typical Chinese accents: Chuan, Yue and Wu. Our approach significantly outperforms the traditional acoustic model reconstruction approach, with Syllable Error Rate (SER) reductions of 6.30%, 4.93% and 5.53%, respectively, without degrading performance on standard speech.
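The dynamic selection step can be illustrated with a small sketch: for each frame, only the top-K mixture components by weighted likelihood contribute to the observation density, with their weights renormalised. The value of K and the diagonal-covariance GMM parameters below are invented for illustration; this is not the paper's implementation.

import numpy as np

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian, broadcast over components."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)

def dynamic_density(frame, weights, means, vars_, top_k=2):
    log_wl = np.log(weights) + log_gauss_diag(frame, means, vars_)  # per component
    keep = np.argsort(log_wl)[-top_k:]                  # select the best components
    w = weights[keep] / weights[keep].sum()             # renormalise their weights
    ll = log_gauss_diag(frame, means[keep], vars_[keep])
    return float(np.log(np.sum(w * np.exp(ll - ll.max()))) + ll.max())  # log p(x)

# Toy usage: an 8-component, 13-dimensional diagonal GMM and one frame.
rng = np.random.default_rng(3)
M, D = 8, 13
weights = np.full(M, 1 / M)
means, vars_ = rng.normal(size=(M, D)), np.ones((M, D))
print(dynamic_density(rng.normal(size=D), weights, means, vars_, top_k=3))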
Usually, ambiguous words contained in an article appear several times. Almost all existing methods for unsupervised word sense disambiguation make use of information contained only in the ambiguous sentence. This paper presen...