Search Results - Inner Mongolia University Library

Proceedings - IEEE International Conference on Acoustics, Speech, and Signal Processing

Authors: Zhang, Xianxian; Hansen, John H.L.; Arehart, Kathryn. Affiliations: Robust Speech Processing Group, Center for Spoken Language Research, University of Colorado, Boulder, United States; Department of Speech Science, University of Colorado, Boulder, United States

While a number of studies have investigated various speech enhancement and noise suppression schemes, most consider either a single channel or array processing framework. Clearly there are potential advantages in leveraging the strengths of array processing solutions in suppressing noise from a direction other than the speaker, with that seen in single channel methods that include speech spectral constraints or psychoacoustically motivated processing. In this paper, we propose to integrate a combined fixed/adaptive beamforming algorithm (CFA-BF) for speech enhancement with two single channel methods based on speech spectral constrained iterative processing (Auto-LSP), and an auditory masked threshold based method using equivalent rectangular bandwidth filtering (GMMSE-AMT-ERB). After formulating the method, we evaluate performance on a subset of the TIMIT corpus with four real noise sources. We demonstrate a consistent level of noise suppression and voice communication quality improvement using the proposed method as reflected by an overall average 26 dB increase in SegSNR from the original degraded audio corpus.

Keywords: speech recognition
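The abstract above reports its result as an increase in segmental SNR (SegSNR). As a rough illustration of how such a figure is usually computed, here is a minimal sketch in pure Python. The function name `segmental_snr`, the frame length, and the clamping range of -10 dB to 35 dB are assumptions chosen for illustration (clamping per-frame values is a common convention); the paper's exact evaluation settings are not given in this record.

```python
import math


def segmental_snr(clean, processed, frame_len=256, lo_db=-10.0, hi_db=35.0):
    """Average per-frame SNR in dB between a clean reference and a processed signal.

    frame_len, lo_db and hi_db are illustrative defaults, not values from the paper.
    """
    if len(clean) != len(processed):
        raise ValueError("signals must have the same length")
    frame_snrs = []
    # Walk over non-overlapping frames; a trailing partial frame is ignored.
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        out = processed[start:start + frame_len]
        signal_energy = sum(s * s for s in ref)
        noise_energy = sum((s - y) ** 2 for s, y in zip(ref, out))
        if signal_energy == 0.0:
            continue  # skip frames where the reference is silent
        if noise_energy == 0.0:
            snr_db = hi_db  # perfect reconstruction: use the upper clamp
        else:
            snr_db = 10.0 * math.log10(signal_energy / noise_energy)
        # Clamp each frame so a few extreme frames do not dominate the mean.
        frame_snrs.append(min(max(snr_db, lo_db), hi_db))
    if not frame_snrs:
        raise ValueError("no usable frames")
    return sum(frame_snrs) / len(frame_snrs)
```

Under these assumptions, a SegSNR improvement would be the difference `segmental_snr(clean, enhanced) - segmental_snr(clean, degraded)` for time-aligned signals.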


In-vehicle based speech processing for hearing impaired listeners


8th International Conference on Spoken Language Processing, ICSLP 2004

Authors: Zhang, Xianxian; Hansen, John H.L.; Arehart, Kathryn; Rossi-Katz, Jessica. Affiliations: Robust Speech Processing Group, Center for Spoken Language Research, University of Colorado, Boulder, CO 80309, United States; Department of Speech Language and Hearing Science, University of Colorado, Boulder, CO 80309, United States

Noisy cars are very difficult listening environments for persons with hearing loss. While there have been numerous studies in the field of speech enhancement for car noise environments, the majority of these studies have focused on noise reduction for normal hearing individuals. In this paper, we present recent results in the development of more effective speech capture and enhancement processing for wireless voice interaction for persons with hearing loss in real car environments. We first present a data collection experiment for a proposed FM wireless transmission scenario using a 5-channel microphone array in the car, followed by several alternative speech enhancement algorithms. After formulating 6 different processing methods, we evaluate the performance by SegSNR improvement using data recorded in a moving car environment. Among the 6 processing configurations, the combined fixed/adaptive beamforming (CFA-BF) obtains the highest level of SegSNR improvement by up to 2.65 dB.

Keywords: processing


A flexible example annotation schema: Translation corresponding tree representation


20th International Conference on Computational Linguistics, COLING 2004

Authors: Wong, Fai; Hu, Dong Cheng; Mao, Yu Hang; Dong, Ming Chui. Affiliations: Speech and Language Processing Research Center, Tsinghua University, Beijing 100084, China; Faculty of Science and Technology, University of Macao, PO Box 3001, China

This paper presents work on the task of constructing an example base from a given bilingual corpus based on the annotation schema of Translation Corresponding Tree (TCT). Each TCT describes a translation example (a pair of bilingual sentences). It represents the syntactic structure of the source-language sentence and, more importantly, provides the facility to specify the correspondences between strings (in both the source and target sentences) and the representation tree. Furthermore, syntax transformation clues are also encapsulated at each node in the TCT representation to capture the differences in grammatical structure between the source and target languages. With this annotation schema, translation examples are effectively represented and organized in the bilingual knowledge database that we need for the Portuguese to Chinese machine translation system. © 2004 COLING 2004 - Proceedings of the 20th International Conference on Computational Linguistics. All rights reserved.

Keywords: Syntactics


An integrated method for Chinese unknown word extraction


3rd SIGHAN Workshop on Chinese Language Processing, SIGHAN@ACL 2004

Authors: Luo, Zhiyong; Song, Rou. Affiliations: College of Computer Science, Beijing University of Technology, Beijing 100022, China; Center for Language Information Processing, Beijing Language and Culture University, Beijing 100083, China

Unknown word recognition is an important problem in Chinese word segmentation systems. In this paper, we propose an integrated method for Chinese unknown word extraction for offline corpus processing, in which both context entropy (on each side) and frequency ratio against a background corpus are introduced to evaluate the candidate words. Both measures are computed efficiently on a suffix array with much less space overhead. Our method can also be reinforced by boundary verification when combined with a basic segmentor, and it can extract words of arbitrary n-gram length. We test our method on the Chinese novel Xiao Ao Jiang Hu and obtain satisfactory results compared to traditional criteria such as Likelihood Ratio. © SIGHAN@ACL 2004. All rights reserved.

Keywords: Extraction
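The abstract above scores candidate words by context entropy on each side. The sketch below shows one common reading of that idea: the Shannon entropy of the characters that immediately precede and follow each occurrence of a candidate string. It uses a naive linear scan instead of the suffix array the paper describes, and the names `side_entropies` and `_entropy` are hypothetical; the paper's exact definition and thresholds are not given in this record.

```python
import math
from collections import Counter


def _entropy(counts):
    """Shannon entropy in bits of a Counter of observed neighbour characters."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def side_entropies(text, candidate):
    """Return (left_entropy, right_entropy) in bits for a candidate string.

    A naive scan over the text; the paper computes its measures on a suffix array.
    """
    if not candidate:
        raise ValueError("candidate must be non-empty")
    left, right = Counter(), Counter()
    start = text.find(candidate)
    while start != -1:
        end = start + len(candidate)
        if start > 0:
            left[text[start - 1]] += 1   # character just before this occurrence
        if end < len(text):
            right[text[end]] += 1        # character just after this occurrence
        start = text.find(candidate, start + 1)
    return _entropy(left), _entropy(right)
```

The intuition is that a string with varied neighbours on both sides behaves like a free-standing word, while a string that is almost always followed by the same character is more likely a fragment of a longer word.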


Maximum entropy modeling in sparse semantic tagging


2004 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics - Student Research Workshop, HLT-NAACL 2004

Authors: Cui, Jia; Guthrie, David. Affiliations: Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD 21210, United States; Department of Computer Science, University of Sheffield, Sheffield S1 4DP, United Kingdom

In this work, we are concerned with a coarse grained semantic analysis over sparse data, which labels all nouns with a set of semantic categories. To get the benefit of unlabeled data, we propose a bootstrapping framework with Maximum Entropy modeling (MaxEnt) as the statistical learning component. During the iterative tagging process, unlabeled data is used not only for better statistical estimation, but also as a medium to integrate non-statistical knowledge into the model training. Two main issues are discussed in this paper. First, Association Rule principles are suggested to guide MaxEnt feature selections. Second, to guarantee the convergence of the bootstrapping process, three adjusting strategies are proposed to soft tag unlabeled data. © HLT-NAACL 2004 - Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Student Research Workshop.

Keywords: Semantics


Detection of voice onset time (VOT) for unvoiced stops (/p/, /t/, /k/) using the Teager energy operator (TEO) for automatic detection of accented English


Proceedings of the 6th Nordic Signal Processing Symposium, NORSIG 2004

Authors: Das, Sharmistha; Hansen, John H.L. Affiliations: Department of Speech Science, University of Colorado, Boulder, CO 80309-0594, United States; Robust Speech Processing Group, Center for Spoken Language Research, University of Colorado, Boulder, CO 80309-0594, United States

Voice Onset Time (VOT) is an important temporal feature in speech perception and speech recognition. It is also useful for accent detection [1,2]. Fixed-length frame-based speech processing inherently ignores VOT. In this paper we propose a more effective VOT detection scheme using the non-linear energy tracking algorithm (Teager Energy Operator (TEO)) across a sub-frequency band partition for unvoiced stops (/p/, /t/ and /k/). The VOT detection algorithm is applied to the problem of accent classification. Three different language groups (Indian, Chinese and American English) are used from the CU-Accent-Corpus to compare VOTs of both accented and native American English. Some pathological cases are considered where speakers have breathy voices or other issues in the recording procedure. The VOT is detected with less than 10% error when compared to the manually detected VOT. Also, pairwise English accent classification rates are 87% for Chinese accent, 80% for English accent, and 47% for Indian accent (includes atypical cases for the Indian case).

Keywords: speech recognition
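The abstract above relies on the Teager Energy Operator. For reference, the standard discrete form is psi[n] = x[n]^2 - x[n-1] * x[n+1], which for a pure sinusoid A*cos(omega*n) evaluates to the constant A^2 * sin(omega)^2. The sketch below implements only that operator; the function name `teager_energy` is hypothetical, and the paper's sub-band partition and VOT decision logic are not reproduced because this record does not describe them.

```python
import math


def teager_energy(x):
    """Discrete Teager Energy Operator over the interior samples of x.

    Returns a list of length len(x) - 2 (empty if x has fewer than 3 samples),
    where entry k corresponds to sample n = k + 1 of the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Illustration on a synthetic sinusoid (amplitude and frequency are arbitrary).
amplitude, omega = 2.0, 0.3
tone = [amplitude * math.cos(omega * n) for n in range(50)]
psi = teager_energy(tone)
expected = amplitude ** 2 * math.sin(omega) ** 2  # constant TEO output for a sinusoid
```

Because the operator needs only three neighbouring samples, it can follow rapid energy changes sample by sample, which is presumably why it is attractive for locating short events such as stop bursts.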


Audio-visual speaker localization for car navigation systems


8th International Conference on Spoken Language Processing, ICSLP 2004

Authors: Zhang, Xianxian; Takeda, Kazuya; Hansen, John H.L.; Maeno, Toshiki. Affiliations: Robust Speech Processing Group, Center for Spoken Language Research, University of Colorado, Boulder, CO, United States; Graduate School of Information Science, Nagoya University, Nagoya, Japan

Human-computer interaction for in-vehicle information and navigation systems is a challenging problem because of the diverse and changing acoustic environments. It is proposed that the integration of video and audio information can significantly improve dialog system performance, since the visual modality is not impacted by acoustic noise. In this paper, we propose a robust audio-visual integration system for source tracking and speech enhancement for an in-vehicle speech dialog system. The proposed system integrates both audio and visual information to locate the desired speaker source. Using real data collected in car environments, the proposed system can improve speech accuracy by up to 40.75% compared with audio data alone.

Keywords: speech enhancement


Using random forests in the structured language model


Proceedings of the 18th International Conference on Neural Information Processing Systems

Authors: Peng Xu; Frederick Jelinek. Affiliations: Center for Language and Speech Processing, Department of Electrical and Computer Engineering, The Johns Hopkins University

In this paper, we explore the use of Random Forests (RFs) in the structured language model (SLM), which uses rich syntactic information in predicting the next word based on words already seen. The goal in this work is to construct RFs by randomly growing Decision Trees (DTs) using syntactic information and investigate the performance of the SLM modeled by the RFs in automatic speech recognition. RFs, which were originally developed as classifiers, are a combination of decision tree classifiers. Each tree is grown based on random training data sampled independently and with the same distribution for all trees in the forest, and a random selection of possible questions at each node of the decision tree. Our approach extends the original idea of RFs to deal with the data sparseness problem encountered in language modeling. RFs have been studied in the context of n-gram language modeling and have been shown to generalize well to unseen data. We show in this paper that RFs using syntactic information can also achieve better performance in both perplexity (PPL) and word error rate (WER) in a large vocabulary speech recognition system, compared to a baseline that uses Kneser-Ney smoothing.


Interpolated probabilistic tagging model optimized with genetic algorithm


International Conference on Machine Learning and Cybernetics (ICMLC)

Authors: Fai Wong; S. Chao; Dong-Cheng Hu; Yu-Hang Mao. Affiliations: Faculty of Science and Technology, University of Macau, Macau, Macao, China; Speech and Language Processing Research Center, Tsinghua University, Beijing, China

We present results of probabilistic tagging of Portuguese texts in order to show how these techniques work for one of the highly morphologically ambiguous inflective languages by using a limited corpus as the basic training source. In order to cope with the ambiguity problem caused by insufficient training data, especially unknown words, we incorporate lexical features into the probabilistic model. Unlike other proposed tagging models, these features are introduced into the word probabilities by means of interpolation. A technique to determine the optimal set of interpolation parameters based on a genetic algorithm is described. Our preliminary results show that we can correctly tag 91.8% of the sentences based on our tagging model.

Keywords: Tagging; Genetic algorithms; Natural languages; Speech; Training data; Interpolation; Natural language processing; Probability; Statistical analysis; Chaos


Effects of Spectro-Temporal Asynchrony in Auditory and Auditory-Visual Speech Processing


Seminars in Hearing, 2004, Vol. 25, No. 3, pp. 241-255

Authors: Ken W. Grant; Steven Greenberg; David Poeppel; Virginie van Wassenhove. Affiliations: (1) Auditory-Visual Speech Recognition Laboratory, Walter Reed Army Medical Center, Army Audiology and Speech Center, Washington, District of Columbia; (2) International Computer Speech Institute, Berkeley, California; (3) Cognitive Neuroscience of Language Laboratory, Neuroscience and Cognitive Science Program (NACS), Department of Biology and Department of Linguistics, University of Maryland, College Park, Maryland
