Search results - Inner Mongolia University Library

Combining classifiers for word sense disambiguation

Natural Language Engineering, 2002, vol. 8, no. 4, pp. 327-341

Authors: Florian, Radu; Cucerzan, Silviu; Schafer, Charles; Yarowsky, David. Department of Computer Science and Center for Language and Speech Processing, Johns Hopkins University, MD 21218, United States

Classifier combination is an effective and broadly useful method of improving system performance. This article investigates in depth a large number of both well-established and novel classifier combination approaches for the word sense disambiguation task, studied over a diverse classifier pool which includes feature-enhanced Naive Bayes, Cosine, Decision List, Transformation-based Learning and MMVC classifiers. Each classifier has access to the same rich feature space, comprising distance-weighted bag-of-lemmas, local n-gram context and specific syntactic relations, such as Verb-Object and Noun-Modifier. This study examines several key issues in system combination for the word sense disambiguation task, ranging from algorithmic structure to parameter estimation. Experiments using the standard senseval2 lexical-sample data sets in four languages (English, Spanish, Swedish and Basque) demonstrate that the combination system obtains a significantly lower error rate when compared with other systems participating in the senseval2 exercise, yielding state-of-the-art performance on these data sets. © 2002, Cambridge University Press. All rights reserved.
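As an illustration of the general idea of classifier combination described in this abstract, the following minimal Python sketch shows two textbook schemes: majority voting over the classifiers' top choices and a weighted average of their sense distributions. The function names, the example senses and the weights are all hypothetical, and these are not necessarily the combination methods evaluated in the article.

```python
from collections import Counter


def combine_by_voting(predictions):
    """Return the label chosen by the most classifiers (ties: first seen)."""
    return Counter(predictions).most_common(1)[0][0]


def combine_by_weighted_average(posteriors, weights):
    """Mix per-classifier sense distributions using normalized weights."""
    total = sum(weights)
    combined = {}
    for dist, w in zip(posteriors, weights):
        for sense, p in dist.items():
            combined[sense] = combined.get(sense, 0.0) + (w / total) * p
    return combined


# Hypothetical outputs of three classifiers for one occurrence of "bank".
posteriors = [
    {"bank/finance": 0.7, "bank/river": 0.3},
    {"bank/finance": 0.4, "bank/river": 0.6},
    {"bank/finance": 0.8, "bank/river": 0.2},
]
# Made-up weights, e.g. standing in for held-out accuracy of each classifier.
weights = [1.0, 1.0, 2.0]

# Scheme 1: each classifier votes for its most probable sense.
votes = [max(d, key=d.get) for d in posteriors]
voted_sense = combine_by_voting(votes)

# Scheme 2: average the full distributions, then pick the best sense.
mixed = combine_by_weighted_average(posteriors, weights)
best_sense = max(mixed, key=mixed.get)
```

Voting discards each classifier's confidence, while averaging keeps it, so the two schemes can disagree when a minority of classifiers is very confident.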

Bootstrapping a Multilingual Part-of-speech Tagger in One Person-day

6th Conference on Natural Language Learning, CoNLL 2002

Authors: Cucerzan, Silviu; Yarowsky, David. Department of Computer Science, Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD 21218, United States

This paper presents a method for bootstrapping a fine-grained, broad-coverage part-of-speech (POS) tagger in a new language using only one person-day of data acquisition effort. It requires only three resources, which are currently readily available in 60-100 world languages: (1) an online or hard-copy pocket-sized bilingual dictionary, (2) a basic library reference grammar, and (3) access to an existing monolingual text corpus in the language. The algorithm begins by inducing initial lexical POS distributions from English translations in a bilingual dictionary without POS tags. It handles irregular, regular and semi-regular morphology through a robust generative model using weighted Levenshtein alignments. Unsupervised induction of grammatical gender is performed via global modeling of context-window feature agreement. Using a combination of these and other evidence sources, interactive training of context and lexical prior models is accomplished for fine-grained POS tag spaces. Experiments show high accuracy and fine-grained tag resolution with minimal new human effort. © 2002 Proceedings of the Annual Meeting of the Association for Computational Linguistics. All Rights Reserved.

Keywords: Data acquisition
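The abstract above mentions weighted Levenshtein alignments. As a rough illustration of that building block only, here is a minimal Python sketch of edit distance with a pluggable substitution cost. The cost function `toy_sub_cost` and its values are invented for this example; the paper's generative model and its weights are not described in the abstract and are not reproduced here.

```python
def weighted_levenshtein(source, target, sub_cost, ins_cost=1.0, del_cost=1.0):
    """Edit distance where substitution cost depends on the character pair."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    # Turning a prefix into the empty string costs one deletion per character.
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + del_cost
    # Building a prefix from the empty string costs one insertion per character.
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + ins_cost
    for i in range(1, rows):
        for j in range(1, cols):
            a, b = source[i - 1], target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,               # delete a
                dist[i][j - 1] + ins_cost,               # insert b
                dist[i - 1][j - 1] + sub_cost(a, b),     # match or substitute
            )
    return dist[-1][-1]


VOWELS = set("aeiou")


def toy_sub_cost(a, b):
    """Hypothetical costs: vowel-for-vowel changes are cheaper than others."""
    if a == b:
        return 0.0
    if a in VOWELS and b in VOWELS:
        return 0.5
    return 1.0


d_vowel = weighted_levenshtein("sing", "sang", toy_sub_cost)  # stem vowel change
d_cons = weighted_levenshtein("sing", "ring", toy_sub_cost)   # consonant change
```

With these made-up costs, a stem vowel alternation such as sing/sang scores as closer than an unrelated consonant change, which is the kind of preference a morphology-oriented weighting is meant to encode.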

Inducing Translation Lexicons via Diverse Similarity Measures and Bridge Languages

6th Conference on Natural Language Learning, CoNLL 2002

Authors: Schafer, Charles; Yarowsky, David. Department of Computer Science, Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD 21218, United States

This paper presents a method for inducing translation lexicons between two distant languages without the need for either parallel bilingual corpora or a direct bilingual seed dictionary. The algorithm successfully combines temporal occurrence similarity across dates in news corpora, wide and local cross-language context similarity, weighted Levenshtein distance, relative frequency and burstiness similarity measures. These similarity measures are integrated with the bridge language concept under a robust method of classifier combination for both the Slavic and Northern Indian language families. © 2002 Proceedings of the Annual Meeting of the Association for Computational Linguistics. All Rights Reserved.

Keywords: Translation (languages)

Language Independent NER using a Unified Model of Internal and Contextual Evidence

6th Conference on Natural Language Learning, CoNLL 2002

Authors: Cucerzan, Silviu; Yarowsky, David. Department of Computer Science, Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD 21218, United States

This paper investigates the use of a language independent model for named entity recognition based on iterative learning in a co-training fashion, using word-internal and contextual information as independent evidence sources. Its bootstrapping process begins with only seed entities and seed contexts extracted from the provided annotated corpus. F-measure exceeds 77 in Spanish and 72 in Dutch. © 2002 Proceedings of the Annual Meeting of the Association for Computational Linguistics. All Rights Reserved.

Named Entity Recognition as a House of Cards: Classifier Stacking

6th Conference on Natural Language Learning, CoNLL 2002

Author: Florian, Radu. Department of Computer Science, Center for Language and Speech Processing, Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218, United States

Perception of tone and vowel quantity in Thai

7th International Conference on Spoken Language Processing, ICSLP 2002

Authors: Mixdorff, Hansjörg; Luksaneeyanawin, Sudaporn; Fujisaki, Hiroya; Charnvivit, Patavee. Faculty of Computer Science, Berlin University of Applied Sciences, Germany; Center for Research in Speech and Language Processing, Chulalongkorn University, Thailand; Emeritus, University of Tokyo, Japan

The current study examines the interaction of syllable tones and vowel quantity in the production and perception of monosyllabic words of Thai. A speech corpus containing groups of words differing only as to tone type and vowel quantity was designed. These were embedded in a short carrier sentence of five mid tone syllables, with the target word being the center syllable. The utterances were analyzed with respect to the tonal and segmental features of the target words and F0 contours modeled using the Fujisaki model. Analysis shows that all mid tone sequences can be modeled using the phrase component only whereas the remaining tones require either single tone commands of positive or negative polarity, or a command pair. Based on the analysis results, a perception experiment was designed to explore the perceptual space between words of tone/vowel quantity contrasts. Results indicate, inter alia, that vowel quantity is perceived as shorter when words are presented in isolation than when embedded in a carrier sentence. Confusions generally occur more frequently between words of different vowel quantity than of different tones.
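For readers unfamiliar with the Fujisaki model named in this abstract, its commonly cited formulation is reproduced below for reference. The symbols are the usual ones from the general literature on the model, not taken from this abstract: F_b is the base frequency, A_{p,i} and T_{0,i} are the magnitude and onset time of the i-th phrase command, and A_{a,j}, T_{1,j}, T_{2,j} are the amplitude, onset and offset of the j-th accent (here, tone) command. In tone-language applications the tone command amplitude may be negative, which corresponds to the "negative polarity" commands mentioned above. The parameter values used in the study are not given in the abstract.

```latex
\ln F_0(t) = \ln F_b
  + \sum_{i=1}^{I} A_{p,i}\, G_p(t - T_{0,i})
  + \sum_{j=1}^{J} A_{a,j}\, \bigl[ G_a(t - T_{1,j}) - G_a(t - T_{2,j}) \bigr]

G_p(t) =
  \begin{cases}
    \alpha^2\, t\, e^{-\alpha t}, & t \ge 0 \\
    0, & t < 0
  \end{cases}
\qquad
G_a(t) =
  \begin{cases}
    \min\bigl[\, 1 - (1 + \beta t)\, e^{-\beta t},\ \gamma \,\bigr], & t \ge 0 \\
    0, & t < 0
  \end{cases}
```

Under this formulation, a contour that needs "the phrase component only" is one in which J = 0, so the second sum vanishes.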

Reducing pronunciation lexicon confusion and using more data without phonetic transcription for pronunciation modeling

7th International Conference on Spoken Language Processing, ICSLP 2002

Authors: Zheng, Fang; Song, Zhanjiang; Fung, Pascale; Byrne, William. Center of Speech Technology, State Key Lab of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua Univ., Beijing 100084, China; Department of Electrical and Electronic Engineering, Hong Kong Univ. of Science and Technology, Hong Kong; Center for Language and Speech Processing, Johns Hopkins Univ., United States; Beijing D-Ear Technologies Co. Ltd., China

The multiple-pronunciation lexicon (MPL) is important for modeling pronunciation variation in spontaneous speech recognition, but introducing it raises two problems. First, the MPL increases confusion among lexicon entries and degrades the recognizer's performance. Second, the MPL needs more data with phonetic transcription in order to cover as many surface forms as possible. Two solutions are proposed accordingly: a context-dependent weighting method and an iterative forced-alignment based transcription method. Together they compensate for the problems the MPL causes and improve overall performance. Experiments on a naturally spontaneous speech database show that the proposed methods are effective and better than other methods.

Keywords: Iterative methods

Sequence estimation and channel equalization using forward decoding kernel machines

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Shantanu Chakrabartty; Gert Cauwenberghs. Center for Language and Speech Processing, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA

A forward decoding approach to kernel machine learning is presented. The method combines concepts from Markovian dynamics, large margin classifiers and reproducing kernels for robust sequence detection by learning inter-data dependencies. A MAP (maximum a posteriori) sequence estimator is obtained by regressing transition probabilities between symbols as a function of received data. The training procedure involves maximizing a lower bound of a regularized cross-entropy on the posterior probabilities, which simplifies into direct estimation of transition probabilities using kernel logistic regression. Applied to channel equalization, forward decoding kernel machines outperform support vector machines and other techniques by about 5 dB in SNR for a given BER, within 1 dB of theoretical limits.

Keywords: Support vector machines; Training; Decoding; Kernel; Equalizers
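The forward decoding step described in this abstract can be sketched as a simple recursion over state probabilities, assuming the data-dependent transition probabilities have already been estimated. In the minimal Python sketch below, the function name `forward_decode` and the hard-coded matrices are hypothetical stand-ins for the output of a trained regressor (kernel logistic regression in the paper); the kernel training itself is not shown.

```python
def forward_decode(transitions, initial):
    """Forward recursion with data-dependent transition probabilities.

    transitions[n][i][j] is assumed to hold
    P(state i at step n | state j at step n-1, observation at step n).
    Returns the per-step most probable states and the final distribution.
    """
    alpha = list(initial)
    decoded = []
    for P in transitions:
        # Propagate the previous state distribution through this step's matrix.
        alpha = [
            sum(P[i][j] * alpha[j] for j in range(len(alpha)))
            for i in range(len(P))
        ]
        # Renormalize to guard against numerical drift.
        norm = sum(alpha)
        alpha = [a / norm for a in alpha]
        # Decide on the most probable state given data seen so far.
        decoded.append(max(range(len(alpha)), key=alpha.__getitem__))
    return decoded, alpha


# Made-up two-state example; each column of each matrix sums to 1.
transitions = [
    [[0.9, 0.8], [0.1, 0.2]],  # observation at step 1 favours state 0
    [[0.3, 0.1], [0.7, 0.9]],  # observation at step 2 favours state 1
    [[0.8, 0.6], [0.2, 0.4]],  # observation at step 3 favours state 0
]
initial = [0.5, 0.5]

path, final = forward_decode(transitions, initial)
```

Because each decision uses only past and present observations, the estimate at every step is available immediately, without a backward pass over the sequence.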

Forward-Decoding Kernel-Based Phone Sequence Recognition

Annual Conference on Neural Information Processing Systems

Authors: Shantanu Chakrabartty; Gert Cauwenberghs. Center for Language and Speech Processing, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218

ISBN: (print) 0262025507

Forward decoding kernel machines (FDKM) combine large-margin classifiers with hidden Markov models (HMM) for maximum a posteriori (MAP) adaptive sequence estimation. State transitions in the sequence are conditioned on observed data using a kernel-based probability model trained with a recursive scheme that deals effectively with noisy and partially labeled data. Training over very large datasets is accomplished using a sparse probabilistic support vector machine (SVM) model based on quadratic entropy, and an on-line stochastic steepest descent algorithm. For speaker-independent continuous phone recognition, FDKM trained over 177,080 samples of the TIMIT database achieves 80.6% recognition accuracy over the full test set, without use of a prior phonetic language model.

Keywords: Telephone modelling languages Support Vector Network observational data label data steepest descent algorithm speaker-independent

Tone recognition in Thai continuous speech based on coarticulation, intonation and stress effects

7th International Conference on Spoken Language Processing, ICSLP 2002

Authors: Thubthong, Nuttakorn; Kijsirikul, Boonserm; Luksaneeyanawin, Sudaporn. Department of Physics, Faculty of Science, Chulalongkorn University, Phayathai Rd., Bangkok 10330, Thailand; Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Phayathai Rd., Bangkok 10330, Thailand; Centre for Research in Speech and Language Processing, Faculty of Arts, Chulalongkorn University, Phayathai Rd., Bangkok 10330, Thailand

Tone recognition is a critical component for speech recognition in a tone language. One of the main problems of tone recognition in continuous speech is that several interacting factors affect the F0 realization of tones. In this paper, we focus on the coarticulatory, intonation, and stress effects. These effects are compensated for using the tone information of neighboring syllables, the adjustment of F0 heights and the stress acoustic features, respectively. The experiments, which compare all tone features, were conducted using feedforward neural networks. The highest recognition rates improve from 84.07% to 93.60% and from 82.48% to 92.67% for the Thai proper name and Thai animal story corpora, respectively.

Keywords: Feedforward neural networks
