Semantic similarity between words is a fundamental issue for many natural language processing applications. The difficulty lies in developing a computational method that generates satisfactory results close to human perception. In this paper, a novel method is proposed to measure semantic similarity between words using HowNet, a renowned Chinese-English bilingual knowledge base. Furthermore, a Chinese thesaurus is used to improve the similarity measurement. In principle, our method can be applied to many languages; here it is applied to English and Chinese. Experiments on English and Chinese word pairs show that our method's results are closer to human similarity judgments than those of the major state-of-the-art methods.
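The abstract does not say how closeness to human judgments was measured. Word-similarity benchmarks are commonly scored by correlating a method's scores with human ratings over the same word pairs. The sketch below assumes Pearson's correlation for that purpose. The function name `pearson` and the sample scores are hypothetical.

```python
import math


def pearson(system_scores, human_scores):
    """Pearson correlation between a method's similarity scores and human ratings.

    Assumption: closeness to human judgments is measured by correlation over
    the same list of word pairs; the source abstract does not name the measure.
    """
    n = len(system_scores)
    if n != len(human_scores) or n < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    mx = sum(system_scores) / n
    my = sum(human_scores) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(system_scores, human_scores))
    sxx = sum((x - mx) ** 2 for x in system_scores)
    syy = sum((y - my) ** 2 for y in human_scores)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for four word pairs: method output vs. averaged human ratings.
method = [0.92, 0.35, 0.60, 0.10]
humans = [3.8, 1.2, 2.9, 0.4]
print(round(pearson(method, humans), 3))
```

A coefficient near 1 means the method orders and spaces word pairs much as human annotators do. Spearman's rank correlation is a common alternative when only the ordering matters.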
In this paper we present a multimodal approach for the recognition of eight emotions. Our approach integrates information from facial expressions, body movement and gestures, and speech. We trained and tested a model w...
A notable gap in research on statistical dependency parsing has been the lack of a proper conditional probability distribution over nonprojective dependency trees for a given sentence. We exploit the Matrix Tree Theorem (Tutte, 1984) t...
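The Matrix Tree Theorem lets one sum over every spanning tree of a graph with a single determinant. This is what makes a normalized distribution over nonprojective trees tractable. The sketch below illustrates the directed version of the theorem for a sentence whose node 0 is an artificial root. It is a generic illustration, not the algorithm from this (truncated) abstract. The function name and the arc-weight layout are assumptions.

```python
import numpy as np


def tree_partition_function(weights):
    """Sum, over all dependency trees rooted at node 0, of the product of arc weights.

    weights[h][m] is a non-negative score for an arc from head h to modifier m;
    node 0 is an artificial root. Every spanning arborescence is counted, so
    nonprojective trees are included.
    """
    w = np.array(weights, dtype=float)  # copy, so the caller's matrix is untouched
    np.fill_diagonal(w, 0.0)  # no self-loops
    w[:, 0] = 0.0             # no arcs into the root
    # In-degree Laplacian: diagonal = total incoming weight, off-diagonal = -weight.
    laplacian = np.diag(w.sum(axis=0)) - w
    # Directed Matrix Tree Theorem: deleting the root's row and column leaves a
    # minor whose determinant is the weighted sum over trees rooted at node 0.
    return float(np.linalg.det(laplacian[1:, 1:]))


# Two-word sentence: root->1 = 2, root->2 = 3, 1->2 = 5, 2->1 = 7.
print(tree_partition_function([[0, 2, 3], [0, 0, 5], [0, 7, 0]]))
```

In this two-word example the three possible trees are {root→1, root→2}, {root→1, 1→2} and {root→2, 2→1}. Their weights are 6, 10 and 21, so the determinant should be 37 up to floating-point rounding. Dividing a single tree's weight by this sum gives its conditional probability.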
This letter presents a new chunking method based on a Maximum Entropy (ME) model with N-fold template correction. First, two types of machine learning models are described. Based on an analysis of the two models, a chunking model that combines the advantages of a conditional probability model and a rule-based model is proposed. The selection of features and rule templates in the chunking model is then discussed. Experimental results on the CoNLL-2000 corpus show that this approach achieves an F-score of 92.93%. Compared with the ME model and the ME Markov model, the new chunking model achieves better performance.
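CoNLL-2000 chunking results such as the 92.93% reported here are F-scores over whole chunks. A predicted chunk counts as correct only if its type and both boundaries match a gold chunk. The minimal sketch below approximates that evaluation from IOB tag sequences. It is not the official `conlleval` script, and the function names are hypothetical.

```python
def extract_chunks(tags):
    """Return the set of (type, start, end) chunks in an IOB tag sequence (end exclusive)."""
    chunks = set()
    start, ctype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes the last chunk
        prefix, _, label = tag.partition("-")
        # An open chunk ends at "O", at any "B-", or at an "I-" of another type.
        if ctype is not None and (prefix != "I" or label != ctype):
            chunks.add((ctype, start, i))
            ctype = None
        # A chunk starts at "B-X", or at an "I-X" that does not continue an open chunk.
        if prefix in ("B", "I") and ctype is None:
            start, ctype = i, label
    return chunks


def chunk_f1(gold_tags, predicted_tags):
    """Chunk-level F-score: a chunk is correct only if type and both boundaries match."""
    gold = extract_chunks(gold_tags)
    predicted = extract_chunks(predicted_tags)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence: the prediction splits the first noun phrase in two.
gold = ["B-NP", "I-NP", "B-VP", "B-NP"]
pred = ["B-NP", "B-NP", "B-VP", "B-NP"]
print(chunk_f1(gold, pred))
```

Splitting one noun phrase costs both precision (two wrong chunks predicted) and recall (one gold chunk missed), even though three of the four tokens carry the correct tag.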
This paper gives an overview of the research results of "Fusion of Communication Content and Broadcast Content", one of the two main pillars of the "Content Fusion" research project conducted at the Interactive Communication and Media Contents Group of NICT. "Fusion of Communication and Broadcast" is a conventional keyword referring to technology that converges communication and broadcasting networks as an infrastructure, whereas "Fusion of Communication and Broadcast Content" refers to technology that converges Web content and TV programs at the content level. Looking toward an age in which a multitude of TV programs and Web content is available in daily life, fundamental technologies and model systems were established that let people, even those unfamiliar with computers, use Internet content and TV programs efficiently without complicated operations. These include efficient methods for accessing information and methods for exploiting the value newly added to that information.
The dysarthric speech characteristics of 14 Thai stroke patients were assessed with the computerized Articulation Test [1]. Speech accuracy and error patterns were analyzed. Vowels and tonal characteristics were the most...
We propose to use graph-based diffusion techniques with data-dependent kernels to build unigram language models. Our approach entails building graphs in which each vertex corresponds uniquely to a word from a closed vocabulary, and an edge (with an appropriate weight) between two words indicates some form of similarity between them. In one of our constructions, we place an edge between two words if the numbers of times they were seen in a training set differ by at most one. This graph construction results in a similarity matrix with small intrinsic dimension, since words with the same counts have the same neighbors. Experimental results on a benchmark language modeling task show that our method is competitive with the Good-Turing estimator.
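The count-based graph described above is easy to build. Two words are linked when their training counts differ by at most one, so words with equal counts get identical neighborhoods and the adjacency matrix has low rank. The sketch below builds that graph and then applies a simple diffusion to obtain a smoothed unigram distribution. The diffusion is repeated averaging with the row-normalized random-walk matrix. That diffusion step, the `steps` parameter, and the function names are illustrative assumptions, not the data-dependent kernels used in the paper.

```python
import numpy as np


def count_graph(counts):
    """Adjacency matrix linking words whose training counts differ by at most one.

    Every word is linked to itself (|c - c| = 0), and words with equal counts
    get identical rows, which keeps the matrix rank small.
    """
    c = np.asarray(counts, dtype=float)
    return (np.abs(c[:, None] - c[None, :]) <= 1).astype(float)


def diffused_unigram(counts, steps=1):
    """Hypothetical diffusion smoothing: average counts over graph neighbors.

    Applies the row-normalized random-walk matrix D^{-1} A `steps` times to the
    count vector and renormalizes. This stands in for, and is not, the
    data-dependent kernels of the paper. Requires at least one nonzero count.
    """
    adj = count_graph(counts)
    walk = adj / adj.sum(axis=1, keepdims=True)  # rows sum to 1; self-loops avoid 0/0
    smoothed = np.asarray(counts, dtype=float)
    for _ in range(steps):
        smoothed = walk @ smoothed
    return smoothed / smoothed.sum()


# Toy closed vocabulary of five words: two unseen, two singletons, one seen three times.
counts = [0, 0, 1, 1, 3]
print(np.linalg.matrix_rank(count_graph(counts)))
print(diffused_unigram(counts))
```

In this toy vocabulary the unseen words share a neighborhood with the singletons, so diffusion moves probability mass onto them. This echoes the Good-Turing estimator, which also uses singleton counts to estimate the mass of unseen words.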
We address the problem of extracting bilingual chunk pairs from parallel text to create training sets for statistical machine translation. We formulate the problem in terms of a stochastic generative process over text...
Summary form only given. We study a simplified version of the problem of target detectability in the presence of clutter. The target (the needle) is a sample of size N from a discrete distribution p. The clutter (the haystack) consists of M independent samples of size N from a distribution q (which differs from p but has the same support). Two cases are easily shown: (i) if M is fixed and N goes to infinity, the target can be detected with probability approaching 1; (ii) if N is fixed and M goes to infinity, then, with probability approaching 1, the target cannot be detected. For the case where both N and M go to infinity, we show that the asymptotic behavior of the optimal detector (if p and q are known) and of a plug-in detector (which estimates p and q on the fly) is determined by the asymptotic behavior of the quantity $M e^{-N D(p \| q)}$: if it goes to zero (resp. infinity), then, with high probability, the target can (resp. cannot) be detected.
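The quantity $M e^{-N D(p \| q)}$ weighs the haystack size M against how quickly, as N grows, the needle becomes distinguishable. Here $D(p \| q)$ is the Kullback-Leibler divergence. The short calculation below evaluates the quantity for illustrative distributions (not taken from the paper). It also computes the sample size at which the quantity equals 1, $N^\ast = \ln M / D(p \| q)$, which indicates how N must scale with $\ln M$. The function names are hypothetical. Natural logarithms are used so that the exponent matches $e^{-N D}$.

```python
import math


def kl_divergence(p, q):
    """D(p || q) in nats for discrete distributions given as lists.

    Assumes p and q share a support (as in the abstract), so q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def detectability_quantity(m, n, p, q):
    """M * exp(-N * D(p || q)); its limit (zero vs. infinity) separates the regimes."""
    return m * math.exp(-n * kl_divergence(p, q))


def critical_sample_size(m, p, q):
    """Needle sample size N at which M * exp(-N * D) equals 1, i.e. ln M / D."""
    return math.log(m) / kl_divergence(p, q)


# Illustrative numbers only: samples from a fair coin hidden among biased-coin samples.
p = [0.5, 0.5]
q = [0.9, 0.1]
for n in (5, 10, 20, 40):
    print(n, detectability_quantity(1000, n, p, q))
print(critical_sample_size(1000, p, q))
```

With these distributions $D(p \| q) = \ln(5/3)$, so the quantity equals $M (3/5)^N$. It shrinks only once N grows faster than roughly $\ln M / \ln(5/3)$.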
We demonstrate an original and successful approach for both resolving and generating definite anaphora. We propose and evaluate unsupervised models for extracting hypernym relations by mining cooccurrence data of defi...