As online community question answering (cQA) portals like Yahoo! Answers1 and Baidu Zhidao2 have attracted over hundreds of millions of questions, how to utilize these questions and accordant answers becomes increasin...
详细信息
In opinion mining of product reviews, one often wants to produce a summary of opinions based on product features/attributes. However, for the same feature, people can express it with different words and phrases. To pr...
详细信息
In opinion mining of product reviews, one often wants to produce a summary of opinions based on product features/attributes. However, for the same feature, people can express it with different words and phrases. To produce a meaningful summary, these words and phrases, which are domain synonyms, need to be grouped under the same feature group. This paper proposes a constrained semisupervised learning method to solve the problem. Experimental results using reviews from five different domains show that the proposed method is competent for the task. It outperforms the original EM and the state-of-the-art existing methods by a large margin.
Multiword Expressions (MWEs) appear frequently and ungrammatically in the natural languages. Identifying MWEs in free texts is a very challenging problem. This paper proposes a knowledge-free, training-free, and langu...
详细信息
Multiword Expressions (MWEs) appear frequently and ungrammatically in the natural languages. Identifying MWEs in free texts is a very challenging problem. This paper proposes a knowledge-free, training-free, and language-independent Multiword Expression Distance (MED). The new metric is derived from an accepted physical principle, measures the distance from an n-gram to its semantics, and outperforms other state-of-the-art methods on MWEs in two applications: question answering and named entity extraction.
Opinion Question Answering (Opinion QA), which aims to find the authors’ sentimental opinions on a specific target, is more challenging than traditional fact-based question answering problems. To extract the opinion ...
详细信息
Learning graphical models with hidden variables can offer semantic insights to complex data and lead to salient structured predictors without relying on expensive, sometime unattainable fully annotated training data. ...
详细信息
ISBN:
(纸本)9781605609492
Learning graphical models with hidden variables can offer semantic insights to complex data and lead to salient structured predictors without relying on expensive, sometime unattainable fully annotated training data. While likelihood-based methods have been extensively explored, to our knowledge, learning structured prediction models with latent variables based on the max-margin principle remains largely an open problem. In this paper, we present a partially observed Maximum Entropy Discrimination Markov Network (PoMEN) model that attempts to combine the advantages of Bayesian and margin based paradigms for learning Markov networks from partially labeled data. PoMEN leads to an averaging prediction rule that resembles a Bayes predictor that is more robust to overfitting, but is also built on the desirable discriminative laws resemble those of the M3N. We develop an EM-style algorithm utilizing existing convex optimization algorithms for M3N as a subroutine. We demonstrate competent performance of PoMEN over existing methods on a real-world web data extraction task.
A new bridge recognition method in Synthetic Aperture Radar (SAR) image using bridge model and SVM is presented in this paper. Firstly, water region is extracted from original SAR image by self-adapt segmentation and ...
详细信息
Traditional relation extraction methods require pre-specified relations and relation-specific human-tagged examples. Bootstrapping systems significantly reduce the number of training examples, but they usually apply h...
详细信息
ISBN:
(纸本)9781605584874
Traditional relation extraction methods require pre-specified relations and relation-specific human-tagged examples. Bootstrapping systems significantly reduce the number of training examples, but they usually apply heuristic-based methods to combine a set of strict hard rules, which limit the ability to generalize and thus generate a low recall. Further-more, existing bootstrapping methods do not perform open information extraction (Open IE), which can identify various types of relations without requiring pre-specifications. In this paper, we propose a statistical extraction framework called Statistical Snowball (StatSnowball), which is a bootstrapping system and can perform both traditional relation extraction and Open IE. StatSnowball uses the discriminative Markov logic networks (MLNs) and softens hard rules by learning their weights in a maximum likelihood estimate sense. MLN is a general model, and can be configured to perform different levels of relation extraction. In StatSnwoball, pattern selection is performed by solving an 1-norm penalized maximum likelihood estimation, which enjoys well-founded theories and efficient solvers. We extensively evaluate the performance of StatSnowball in different configurations on both a small but fully labeled data set and large-scale Web data. Empirical results show that StatSnowball can achieve a significantly higher recall without sacrificing the high precision during iterations with a small number of seeds, and the joint inference of MLN can improve the performance. Finally, StatSnowball is efficient and we have developed a working entity relation search engine called Renlifang based on it. Copyright is held by the International World Wide Web Conference Committee (IW3C2).
The corpus to train the language model is a key factor to decide the performance of the Chinese pinyin input method. Traditional products take public documents as the corpus. The Internet provides a cheap, massive and...
详细信息
Chinese word segmentation (CWS) is the fundamental tech.ology for many NLPrelated applications. It is reported that more than 60% of segmentation errors is caused by the out-of-vocabulary (OOV) words. Recent studies i...
详细信息
In image retrieval and annotation, Multi-Instance Learning has been studied actively. Most of the methods solve the MIL problem in a supervised way. In this paper, we proposed two unsupervised frameworks for clusterin...
详细信息
暂无评论