ISBN: 9783642030697 (print)
Most text clustering techniques are based on the weights of words and/or phrases in the text. Such a representation is often unsatisfactory because it ignores the relationships between terms and treats them as independent features. In this paper, a new semantic similarity based model (SSBM) is proposed. The semantic similarity based model computes semantic similarities by utilizing WordNet as an ontology. The proposed model captures the semantic similarities between documents that contain semantically similar but not necessarily syntactically identical terms. The semantic similarity based model assigns a new weight to document terms reflecting the semantic relationships between terms that co-occur literally in the document. Our model, in conjunction with the extended gloss overlaps measure and the adapted Lesk algorithm, resolves ambiguity and synonymy problems that are not detected by traditional term frequency based text mining techniques. The proposed model is evaluated on the Reuters-21578 and 20-Newsgroups text collections. The performance is assessed in terms of the F-measure, Purity and Entropy quality measures. The obtained results show promising performance improvements compared to the traditional term based vector space model (VSM) as well as other existing methods that include semantic similarity measures in text clustering.
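The abstract names Purity and Entropy as quality measures without defining them. The sketch below uses their common textbook definitions, which may differ in detail from the paper's (for example, some authors normalize entropy by the logarithm of the number of classes; this sketch does not). The function names and the toy data are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def _class_counts_per_cluster(clusters, labels):
    """Group the true class labels by assigned cluster id."""
    by_cluster = {}
    for cluster_id, label in zip(clusters, labels):
        by_cluster.setdefault(cluster_id, Counter())[label] += 1
    return by_cluster


def purity(clusters, labels):
    """Fraction of documents that share the majority class of their cluster.

    Higher is better; 1.0 means every cluster contains a single class.
    """
    by_cluster = _class_counts_per_cluster(clusters, labels)
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(labels)


def entropy(clusters, labels):
    """Size-weighted average of the per-cluster class entropy (base 2).

    Lower is better; 0.0 means every cluster contains a single class.
    Assumption: the value is left unnormalized.
    """
    by_cluster = _class_counts_per_cluster(clusters, labels)
    n = len(labels)
    total = 0.0
    for counts in by_cluster.values():
        size = sum(counts.values())
        h = -sum((k / size) * math.log2(k / size) for k in counts.values())
        total += (size / n) * h
    return total


# Toy example: six documents, two clusters, two true classes.
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_labels = ["a", "a", "b", "b", "b", "b"]
toy_purity = purity(toy_clusters, toy_labels)
toy_entropy = entropy(toy_clusters, toy_labels)
```

In the toy data, cluster 0 mixes two classes while cluster 1 is pure, so purity falls below 1 and entropy rises above 0.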
ISBN: 9783642030697 (print)
Organ transplantation is a highly complex decision process that requires expert decisions. The major problem in a transplantation procedure is the possibility that the receiver's immune system will attack and destroy the transplanted tissue. It is therefore of capital importance to find a donor with the highest possible compatibility with the receiver, and thus reduce rejection. Finding a good donor is not a straightforward task because a complex network of relations exists between the immunological and the clinical variables that influence the receiver's acceptance of the transplanted organ. Currently the process of analyzing these variables involves a careful study by the clinical transplant team. The number and complexity of the relations between variables make the manual process very slow. In this paper we propose and compare two machine learning algorithms that might help the transplant team in improving and speeding up their decisions. We achieve that objective by analyzing past real cases and constructing models as sets of rules. Such models are accurate and understandable by experts.
In this paper we address the problem of using bet selections of a large number of mostly non-expert users to improve sports betting tips. A similarity based approach is used to describe individual users' strategie...
ISBN: 9783642039157 (digital)
ISBN: 9783642039140 (print)
This book constitutes the refereed proceedings of the 8th International Conference on Intelligent Data Analysis, IDA 2009, held in Lyon, France, August 31 to September 2, 2009. The 33 revised papers presented (18 full oral presentations and 15 poster and short oral presentations) were carefully reviewed and selected from almost 80 submissions. All current aspects of this interdisciplinary field are addressed, for example interactive tools to guide and support data analysis in complex scenarios, the increasing availability of automatically collected data, tools that intelligently support and assist human analysts, how to control clustering results, and isotonic classification trees. In general, the areas covered include statistics, machine learning, data mining, classification and pattern recognition, clustering, applications, modeling, and interactive dynamic data visualization.
In this paper we present a sequential expectation maximization algorithm to adapt in an unsupervised manner a Gaussian mixture model for a classification problem. The goal is to adapt the Gaussian mixture model to cop...
User generated content is extremely valuable for mining market intelligence because it is unsolicited. We study the problem of analyzing users' sentiment and opinion in their blog, message board, etc. posts with r...
The main drawback of any lexicon-based sentiment analysis system is the lack of scalability. Thus, in this paper, we will describe methods to automatically generate and score a new sentiment lexicon, called SentiFul, and expand it through direct synonymy relations and morphologic modifications with known lexical units. We propose to distinguish four types of affixes (used to derive new words) depending on the role they play with regard to sentiment features: propagating, reversing, intensifying, and weakening.
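The four affix roles named above can be pictured with a small sketch. Everything concrete here is an assumption made for illustration: the abstract does not give scoring formulas, so the single signed polarity in [-1, 1], the scaling factor, and the example affixes are hypothetical choices and not SentiFul's actual method.

```python
# Hypothetical mapping from affix to its role; the affixes are illustrative picks.
AFFIX_ROLE = {
    "ness": "propagating",
    "un": "reversing",
    "super": "intensifying",
    "semi": "weakening",
}


def derived_score(base_score, role, factor=1.5):
    """Score a derived word from the score of its base word.

    base_score: signed polarity in [-1, 1] (assumed representation).
    factor: assumed strength of intensifying and weakening affixes.
    """
    if role == "propagating":
        return base_score  # sentiment carried over unchanged
    if role == "reversing":
        return -base_score  # polarity flipped
    if role == "intensifying":
        return max(-1.0, min(1.0, base_score * factor))  # stronger, clipped to range
    if role == "weakening":
        return base_score / factor  # same polarity, smaller magnitude
    raise ValueError(f"unknown affix role: {role}")


# Example: a base word with an assumed polarity of 0.6.
reversed_score = derived_score(0.6, AFFIX_ROLE["un"])
weakened_score = derived_score(0.6, AFFIX_ROLE["semi"])
```

A lexicon expanded this way grows without manual annotation of each derived word, which is the scalability concern the abstract raises.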
Coping with differences in the expression of emotions is a challenging task not only for a machine, but also for humans. Since individualism in the expression of emotions may occur at various stages of the emotion generation process, human beings may react quite differently to the same stimulus. Consequently, it comes as no surprise that recognition rates reported for a user-dependent system are significantly higher than recognition rates for a user-independent system. Based on empirical data we obtained in our earlier work on the recognition of emotions from biosignals, speech and their combination, we discuss which consequences arise from individual user differences for automated recognition systems and outline how these systems could be adapted to particular user groups.
Organisms exhibit a close structure-function relationship and a slight change in structure may in turn change their outputs accordingly [1]. This feature is important as it is the main reason why organisms have better...
ISBN: 9781605588032 (print)
Background: Finding relevant articles from PubMed is challenging because it is hard to express the user's specific intention in the given query interface, and a keyword query typically retrieves a large number of results. Researchers have applied machine learning techniques to find relevant articles by ranking the articles according to the learned relevance function. However, the process of learning and ranking is usually done offline without being integrated with the keyword queries, and the users have to provide a large number of training documents to get a reasonable learning accuracy. This paper proposes a novel multi-level relevance feedback system for PubMed, called RefMed, which supports both ad-hoc keyword queries and multi-level relevance feedback in real time on PubMed. Results: RefMed supports multi-level relevance feedback by using the RankSVM as the learning method, and thus it achieves higher accuracy with less feedback. RefMed "tightly" integrates the RankSVM into the RDBMS to support both keyword queries and the multi-level relevance feedback in real time; the tight coupling of the RankSVM and the DBMS substantially improves the processing time. An efficient parameter selection method for the RankSVM is also proposed, which tunes the RankSVM parameter without performing validation. Thereby, RefMed achieves a high learning accuracy in real time without performing a validation process. RefMed is accessible at http://***/refmed. Conclusions: RefMed is the first multi-level relevance feedback system for PubMed, and it achieves a high accuracy with less feedback. It effectively learns an accurate relevance function from the user's feedback and efficiently processes the function to return relevant articles in real time. Copyright 2009 ACM.
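One way to see why multi-level feedback can help a ranking SVM is the standard pairwise transform, in which each pair of documents with different relevance levels becomes one training example. The sketch below shows only that generic transform; it is not RefMed's implementation, and the function name, feature vectors and relevance levels are made up for illustration.

```python
from itertools import combinations


def preference_pairs(docs):
    """Turn graded feedback into pairwise training examples.

    docs: list of (feature_vector, relevance_level) tuples.
    Returns a list of (difference_vector, label) where label is +1 if the
    first document of the pair was rated higher and -1 otherwise.
    """
    pairs = []
    for (x_a, level_a), (x_b, level_b) in combinations(docs, 2):
        if level_a == level_b:
            continue  # equal ratings carry no ordering information
        diff = [a - b for a, b in zip(x_a, x_b)]
        pairs.append((diff, 1 if level_a > level_b else -1))
    return pairs


# Hypothetical feedback on four articles with three relevance levels (2, 1, 0).
graded = [([1.0, 0.0], 2), ([0.0, 1.0], 1), ([0.5, 0.5], 0), ([0.2, 0.2], 0)]
# The same feedback collapsed to two levels (relevant or not).
binary = [(x, 1 if level > 0 else 0) for x, level in graded]

graded_pairs = preference_pairs(graded)
binary_pairs = preference_pairs(binary)
```

With these four rated articles, the three-level feedback yields five ordered pairs while the two-level version yields four, so the same amount of user effort gives the learner more constraints.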