ISBN:
(Print) 9780769539232
Cross-document coreference resolution plays an important part in the field of natural language processing (NLP). It captures the ability to gather documents for information about a particular entity. Most previous algorithms identify the underlying entity of a given document from the original text, which is unreliable when that text contains multiple parts with different themes. In this paper, we propose a cross-document coreference resolution algorithm based on an automatic text summary instead of the original text. In our approach, we extract a query-specific, informative-indicative summary from the original text using the Hobbs algorithm and measure the similarity between two summaries. This automatic text summary-based cross-document coreference resolution (ATSCDCR) system is effective both in disambiguating different entities that share the same mention name and in identifying the same entity under different mention names. Our experiments show that the ATSCDCR system achieves a macro average of up to 73.16% and a micro average of 67.34%.
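The abstract does not specify the similarity measure used between summaries; as a minimal sketch (assuming a simple bag-of-words representation, a common baseline for this step), cosine similarity between two extracted summaries could look like:

```python
import math
from collections import Counter

def cosine_similarity(summary_a: str, summary_b: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors."""
    a = Counter(summary_a.lower().split())
    b = Counter(summary_b.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical summaries for two documents mentioning "John Smith"
s1 = "john smith the novelist published a new book"
s2 = "john smith published a novel this year"
score = cosine_similarity(s1, s2)
```

A high score suggests the two mentions refer to the same entity; in practice the paper's system would apply a threshold tuned on development data.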
ISBN:
(Print) 3642152856
The proceedings contain 25 papers. The special focus in this conference is on Agents, Knowledge Acquisition, Data Mining, Machine Learning, Neural Nets and Intelligent Systems Engineering. The topics include: involving the human user in the control architecture of an autonomous agent; transferring hopscotch from the schoolyard to the classroom; social relationships as a means for identifying an individual in large information spaces; a methodology for inducing pre-pruned modular classification rules; enhancement of infrequent purchased product recommendation using data mining techniques; a machine learning approach to predicting winning patterns in track cycling omnium; learning motor control by dancing YMCA; analysis and comparison of probability transformations for fusing sensors with uncertain detection performance; a case-based approach to business process monitoring; a survey on the dynamic scheduling problem in astronomical observations; combining determinism and intuition through univariate decision strategies for target detection from multi-sensors; a UML profile oriented to the requirements modeling in intelligent tutoring systems projects; learning by collaboration in intelligent autonomous systems; full text search engine as scalable k-nearest neighbor recommendation system; following a developing story on the web; computer-aided estimation for the risk of development of gastric cancer by image processing; intelligent hybrid architecture for tourism services; knowledge-based geo-risk assessment for an intelligent measurement system; and case-based decision support in time dependent medical domains.
Feature selection has seen growing importance in statistics, pattern recognition, machine learning and data mining. Researchers have demonstrated interest in methods for improving the performance of t...
ISBN:
(Print) 9783642130588
This paper details preliminary research into modeling the behavior of Electronic Gaming Machines (EGMs) for the task of proactive fault diagnostics. EGMs operate within a state space, and therefore their behavior was modeled, using supervised learning, as the frequency at which a given machine operates in a particular state. The results indicated that EGMs exhibit measurably different behavior when they are about to experience a fault, and that these relationships were modeled effectively by several algorithms.
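The state-frequency representation described above can be sketched in a few lines; the state names below are hypothetical placeholders, not taken from the paper:

```python
from collections import Counter

# Hypothetical EGM operating states (illustrative only)
STATES = ["idle", "play", "payout", "tilt"]

def state_frequency_vector(state_log, states=STATES):
    """Feature vector: fraction of logged events spent in each state.

    This is the representation the paper describes feeding to a
    supervised learner (healthy vs. about-to-fault labels).
    """
    counts = Counter(state_log)
    total = len(state_log)
    return [counts[s] / total for s in states]

healthy = state_frequency_vector(["idle", "play", "play", "payout"])
faulty = state_frequency_vector(["tilt", "tilt", "play", "idle"])
```

Each machine's vector sums to 1, so machines with different traffic volumes remain comparable; a classifier would then be trained on labeled vectors.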
In this paper, a novel fuzzy support vector machine-based image watermarking scheme is proposed. Since the application of support vector machines in watermarking technology is only a simple classificatio...
User Navigation Behavior Mining (UNBM) mainly studies the problem of extracting interesting user access patterns from user access sequences (UAS), which are usually used for user access prediction and web page re...
ISBN:
(Print) 3642152856
The proceedings contain 25 papers. The topics discussed include: exer-learning games: transferring hopscotch from the schoolyard to the classroom; social relationships as a means for identifying an individual in large information spaces; J-PMCRI: a methodology for inducing pre-pruned modular classification rules; enhancement of infrequent purchased product recommendation using data mining techniques; a machine learning approach to predicting winning patterns in track cycling omnium; learning motor control by dancing YMCA; analysis and comparison of probability transformations for fusing sensors with uncertain detection performance; a case-based approach to business process monitoring; a survey on the dynamic scheduling problem in astronomical observations; and combining determinism and intuition through univariate decision strategies for target detection from multi-sensors.
ISBN:
(Print) 9783642148330
In this study a novel framework for data mining in clinical decision making has been proposed. Our framework addresses the problems of assessing and utilizing data mining models in the medical domain. The framework consists of three stages. The first stage preprocesses the data to improve its quality. The second stage employs the k-means clustering algorithm to cluster the data into k clusters (in our case, k=2, i.e. cluster0/no and cluster1/yes) in order to validate the class labels associated with the data. After clustering, the class label associated with each instance is compared with the label generated by the clustering algorithm; if both labels are the same, the instance is assumed to be correctly classified. Instances for which the labels differ are considered misclassified and are removed before further processing. In the third stage, support vector machine classification is applied. The classification model is validated using k-fold cross validation. The performance of the SVM (Support Vector Machine) classifier is also compared with a Naive Bayes classifier; in our case, the SVM classifier outperforms Naive Bayes. To validate the proposed framework, experiments were carried out on benchmark datasets such as the Indian Pima diabetes dataset and the Wisconsin breast cancer dataset (WBCD), obtained from the University of California at Irvine (UCI) machine learning repository. Our proposed approach obtained classification accuracy on both datasets that is better than that of the other classification algorithms applied to the same datasets as cited in the literature. The performance of the proposed framework was also evaluated using sensitivity and specificity measures.
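The second and third stages (k-means label filtering, then SVM with k-fold cross validation) can be sketched with scikit-learn. The toy two-cluster dataset below is an illustrative stand-in, not the Pima or WBCD data, and the cluster-to-label alignment step is an assumption since k-means cluster ids are arbitrary:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for a preprocessed two-class medical dataset
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)

# Stage 2: cluster into k=2 and keep only instances whose cluster
# assignment agrees with the recorded class label
cluster = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
if np.mean(cluster == y) < 0.5:   # align arbitrary cluster ids to labels
    cluster = 1 - cluster
mask = cluster == y
X_clean, y_clean = X[mask], y[mask]

# Stage 3: SVM classification validated with k-fold cross validation
scores = cross_val_score(SVC(), X_clean, y_clean, cv=5)
mean_accuracy = scores.mean()
```

On well-separated data the filter removes few instances and the cross-validated accuracy is high; on real clinical data the filtering step is what removes suspected label noise before training.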
ISBN:
(Print) 9781450304207
Modeling proximity search problems as a metric space provides a general framework usable in many areas, such as pattern recognition, web search, clustering, data mining, knowledge management, and textual and multimedia information retrieval, to name a few. Metric indexes have been improved over the years, and many instances of the problem can be solved efficiently. However, when very large or high-dimensional metric databases are indexed, exact approaches cannot yet solve the problem efficiently; their performance degrades to almost sequential search. To overcome this limitation, non-exact proximity searching algorithms can be used to give answers that are close to the exact result, either in probability or within an approximation factor. Approximation is acceptable in many contexts, especially when human judgement about closeness is involved. In vector spaces, on the other hand, there is a very successful approach dubbed Locality Sensitive Hashing, which consists of making a succinct representation of the objects. This succinct representation is relatively insensitive to small variations in locality. Unfortunately, the hashing functions have to be carefully designed, very close to the data model, and different functions are used when objects come from different domains. In this paper we give a new scheme to encode objects in a general metric space within a uniform framework, independent of the data model. Finally, we provide experimental support for our claims using several real-life databases with different data models and distance functions, obtaining excellent results in both speed and recall, especially for large databases. Copyright 2010 ACM.
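For the vector-space baseline the abstract contrasts itself with, a minimal random-hyperplane Locality Sensitive Hashing sketch (dimensions, hyperplane count, and seed below are arbitrary choices) shows how nearby points tend to share a succinct signature:

```python
import numpy as np

def simhash_signature(v, planes):
    """Random-hyperplane LSH: one sign bit per hyperplane."""
    return tuple(bool(b) for b in (planes @ v) > 0)

rng = np.random.default_rng(42)
planes = rng.normal(size=(16, 8))        # 16 random hyperplanes in R^8

x = rng.normal(size=8)
x_near = x + rng.normal(scale=0.01, size=8)   # tiny perturbation of x
x_far = rng.normal(size=8)                    # unrelated point

sig = simhash_signature(x, planes)
sig_near = simhash_signature(x_near, planes)
sig_far = simhash_signature(x_far, planes)

# Nearby points agree on most sign bits; unrelated points usually differ
agree_near = sum(a == b for a, b in zip(sig, sig_near))
agree_far = sum(a == b for a, b in zip(sig, sig_far))
```

This scheme only works when the data are vectors with a cosine-like distance, which is exactly the data-model dependence the paper's general metric-space encoding aims to remove.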
ISBN:
(Print) 9781605589084
A typical disaster recovery system has mirrored storage at a site that is geographically separate from the main operational site. In many cases, communication between the local site and the backup repository site is performed over a network that is inherently slow, such as a WAN, or is highly strained, for example during a whole-site disaster recovery operation. The goal of this work is to alleviate the performance impact of the network in such a scenario, and to do so using machine learning techniques. We focus on two main areas: prefetching and read-ahead size determination. In both cases we significantly improve the performance of the system. Our main contributions are as follows. We introduce a theoretical model of the system and the problem we are trying to solve, and bound the gain from prefetching techniques. We construct two frequent pattern mining algorithms and use them for prefetching; a framework for controlling and combining multiple prefetch algorithms is presented as well. These algorithms, along with various simple prefetch algorithms, are compared in a simulation environment. We introduce a novel algorithm for determining the amount of read-ahead on such a system, based on intuition from online competitive analysis and on regression techniques. The significant positive impact of this algorithm is demonstrated on IBM's FastBack system. Most of our improvements have been applied with little or no modification of the current implementation's internals. We therefore feel confident in stating that the techniques are general and are likely to have applications elsewhere. Copyright 2010 ACM.
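The paper's two frequent pattern mining algorithms are not reproduced here; a minimal sketch of the general idea, mining frequent successor pairs from a hypothetical block-access trace and turning them into prefetch rules, might look like:

```python
from collections import Counter, defaultdict

def mine_successor_rules(trace, min_support=2):
    """Count pairs (a, b) where block b is accessed immediately after a;
    pairs seen at least min_support times become prefetch rules:
    'after serving a, prefetch b over the slow link'."""
    pair_counts = Counter(zip(trace, trace[1:]))
    rules = defaultdict(list)
    for (a, b), n in pair_counts.items():
        if n >= min_support:
            rules[a].append(b)
    return rules

# Hypothetical block-access trace observed during a restore operation
trace = [1, 2, 3, 1, 2, 4, 1, 2, 3]
rules = mine_successor_rules(trace)
to_prefetch = rules[1]   # blocks worth fetching after block 1 is read
```

Over a slow WAN, fetching `to_prefetch` speculatively hides network latency when the pattern repeats; the paper's framework would arbitrate between several such predictors.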