ISBN: 3540454853 (print)
Large repositories of spatial data have been formed in various applications such as Geographic Information Systems (GIS), environmental studies, and banking. The increasing demand for knowledge residing inside these databases has attracted much attention to the field of spatial data mining. Because spatial databases are commonly complex and very large, efficiency is one of the main concerns in spatial knowledge discovery algorithms. In this paper, we introduce two novel nature-inspired algorithms for the efficient discovery of spatial trends, one of the most valuable patterns in spatial databases. The algorithms are developed using ant colony optimization and evolutionary search. We empirically study and compare the efficiency of the proposed algorithms on a real banking spatial database. The experimental results confirm the improvement in performance and effectiveness of the discovery process compared with previously proposed methods.
In this paper, a new learning method is proposed to build Support Vector Machine (SVM) Binary Decision Functions (BDFs) of reduced complexity and efficient generalization. The aim is to build a fast and efficient SVM classifier. A criterion is defined to evaluate the Decision Function Quality (DFQ), which blends the recognition rate and the complexity of a BDF. Vector Quantization (VQ) is used to simplify the training set. Model selection, covering the simplification level, a feature subset, and the SVM hyperparameters, is performed to optimize the DFQ. Because the search space for selecting the best model is huge, Tabu Search (TS) is used to find a good sub-optimal model in tractable time. Experimental results show the efficiency of the method.
Leave-one-out Cross Validation (LOO-CV) gives an almost unbiased estimate of the expected generalization error. However, the classical LOO-CV procedure with Support Vector Machines (SVMs) is very expensive and cannot be applied when the training set has more than a few hundred examples. We propose a new LOO-CV method that uses a modified initialization of the Sequential Minimal Optimization (SMO) algorithm for SVMs to speed up LOO-CV. Moreover, when SMO's stopping criterion is changed with our adaptive method, experimental results show that the speed-up of LOO-CV is greatly increased while the LOO error estimate remains very close to the exact LOO error estimate.
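To make the cost of the classical procedure concrete, the following is a minimal sketch of plain LOO-CV: the model is retrained once per example, so n examples mean n full trainings. It does not implement the SMO initialization or the adaptive stopping criterion described in the abstract. The nearest-centroid classifier is a toy stand-in for an SVM, and the names `nearest_centroid_fit` and `loo_error` are assumptions introduced for illustration.

```python
from typing import Callable, Hashable, List, Sequence

Vector = Sequence[float]


def nearest_centroid_fit(xs: List[Vector], ys: List[Hashable]) -> Callable[[Vector], Hashable]:
    """Toy stand-in for SVM training: store one centroid per class."""
    sums: dict = {}
    counts: dict = {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

    def predict(x: Vector) -> Hashable:
        # Return the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return predict


def loo_error(xs: List[Vector], ys: List[Hashable], fit=nearest_centroid_fit) -> float:
    """Classical LOO-CV: hold out each example once and retrain on the rest."""
    errors = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predict = fit(train_x, train_y)  # one full retraining per held-out example
        if predict(xs[i]) != ys[i]:
            errors += 1
    return errors / len(xs)
```

With an SVM in place of the toy classifier, each call to `fit` is a full optimization run, which is why the classical procedure becomes impractical beyond a few hundred examples.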
This paper presents a system for automatically detecting and filtering unsolicited electronic messages. The underlying hybrid filtering method is based on e-mail origin and content. The system classifies each of the three parts of an e-mail separately, using a single Bayesian filter together with a heuristic knowledge base. Instead of conducting a training stage on a historic body of pre-classified e-mails, the system extracts heuristic knowledge from a set of labelled words as the basis on which to begin filtering. The classifications resulting from each part are then integrated to achieve optimum effectiveness. The heuristic knowledge base allows the system to manage the growth of the filter vocabularies intelligently and thus ensures efficient classification. The system is dynamic and interactive, and the user's role is essential in keeping the system up to date with the evolution of spam through incremental machine learning. The user can interact with the system through a customized, friendly interface, in real time or at intervals of the user's choosing.
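As a rough illustration of how a Bayesian filter can start from labelled words rather than a training corpus, the sketch below combines per-word spam probabilities under a naive independence assumption and a uniform prior. This is a generic naive Bayes combination, not the filter described in the abstract. The function name `spam_probability`, the word table, and the clamping bounds are all assumptions made for the example.

```python
import math
from typing import Dict, Iterable


def spam_probability(
    tokens: Iterable[str],
    word_spam_prob: Dict[str, float],
    default: float = 0.5,
) -> float:
    """Combine per-word spam probabilities with a naive Bayes rule.

    word_spam_prob maps a lowercase word to an assumed P(spam | word).
    Words missing from the table get `default`, which leaves the score unchanged
    when it is 0.5.
    """
    log_odds = 0.0
    for token in tokens:
        p = word_spam_prob.get(token.lower(), default)
        p = min(max(p, 0.01), 0.99)  # clamp so no single word forces 0 or 1
        log_odds += math.log(p) - math.log(1.0 - p)
    # Numerically stable logistic function of the accumulated log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)
```

The same function could be applied to each part of a message in turn, with the per-part scores integrated afterwards; how the original system performs that integration is not stated in the abstract.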
Testing interactions in multi-agent systems is a complex task for several reasons. Agents are distributed and can move through different nodes in a network, so their interactions can occur concurrently and from many different sites. Also, agents are autonomous entities with a variety of possible behaviours, which can evolve during their lives by adapting to changes in the environment and to new interaction patterns. Furthermore, the number of agents can vary during system execution, from a few dozen to thousands or more. Therefore, the number of interactions can be huge, and it is difficult to follow their occurrence and relationships. To address these issues, we propose the use of a set of data mining tools, the ACLAnalyser, which processes the results of executing large-scale multi-agent systems in a monitored environment. It has been integrated with an agent development toolset, the INGENIAS Development Kit, in order to facilitate the verification of multi-agent system models at the design level rather than at the programming level.
In the telecom industry, high installation and marketing costs make it between six and ten times more expensive to acquire a new customer than to retain an existing one. Prediction and prevention of customer churn is therefore a key priority for industrial research. While the motives behind a customer's decision to churn are highly uncertain, many related temporal data sequences are generated as a result of customer interaction with the service provider. Existing churn prediction methods such as decision trees typically just classify customers into churners and non-churners while completely ignoring the timing of the churn event. Given the histories of other customers and the current customer's data, the presented model proposes a new k nearest sequence (kNS) algorithm, along with a temporal sequence fusion technique, to predict the whole remaining customer data sequence path up to the churn event. It is experimentally demonstrated that the new model better exploits time-ordered customer data sequences and surpasses existing churn prediction methods in terms of performance and offered capabilities.
Computational fluid dynamics (CFD) techniques are currently widely adopted to simulate the behaviour of fire, but they require extensive computer storage and lengthy computational time. Using CFD in the course of building design optimization is therefore theoretically feasible but time-consuming. This paper proposes the application of an artificial neural network (ANN) approach as a quick alternative to CFD models. A novel ANN model, denoted GRNNFA, has been developed specifically for fire studies. As the available training samples may not be sufficient to describe system behaviour, especially for fire data, additional knowledge of the system is acquired from a human expert. Expert intervention network training is developed to remedy the established system response surface. A genetic algorithm is applied to find a near-optimum set of the design parameters.
Measuring the length of the path for which a taxi must charge is an obvious task: when driving below a certain speed threshold the fare is time dependent, but at higher speeds the length of the path is measured, and the fare depends on that measure. When passing an indoor MOT test, the taximeter is calibrated by simulating a cab run while the taxi is placed on a device equipped with four rotating steel cylinders in contact with the drive wheels. This indoor measure might be inaccurate, as the information given by the cylinders is affected by tire inflation pressure, and only straight trajectories are tested. Moreover, modern vehicles with driving aids such as ABS, ESP or TCS might have their electronics damaged in the test, since two wheels are spinning while the others are not. To overcome these problems, we have designed a small, portable GPS sensor that periodically logs the coordinates of the vehicle and computes the length of a discretionary circuit. We will show that all the legal issues with the tolerance of such a procedure (GPS data are inherently imprecise) can be overcome if genetic and fuzzy techniques are used to process and analyze the raw data.
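As a baseline for what "computing the length of a circuit from logged coordinates" involves, the sketch below sums great-circle distances between consecutive GPS fixes. It treats every fix as exact, so it deliberately omits the genetic and fuzzy processing that the abstract uses to handle GPS imprecision. The function names and the mean Earth radius constant are assumptions chosen for the example.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in metres

Fix = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding slightly above the valid asin domain.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def path_length_m(fixes: Sequence[Fix]) -> float:
    """Length of the polyline through consecutive fixes; 0 for fewer than two fixes."""
    return sum(haversine_m(*p, *q) for p, q in zip(fixes, fixes[1:]))
```

Because each logged fix carries its own position error, a plain sum like this one tends to accumulate that error along the route, which is the tolerance problem the abstract refers to.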
The incidence of breast cancer varies greatly among countries, but statistics show that every year 720,000 new cases will be diagnosed worldwide. However, a high percentage of these cases can be completely cured if they are detected in early stages. Because symptoms are not visible until advanced stages, treatments become more aggressive and less effective. Therefore, it is necessary to develop new strategies to detect the formation in early stages. We have developed a tool based on a Case-Based Reasoning kernel for retrieving mammographic images by content analysis. One of the main difficulties is the introduction of knowledge and abstract concepts from the domain into the retrieval process. For this reason, the article proposes integrating human experts' perceptions into it by means of an interaction between human and system using a Relevance Feedback strategy. Furthermore, the strategy uses a Self-Organizing Map to cluster the memory and improve the interaction time.
In this paper we develop and analyze methods for expanding automated learning of Relevance Vector Machines (RVMs) to large-scale text sets. RVMs rely on Bayesian inference learning and, while maintaining state-of-the-art performance, offer sparse and probabilistic solutions. However, efforts towards applying RVMs to large-scale sets have met with limited success in the past, due to computational constraints. We propose a diversified set of divide-and-conquer approaches in which decomposition techniques promote the definition of smaller working sets that permit the use of all training examples. The rationale is that by exploring incremental, ensemble and boosting strategies, it is possible to improve classification performance, taking advantage of the large training set available. Results on Reuters-21578 and RCV1 are presented, showing performance gains while maintaining sparse solutions that can be deployed in distributed environments.