ISBN:
(print) 9783319089799; 9783319089782
Associative classification is a rule-based classification approach that has been applied in many real-world applications. An associative classifier is easily interpretable in terms of its classification rules. However, there is room for improvement when associative classification is applied to imbalanced classification tasks. Existing associative classification algorithms can be limited in their performance on highly imbalanced datasets in which the class of interest is the minority class. Our objective is to improve the accuracy of the associative classifier on highly imbalanced datasets. In this paper, an effective cost-sensitive rule ranking method, named SSCR (Statistically Significant Cost-sensitive Rules), is proposed to estimate the risk of a rule in classifying unseen data. The risk of a statistically significant association rule is estimated based on its classification cost induced from the training data. SSCR combines statistically significant association rules with cost-sensitive learning to build an associative classifier. Experimental results show that SSCR achieves the best performance in terms of true positive rate and recall on real-world imbalanced datasets, compared with CBA and C4.5.
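The abstract does not give SSCR's formulas, so the following is only a minimal sketch of the general idea: keep rules that pass a significance test, then rank them by their average misclassification cost on the training data. The one-sided Fisher exact test, the cost table, the rule representation (an antecedent item set plus a predicted class), and all function names are assumptions made for illustration, not the authors' definitions.

```python
from math import comb


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a: rule fires, class matches      b: rule fires, class differs
    c: rule silent, class matches     d: rule silent, class differs
    """
    n = a + b + c + d
    fired, matching = a + b, a + c
    upper = min(fired, matching)
    tail = sum(comb(matching, k) * comb(n - matching, fired - k)
               for k in range(a, upper + 1))
    return tail / comb(n, fired)


def average_cost(rule, rows, cost):
    """Mean misclassification cost of a rule over the training rows it covers."""
    antecedent, predicted = rule
    covered = [label for items, label in rows if antecedent <= items]
    if not covered:
        return float("inf")
    # cost maps (true_label, predicted_label) to a penalty; correct predictions cost 0
    return sum(cost.get((label, predicted), 0.0) for label in covered) / len(covered)


def rank_rules(rules, rows, cost, alpha=0.05):
    """Keep statistically significant rules, ordered from lowest to highest cost."""
    kept = []
    for rule in rules:
        antecedent, predicted = rule
        a = sum(1 for items, label in rows if antecedent <= items and label == predicted)
        b = sum(1 for items, label in rows if antecedent <= items and label != predicted)
        c = sum(1 for items, label in rows if not antecedent <= items and label == predicted)
        d = len(rows) - a - b - c
        if fisher_right_tail(a, b, c, d) <= alpha:
            kept.append(rule)
    return sorted(kept, key=lambda r: average_cost(r, rows, cost))
```

In this sketch a cost table such as `{("pos", "neg"): 10.0, ("neg", "pos"): 1.0}` penalises missing the minority class more heavily than a false alarm, which is the usual motivation for cost-sensitive learning on imbalanced data.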
Terrorist attacks are a challenging issue across the world and need the attention of practitioners to cope with them deliberately. Predicting the group responsible for an event is a complicated task due to the lack of i...
ISBN:
(digital) 9783319076263
ISBN:
(print) 9783319076263; 9783319076256
Data processing and analysis have become a major task in many application domains. Most tools for defining analytical processes lack a user-oriented interface, especially when it comes to Big Data analytics. In this work we propose an abstraction layer for process design that enables domain experts to define their processes at an abstract level that matches their expertise. Based on that, we investigate the use of machine learning to provide gesture recognition on input devices like tablets, giving these experts an intuitive environment for process design.
Typically, existing decision tree building algorithms use a single splitting criterion such as Gain Ratio or Gini Index. In this paper three existing splitting criteria are compared within the framework of the C4.5 de...
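For reference, the two splitting criteria named above can be computed as in the sketch below, using their standard textbook definitions for a categorical attribute. The function names are illustrative, and the third criterion mentioned in the truncated abstract is not identified there, so it is not shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(values, labels):
    """Group the class labels by the value of the candidate split attribute."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return list(groups.values())


def gain_ratio(values, labels):
    """Information gain divided by split information (the criterion used by C4.5)."""
    n = len(labels)
    parts = partition(values, labels)
    info_gain = entropy(labels) - sum(len(p) / n * entropy(p) for p in parts)
    split_info = -sum(len(p) / n * log2(len(p) / n) for p in parts)
    return info_gain / split_info if split_info > 0 else 0.0


def gini_gain(values, labels):
    """Reduction in Gini impurity achieved by splitting on the attribute."""
    n = len(labels)
    parts = partition(values, labels)
    return gini(labels) - sum(len(p) / n * gini(p) for p in parts)
```

A tree builder would evaluate one of these scores for every candidate attribute at a node and split on the attribute with the highest score.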
ISBN:
(print) 9781479930807
In this paper we propose an algorithm that can identify clusters of varied shape in a wide variety of input datasets with a high degree of accuracy in the presence of noise. The initial data processing module adopts a novel Artificial Immune System approach to reduce data redundancy while preserving the original data patterns. The clustering module pursues a density-based approach to identify clusters from the compressed dataset produced by the preprocessing module. We introduce several new concepts, such as Selective Antigenic Binding, Local Reachability Factor, and Global Reachability Factor, to effectively recognize clusters with varied shape, varied density, and low inter-cluster separation at acceptable computational cost. We performed an experimental evaluation of our algorithm on a wide variety of real and synthetic datasets and obtained a higher cluster success rate than DBSCAN on all datasets.
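The abstract does not define its new reachability factors, so they cannot be reproduced here. As context for the density-based approach, the sketch below shows a minimal version of the DBSCAN baseline it is compared against. It uses a brute-force neighbour search, and the parameter names `eps` and `min_pts` follow common usage rather than anything stated in the abstract.

```python
def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""

    def neighbors(i):
        # Brute-force search: every point within distance eps of point i (including i).
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps * eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, so keep expanding
    return labels
```

Because clusters grow by chaining dense neighbourhoods rather than by distance to a centroid, this family of methods can follow arbitrarily shaped clusters, but a single global `eps` makes varied densities hard to handle, which is the limitation the abstract targets.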
ISBN:
(print) 9783319089799; 9783319089782
A discriminative dictionary-based approach to supporting the classification of 3D Optical Coherence Tomography (OCT) retinal images, so as to determine the presence of Age-related Macular Degeneration (AMD), is described. AMD is one of the leading causes of blindness in people aged over 50 years. The proposed approach is founded on the concept of a uniform 3D image decomposition into a set of sub-volumes, where each sub-volume is described in terms of a "spatial gradient" histogram, which in turn is used to define a set of feature vectors (one per sub-volume). Feature selection is conducted using the maximum sum of the squared values of each feature vector for each sub-volume. After that, a "coding-pooling" framework is applied so that each image is represented as a single feature vector. The "coding-pooling" framework generates a representative subset of feature vectors called a dictionary, and then uses this dictionary as a guide for the generation of a single feature vector for each volume. Experiments conducted using the proposed approach, in comparison with a range of alternatives, indicated that the approach outperformed other existing methods, with an accuracy of 95.2%, a sensitivity of 95.7%, and a specificity of 94.6%.
The 2014 edition of the Linked Data Mining Challenge, conducted in conjunction with Know@LOD 2014, was the third edition of this challenge. The underlying data came from two domains: public procurement and researcher collaboration. As in the previous year, when the challenge was held at the Data Mining on Linked Data workshop co-located with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2013), the response to the challenge appeared lower than expected, with only one solution submitted for the predictive task this year. We have tried to track the reasons for the continuously low participation in the challenge via a questionnaire survey, and have distilled principles that could help organizers of future similar challenges.
ISBN:
(print) 9783038352396
The proceedings contain 294 papers. The special focus in this conference is on Information Technology and Management Innovation. The topics include: a composite model via proportional intensity function and additive hazard function; a research using correlation coefficient to make Bayesian classification data mining; grey target decision model of hesitant three-parameter interval grey number; trust calculation model in knowledge community; one fault diagnosis method based on fuzzy equal relationship and rough set theory; cloud computing application pattern library; the design and implementation of cloud computing model and platform; research of crop production system based on the CPS framework; application of data mining in sports in the consumer market segmentation; the design for zero emission greenhouse system; body recognition based on depth image; online tracking via stabilizer and attractor; a framework for stream data mining over wireless sensor network; design of a pipeline leak detection system; methods of path identification and processing in freescale intelligence car; design and implementation of an intelligent home control system; the design and analysis of control system applied in artificial grass tufting machine; the vector control system of induction motor based on fuzzy control; intelligent track guide control for hearth negative pressure; a polarization stabilization control system based on DSP; design of communication for commodity trading system; a well modularized computer network architecture; security protection and solving procedures of university network; design and implement of commodity client electronic trading system; predicting the shelf-life of single base propellants; and the application of multimedia technology in mathematics teaching.
ISBN:
(print) 9781479933587
In pattern recognition and image processing, feature extraction is a special type of dimensionality reduction. In data mining, attribute subset selection (or feature subset selection) normally helps with data reduction by removing unrelated and redundant dimensions. Given a set of image data, features are extracted. From the extracted features, feature subset selection finds the subset of features that is most relevant to the data mining task. The efficiency and effectiveness of the feature selection algorithm are evaluated: efficiency concerns the time required to find a subset of features, while effectiveness is related to the proportion of the selected features. Based on these criteria, we have used the Spatial Gray Level Difference Method (SGLDM) feature extraction algorithm and Correlation-based Feature Selection (CFS). A Projected Classification algorithm (PROCLASS) is to be proposed for brain image data. Experiments will compare these plug-in algorithms with the FAST and FCBF feature selection algorithms.
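To make the CFS step concrete, the sketch below scores a feature subset with the standard CFS merit, k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. Using absolute Pearson correlation and a greedy forward search are assumptions made for illustration; the abstract does not say which correlation measure or search strategy the authors use.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def cfs_merit(features, target, subset):
    """CFS merit: rewards correlation with the class, penalises redundancy."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def forward_select(features, target):
    """Greedily add the feature that most improves the merit; stop when none does."""
    selected, best = [], 0.0
    remaining = set(features)
    while remaining:
        merit, name = max((cfs_merit(features, target, selected + [f]), f)
                          for f in remaining)
        if merit <= best:
            break
        selected.append(name)
        best = merit
        remaining.remove(name)
    return selected, best
```

Here `features` is a hypothetical dictionary mapping a feature name to its column of values. A feature that is highly correlated with one already selected raises r_ff and so lowers the merit, which is how redundant dimensions are left out.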
ISBN:
(print) 9781450327459
Topic modelling has been widely used in the fields of information retrieval, text mining, machine learning, etc. In this paper, we propose a novel model, the Pattern Enhanced Topic Model (PETM), which improves topic modelling by semantically representing topics with discriminative patterns, and also contributes to information filtering by using PETM to determine document relevance based on topic distribution and the maximum matched patterns proposed in this paper. Extensive experiments are conducted to evaluate the effectiveness of PETM using the TREC data collection Reuters Corpus Volume 1. The results show that the proposed model significantly outperforms both state-of-the-art term-based models and pattern-based models.