ISBN (print): 3540335846
Recently, negative association rule mining has received some attention and has proved useful. This paper proposes an extended form for negative association rules and defines extended negative association rules. Furthermore, a corresponding algorithm is devised for mining extended negative association rules. The extended form is more general and expressive than the three existing forms. The proposed mining algorithm overcomes some limitations of previous mining methods, and experimental results show that it is efficient on simple and sparse datasets when the minimum support is sufficiently high. This work extends the applications of negative association rules to a broader range.
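The support and confidence of a negative rule can be made concrete with a minimal sketch for one common form, A => not B. Here "not B" means that the itemset B does not occur in full. The sketch relies on the identity supp(A and not B) = supp(A) - supp(A union B). It does not reproduce the paper's extended form, and the function names are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def negative_rule_stats(transactions, antecedent, consequent):
    """Support and confidence of the negative rule A => not B.

    Uses supp(A and not B) = supp(A) - supp(A union B), so the negative
    itemset never has to be counted directly.
    """
    supp_a = support(transactions, antecedent)
    supp_ab = support(transactions, set(antecedent) | set(consequent))
    supp_a_not_b = supp_a - supp_ab
    conf = supp_a_not_b / supp_a if supp_a > 0 else 0.0
    return supp_a_not_b, conf


transactions = [{"bread", "milk"}, {"bread", "beer"}, {"bread", "milk", "eggs"}, {"beer"}]
print(negative_rule_stats(transactions, {"bread"}, {"beer"}))  # support 0.5, confidence about 0.667
```

The identity matters for efficiency. Counting positive itemsets is what frequent-itemset miners already do, so negative rules can be scored from those counts without materializing "absent item" columns.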
ISBN (print): 9781424405039
It is generally recognized that recursive partitioning, as used in the construction of classification trees, is inherently unstable, particularly for small data sets. Classification accuracy and, by implication, tree structure are sensitive to changes in the training data. Successful approaches to counteract this effect include multiple classifiers, e.g. boosting, bagging or windowing. The downside of these multiple classification models, however, is the plethora of trees that result, which often makes it difficult to extract the classifier in a meaningful manner. We show that, by using some very weak knowledge in the sampling stage, when the data set is partitioned into training and test sets, a single decision tree classifier achieves more consistent and improved performance. The reductions in error rate attained are comparable with those attained using boosting. In addition, we demonstrate that combining such sampling with boosting yields significant reductions in error rates.
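The abstract does not say what the "very weak knowledge" used at the sampling stage is. Purely as an assumption for illustration, the sketch below uses one simple kind of such knowledge, the class labels. It stratifies the train/test partition so that both sets keep the original class proportions. The function name and parameters are hypothetical, and the paper's scheme may differ.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) so each class keeps its proportion in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # random order within each class
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


labels = ["a"] * 10 + ["b"] * 20
train_idx, test_idx = stratified_split(labels, test_fraction=0.3, seed=1)
```

A plain random split of a small data set can leave a minority class badly under- or over-represented in the training set. That is one source of the tree-to-tree variance the abstract describes.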
ISBN (print): 3540335846
Traditional RAID has two characteristics: the location of each stripe unit on a disk is random and static, and the outer zone of a disk has a higher data transfer rate than the inner zone. To fully exploit RAID I/O performance under these conditions, this paper proposes a new algorithm, PMSH (Placement and Migration based on Stripe unit Heat), which places RAID stripe-unit data optimally and migrates it dynamically. Based on the heat of each RAID stripe unit, PMSH continually migrates frequently accessed stripe units to the disk zones with higher data transfer rates. This optimizes the location of data on the RAID disks and lets the data distribution adapt dynamically to the evolving file access pattern. Simulation results demonstrate significant RAID I/O performance improvement using PMSH.
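The abstract does not give PMSH's exact heat metric or migration policy. The sketch below is a minimal illustration under several assumptions:

- heat is a hypothetical score made of exponentially decayed access counts, using an assumed `DECAY` weight;
- zones are listed from outermost (fastest) to innermost;
- each zone holds a fixed number of stripe units.

The sketch ranks units by heat, fills the outer zones first, and reports which units would have to move.

```python
DECAY = 0.5  # hypothetical weight kept from past heat at each update


def update_heat(heat, accesses, decay=DECAY):
    """Blend old heat with the access counts observed in the latest interval."""
    units = set(heat) | set(accesses)
    return {u: decay * heat.get(u, 0.0) + accesses.get(u, 0) for u in units}


def place_by_heat(heat, zone_capacities):
    """Map stripe units to zones; zone 0 is the outermost (fastest) zone.

    The hottest units fill the outer zones first; ties are broken by unit id.
    """
    ranked = sorted(heat, key=lambda u: (-heat[u], u))
    placement, zone, used = {}, 0, 0
    for unit in ranked:
        # advance to the next zone once the current one is full
        while zone < len(zone_capacities) and used >= zone_capacities[zone]:
            zone, used = zone + 1, 0
        if zone == len(zone_capacities):
            raise ValueError("not enough zone capacity for all stripe units")
        placement[unit] = zone
        used += 1
    return placement


def migrations(old, new):
    """Units whose zone changed, as {unit: (old_zone, new_zone)}."""
    return {u: (old.get(u), z) for u, z in new.items() if old.get(u) != z}


heat = update_heat({}, {"s1": 10, "s2": 1, "s3": 5})
old = place_by_heat(heat, [1, 2])  # s1 goes to the outer zone
heat = update_heat(heat, {"s2": 20})  # s2 becomes hot
new = place_by_heat(heat, [1, 2])
print(migrations(old, new))  # s1 moves inward, s2 moves outward
```

Decaying the old heat instead of summing raw counts lets the placement follow shifts in the access pattern. A real policy would also weigh the I/O cost of each migration against its expected benefit.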
ISBN (print): 9728865554
We have designed and developed a general knowledge representation tool, an expert system shell called McESE (McMaster Expert System Environment); it derives a set of production (decision) rules of a very general form. Such a production set can be equivalently represented as a decision tree. McESE has several parameters, such as the weights, thresholds, and certainty propagation functions, that have to be adjusted (designed) for a given problem, for instance from a given set of training examples. Traditional machine learning (ML) or data mining (DM) algorithms can be used to induce these parameters. In this methodological case study, we discuss an application of genetic algorithms (GAs) to adjust (generate) the parameters of a given tree, which can then be used in the rule-based expert system shell McESE. The only requirement is that a set of McESE decision rules (or, more precisely, the topology of a decision tree) be given.
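The general idea of GA-based parameter tuning over a fixed rule topology can be illustrated with a minimal sketch. It assumes a hypothetical single rule that fires when a weighted sum of evidence reaches a threshold, and it evolves the weights and threshold to maximize training accuracy. McESE's actual parameters, including its certainty propagation functions, are not modeled. The operators (truncation selection, one-point crossover, Gaussian mutation) are generic choices, not the paper's.

```python
import random


def predict(params, x):
    """Hypothetical rule: fire (class 1) if weighted evidence reaches the threshold."""
    *weights, threshold = params
    return int(sum(w * xi for w, xi in zip(weights, x)) >= threshold)


def fitness(params, examples):
    """Training accuracy of the rule under `params`."""
    return sum(predict(params, x) == y for x, y in examples) / len(examples)


def evolve(examples, n_params, pop_size=30, generations=40, mut_sd=0.2, seed=0):
    """Evolve a parameter vector (weights followed by threshold); needs n_params >= 2."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_params)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(p, examples), reverse=True)
        elite = pop[: pop_size // 2]  # truncation selection; elites survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_params)  # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([g + rng.gauss(0, mut_sd) for g in child])  # Gaussian mutation
        pop = elite + children
    return max(pop, key=lambda p: fitness(p, examples))


# Training examples for a two-input AND-like rule: ((x1, x2), label)
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
best = evolve(examples, n_params=3)
```

Because the topology is fixed, the GA only searches a small real-valued parameter vector. This matches the abstract's requirement that the tree structure be given in advance.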
ISBN (print): 3540464840
Determining the relevant features is a combinatorial task in various fields of machine learning, such as text mining, bioinformatics, and pattern recognition. Several scholars have developed methods to extract the relevant features, but no method is clearly superior. Breiman proposed Random Forest, which classifies a pattern based on the CART tree algorithm, and his method produces good results compared with other classifiers. Taking advantage of Random Forest and using the wrapper approach first introduced by Kohavi et al., we propose an algorithm named Dynamic Recursive Feature Elimination (DRFE). DRFE finds the optimal subset of features, which reduces noise in the data and increases classifier performance. In our method, we use Random Forest as the induced classifier and define our own feature elimination function by adding extra terms to the feature scoring. We conducted experiments on two public datasets: Colon cancer and Leukemia. The experimental results on these real-world data show that the proposed method has a higher prediction rate than the baseline algorithm. The results are comparable to, and sometimes better than, those of widely used classification methods in the feature-selection literature.
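The backward-elimination loop behind recursive feature elimination can be sketched as follows. This is plain RFE, not DRFE: the abstract does not give the paper's extra scoring terms. To keep the sketch self-contained, a simple absolute-correlation scorer stands in for Random Forest importance. All function names are hypothetical. With scikit-learn installed, one could instead pass a function that fits a `RandomForestClassifier` and returns its `feature_importances_`.

```python
def recursive_feature_elimination(X, y, score_features, n_keep, step=1):
    """Generic RFE: repeatedly rescore the surviving features and drop the weakest.

    score_features(X_sub, y) must return one score per column of X_sub
    (higher means more useful). Returns the indices of the kept features.
    """
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        scores = score_features(X_sub, y)
        n_drop = min(step, len(active) - n_keep)
        # positions (within `active`) of the lowest-scoring features
        worst = set(sorted(range(len(active)), key=lambda k: scores[k])[:n_drop])
        active = [f for k, f in enumerate(active) if k not in worst]
    return active


def abs_correlation_scores(X, y):
    """Stand-in scorer: |Pearson correlation| of each column with the label."""
    n = len(y)
    my = sum(y) / n
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mx = sum(col) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(col, y))
        vx = sum((a - mx) ** 2 for a in col)
        vy = sum((b - my) ** 2 for b in y)
        scores.append(abs(cov) / (vx * vy) ** 0.5 if vx and vy else 0.0)
    return scores


X = [[0, 5, 1], [0, 5, 0], [1, 5, 1], [1, 5, 0]]  # column 0 tracks y; 1 and 2 do not
y = [0, 0, 1, 1]
print(recursive_feature_elimination(X, y, abs_correlation_scores, n_keep=1))  # [0]
```

Rescoring after every elimination is what makes the procedure recursive. A feature's usefulness can change once correlated or redundant features have been removed.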
ISBN (print): 9781586036157
One common source of error in data is missing value fields. Imputation is a widely used technique in the preprocessing phase of data mining, in which missing values are replaced by estimated values. Previous work tries to recover the "original" values according to specific criteria, such as statistical measures. However, in cost-sensitive learning, minimal overall cost is the most important issue. A value that minimizes the total cost is therefore preferred over the value that is "best" in the common-sense meaning. For example, in medical domains some data fields are often left absent because the known information is already sufficient for a decision. In this paper, we propose a new method for the problem of "missing or absent values?" in cost-sensitive learning. Experimental results show some improvement in cost-sensitive decision trees when missing and absent data are distinguished.
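The abstract notes that a value may be cost-optimally left absent. A standard expected-cost comparison makes this concrete: acquiring a missing value is worthwhile only if the test cost plus the expected misclassification cost after observing it is lower than the expected cost of deciding without it. This is a generic sketch under that assumption, not the authors' method. The function names are hypothetical, and the numbers in the usage example are made up.

```python
def expected_cost(probs, cost):
    """Lowest expected misclassification cost over all possible predictions.

    probs[c]: P(true class = c); cost[p][c]: cost of predicting p when truth is c.
    """
    return min(sum(probs[c] * cost[p][c] for c in range(len(probs)))
               for p in range(len(cost)))


def should_acquire(prior, value_probs, posteriors, cost, test_cost):
    """Compare deciding now against paying `test_cost` to observe one attribute.

    value_probs[v]: P(attribute = v); posteriors[v]: class distribution given v.
    Returns (acquire?, expected cost without the test, expected cost with it).
    """
    cost_without = expected_cost(prior, cost)
    cost_with = test_cost + sum(pv * expected_cost(post, cost)
                                for pv, post in zip(value_probs, posteriors))
    return cost_with < cost_without, cost_without, cost_with


# Classes: 0 = healthy, 1 = ill. Missing an illness costs 100; a false alarm costs 10.
cost = [[0, 100], [10, 0]]
prior = [0.9, 0.1]
value_probs = [0.5, 0.5]
posteriors = [[1.0, 0.0], [0.8, 0.2]]
print(should_acquire(prior, value_probs, posteriors, cost, test_cost=2))  # acquire: 6 < 9
print(should_acquire(prior, value_probs, posteriors, cost, test_cost=7))  # leave absent: 11 > 9
```

When the comparison says "leave absent", the field is absent by design rather than missing by accident. This is the distinction the abstract draws between the two kinds of empty values.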
ISBN (print): 0769525210
Feature selection has attracted much interest from researchers in many fields, such as pattern recognition and data mining. In this paper, a novel algorithm for feature selection is developed. The proposed algorithm uses the standard linear SVM algorithm and runs iteratively. Feature selection is carried out by assigning weights to features. Experimental results on UCI datasets and face images confirm the feasibility and validity of the proposed method.
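One common way to turn a linear SVM's weights into iterative feature selection is to retrain and then drop the feature with the smallest |w_j|, in the style of SVM-RFE. The sketch below assumes that scheme and a Pegasos-style subgradient solver without a bias term, with labels in {-1, +1}. The paper's actual weighting and stopping rule may differ, and the function names are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient training of a linear SVM (no bias); y in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink (regularization step)
            if margin < 1:  # hinge loss active: move toward the example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_feature_selection(X, y, n_keep, **svm_args):
    """Iteratively retrain the SVM and drop the feature with the smallest |weight|."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        w = train_linear_svm([[row[j] for j in active] for row in X], y, **svm_args)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]
    return active


# Feature 0 separates the classes; feature 1 is small noise; feature 2 is constant zero.
X = [[1, 0.1, 0.0], [1, -0.1, 0.0], [-1, 0.1, 0.0], [-1, -0.1, 0.0]]
y = [1, 1, -1, -1]
print(svm_feature_selection(X, y, n_keep=1))  # [0]
```

Retraining after each removal matters. Weights from a single fit can be misleading when features are correlated, and that is the usual argument for the iterative form over a one-shot ranking.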
We introduce some improvements to the dynamic learning vector quantization algorithm we previously proposed. The improvements tackle the two major problems of such networks: neuron over-splitting and the distribution of neurons in the feature space. We propose explicitly estimating the potential improvement in recognition rate achievable by splitting neurons in those regions of the feature space where two or more classes overlap. We also propose computing the neuron splitting frequency and combining this information to select the most promising neuron to split. Experimental results on both synthetic data and real data from the UCI Machine Learning Repository show substantial improvements of the proposed algorithm over the state of the art.
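The abstract does not give the splitting criterion. The sketch below pairs the standard LVQ1 update with a hypothetical split priority: a neuron's misclassified wins in the last epoch, divided by one plus the number of times it has already been split. The misclassification count is a crude proxy for the authors' estimate of potential improvement in overlap regions. The split count stands in for their splitting frequency, which guards against over-splitting. All names are hypothetical.

```python
import math


def nearest(prototypes, x):
    """Index of the prototype closest to x (Euclidean distance)."""
    return min(range(len(prototypes)), key=lambda k: math.dist(prototypes[k][0], x))


def lvq1_epoch(prototypes, X, y, lr=0.1):
    """One LVQ1 pass over the data; updates `prototypes` in place.

    prototypes: list of [vector, label]. Returns per-prototype counts of
    samples the prototype won but labelled wrongly.
    """
    errors = [0] * len(prototypes)
    for x, label in zip(X, y):
        k = nearest(prototypes, x)
        vec, proto_label = prototypes[k]
        sign = 1 if proto_label == label else -1  # attract if correct, repel if wrong
        if sign < 0:
            errors[k] += 1
        prototypes[k][0] = [v + sign * lr * (xi - v) for v, xi in zip(vec, x)]
    return errors


def pick_neuron_to_split(errors, split_counts):
    """Hypothetical priority: many errors, penalized by how often a neuron was split."""
    scores = [e / (1 + s) for e, s in zip(errors, split_counts)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] > 0 else None


protos = [[[0.0, 0.0], "a"], [[1.0, 1.0], "b"]]
X = [[0.1, 0.1], [0.9, 0.9], [0.2, 0.2]]  # the last point is a "b" inside "a" territory
y = ["a", "b", "b"]
errors = lvq1_epoch(protos, X, y)
print(errors, pick_neuron_to_split(errors, [0, 0]))  # neuron 0 is the split candidate
```

In a dynamic LVQ, the chosen neuron would then be duplicated, for example with a copy labelled with the class it keeps misclassifying. The split-count penalty is one simple way to stop the same neuron from being split repeatedly.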
ISBN (print): 0889865787
High-throughput genome-wide measurements of gene transcript levels have become available with the recent development of microarray technology. Intelligent and efficient mathematical and computational analysis tools are needed to read and interpret the information buried in these large-scale gene expression patterns at various levels of resolution, but the development of such methods is still in its infancy. Modern machine learning and data mining techniques based on information theory, such as independent component analysis (ICA), treat gene expression patterns as a superposition of independent expression modes, which are regarded as putative independent biological processes. We focus on two widely used ICA algorithms to blindly decompose gene expression profiles into independent component profiles representing underlying biological processes. These exploratory methods can detect local or global similarity in gene expression patterns and help group genes into functional categories. One example is genes that are expressed to a greater or lesser extent in response to a drug or an existing disease.
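The superposition model that ICA assumes can be written compactly. The version below uses one common convention (some works transpose the matrices) and assumes the square case, with as many components as observed profiles:

$$
X = A\,S, \qquad \hat{S} = W X, \qquad W \approx A^{-1}
$$

Here each row of $X \in \mathbb{R}^{m \times n}$ is one observed expression profile (e.g. one experimental condition) over $n$ genes. The rows of $S \in \mathbb{R}^{m \times n}$ are the independent expression modes, and $A \in \mathbb{R}^{m \times m}$ is the unknown mixing matrix. An ICA algorithm estimates the unmixing matrix $W$ so that the rows of $\hat{S}$ are as statistically independent (in practice, as non-Gaussian) as possible. The components are recovered only up to permutation and scaling, so the ordering and sign of each mode carry no meaning by themselves.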
We introduce a generalization to the multiclass framework of a previous approach to boosting by constructing symmetric functions. This approach contrasts with the usual AdaBoost-type boosting algorithms that use linear separators. Indeed, multiclass induction does not require combination tricks such as those needed for linear separators. The approach also achieves some novel agnostic learning properties, as well as significant tolerance to malicious noise. Experiments on a large testbed against AdaBoost and C4.5 demonstrate the efficiency of the proposed approach.