An efficient low-level word image representation plays a crucial role in general cursive word recognition. This paper proposes a novel representation scheme, where a word image can be represented as two sequences of f...
Steering an autonomous vehicle requires continual adaptation of behavior to the various situations the vehicle is in. This paper describes research that implements such adaptation and optimization b...
Several cost-sensitive boosting algorithms have been reported as effective methods for dealing with the class imbalance problem. Misclassification costs, which reflect the different levels of class identification importance...
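As a rough illustration of the idea behind cost-sensitive boosting, the sketch below shows a generic cost-weighted, AdaBoost-style reweighting step; the cost values and the exact update form are illustrative assumptions, not the specific algorithms this abstract surveys.

```python
# A generic sketch of a cost-sensitive AdaBoost-style weight update, meant
# only to show how misclassification costs can enter boosting. The cost
# dictionary and the update form are assumptions for illustration.
import numpy as np

def cost_sensitive_reweight(weights, y_true, y_pred, alpha, costs):
    """Scale each example's boosting weight by its class-dependent cost.

    weights : current example weights (sum to 1)
    alpha   : weight of the current weak learner
    costs   : per-class misclassification cost, e.g. {0: 1.0, 1: 5.0}
    """
    cost_vec = np.array([costs[c] for c in y_true])
    mis = (y_true != y_pred).astype(float)
    # Misclassified examples are up-weighted more strongly when their class
    # carries a higher misclassification cost.
    new_weights = weights * np.exp(alpha * cost_vec * mis)
    return new_weights / new_weights.sum()
```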
ISBN: (Print) 0889865078
The self-organizing map (SOM) is a common methodology used to capture and represent data patterns and is increasingly playing a significant role in the development of neural networks. The primary objective of an SOM is to determine an approximate representation of data with an unknown probability distribution, from a multi-dimensional input space, using a lower-dimensional neural network. The approximation by the network corresponds to the topological structure inherent in the data distribution. The classical SOM, and many of its variations such as the growing grid, construct the network based on randomly selected pieces of the input space, where the number of pieces increases over time. We give an overview of a parallel algorithm for the SOM (ParaSOM), which instead examines the entire input in each step, leading to a more accurate representation of input patterns after only a fraction of the iterations, albeit requiring significantly more time. Both the growing grid and ParaSOM, unlike the classical SOM, do not maintain a fixed number of neurons. Instead, their networks may grow and increase in density to match the input space. We present a comparison of results generated by implementations of ParaSOM and the growing grid, making apparent their considerable performance differences despite having the growth feature in common.
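For reference, the classical SOM update that ParaSOM departs from can be sketched as follows; the grid size, learning-rate schedule, and neighborhood function below are illustrative assumptions, not the ParaSOM algorithm itself.

```python
# A minimal sketch of the classical SOM training loop: one randomly drawn
# sample per step, a best-matching unit, and a decaying Gaussian neighborhood.
# All hyperparameters are illustrative assumptions.
import numpy as np

def train_som(data, grid_shape=(10, 10), iters=1000, lr0=0.5, sigma0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    h, w = grid_shape
    dim = data.shape[1]
    weights = rng.random((h, w, dim))              # neuron codebook vectors
    # Grid coordinates, used to measure neighborhood distance on the map.
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
    for t in range(iters):
        x = data[rng.integers(len(data))]          # classical SOM: one random sample per step
        # Best-matching unit (BMU): neuron whose weight vector is closest to x.
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), (h, w))
        # Learning rate and neighborhood radius decay over time.
        lr = lr0 * np.exp(-t / iters)
        sigma = sigma0 * np.exp(-t / iters)
        # Gaussian neighborhood around the BMU on the grid.
        grid_dist = np.linalg.norm(coords - np.array(bmu), axis=-1)
        influence = np.exp(-(grid_dist ** 2) / (2 * sigma ** 2))[..., None]
        weights += lr * influence * (x - weights)  # pull neighbors toward the sample
    return weights
```

ParaSOM, as described above, would instead process the entire input in each step rather than one randomly selected sample.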
A feature selection method for text classification based on information gain ranking, improved by removing redundant terms using a mutual information measure and an inclusion index, is proposed. We report an experiment to st...
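A rough sketch of the kind of pipeline described above, information-gain ranking followed by a mutual-information redundancy filter, is given below; the thresholds, the dense term-count input, and the omission of the inclusion index are simplifying assumptions for illustration.

```python
# Information-gain ranking followed by a pairwise mutual-information
# redundancy filter. X is assumed to be a dense array of discrete term
# counts; top_k and the redundancy threshold are illustrative assumptions.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def select_terms(X, y, top_k=500, redundancy_threshold=0.7):
    # Rank terms by information gain with respect to the class labels
    # (estimated here as I(term; class)).
    ig = mutual_info_classif(X, y, discrete_features=True)
    ranked = np.argsort(ig)[::-1][:top_k]
    selected = []
    for term in ranked:
        # Drop a candidate term if it is too strongly dependent on an
        # already-selected one (pairwise mutual information).
        redundant = any(
            mutual_info_score(X[:, term], X[:, kept]) > redundancy_threshold
            for kept in selected
        )
        if not redundant:
            selected.append(term)
    return selected
```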
Data mining is one of the most important areas in the 21st century, with many wide-ranging applications. These include medicine, finance, commerce and engineering. Pattern mining is amongst the most important and challenging techniques employed in data mining. Patterns are collections of items which satisfy certain properties. Emerging patterns are those whose frequencies change significantly from one dataset to another. They represent strong contrast knowledge and have been shown to be very successful for constructing accurate and robust classifiers. In this paper, we examine various kinds of contrast patterns. We also investigate efficient pattern mining techniques and discuss how to exploit patterns to construct effective classifiers.
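The defining property of an emerging pattern, a large growth in support from one dataset to another, can be illustrated with a small brute-force sketch; the support and growth-rate thresholds are illustrative assumptions, and practical miners avoid this exhaustive enumeration.

```python
# Brute-force mining of emerging patterns by growth rate between two
# transaction collections. Exponential in the number of items, so this is
# only meant to illustrate the definition.
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions containing every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def emerging_patterns(d1, d2, items, max_len=3, min_support=0.05, min_growth=5.0):
    """Itemsets whose support grows from d1 to d2 by at least min_growth."""
    patterns = []
    for k in range(1, max_len + 1):
        for itemset in combinations(sorted(items), k):
            s1, s2 = support(itemset, d1), support(itemset, d2)
            if s2 < min_support:
                continue
            growth = float("inf") if s1 == 0 else s2 / s1
            if growth >= min_growth:
                patterns.append((itemset, growth))
    return patterns

# Example: contrast two tiny transaction collections.
d1 = [{"a", "b"}, {"b"}, {"a", "c"}]
d2 = [{"a", "b", "c"}, {"b", "c"}, {"c"}]
print(emerging_patterns(d1, d2, items={"a", "b", "c"}))
```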
This paper uses a set of 3D geometric measures to characterize lung nodules as malignant or benign. Based on a sample of 36 nodules, 29 benign and 7 malignant, these measures are analyzed with a tec...
ISBN: (Print) 0769522971
The k-nearest neighbor (KNN) classification is a simple and effective classification approach. However, improving the performance of the classifier is still attractive. Combining multiple classifiers is an effective technique for improving accuracy. There are many general combining algorithms, such as Bagging, Boosting, or Error Correcting Output Coding, that significantly improve classifiers such as decision trees, rule learners, or neural networks. Unfortunately, these combining methods do not improve nearest neighbor classifiers. In this paper, we first present a new approach to combining multiple KNN classifiers based on different distance functions, in which we apply multiple distance functions to improve the performance of the k-nearest neighbor classifier. Second, we develop a combining method in which the weights of the distance functions are learnt by a genetic algorithm. Finally, combining classifiers in error-correcting output coding is discussed. The proposed algorithms seek to increase generalization accuracy when compared to the basic k-nearest neighbor algorithm. Experiments have been conducted on some benchmark datasets from the UCI Machine Learning Repository. The results show that the proposed algorithms improve the performance of the k-nearest neighbor classification.
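A minimal sketch of the first idea, combining KNN classifiers built on different distance functions through weighted voting, is shown below; the metric list, k, and uniform default weights are assumptions, and the paper's GA-learnt weights would take the place of the `weights` argument.

```python
# Combine k-nearest neighbor classifiers that use different distance metrics
# by (optionally weighted) majority vote. Metrics, k, and the default equal
# weights are illustrative assumptions.
import numpy as np
from collections import Counter
from sklearn.neighbors import KNeighborsClassifier

class MultiDistanceKNN:
    def __init__(self, k=5, metrics=("euclidean", "manhattan", "chebyshev"), weights=None):
        self.members = [KNeighborsClassifier(n_neighbors=k, metric=m) for m in metrics]
        self.weights = weights or [1.0] * len(self.members)

    def fit(self, X, y):
        for member in self.members:
            member.fit(X, y)
        return self

    def predict(self, X):
        votes = np.array([member.predict(X) for member in self.members])
        out = []
        for col in votes.T:                          # one column per test point
            tally = Counter()
            for label, w in zip(col, self.weights):  # weighted vote over member predictions
                tally[label] += w
            out.append(tally.most_common(1)[0][0])
        return np.array(out)
```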
ISBN: (Print) 3540286535
Feature selection refers to the problem of selecting those input features that are most predictive of a given outcome; a problem encountered in many areas such as machine learning, pattern recognition and signal processing. In particular, solutions to this have found successful application in tasks that involve datasets containing huge numbers of features (in the order of tens of thousands), which would be impossible to process further. Recent examples include text processing and web content classification. Rough set theory has been used as such a dataset pre-processor with much success, but current methods are inadequate at finding minimal reductions, the smallest sets of features possible. This paper proposes a technique that considers this problem from a propositional satisfiability perspective. In this framework, minimal subsets can be located and verified. An initial experimental investigation is conducted, comparing the new method with a standard rough set-based feature selector.
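To make the notion of a minimal reduction concrete, the brute-force sketch below searches feature subsets in order of increasing size and returns the first one that is consistent with the class labels; the paper instead encodes this search as a propositional satisfiability problem, so this is an illustration of the objective, not the proposed method.

```python
# Brute-force search for a minimal consistent feature subset (a reduct in
# rough set terms). Exponential in the number of features; data is assumed
# to be a discrete-valued numpy array.
from itertools import combinations
import numpy as np

def is_consistent(X, y, subset):
    """True if no two objects agree on `subset` but carry different labels."""
    seen = {}
    for row, label in zip(X, y):
        key = tuple(row[list(subset)])
        if key in seen and seen[key] != label:
            return False
        seen[key] = label
    return True

def minimal_reduct(X, y):
    n_features = X.shape[1]
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            if is_consistent(X, y, subset):
                return subset          # first consistent subset of smallest size
    return tuple(range(n_features))
```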
Naive Bayes is one of the most efficient and effective inductive learning algorithms for machine learning and data mining. Its competitive performance in classification is surprising, because the conditional independence assumption on which it is based is rarely true in real-world applications. An open question is: what is the true reason for the surprisingly good performance of Naive Bayes in classification? In this paper, we propose a novel explanation for the good classification performance of Naive Bayes. We show that, essentially, the dependence distribution plays a crucial role. Here dependence distribution means how the local dependence of an attribute distributes in each class, evenly or unevenly, and how the local dependences of all attributes work together, consistently (supporting a certain classification) or inconsistently (canceling each other out). Specifically, we show that no matter how strong the dependences among attributes are, Naive Bayes can still be optimal if the dependences distribute evenly in classes, or if the dependences cancel each other out. We propose and prove a sufficient and necessary condition for the optimality of Naive Bayes. Further, we investigate the optimality of Naive Bayes under the Gaussian distribution. We present and prove a sufficient condition for the optimality of Naive Bayes, in which the dependences among attributes exist. This provides evidence that dependences may cancel each other out. Our theoretical analysis can be used in designing learning algorithms. In fact, a major class of learning algorithms for Bayesian networks are conditional independence-based (or CI-based), which are essentially based on dependence. We design a dependence-distribution-based algorithm by extending the Chow-Liu algorithm, a widely used CI-based algorithm. Our experiments show that the new algorithm outperforms the Chow-Liu algorithm, which also provides empirical evidence to support our new explanation.
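The conditional independence assumption under discussion can be seen directly in a textbook Gaussian Naive Bayes classifier, where per-attribute log-likelihoods are simply summed within each class; the sketch below is that standard formulation, not the paper's extended Chow-Liu algorithm.

```python
# A compact Gaussian Naive Bayes classifier. The sum of per-attribute
# log-likelihoods is exactly the conditional independence assumption
# discussed above. Standard textbook formulation, included for illustration.
import numpy as np

class NaiveBayesGaussian:
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.priors, self.means, self.vars = {}, {}, {}
        for c in self.classes:
            Xc = X[y == c]
            self.priors[c] = len(Xc) / len(X)
            self.means[c] = Xc.mean(axis=0)
            self.vars[c] = Xc.var(axis=0) + 1e-9   # small floor avoids division by zero
        return self

    def predict(self, X):
        scores = []
        for c in self.classes:
            # log P(c) + sum_i log P(x_i | c), assuming attributes independent given c.
            log_lik = -0.5 * (np.log(2 * np.pi * self.vars[c])
                              + (X - self.means[c]) ** 2 / self.vars[c]).sum(axis=1)
            scores.append(np.log(self.priors[c]) + log_lik)
        return self.classes[np.argmax(np.stack(scores, axis=1), axis=1)]
```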