ISBN (print): 9781479944613
In this paper we introduce an evolutionary approach for the efficient design of prototype-based classifiers using differential evolution (DE). For this purpose we amalgamate ideas from the Learning Vector Quantization (LVQ) framework for supervised classification by Kohonen [1], [2], with the DE-based automatic clustering approach by Das et al. [3] in order to evolve supervised classifiers. The proposed approach is able to determine both the optimal number of prototypes per class and the corresponding positions of these prototypes in the data space. By means of comprehensive computer simulations on benchmarking datasets, we show that the resulting classifier, named LVQ-DE, consistently outperforms state-of-the-art prototype-based classifiers.
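Below is a minimal sketch of the general idea described in this abstract: encoding prototype coordinates in a differential-evolution population and using training classification error as the fitness. It is not the authors' LVQ-DE implementation; in particular it assumes a fixed number of prototypes per class (the paper also evolves that number), and all function names and parameter values are illustrative.

```python
import numpy as np

def nearest_prototype_predict(X, prototypes, proto_labels):
    """Assign each sample the label of its nearest prototype (Euclidean)."""
    d = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
    return proto_labels[np.argmin(d, axis=1)]

def evolve_prototypes(X, y, protos_per_class=2, pop_size=20, gens=100,
                      F=0.5, CR=0.9, rng=np.random.default_rng(0)):
    """Textbook DE/rand/1/bin over flattened prototype coordinates.

    Fitness = training classification error of the nearest-prototype rule."""
    classes = np.unique(y)
    proto_labels = np.repeat(classes, protos_per_class)
    dim = X.shape[1]
    n_protos = len(proto_labels)

    def fitness(vec):
        pred = nearest_prototype_predict(X, vec.reshape(n_protos, dim), proto_labels)
        return np.mean(pred != y)

    lo, hi = X.min(axis=0), X.max(axis=0)
    pop = rng.uniform(lo, hi, size=(pop_size, n_protos, dim)).reshape(pop_size, -1)
    fit = np.array([fitness(v) for v in pop])

    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)
            mutant = pop[a] + F * (pop[b] - pop[c])          # differential mutation
            cross = rng.random(pop[i].shape) < CR             # binomial crossover mask
            cross[rng.integers(len(cross))] = True            # keep at least one mutant gene
            trial = np.where(cross, mutant, pop[i])
            f_trial = fitness(trial)
            if f_trial <= fit[i]:                             # greedy selection
                pop[i], fit[i] = trial, f_trial

    best = pop[np.argmin(fit)]
    return best.reshape(n_protos, dim), proto_labels
```

The DE/rand/1/bin scheme is used here only because it is the standard variant; the paper's mutation and crossover strategy, and its mechanism for adapting the number of prototypes, may differ.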
A fast prototype-based nearest neighbor classifier is introduced. The proposed Adjusted SOINN Classifier (ASC) is based on SOINN (self-organizing incremental neural network); it automatically learns the number of prototypes needed to determine the decision boundary and learns new information without destroying previously learned information. It is robust to noisy training data and performs very fast classification. In the experiments, we use artificial and real-world datasets to illustrate ASC. We also compare ASC with other prototype-based classifiers with regard to classification error, compression ratio, and speed-up ratio. The results show that ASC has the best performance and is a very efficient classifier. (C) 2008 Elsevier Ltd. All rights reserved.
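As a rough illustration of the incremental, prototype-growing behaviour described above (adding prototypes only where the current ones misclassify or are too far away, and otherwise adjusting the winner), here is a toy sketch. It is not ASC or SOINN; the insertion threshold, the winner-update rule, and the class interface are simplifying assumptions.

```python
import numpy as np

class IncrementalPrototypeNN:
    """Toy incremental prototype learner (illustrative only, not ASC/SOINN).

    A sample becomes a new prototype if it is misclassified by the current
    prototype set or lies farther than `radius` from its nearest prototype;
    otherwise the winning prototype is nudged toward it."""

    def __init__(self, radius=1.0, lr=0.05):
        self.radius, self.lr = radius, lr
        self.protos, self.labels = [], []

    def partial_fit(self, x, y):
        x = np.asarray(x, dtype=float)
        if not self.protos:
            self.protos.append(x)
            self.labels.append(y)
            return
        d = [np.linalg.norm(x - p) for p in self.protos]
        i = int(np.argmin(d))
        if self.labels[i] != y or d[i] > self.radius:
            self.protos.append(x)                    # grow: new prototype
            self.labels.append(y)
        else:
            self.protos[i] += self.lr * (x - self.protos[i])  # attract winner

    def predict(self, X):
        P = np.stack(self.protos)
        L = np.asarray(self.labels)
        d = np.linalg.norm(np.asarray(X)[:, None, :] - P[None, :, :], axis=2)
        return L[np.argmin(d, axis=1)]
```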
In this paper, a new scheme for constructing parsimonious fuzzy classifiers is proposed based on the L2-support vector machine (L2-SVM) technique, with model selection and feature ranking performed simultaneously in an integrated manner, in which fuzzy rules are optimally generated from data by L2-SVM learning. In order to identify the most influential fuzzy rules induced by the SVM learning, two novel indices for fuzzy rule ranking are proposed, named the α-values and ω-values of fuzzy rules. The α-values are defined as the Lagrangian multipliers of the L2-SVM and are adopted to evaluate the output contribution of fuzzy rules, while the ω-values are developed by considering both the rule-base structure and the output contribution of fuzzy rules. As a prototype-based classifier, the L2-SVM-based fuzzy classifier evades the curse of dimensionality in high-dimensional space in the sense that the number of support vectors, which equals the number of induced fuzzy rules, is not related to the dimensionality. Experimental results on high-dimensional benchmark problems have shown that, using the proposed scheme, the most influential fuzzy rules can be effectively induced and selected, and feature ranking results can be obtained at the same time to construct parsimonious fuzzy classifiers with better generalization performance than well-known algorithms in the literature.
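The rule-ranking idea can be illustrated with an ordinary kernel SVM: the magnitude of each support vector's dual coefficient (its Lagrange multiplier times the label) serves as a proxy for that rule's output contribution. This sketch uses scikit-learn's standard C-SVM rather than the paper's L2-SVM and does not reproduce the α-value/ω-value definitions; the dataset and parameters are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Proxy for rule ranking: |dual coefficient| of each support vector.
# Each support vector corresponds to one induced "rule" in the abstract's sense.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
clf = SVC(kernel="rbf", C=1.0).fit(X, y)

importance = np.abs(clf.dual_coef_).ravel()   # one value per support vector
order = np.argsort(importance)[::-1]          # most influential "rules" first
top_rules = clf.support_[order[:5]]           # training indices of the top-5 support vectors
print("top rule prototypes (training indices):", top_rules)
```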
For prototype-based classifiers, a large number of prototypes increases the computational cost, so it can take a long time to determine the class label of a query sample. Many researchers have therefore been interested in reducing the number of prototypes without degrading the classification ability of prototype-based classifiers. In this paper, we introduce a new method for generating prototypes based on the assumption that prototypes positioned near the boundary surface are important for improving the classification ability of nearest neighbor classifiers. The main issue of this paper is how to place the new prototypes as close as possible to the boundary surface. To realize this, we consider possibilistic C-means clustering and conditional fuzzy C-means clustering: the clusters obtained with possibilistic C-means are used to define the boundary areas, and conditional fuzzy C-means is used to determine the locations of prototypes within the already defined boundary areas. The design procedure is illustrated with numeric examples that provide a thorough insight into the effectiveness of the proposed method.
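A compact way to see the second stage is a conditional fuzzy C-means update, in which each sample's memberships sum to an externally supplied condition value rather than to 1, so prototypes are attracted to regions where that value is high. The sketch below is an illustrative re-implementation, not the paper's exact design: the condition values here are a hypothetical boundary-closeness score, whereas the paper derives the boundary areas from possibilistic C-means.

```python
import numpy as np

def conditional_fcm(X, f, c=3, m=2.0, iters=50, rng=np.random.default_rng(0)):
    """Conditional fuzzy C-means: memberships of sample k sum to its condition
    value f[k] instead of 1, pulling prototypes toward regions where f is large."""
    n, _ = X.shape
    V = X[rng.choice(n, c, replace=False)].astype(float)        # initial prototypes
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        w = dist ** (-2.0 / (m - 1.0))
        U = f[:, None] * w / w.sum(axis=1, keepdims=True)        # conditional memberships
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]                 # prototype update
    return V, U

# Illustrative use: weight points by a made-up "boundary closeness" score.
X = np.random.default_rng(1).normal(size=(200, 2))
f = np.exp(-np.abs(X[:, 0]))          # hypothetical condition values in (0, 1]
protos, memberships = conditional_fcm(X, f, c=4)
```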
Data imbalance has been a challenge in many areas of automatic classification. Many popular approaches, including over-sampling, under-sampling, and the Synthetic Minority Oversampling Technique (SMOTE), have been developed and tested in previous research. A major problem with these techniques is that they address the issue by modifying the original data rather than by letting the classifiers themselves learn to cope with the imbalance. The imbalanced-data challenge also exists in tasks such as remote sensing and depression detection. Researchers have tried to overcome it by adopting methods at the data pre-processing step; however, in remote sensing and depression detection tasks the main interest is still in applying new classifiers such as deep learning models, which have powerful classification ability but still do not treat data imbalance as a prime factor in lower classification performance. In this thesis, we demonstrate the performance of K-CR in evaluation experiments on an urban land cover classification dataset and on two depression detection datasets. The latter two datasets consist of social media texts (tweets), so we also propose to adopt a feature selection technique, Term Frequency - Category-Based Term Weights (TF-CBTW), and various word embedding techniques (Word2Vec, FastText, GloVe, and the language model BERT). This feature selection method has not been applied before in similar settings, and we show that it helps to improve the efficiency and the results of the K-CR classifier. Our three experiments show that K-CR achieves comparable performance on the majority classes and better performance on the minority classes when compared to classifiers such as Random Forest, K-Nearest Neighbour, Support Vector Machines, Multi-Layer Perceptron, Convolutional Neural Networks, and Long Short-Term Memory networks.
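For context on the data-modification baselines discussed above, the following is a minimal sketch of the SMOTE idea: synthesising minority-class samples by interpolating between a minority point and one of its minority-class nearest neighbours. It is not the K-CR classifier or the TF-CBTW weighting described in the abstract, and the function and parameter names are illustrative.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, rng=np.random.default_rng(0)):
    """Generate n_new synthetic minority samples by interpolating each chosen
    sample with one of its k nearest minority-class neighbours (SMOTE idea).
    Assumes X_min has more than k rows."""
    n = len(X_min)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, 1:k + 1]          # skip self at column 0
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(n)                         # random minority sample
        j = nn[i, rng.integers(k)]                  # one of its minority neighbours
        lam = rng.random()                          # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.vstack(synthetic)

# Usage idea: X_min = minority-class feature matrix, shape (n_minority, n_features)
# X_extra = smote_like_oversample(X_min, n_new=100)
```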
We propose two new comprehensive schemes for designing prototype-based classifiers. The schemes address all major issues (number of prototypes, generation of prototypes, and utilization of the prototypes) involved in the design of a prototype-based classifier. First, we use Kohonen's self-organizing feature map (SOFM) algorithm to produce a minimum number (equal to the number of classes) of initial prototypes. Then we use a dynamic prototype generation and tuning algorithm (DYNAGEN), involving merging, splitting, deleting, and retraining of the prototypes, to generate an adequate number of useful prototypes. These prototypes are used to design a "1 nearest multiple prototype" (1-NMP) classifier. Although this classifier performs quite well, it cannot reasonably deal with large variation of variance among the data from different classes. To overcome this deficiency we design a "1 most similar prototype" (1-MSP) classifier: we use the prototypes generated by the SOFM-based DYNAGEN algorithm and associate with each of them a zone of influence, using a norm (Euclidean)-induced similarity measure. The prototypes and their zones of influence are fine-tuned by minimizing an error function. Both classifiers are trained and tested on several data sets, and a consistent improvement in performance of the latter over the former has been observed. We also compare our classifiers with some benchmark results available in the literature.
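The "1 most similar prototype" decision can be sketched as follows: each prototype carries a zone of influence, similarity is a Gaussian of the Euclidean distance scaled by that zone, and a sample takes the label of its most similar prototype. The zone parameters are assumed to be given here; in the paper they are tuned by minimizing an error function, and the exact similarity form may differ.

```python
import numpy as np

def msp_predict(X, protos, labels, sigmas):
    """Illustrative '1 most similar prototype' rule: prototype i has a zone of
    influence sigmas[i]; similarity is exp(-||x - v_i||^2 / sigma_i^2), and a
    sample takes the label of its most similar prototype."""
    d2 = ((np.asarray(X)[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    sim = np.exp(-d2 / (sigmas[None, :] ** 2))
    return labels[np.argmax(sim, axis=1)]
```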