ISBN: 0769526624 (print)
The proceedings contain 68 papers. The topics discussed include: a comparative analysis of data distribution methods in an agent-based neural system for classification tasks; stochastic differential portfolio games with regime switching model; extracting symbolic rules from clustering of gene expression data; a novel microarray gene selection method based on consistency; combining greedy method and genetic algorithm to identify transcription factor binding sites; investigation of a new artificial immune system model applied to pattern recognition; RLM: a new method of encoding weights in DNA strands; shape representation and distance measure based on relational graph; fast modeling of curved object from two images; research on an improved gray gradient orientation algorithm in anisotropic high-pass filtering; and image color reduction based on self-organizing maps and growing self-organizing neural networks.
ISBN: 3540269231 (print)
The proceedings contain 68 papers. The topics discussed include: incremental classification rules based on association rules using formal concept analysis; finite mixture models with negative components; principles of multi-kernel data mining; a comprehensible SOM-based scoring system; linear manifold clustering; clustering document images using graph summaries; unsupervised learning of visual feature hierarchies; a new multidimensional feature transformation for linear classifiers and its applications; embedding time series data for classification; statistical supports for frequent itemsets on data streams; neural expert model applied to phonemes recognition; and signature-based approach for intrusion detection.
ISBN: 0769524958 (print)
In order to make machine learning algorithms more usable, our community must be able to design robust systems that offer support to practitioners. In the context of classification, this amounts to developing assistants, which deal with the increasing number of models and techniques, and give advice dynamically on such issues as model selection and method combination. This paper briefly reviews the potential of meta-learning in this context and reports on the early success of a Web-based Data Mining assistant.
ISBN: 0769524958 (print)
We present new approaches for semi-supervised learning based on the formulations of SVMs for the conventional supervised setting. The manifold structure of the data points given by the graph Laplacian can be taken into account in an efficient way. The proposed optimization problems fully exploit the sparse structure of the graph Laplacian, which enables us to optimize the problems with a large number of data points in a practical amount of computational time. Some results of experiments showing the performance of our approaches are presented.
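To illustrate why the sparsity of the graph Laplacian matters, here is a minimal sketch of a sparse Laplacian L = D - W and the smoothness term f^T L f that manifold-based regularizers typically penalize. The abstract does not say how the graph is built, so the unweighted symmetric k-nearest-neighbour graph, the function names, and the toy data are all assumptions made for illustration, not the authors' formulation.

```python
from collections import defaultdict

def knn_graph(points, k):
    """Symmetric, unweighted k-nearest-neighbour graph as a set of edges (i, j), i < j."""
    edges = set()
    for i, p in enumerate(points):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points) if j != i
        )
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

def sparse_laplacian(n, edges):
    """Graph Laplacian L = D - W stored as {row: {col: value}}, nonzeros only."""
    lap = defaultdict(dict)
    for i in range(n):
        lap[i][i] = 0.0
    for i, j in edges:
        lap[i][j] = -1.0
        lap[j][i] = -1.0
        lap[i][i] += 1.0
        lap[j][j] += 1.0
    return lap

def smoothness(lap, f):
    """Quadratic form f^T L f, visiting only the stored nonzero entries."""
    return sum(f[i] * v * f[j] for i, row in lap.items() for j, v in row.items())

# Toy data (hypothetical): two well-separated clusters of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
lap = sparse_laplacian(len(points), knn_graph(points, k=2))
f_smooth = [1, 1, 1, -1, -1, -1]   # constant within each cluster
f_rough = [1, -1, 1, -1, 1, -1]    # changes sign inside clusters
```

A labeling that is constant within each cluster incurs no penalty, while one that flips inside a cluster is penalized. The cost of evaluating the term grows with the number of edges rather than with the square of the number of points.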
ISBN: 0769524958 (print)
We consider reducing the loss of a classifier by decreasing its bias and variance. Embarking upon classification of scarcely labeled data, we use an active learning approach in semi-supervised learning, and show that we can speed up convergence to a desired level of loss. Our focus in this paper is on the best instance selection for labeling the unlabeled data; we use Jensen-Shannon divergence as one selection criterion. We show that our single instance selection approaches are superior to the multiple selection approach. Empirical results indicate that this method can decrease classification loss significantly.
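As a rough illustration of a Jensen-Shannon selection criterion, the sketch below scores each unlabeled instance by the divergence among class distributions predicted by several classifiers and picks the instance with the greatest disagreement. The abstract does not state which distributions are compared, so the committee setup, the equal weighting, and the names `js_divergence` and `pick_instance` are assumptions, not the authors' procedure.

```python
import math

def entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def js_divergence(dists):
    """Equal-weight Jensen-Shannon divergence: H(mean of dists) - mean of H(dist)."""
    n = len(dists)
    mean = [sum(col) / n for col in zip(*dists)]
    return entropy(mean) - sum(entropy(d) for d in dists) / n

def pick_instance(predictions):
    """predictions maps instance id -> list of class distributions, one per classifier.

    Returns the id whose predicted distributions disagree the most.
    """
    return max(predictions, key=lambda key: js_divergence(predictions[key]))

# Hypothetical predictions from two classifiers on three unlabeled instances.
predictions = {
    "a": [[0.9, 0.1], [0.8, 0.2]],   # mild disagreement
    "b": [[0.5, 0.5], [0.5, 0.5]],   # uncertain, but in full agreement
    "c": [[0.9, 0.1], [0.1, 0.9]],   # strong disagreement
}
chosen = pick_instance(predictions)
```

Note that instance "b" scores zero despite being uncertain: the divergence measures disagreement between distributions, not the uncertainty of any single one.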
ISBN: 0769524958 (print)
We present several algorithms that combine many base learners trained on different distributions of the data, but allow some of the base learners to be trained simultaneously by separate processors. Our algorithms train batches of base classifiers using distributions that can be generated in advance of the training process. We propose several heuristic methods that produce a group of useful distributions based on the performance of the classifiers in the previous batch. We present experimental evidence suggesting that two of our algorithms are able to produce classifiers as accurate as the corresponding AdaBoost classifier with the same number of base learners, but with a greatly reduced computation time.
ISBN: 0769524958 (print)
We propose an abstract self-bounding genetic algorithm that can be applied to various problems of machine learning. The bound on the generalization error that is output by our algorithm is based on Rademacher penalization, a data-driven penalization technique. We prove probabilistic oracle inequalities for the theoretical risk of the estimators based on this approach. This is done by comparing the performance of an idealized genetic algorithm that uses a fitness function based on the generalization error with that of an empirical genetic algorithm based on Rademacher penalization. The inequalities indicate that although we are not able to implement the idealized algorithm (because of the inability to compute the generalization error), the empirical algorithm does almost as well as the idealized algorithm would.
ISBN: 0769524958 (print)
In this paper we present a weakened variation of Support Vector Machines that can be used together with AdaBoost. Our modified Support Vector Machine algorithm has the following interesting properties: First, it is able to handle distributions over the training data. Second, it is a weak algorithm in the sense that it ensures an empirical error upper bounded by 1/2. Third, when used together with AdaBoost, the resulting algorithm is faster than the usual SVM training algorithm. Finally, we show that our boosted SVM can be effective as an editing algorithm.
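The two requirements on the weak learner (it accepts a distribution over the training data, and its weighted error stays below 1/2) are exactly what one round of standard AdaBoost relies on. The sketch below shows that textbook round only; it is not the paper's weakened SVM, and the function names and toy labels are hypothetical.

```python
import math

def weighted_error(weights, y_true, y_pred):
    """Error of the weak learner under the current distribution over examples."""
    return sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)

def adaboost_round(weights, y_true, y_pred):
    """One textbook AdaBoost update: returns (alpha, renormalized weights)."""
    eps = weighted_error(weights, y_true, y_pred)
    if not 0 < eps < 0.5:
        raise ValueError("weak learner must have weighted error strictly between 0 and 1/2")
    alpha = 0.5 * math.log((1 - eps) / eps)
    # Misclassified examples gain weight, correctly classified ones lose weight.
    updated = [
        w * math.exp(alpha if t != p else -alpha)
        for w, t, p in zip(weights, y_true, y_pred)
    ]
    total = sum(updated)
    return alpha, [w / total for w in updated]

# Hypothetical round: uniform weights, one mistake out of four examples.
weights = [0.25, 0.25, 0.25, 0.25]
y_true = [1, 1, -1, -1]
y_pred = [1, 1, -1, 1]
alpha, new_weights = adaboost_round(weights, y_true, y_pred)
```

After the update, the misclassified example carries half of the total weight, which is what forces the next weak learner to focus on it.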
ISBN: 0769524958 (print)
Erectile dysfunction (ED) is a multifactorial disorder that can cause significant distress for men. Risk factor identification may allow for future ED prevention or delayed onset. The goal of this investigation is 1) to evaluate different machine learning approaches for prognosticating ED and 2) to analyze the degree of importance of ED risk factors. The investigated machine learning approaches include: 1) logistic regression as a statistical method, 2) multilayer feedforward backpropagation neural networks (an artificial neural-network tool), 3) the fuzzy K-nearest neighbor classifier as a fuzzy logic method, 4) support vector machine (SVM), a relatively new machine learning process, and 5) conventional discriminant function analysis. The overall results obtained indicate that the artificial neural network method yields the highest ROC-AUC, and that it has produced the most reliable model for prognosticating ED when compared to the other investigated models.
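For readers unfamiliar with the comparison metric, ROC-AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it by direct pairwise comparison; the labels and scores are made up for illustration and are not data from the study.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for positive cases, 0 for negative cases.
    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical model outputs for four cases.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form is quadratic in the number of cases, which is fine for small samples; rank-based formulas give the same value more efficiently.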
ISBN: 0769524958 (print)
Bagging, AdaBoost and Arc-x4 are among the most popular methods for classifier ensembles. All these methods rely on resampling techniques to generate different training subsamples for each of the base classifiers that constitute the ensemble. In the present work, the classical implementations of these algorithms are modified in such a way that resampling is performed separately over the training instances of each class, thus obtaining the same class distribution in each subsample as that of the original training set. Moreover, we introduce other modifications related to the size of the subsamples and to the voting strategy. Experimental results for medical and non-medical databases are presented, and potential benefits of the proposed methods for diagnosis are suggested.
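The per-class resampling idea can be sketched as a stratified bootstrap: draw with replacement inside each class separately, so every subsample keeps the original class counts. This is a minimal illustration for the bagging case only; the function name, the seed, and the toy labels are assumptions, and the paper's changes to subsample size and voting are not covered.

```python
import random
from collections import Counter, defaultdict

def stratified_bootstrap(labels, rng):
    """Return indices drawn with replacement within each class.

    Each class contributes exactly as many indices as it has in the original
    training set, so the class distribution is preserved.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    sample = []
    for idxs in by_class.values():
        sample.extend(rng.choices(idxs, k=len(idxs)))
    return sample

# Hypothetical imbalanced training labels: eight negatives, two positives.
labels = ["neg"] * 8 + ["pos"] * 2
rng = random.Random(0)
subsample = stratified_bootstrap(labels, rng)
subsample_counts = Counter(labels[i] for i in subsample)
```

With a plain bootstrap, a minority class of two instances can vanish from a subsample entirely; drawing per class rules that out by construction.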