ISBN: 9781424441990 (print)
Based on UDP and MFA, we propose a new unsupervised feature extraction algorithm, LMP (Local Marginal Projection), which is built on local quality. It measures the non-local quantities by the nearest samples between two locals. The goal of LMP is to find a projection that maximizes the distance between samples in the same local and samples in different locals, so that the data can be projected easily into a low-dimensional space. In addition, this projection can deal with nonlinear and high-dimensional problems. Experiments on the ORL and Yale face databases show that the LMP algorithm can describe high-dimensional data and can embed the nonlinear Swiss-Hole data into a low-dimensional space with a reasonable visual result.
Discretization, as a preprocessing step for data mining, is the process of converting the continuous attributes of a data set into discrete ones so that machine learning algorithms can treat them as nominal features. The various discretization methods that use entropy-based criteria form a large class of algorithms. However, as a measure of class homogeneity, entropy cannot always accurately reflect the degree of class homogeneity of an interval. Therefore, in this paper, we propose a new measure of the class heterogeneity of intervals from the viewpoint of class probability itself. Based on this definition of heterogeneity, we present a new criterion for evaluating a discretization scheme and analyze its properties theoretically. A heuristic method is also proposed to find an approximately optimal discretization scheme. Finally, our method is compared, in terms of predictive error rate and tree size, with Ent-MDLC, a representative entropy-based discretization method well known for its good performance. Our method is shown to produce better results than Ent-MDLC, although the improvement is not significant. It can be a good alternative to entropy-based discretization methods.
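For context, entropy-based criteria score a candidate interval by the class entropy of the samples that fall inside it. The following is a minimal sketch of that quantity only. It illustrates the entropy measure the abstract critiques, not the heterogeneity measure the paper proposes, whose definition is not given here. The function name `interval_entropy` and the label lists are assumptions made for illustration.

```python
import math
from collections import Counter

def interval_entropy(labels):
    """Shannon entropy (in bits) of the class labels inside one interval.

    `labels` is a hypothetical list of class labels for the samples whose
    attribute value falls in the interval being scored.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    # Sum -p * log2(p) over the classes present in the interval.
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

pure = interval_entropy(["a", "a", "a", "a"])   # a single class
mixed = interval_entropy(["a", "a", "b", "b"])  # two equally frequent classes
```

A pure interval scores zero and an evenly mixed two-class interval scores one bit, so lower values indicate a more homogeneous interval.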
This paper examines the interpretability-accuracy tradeoff in fuzzy rule-based classifiers using a multiobjective fuzzy genetics-based machine learning (GBML) algorithm. Our GBML algorithm is a hybrid of the Michigan and Pittsburgh approaches, implemented in the framework of evolutionary multiobjective optimization (EMO). Each fuzzy rule is represented by its antecedent fuzzy sets as an integer string of fixed length. Each fuzzy rule-based classifier, which is a set of fuzzy rules, is represented as a concatenated integer string of variable length. Our GBML algorithm simultaneously maximizes the accuracy of rule sets and minimizes their complexity. Accuracy is measured by the number of correctly classified training patterns, while complexity is measured by the number of fuzzy rules and/or the total number of antecedent conditions of the fuzzy rules. We examine the interpretability-accuracy tradeoff for training patterns through computational experiments on some benchmark data sets. A clear tradeoff structure is visualized for each data set. We also examine the interpretability-accuracy tradeoff for test patterns. Due to overfitting to the training patterns, a clear tradeoff structure is not always obtained in computational experiments for test patterns. (C) 2006 Elsevier Inc. All rights reserved.
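The encoding and the two complexity measures described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the convention that the integer 0 means "don't care" for an attribute, and the names `complexity` and `dominates`, are hypothetical. Accuracy is negated so that all objectives are minimized.

```python
def complexity(rule_set):
    """Return (number of rules, total number of antecedent conditions).

    Each rule is a fixed-length tuple of integers, one per attribute.
    Assumption for illustration: 0 encodes "don't care", so only nonzero
    entries count as antecedent conditions.
    """
    num_rules = len(rule_set)
    num_conditions = sum(1 for rule in rule_set for a in rule if a != 0)
    return num_rules, num_conditions

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

# A classifier is a variable-length collection of fixed-length rules.
classifier = [(1, 0, 2), (0, 0, 3)]
rules, conditions = complexity(classifier)

# Objective vectors: (-correctly classified patterns, rules, conditions).
small_accurate = (-90, 2, 3)
large_inaccurate = (-85, 3, 5)
large_accurate = (-95, 5, 9)
```

Here `small_accurate` dominates `large_inaccurate`, while `small_accurate` and `large_accurate` are mutually non-dominated, which is the kind of pair that makes up a tradeoff front.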
ISBN: 9781509066285 (print)
In this paper, we propose an automatic method for manuscript author verification based on an analysis of consecutive patches extracted from an image. The classification algorithm uses a deep convolutional network with two types of patch extraction: one based on connected components and the other based on a fixed-size sliding window. We apply this method to verify the authorship of the Arabic manuscript entitled al-Khitat, attributed to the hand of the renowned medieval Arab historian al-Maqrizi. Using appropriately collected ground-truth labeled data to train the convolutional network, our method has demonstrated promising results when applied to previously unseen manuscripts.
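The fixed-size sliding-window variant of patch extraction can be sketched as below. This is a minimal illustration, assuming the image is a non-empty 2-D list of pixel values. The function name, the `size` and `stride` parameters, and the toy 4 x 4 image are hypothetical, since the abstract does not state the window size or step used.

```python
def sliding_window_patches(image, size, stride):
    """Return every size x size patch of a 2-D list, in row-major order.

    Windows that would extend past the image border are skipped.
    """
    rows, cols = len(image), len(image[0])
    patches = []
    for top in range(0, rows - size + 1, stride):
        for left in range(0, cols - size + 1, stride):
            patch = [row[left:left + size] for row in image[top:top + size]]
            patches.append(patch)
    return patches

# Toy 4 x 4 "image" whose pixel value encodes its position.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = sliding_window_patches(image, size=2, stride=2)
```

With a stride equal to the window size the patches tile the image without overlap. A smaller stride yields overlapping, consecutive patches.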
ISBN: 0769527302 (print)
The proceedings contain 10 papers. The topics discussed include: pattern of behavior mediated by cognitive scripts and emotional attitudes - context aware engineering of data mining systems; visualization of non-vectorial data using twin kernel embedding; dynamic algorithm selection using reinforcement learning; mining better technical trading strategies with genetic algorithms; finding the right features for instrument classification of classical music; quantification of intermarker influence based on global optimization and its application for stock market prediction; efficient fuzzy rules for classification; using coupled hidden Markov models for model suspect interactions in digital forensic analysis; using ontology to map categories in blog; and computational quantification of trust updates.
Electrical Systems are designed in the amalgamation of various types of electrical equipment at the generation, transmission, and distribution verge to furnish uninterrupted power to the consumers. To communicate this...
ISBN: 1595932089 (print)
In many practical machine learning tasks, there are costs associated with acquiring the feature values of training instances, as well as a hard learning budget that limits the number of feature values that can be purchased. In this budgeted learning scenario, it is important to use an effective "data acquisition policy" that specifies how to spend the budget acquiring training data to produce an accurate classifier. This paper examines a simplified version of this problem, "active model selection" [10]. As this is a Markov decision problem, we consider applying reinforcement learning (RL) techniques to learn an effective spending policy. Despite extensive training, our experiments on various versions of the problem show that the performance of RL techniques is inferior to that of existing, simpler spending policies. Copyright 2005 ACM.
Extracting high-level information from digital images and videos is a hard problem frequently faced by the computer vision and machine learning communities. Modern surveillance systems can monitor people, cars or objects by using computer vision methods. The objective of this work is to propose a method for identifying soft biometrics, in the form of clothing and gender, from images containing people, as a preliminary step toward identifying the people themselves. We propose a solution to this classification problem using a Convolutional Neural Network (CNN), working as an all-in-one feature extractor and classifier. This method allows the development of a high-level end-to-end clothing/gender classifier. Experiments were done comparing the CNN with hand-designed classifiers. In addition, two different operating modes of the CNN are proposed and compared with each other. The results obtained were very promising, showing that it is possible to extract soft-biometric attributes using an end-to-end CNN classifier. The proposed method achieved good generalization capability, classifying the three different attributes with good accuracy. This suggests the possibility of searching images using soft biometrics as search terms. (C) 2015 Elsevier B.V. All rights reserved.
ISBN: 9781467362962 (print)
We present the Source Code Statistical Language Model data analysis pattern. Statistical language models have been an enabling tool for a wide array of important language technologies. Speech recognition, machine translation, and document summarization (to name a few) all rely on statistical language models to assign probability estimates to natural language utterances or sentences. In this data analysis pattern, we describe the process of building n-gram language models over software source files. We hope that by introducing the empirical software engineering community to best practices that have been established over the years in research for natural languages, statistical language models can become a tool that SE researchers are able to use to explore new research directions.
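As a rough illustration of what building an n-gram model over source code involves, the sketch below counts n-grams over a token sequence and returns unsmoothed maximum-likelihood estimates. It is a simplified sketch, not the process the pattern describes in detail: whitespace splitting stands in for a real lexer, no smoothing is applied, and the function names `train_ngram` and `probability` are hypothetical.

```python
from collections import Counter

def train_ngram(tokens, n):
    """Count n-grams and their (n-1)-token contexts over a token sequence."""
    last_start = len(tokens) - n + 1
    ngrams = Counter(tuple(tokens[i:i + n]) for i in range(last_start))
    # Contexts are counted only where a following token exists, so the
    # estimates for each context sum to one.
    contexts = Counter(tuple(tokens[i:i + n - 1]) for i in range(last_start))
    return ngrams, contexts

def probability(ngrams, contexts, context, token):
    """Maximum-likelihood estimate of P(token | context); 0.0 if unseen."""
    seen = contexts[tuple(context)]
    if seen == 0:
        return 0.0
    return ngrams[tuple(context) + (token,)] / seen

# Whitespace tokenization is an assumption made to keep the sketch short.
tokens = "for ( i = 0 ; i < n ; i ++ )".split()
ngrams, contexts = train_ngram(tokens, n=2)
p_after_semicolon = probability(ngrams, contexts, [";"], "i")
p_assign_after_i = probability(ngrams, contexts, ["i"], "=")
```

In this toy sequence every `;` is followed by `i`, while `i` is followed by three different tokens, so the first estimate is 1 and the second is 1/3. A practical model would add smoothing so that unseen n-grams do not receive zero probability.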
ISBN: 9781424441990 (print)
The classical ISOMAP algorithm can find the intrinsic low-dimensional structures hidden in high-dimensional data uniformly distributed on or around a single manifold. However, if the data are sampled from multiple classes, each of which corresponds to an independent manifold, and the clusters formed by the data points of each class are well separated, several disconnected neighborhood graphs will form, which causes ISOMAP to fail. In this paper, an improved version of ISOMAP, namely Multi-Class Multi-Manifold ISOMAP (MCMM-ISOMAP), is proposed. MCMM-ISOMAP constructs a single neighborhood graph not by increasing the value of the neighborhood parameter, but by the following steps: first, choose an appropriate parameter value with which short-circuit edges cannot be introduced; second, find the pairs of data points that are the two endpoints of the shortest Euclidean distance between classes; and finally, make the points in each pair neighbors of each other. A single neighborhood graph thereby forms, and the ISOMAP algorithm is then applied to find the intrinsic low-dimensional embedding structure. Experimental results on synthetic and real data demonstrate the effectiveness of the proposed method.
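The second and third steps, finding the closest pair of points between classes and connecting them, can be sketched as follows. This is a brute-force illustration under assumptions, not the authors' implementation: it assumes one bridging edge is added for every pair of classes, which the abstract does not specify, and the names `closest_pair` and `bridge_edges` are hypothetical.

```python
import math
from itertools import combinations

def closest_pair(class_a, class_b):
    """Return (distance, i, j) for the closest points between two classes."""
    return min(
        (math.dist(p, q), i, j)
        for i, p in enumerate(class_a)
        for j, q in enumerate(class_b)
    )

def bridge_edges(classes):
    """For each pair of classes, return the shortest inter-class edge.

    `classes` maps a class label to a list of points. Each edge is
    ((label_a, index_a), (label_b, index_b), distance).
    """
    edges = []
    for (a, pts_a), (b, pts_b) in combinations(sorted(classes.items()), 2):
        d, i, j = closest_pair(pts_a, pts_b)
        edges.append(((a, i), (b, j), d))
    return edges

# Two well-separated toy classes on a line.
classes = {"A": [(0.0, 0.0), (1.0, 0.0)], "B": [(3.0, 0.0), (5.0, 0.0)]}
edges = bridge_edges(classes)
```

Adding these edges to the per-class neighborhood graphs joins them into one connected graph, after which geodesic distances can be computed across classes.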