ISBN: (print) 1595932089
The proceedings contain 14 papers. The topics discussed include: toward economic machine learning and utility-based data mining; machine learning paradigms for utility-based data mining; cost-sensitive classifier evaluation; economical active feature-value acquisition through expected utility estimation; reinforcement learning for active model selection; wrapper-based computation and evaluation of sampling methods for imbalanced datasets; noisy information value in utility-based decision making; learning policies for sequential time and cost sensitive classification; improving classifier utility by altering the misclassification cost ratio; one-benefit learning: cost-sensitive learning with restricted cost information; utility-based data mining for time series analysis - cost-sensitive learning for neural network predictors; interruptible anytime algorithms for iterative improvement of decision trees; and contextual recommender problems.
Discretization, as a preprocessing step for data mining, is the process of converting the continuous attributes of a data set into discrete ones so that they can be treated as nominal features by machine learning algorithms. The various discretization methods that use entropy-based criteria form a large class of algorithms. However, as a measure of class homogeneity, entropy cannot always accurately reflect the degree of class homogeneity of an interval. Therefore, in this paper, we propose a new measure of the class heterogeneity of intervals, based on class probability itself. Building on this definition of heterogeneity, we present a new criterion to evaluate a discretization scheme and analyze its properties theoretically. A heuristic method is also proposed to find an approximately optimal discretization scheme. Finally, our method is compared, in terms of predictive error rate and tree size, with Ent-MDLC, a representative entropy-based discretization method well known for its good performance. Our method produces better results than Ent-MDLC, although the improvement is not significant, and it can serve as a good alternative to entropy-based discretization methods.
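As background for the comparison with Ent-MDLC, the core of an entropy-based splitting criterion can be sketched as follows: a cut point is chosen to minimize the weighted class-information entropy of the two resulting intervals. This is a minimal generic sketch; the function names `entropy` and `best_cut` are illustrative and not taken from the paper.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a multiset of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_cut(values, labels):
    """Return the cut point minimizing the weighted class-information
    entropy of the two intervals it induces (the kind of criterion used
    by entropy-based discretization methods)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_score, best_cut_point = float("inf"), None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no valid cut between equal attribute values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        if score < best_score:
            # midpoint between the two adjacent values
            best_score, best_cut_point = score, (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_cut_point
```

On cleanly separated data the criterion recovers the boundary between the classes; the paper's heterogeneity measure replaces the entropy term with one based directly on class probabilities.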
ISBN: (print) 1595932089
In many practical machine learning tasks, there are costs associated with acquiring the feature values of training instances, as well as a hard learning budget that limits the number of feature values that can be purchased. In this budgeted learning scenario, it is important to use an effective "data acquisition policy" that specifies how to spend the budget acquiring training data so as to produce an accurate classifier. This paper examines a simplified version of this problem, "active model selection" [10]. As this is a Markov decision problem, we consider applying reinforcement learning (RL) techniques to learn an effective spending policy. Despite extensive training, our experiments on various versions of the problem show that the performance of RL techniques is inferior to that of existing, simpler spending policies. Copyright 2005 ACM.
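For illustration, one simple baseline spending policy of the kind RL techniques are compared against here is round-robin purchasing. The Bernoulli "flip" framing and the `flip` callback below are illustrative assumptions, not the paper's exact formulation.

```python
def round_robin(n_arms, budget, flip):
    """Round-robin spending policy: purchase one outcome from each arm
    in turn until the budget is exhausted, then select the arm with the
    highest posterior mean under a uniform Beta(1, 1) prior."""
    heads = [0] * n_arms
    trials = [0] * n_arms
    for t in range(budget):
        i = t % n_arms          # spend the next budget unit on the next arm
        heads[i] += flip(i)     # flip(i) returns 0 or 1 (a purchased outcome)
        trials[i] += 1
    # posterior mean of Beta(1 + heads, 1 + tails)
    posterior_means = [(h + 1) / (n + 2) for h, n in zip(heads, trials)]
    return max(range(n_arms), key=lambda i: posterior_means[i])
```

For example, `round_robin(2, 10, lambda i: 1 if i == 1 else 0)` returns `1`: the always-successful arm ends with the higher posterior mean.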
ISBN: (print) 354024509X
This paper presents a new system for recognition, tracking and pose estimation of people in video sequences. It is based on the wavelet transform of the upper body part and uses Support Vector Machines (SVM) for classification. Recognition is carried out hierarchically by first recognizing people and then individual characters. The characteristic features that best discriminate one person from another are learned automatically. Tracking is solved via a particle filter that utilizes the SVM output and a first-order kinematic model to obtain a robust scheme that successfully handles occlusion, different poses and camera zooms. For pose estimation, a collection of SVM classifiers is evaluated to detect specific, learned poses.
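The predict-reweight-resample cycle of a generic particle filter, the kind of tracker described above, can be sketched in one dimension. This is only a hedged illustration: in the actual system the `likelihood` would come from the SVM output and `motion` from the first-order kinematic model.

```python
import random

def particle_filter_step(particles, weights, motion, likelihood):
    """One predict-reweight-resample cycle of a basic particle filter."""
    # predict: propagate every particle through the motion model
    particles = [motion(p) for p in particles]
    # reweight: multiply each weight by the observation likelihood
    weights = [w * likelihood(p) for w, p in zip(weights, particles)]
    total = sum(weights)
    if total == 0:
        # degenerate case: no particle explains the observation
        weights = [1.0 / len(particles)] * len(particles)
    else:
        weights = [w / total for w in weights]
    # resample: draw a new particle set in proportion to the weights
    particles = random.choices(particles, weights=weights, k=len(particles))
    return particles, [1.0 / len(particles)] * len(particles)
```

After one step with a likelihood peaked near the true state, the surviving particles concentrate around it, which is what makes the scheme robust to occlusion and pose changes when iterated over frames.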
ISBN: (print) 354024509X
Affine invariant regions have proved to be a powerful feature for object recognition and categorization. However, these features rely heavily on object textures rather than shapes, and their shapes have typically been fixed to ellipses or parallelograms. The paper proposes a novel affine invariant region type that is built up from a combination of fitted superellipses. These novel features have the advantage of offering a much wider range of shapes through the addition of a very limited number of shape parameters, with the traditional ellipses and parallelograms as subsets. The paper offers a solution for the robust fitting of superellipses to partial contours, which is a crucial step towards the implementation of the novel features.
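A superellipse is the implicit curve |x/a|^n + |y/b|^n = 1; n = 2 recovers an ellipse, which is why ellipses are a subset. As a crude illustration of fitting the shape exponent to contour points, here is a 1-D grid search with the semi-axes a, b assumed known. This is far simpler than the robust partial-contour fitting the paper addresses, and all names are illustrative.

```python
import numpy as np

def superellipse_residual(points, a, b, n):
    """Mean absolute deviation of |x/a|^n + |y/b|^n from 1 over the points."""
    x, y = points[:, 0], points[:, 1]
    return np.mean(np.abs(np.abs(x / a) ** n + np.abs(y / b) ** n - 1.0))

def fit_exponent(points, a, b, grid=np.linspace(0.5, 8.0, 76)):
    """Pick the shape exponent n that minimizes the implicit-equation
    residual, searching a fixed grid (semi-axes a, b assumed known)."""
    return min(grid, key=lambda n: superellipse_residual(points, a, b, n))
```

Points sampled from a unit circle recover n close to 2; larger exponents yield increasingly rectangular regions, which is the extra expressive power the proposed features exploit.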
ISBN: (print) 3540305068
In the typical nonparametric approach to classification in instance-based learning and data mining, random data (the training set of patterns) are collected and used to design a decision rule (classifier). One of the best known such rules is the k-nearest-neighbor decision rule (also known as lazy learning), in which an unknown pattern is classified into the majority class among its k nearest neighbors in the training set. This rule gives low error rates when the training set is large. In practice, however, it is desirable to store as little of the training data as possible without sacrificing performance. It is well known that thinning (condensing) the training set with the Gabriel proximity graph is a viable partial solution to this problem, but it raises the problem of efficiently computing the Gabriel graph of large training data sets in high-dimensional spaces. In this paper we report on a new approach to the instance-based learning problem that combines five tools: first, editing the data using Wilson-Gabriel editing to smooth the decision boundary; second, applying Gabriel thinning to the edited set; third, filtering this output with the ICF algorithm of Brighton and Mellish; fourth, using the Gabriel-neighbor decision rule to classify new incoming queries; and fifth, using a new data structure that allows the efficient computation of approximate Gabriel graphs in high-dimensional spaces. Extensive experiments suggest that our approach is the best on the market.
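For reference, two points p and q are Gabriel neighbors when no third point lies strictly inside the ball whose diameter is the segment pq. A brute-force sketch of the graph and of the thinning idea (keep only points with a Gabriel neighbor of a different class, i.e. those near the decision boundary) follows; this naive O(n^3) version is exactly what the paper's approximate data structure is designed to avoid.

```python
from itertools import combinations

def gabriel_edges(points):
    """Brute-force Gabriel graph: (i, j) is an edge iff no third point
    lies strictly inside the ball with diameter points[i]-points[j]."""
    def d2(a, b):  # squared Euclidean distance
        return sum((x - y) ** 2 for x, y in zip(a, b))
    edges = []
    for i, j in combinations(range(len(points)), 2):
        p, q = points[i], points[j]
        if all(d2(p, r) + d2(q, r) >= d2(p, q)
               for k, r in enumerate(points) if k not in (i, j)):
            edges.append((i, j))
    return edges

def gabriel_thin(points, labels):
    """Gabriel condensing: keep only points that have a Gabriel neighbor
    of a different class (the points that shape the decision boundary)."""
    keep = set()
    for i, j in gabriel_edges(points):
        if labels[i] != labels[j]:
            keep.update((i, j))
    return sorted(keep)
```

On two well-separated clusters only the pair of points facing each other across the gap survives thinning, which is the storage reduction the abstract refers to.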
ISBN: (print) 0769522556
All sciences, including astronomy, are now entering the era of information abundance. The exponentially increasing volume and complexity of modern data sets promise to transform scientific practice, but also pose a number of common technological challenges. The Virtual Observatory concept is the astronomical community's response to these challenges: it aims to harness the progress in information technology in the service of astronomy, and at the same time provide a valuable testbed for information technology and applied computer science. The challenges broadly fall into two categories: data handling (or "data farming"), including issues such as archives, intelligent storage, databases, interoperability and fast networks; and data mining, data understanding and knowledge discovery, which include issues such as automated clustering and classification, multivariate correlation searches, pattern recognition and visualization in high-dimensional parameter spaces, as well as various applications of machine learning in these contexts. Such techniques are forming a methodological foundation for science with massive and complex data sets in general, and are likely to have a much broader impact on modern society, commerce, the information economy, security, etc. There is a powerful emerging synergy between computationally enabled science and science-driven computing, which will drive progress in science, scholarship and many other venues in the 21st century.
ISBN: (print) 1595932100
In this paper we investigate a trie-based APRIORI algorithm for mining frequent item sequences in a transactional database. We examine the data structure, implementation and algorithmic features, focusing mainly on those that also arise in frequent itemset mining. In our analysis we take modern processors' properties (memory hierarchies, prefetching, branch prediction, cache line size, etc.) into consideration in order to better understand the results of the experiments. Copyright 2005 ACM.
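The core idea of trie-based APRIORI support counting can be sketched as follows: candidates live in a trie keyed by item, and each transaction is walked through the trie so that every contained candidate is incremented in one pass. This is a minimal nested-dict sketch (it assumes items are not the literal string "count", and it ignores the cache-conscious layout issues the paper studies).

```python
def build_trie(candidates):
    """Store candidate itemsets (iterables of items) in a nested-dict trie;
    a leaf's "count" key accumulates support during counting."""
    root = {}
    for itemset in candidates:
        node = root
        for item in sorted(itemset):
            node = node.setdefault(item, {})
        node["count"] = 0
    return root

def count_support(root, transactions):
    """Walk each sorted transaction through the trie, incrementing every
    candidate it contains (classic trie-based APRIORI support counting)."""
    def walk(node, items):
        if "count" in node:
            node["count"] += 1
        for i, item in enumerate(items):
            if item in node:
                # descend and match only items after the current one
                walk(node[item], items[i + 1:])
    for t in transactions:
        walk(root, sorted(t))
```

Each candidate's support is then read off its leaf, e.g. `root[1][2]["count"]` for the itemset {1, 2}.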
ISBN: (print) 354024509X
As the amount of multimodal meeting data being recorded increases, so does the need for sophisticated mechanisms for accessing this data. This process is complicated by the different informational needs of users, as well as the range of data collected from meetings. This paper examines the current state of the art in meeting browsers. We examine both systems specifically designed for browsing multimodal meeting data and those designed to browse data collected from other environments, for example broadcast news and lectures. As a result of this analysis, we highlight potential directions for future research: semantic access, filtered presentation, limited display environments, browser evaluation and user requirements capture.
ISBN: (print) 3540305068
In this paper, we propose a fuzzy extension to proximal support vector classification via generalized eigenvalues. Here, a fuzzy membership value is assigned to each pattern, and points are classified by assigning them to the nearer of two nonparallel planes, each close to its respective class. The algorithm is simple, as the solution only requires solving a generalized eigenvalue problem, whereas in SVMs the classifier is obtained by solving a quadratic programming problem. The approach can be used to obtain improved classification when one has an estimate of the fuzziness of the samples in either class.
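The crisp (non-fuzzy) version of this idea can be sketched as follows: each plane w.x + b = 0 minimizes a ratio of distances to the two classes, which reduces to a generalized eigenvalue problem. This is a hedged sketch of that generic construction, not the paper's algorithm; in the fuzzy extension the membership values would enter as per-row weights, and the Tikhonov term `delta` is an assumption added to keep the matrices invertible.

```python
import numpy as np

def fit_plane(A, B, delta=1e-4):
    """Fit a plane w.x + b = 0 close to the rows of A and far from the
    rows of B: the eigenvector of the smallest generalized eigenvalue of
    G z = lam H z, with G, H the augmented scatter matrices of A and B."""
    Ae = np.hstack([A, np.ones((len(A), 1))])  # append bias column
    Be = np.hstack([B, np.ones((len(B), 1))])
    G = Ae.T @ Ae + delta * np.eye(Ae.shape[1])
    H = Be.T @ Be + delta * np.eye(Be.shape[1])
    # solve the generalized problem via H^{-1} G (H is invertible here)
    vals, vecs = np.linalg.eig(np.linalg.solve(H, G))
    z = np.real(vecs[:, np.argmin(np.real(vals))])
    return z[:-1], z[-1]  # w, b

def predict(x, planes):
    """Assign x to the class whose plane is nearest."""
    dists = [abs(w @ x + b) / np.linalg.norm(w) for w, b in planes]
    return int(np.argmin(dists))
```

On "cross"-shaped data (one class along each axis) the two fitted planes are nonparallel, which is precisely the case where a single separating hyperplane performs poorly.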