The growing ubiquity of sensors in mobile phones has opened many opportunities for personal daily activity sensing. Most context recognition systems require a cumbersome preparation by collecting and manually annotati...
ISBN: 9783642373817 (print)
The proceedings contain 15 papers. The topics discussed include: learning with configurable operators and RL-based heuristics; reducing examples in relational learning with bounded-treewidth hypotheses; mining complex event patterns in computer networks; learning in the presence of large fluctuations: a study of aggregation and correlation; machine learning as an objective approach to understanding music; pair-based object-driven action rules; effectively grouping trajectory streams; healthcare trajectory mining by combining multidimensional component and itemsets; graph-based approaches to clustering network-constrained trajectory data; finding the most descriptive substructures in graphs with discrete and numeric labels; learning in probabilistic graphs exploiting language-constrained patterns; improving robustness and flexibility of concept taxonomy learning from text; and context-aware predictions on business processes: an ensemble-based solution.
Transport authorities have been deploying and utilising sensor infrastructures in order to improve upon the level of transport-related services within cities. As existing resources are more and more constrained, novel...
In this paper, we push forward the idea of machine learning systems for which the operators can be modified and fine-tuned for each problem. This allows us to propose a learning paradigm where users can write (or adapt...
ISBN: 9781467362955 (print)
We present the Source Code Statistical Language Model data analysis pattern. Statistical language models have been an enabling tool for a wide array of important language technologies. Speech recognition, machine translation, and document summarization (to name a few) all rely on statistical language models to assign probability estimates to natural language utterances or sentences. In this data analysis pattern, we describe the process of building n-gram language models over software source files. We hope that by introducing the empirical software engineering community to best practices established over the years in natural language research, statistical language models can become a tool that SE researchers use to explore new research directions.
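As a rough illustration of what building an n-gram model over source files can involve, here is a minimal bigram sketch with add-one smoothing. The regex tokenizer, the `<s>`/`</s>` boundary markers, the smoothing scheme, and the function names are all assumptions made for this example; the abstract does not specify the pattern's actual tooling.

```python
import math
import re
from collections import Counter


def tokenize(source):
    # Crude lexer (an assumption): identifiers, integer literals, or single symbols.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def train_bigram(token_lists):
    """Count contexts and bigrams over a corpus of token sequences."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for tokens in token_lists:
        padded = ["<s>"] + tokens + ["</s>"]
        contexts.update(padded[:-1])              # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))   # adjacent token pairs
        vocab.update(tokens)
    return contexts, bigrams, vocab


def log_prob(tokens, model):
    """Add-one smoothed log-probability of a token sequence (natural log)."""
    contexts, bigrams, vocab = model
    size = len(vocab) + 1                         # one extra slot for unseen tokens
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + size))
        for a, b in zip(padded, padded[1:])
    )
```

A model trained this way assigns higher probability to token sequences that resemble the training code than to the same tokens in scrambled order, which is the property such models exploit.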
ISBN: 9789898425980 (print)
Mutual information estimation is an important task for many data mining and machine learning applications. In particular, many feature selection algorithms make use of the mutual information criterion and could thus benefit greatly from a reliable way to estimate it. More precisely, the multivariate mutual information (computed between multivariate random variables) can naturally be combined with popular search procedures such as greedy forward search to build a subset of the most relevant features. Estimating the mutual information between high-dimensional variables (especially through density function estimation) is, however, a hard task in practice, due to the limited number of data points available in real-world problems. This paper compares several popular mutual information estimators and shows that a nearest-neighbors-based estimator largely outperforms its competitors on high-dimensional data.
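To make the combination of a nearest-neighbor estimator with greedy forward search concrete, the sketch below uses the Kraskov-Stögbauer-Grassberger (KSG) estimator as a stand-in. The abstract does not name the estimator it evaluates, so that choice, the brute-force distance computation, the function names, and the default `k=3` are assumptions. The code also assumes continuous data and works in nats.

```python
import numpy as np
from scipy.special import digamma


def ksg_mutual_information(x, y, k=3):
    """KSG (algorithm 1) estimate of I(X; Y) in nats, using max-norm distances."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(x)
    # Pairwise max-norm distances in each marginal space (O(n^2) memory).
    dx = np.abs(x[:, None, :] - x[None, :, :]).max(axis=2)
    dy = np.abs(y[:, None, :] - y[None, :, :]).max(axis=2)
    np.fill_diagonal(dx, np.inf)                  # exclude each point from its own counts
    np.fill_diagonal(dy, np.inf)
    dz = np.maximum(dx, dy)                       # max-norm distance in the joint space
    eps = np.sort(dz, axis=1)[:, k - 1]           # distance to the k-th joint neighbor
    nx = (dx < eps[:, None]).sum(axis=1)          # marginal neighbors strictly within eps
    ny = (dy < eps[:, None]).sum(axis=1)
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))


def greedy_forward_selection(features, target, n_select, k=3):
    """Repeatedly add the feature that maximizes the multivariate MI with the target."""
    selected = []
    remaining = list(range(features.shape[1]))
    while remaining and len(selected) < n_select:
        scores = [
            ksg_mutual_information(features[:, selected + [j]], target, k)
            for j in remaining
        ]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because the estimator takes a multivariate `x`, each candidate is scored jointly with the features already selected rather than on its own, which is what distinguishes this procedure from ranking features by univariate mutual information.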
ISBN: 9789898425980 (print)
A recent development (Huang et al., 2011) has shown that better generalization performance can be obtained for the extreme learning machine (ELM) by adding a positive value to the diagonal of H^T H or H H^T, where H is the hidden layer output matrix. This paper extends this enhanced ELM to the online sequential learning mode. An online sequential learning algorithm is proposed for SLFNs and other regularization networks, consisting of two formulas for two scenarios: when the initial training data is small-scale and when it is large-scale. The performance of the proposed algorithm is demonstrated on six benchmark data sets covering both regression and multi-class classification problems.
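The idea of a regularized ELM updated one chunk at a time can be sketched with a standard recursive least-squares update. This is not necessarily the pair of formulas the paper proposes: the sigmoid hidden layer, the regularization constant `c`, the class name, and the Woodbury-style update are assumptions chosen for illustration. The initial solve uses the H^T H form, beta = (I/c + H^T H)^(-1) H^T T.

```python
import numpy as np


class OnlineRegularizedELM:
    """Single-hidden-layer network with fixed random hidden weights and ridge output weights."""

    def __init__(self, n_inputs, n_hidden, c=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(n_inputs, n_hidden))  # random input weights, never trained
        self.b = rng.normal(size=n_hidden)              # random hidden biases
        self.c = c                                      # larger c means weaker regularization
        self.p = None                                   # inverse of (I/c + H^T H) seen so far
        self.beta = None                                # output weights

    def hidden(self, x):
        # Hidden layer output matrix H for a batch x (sigmoid activation).
        return 1.0 / (1.0 + np.exp(-(x @ self.w + self.b)))

    def fit_initial(self, x, t):
        h = self.hidden(x)
        self.p = np.linalg.inv(np.eye(h.shape[1]) / self.c + h.T @ h)
        self.beta = self.p @ h.T @ t

    def update(self, x, t):
        # Fold in a new chunk without revisiting earlier data (Woodbury identity).
        h = self.hidden(x)
        gain = np.linalg.inv(np.eye(h.shape[0]) + h @ self.p @ h.T)
        self.p = self.p - self.p @ h.T @ gain @ h @ self.p
        self.beta = self.beta + self.p @ h.T @ (t - h @ self.beta)

    def predict(self, x):
        return self.hidden(x) @ self.beta
```

After any number of `update` calls, `beta` matches, up to floating-point error, the ridge solution computed in one batch over all data seen so far, so the sequential mode adds no extra approximation.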
In machine learning, scale adds complexity. The most obvious consequence of scale is that data takes longer to process. At certain points, however, scale makes trivial operations costly, thus forcing us to re-evaluate...
ISBN: 9789898425980 (print)
The task of predicting the label of a network node, based on the labels of the remaining nodes, is an area of growing interest in machine learning, as various types of data are naturally represented as nodes in a graph. As an increasing number of methods and approaches are proposed to solve this task, comparing their performance becomes of key importance. In this paper we present an extensive experimental comparison of 15 different methods on 15 different labelled networks, and we release all datasets and source code. In addition, we release a further set of networks that were not used in this study (as not all benchmarked methods could manage very large datasets). Besides the release of data, protocols and algorithms, the key contribution of this study is that in each of the 225 combinations we tested, the best performance, both in accuracy and running time, was achieved by the same algorithm: Online Majority Vote. This is also one of the simplest methods to implement.
ISBN: 9789898425980 (print)
In classification problems, the dissimilarity representation has been shown to be more robust than the feature space. To build the dissimilarity space, a representation set of r objects is used. Several methods have been proposed for selecting a representation set that maximizes classification performance. A recurring and crucial challenge in pattern recognition and machine learning is the class imbalance problem, which has been said to hinder the performance of learning algorithms. In this paper, we carry out a preliminary study that investigates the effects of several prototype selection schemes when data sets are imbalanced, and that assesses the benefits of selecting the representation set when the class imbalance is handled by resampling the data set. Statistical analysis of the experimental results using the Friedman test demonstrates that applying resampling significantly improves classification performance.
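The pipeline described above (resample, choose a representation set of r objects, then map every object to its distances from that set) can be sketched as follows. The Euclidean dissimilarity, random oversampling, random prototype selection, and all function names are assumptions for illustration. The study compares several selection schemes that the abstract does not enumerate.

```python
import numpy as np


def random_oversample(x, y, seed=0):
    """Duplicate randomly chosen minority-class objects until all classes are equally sized."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    parts = []
    for label, count in zip(classes, counts):
        members = np.flatnonzero(y == label)
        extra = rng.choice(members, size=target - count, replace=True)
        parts.append(np.concatenate([members, extra]))
    idx = np.concatenate(parts)
    return x[idx], y[idx]


def select_prototypes(x, r, seed=0):
    """Pick r distinct objects at random as the representation set (simplest possible scheme)."""
    rng = np.random.default_rng(seed)
    return x[rng.choice(len(x), size=r, replace=False)]


def dissimilarity_space(x, prototypes):
    """Map each object to its vector of Euclidean distances to the r prototypes."""
    diff = x[:, None, :] - prototypes[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))       # shape: (n_objects, r)
```

Any standard classifier can then be trained on the `(n_objects, r)` matrix in place of the original features, with test objects mapped through the same prototypes.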