ISBN: 9780769551029 (print)
The performance of anomaly detection algorithms is usually measured using the total residual error. This error metric is calculated by comparing the labels assigned by the detection algorithm against a reference ground truth. Obtaining a highly expressive ground truth is itself a challenging task, if not an infeasible one. Often, a dataset is manually labeled by domain experts. However, manual labeling is error-prone. In real-world sensor network deployments, labeling a sensor dataset becomes even more difficult because of the large number of samples, the complexity of visualizing the data, and the uncertainty about whether anomalies exist. This paper proposes an automated technique that uses highly representative anomaly models for labeling. We demonstrate the effectiveness of this technique by evaluating a classification algorithm using our designed anomaly models as ground truth. We show that the classification accuracy is similar to that obtained when using manually labeled real-world data points.
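As a rough illustration of labeling by anomaly model, the sketch below injects a known anomaly into clean readings so that the ground-truth labels come for free, then scores a detector against them. The spike model, the median-threshold detector, the error-rate definition, and all function names are assumptions made for this example; they are not the anomaly models or the metric used in the paper.

```python
def inject_spikes(clean, positions, magnitude):
    """Return (signal, labels): a copy of `clean` with spikes added at
    `positions`, plus 0/1 ground-truth labels marking the injected samples."""
    signal = list(clean)
    labels = [0] * len(clean)
    for i in positions:
        signal[i] += magnitude
        labels[i] = 1
    return signal, labels


def threshold_detector(signal, threshold):
    """Hypothetical detector: flag samples far from the (upper) median."""
    ordered = sorted(signal)
    median = ordered[len(ordered) // 2]
    return [1 if abs(x - median) > threshold else 0 for x in signal]


def error_rate(predicted, truth):
    """Fraction of samples whose predicted label differs from the ground truth."""
    mismatches = sum(p != t for p, t in zip(predicted, truth))
    return mismatches / len(truth)


# Illustrative temperature-like readings (made up for this sketch).
clean = [20.0, 20.1, 19.9, 20.2, 20.0, 19.8, 20.1, 20.0]
signal, truth = inject_spikes(clean, positions=[2, 5], magnitude=5.0)
predicted = threshold_detector(signal, threshold=1.0)
print(error_rate(predicted, truth))
```

Because the anomalies are injected rather than hand-labeled, the labels are exact by construction, which is the property that makes model-based ground truth attractive.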
ISBN: 9781467364003; 9781467363976 (print)
Pattern recognition problems occur in many fields, and hence effective classification algorithms are the focus of much research. In various circumstances, the primary goal is not classification accuracy but the minimisation of misclassification cost, which has led to the development of cost-sensitive classification algorithms. In this paper, we show how evolutionary algorithms, in particular genetic algorithms (GAs), can be employed to optimise cost-sensitive classifiers and classifier ensembles. In particular, we discuss how GAs can be employed to derive a compact set of fuzzy if-then rules with an embedded cost term, and how GAs are able to perform simultaneous classifier selection and fusion for ensemble classifiers.
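The difference between maximising accuracy and minimising misclassification cost can be seen in a minimal decision-rule sketch. It covers only the generic minimum-expected-cost rule, not the GA-based optimisation or the fuzzy rule bases described in the abstract; the cost matrix, the posterior values, and the function name are illustrative assumptions.

```python
def min_cost_class(posteriors, cost):
    """Pick the class with the lowest expected misclassification cost.

    posteriors[i] is the estimated probability of true class i;
    cost[i][j] is the (assumed) cost of predicting j when the truth is i.
    """
    n = len(posteriors)
    expected = [
        sum(posteriors[i] * cost[i][j] for i in range(n))
        for j in range(n)
    ]
    best = min(range(n), key=expected.__getitem__)
    return best, expected


# Illustrative numbers: class 0 is more probable, but missing class 1
# is assumed to be ten times more expensive than a false alarm.
posteriors = [0.7, 0.3]
cost = [[0.0, 1.0],
        [10.0, 0.0]]
label, expected = min_cost_class(posteriors, cost)
print(label, expected)
```

With these numbers the most probable class is 0, yet the rule selects class 1 because its expected cost is lower, which is the behaviour a cost-sensitive classifier is designed to produce.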
Bugs are prevalent. To improve software quality, developers often allow users to report bugs that they have found using a bug tracking system such as Bugzilla. Users specify, among other things, a description of the bug, the component affected by the bug, and the severity of the bug. Based on this information, bug triagers then assign a priority level to the reported bug. As resources are limited, bug reports are investigated according to their priority levels. This priority assignment process, however, is a manual one. Could we do better? In this paper, we propose an automated approach based on machine learning that recommends a priority level based on the information available in bug reports. Our approach considers multiple factors (temporal, textual, author, related-report, severity, and product) that potentially affect the priority level of a bug report. These factors are extracted as features, which are then used to train a discriminative model via a new classification algorithm that handles ordinal class labels and imbalanced data. Experiments on more than a hundred thousand bug reports from Eclipse show that we can outperform baseline approaches in terms of average F-measure by a relative improvement of 58.61%.
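For readers unfamiliar with the reported metric, the sketch below shows how an average F-measure and a relative improvement are typically computed. It assumes the average is an unweighted mean over per-class F-measures, which the abstract does not state; the function names and the precision/recall values are invented for illustration and are not results from the paper.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def average_f_measure(per_class):
    """Unweighted mean of per-class F-measures (an assumed averaging scheme).

    per_class is a list of (precision, recall) pairs, one per priority level.
    """
    return sum(f_measure(p, r) for p, r in per_class) / len(per_class)


def relative_improvement(new, baseline):
    """Improvement of `new` over `baseline`, as a percentage of `baseline`."""
    return (new - baseline) / baseline * 100.0


# Made-up (precision, recall) pairs for two priority levels.
baseline_avg = average_f_measure([(0.2, 0.2), (0.4, 0.4)])
proposed_avg = average_f_measure([(0.3, 0.3), (0.6, 0.6)])
print(relative_improvement(proposed_avg, baseline_avg))
```

A relative improvement is expressed against the baseline score, so a figure such as 58.61% means the new average F-measure is about 1.59 times the baseline, not 58.61 percentage points higher.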
ISBN: 9781479914463; 9781479914470 (print)
Modular multilevel converters (MMCs) have recently become highly attractive for medium- and high-voltage power transition and electrical machine drives. Capacitor voltage sorting is very important for capacitor voltage-balancing control. This paper describes a novel sorting algorithm for the capacitor voltages of MMCs.
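To show why sorting matters for voltage balancing, here is a minimal sketch of the conventional sort-and-select scheme, not the novel algorithm of this paper, whose details the abstract does not give. The sign convention (non-negative arm current charges the inserted capacitors), the function name, and the per-unit voltages are assumptions for illustration.

```python
def select_submodules(voltages, n_on, arm_current):
    """Choose which submodules to insert in one arm for this control period.

    Conventional rule: if the arm current charges the inserted capacitors
    (assumed here to be arm_current >= 0), insert the n_on submodules with
    the lowest voltages; otherwise insert those with the highest voltages.
    Returns the chosen submodule indices in ascending order.
    """
    discharging = arm_current < 0
    order = sorted(
        range(len(voltages)),
        key=lambda i: voltages[i],
        reverse=discharging,
    )
    return sorted(order[:n_on])


# Illustrative per-unit capacitor voltages for a four-submodule arm.
voltages = [1.02, 0.97, 1.05, 0.99]
print(select_submodules(voltages, n_on=2, arm_current=+10.0))
print(select_submodules(voltages, n_on=2, arm_current=-10.0))
```

A full sort in every control period costs O(N log N) per arm, which is one reason cheaper sorting schemes for MMCs are an active topic.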
ISBN: 9781479931170 (print)
In this paper, fault detection in the HP drum of boilers at the Kerman combined cycle power plant is explored by means of the support vector machine (SVM) algorithm and principal component analysis (PCA). First, the SVM classifier and PCA are discussed; then, based on data collected under normal and abnormal operating conditions of the boilers, fault detection is carried out using these methods. Finally, these techniques are compared with other routine methods to show the superiority of the proposed approaches at the Kerman power plant.
ISBN: 9781479911127 (print)
This study investigated the automated pre-processing and land cover classification of Landsat data. The Web-enabled Landsat Data (WELD) system was used to process large volumes of Landsat imagery into calibrated top-of-atmosphere reflectance and brightness temperature products, which were composited temporally and mosaicked for the KwaZulu-Natal Province of South Africa. The usefulness of an Automatic Spectral Rule-base Classifier (ASRC) approach was evaluated by relating the spectral categories it produced to land cover classes. The ASRC method uses a hierarchical rule set, which relies on universally set thresholds derived from the literature, to decide on the spectral category. To assess performance, the spectral categories were treated as input features to supervised classifiers in order to optimally assign land cover labels. The land cover classes used in the experiments were obtained from the official map of the KwaZulu-Natal Province, which was generated by operators in 2008. This approach was compared to an experiment using the original 7 Landsat spectral bands and derived indices as input features. It was found that the ASRC spectral categories did not provide a useful translation to land cover classes (45.5% classification accuracy), while the experiments using the Landsat 7 spectral bands or indices did considerably better (82.7% classification accuracy).
ISBN: 9781479923427 (print)
This paper presents our recent development of a classification algorithm for identifying breast cancer margins measured by hyperspectral imaging, with the aim of lowering the number of missed positive margins in breast cancer lumpectomy. After extracting Fourier coefficient selection features and reducing the dimensionality of the hyperspectral image data via the Minimum Redundancy Maximum Relevance method, an SVM classifier with a radial basis kernel function is deployed to separate cancerous tissue from normal tissue. By examining ex vivo breast cancer hyperspectral images tagged by a pathologist, the developed classification approach is shown to achieve a sensitivity of about 98% and a specificity of about 99%.
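Sensitivity and specificity, the two figures reported above, can be computed from binary labels as in the following sketch. The label convention (1 for cancerous, 0 for normal), the function name, and the toy label vectors are assumptions for illustration and are unrelated to the paper's data.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for 0/1 labels, where 1 is positive.

    sensitivity = TP / (TP + FN): share of true positives that were found.
    specificity = TN / (TN + FP): share of true negatives that were kept.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels: 4 positive and 6 negative samples (made up for this sketch).
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(truth, predicted)
print(sens, spec)
```

In the margin-assessment setting, a missed positive margin is a false negative, so sensitivity is the figure most directly tied to the stated clinical goal.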
ISBN: 9781479903573 (print)
Natural texture images exhibit high intra-class diversity due to different acquisition conditions (scene illumination, perspective angle, ...). To handle this diversity, a new supervised classification algorithm based on a parametric formalism is introduced: the K-centroids-based classifier (K-CB). A comparative study of various supervised classification algorithms on the VisTex and Brodatz image databases is conducted and reveals that the proposed K-CB classifier obtains relatively good classification accuracy with low computational complexity.
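The low computational cost of centroid-based classification can be seen in a plain nearest-centroid sketch. This is a deliberately simplified stand-in: it uses one centroid per class, the arithmetic mean, and Euclidean distance on raw feature vectors, whereas K-CB uses K centroids per class within a parametric formalism that the abstract does not detail. All names and the two-dimensional feature values are hypothetical.

```python
import math


def fit_centroids(samples, labels):
    """Compute one mean vector per class from training feature vectors."""
    grouped = {}
    for x, y in zip(samples, labels):
        grouped.setdefault(y, []).append(x)
    return {
        y: [sum(col) / len(rows) for col in zip(*rows)]
        for y, rows in grouped.items()
    }


def predict(centroids, x):
    """Assign x to the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


# Hypothetical 2-D texture features for two classes.
samples = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
labels = ["grass", "grass", "brick", "brick"]
centroids = fit_centroids(samples, labels)
print(predict(centroids, [1.0, 1.0]), predict(centroids, [5.0, 5.0]))
```

Classifying a query costs one distance computation per centroid rather than one per training sample, which is where the saving over nearest-neighbour methods comes from.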
ISBN: 9781479903573 (print)
In this paper, we study the binary classification problem in machine learning and introduce a novel classification algorithm based on the "Context Tree Weighting Method". The introduced algorithm incrementally learns a classification model through sequential updates over the course of a given data stream, i.e., each data point is processed only once and forgotten after the classifier is updated, and it asymptotically achieves the performance of the best piecewise linear classifier defined by the "context tree". Since the computational complexity is only linear in the depth of the context tree, our algorithm is highly scalable and appropriate for real-time processing. We present experimental results on several benchmark data sets and demonstrate that our method provides a significant computational improvement in both the test (5 to 35×) and training (40 to 1000×) phases, while achieving high classification accuracy in comparison to the SVM with an RBF kernel.
ISBN: 9781479936878 (print)
This paper introduces a centroid-based (CB) supervised classification algorithm for textured images. In the context of scale/orientation decomposition, we demonstrate that a centroid approach can be developed on the basis of stochastic modeling. The aim of this paper is twofold. Firstly, we introduce the generalized Gamma distribution (GΓD) for modeling wavelet coefficients. A comparative goodness-of-fit study with various univariate models reveals the potential of the proposed model. Secondly, we propose an algorithm to estimate the centroid from a collection of GΓD parameters. To speed up the convergence of the steepest descent, we propose including the Fisher information matrix in the optimization step. Experiments on various conventional texture databases are conducted and demonstrate the value of the proposed classification algorithm.