ISBN: 9783642033476 (print)
Incremental Flexible Frequency Discretization (IFFD) is a recently proposed discretization approach for Naive Bayes (NB). IFFD performs satisfactorily by setting the minimal interval frequency for discretized intervals to a fixed number. In this paper, we first argue that this setting cannot guarantee optimal classification performance in terms of classification error. We observed empirically that an optimal minimal interval frequency exists for each dataset. We thus propose a sequential-search, wrapper-based incremental discretization method for NB, named Optimal Flexible Frequency Discretization (OFFD). Experiments were conducted on 17 datasets from the UCI machine learning repository, and performance was compared between NB trained on data discretized by OFFD, IFFD, PKID, and FFD, respectively. Results show that OFFD works better than these alternatives for NB. Experiments comparing NB trained on OFFD-discretized data with C4.5 showed that our new method outperforms C4.5 on most of the datasets we tested.
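The abstract does not spell out the search procedure, so the following is only a minimal sketch of the general idea: try candidate minimal interval frequencies one after another and keep the one with the lowest cross-validated NB error (a wrapper evaluation). The discretizer, the Laplace-smoothed NB, the 3-fold split, and all function names (`cut_points`, `nb_error`, `cv_error`, `search_min_freq`) are illustrative assumptions, not the authors' implementation.

```python
import bisect
import math
from collections import Counter, defaultdict

def cut_points(values, min_freq):
    """Cut sorted values into intervals holding at least min_freq items each."""
    ordered = sorted(values)
    cuts, count = [], 0
    for i, v in enumerate(ordered[:-1]):
        count += 1
        remaining = len(ordered) - (i + 1)
        # Close an interval only between distinct values, and only if the
        # remaining items can still fill a complete interval.
        if count >= min_freq and remaining >= min_freq and ordered[i + 1] != v:
            cuts.append((v + ordered[i + 1]) / 2)
            count = 0
    return cuts

def discretize(row, all_cuts):
    """Map each numeric feature to the index of the interval it falls in."""
    return tuple(bisect.bisect_right(c, x) for c, x in zip(all_cuts, row))

def nb_error(train, test, min_freq):
    """Error rate of a Laplace-smoothed NB on data discretized with min_freq."""
    n_feat = len(train[0][0])
    all_cuts = [cut_points([x[j] for x, _ in train], min_freq)
                for j in range(n_feat)]
    class_count = Counter(y for _, y in train)
    cell_count = defaultdict(int)  # (feature, interval, class) -> frequency
    for x, y in train:
        for j, b in enumerate(discretize(x, all_cuts)):
            cell_count[(j, b, y)] += 1
    errors = 0
    for x, y in test:
        bins = discretize(x, all_cuts)

        def log_posterior(c):
            lp = math.log(class_count[c] / len(train))
            for j, b in enumerate(bins):
                n_bins = len(all_cuts[j]) + 1
                seen = cell_count.get((j, b, c), 0)
                lp += math.log((seen + 1) / (class_count[c] + n_bins))
            return lp

        errors += max(class_count, key=log_posterior) != y
    return errors / len(test)

def cv_error(data, min_freq, folds=3):
    """Average NB error over a simple interleaved k-fold split (assumed)."""
    total = 0.0
    for k in range(folds):
        test = data[k::folds]
        train = [d for i, d in enumerate(data) if i % folds != k]
        total += nb_error(train, test, min_freq)
    return total / folds

def search_min_freq(data, candidates):
    """Sequentially try each candidate and keep the lowest-error one."""
    best, best_err = None, float("inf")
    for m in candidates:
        err = cv_error(data, m)
        if err < best_err:
            best, best_err = m, err
    return best, best_err
```

Here `data` is a list of `(feature_tuple, label)` pairs with numeric features. The sketch rebuilds the discretization from scratch for every candidate, whereas an incremental method such as the one named in the abstract would presumably update intervals rather than recompute them.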
ISBN: 9783642105197 (print)
Clustering has been among the most active research topics in machine learning and pattern recognition. Though recent approaches have delivered impressive results in a number of challenging clustering tasks, most of them do not solve two problems. First, most approaches need prior knowledge about the number of clusters, which is not practical in applications. Second, non-linear and elongated clusters cannot be clustered correctly. In this paper, a general framework is proposed to solve both problems by convex clustering based on a learned distance. In the proposed framework, the data are transformed from elongated structures into compact ones by a novel distance learning algorithm. Then, a convex clustering algorithm is used to cluster the transformed data. The presented experimental results demonstrate successful solutions to both problems. In particular, the proposed approach is very suitable for superpixel generation, which is a common basis for recent high-level image segmentation algorithms.
This paper introduces the application of data mining technology in enterprise crisis management for collecting enterprise competitive intelligence under market competition. It focuses on the data mining methods used to acquire enterprise competitive intelligence and on the contents of that intelligence in the crisis management process, as well as on the roles of data mining in an enterprise's crisis management competitive intelligence system. Finally, it builds an acquisition model of enterprise crisis competitive intelligence based on data mining.
Organisms exhibit a close structure-function relationship, and a slight change in structure may in turn change their outputs accordingly [1]. This feature is important as it is the main reason why organisms have better...
Neural network analysis, an important branch of data mining, has been widely used in statistical analysis, pattern recognition, image processing, biological species division, and customer division. Based on the division method, the paper rationally selects the initial class centers, dynamically adjusts the number of classes during image classification, and proposes an image recognition method. In the new method, multi-scale wavelet decomposition is first applied to the image to be recognized. Then a Fisher transformation is performed on the decomposition results at different scales, which are defined as decomposition vectors. Finally, image recognition is carried out in the Fisher transformation domain according to the minimum absolute distance or comparative distance. The new method is shown to have a high correct recognition rate and excellent recognition performance.
Meta-learning is currently a hot research topic in machine learning, which has emerged from the need to support data mining automation in issues related to algorithm and parameter selection. Finding the best learning strategy for a new domain or problem can prove to be an expensive and time-consuming process, even for experienced analysts. This paper presents a new meta-learning system designed to automatically discover the most reliable learning schemes for a particular dataset, based on the knowledge the system has acquired about similar datasets. The novelty of the approach consists in combining dataset characterization with landmarking to increase the accuracy of the predictions. The proposed architecture aims to resolve the problem of selecting the best classifier for a dataset while minimizing the work done by the user but still offering flexibility.
GTM neurolike structures provide solutions to tasks such as pattern recognition, prediction, classification, principal component analysis, factor analysis, optimization, recovery of lost data or data compression, realization of information security methods, solution of systems of algebraic equations (including underdetermined and overdetermined systems), high-dimensional data visualization, etc.
Nowadays, carpet quality is assessed in industry by human experts, because automated assessment is not capable of matching human expertise. Therefore, the carpet industry demands a reliable and economical standardization of carpet wear level. This paper presents a new strategy for analyzing and classifying the texture of worn carpet surfaces from 3D images produced by a 3D laser scanner. 2D images are obtained by resampling the 3D data on different grid sizes. The extracted features are based on Haralick descriptors of the co-occurrence matrix. These features are used as inputs to a classifier based on a support vector machine (SVM), trained for multi-class classification. The proposed technique achieves an average of over 92% correct labeling.
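To make the feature step concrete, here is a minimal sketch of a gray-level co-occurrence matrix and three common Haralick-style descriptors (contrast, energy, homogeneity). The choice of descriptors, the single pixel offset, the non-symmetric matrix, and the function names `glcm` and `haralick_features` are assumptions for illustration; the abstract does not say which descriptors or offsets were used, and the SVM stage is omitted.

```python
def glcm(image, levels, dr=0, dc=1):
    """Normalized gray-level co-occurrence matrix for pixel offset (dr, dc).

    image is a list of rows of integer gray levels in range(levels).
    The matrix is not symmetrized: only the pair (pixel, neighbor) is counted.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[n / total for n in row] for row in counts]

def haralick_features(p):
    """Contrast, energy (angular second moment) and homogeneity of a GLCM."""
    idx = range(len(p))
    contrast = sum(p[i][j] * (i - j) ** 2 for i in idx for j in idx)
    energy = sum(p[i][j] ** 2 for i in idx for j in idx)
    homogeneity = sum(p[i][j] / (1 + (i - j) ** 2) for i in idx for j in idx)
    return {"contrast": contrast, "energy": energy, "homogeneity": homogeneity}
```

In a pipeline like the one described, a vector of such descriptors per image (possibly over several offsets and grid sizes) would be the input to the multi-class SVM.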
As wireless services have developed rapidly in recent years, a diversity of wireless services has emerged, and the radio environment has become more and more complicated. Great importance is now attached to radio spectrum security. Real-time detection of spectrum anomalies is vital for meeting the increasing demand for security and ensuring that wireless services function properly. Malicious radio events, such as illegal channel occupation, have happened frequently in recent years and result in severe interference with normal radio spectrum usage. Anomaly detection approaches from different areas have been proposed to counter such malicious events. However, those malicious events usually happen within a short interval, which increases the demand for instantaneous response to real-time events, and the complexity of previous approaches makes them insufficient to handle the real-time task. In this paper, a new approach for anomaly detection in spectrum monitoring is proposed. Distinct from previous anomaly detection methods, both temporal and spectral information are taken into account and used to find potential anomalies. Meanwhile, an adaptive learning ability is proposed to respond to real-time changes in the radio environment. To analyze high-dimensional spectrum measurement data, the Mahalanobis distance is applied to reveal potential anomalies according to the historical pattern of the radio spectrum. A methodological analysis and a real case study have been performed to validate the detection effectiveness in practice.
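As a rough illustration of the Mahalanobis-distance step only, the sketch below scores a new measurement vector against the mean and covariance of historical measurements and flags it when the distance exceeds a threshold. The threshold value, the function names, and the use of a plain sample covariance are assumptions; the adaptive learning and the temporal modelling mentioned in the abstract are not covered.

```python
import math

def mean_and_cov(samples):
    """Sample mean vector and (unbiased) covariance matrix of the history."""
    n, d = len(samples), len(samples[0])
    mu = [sum(s[j] for s in samples) / n for j in range(d)]
    cov = [[sum((s[a] - mu[a]) * (s[b] - mu[b]) for s in samples) / (n - 1)
            for b in range(d)] for a in range(d)]
    return mu, cov

def solve(matrix, rhs):
    """Solve matrix @ z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z

def mahalanobis(x, mu, cov):
    """sqrt((x - mu)^T cov^{-1} (x - mu)), computed without forming cov^{-1}."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    return math.sqrt(sum(d * z for d, z in zip(diff, solve(cov, diff))))

def is_anomaly(x, history, threshold=3.0):
    """Flag x when it lies farther than threshold (assumed) from the history."""
    mu, cov = mean_and_cov(history)
    return mahalanobis(x, mu, cov) > threshold
```

The sketch assumes the historical covariance is non-singular; with high-dimensional spectrum data and few samples it may not be, in which case some regularization or dimensionality reduction would be needed first.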
ISBN: 9781932432466 (print)
Mining bilingual data (including bilingual sentences and terms) from the Web can benefit many NLP applications, such as machine translation and cross-language information retrieval. In this paper, based on the observation that bilingual data in many web pages appear collectively following similar patterns, an adaptive pattern-based bilingual data mining method is proposed. Specifically, given a web page, the method consists of four steps: 1) preprocessing: parse the web page into a DOM tree and segment the inner text of each node into snippets; 2) seed mining: identify potential translation pairs (seeds) using a word-based alignment model that takes both translation and transliteration into consideration; 3) pattern learning: learn generalized patterns with the identified seeds; 4) pattern-based mining: extract all bilingual data in the page using the learned patterns. Our experiments on Chinese web pages produced more than 7.5 million pairs of bilingual sentences and more than 5 million pairs of bilingual terms, both with over 80% accuracy.