ISBN: 1424400600 (print)
Outlier detection is a branch of data mining with important applications in domains such as financial fraud detection and network intrusion analysis. Most of these applications, however, involve high-dimensional data. Many algorithms use the concept of proximity to find outliers based on their relationship to the rest of the data set. However, the sparsity of points in high-dimensional space makes these algorithms unsuitable for such data. In this paper, we discuss a new technique, ODHDP (Outlier Detection in High Dimension based on Projection), which finds outliers based on projections of the data set.
ISBN: 1424400600 (print)
Several manifold learning algorithms have recently been presented for nonlinear dimensionality reduction, and Isomap is one of them. However, Isomap has the deficiency that it does not give an explicit mapping function from the high-dimensional space to the low-dimensional target space. In this paper, a version of Isomap with an explicit mapping, called E-Isomap, is proposed. In E-Isomap, the geodesic distance matrix is fed into a cost function, and Iterative Majorization is then adopted to solve an optimization problem that yields both the low-dimensional configuration and the nonlinear mapping. Owing to the explicit mapping, this version of Isomap can be used more easily in pattern recognition than the original. Experiments on two benchmark data sets are given to demonstrate the performance of the presented method.
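The geodesic distance matrix mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard Isomap construction (a k-nearest-neighbour graph followed by all-pairs shortest paths), which the abstract does not spell out; the function name `geodesic_distances` and the parameter `k` are hypothetical, and the cost function and Iterative Majorization step of E-Isomap are not shown.

```python
import math


def geodesic_distances(points, k):
    """Approximate geodesic distances via a kNN graph and Floyd-Warshall.

    Assumption: this is the usual Isomap recipe, not code from the paper.
    """
    n = len(points)
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        # Indices of the k nearest neighbours of point i (Euclidean distance).
        nearest = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:k]
        for j in nearest:
            d = math.dist(points[i], points[j])
            # Keep the graph symmetric: an edge i-j can be walked both ways.
            dist[i][j] = min(dist[i][j], d)
            dist[j][i] = min(dist[j][i], d)
    # Floyd-Warshall: shortest path lengths along graph edges.
    for mid in range(n):
        for i in range(n):
            for j in range(n):
                via = dist[i][mid] + dist[mid][j]
                if via < dist[i][j]:
                    dist[i][j] = via
    return dist


# Points along an L-shaped curve: the geodesic follows the bend.
curve = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print(geodesic_distances(curve, k=2)[0][4])
```

On the L-shaped example, the distance between the two end points is measured along the curve rather than straight across it, which is the property Isomap relies on.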
ISBN: 1424400600 (print)
Privacy-preserving data mining is a novel research direction in data mining and statistical databases, in which data mining algorithms are analyzed for their side effects on data privacy. There have been many studies on the efficient discovery of frequent itemsets in privacy-preserving data mining. However, maintaining the discovered frequent itemsets is nontrivial, because a database may allow updates, after which previously frequent itemsets may become infrequent. In this paper, an incremental updating algorithm, IPPFIM, is proposed for the efficient maintenance of discovered frequent itemsets when new transaction data are added to a transaction database in a privacy-preserving setting. The algorithm makes use of previous mining results to cut down the cost of finding new frequent itemsets in an updated database. The performance evaluation shows the efficiency of this method.
ISBN: 1424400600 (print)
In manufacturing processes, it is very important to monitor the condition of the cutting tool, particularly for indications of when it should be changed. Cutting tool condition monitoring is a very complex process, so sensor fusion techniques and artificial intelligence signal processing algorithms are employed in this study. The multi-sensor signals reflect the tool condition comprehensively. A unique fuzzy neural hybrid pattern recognition algorithm has been developed. The weighted approaching degree can measure the difference between signal features accurately, and the neuro-fuzzy network combines the transparent representation of a fuzzy system with the learning ability of neural networks. The algorithm has strong modeling and noise suppression abilities. This leads to successful tool wear classification under a range of machining conditions.
ISBN: 1424400600 (print)
Data mining in data streams is currently a very popular research field. One of the central tasks in mining data streams is identifying outliers, which can lead to the discovery of unexpected and interesting knowledge and is therefore critically important. To mine outliers in data streams effectively, ODABK, an algorithm for outlier detection in data streams, is presented. It is based on KNN and is significantly enhanced by additional data structures and optimized logical operations. Finally, the paper reports experiments on real-world census data, which show that ODABK is more effective in terms of detection rate and execution time.
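As a rough illustration of KNN-based outlier detection in general, the sketch below scores each point by the distance to its k-th nearest neighbour and reports the highest-scoring points. It is a brute-force batch version written under that assumption only; the stream handling, data structures, and logical operations of ODABK are not described in the abstract and are not reproduced here. The names `knn_outlier_scores` and `top_outliers` are hypothetical.

```python
import math


def knn_outlier_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        scores.append(dists[k - 1])
    return scores


def top_outliers(points, k, n_outliers):
    """Return indices of the n_outliers points with the largest scores."""
    scores = knn_outlier_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_outliers]


# Four clustered points and one point far away from them.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(top_outliers(data, k=2, n_outliers=1))
```

A point far from every cluster has a large k-th neighbour distance, so it ranks first; in a streaming setting the same score would have to be maintained over a sliding window rather than recomputed from scratch.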
ISBN: 1424400600 (print)
Finding co-location patterns in spatial data is a challenging problem in spatial databases. While previous work focused on the discovery of co-location patterns for categorical data, we present a novel method that finds co-location patterns in continuous spatial data. Our algorithm mines co-location patterns for continuous data by using a multi-layer index and a neighbor domain set, which resembles the itemsets of transactions in classical data mining. We conduct experiments with fire data, and the results indicate that the new algorithm is very effective.
ISBN: 1424400600 (print)
In this paper, an Apriori algorithm based on an inverted list is presented for mining frequent patterns. Compared with the traditional Apriori algorithm and the FP-growth algorithm, this algorithm has better efficiency and a wider range of application. To address the shortcomings of the traditional Apriori algorithm, this algorithm uses the inverted list to avoid many redundant operations. It needs to scan the data set only twice and does not need join and prune operations. Frequent itemsets are saved in each transaction's frequent set TF, and the next frequent single item is inserted one at a time to generate new candidate frequent itemsets. In this way, many redundant operations can be avoided. The performance study shows that the algorithm is more efficient on both dense and sparse datasets.
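The general idea of counting support with an inverted list can be sketched as follows. This assumes the common layout in which each item maps to the set of transaction IDs that contain it; the abstract does not give the exact structure or the TF procedure, so the names `build_inverted_list` and `support` are hypothetical and the sketch shows only the support-counting step.

```python
from collections import defaultdict


def build_inverted_list(transactions):
    """Map each item to the set of transaction IDs containing it (one scan)."""
    inverted = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            inverted[item].add(tid)
    return dict(inverted)


def support(inverted, itemset):
    """Support of an itemset = size of the intersection of its TID sets."""
    tid_sets = [inverted.get(item, set()) for item in itemset]
    if not tid_sets:
        return 0
    return len(set.intersection(*tid_sets))


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
inverted = build_inverted_list(transactions)
print(support(inverted, ["a", "b"]))
```

Once the inverted list is built, the support of any candidate is obtained by set intersection without rescanning the data set, which is one way redundant scans can be avoided.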
ISBN: 9608457564 (print)
Face recognition using labeled and unlabeled data has received a considerable amount of interest in recent years. At the same time, multiple classifier systems (MCS) have been widely successful in various pattern recognition applications such as face recognition. MCS have very recently been investigated in the context of semi-supervised learning. Very little attention has been devoted to verifying the usefulness of the newly developed semi-supervised MCS models for face recognition. In this work, we attempt to assess and compare the performance of several semi-supervised MCS training algorithms when applied to the face recognition problem. Experiments on a data set of face images are presented. Our experiments use a non-homogeneous classifier ensemble and the majority voting rule, and compare three semi-supervised learning models: the self-trained single classifier model, the ensemble-driven model, and a newly proposed modified co-training model. Experimental results reveal that the investigated semi-supervised models succeed in exploiting unlabeled data to enhance the performance of the classifiers and of their combined output. The proposed semi-supervised learning model shows a significant improvement in classification accuracy compared to existing models.
ISBN: 1424400600 (print)
Measuring the similarity between time series is a key issue in data mining of time series databases. The Euclidean distance is typically used for this purpose, but it is an extremely brittle distance measure. Dynamic Time Warping (DTW) has been proposed to deal with this problem, but its expensive computation limits its application to massive datasets. In this paper, we present a new distance measure algorithm, called local segmented dynamic time warping (LSDTW), which is based on viewing the local DTW measure at the segment level. The DTW measure between two segments is the square of the distance between their means multiplied by the number of points in the longer segment. Cluster analysis experiments based on this algorithm were carried out on a synthetic and a real-world dataset, in comparison with the Euclidean and classical DTW measures. The experimental results show that the new algorithm gives better computational performance than classical DTW with no loss of accuracy.
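The segment-level cost stated in this abstract can be written down directly, as in the sketch below. The function `segment_cost` follows the stated formula; `segmented_dtw` then plugs that cost into the classical DTW recurrence, which is an assumption about how the segment costs are combined, since the abstract does not give the recurrence or the segmentation method. Both function names are hypothetical, and segments are assumed to be supplied as lists of values.

```python
def segment_cost(seg_a, seg_b):
    """Squared difference of segment means times the longer segment's length."""
    mean_a = sum(seg_a) / len(seg_a)
    mean_b = sum(seg_b) / len(seg_b)
    return (mean_a - mean_b) ** 2 * max(len(seg_a), len(seg_b))


def segmented_dtw(segs_a, segs_b):
    """Classical DTW recurrence applied to sequences of segments.

    Assumption: the recurrence is the textbook one, not taken from the paper.
    """
    inf = float("inf")
    n, m = len(segs_a), len(segs_b)
    # acc[i][j] = best accumulated cost aligning the first i and j segments.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = segment_cost(segs_a[i - 1], segs_b[j - 1])
            acc[i][j] = cost + min(
                acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1]
            )
    return acc[n][m]


print(segment_cost([1, 1, 1], [3, 3]))
print(segmented_dtw([[1, 1, 1], [5, 5]], [[1, 1], [5, 5, 5]]))
```

Because the dynamic programming table is indexed by segments rather than by individual points, its size shrinks with the number of segments, which is where the computational saving over point-level DTW would come from.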
ISBN: 1424400600 (print)
Frequent itemset mining is a classic problem in data mining. However, most algorithms have to scan the database many times. This paper presents an algorithm that can find maximal frequent itemsets quickly. In this algorithm, each transaction is represented as a binary vector, so the task of discovering maximal frequent itemsets turns into a search for frequent patterns in a set of binary vectors. The algorithm is unique in that it simultaneously explores both the itemset space and the transaction space, unlike previous frequent itemset mining methods, which exploit only the itemset search space. Furthermore, this algorithm can guarantee that maximal frequent patterns are mined with only one scan of the original database. Experiments verify the efficiency and advantages of the proposed algorithm.
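The binary-vector representation can be illustrated with a small sketch in which each transaction becomes an integer bitmask over the items, and an itemset is contained in a transaction when a bitwise AND leaves the itemset mask unchanged. The search strategy of the paper is not described in the abstract, so the sketch only shows encoding, support counting, and a brute-force maximality check; the names `encode`, `support`, and `is_maximal_frequent` are hypothetical.

```python
def encode(transaction, item_index):
    """Encode a transaction as a bitmask: bit i is set if item i is present."""
    mask = 0
    for item in transaction:
        mask |= 1 << item_index[item]
    return mask


def support(itemset_mask, transaction_masks):
    """Count transactions whose bitmask contains every bit of the itemset."""
    return sum(
        1 for t in transaction_masks if (t & itemset_mask) == itemset_mask
    )


def is_maximal_frequent(itemset_mask, transaction_masks, n_items, min_support):
    """True if the itemset is frequent and no one-item extension is frequent."""
    if support(itemset_mask, transaction_masks) < min_support:
        return False
    for bit in range(n_items):
        extended = itemset_mask | (1 << bit)
        if extended != itemset_mask and (
            support(extended, transaction_masks) >= min_support
        ):
            return False
    return True


item_index = {"a": 0, "b": 1, "c": 2}
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
masks = [encode(t, item_index) for t in transactions]  # built in one scan
ab = encode({"a", "b"}, item_index)
print(support(ab, masks), is_maximal_frequent(ab, masks, 3, min_support=2))
```

Checking only one-item extensions is sufficient for maximality because support never increases when items are added, so any frequent superset implies a frequent one-item extension.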