ISBN: 9781424409723 (print)
With growing research interest in human resources, mining useful information and knowledge from databases is becoming an important research area. In this paper we use a fuzzy data mining algorithm to analyze staff performance assessments in an enterprise, characterize the structure of its workforce, and predict the performance of new staff. The algorithm is tested experimentally and gives encouraging results.
ISBN: 9780769530697 (print)
Class imbalance tends to cause inferior performance in data mining learners. Evolutionary sampling is a technique that seeks to counter this problem by using genetic algorithms to evolve a reduced sample of a complete dataset to train a classification model. Evolutionary sampling works to remove noisy and duplicate instances so that the sampled training data will produce a superior classifier. We propose this novel technique as a method to handle severe class imbalance in data mining. This paper presents our research into the use of evolutionary sampling with C4.5 decision trees and compares the technique's performance with random undersampling.
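For readers unfamiliar with the baseline named above, the following is a minimal sketch of random undersampling, assuming the common variant that shrinks every class to the size of the rarest one. It is not the evolutionary sampling method itself; the function name `random_undersample`, the fixed seed, and the toy data are illustrative assumptions.

```python
import random
from collections import defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly drop instances until every class is as small as the rarest class.

    Assumption: balancing to the minority-class size; the abstract does not
    state which undersampling ratio the authors used.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # Size of the rarest class is the target size for every class.
    target = min(len(members) for members in by_class.values())

    sampled = []
    for label, members in by_class.items():
        for row in rng.sample(members, target):
            sampled.append((row, label))
    rng.shuffle(sampled)
    return sampled


# Toy data: 10 majority-class rows and 2 minority-class rows.
rows = [[i] for i in range(12)]
labels = ["neg"] * 10 + ["pos"] * 2
balanced = random_undersample(rows, labels)
```

Here `balanced` holds two rows per class. Evolutionary sampling, as described in the abstract, would instead search over candidate subsets with a genetic algorithm rather than drawing one subset at random.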
ISBN: 9783540734987 (print)
The EM algorithm has been used repeatedly to identify latent classes in categorical data by estimating finite distribution mixtures of product components. Unfortunately, the underlying mixtures are not uniquely identifiable and, moreover, the estimated mixture parameters are starting-point dependent. For this reason we use the latent class model only to define a set of "elementary" classes by estimating a mixture with a large number of components. We propose a hierarchical "bottom up" cluster analysis based on unifying the elementary latent classes sequentially. The clustering procedure is controlled by a minimum information loss criterion.
ISBN: 9783540734994 (digital)
ISBN: 9783540734987 (print)
Fractal theory has been used for computer graphics, image compression, and various fields of pattern recognition. In this paper, a fractal-based method for recognition of both on-line and off-line Farsi/Arabic handwritten digits is proposed. Our main goal is to verify whether fractal theory can capture discriminatory information from digits for the pattern recognition task. The digit classification problem (on-line and off-line) deals with patterns that do not have complex structure, so a general-purpose fractal coder, introduced for image compression, is simplified for this application. To do so, the contrast and luminosity information of each point in the input pattern is ignored during the coding process. Therefore, this approach can deal with on-line data and binary images of handwritten Farsi digits. In effect, our system represents the shape of the input pattern by searching for a set of geometrical relationships between its parts. Some fractal-based features are directly extracted by the fractal coder. We show that the resulting features have invariant properties which can be used for object recognition.
ISBN: 9781424409723 (print)
This paper adopts the idea of nearest neighbor and proposes a new approach called Fast Intuitive Clustering Approach (FICA). FICA also adds the concept of data compression to reduce running time and coordinates with parameters to achieve a global search. A series of experiments has been conducted on FICA and other clustering algorithms, such as K-Means and DBSCAN. According to the simulation results, the proposed FICA clustering algorithm outperforms K-Means and DBSCAN. FICA not only achieves good efficiency and correctness but can also be applied to large data sets. Finally, the proposed FICA is applied to a face recognition problem.
ISBN: 9783540734994 (digital)
ISBN: 9783540734987 (print)
In recent years there has been a tremendous increase in the number of users maintaining online blogs on the Internet. Companies, in particular, have become aware of this medium of communication and have taken a keen interest in what is being said about them through such personal blogs. This has given rise to a new field of research directed towards mining useful information from the large amount of unformatted data present in online blogs and online forums. We discuss an implementation of such a blog mining application. The application is broadly divided into two parts: the indexing process and the search module. Blogs pertaining to different organizations are fetched from a particular blog domain on the Internet. After their textual content is analyzed, the blogs are assigned a sentiment rating. Specific data from these blogs, along with their sentiment ratings, are then indexed on the physical hard drive. The search module searches through these indexes at run time for the input organization name and produces a list of blogs conveying both positive and negative sentiments about the organization.
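The two-part design described above (an indexing process followed by a search module) can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the abstract does not say how sentiment is rated or how the index is laid out, so this sketch uses a tiny hypothetical word lexicon and an in-memory dictionary instead of an on-disk index.

```python
import re
from collections import defaultdict

# Hypothetical sentiment lexicon, for illustration only.
POSITIVE = {"great", "love", "reliable"}
NEGATIVE = {"poor", "hate", "broken"}


def sentiment_rating(text):
    """Positive-word count minus negative-word count (an assumed scoring rule)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def build_index(posts, organizations):
    """Indexing step: map each organization name to (post id, rating) pairs."""
    index = defaultdict(list)
    for post_id, text in posts:
        lowered = text.lower()
        rating = sentiment_rating(text)
        for org in organizations:
            if org.lower() in lowered:
                index[org.lower()].append((post_id, rating))
    return index


def search(index, organization):
    """Search step: return (positive post ids, negative post ids) for a name."""
    hits = index.get(organization.lower(), [])
    positive = [post_id for post_id, rating in hits if rating > 0]
    negative = [post_id for post_id, rating in hits if rating < 0]
    return positive, negative


# Made-up posts and organization names.
posts = [
    ("post-1", "I love Acme, their support is great"),
    ("post-2", "Acme shipped a broken product"),
    ("post-3", "Globex has poor service"),
]
index = build_index(posts, ["Acme", "Globex"])
positive_hits, negative_hits = search(index, "Acme")
```

Separating the two steps means the comparatively expensive text analysis happens once at indexing time, while a query only performs a dictionary lookup.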
ISBN: 9783540734994 (digital)
ISBN: 9783540734987 (print)
Association rule mining often results in an overwhelming number of rules. In practice, it is difficult for the end user to select the most relevant rules. To tackle this problem, various interestingness measures have been proposed. Nevertheless, the choice of an appropriate measure remains a hard task, and the use of several measures may lead to conflicting information. In this paper, we give a unified view of objective interestingness measures. We define a new framework embedding a large set of measures, called SBMs, and we prove that the SBMs have a similar behavior. Furthermore, we identify the whole collection of rules that simultaneously optimize all the SBMs. We provide an algorithm to efficiently mine a reduced set of rules among the rules optimizing all the SBMs. Experiments on real datasets highlight the characteristics of such rules.
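To make "objective interestingness measure" concrete, here is a minimal sketch that computes three textbook measures (support, confidence, and lift) for a single rule. The abstract does not list which measures belong to the SBM framework, so their inclusion here is only an illustration; the function name `rule_measures` and the toy transactions are assumptions.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return support, confidence, and lift for the rule antecedent -> consequent.

    `transactions` is a list of sets of items.
    """
    n = len(transactions)
    a, c = set(antecedent), set(consequent)

    n_a = sum(a <= t for t in transactions)          # transactions containing A
    n_c = sum(c <= t for t in transactions)          # transactions containing C
    n_ac = sum((a | c) <= t for t in transactions)   # transactions containing both

    if n == 0 or n_a == 0 or n_c == 0:
        raise ValueError("measures are undefined for empty counts")

    support = n_ac / n
    confidence = n_ac / n_a
    lift = confidence / (n_c / n)
    return {"support": support, "confidence": confidence, "lift": lift}


# Toy market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
measures = rule_measures(transactions, {"bread"}, {"milk"})
```

On this data the rule bread -> milk has support 0.6 and confidence 0.75 but a lift below 1, a small instance of the conflicting signals that motivate a unified view of such measures.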
ISBN: 9783540734994 (digital)
ISBN: 9783540734987 (print)
We develop a metric Psi, based upon the Rand index, for the comparison and evaluation of dimensionality reduction techniques. This metric is designed to test the preservation of neighborhood structure in derived lower-dimensional configurations. We use a customer information data set to show how Psi can be used to compare dimensionality reduction methods, tune method parameters, and choose solutions when methods have a local optimum problem. We show that Psi is highly negatively correlated with an alienation coefficient K that is designed to test the recovery of relative distances. In general, a method with a good value of Psi also has a good value of K. However, the monotonic regression used by Nonmetric MDS produces solutions with good values of Psi but poor values of K.
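As background for the metric above, the following is a minimal sketch of the plain Rand index between two labelings of the same items. It is not the Psi metric, whose exact construction the abstract does not give; the function name `rand_index` and the example labelings are assumptions.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two labelings agree.

    A pair agrees when both labelings put the two items together, or both
    put them apart.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same items")
    if len(labels_a) < 2:
        raise ValueError("need at least two items to form a pair")

    agreements = 0
    pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        agreements += same_in_a == same_in_b
        pairs += 1
    return agreements / pairs


# Two labelings of four items that differ only on the last pair.
labels_a = [0, 0, 1, 1]
labels_b = [0, 0, 1, 2]
score = rand_index(labels_a, labels_b)
```

Of the six item pairs, only the last two items are grouped differently, so `score` is 5/6. A neighborhood-preservation measure could apply the same pairwise counting to neighbor relations before and after dimensionality reduction, though whether Psi does exactly this is not stated in the abstract.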
ISBN: 9781424409723 (print)
A scalable and effective algorithm called AMGMSP (Approximate Mining of Global Multidimensional Sequential Patterns) is proposed to solve the problem of mining multidimensional sequential patterns in large databases in a distributed environment. First, the multidimensional information is embedded into the corresponding sequences in order to convert the mining of multidimensional sequential patterns into the mining of sequential patterns. Then the sequences are clustered, summarized, and analyzed on the distributed sites, and the local patterns are obtained by an effective approximate sequential pattern mining method. Finally, the global multidimensional sequential patterns are mined as high-vote sequential patterns after collecting all the local patterns on one site. Both theory and experiments indicate that this method simplifies the problem of mining multidimensional sequential patterns and avoids mining redundant information. The global sequential patterns can be obtained effectively by this scalable method, with reduced communication cost.
ISBN: 9781424409723 (print)
Two distinct principles of multi-modal kernel-based pattern recognition, kernel fusion and classifier fusion, are demonstrated to share common underlying characteristics via the use of a novel kernel-based technique for combining modalities under fully general conditions, namely, the neutral-point method. This method presents a conservative kernel-based strategy for dealing with missing and disjoint training data in independent measurement modalities that can be theoretically shown to default to the Sum Rule classification scheme. Results of comparative experiments indicate that the neutral-point technique loses relatively little classification information with respect to coincident training data, and is in fact preferable for independent kernels produced by different physical modalities due to its better error-cancellation properties.
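For orientation, here is a minimal sketch of the Sum Rule classifier-fusion scheme named above, assuming each modality's classifier outputs a posterior probability per class. It does not implement the neutral-point method; the function name `sum_rule`, the modality names, and the probabilities are illustrative assumptions.

```python
def sum_rule(posteriors_per_modality):
    """Fuse per-modality class posteriors by summing them and taking the argmax.

    `posteriors_per_modality` is a list of dicts mapping class label to
    posterior probability, one dict per modality.
    """
    classes = list(posteriors_per_modality[0].keys())
    totals = {
        label: sum(posteriors[label] for posteriors in posteriors_per_modality)
        for label in classes
    }
    decision = max(totals, key=totals.get)
    return decision, totals


# Made-up outputs from two independent modalities.
face_posteriors = {"alice": 0.6, "bob": 0.4}
voice_posteriors = {"alice": 0.3, "bob": 0.7}
decision, totals = sum_rule([face_posteriors, voice_posteriors])
```

In this toy case the face classifier favors "alice" while the voice classifier favors "bob" more strongly, so the summed scores select "bob". Summing tends to average out independent errors across modalities, which is one informal reading of the error-cancellation properties mentioned in the abstract.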