Clustering, the common technique of arranging similar data into groups, is one of the challenging tasks of today's era. Traditional clustering algorithms are effective for handling large amounts of data that come from various sources such as social media, business, and the internet. However, the time complexity of the serial calculation method in these traditional algorithms is very high. The K-Means algorithm is sensitive to its initial points and prone to local optima, and K-Means often has to be run many times to settle on a value of K. K-Harmonic Means is insensitive to the initialization of the centers and is suitable for large-scale datasets. To overcome these defects of traditional clustering algorithms, a hybrid method is suggested in this paper. MapReduce is a parallel programming model for processing and generating data sets with a parallel, distributed algorithm on a cluster. In this paper, observations are given on different MapReduce-based algorithms, and a new hybrid clustering algorithm based on MapReduce is proposed from those observations.
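To make the MapReduce idea in the abstract above concrete, the following is a minimal single-process sketch of one plain K-Means iteration written as map, shuffle, and reduce steps. It is not the hybrid algorithm proposed in the paper, which the abstract does not specify; the function names (`map_point`, `reduce_cluster`, `kmeans_iteration`) and the toy data are illustrative assumptions.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_point(point, centers):
    """Map step: emit (index of the nearest center, (point, 1))."""
    nearest = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return nearest, (point, 1)


def reduce_cluster(values):
    """Reduce step: average all points that were assigned to one center."""
    count = sum(n for _, n in values)
    dims = len(values[0][0])
    return tuple(sum(p[d] for p, _ in values) / count for d in range(dims))


def kmeans_iteration(points, centers):
    """One K-Means iteration expressed as map, shuffle, reduce."""
    groups = defaultdict(list)
    for point in points:                 # map phase (parallelizable per point)
        key, value = map_point(point, centers)
        groups[key].append(value)        # shuffle: group values by center index
    # Reduce phase; a center that received no points keeps its old position.
    return [reduce_cluster(groups[i]) if i in groups else centers[i]
            for i in range(len(centers))]


# Hypothetical toy data: two obvious groups in the plane.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centers = [(1.0, 1.0), (9.0, 9.0)]
new_centers = kmeans_iteration(points, centers)
```

In a real MapReduce deployment the map calls would run on separate workers and the framework would perform the grouping; here a dictionary stands in for the shuffle so the sketch stays runnable on one machine.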
ISBN: 9781538668818 (print)
Clustering, also called cluster analysis, is the task of grouping data objects so that the objects within a group are similar to one another and different from the objects in other groups. Such groups are termed clusters. In this paper we study several clustering algorithms applied in e-learning systems. Our objective is to perform a deep analysis of clustering data mining techniques, both theoretically and experimentally, and to carry out a comparative study in order to determine which technique is more suitable for identifying e-learner profiles and for better evaluating student performance in engineering education. A good clustering method is able to yield high-quality clusters. The algorithms under investigation are Canopy, Cobweb, EM, Farthest First, Filtered Clusterer, and Make Density Based Clusterer. The performance of the six algorithms is compared using the open-source clustering tool WEKA (version 3.7.12).
ISBN: 9781728111421; 9781728111414 (print)
Due to different parameter settings and the random selection of initial clustering centers, the traditional K-means algorithm is not stable. A clustering validity index (CVI) is an important method for evaluating the quality of the results generated by clustering algorithms. However, many of the existing CVIs suffer from instability and a narrow range of applications, and cannot properly process datasets with a non-spherical distribution or datasets with a large number of overlapping points. To address these problems, the traditional K-means algorithm is first improved by utilizing the dynamic average distance to find the initial clustering centers rather than selecting them randomly. Second, based on the idea of dynamic average distance, a new clustering validity index, DCVI, is proposed. The new DCVI is able to deal with many kinds of datasets, including non-convex datasets and datasets with a large number of overlapping points. Third, by integrating the improved K-means algorithm with the new DCVI, a new algorithm (KVOA) is designed to determine the optimal number of clusters (Kopt) for a wide range of datasets. Experimental results on several test datasets demonstrate that the improved K-means algorithm is more accurate and stable than the traditional one. The new DCVI is also compared with six commonly used CVIs, and the experimental results show that it is more accurate and stable than the other CVIs.
ISBN: 9781538669488 (print)
Bioinformatics shows remarkable development at the crossroads of biology, medicine, information science, and computer science. It is clear that research in this field is now as prolific as data mining research. However, most bioinformatics research deals with identification and classification tasks and with tree or network induction from data. Clustering techniques are mostly employed in the sectors of information technology, medicine, as well as *** this paper, modified hierarchical clustering algorithms are introduced and applied to orthologous IGF-1R protein sequences; they produce orthologous clusters of sequences, from which phylogenetic trees are formed. Compared to existing hierarchical algorithms, these new algorithms are very efficient: they take less time to execute, and clustering accuracy is also *** contribution is an acceptable attempt at understanding the role of IGF-1R. The outcome enabled the research to be extended further to delineate the relationship between physicochemical properties and the activity of inhibitors, and to carry out a multivariate regression analysis on a set of 87 IGF-1R inhibitors with dependent variables and several independent variables, which resulted in an F-test value of 8.812, an r value of 0.794, and an r2 value of 0.631. The data set was inspected for the presence of outliers by calculating leverages and standard residuals, and 6 compounds were finally eliminated. A new regression model was then attempted with a training set of 76 compounds and a validation set of 5 compounds. The resulting regression plot justifies the predictive ability of the regression model. Finally, designing or screening compound libraries for new analogues should help enhance inhibitory activity against IGF-1R.
In this paper, we propose two fuzzy clustering algorithms in the differential privacy scheme, based on the fuzzy c-means algorithm. To the best of the authors' knowledge, these are the first algorithms of their kind that provide privacy for fuzzy data. Moreover, the two algorithms are experimentally compared with the original fuzzy c-means algorithm and shown to be good approximations of it.
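For orientation, the baseline that the abstract above builds on can be sketched as follows. This is the standard (non-private) fuzzy c-means update only; the two differentially private variants are not described in the abstract and are not reproduced here. The function names, the fuzzifier default `m=2.0`, and the toy data are assumptions for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def fcm_memberships(points, centers, m=2.0):
    """Return u[j][i], the degree to which point j belongs to cluster i."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        d = [dist(p, c) for c in centers]
        if 0.0 in d:
            # The point coincides with one or more centers: split full
            # membership among those centers, zero for the rest.
            zeros = d.count(0.0)
            row = [1.0 / zeros if x == 0.0 else 0.0 for x in d]
        else:
            row = [1.0 / sum((di / dk) ** exponent for dk in d) for di in d]
        memberships.append(row)
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Recompute each center as the mean of the points weighted by u ** m."""
    dims = len(points[0])
    centers = []
    for i in range(len(memberships[0])):
        weights = [row[i] ** m for row in memberships]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dims)))
    return centers


# Hypothetical one-dimensional toy data with two clusters.
points = [(0.0,), (1.0,), (4.0,)]
u = fcm_memberships(points, [(0.0,), (4.0,)])
updated = fcm_centers(points, u)
```

Alternating the two functions until the centers stop moving gives the usual fuzzy c-means loop; each row of `u` sums to 1, which is what distinguishes fuzzy from hard assignments.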
A library is a place where large amounts of data are generated and stored. These data must be transformed into information and knowledge that can then be used by researchers and users. Librarians may need to understand how to transform, analyze, and present data in order to facilitate knowledge creation. The knowledge that data mining extracts can be expressed as concepts, rules, patterns, constraints, visualizations, and so on. As libraries become a more integrated part of business processes, the process of transferring information has become more complicated. Today, one of the biggest challenges that libraries face is the explosive growth of library data and the use of this data to improve the quality of managerial decisions. Data mining is a field at the intersection of computer science and statistics that is used to discover patterns in an information bank. It uses methods of extraction, combination, and analysis of data to create new information by revealing trends, patterns, and relationships. The patterns observed through data mining will be helpful in managing the library and in understanding readers' attitudes towards reading; the frequency of visits and the choice of books can also be analyzed.
This paper examines various clustering algorithms for corporate bonds. A corporate bond is a bond issued by a corporation with a certain time limit, after which the bond matures. Corporations issue bonds to organizations in order to raise funding, for example for expansion of the business, and for many other reasons. A few years ago, the corporate bond market had to face very poor clustered default events in comparison to those experienced throughout the recession. The purpose of bond clustering is to let any organization choose bonds with high ratings and high yields. Clustering is the technique of grouping data by identifying common characteristics in a large dataset. In the proposed work, the K-Means, Hierarchical, Self-Organizing Map, Fuzzy C-Means, and Gaussian Mixture Model clustering algorithms are used to segment the bonds. Data from a Portuguese retail bank, collected from 2008 to 2013, are used for the presented work. The results show that K-Means gives the most efficient result in terms of time and accuracy.
Association rules are widely used to extract patterns from a given database. They are capable of finding correlations among items, making it possible for the user to learn which items are present in the transactions and which of them are significantly correlated. One of the major problems with association rules is that the number of extracted rules usually exceeds the number of transactions in the database and also surpasses the user's capacity to explore the obtained knowledge. To overcome this problem, a post-processing phase was proposed with the objective of directing the user to the rules that potentially hold the most interesting knowledge. One approach is to divide the association rules into groups (or clusters), so that rules that behave similarly are in the same group, which makes the rule set easier to understand. In the literature, some works use clustering algorithms to split the rules, while others use community detection algorithms. Since both approaches obtain groups of association rules but start from different premises, different results can be obtained. No study has been done on the differences between clustering and community detection algorithms, which makes selecting an algorithm hard, since their behavior in the association rule post-processing phase is not well known. This paper presents an analysis of both approaches, aiming to find the differences and similarities between them and to make it easier to select an approach by knowing its behavior.
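As background for the abstract above, the two standard measures used to score an association rule, support and confidence, can be sketched as follows. The abstract does not say which measures or grouping criteria the paper uses, so this shows only the textbook definitions; the function names and the toy transactions are illustrative assumptions.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent.

    Assumes the antecedent occurs in at least one transaction; otherwise
    the division below raises ZeroDivisionError.
    """
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
```

Because every pair of frequent itemsets can yield rules in both directions, the number of rules grows much faster than the number of transactions, which is the motivation for grouping rules in a post-processing phase.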
ISBN: 9781538664841 (print)
This paper presents a novel review of various clustering algorithms used in vehicular ad hoc networks (VANETs). VANET is an emerging technology with a plethora of applications that provide safety and comfort to vehicle users. It is a vital part of Intelligent Transport Systems (ITS), providing coherent and well-organized communication between vehicles, such as sending warning messages to avoid accidents and fatal conditions. Peculiar traffic conditions and the dynamic topology of the network can make the timely delivery of messages challenging. Although VANETs present a unique range of challenges for routing, clustering algorithms offer solutions to them. Clustering can be useful in maintaining the stability and reliability of an ad hoc network, which results in performance enhancement. Clustering is a key technology in VANETs, where dedicated algorithms outperform MANET clustering algorithms such as the lowest ID algorithm and the maximum degree algorithm, which do not perform well in the VANET setting. Among their merits, vehicular ad hoc networks play a useful role in accident avoidance, congestion detection, information dissemination, and so on. This paper is a comparative review of various clustering algorithms researched in recent years, making it easier to identify the best clustering algorithm for a particular situation.
ISBN: 9781538651513 (print)
This study compares the performance of classification algorithms trained on the output of data mining clustering algorithms for remotely sensed multispectral image data, using the WEKA data mining software. The choice of clustering algorithm is very important for clustering-based classification in data mining. The class attribute for the remotely sensed multispectral image data is obtained from six different clustering algorithms. The performance of seven different classification algorithms is then computed, depending on the data labeling produced by the six clustering algorithms, in terms of correctly classified instances and kappa statistics. A strategy is developed for selecting, among the different unsupervised clustering algorithms, the one that gives the highest supervised classification accuracy in terms of correctly classified instances and kappa statistics for semi-supervised classification of remotely sensed multispectral image data. The performance of the seven semi-supervised classification methods is assessed for each of the six unsupervised clustering algorithms. This study determines data-free clustering algorithms for classification.
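The kappa statistic mentioned in the abstract above can be computed from a confusion matrix as in the following minimal sketch. It uses the standard Cohen's kappa definition, which is assumed to correspond to the statistic the study reports; the function name and the example matrix are hypothetical and are not data from the study.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: actual, columns: predicted).

    Assumes chance agreement is below 1; otherwise the denominator is zero.
    """
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of instances on the diagonal.
    observed = sum(confusion[i][i] for i in range(size)) / total
    # Agreement expected by chance, from the row and column totals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(size)
    ) / total ** 2
    return (observed - expected) / (1.0 - expected)


# Hypothetical two-class example: 35 of 50 instances correctly classified.
example = [[20, 5],
           [10, 15]]
kappa = cohen_kappa(example)
accuracy = (20 + 15) / 50
```

Unlike the share of correctly classified instances, kappa discounts the agreement that would be expected by chance, which is why the two measures are usually reported together.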