Extending the lifespan of a group of distributed wireless sensors is a major research challenge. It matters most for sensor nodes deployed in harsh environments, where replacing or recharging their batteries is not feasible. The popular low-energy adaptive clustering hierarchy (LEACH) algorithm uses the "computation and communication energy model" to extend the lifespan of distributed wireless sensor nodes. As an improvement, we show that a combination of three clustering algorithms performs better than the LEACH algorithm. The combination comprises the k-means++, k-means, and gap statistics algorithms, used selectively as follows: the k-means++ algorithm initializes the centers for the k-means algorithm, the k-means algorithm computes the optimal cluster centers, and the gap statistics algorithm selects the optimal number of clusters in a distributed wireless sensor network. Our simulation shows that the combined approach increases the lifespan of the wireless sensor nodes by 15% compared with the LEACH algorithm. This paper details the clustering algorithms selected for the combination and, based on the simulation results, compares the performance of the combined approach with that of the LEACH algorithm.
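The three-stage pipeline described in this abstract (k-means++ seeding, k-means refinement, gap statistic for the number of clusters) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulation: the function names, the uniform bounding-box reference distribution, and the `n_refs` parameter are all hypothetical choices, and no energy model is included.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: first center uniform at random, each later center
    drawn with probability proportional to its squared distance to the
    nearest center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:  # every point already coincides with a center
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def kmeans(points, k, rng, iters=100):
    """Lloyd iterations started from k-means++ seeds; returns (centers, labels)."""
    centers = kmeans_pp_init(points, k, rng)
    for _ in range(iters):
        labels = assign(points, centers)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous center.
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)


def dispersion(points, centers, labels):
    """Within-cluster sum of squared distances to the assigned center."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


def gap_statistic_k(points, k_max, rng, n_refs=10):
    """Smallest k with Gap(k) >= Gap(k+1) - s(k+1); the reference data are
    drawn uniformly from the bounding box of the input (an assumption)."""
    dims = list(zip(*points))
    lo, hi = [min(d) for d in dims], [max(d) for d in dims]
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        log_w = math.log(dispersion(points, *kmeans(points, k, rng)) + 1e-12)
        ref_logs = []
        for _ in range(n_refs):
            ref = [tuple(rng.uniform(a, b) for a, b in zip(lo, hi)) for _ in points]
            ref_logs.append(math.log(dispersion(ref, *kmeans(ref, k, rng)) + 1e-12))
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((v - mean_ref) ** 2 for v in ref_logs) / n_refs)
        gaps.append(mean_ref - log_w)
        errs.append(sd * math.sqrt(1 + 1 / n_refs))
    for i in range(k_max - 1):
        if gaps[i] >= gaps[i + 1] - errs[i + 1]:
            return i + 1
    return k_max
```

In a sensor-network setting the points would be node coordinates, for example `gap_statistic_k(node_positions, 10, random.Random(0))` followed by `kmeans(node_positions, k, random.Random(0))`, where `node_positions` is a hypothetical list of coordinate tuples.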
Clustering plays an important role in data mining and machine learning, and intuitionistic fuzzy sets (IFSs) are flexible and practical for dealing with vagueness and uncertainty. To cluster information expressed as intuitionistic fuzzy data, this paper proposes a joint-training auto-encoder based intuitionistic fuzzy clustering algorithm. First, we propose auto-encoder based intuitionistic fuzzy clustering, which combines a similarity measure of IFSs, an auto-encoder, and the k-means algorithm. Then, building on this method and on two kinds of similarity measures, we propose the joint-training auto-encoder based intuitionistic fuzzy clustering algorithm for the clustering analysis of intuitionistic fuzzy data. Finally, several experiments verify the effectiveness of the proposed intuitionistic fuzzy clustering algorithms.
This study examined the electrocardiographic data set recorded at Boston's Beth Israel Hospital for studying the cardiac and neurotransmitter system and normal and various irregular patterns of electrical activity in the heart. Seven different types of arrhythmia available in this data set were classified using four widely used classifiers (Fuzzy C-Means, Naive Bayes, Extreme Learning Machine, and K-Means). Classifier performance was evaluated using the accuracy, sensitivity, and selectivity measures. The results showed that all four classifiers achieved their highest success rate, 99%, on the "Normal" beat type compared with the arrhythmia types. When the classifiers were compared with one another, Naive Bayes and Extreme Learning Machine had the higher average classification performance. Averaged over all arrhythmia types, the most successful classifier was Naive Bayes, with 92% accuracy and 95% selectivity.
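The three performance measures named in this abstract can be computed per beat type in a one-vs-rest fashion. The sketch below assumes that "selectivity" means specificity (the true-negative rate); the abstract does not define the term, so that reading, the function name, and the toy labels are assumptions made only for illustration.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity, and specificity for one class against the rest.

    'positive' is the class treated as the positive label; every other
    label counts as negative. Specificity is used here as an assumed
    reading of the abstract's 'selectivity'.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else nan,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,  # true-positive rate
        "specificity": tn / (tn + fp) if tn + fp else nan,  # true-negative rate
    }


# Toy labels: "N" for a normal beat, "A" for any arrhythmia (made-up data).
truth = ["N", "N", "N", "A", "A", "N"]
predicted = ["N", "N", "A", "A", "N", "N"]
print(one_vs_rest_metrics(truth, predicted, positive="N"))
```

Averaging these per-class values over all beat types gives one overall figure per classifier, which is one plausible way to obtain averages like those the abstract reports.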
Data clustering techniques are used to aid knowledge discovery when no additional information is available. Several clustering techniques produce reasonable results, although they often produce qualitatively distinct clusterings. In this paper, we study how different clustering algorithms produce different kinds of clusters and how those clusters relate to one another. We also evaluate the possibility of merging differently generated clusterings into a new clustering that none of the original algorithms can produce. The main contribution of this paper is a new algorithm that merges previously generated clusterings using must-link constraint rules built from the agreements among elements observed in those clusterings. This novel approach employs the entropy of agreements to decide which cluster an element should belong to. Experimental results indicate that 1) our approach can merge characteristics from the original clusterings; 2) in some situations, it captures new information from the data and improves results, mainly from an external perspective; and 3) in no situation did it produce significantly worse results.
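The abstract does not spell out the merging algorithm, so the following is only an illustrative sketch of its ingredients, not the authors' method: pairwise agreement across several clusterings, the binary entropy of that agreement as an uncertainty score, and must-link groups formed from pairs on which every clustering agrees. All names are hypothetical.

```python
import math
from itertools import combinations


def agreement(labelings, i, j):
    """Fraction of clusterings that place elements i and j in the same cluster."""
    return sum(lab[i] == lab[j] for lab in labelings) / len(labelings)


def binary_entropy(p):
    """Entropy (in bits) of an agreement fraction p; 0 means unanimous."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def must_link_groups(labelings):
    """Group elements connected by unanimous same-cluster agreement.

    Uses a small union-find structure; pairs with agreement 1.0 are
    treated as must-link constraints (an assumption of this sketch).
    """
    n = len(labelings[0])
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if agreement(labelings, i, j) == 1.0:
            parent[find(j)] = find(i)

    groups = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())


# Two clusterings of five elements that disagree only about element 2.
labelings = [[0, 0, 1, 1, 1], [0, 0, 0, 1, 1]]
print(must_link_groups(labelings))
print(binary_entropy(agreement(labelings, 2, 3)))
```

Elements left outside any must-link group, such as element 2 here, are the ones where an entropy-based rule would have to decide the final assignment.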
We give the motivation for scoring clustering algorithms and a metric M: A → N from the set of clustering algorithms to the natural numbers which we realize as (Equation Presented) where αi, βi, wi are parameters u...
ISBN (print): 9781509060306
Big Data has become commonplace in most Internet-based applications, which generate very large data sets by delivering services to planetary-scale numbers of users. Such data sets are considered a valuable source of analytics information and knowledge for many purposes and domains. It is increasingly claimed that Big Data and machine learning, especially data mining, are the basis for developing advanced analytics platforms that turn data into valuable assets, gain competitive advantage, and support better decisions. At the same time, however, Big Data applications are proving to be killer applications for state-of-the-art machine learning and data mining algorithms. Indeed, traditional data mining frameworks such as WEKA and R, and those from big companies such as IBM SPSS Modeler, SAS Enterprise Miner, and Oracle Data Mining, face the challenges of 1) mining large data sets within short times and 2) doing so under high rates of data generation. The envisaged way to deal effectively with these challenges is to move to Cloud-based versions of such frameworks and to develop new frameworks implemented on Cloud platforms. In either case, data mining and machine learning algorithms are being fully implemented on Cloud platforms under the new efficiency and performance requirements of Big Data. Among the newly developed frameworks is Apache Mahout, whose goal is "to build an environment for quickly creating scalable performant machine learning applications". In this paper we analyse the performance of some clustering algorithms of Apache Mahout using a Twitter streaming dataset on a Hadoop MapReduce cluster infrastructure according to various evaluation criteria.
ISBN (print): 9781509049189
Estimating the appropriate number of clusters has been a central and difficult issue in clustering research. There are different methods for this in hierarchical clustering; a typical approach is to run the clustering for different numbers of clusters and compare the results using a measure to estimate the number of clusters. On the other hand, there is no such method to automatically estimate the number of clusters in agglomerative hierarchical clustering (AHC), since AHC produces a family of clusters with different cluster numbers at the same time in the form of a dendrogram. An exception is the Newman method in network clustering, but this method does not have a useful dendrogram output. The aim of the present paper is to propose new methods to automatically estimate the number of clusters in AHC. We show two approaches for this purpose: one uses a variation of a cluster validity measure, and the other uses a statistical model selection method such as BIC.
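To make the problem concrete, the sketch below runs a naive single-linkage AHC, records the merge heights of the dendrogram, and picks the number of clusters at the largest jump between consecutive heights. This largest-jump rule is a common heuristic swapped in purely for illustration; it is neither the validity-measure variant nor the BIC-based method the abstract proposes, and the function names are hypothetical.

```python
import math


def single_linkage_heights(points):
    """Naive single-linkage AHC; returns the distance at each merge.

    heights[i] is the merge that reduces n - i clusters to n - i - 1.
    Roughly cubic or worse in the number of points, so only for small inputs.
    """
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(p, q) for p in clusters[a] for q in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        heights.append(d)
        clusters[a] += clusters[b]  # merge cluster b into cluster a
        del clusters[b]
    return heights


def estimate_k_by_largest_jump(points):
    """Cut the dendrogram just below the largest gap in merge heights.

    This heuristic can never return 1, which is a known limitation.
    """
    heights = single_linkage_heights(points)
    if len(heights) < 2:
        return len(points)
    jumps = [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]
    i = max(range(len(jumps)), key=jumps.__getitem__)
    return len(points) - (i + 1)


# Two well-separated groups on a line (made-up data).
data = [(0,), (1,), (2,), (10,), (11,), (12,)]
print(estimate_k_by_largest_jump(data))
```

A model-selection approach would instead score each cut of the same dendrogram with a criterion such as BIC and keep the best-scoring number of clusters.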
ISBN (print): 9781538642061
K-means is the basic algorithm used for discovering clusters within a dataset, and studies show that it is a widely used clustering technique. This paper discusses methods for enhancing the k-means clustering algorithm. These methods improve its efficiency, accuracy, performance, and computational time. In all of them, the main aim is to reduce the number of iterations, which in turn decreases the computational time. The various enhancements made to k-means are collected here so that, by combining them, one can build a new hybrid algorithm that is more efficient, more accurate, and less time-consuming than previous work.
ISBN (print): 9781538624883
In distributed storage systems, documents are shared among multiple Cloud providers and stored within their respective storage servers. In social secret sharing-based distributed storage systems, shares of the documents are allocated according to the trustworthiness of the storage servers. This paper proposes a trust mechanism that uses machine learning techniques to compute evidence-based trust values. Our mechanism mitigates the effect of colluding storage servers. More precisely, it makes it possible to detect unreliable evidence and establish countermeasures that discourage collusion among storage servers. Furthermore, this trust mechanism is applied to the social secret sharing protocol AS 3, showing that the new evidence-based trust mechanism enhances the protection of the stored documents.
In this article, we advance divide-and-conquer strategies for solving the community detection problem in networks. We propose two algorithms which perform clustering on a number of small subgraphs and finally patch ...