This paper describes a new encoding scheme and a software environment, called DAGC, for developing and evaluating genetic clustering algorithms. DAGC facilitates experiments with genetic clustering algorithms by providing an extensible library of components to assemble new algorithms or modify existing ones. The algorithms may be executed within the environment on caterpillar graphs, random graphs, or class dependency graphs extracted from given source code. The resulting clustering can be stored in a database for later analysis. DAGC allows confidence analysis by automatically deriving a consolidated model from different clustering results for a given graph. We also offer a new clustering algorithm, also called DAGC. The results of comparing the DAGC algorithm with a well-known algorithm, Bunch, are presented.
Communication signals that propagate through free space are subject to multipath interference due to scattering by various objects in the propagation channel. The effect is especially severe in complex situations such as dense urban environments. To investigate the problem, a typical multi-static detection scenario is reconstructed under controlled laboratory conditions, from which suitable data sets are created. Data-driven models are then deployed on edge computing platforms to profile the scatter centers based on the subjective manner in which they affect the signals. These have been interpreted primarily based on clustering algorithm (CA) operations, using a select suite of pre-processing models that effectively tame the variations in the C-band spatial-temporal data. A subset of the data of interest can then be subjected to an optional, compute-intensive machine learning (ML) approach. The relative advantages of the proposed method vis-a-vis an array of conventional schemes are highlighted, while also considering its carbon-friendly attribute. Given the more significant association of the data with antenna radiation patterns, estimation of the latter can now be performed without any anechoic chamber setup, in a time- and cost-agnostic manner. The benefit of this work lies in the realm of mid-band 5G-NR (and future 6G) cellular communication system deployment, where optimizing the distributed antenna location attributes on time- and cost-constrained scales becomes imperative before any large-scale deployment.
Traditionally, prototype-based fuzzy clustering algorithms such as the Fuzzy C-Means (FCM) algorithm have been used to find "compact" or "filled" clusters. Recently, there have been attempts to generalize such algorithms to the case of hollow or "shell-like" clusters, i.e., clusters that lie in subspaces of feature space. The shell clustering approach provides a powerful means to solve the hitherto unsolved problem of simultaneously fitting multiple curves/surfaces to unsegmented, scattered and sparse data. In this paper, we present several fuzzy and possibilistic algorithms to detect linear and quadric shell clusters. We also introduce generalizations of these algorithms in which the prototypes represent sets of higher-order polynomial functions. The suggested algorithms provide a good trade-off between computational complexity and performance. Since the objective function used in these algorithms is the sum of squared distances, the clustering is sensitive to noise and outliers. We show that by using a possibilistic approach to clustering, one can make the proposed algorithms robust.
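As a point of reference for the "compact cluster" baseline mentioned above, the standard FCM membership update can be sketched as follows. This is a minimal illustration of textbook FCM with point prototypes and squared Euclidean distances, not the shell-clustering algorithms of the paper; the function name `fcm_memberships` and the default fuzzifier `m=2.0` are assumptions made for the example.

```python
def fcm_memberships(points, centers, m=2.0):
    """Return u[i][j], the membership of point i in cluster j (standard FCM update).

    Hypothetical helper for illustration; m > 1 is the fuzzifier.
    """
    exponent = 1.0 / (m - 1.0)
    memberships = []
    for x in points:
        # Squared Euclidean distance from x to every prototype.
        d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
        if any(d == 0.0 for d in d2):
            # A point that coincides with a prototype belongs fully to it
            # (shared equally if several prototypes coincide with it).
            hits = [1.0 if d == 0.0 else 0.0 for d in d2]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        # u_ij is proportional to (1 / d2_ij)^(1 / (m - 1)); rows sum to 1.
        inv = [(1.0 / d) ** exponent for d in d2]
        total = sum(inv)
        memberships.append([v / total for v in inv])
    return memberships
```

Because every membership row is forced to sum to one, a distant outlier still receives substantial membership in some cluster, which is one way to see the noise sensitivity that the possibilistic variants are meant to address.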
Clustering algorithms have become one of the most critical research areas in multiple domains, especially data mining. However, with the massive growth of big data applications in the cloud, these applications face many challenges and difficulties. Since Big Data refers to an enormous amount of data, most traditional clustering algorithms come with high computational costs. Hence, the research question is how to handle this volume of data and obtain accurate results within a critical time. Despite ongoing research to develop different algorithms that facilitate complex clustering processes, many difficulties still arise when dealing with a large volume of data. In this paper, we review the most relevant clustering algorithms in a categorized manner, provide a comparison of clustering methods for large-scale data, and explain the overall challenges based on clustering type. The key idea of the paper is to highlight the main advantages and disadvantages of clustering algorithms for dealing with big data in a scalable manner, alongside their other features.
This paper focuses on the unsupervised detection of the Higgs boson particle using the most informative features and variables which characterize the “Higgs machine learning challenge 2014” data *** unsupervised detection proceeds in this paper through 4 steps: (1) selection of the most informative features from the considered data; (2) definition of the number of clusters based on the elbow *** experimental results showed that the optimal number of clusters that group the considered data in an unsupervised manner corresponds to 2 clusters; (3) proposition of a new approach for hybridization of both hard and fuzzy clustering tuned with Ant Lion Optimization (ALO); (4) comparison with some existing metaheuristic optimizations such as Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). By employing a multi-angle analysis based on the cluster validation indices, the confusion matrix, the efficiency and purity rates, the average cost variation, the computational time and the Sammon mapping visualization, the results highlight the effectiveness of the improved Gustafson-Kessel algorithm optimized with ALO (ALOGK) to validate the proposed *** if the paper gives a complete clustering analysis, its novel contribution concerns only the Steps (1) and (3) considered *** first contribution lies in the method used for Step (1) to select the most informative features and *** used the t-Statistic technique to rank ***, a feature mapping is applied using Self-Organizing Map (SOM) to identify the level of correlation between ***, Particle Swarm Optimization (PSO), a metaheuristic optimization technique, is used to reduce the data set *** second contribution of this work concerns the third step, where each one of the clustering algorithms such as K-means (KM), Global K-means (GlobalKM), Partitioning Around Medoids (PAM), Fuzzy C-means (FCM), Gustafson-Kessel (GK) and Gath-Geva (GG) is optimized and tuned with ALO.
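Step (2) above chooses the number of clusters with an elbow criterion. The sketch below shows one common way to do that, under stated assumptions: the within-cluster sum of squares (WCSS) is computed for each candidate k, and the "elbow" is taken as the k with the largest second difference of the WCSS curve. The abstract does not say how the authors locate the elbow, so both function names and the second-difference rule are illustrative choices, not the paper's procedure.

```python
def within_cluster_ss(points, labels):
    """Sum of squared distances from each point to its own cluster centroid."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


def elbow_k(wcss_by_k):
    """Pick k at the sharpest bend of the WCSS curve (assumed heuristic).

    wcss_by_k[i] is the WCSS obtained with k = i + 1 clusters. The bend is
    scored by the second difference, so at least three values are required.
    """
    if len(wcss_by_k) < 3:
        raise ValueError("need WCSS for at least three values of k")
    best_k, best_score = None, float("-inf")
    for i in range(1, len(wcss_by_k) - 1):
        drop_before = wcss_by_k[i - 1] - wcss_by_k[i]
        drop_after = wcss_by_k[i] - wcss_by_k[i + 1]
        score = drop_before - drop_after
        if score > best_score:
            best_k, best_score = i + 1, score
    return best_k
```

In practice the labels passed to `within_cluster_ss` would come from running the chosen clustering algorithm once per candidate k.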
Traditional scalable clustering algorithms mainly deal with the clustering of linearly separable data, but it is challenging to cluster non-linearly separable data efficiently in the feature space. In this article, we propose a novel Kernelized Scalable Random Sampling with Iterative Optimization Fuzzy c-Means (KSRSIO-FCM) clustering algorithm using a Big Data framework. To build KSRSIO-FCM, we also propose a kernelized version of the Scalable Literal Fuzzy c-Means (KSLFCM) clustering algorithm, which is an integral part of the proposed KSRSIO-FCM algorithm. These kernelized clustering algorithms are designed to deal with non-linearly separable problems by applying a Radial Basis Function (RBF) kernel, which maps the input data space non-linearly into a high-dimensional feature space. We design and implement the kernelized fuzzy clustering algorithms on Apache Spark, which performs efficient clustering of Big Data due to its in-memory cluster computing technique. Exhaustive experiments are performed on various big datasets to show the effectiveness of the proposed KSRSIO-FCM in comparison with other scalable clustering algorithms, i.e., KSLFCM, SRSIO-FCM, and SLFCM. The reported experimental results show that the KSRSIO-FCM algorithm, in comparison with KSLFCM, SRSIO-FCM, and SLFCM, achieves significant improvement in terms of time and space complexity, Normalized Mutual Information (NMI), Adjusted Rand Index (ARI), and F-score. Furthermore, we have carried out a performance analysis of KSRSIO-FCM versus KSLFCM. Thus, the reported results show that KSRSIO-FCM implemented on Apache Spark has better potential for Big Data clustering than traditional scalable fuzzy clustering methods.
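The role of the RBF kernel can be illustrated with the usual kernel trick: distances in the high-dimensional feature space are computed from kernel values alone, without ever forming the mapping explicitly. The sketch below assumes the common Gaussian form K(x, v) = exp(-||x - v||^2 / (2 sigma^2)); the bandwidth parameter `sigma` and both function names are assumptions for illustration, and the abstract does not state the exact parameterization used in KSRSIO-FCM.

```python
import math


def rbf_kernel(x, v, sigma=1.0):
    """Gaussian RBF kernel K(x, v) = exp(-||x - v||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_distance_sq(x, v, sigma=1.0):
    """Squared distance between the images of x and v in feature space.

    ||phi(x) - phi(v)||^2 = K(x, x) - 2 K(x, v) + K(v, v), and K(x, x) = 1
    for the RBF kernel, so this reduces to 2 * (1 - K(x, v)).
    """
    return 2.0 * (1.0 - rbf_kernel(x, v, sigma))
```

A kernelized fuzzy c-means variant can substitute this quantity for the squared Euclidean distance in the membership update; the value is bounded above by 2, which limits the influence of very distant points.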
In the past two decades, network clustering has proven to be an efficient approach for data collection and routing in wireless sensor networks (WSNs). It provides several advantages over other methods in terms of energy efficiency, scalability, even energy distribution, etc. Given the limited capabilities of sensor nodes (energy resources, processing power, and communication range), cluster-based protocols adapt the network's operation to these constraints. Several survey papers present and compare many clustering algorithms from various perspectives. However, most of these surveys are either outdated or limited in scope. This paper provides a comprehensive review of clustering algorithms in which the new ideas and concepts proposed in each phase of the clustering process are extensively studied. Three topics are discussed in this review. First, we present the objectives, characteristics and challenges of clustering algorithms. Second, the cluster-head selection methods for different types of WSNs are extensively studied. Third, this review presents a detailed description of newly proposed methods to handle energy heterogeneity, energy harvesting, fault tolerance, scalability, mobility and data correlation in WSNs. Furthermore, the protocol taxonomy in each phase is discussed to provide a deeper understanding of current clustering approaches. Finally, a set of criteria is presented to simplify the comparison and identify each protocol's pros and cons. This review presents a comprehensive introduction and can serve as useful guidance for new researchers in this field. It will also help system designers identify alternative solutions when selecting an appropriate method in each phase of the clustering process.
Molecular dynamics (MD) simulation has become a powerful tool to investigate the structure-function relationship of proteins and other biological macromolecules at atomic resolution and biologically relevant timescales. MD simulations often produce massive datasets containing millions of snapshots describing proteins in motion. Therefore, clustering algorithms have been in high demand to be developed and applied to classify these MD snapshots and gain biological insights. There mainly exist two categories of clustering algorithms that aim to group protein conformations into clusters based on the similarity of their shape (geometric clustering) and kinetics (kinetic clustering). In this paper, we review a series of frequently used clustering algorithms applied in MD simulations, including divisive algorithms, agglomerative algorithms (single-linkage, complete-linkage, average-linkage, centroid-linkage and ward-linkage), center-based algorithms (K-Means, K-Medoids, K-Centers, and APM), density-based algorithms (neighbor-based, DBSCAN, density-peaks, and Robust-DB), and spectral-based algorithms (PCCA and PCCA+). In particular, differences between geometric and kinetic clustering metrics will be discussed along with the performances of different clustering algorithms. We note that there does not exist a one-size-fits-all algorithm in the classification of MD datasets. For a specific application, the right choice of clustering algorithm should be based on the purpose of clustering, and the intrinsic properties of the MD conformational ensembles. Therefore, a main focus of our review is to describe the merits and limitations of each clustering algorithm. We expect that this review would be helpful to guide researchers to choose appropriate clustering algorithms for their own MD datasets.
Ventricular extrasystoles (VEs) are ectopic heartbeats involving irregularities in the heart rhythm. VEs arise in response to impulses generated in some part of the heart other than the sinoatrial node. They are caused by the premature discharge of a ventricular ectopic focus. VEs after myocardial infarction are associated with increased mortality. Screening for VEs is typically a manual and time-consuming task that involves analysis of the heartbeat morphology, QRS duration, and variations of the RR intervals using long-term electrocardiograms. We describe a novel algorithm to perform automatic classification of VEs and report the results of our validation study. The proposed algorithm makes use of bounded clustering algorithms, morphology matching, and RR interval length to perform automatic VE classification without prior knowledge of the number of classes and heartbeat features. Additionally, the proposed algorithm does not need a training set.
Clustering algorithms are becoming popular and widely applied in many academic fields, such as machine learning, pattern recognition, and artificial intelligence. Accelerating these algorithms poses significant challenges due to the explosive data scale and the wide variety of applications. However, previous studies mainly focus on raw speedup, with insufficient attention to the flexibility of the accelerator to support various applications. In order to accelerate different clustering algorithms in one accelerator, in this article, we design an FPGA-based accelerating framework for four state-of-the-art clustering methods: the K-means, PAM, SLINK, and DBSCAN algorithms. We provide both Euclidean and Manhattan distances as similarity metrics in the accelerator design paradigm, as well as a custom instruction set to operate the accelerators within each application. In order to evaluate the performance and hardware cost of the accelerator, we constructed a hardware prototype on a state-of-the-art Xilinx FPGA platform. Experimental results demonstrate that the accelerator framework achieves up to a 23x speedup over an Intel Xeon processor and is 9.46x more energy efficient than NVIDIA GTX 750 GPU accelerators.
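The two similarity metrics named above can be written down in software form as follows. This is only a plain Python sketch of the distance definitions and of how a selectable metric changes a nearest-center assignment; it says nothing about the FPGA implementation or the custom instruction set, and the helper name `nearest_center` is hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_center(point, centers, metric=euclidean):
    """Index of the center closest to point under the chosen metric."""
    return min(range(len(centers)), key=lambda j: metric(point, centers[j]))
```

The choice of metric can change the assignment: for the point (0, 0) with centers (3, 3) and (0, 5), the Euclidean metric selects the first center while the Manhattan metric selects the second.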