Finding clusters in datasets with different distributions and sizes is challenging when the clusters vary widely in shape, size, and density. Based on a similar-to-multiple-point clustering strategy, this paper presents a novel and simple clustering algorithm named MulSim to address these issues. MulSim first defines a new distance that automatically adapts to different densities when clustering. It then groups two points together if and only if one point is similar to the other point and to its similar neighbors. Comprehensive experiments on both multi-dimensional and two-dimensional datasets representing different clustering difficulties show that MulSim performs better than classical and state-of-the-art baselines in most cases. MulSim also maintains good clustering quality as the size of the datasets increases. In addition, the impact of the two MulSim parameters on clustering quality and the way to estimate them are analyzed. Finally, the practicability and feasibility of the algorithm are tested through a face recognition example.
Spectral clustering is widely used in data mining, machine learning, and other fields. It can identify sample spaces of arbitrary shape and converge to the global optimal solution. Compared with the traditional k-means algorithm, the spectral clustering algorithm adapts better to data and produces better clustering results. However, the algorithm is computationally expensive. In this paper, an efficient parallel spectral clustering algorithm for multi-core processors, written in the Julia language, is proposed; we refer to it as juPSC. Julia is a high-performance, open-source programming language. juPSC is composed of three procedures: (1) calculating the affinity matrix, (2) calculating the eigenvectors, and (3) conducting k-means clustering. Procedures (1) and (3) are computed by an efficient parallel algorithm, and the COO format is used to compress the affinity matrix. Two groups of experiments are conducted to verify the accuracy and efficiency of juPSC. Experimental results indicate that (1) juPSC achieves speedups of approximately 14x to 18x on a 24-core CPU and that (2) the serial version of juPSC is faster than the Python version in scikit-learn. Moreover, the structure and functions of juPSC are designed with modularity in mind, which makes it convenient to combine with, and further optimize on, other parallel computing platforms. (C) 2020 Elsevier Inc. All rights reserved.
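Procedure (1) and the COO compression mentioned above can be illustrated with a small serial sketch. It is written in Python rather than Julia and is not the juPSC implementation: the Gaussian kernel, the `sigma` and `threshold` parameters, and the function name `gaussian_affinity_coo` are all assumptions made for illustration. The idea is that pairs with negligible affinity are simply not stored, so only (row, column, value) triples remain.

```python
import math


def gaussian_affinity_coo(points, sigma=1.0, threshold=1e-3):
    """Build a sparse affinity matrix as COO triples (rows, cols, vals).

    Hypothetical helper: the kernel and the cut-off are assumptions.
    Affinities below `threshold` are dropped instead of stored.
    """
    rows, cols, vals = [], [], []
    n = len(points)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-affinity on the diagonal
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w = math.exp(-d2 / (2.0 * sigma ** 2))
            if w >= threshold:
                rows.append(i)
                cols.append(j)
                vals.append(w)
    return rows, cols, vals


# Two well-separated pairs of points: only within-pair affinities survive.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
rows, cols, vals = gaussian_affinity_coo(points)
print(len(vals), "stored entries out of", len(points) ** 2)
```

In a parallel version, the outer loop over `i` is the natural unit to distribute across cores, since each row of the affinity matrix can be computed independently.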
Imaging algorithms for visualization of defects play a significant role in Lamb wave-based research on nondestructive testing and structural health monitoring. In classical algorithms, the position or distribution of defects is located by mapping the amplitude or phase information of signals from the time domain to every discrete spatial grid of the structure. This process is time-consuming. In this study, the diversity, statistical, and fuzzy characteristics of the elliptic imaging algorithm are analyzed first; then, an intelligent defect location algorithm is proposed based on the evolutionary strategy and the K-means algorithm. The position of defects can be identified by observing the distribution of individuals. There are six parts in the proposed algorithm, including the data structure design, adaptive population screening, adaptive population reproduction, diversity maintenance mechanism, and cutoff criterion. Considering the statistical and fuzzy characteristics in the detection, several specific input parameters are defined in our algorithm, such as the distance-dependent screening threshold, path-dependent residual vector, and path-independent residual. To maintain the diversity of individuals in the analysis, we make two adjustments to the evolutionary strategy: one is to optimize the population screening and reproduction steps with the K-means algorithm, and the other is to add a diversity maintenance method to the evolutionary strategy. The effectiveness of the proposed intelligent defect location algorithm is verified by numerical simulations and experiments. Numerical studies indicate that the proposed algorithm performs reliably in detecting defects of different shapes and sizes. In the experimental research, we demonstrate that the proposed algorithm is about 200 times faster than the elliptic imaging algorithm. The optimum parameter setting of the algorithm is investigated by analyzing the influence of parameter
With the increase of ship AIS data, it has become possible to use AIS data mining algorithms to extract guidelines for the best path in a specific water area by means of trajectory segment clustering. The defects of current trajectory segment clustering algorithms are mainly twofold: the paths obtained after clustering lack direction, and the whole trajectory is considered rather than a single trajectory segment. In this paper, trajectory direction and density are used as measures of similarity between trajectories, and the trajectory segment clustering algorithm is used to analyze compressed ship trajectory segment data. The first step is to eliminate clusters that contain too few trajectories and clusters in which the distance between trajectories is too small. For two nearby clusters in opposite directions, the Hausdorff distance is determined. When the distance is less than the threshold, the trajectory is considered to be a two-way route. Finally, the resulting trajectory clusters are fused to form a view of the overall traffic flow framework of ships in the water area. The framework can describe the main driving direction of ships in the water and provide decision-making suggestions for the driver's path planning.
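The two-way route test described above can be sketched as follows. This is a minimal illustration using the standard symmetric Hausdorff distance between two point sequences; the function names, the sample coordinates, and the threshold values are hypothetical, and the sketch assumes that the two routes have already been found to run in opposite directions.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point of `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sequences."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def is_two_way(route_a, route_b, threshold):
    """Treat two opposite-direction routes as one two-way route
    when their Hausdorff distance is below `threshold` (assumed rule)."""
    return hausdorff(route_a, route_b) < threshold


# Hypothetical representative routes of two clusters, in arbitrary units.
eastbound = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
westbound = [(2.0, 0.5), (1.0, 0.5), (0.0, 0.5)]
print(hausdorff(eastbound, westbound))
print(is_two_way(eastbound, westbound, threshold=1.0))
```

The Hausdorff distance ignores the order of the points, which is why the direction check has to happen separately before this test is applied.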
In the emerging environment of the Internet of Things (IoT), through the connection of billions of radio frequency identification (RFID) tags and sensors to the Internet, applications will generate an unprecedented number of transactions and amount of data, which require novel approaches to mining useful information from RFID trajectories. RFID data usually contain a considerable degree of uncertainty caused by various factors such as hardware flaws, transmission faults, and environmental instability. In this paper, we propose an efficient clustering algorithm that is much less sensitive to noise and outliers than existing methods. To make better use of emerging cloud computing resources, our algorithm is designed to be cloud-friendly so that it can be easily adopted in a cloud environment. The scalability and efficiency of the proposed algorithm are demonstrated through an extensive set of experimental studies.
The fuzziness index m has an important influence on the clustering results of fuzzy clustering algorithms, and it should not be fixed at the usual value m = 2. A recent advance in fuzzy clustering, fuzzy c-means clustering with improved fuzzy partitions (IFP-FCM), has distinctive features in applications but is limited to m = 2. In this paper, IFP-FCM is extended, and a generalized algorithm called GIFP-FCM is proposed for more effective clustering. By introducing a novel membership constraint function, a new objective function is constructed, from which GIFP-FCM clustering is derived. The robustness and convergence of the proposed algorithm are analyzed from the viewpoints of the L-P norm distance measure and competitive learning. Furthermore, the classical fuzzy c-means algorithm (FCM) and IFP-FCM can be taken as two special cases of the proposed algorithm. Several experimental results, including an application to noisy image texture segmentation, are presented to demonstrate its average advantage over FCM and IFP-FCM in both clustering and robustness capabilities.
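The influence of the fuzziness index m can be seen in a small sketch of the classical FCM membership update, u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)), where d_i is the distance from a point to center i. This is the standard FCM formula, not the GIFP-FCM update, whose membership constraint function is not given in the abstract; the function name and sample data are hypothetical.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Classical FCM memberships of one point with respect to all centers."""
    if m <= 1.0:
        raise ValueError("the fuzziness index m must be greater than 1")
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    zeros = [i for i, d in enumerate(dists) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
            for d_i in dists]


# Hypothetical data: the point is twice as far from the second center.
centers = [(0.0, 0.0), (3.0, 0.0)]
point = (1.0, 0.0)
for m in (1.5, 2.0, 3.0):
    print(m, fcm_memberships(point, centers, m))
```

As m grows, the memberships move toward an even split between the centers; as m approaches 1, they approach a hard assignment to the nearest center.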
Owing to striking features such as controllable mobility and low cost, unmanned aerial vehicles (UAVs) are deemed a promising solution for collecting data from Internet of Things devices (IoTDs). The limited onboard energy, however, undeniably impedes data collection. The task is further complicated by the varying amounts of data generated by different types of IoTDs. The goal of this paper is to design an applicable data collection scheme for IoT networks that uses a laser-powered UAV to maximize system energy efficiency. We propose an improved clustering algorithm called logarithm kernel-based mean shift (LKMS), inspired by the idea behind the mean shift algorithm. Based on LKMS, we propose a novel algorithm to determine the optimal visiting order and enter points (EPs) of IoTD clusters, paving the way for the subsequent optimization. To solve the formulated problem, which is non-convex and has coupled variables, we artificially divide the entire flying procedure into two phases, the flying and charging (FC) phase and the collecting data (CD) phase, depending on whether the UAV is harvesting energy. The block coordinate descent (BCD) and successive convex approximation (SCA) methods are used to decouple the variables and solve the non-convex subproblems. Simulation results validate the effectiveness of the proposed scheme.
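For readers unfamiliar with the mean shift idea that LKMS builds on, the following is a minimal sketch of the classical mean shift iteration with a Gaussian kernel. It is not LKMS: the abstract does not specify the logarithm kernel, so the Gaussian kernel, the `bandwidth` parameter, and the sample device positions are assumptions chosen for illustration.

```python
import math


def mean_shift_mode(start, points, bandwidth=1.0, tol=1e-6, max_iter=200):
    """Move `start` uphill to a density mode by repeated weighted averaging."""
    x = tuple(start)
    for _ in range(max_iter):
        # Gaussian weight of every data point relative to the current position.
        weights = [math.exp(-math.dist(x, p) ** 2 / (2.0 * bandwidth ** 2))
                   for p in points]
        total = sum(weights)
        new_x = tuple(
            sum(w * p[k] for w, p in zip(weights, points)) / total
            for k in range(len(x))
        )
        if math.dist(new_x, x) < tol:
            return new_x
        x = new_x
    return x


# Hypothetical device positions forming two groups.
devices = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2),
           (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
modes = [mean_shift_mode(d, devices) for d in devices]
for d, mode in zip(devices, modes):
    print(d, "->", tuple(round(c, 3) for c in mode))
```

Points whose iterations end at (nearly) the same mode form one cluster, so the number of clusters does not have to be chosen in advance.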
Beyond the widely studied scheduling of wafers within cluster tools, this paper raises a novel and important perspective by tackling an upper-level optimization problem in real-world production: the assignment of hybrid types of wafer lots to a set of cluster tools with parallel modules so as to minimize the maximum completion time of the lots. The main difficulty in addressing this problem is that the objective, i.e., the maximum completion time, cannot be calculated explicitly beforehand. To make the problem tractable, the associated maximal overlap among tools is used to evaluate the objective heuristically. In addition, since the cluster tools used for processing are identical, we treat the problem as a clustering problem. Accordingly, a clustering algorithm based on greedy search is proposed to allocate wafer lots to cluster tools while minimizing the maximal overlap. To elucidate our method and its significance in real-world production, the wet bench tool in the wet cleaning process is taken as a case study. We compare the proposed algorithm with the empirical method used in fabs and with several intelligent optimization algorithms, and the experimental results verify the effectiveness of the proposed method in terms of improved efficiency.
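The general shape of a greedy allocation to identical tools can be sketched as below. This is a stand-in technique, not the paper's algorithm: it uses the well-known longest-processing-time-first rule and assumes each lot has a known processing time, whereas the paper evaluates assignments through a maximal-overlap measure that the abstract does not define. The lot names and times are hypothetical.

```python
import heapq


def greedy_assign(lot_times, n_tools):
    """Assign lots to identical tools, longest lot first, always choosing
    the currently least-loaded tool. Returns (assignment, makespan)."""
    heap = [(0.0, tool) for tool in range(n_tools)]  # (current load, tool id)
    heapq.heapify(heap)
    assignment = {tool: [] for tool in range(n_tools)}
    for lot, time in sorted(lot_times.items(), key=lambda kv: -kv[1]):
        load, tool = heapq.heappop(heap)
        assignment[tool].append(lot)
        heapq.heappush(heap, (load + time, tool))
    makespan = max(load for load, _ in heap)
    return assignment, makespan


# Hypothetical lots with assumed processing times (arbitrary units).
lots = {"A": 7, "B": 5, "C": 4, "D": 3, "E": 3}
assignment, makespan = greedy_assign(lots, n_tools=2)
print(assignment, makespan)
```

For these lots the greedy result has a maximum completion time of 12, while splitting them as {A, C} and {B, D, E} gives 11, which shows that a greedy search is a heuristic rather than an exact method.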
The dissolution characteristics of glycerol derivatives in solvents at different temperatures and pressures were studied. The effects of solvent structure on gas absorption capacity and separation selectivity were analyzed, and the absorption thermodynamics and kinetics were discussed. The experimental results show that as the temperature increases, the solubility of glycerol derivatives decreases, and the separation selectivity between gases also decreases. At the same temperature, the more carbon atoms the glycerol derivative has, the easier its dissolution in the ionic liquid. The thermodynamic parameters of each gas do not change much with increasing temperature. In addition, the spectral clustering algorithm can be used to obtain the global optimal solution, which addresses the tendency of the traditional hybrid data clustering algorithm to fall into a local optimal solution.
Clustering is considered an important and basic method in data mining for interdisciplinary research. Traditional clustering algorithms still suffer from various problems: sensitivity to the selection of initial cluster centres, a tendency to fall into local optimal solutions, poor global search capability, and the need for prior knowledge to determine the number of clusters. This paper adopts a gene expression programming (GEP) automatic clustering algorithm with variable penalty factors. It combines penalty factors with the GEP clustering algorithm, requires no prior knowledge of the data set, divides clusters automatically, and better handles the impact of isolated points and noise points. Simulation experiments further demonstrate the effectiveness of the algorithm.