We establish sufficient conditions for exact and almost full recovery of the node partition in the Bipartite Stochastic Block Model (BSBM) using polynomial-time algorithms. First, we improve upon the known conditions for almost full recovery by spectral clustering algorithms in BSBM. Next, we propose a new computationally simple and fast procedure that achieves exact recovery under milder conditions than the state of the art. Namely, if the vertex sets $V_1$ and $V_2$ in BSBM have sizes $n_1$ and $n_2$, we show that the condition $p = \Omega\left(\max\left(\sqrt{\frac{\log n_1}{n_1 n_2}}, \frac{\log n_1}{n_2}\right)\right)$ on the edge intensity $p$ is sufficient for exact recovery within $V_1$. This condition exhibits an elbow at $n_2 \asymp n_1 \log n_1$ between the low-dimensional and high-dimensional regimes. The suggested procedure is a variant of Lloyd's iterations initialized with a well-chosen spectral estimator, leading to what we expect to be the optimal condition for exact recovery in BSBM. The optimality conjecture is supported by showing that, for a supervised oracle procedure, such a condition is necessary to achieve exact recovery. The key elements of the proof techniques differ from classical community detection tools on random graphs. Numerical studies confirm our theory and show that the suggested algorithm is both very fast and achieves almost the same performance as the supervised oracle. Finally, using the connection between planted satisfiability problems and the BSBM, we improve upon the sufficient number of clauses to completely recover the planted assignment.
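To see where the elbow comes from, the two terms inside the maximum can be compared numerically. The sketch below ignores the unspecified constant hidden in $\Omega(\cdot)$, so it only reports the order of the threshold; the function name `threshold_and_regime` is hypothetical and is not part of the authors' procedure.

```python
import math

def threshold_and_regime(n1: float, n2: float) -> tuple[float, str]:
    """Order of the sufficient edge intensity p for exact recovery within V_1.

    Assumption: the constant hidden in Omega(.) is ignored, so only the scaling is meaningful.
    """
    log_n1 = math.log(n1)
    sqrt_term = math.sqrt(log_n1 / (n1 * n2))  # larger once n2 exceeds n1 * log(n1)
    linear_term = log_n1 / n2                    # larger while n2 is below n1 * log(n1)
    regime = "high-dimensional" if sqrt_term > linear_term else "low-dimensional"
    return max(sqrt_term, linear_term), regime

# Both terms equal 1 / n1 at the elbow n2 = n1 * log(n1).
for n2 in (100, 1000 * math.log(1000), 10**7):
    p, regime = threshold_and_regime(1000, n2)
    print(f"n1=1000, n2={n2:.0f}: p ~ {p:.2e} ({regime})")
```

Setting the two terms equal, $\frac{\log n_1}{n_1 n_2} = \frac{(\log n_1)^2}{n_2^2}$, gives $n_2 = n_1 \log n_1$, which is exactly the elbow stated above.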
An algorithm is described that fits boundary data of planar shapes in either rectangular coordinate or chain-code format with a set of straight line segments. The algorithm combines a new vertex detection method, which locates initial vertices and segments in the data, with the c-elliptotype clustering algorithm, which iteratively adjusts the location of these initial segments, thereby obtaining a best polygonal fit for the data in the mean-squared error sense. Several numerical examples are given to exemplify the implementation and utility of this new approach.
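The core of a c-elliptotype prototype can be sketched as follows, assuming the usual fuzzy c-elliptotypes distance, a convex combination of the point distance to a centre $v$ and the perpendicular distance to the line through $v$ along direction $e$: $D = \lVert x - v\rVert^2 - \alpha \langle x - v, e\rangle^2$ with $\alpha \in [0, 1]$. The helper names are hypothetical, the weights stand in for fuzzy memberships, and the paper's vertex detection step is not reproduced here.

```python
import numpy as np

def fit_line_prototype(points, weights=None):
    """Weighted centre v and unit principal direction e of a set of boundary points."""
    P = np.asarray(points, dtype=float)
    w = np.ones(len(P)) if weights is None else np.asarray(weights, dtype=float)
    v = (w[:, None] * P).sum(axis=0) / w.sum()
    D = P - v
    scatter = (w[:, None] * D).T @ D            # weighted scatter matrix
    _, eigvecs = np.linalg.eigh(scatter)        # eigenvalues come back in ascending order
    e = eigvecs[:, -1]                          # direction of largest spread
    return v, e

def elliptotype_distance(x, v, e, alpha):
    """alpha = 0 gives the c-means point distance, alpha = 1 the c-lines perpendicular distance."""
    diff = np.asarray(x, dtype=float) - v
    return float(diff @ diff - alpha * (diff @ e) ** 2)

def segment_endpoints(points, v, e):
    """Clip the fitted line to a segment by projecting the points onto it."""
    t = (np.asarray(points, dtype=float) - v) @ e
    return v + t.min() * e, v + t.max() * e

pts = [[0, 0], [1, 1], [2, 2]]
v, e = fit_line_prototype(pts)
print(elliptotype_distance([3, 3], v, e, alpha=1.0))  # collinear point: (numerically) zero
```

Intermediate values of $\alpha$ keep each prototype anchored near its centre while still rewarding collinearity, which is why elliptotypes suit fitting bounded segments rather than infinite lines.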
We share our experiences teaching university students about clustering algorithms using EduClust, an online visualization we developed. EduClust supports professors in preparing teaching material and students in visually and interactively exploring cluster steps and the effects of changing clustering parameters. We used EduClust for two years in our computer science lectures on clustering algorithms and share our experience integrating the online application in a data science curriculum. We also point to opportunities for future development.
This paper deals with the problem of clustering a data set. In particular, the bisecting divisive partitioning approach is considered here. We focus on two algorithms: the celebrated K-means algorithm, and the recentl...
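Bisecting divisive partitioning repeatedly splits one existing cluster in two until the desired number of clusters is reached. The sketch below uses 2-means (the K-means algorithm with K = 2) as the splitting step; the second algorithm the abstract compares against is cut off, so it is not shown. Two choices are assumptions rather than the paper's: the cluster with the largest within-cluster sum of squares is split next, and each 2-means run is seeded with a farthest-point heuristic.

```python
import numpy as np

def two_means(X, n_iter=100):
    """Split X into two groups with Lloyd iterations (assumed farthest-point seeding)."""
    centroid = X.mean(axis=0)
    a = X[np.argmax(np.linalg.norm(X - centroid, axis=1))]  # point farthest from the centroid
    b = X[np.argmax(np.linalg.norm(X - a, axis=1))]         # point farthest from a
    centers = np.stack([a, b])
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        if labels.min() == labels.max():  # degenerate split, e.g. identical points
            break
        new_centers = np.stack([X[labels == j].mean(axis=0) for j in (0, 1)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def sse(X):
    return float(((X - X.mean(axis=0)) ** 2).sum())

def bisecting_kmeans(X, k):
    X = np.asarray(X, dtype=float)
    clusters = [np.arange(len(X))]
    while len(clusters) < k:
        # Assumed selection rule: split the cluster with the largest sum of squared errors
        worst = max(range(len(clusters)), key=lambda i: sse(X[clusters[i]]))
        idx = clusters.pop(worst)
        labels = two_means(X[idx])
        if labels.min() == labels.max():
            clusters.append(idx)  # this cluster cannot be split further
            break
        clusters += [idx[labels == 0], idx[labels == 1]]
    out = np.empty(len(X), dtype=int)
    for c, idx in enumerate(clusters):
        out[idx] = c
    return out

labels = bisecting_kmeans([[0], [0.1], [0.2], [10], [10.1], [30], [30.2]], k=3)
```

Because each step solves only a two-cluster problem, the procedure also yields a binary hierarchy of splits as a by-product.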
To address the low alarm-clustering quality and the excessive number of redundant alarms in IDS (Intrusion Detection Systems), an IDS alert clustering algorithm based on a novel chaotic particle swarm optimization is proposed. We f...
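The abstract is truncated before it describes the novel chaotic PSO, so the following is only a generic sketch of PSO-based centroid search with one common chaotic ingredient, the logistic map, used here both to initialise particles and to vary the inertia weight. It assumes the alerts are already encoded as numeric feature vectors; every parameter (`c1`, `c2`, the inertia range 0.4 to 0.9, the number of warm-up map iterations) is hypothetical.

```python
import numpy as np

def logistic_map(z, r=4.0):
    # Logistic map z -> r z (1 - z); chaotic on (0, 1) when r = 4
    return r * z * (1.0 - z)

def sse(centroids, X):
    # Sum of squared distances from each alert vector to its nearest centroid
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())

def chaotic_pso_cluster(X, k, n_particles=20, n_iter=100, c1=1.5, c2=1.5, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Chaotic initialisation: iterate the logistic map, then scale into the data's bounding box
    z = rng.uniform(0.01, 0.99, size=(n_particles, k, X.shape[1]))
    for _ in range(50):
        z = logistic_map(z)
    pos = lo + z * (hi - lo)  # each particle encodes k candidate centroids
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_cost = np.array([sse(p, X) for p in pos])
    g = int(pbest_cost.argmin())
    gbest, gbest_cost = pbest[g].copy(), pbest_cost[g]
    history = [gbest_cost]
    chaos = 0.7  # chaotic variable driving the inertia weight
    for _ in range(n_iter):
        chaos = logistic_map(chaos)
        w = 0.4 + 0.5 * chaos  # inertia weight varies chaotically in [0.4, 0.9]
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        cost = np.array([sse(p, X) for p in pos])
        better = cost < pbest_cost
        pbest[better], pbest_cost[better] = pos[better], cost[better]
        g = int(pbest_cost.argmin())
        if pbest_cost[g] < gbest_cost:
            gbest, gbest_cost = pbest[g].copy(), pbest_cost[g]
        history.append(gbest_cost)
    return gbest, gbest_cost, history
```

The chaotic inertia weight is meant to keep the swarm from settling too early, a known weakness of plain PSO; alerts assigned to the same returned centroid would then be merged into one meta-alert to reduce redundancy.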
In batch spawning fish, secondary growth oocytes (SGO) are recruited and spawned in successive cohorts, and multiple cohorts co-occur in spawning-capable females. So far, histological features such as the prevalence of cortical alveoli or yolk granules have been conservatively used to distinguish oocytes in different developmental stages, which do not necessarily correspond to different cohorts. In this way, valuable information about spawning dynamics remains unseen and, consequently, misleading conclusions might be drawn, especially for species with high spawning rates and increased overlap among oocyte cohorts. We introduce a new method for grouping oocytes into different cohorts based on the application of the K-means clustering algorithm to the characteristics of cytoplasmic structures, such as the varying size and intensity of cortical alveoli and yolk granules in oocytes at different stages of development. The method allowed the grouping of oocytes without the need to use oocyte diameter, and thus a crucial histological bias related to the cutting angle and the orientation of reference points (e.g. nucleus) has been overcome. Using sardine, Sardina pilchardus, as a case study, the separation of cohorts provided new insight into the ovarian dynamics, identifying successive recruitment of up to five oocyte cohorts between SGO recruitment and spawning. These results verified previous histological indications of the number of cohorts in sardine. Altogether, this method represents an improved tool to study species with complex ovarian dynamics. (c) 2021 Elsevier Inc. All rights reserved.
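A minimal sketch of the clustering step, assuming each oocyte has already been reduced to a small vector of cytoplasmic features. The two features and all numbers below are hypothetical, not measurements from the study. The features are standardised first because size and intensity have different units, and the number of cohorts `k` must be chosen by the analyst.

```python
import numpy as np

def standardize(F):
    # z-score each feature so size (µm) and grey-level intensity carry comparable weight
    return (F - F.mean(axis=0)) / F.std(axis=0)

def farthest_first(X, k):
    # Deterministic seeding: start from the first oocyte, then repeatedly add the
    # point farthest from all seeds chosen so far
    chosen = [0]
    d = np.linalg.norm(X - X[0], axis=1)
    for _ in range(1, k):
        i = int(d.argmax())
        chosen.append(i)
        d = np.minimum(d, np.linalg.norm(X - X[i], axis=1))
    return X[chosen].copy()

def kmeans(X, k, n_iter=100):
    centers = farthest_first(X, k)
    for _ in range(n_iter):
        # Assign each oocyte to its nearest centre, then move centres to cluster means
        labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

# Hypothetical per-oocyte features: [mean inclusion size (µm), mean inclusion intensity]
features = np.array([[2.0, 50], [2.2, 52], [1.9, 49],
                     [5.0, 120], [5.1, 118], [4.9, 122],
                     [9.0, 200], [9.2, 198], [8.8, 203]], dtype=float)
cohort, _ = kmeans(standardize(features), k=3)
```

No oocyte diameter enters the feature vector, which mirrors the point made above about avoiding the cutting-angle bias.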
Clustering is one of the most important research areas in the field of data mining. Clustering means creating groups of objects based on their features in such a way that the objects belonging to the same groups are s...
To improve network security in the big data era, an improved clustering algorithm is applied to network security defence. First, the application of big data clustering algorithms in network security defence is analysed. Second, a network security defence model is studied, and a corresponding mathematical model is designed. Third, the improved clustering algorithm based on big data is established through an analysis of text requirements and data characteristics. Finally, a simulation analysis is carried out, and the effectiveness of the proposed algorithm is verified. The theoretical analysis results show that the proposed model can provide a theoretical basis for designing a network information security defence system. Copyright 2021 Inderscience Enterprises Ltd.
An evaluation of several clustering methods was conducted. Artificial clusters which exhibited the properties of internal cohesion and external isolation were constructed. The true cluster structure was subsequently hidden by six types of error-perturbation. The results indicated that the hierarchical methods were differentially sensitive to the type of error perturbation. In addition, generally poor recovery performance was obtained when random seed points were used to start the K-means algorithms. However, two alternative starting procedures for the nonhierarchical methods produced greatly enhanced cluster recovery and were found to be robust with respect to all of the types of error examined.
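The abstract does not name the index used to score cluster recovery. As an assumed stand-in, the adjusted Rand index of Hubert and Arabie compares a recovered partition with the known artificial structure. It equals 1 for a perfect match up to relabelling and has expected value 0 for random labellings, so it can rank random-seed versus alternative-seed K-means runs on the same data.

```python
import numpy as np
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat partitions of the same objects."""
    _, ai = np.unique(np.asarray(labels_true), return_inverse=True)
    _, bi = np.unique(np.asarray(labels_pred), return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=int)
    np.add.at(table, (ai, bi), 1)  # contingency table n_ij
    sum_cells = sum(comb(int(n), 2) for n in table.ravel())
    sum_rows = sum(comb(int(n), 2) for n in table.sum(axis=1))
    sum_cols = sum(comb(int(n), 2) for n in table.sum(axis=0))
    expected = sum_rows * sum_cols / comb(len(ai), 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:  # both partitions trivial; treat as perfect agreement
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, labels swapped
```

Chance correction matters here. The raw Rand index of a random partition is typically well above 0, which would blur the gap between poor and good starting procedures.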
In this paper, we focus on the development of two similarity-measure-based robust possibilistic c-means clustering (RPCM) algorithms that are insensitive to the selection of initial parameters, robust to noise and outliers, and able to automatically determine the number of clusters. The proposed algorithms are based on two different objective functions of PCM, which can be regarded as special cases of similarity-based robust clustering algorithms. The robustness of the proposed RPCM algorithms to noise and outliers is analyzed using the influence function and the gross error sensitivity. Several simulations are conducted to demonstrate the effectiveness of the proposed algorithms.
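The abstract does not give the two similarity-based objectives, so the sketch below shows only the classic possibilistic c-means (PCM) of Krishnapuram and Keller that RPCM builds on. It uses the typicality update $u_{ij} = 1 / \left(1 + (d_{ij}^2/\eta_i)^{1/(m-1)}\right)$ and centres computed as $u^m$-weighted means. The scale parameters `eta` and the initial centres are fixed by hand here (an assumption), and this plain PCM does not have the initialization robustness claimed for RPCM.

```python
import numpy as np

def typicalities(X, V, eta, m):
    """PCM typicality u_ij = 1 / (1 + (d_ij^2 / eta_i)^(1/(m-1))), shape (c, n)."""
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    return 1.0 / (1.0 + (d2 / eta[:, None]) ** (1.0 / (m - 1.0)))

def pcm(X, init_centers, eta, m=2.0, n_iter=200, tol=1e-8):
    X = np.asarray(X, dtype=float)
    V = np.asarray(init_centers, dtype=float).copy()
    eta = np.asarray(eta, dtype=float)
    for _ in range(n_iter):
        W = typicalities(X, V, eta, m) ** m              # weights u_ij^m
        V_new = (W @ X) / W.sum(axis=1, keepdims=True)   # weighted cluster centres
        converged = np.abs(V_new - V).max() < tol
        V = V_new
        if converged:
            break
    return V, typicalities(X, V, eta, m)

# Two tight groups plus one far outlier at 100
X = [[0.0], [0.2], [-0.2], [10.0], [10.2], [9.8], [100.0]]
V, U = pcm(X, init_centers=[[0.5], [9.5]], eta=[1.0, 1.0])
```

Unlike fuzzy c-means, whose memberships must sum to 1 across clusters, PCM typicalities are computed per cluster. The outlier therefore receives a typicality close to 0 in every cluster and barely moves the centres.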