Manual interpretation of ground-penetrating radar (GPR) images is characterised by long interpretation cycles and high staffing requirements. The automated interpretation schemes proposed in recent years, based on support vector machines, digital images, convolutional neural networks and other techniques, mainly detect features from B-scan slices of 3D GPR data; they do not take full advantage of the multi-channel acquisition of 3D GPR data or of joint discrimination across channels. This paper proposes a void recognition algorithm based on cluster analysis: the VRADI algorithm processes the 3D GPR B-scans, and the DBSCAN clustering algorithm divides the results into clusters and removes noise. The paper proposes correlation weighting coefficients $W_i$ and $W_j$ to quantify the degree of correlation between different survey channels, a prime relative position coefficient $P_d$ to quantify position similarity, and a weighted homocentric overlap coefficient $P_r$ to quantify signal similarity. The algorithm is applied in physical engineering experiments, and binary logistic regression analysis is used to develop a correlation model. The experimental results show that the significance levels of $P_d$ and $P_r$ are both below 0.05, so both are important indicators for determining the presence or absence of a void. With an optimal critical probability of 0.4, the recognition accuracy of the VRADI algorithm increases from 71.7% to 92.2%. The VRADI algorithm combined with the cluster analysis algorithm outperforms manual recognition in terms of accuracy (92.2% > 83.9%) and recall (90.5% > 86.9%), and the algorithm has good engineering application value.
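As a rough illustration of the DBSCAN step mentioned in this abstract (dividing detections into clusters and discarding isolated noise), here is a minimal pure-Python sketch. It is not the paper's implementation: the 2D coordinates, the `eps` radius and the `min_pts` threshold are hypothetical values chosen only to make the behaviour visible.

```python
import math

NOISE = -1  # label given to points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Label each point with a cluster index, or NOISE, using plain DBSCAN."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # not a core point; may become a border point later
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = neighbors(j)
            if len(nj) >= min_pts:
                queue.extend(nj)  # j is a core point, so keep growing the cluster
        cluster += 1
    return labels


# Hypothetical detections: two compact groups and one isolated point.
detections = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11),
              (5, 5)]
labels = dbscan(detections, eps=1.5, min_pts=3)
kept = [p for p, lab in zip(detections, labels) if lab != NOISE]
```

Here `kept` holds only the points that fall into a dense group; the isolated detection is labelled as noise and dropped.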
Recent research on multi-view clustering algorithms for complex disease subtyping often overlooks aspects like clustering stability and critical assessment of prognostic relevance. Furthermore, current frameworks do not allow for a comparison between data-driven and pathway-driven clustering, highlighting a significant gap in the methodology. We present the COPS R-package, tailored for robust evaluation of single and multi-omics clustering results. COPS features advanced methods, including similarity networks, kernel-based approaches, dimensionality reduction, and pathway knowledge integration. Some of these methods are not accessible through R, and some correspond to new approaches proposed with COPS. Our framework was rigorously applied to multi-omics data across seven cancer types, including breast, prostate, and lung, utilizing mRNA, CNV, miRNA, and DNA methylation data. Unlike previous studies, our approach contrasts data- and knowledge-driven multi-view clustering methods and incorporates cross-fold validation for robustness. Clustering outcomes were assessed using the ARI score, survival analysis via Cox regression models including relevant covariates, and the stability of the results. While survival analysis and gold-standard agreement are standard metrics, they vary considerably across methods and datasets. Therefore, it is essential to assess multi-view clustering methods using multiple criteria, from cluster stability to prognostic relevance, and to provide ways of comparing these metrics simultaneously to select the optimal approach for disease subtype discovery in novel datasets. Emphasizing multi-objective evaluation, we applied the Pareto efficiency concept to gauge the equilibrium of evaluation metrics in each cancer case-study. Affinity Network Fusion, Integrative Non-negative Matrix Factorization, and Multiple Kernel K-Means with linear or Pathway Induced Kernels were the most stable and effective in discerning groups with significantly different s
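The Pareto efficiency idea mentioned in this abstract can be sketched in a few lines of Python. This is an illustration only and does not use the COPS package: the method names and score tuples below are made up, and every metric is assumed to be oriented so that higher is better.

```python
from typing import Dict, List, Tuple


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """True if a is at least as good as b on every metric and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Names of the methods that no other method dominates, in sorted order."""
    return sorted(
        name
        for name, s in scores.items()
        if not any(dominates(o, s) for other, o in scores.items() if other != name)
    )


# Hypothetical (stability, agreement, prognostic relevance) scores, not real results.
scores = {
    "method_A": (0.60, 0.80, 0.70),
    "method_B": (0.55, 0.85, 0.65),
    "method_C": (0.50, 0.75, 0.60),  # worse than method_A on every metric
}
front = pareto_front(scores)
```

In this toy case `method_A` and `method_B` each win on a different metric, so both stay on the front, while `method_C` is dominated and drops out.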
The evaluation of clustering algorithms is intrinsically difficult because of the lack of objective measures. On the basis of the DIFI and China's provincial panel data, this study tests the poverty reduction effect of digital inclusive finance in three dimensions (income, education, and healthcare) and further examines the transmission mechanism of digital inclusive finance in poverty alleviation. The results indicate that digital inclusive finance exerts a poverty reduction effect in all three dimensions: medical poverty, income poverty, and education poverty. Of its components, coverage breadth significantly affects the alleviation of medical poverty, use depth significantly affects the alleviation of income poverty and education poverty, and digitization level affects the alleviation of poverty in all three dimensions. The level of regional economic development plays an intermediary role in the poverty alleviation effect of digital inclusive finance. Compared with the western region, which is relatively less developed, the poverty reduction effect of digital inclusive finance in the eastern region is more significant.
We compare different jet-clustering algorithms in establishing fully hadronic final states stemming from the chain decay of a heavy Higgs state into a pair of 125 GeV Higgs bosons that each decay into a bottom-antibottom quark pair. Such 4b events typically give rise to boosted topologies, wherein the $b\bar{b}$ pairs emerging from each 125 GeV Higgs boson tend to merge into a single, fat b-jet. Assuming Large Hadron Collider (LHC) settings, we illustrate how both the efficiency of selecting the multi-jet final state and the ability to reconstruct from it the masses of all Higgs bosons depend on the choice of jet-clustering algorithm and its parameter settings. We indicate the optimal choice of clustering method for the purpose of establishing such a ubiquitous beyond the Standard Model (BSM) signal, illustrated via a Type-II 2-Higgs Doublet Model (2HDM).
This paper presents schemes for determining the efficiency of clustering algorithms within wireless sensor networks (WSNs). Wireless sensor networking is a fast-growing technology, and such networks can distribute tasks among their nodes for effective computation. Wireless sensor networks can sense the environment, process information, and send that information to the end user. These systems are self-organizing, and they manage themselves without any centralized control. Wireless sensor networks are widely used in applications such as disaster management, battlefield investigation, border security, and security surveillance. They are deployed in large numbers, and are even operated in unattended environments, where human monitoring is either difficult or, sometimes, impossible. Therefore, to maintain service quality and a reasonable system lifespan, efficient management strategies are required. Several clustering algorithms have been introduced in this regard. In this paper, the clustering algorithms GESC, UDCA, and the k-Mean Method are analyzed and compared using a set of attributes taken as classification criteria. On the basis of this analysis, a composite technique is proposed. Some suggestions are also presented which, if taken into consideration, can help improve the efficiency of clustering algorithms.
The fuzzy c-means (FCM) clustering algorithm is the best-known and most widely used method in fuzzy clustering and is generally applied to well-defined sets of data. In this paper, a generalized probabilistic fuzzy c-means (FCM) algorithm is proposed and applied to clustering fuzzy sets. This technique leads to a fuzzy partition of the fuzzy rules, one for each cluster, which corresponds to a new set of fuzzy sub-systems. When applied to the clustering of a flat fuzzy system, it yields a set of decomposed sub-systems that can be conveniently linked into a Parallel Collaborative Structure.
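For orientation, the standard FCM iteration that this abstract builds on can be sketched as follows. This is the textbook algorithm applied to scalar data points, not the generalized probabilistic variant for fuzzy sets that the paper proposes; the data, the fuzzifier `m = 2` and the fixed iteration count are assumptions made for the example.

```python
import random


def fuzzy_c_means(points, c=2, m=2.0, iters=100, seed=0):
    """Standard fuzzy c-means on 1D data; returns (centers, membership matrix)."""
    rng = random.Random(seed)
    n = len(points)
    # Start from random memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = [0.0] * c
    for _ in range(iters):
        # Center update: mean of the points weighted by membership**m.
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            centers[j] = sum(wi * x for wi, x in zip(w, points)) / sum(w)
        # Membership update from relative distances to all centers.
        for i, x in enumerate(points):
            d = [abs(x - cj) + 1e-12 for cj in centers]  # guard against zero distance
            for j in range(c):
                u[i][j] = 1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
    return centers, u


# Hypothetical data: two well-separated groups around 1 and 5.
data = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
centers, memberships = fuzzy_c_means(data)
low, high = sorted(centers)
```

Each row of `memberships` sums to 1, so a point near one center still carries a small, non-zero degree of membership in the other cluster; that soft assignment is what separates FCM from hard k-means.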
This paper presents a novel idea for avoiding the large transient errors of hill-climbing type adaptive algorithms in abruptly changing environments. A data re-initialization (DR) scheme is proposed that uses a multiple-model approach with a special initialization technique for smooth transitions between different environments. The design is based on an adaptive fuzzy clustering algorithm with on-line modifications; furthermore, it is equipped with learning capabilities. Fast adaptation is realized by DR and by switching between different models; learning is realized by recording and retrieving the trained models. The algorithm is conceptually simple and the simulation results are appealing. Finally, conclusions and remarks are given.
Clustering is performed to gain insight into data whose volume makes analysis by humans impractical. As a result, clustering algorithms have emerged as meta-learning tools for performing exploratory data analysis. A cluster is defined as a set of objects that are more similar to each other than to objects outside the set. However, there is ambiguity regarding a suitable similarity metric for clustering. Multiple measures for quantifying similarity have been proposed, such as Euclidean distance and density in data space, making clustering a multi-objective optimization problem. In this paper, different clustering approaches are studied from a theoretical perspective to understand their relevance in the context of massive datasets, and they are tested empirically on artificial benchmarks to highlight their strengths and weaknesses.
Across a wide variety of fields, and especially in industrial companies, data are being collected and accumulated at a dramatic pace from many different resources and services. Hence, there is an urgent need for a new generation of computational theories and tools to assist humans in extracting useful information from the rapidly growing volumes of digital data. Clustering is a well-known fundamental data mining task for extracting information. However, as applications have been adapted to various domains, researchers have developed a large number of clustering algorithms. This proliferation makes it difficult for researchers and practitioners to keep up with the development of clustering algorithms. Finding an appropriate algorithm therefore helps significantly in organizing information and extracting correct answers from database queries. In this respect, the aim of this paper is to find the appropriate clustering algorithm for a sparse industrial dataset. To achieve this goal, we first present related work from the past twenty years that focuses on comparing different clustering algorithms. After that, we provide a categorization of the clustering algorithms found in the literature by matching their properties to the 4V's challenges of Big Data, which allows us to select the candidate clustering algorithms. Finally, K-means, agglomerative hierarchical clustering, DBSCAN and SOM are implemented and compared on four datasets using internal validity indices. In addition, we highlight the best-performing clustering algorithm, the one that yields the most efficient clusters, for each dataset.
Most clustering algorithms need to describe the similarity of objects by a predefined distance function. Three distance functions that are widely used in two traditional clustering algorithms, k-means and hierarchical clustering, were investigated. Both theoretical analysis and detailed experimental results are given. It is shown that the distance function greatly affects clustering results, and that comparing the results obtained with different distance functions can be used to detect the outliers of a cluster and to reveal the shape of clusters. In practice, it is suggested to use the different distance functions separately, compare the clustering results, and pick out the "swing points", since such points may reveal more information to data analysts.
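The "swing point" idea can be illustrated with a small Python sketch: assign each point to its nearest cluster center under several distance functions and flag the points whose assignment changes. The abstract does not name its three distance functions, so Euclidean, Manhattan and Chebyshev are assumed here; the centers and points are hypothetical as well.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def chebyshev(p, q):
    return max(abs(a - b) for a, b in zip(p, q))


def nearest(point, centers, dist):
    """Name of the center closest to point under the given distance function."""
    return min(centers, key=lambda name: dist(point, centers[name]))


def swing_points(points, centers, dists):
    """Points whose nearest center depends on which distance function is used."""
    return [p for p in points
            if len({nearest(p, centers, d) for d in dists}) > 1]


# Hypothetical cluster centers and data points.
centers = {"A": (3, 3), "B": (5, 0)}
points = [(0, 0), (4, 3), (5, 1)]
swings = swing_points(points, centers, [euclidean, manhattan, chebyshev])
```

In this toy setup the point `(0, 0)` is closer to `A` under the Euclidean and Chebyshev distances but closer to `B` under the Manhattan distance, so it is the only point reported in `swings`.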