Clustering methods are widely used tools in many areas of science, such as ecology, medicine, or even market research, that commonly deal with dendrogram-based analyses. In such analyses, for a given initial dissimilarity matrix, the resulting dendrogram may vary strongly according to the selected clustering method. However, numerous dendrogram-based analyses require an adequate measure for assessing which of the clustering methods most faithfully preserves the initial dissimilarity matrix. While cophenetic correlation coefficient-based measures have been widely used for this purpose, we emphasize here that this is not always a suitable approach. We thus propose a measure based on a matrix norm, the 2-norm, to adequately check which of the resulting ultrametric distance matrices related to the dendrograms is the closest to the initial dissimilarity matrix. In addition, we also propose an objective way to define a benchmark value (threshold value) in order to assess whether the degree of conformity between the selected ultrametric distance matrix and the initial dissimilarity matrix is satisfactory. Our proposal may notably be incorporated within a recently proposed approach that involves the use of clustering methods in environmental science and beyond. In ecology, various functional diversity indices based on clustering species from their functional dissimilarities may benefit from this overall approach.
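The comparison described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes SciPy's `linkage` and `cophenet` to obtain the ultrametric (cophenetic) distance matrix of each dendrogram and NumPy's spectral norm for the 2-norm. The helper name `two_norm_gap`, the random test data, and the three linkage methods compared are illustrative assumptions. The threshold value mentioned in the abstract is not reproduced, since the abstract does not say how it is defined.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist, squareform


def two_norm_gap(condensed, method):
    """2-norm of D - U, where D is the initial dissimilarity matrix and
    U is the ultrametric matrix induced by the dendrogram (hypothetical helper)."""
    Z = linkage(condensed, method=method)
    U = squareform(cophenet(Z))      # cophenetic distances in square form
    D = squareform(condensed)        # initial dissimilarities in square form
    return np.linalg.norm(D - U, ord=2)


# Illustrative data only: 12 random points in 3 dimensions.
rng = np.random.default_rng(0)
condensed = pdist(rng.normal(size=(12, 3)))

# Smaller gap = dendrogram closer to the initial dissimilarity matrix.
gaps = {m: two_norm_gap(condensed, m) for m in ("single", "complete", "average")}
best = min(gaps, key=gaps.get)
print(gaps, best)
```

When the initial dissimilarities are already ultrametric, the dendrogram reproduces them exactly and the gap is zero, which gives a quick sanity check for the helper.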
This paper presents GelClust, a new software package designed for processing gel electrophoresis images and generating the corresponding phylogenetic trees. Unlike most related commercial and non-commercial software, we found that GelClust is very user-friendly and guides the user from image to dendrogram through seven simple steps. Furthermore, the software, which is implemented in the C# programming language under the Windows operating system, is more accurate than similar software with regard to image processing and is the only software able to detect and correct gel 'smile' effects completely automatically. These claims are supported by experiments. (C) 2013 Elsevier Ireland Ltd. All rights reserved.
In this paper, we are concerned with the problem of flooding undirected weighted graphs under ceiling constraints. We provide a new algorithm based on a hierarchical structure called a dendrogram, which offers the significant advantage that it can be used for multiple floodings with various scenarios of the ceiling values. In addition, when exploring the graph through its dendrogram structure in order to calculate the flooding levels, independent sub-dendrograms are generated, thus offering a natural way for parallel processing. We provide an efficient implementation of our algorithm through suitable data structures and optimal organisation of the computations. Experimental results show that our algorithm outperforms well-established classical algorithms, and reveal that the cost of building the dendrogram dominates the total running time, thus validating both the efficiency and the hallmark of our method. Moreover, we exploit the potential parallelism exposed by the flooding procedure to design a multi-threaded implementation. As the underlying parallelism is created on the fly, we use a queue to store the list of the sub-dendrograms to be explored, and then use a cyclic distribution to assign them to the participating threads. This yields a load-balanced and scalable process, as shown by additional benchmark results. Our program runs in a few seconds on an ordinary computer to flood graphs with more than 20 million nodes.
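The queue-plus-cyclic-distribution scheme mentioned in this abstract can be illustrated with a short sketch. It only shows the assignment rule (queued item i goes to thread i mod T) and is not the authors' code. The function names, the representation of a sub-dendrogram as a tuple of leaf labels, and the placeholder `explore` are all assumptions. The queue is also treated as a static snapshot, whereas the abstract describes sub-dendrograms being generated on the fly.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def cyclic_partition(tasks, n_threads):
    """Assign queued task i to thread i % n_threads (cyclic distribution)."""
    tasks = list(tasks)
    return [tasks[t::n_threads] for t in range(n_threads)]


def explore(sub_dendrogram):
    # Placeholder for the real flooding computation, which the abstract
    # does not specify; here we simply count the leaves.
    return len(sub_dendrogram)


def process_cyclic(queue, n_threads):
    """Explore each thread's share of the queue in its own worker."""
    shares = cyclic_partition(queue, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(lambda share: [explore(s) for s in share], shares))
    return shares, results


# Hypothetical queue of independent sub-dendrograms (tuples of leaf labels).
queue = deque([("a", "b"), ("c",), ("d", "e", "f"), ("g",), ("h", "i")])
shares, results = process_cyclic(queue, 2)
print(shares)
print(results)
```

A cyclic assignment needs no knowledge of task sizes, which is one plausible reason to prefer it when sub-dendrograms appear dynamically and their costs are not known in advance.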
Seriation is the ordering of the leaves of a dendrogram, such that leaves representing similar items are placed near each other according to some metric, within the constraints of the cluster tree. Such ordering great...
A large amount of image data has been digitized and stored on computers. In order to search for an image, an efficient and accurate retrieval method is needed. This paper is concerned with shape retrieval, which is one of the search methods for finding similar images in a database. Shape, the outer form of a picture, is considered the most promising feature for identifying entities in an image. Shape retrieval takes a lot of time because exhaustive search is mainly used in the literature. This paper suggests the use of a dendrogram, a hierarchical clustering structure, for shape retrieval. In addition, we propose a method for automatically deciding the threshold that determines the number of clusters in the dendrogram. The experimental results show that the proposed method achieves fast retrieval while preserving almost the same level of accuracy.
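The idea of cutting a dendrogram at an automatically chosen threshold can be sketched as below. The abstract does not state the decision rule used in the paper, so this example swaps in a common heuristic as an explicit assumption: cut in the middle of the largest gap between successive merge heights. The helper name `largest_gap_threshold` and the toy feature vectors are illustrative; `linkage` and `fcluster` are SciPy functions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def largest_gap_threshold(Z):
    """Midpoint of the largest jump between successive merge heights.
    Assumed heuristic, not the rule proposed in the paper."""
    heights = Z[:, 2]                 # merge heights from the linkage matrix
    i = int(np.argmax(np.diff(heights)))
    return (heights[i] + heights[i + 1]) / 2.0


# Toy shape descriptors: two well-separated groups of three vectors each.
features = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
])

Z = linkage(features, method="average")          # Euclidean distance by default
threshold = largest_gap_threshold(Z)
labels = fcluster(Z, t=threshold, criterion="distance")
print(threshold, labels)
```

With cluster labels in hand, a query would only need to be compared against members of its nearest cluster rather than the whole database, which is where the speed-up over exhaustive search would come from.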
Malware analysis is a vital and challenging task in the ever-changing cyber threat landscape. Traditional signature-based methods cannot keep up with the fast-paced evolution of malware variants. This underscores the ...
ISBN (print): 0780377893
Dendronic image analysis has been shown to provide a robust technique in the detection of tumours within digital mammograms. It provides the capability of fully automated image analysis through hierarchical segmentation. However, its general acceptance in image analysis has not been realised due to computational intensity in creating the image dendrogram. We have developed an efficient technique that can create image dendrograms a great deal faster than traditional repetitive segmentation algorithms, making dendronic analysis of digital mammograms a viable tool in the detection of breast cancer.
ISBN (print): 0769524974
Bone mineral density (BMD) measurements are the gold standard by which osteoporosis is diagnosed. Dual Energy X-ray Absorptiometry (DEXA) is the most commonly used measuring technique to assess BMD at both the hip and spine. There are three sites within each hip, collectively called the femoral sites, which are assessed: the femoral neck site, the trochanteric site, and Ward's triangle. It is known that the different areas of the femur are composed of different percentages of cancellous bone. The null hypothesis that we examine in this paper is that there is no difference between the mineralization states at all of these sites. To do so, we introduce and present a dendrogram-based methodology that (1) identifies valid clusters for any two given subsets of the six sites, (2) uses the valid clusters to examine the existence of associations between the two subsets, and (3) measures any such association by the degree of similarity between the two subsets. The results obtained in this research effort partially reject the null hypothesis.
ISBN (print): 9783319444031; 9783319444024
Hierarchical clustering (HC for short) outputs a dendrogram that offers more topological information than flat clustering (e.g., k-means). However, the existing HC algorithms focus on either the quality of the dendrogram or the ability to mine arbitrarily shaped clusters. To address these two aspects simultaneously, we present HICMEN by adopting (1) the classic agglomerative clustering framework, which can generate a complete dendrogram, and (2) a novel similarity measure based on mutual k-nearest neighbors to capture the connectivity of data points and help properly merge each arbitrarily shaped cluster piece by piece. More importantly, we prove that the similarity measure has a nice property called weak monotonicity, which guarantees the quality of the dendrogram generated by HICMEN. Extensive experimental results show that HICMEN is capable of mining arbitrarily shaped clusters effectively, and can simultaneously output a high-quality dendrogram.
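The mutual k-nearest-neighbor relation that the similarity measure builds on can be illustrated with a small sketch. This shows only the underlying relation (two points are mutual neighbors when each appears in the other's k-nearest-neighbor list). It is not HICMEN's similarity measure, whose definition the abstract does not give. The function name `mutual_knn` and the one-dimensional sample points are assumptions.

```python
import numpy as np


def mutual_knn(points, k):
    """Boolean matrix M where M[i, j] is True when i and j each appear in
    the other's k nearest neighbours (Euclidean distance)."""
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)               # a point is not its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]    # indices of the k nearest points
    knn = np.zeros(dist.shape, dtype=bool)
    knn[np.arange(len(points))[:, None], nearest] = True
    return knn & knn.T                           # keep only reciprocal links


# Illustrative 1-D points: 0 and 1 choose each other; 2.5 chooses 1 and
# 10 chooses 2.5, but neither choice is reciprocated.
points = np.array([[0.0], [1.0], [2.5], [10.0]])
M = mutual_knn(points, k=1)
print(M)
```

Requiring reciprocity drops one-sided links such as those from outliers to dense regions, which is a common motivation for mutual k-nearest-neighbor graphs when clusters have irregular shapes.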