The quality of marine statistical data is the lifeblood of marine statistical work. It is the key indicator of the level of that work and bears directly on the credibility of marine statistical departments. Marine statistics therefore needs sound criteria for evaluating data quality. Only once these basic standards are met can a comprehensive, systematic monitoring system for marine statistical data quality be built step by step, guaranteeing the authority of marine statistical data. This paper uses a clustering algorithm to further improve the quality of marine statistical data in China.
Many pattern recognition programs use some form of clustering algorithm, and these algorithms often use the Euclidean distance as the similarity measure. A scheme is proposed in which the Euclidean metric and the simpler city-block metric are used together to reduce overall classification time. The relation between the Euclidean and city-block distances is expressed as a scalar function. The bounds of this function are given and used to decide whether each pattern vector is classified with the computationally slow Euclidean distance or with the faster city-block distance. The criterion is that the classification must be identical to that of the original Euclidean-only scheme.
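The pruning idea above can be sketched for nearest-prototype classification. This is a minimal sketch, assuming the standard norm inequality d_CB / sqrt(n) <= d_E <= d_CB for n-dimensional vectors as the bound; the paper's own scalar function and its bounds may differ. The function names are hypothetical. Prototypes whose Euclidean lower bound exceeds the smallest city-block distance are discarded, and the slow Euclidean metric is computed only when more than one candidate survives, so the result matches a Euclidean-only classifier.

```python
import math
from typing import List, Sequence, Tuple


def city_block(a: Sequence[float], b: Sequence[float]) -> float:
    # L1 (city-block) distance: no squaring and no square root, so it is cheap
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(x: Sequence[float], prototypes: List[Sequence[float]]) -> Tuple[int, int]:
    """Nearest-prototype classification that agrees with a Euclidean-only classifier.

    Returns (index of the nearest prototype, number of Euclidean distances computed).
    """
    n = len(x)
    cb = [city_block(x, p) for p in prototypes]
    # Assumed bound: d_CB / sqrt(n) <= d_E <= d_CB.
    # The smallest city-block distance is therefore an upper bound on the nearest Euclidean distance.
    upper = min(cb)
    # A prototype whose Euclidean lower bound exceeds that upper bound cannot be the nearest one.
    # The tiny relative slack only makes pruning more conservative, guarding against rounding.
    slack = 1e-12 * upper
    candidates = [j for j, d in enumerate(cb) if d / math.sqrt(n) <= upper + slack]
    if len(candidates) == 1:
        return candidates[0], 0  # decided by the fast city-block metric alone
    # Otherwise fall back to the slow Euclidean metric, but only for the surviving candidates.
    best = min(candidates, key=lambda j: euclidean(x, prototypes[j]))
    return best, len(candidates)
```

For prototypes (0, 0), (10, 0) and (0, 10), the pattern (1, 1) is classified without any Euclidean evaluation, while (6, 1) needs two. Because every true nearest prototype always survives the pruning, ties and results are the same as with the Euclidean-only scheme.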
The 21st century is an era of information, data, and knowledge. Information technology is changing every aspect of human society. From the perspective of machine learning, clustering, an important branch of data minin...
In the recent benchmarking article entitled "Comparison and Evaluation of clustering algorithms for Tandem Mass Spectra", Rieder et al. compared several different approaches to clustering MS/MS spectra. While we recognize the value of the manuscript, here we report some shortcomings we detected in the original analyses. First, for most analyses the authors clustered only single MS/MS runs. In one of the reported analyses, three MS/MS runs were processed together, which already led to computational performance issues in many of the tested approaches. This highlights the difficulty of applying many of the tested algorithms to proteomics data sets of the size typically produced today. Second, the authors processed only identified spectra when merging MS runs. As a result, the unidentified spectra, which are of lower quality, had already been removed from the data set and could not influence the clustering results. Next, the authors did not analyze the effect of chimeric spectra on the clustering results. In our analysis, 3% of the spectra in the data sets used were chimeric, and this had marked effects on the behavior of the different clustering algorithms tested. Finally, in our opinion, the authors' choice to evaluate the MS-Cluster and spectra-cluster algorithms using only a precursor tolerance of 5 Da on high-resolution Orbitrap data was not adequate to assess the performance of MS/MS clustering approaches.
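The precursor-tolerance point can be made concrete with a little arithmetic. A fixed 5 Da window at m/z 800 corresponds to 5 / 800 × 10^6 = 6,250 ppm, whereas high-resolution instruments are usually matched with a relative window of a few ppm. The sketch below compares the two kinds of window; the 10 ppm value and the m/z values are hypothetical, chosen only for illustration, and the function names are not taken from MS-Cluster or spectra-cluster.

```python
from typing import Iterable


def ppm_window(mz: float, ppm: float) -> float:
    """Absolute m/z window corresponding to a relative tolerance in parts per million."""
    return mz * ppm / 1e6


def precursors_match(mz_a: float, mz_b: float, tol: float, unit: str = "ppm") -> bool:
    """True if two precursor m/z values fall within the given tolerance.

    unit="ppm" scales the window with m/z (typical for high-resolution instruments);
    unit="Da" uses a fixed absolute window.
    """
    if unit == "ppm":
        window = ppm_window(mz_a, tol)
    elif unit == "Da":
        window = tol
    else:
        raise ValueError(f"unknown tolerance unit: {unit!r}")
    return abs(mz_a - mz_b) <= window


def count_candidates(query_mz: float, precursor_mzs: Iterable[float], tol: float, unit: str = "ppm") -> int:
    """Number of spectra whose precursor would be compared with the query spectrum."""
    return sum(precursors_match(query_mz, mz, tol, unit) for mz in precursor_mzs)
```

With hypothetical precursors [799.0, 800.004, 801.5, 803.9, 806.0] and a query at m/z 800.0, a 10 ppm window (0.008) admits one candidate, while a 5 Da window admits four. A wide window therefore pushes many unrelated spectra into the same comparison pool.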
High-dimensional data is described by a large number of features, which raises new problems for clustering. The term "high dimension" was originally introduced to describe the common increase in ti...
The modern period of social development is characterized by the significant impact of information technology on all spheres of life, and marketing is no exception. Amid intense competition, it is very difficult to interest a potential buyer in a product. Data clustering offers a solution to this problem. The aim of this work is to investigate two popular clustering algorithms, DBSCAN and K-means, for analyzing a dataset of customer data. The experiments confirmed the efficiency of the proposed methods for data clustering. K-means was found to be unsuitable for analyzing this customer data, because the data are non-spherical, have different densities, and contain noise. The DBSCAN algorithm, however, handles such data well.
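Why DBSCAN copes with non-spherical, noisy data can be seen in a small from-scratch sketch. It follows the textbook DBSCAN procedure (core points have at least min_pts neighbours within eps, counting themselves; clusters grow from core points; unreachable points are noise) and is not the implementation used in the study. The two concentric rings and the outlier are hypothetical test data: K-means, which assigns each point to its nearest centroid, would typically cut such rings into halves, whereas density-based expansion follows each ring.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1


def region_query(points: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # reachable noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep expanding
    # Every point has been visited, so each label is a cluster id or NOISE.
    return [lab if lab is not None else NOISE for lab in labels]
```

With 20 points on a ring of radius 1, 60 points on a ring of radius 5, and an outlier at (10, 10), dbscan(points, eps=0.8, min_pts=3) labels the inner ring 0, the outer ring 1, and the outlier -1. This brute-force neighbour search is quadratic in the number of points; a spatial index would be needed for large customer datasets.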
The synergy between Artificial Intelligence and the Edge Computing paradigm promises to move decision-making processes to the periphery of sensor networks, without the involvement of central data servers. For this reason, we have recently witnessed the rapid development of devices that integrate sensors and computing resources on a single board to process data directly where it is collected. Given the context in which they are used, the main feature of these boards is their low energy consumption, even though their absolute computing power is not comparable to that of modern high-end CPUs. Among the most popular Artificial Intelligence techniques, clustering algorithms are practical tools for discovering correlations or affinities within large datasets, but their high computational cost makes a parallel implementation essential. Therefore, in the present work, we investigate how to implement clustering algorithms on parallel, low-energy devices for edge computing environments. In particular, we present experiments on two devices with different features, the quad-core UDOO X86 Advanced+ board and the GPU-based NVIDIA Jetson Nano board, evaluating them in terms of both performance and energy consumption. The experiments show that these devices achieve a more favorable trade-off between the two requirements than other high-end computing devices.
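The abstract does not say which clustering algorithms were parallelised or how, so the following is only a sketch of one common decomposition, using k-means as an assumed example: each worker assigns its chunk of points to the nearest centroid and returns partial sums and counts (map), and the partials are combined into new centroids (reduce). The function names are hypothetical. CPython threads do not run pure-Python arithmetic in parallel because of the global interpreter lock, so on a real multi-core board one would swap in ProcessPoolExecutor, native threads, or a GPU kernel; only the map/reduce structure carries over.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Point = Sequence[float]


def partial_step(chunk: Sequence[Point], centroids: List[List[float]]) -> Tuple[List[List[float]], List[int]]:
    """Map step: assign each point in one chunk to its nearest centroid and
    return per-cluster coordinate sums and point counts for that chunk."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def parallel_kmeans(points: List[Point], init_centroids: Sequence[Point],
                    workers: int = 4, max_iter: int = 50) -> List[List[float]]:
    centroids = [list(c) for c in init_centroids]
    k, dim = len(centroids), len(centroids[0])
    size = max(1, math.ceil(len(points) / workers))
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_iter):
            partials = list(pool.map(partial_step, chunks, [centroids] * len(chunks)))
            # Reduce step: combine the per-chunk sums and counts into new centroids.
            new = []
            for j in range(k):
                n = sum(counts[j] for _, counts in partials)
                if n == 0:
                    new.append(centroids[j])  # empty cluster: keep its old centroid
                else:
                    new.append([sum(sums[j][d] for sums, _ in partials) / n for d in range(dim)])
            if new == centroids:
                break  # assignments are stable, so the centroids stopped moving
            centroids = new
    return centroids
```

Only the small per-chunk partials cross between workers, which keeps communication low; that property matters on boards where memory bandwidth and energy are the binding constraints.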
Background: Artificial neural networks (ANNs) have been shown to be valuable in the analysis of analytical flow cytometric (AFC) data in aquatic ecology. Automated extraction of clusters is an important first stage in deriving ANN training data from field samples, but AFC data pose a number of challenges for many types of clustering algorithm. The fuzzy k-means algorithm has recently been extended to handle nonspherical clusters through the use of scatter matrices. Four variants were proposed, each optimizing a different measure of clustering "goodness". Methods: Using AFC data obtained from cultures of marine phytoplankton species, the four fuzzy k-means variants were compared with each other and with another multivariate clustering algorithm, based on critical distances, that is currently used in flow cytometry. Results: One of the variants (adaptive distances, also known as the Gustafson-Kessel algorithm) was found to be robust and reliable, whereas the others showed various problems. Conclusions: In practice, the adaptive distances algorithm outperformed the clustering algorithms against which it was tested, but the automatic determination of the number of clusters remains an open problem. (C) 2001 Wiley-Liss, Inc.
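The adaptive-distances variant can be sketched with the textbook Gustafson-Kessel updates. Each cluster i gets a fuzzy covariance matrix F_i and a norm-inducing matrix A_i = (rho_i det F_i)^(1/n) F_i^(-1), so every cluster measures distance along its own orientation. This is a minimal sketch, not the code used in the 2001 study. The fuzzifier m = 2, unit cluster volumes (rho_i = 1), the small ridge term `reg`, and the farthest-point initialisation are all assumptions.

```python
import numpy as np


def _memberships(D2: np.ndarray, m: float) -> np.ndarray:
    """u_ik = 1 / sum_j (D2_ik / D2_jk)^(1/(m-1)) from squared distances of shape (c, N)."""
    inv = np.maximum(D2, 1e-12) ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)


def gustafson_kessel(X, c, m=2.0, max_iter=100, tol=1e-6, seed=0, reg=1e-9):
    """Fuzzy k-means with adaptive (cluster-specific) distances.

    Returns (V, U): cluster centres of shape (c, n) and memberships of shape (c, N).
    """
    X = np.asarray(X, dtype=float)
    N, n = X.shape
    rng = np.random.default_rng(seed)
    # Assumed initialisation: one random point, then farthest-point picks for the other centres.
    idx = [int(rng.integers(N))]
    for _ in range(1, c):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(np.argmax(d2)))
    V = X[idx].copy()
    D2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)  # squared Euclidean, (c, N)
    U = _memberships(D2, m)
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)  # fuzzy-weighted centres
        for i in range(c):
            diff = X - V[i]
            # Fuzzy covariance (scatter) matrix of cluster i, regularised to stay invertible.
            F = (Um[i][:, None] * diff).T @ diff / Um[i].sum() + reg * np.eye(n)
            # Norm-inducing matrix with unit cluster volume: A_i = det(F_i)^(1/n) F_i^-1
            A = np.linalg.det(F) ** (1.0 / n) * np.linalg.inv(F)
            D2[i] = np.einsum("kj,jl,kl->k", diff, A, diff)  # (x_k - v_i)^T A_i (x_k - v_i)
        U_new = _memberships(D2, m)
        shift = np.abs(U_new - U).max()
        U = U_new
        if shift < tol:
            break
    return V, U
```

On two hypothetical elongated clusters (30 points spread along x from -3 to 3 with a small y jitter, and a copy shifted by 10 in y), the hard labels U.argmax(axis=0) separate the two clusters. The determinant normalisation keeps clusters from shrinking their volume to win points, which is what distinguishes this variant from simply using a per-cluster Mahalanobis distance.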
ISBN (print): 9783642212598
Background: A wealth of clustering algorithms has been applied to gene co-expression experiments. These algorithms cover a broad range of approaches, from conventional techniques such as k-means and hierarchical clustering to graphical approaches such as k-clique communities, weighted gene co-expression networks (WGCNA), and paraclique. Comparing these methods to evaluate their relative effectiveness provides guidance for algorithm selection, development, and implementation. Most prior work on comparative clustering evaluation has focused on parametric methods. Graph theoretical methods are recent additions to the tool set for the global analysis and decomposition of microarray co-expression matrices and have generally not been included in earlier methodological comparisons. In the present study, a variety of parametric and graph theoretical clustering algorithms are compared using well-characterized, genome-scale transcriptomic data from Saccharomyces cerevisiae. Methods: For each clustering method under study, a variety of parameters were tested. Jaccard similarity was used to measure each cluster's agreement with every GO and KEGG annotation set, and the highest Jaccard score was assigned to the cluster. Clusters were grouped into small, medium, and large bins, and the Jaccard scores of the top five scoring clusters in each bin were averaged and reported as the best average top 5 (BAT5) score for the particular method. Results: Clusters produced by each method were evaluated based on their positive match to known pathways, which yields a readily interpretable ranking of the relative effectiveness of clustering on the genes. Methods were also tested to determine whether they identified clusters consistent with those identified by other clustering methods. Conclusions: Validation of clusters against known gene classifications demonstrates that, for these data, graph-based techniques outperform conventional clustering approaches, suggesting that furth
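The scoring procedure in the Methods can be sketched as follows: each cluster takes its highest Jaccard similarity against any annotation set, clusters are binned by size, and the top five scores per bin are averaged. The bin boundaries are not given in the abstract, so the `bins` argument is a hypothetical input. The `bat5` helper assumes that "best" means the highest per-bin average across parameter settings, which is our reading rather than a stated definition.

```python
from typing import Dict, List, Mapping, Optional, Set, Tuple

Bins = Mapping[str, Tuple[int, int]]  # bin name -> (min size, max size), inclusive


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match_scores(clusters: List[Set[str]], annotations: Mapping[str, Set[str]]) -> List[float]:
    """For each cluster, the highest Jaccard score against any annotation set (e.g. GO or KEGG terms)."""
    return [max((jaccard(c, g) for g in annotations.values()), default=0.0) for c in clusters]


def average_top5_by_bin(clusters: List[Set[str]], annotations: Mapping[str, Set[str]],
                        bins: Bins) -> Dict[str, Optional[float]]:
    scores = best_match_scores(clusters, annotations)
    result: Dict[str, Optional[float]] = {}
    for name, (lo, hi) in bins.items():
        in_bin = sorted((s for c, s in zip(clusters, scores) if lo <= len(c) <= hi), reverse=True)
        top = in_bin[:5]
        result[name] = sum(top) / len(top) if top else None  # None: no cluster of that size
    return result


def bat5(runs: List[List[Set[str]]], annotations: Mapping[str, Set[str]],
         bins: Bins) -> Dict[str, Optional[float]]:
    """Assumed reading of BAT5: best per-bin average over several runs (parameter settings)."""
    per_run = [average_top5_by_bin(r, annotations, bins) for r in runs]
    return {name: max((r[name] for r in per_run if r[name] is not None), default=None)
            for name in bins}
```

For example, with hypothetical annotations {"GO:A": {"g1", "g2", "g9"}, "GO:B": {"g3", "g4", "g5", "g6"}}, the cluster {"g1", "g2"} scores 2/3 and {"g3", "g4", "g5"} scores 0.75.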
ISBN (print): 9781424413799
One of the fundamental challenges of clustering is how to evaluate, without auxiliary information, to what extent the obtained clusters fit the natural partitions of the data set. A common approach to evaluating clustering results is to use validity indices. We propose a new validity index, Conn-Index, for prototype-based clustering. Conn-Index is applicable to data sets with a wide variety of cluster characteristics (different shapes, sizes, densities, and overlaps). We construct Conn-Index from the inter- and intra-cluster connectivities of prototypes, which are found through a weighted Delaunay triangulation called the "connectivity matrix" [1], where the weights indicate the data distribution. We compare the performance of Conn-Index with commonly used indices on synthetic and real data sets.
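The connectivity matrix can be sketched as follows, following the usual definition as we understand it: for every data point, find its best-matching prototype (BMU) and second BMU, and add one unit of weight to the edge between them. Prototype pairs that never co-occur get weight zero, so the nonzero entries form a data-weighted subgraph of the Delaunay triangulation. The `intra_inter_connectivity` helper is hypothetical and only splits edge weight into within-cluster and between-cluster totals; it is not the published Conn-Index formula.

```python
import math
from typing import List, Sequence, Tuple


def conn_matrix(data: Sequence[Sequence[float]], prototypes: Sequence[Sequence[float]]) -> List[List[int]]:
    k = len(prototypes)
    if k < 2:
        raise ValueError("need at least two prototypes")
    conn = [[0] * k for _ in range(k)]
    for x in data:
        # Rank prototypes by distance to get the best-matching unit and the second one.
        order = sorted(range(k), key=lambda j: math.dist(x, prototypes[j]))
        bmu, second = order[0], order[1]
        # Each data point adds one unit of weight to the (BMU, second BMU) edge; symmetric.
        conn[bmu][second] += 1
        conn[second][bmu] += 1
    return conn


def intra_inter_connectivity(conn: List[List[int]], proto_labels: Sequence[int]) -> Tuple[int, int]:
    """Hypothetical helper, NOT the Conn-Index formula: total edge weight within vs. between clusters."""
    intra = inter = 0
    k = len(conn)
    for i in range(k):
        for j in range(i + 1, k):
            if proto_labels[i] == proto_labels[j]:
                intra += conn[i][j]
            else:
                inter += conn[i][j]
    return intra, inter
```

With prototypes (0, 0), (1, 0), (5, 0) and data points (0.4, 0), (0.6, 0), (3.5, 0), the matrix is [[0, 2, 0], [2, 0, 1], [0, 1, 0]]. If the first two prototypes form one cluster, the split is 2 units within clusters and 1 between them.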