Identifying protein complexes from protein-protein interaction networks is one of the crucial tasks in computational biology. Traditional methods, along with their shortcomings in fully understanding protein complex c...
In the evolving field of precision agriculture, accurate monitoring of vineyard health using advanced technologies is crucial for efficient resource management and for addressing climate change challenges. Optimized disease detection methods enhance efficiency, sustainability and economic viability, making non-destructive health assessment vital for modern agricultural practices. This study aims to differentiate grapevine varieties based on their spectral characteristics using multispectral imaging. Focusing on grapevine canopies within a vineyard in the Attica region of Greece, this research proposes a methodology for exploiting aerial multispectral images captured over two consecutive years, 2022 and 2023. Unlike typical vineyards with a limited number of grape varieties, the study area included over 70 varieties, each with a relatively small sample size. Classification algorithms were employed to separate vines from soil and shadows, with the Maximum Likelihood algorithm achieving 98.79% and 90.53% accuracy for the 2022 and 2023 images, respectively. Vegetation indices were applied to assess vine health, chlorophyll content and canopy density. Among seven indices, the Chlorophyll Vegetation Index (CVI) and the Ratio Vegetation Index (RVI) were selected due to their low correlation. Six clustering algorithms were tested, with the Bisecting K-means algorithm proving the most effective, achieving a silhouette value of 0.41. Comparative analysis between the 2022 and 2023 clusters revealed that 34 vine varieties maintained stable health, 24 improved and 15 worsened. This study underscores the potential of multispectral imaging and clustering algorithms in vineyard management, offering insights to optimize cultivation practices based on spectral data.
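The two selected indices can be illustrated with a minimal sketch. It assumes the commonly used definitions RVI = NIR / Red and CVI = NIR * Red / Green^2; the abstract does not state which formulas the study used, and the function names and band values below are hypothetical. A Pearson correlation helper is included because the abstract mentions index selection by low correlation.

```python
import math

def ratio_vegetation_index(nir, red):
    """RVI per pixel, assuming the common definition NIR / Red."""
    return [n / r for n, r in zip(nir, red)]

def chlorophyll_vegetation_index(nir, red, green):
    """CVI per pixel, assuming the common definition NIR * Red / Green^2."""
    return [n * r / (g * g) for n, r, g in zip(nir, red, green)]

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (neither constant)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical reflectance values for four canopy pixels (not from the study).
nir = [0.50, 0.45, 0.60, 0.40]
red = [0.10, 0.08, 0.05, 0.12]
green = [0.20, 0.15, 0.12, 0.18]

rvi = ratio_vegetation_index(nir, red)
cvi = chlorophyll_vegetation_index(nir, red, green)
print("correlation between RVI and CVI:", pearson(rvi, cvi))
```

Two indices with a low absolute correlation carry less redundant information, which is one plausible reason to prefer such a pair as clustering features.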
The importance of dealing with big data keeps increasing, as machine learning (ML) systems extract useful knowledge from big datasets. However, using all data is practically prohibitive because of the massive sizes of the datasets, so summarizing them by centers obtained from k-center clustering is a promising approach. There are two concerns here. One is fairness: if the summary omits some specific groups, subsequent applications may produce unfair results for those groups. The other is the presence of outliers: if outliers dominate the summary, it is not useful. To overcome these concerns, we address the problem of fair k-center clustering with outliers. Although prior works studied the fair k-center clustering problem, they do not consider outliers. This paper presents a linear-time algorithm that satisfies the fairness constraint of our problem and probabilistically guarantees an almost 3-approximation bound. Its empirical efficiency and effectiveness are also reported.
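For context, the plain k-center problem (without fairness constraints or outlier handling) can be sketched with the classic greedy farthest-first traversal, which is known to give a 2-approximation. This is not the algorithm of the paper above; the function name `greedy_k_center` and the sample points are hypothetical, and the sketch assumes k is at most the number of points.

```python
import math

def greedy_k_center(points, k, first=0):
    """Farthest-first traversal: returns (center indices, covering radius).

    Assumes 1 <= k <= len(points). Plain k-center only: no fairness
    constraint and no outlier handling.
    """
    centers = [first]
    # dist[i] = distance from point i to its nearest chosen center so far.
    dist = [math.dist(p, points[first]) for p in points]
    while len(centers) < k:
        # The next center is the point farthest from all current centers.
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return centers, max(dist)

# Hypothetical 2-D points forming two tight groups.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
centers, radius = greedy_k_center(points, k=2)
print(centers, radius)
```

Because each step picks the farthest remaining point, a single distant outlier tends to be selected as a center, which illustrates why outliers can dominate a k-center summary.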
Scholars in the machine learning community have recently focused on analyzing the fairness of learning models, including clustering algorithms. In this work, we study fair clustering in a probabilistic (soft) setting, where observations may belong to several clusters with membership determined by probabilities. We introduce new probabilistic fairness metrics, which generalize and extend existing non-probabilistic fairness frameworks, and propose an algorithm for obtaining a fair probabilistic cluster solution from a data representation known as a fairlet decomposition. Finally, we demonstrate our proposed fairness metrics and algorithm by constructing a fair Gaussian mixture model on three real-world datasets. We achieve this by identifying balanced micro-clusters that minimize the distances induced by the model, and on which traditional clustering can be performed while ensuring the fairness of the solution.
ISBN (print): 9783031692567; 9783031692574
We implement AntClust, a clustering algorithm based on the chemical recognition system of ants, and use it to cluster images of cars. We give a short summary of the main working principles of the algorithm as devised in the original paper [1]. Further, we describe how to define a similarity function for images and how the implementation is used to cluster images of cars from the vehicle re-identification data set. We then test the clustering performance of AntClust against DBSCAN, HDBSCAN and OPTICS. Finally, one of the core parts of AntClust, the rule set, can be easily redefined with our implementation, enabling other bio-inspired algorithms to find rules in an automated process. The implementation can be found on GitLab [9].
ISBN (print): 1577358872
Ensemble clustering learns more accurate consensus results from a set of weak base clustering results. This technique is more challenging than other clustering algorithms due to the base clustering result set's randomness and the inaccessibility of data features. Existing ensemble clustering methods rely on the quality of the co-association (CA) matrix but lack the capability to handle missing connections in the base clusterings. Inspired by the neighborhood high-order and topological similarity theories, this paper proposes a topological ensemble model based on high-order information. Specifically, this paper compensates for missing connections by mining neighborhood high-order connection information in the CA matrix and learning optimal connections with adaptive weights. Afterward, the learned connections are embedded into topology learning to capture the topology of the base clusterings. Finally, we incorporate adaptive high-order connection representation and topology learning into a unified learning framework. To the best of our knowledge, this is the first ensemble clustering work based on topological similarity and high-order connectivity relations. Extensive experiments on multiple datasets demonstrate the effectiveness of the proposed method. The source code of the proposed approach is available at https://***/ltyong/awec.
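As background for the CA matrix mentioned above, here is a minimal sketch of the standard co-association construction: entry (i, j) is the fraction of base clusterings that place samples i and j in the same cluster. It shows only this basic matrix, not the paper's high-order compensation or topology learning; the function name and the example labelings are hypothetical.

```python
def co_association(base_labelings):
    """Build the co-association matrix from a list of base clusterings.

    base_labelings: list of label lists, one per base clustering, all of
    the same length n. Returns an n x n matrix of co-occurrence fractions.
    """
    m = len(base_labelings)
    n = len(base_labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in base_labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    # Normalise counts by the number of base clusterings.
    return [[c / m for c in row] for row in counts]

# Three hypothetical base clusterings of four samples.
base = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
ca = co_association(base)
for row in ca:
    print(row)
```

An entry of 0 means the pair never co-occurs in any base clustering, which is the kind of missing connection the abstract says existing CA-based methods handle poorly.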
ISBN (print): 9798350388350; 9798350388343
With the increase in scale and complexity of integrated circuits, the volume of test data has also grown. Because clustering algorithms have a simple structure, low computational cost, and the capability to handle large-scale data, they offer a new solution for processing integrated-circuit test data. This paper defines two characteristics based on the features of test data, frequency_rate and count_of_ones, and incorporates them into the K-MEANS, DBSCAN, and OPTICS algorithms to segment circuit test data, with the silhouette coefficient as the evaluation criterion. Extensive experiments indicate that the OPTICS algorithm achieves the best clustering effect and is suitable for subsequent data processing.
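The evaluation criterion named above can be sketched as follows. The abstract does not define its features, so this example assumes `count_of_ones` is simply the number of '1' bits in a test pattern string, and it leaves `frequency_rate` out entirely. The silhouette coefficient is the standard one, computed here on a single numeric feature; the test patterns and cluster labels are hypothetical.

```python
def count_of_ones(pattern):
    """Assumed definition: number of '1' characters in a test pattern."""
    return pattern.count("1")

def silhouette(values, labels):
    """Mean silhouette coefficient for 1-D values with given cluster labels."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(abs(v - u) for u in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(abs(v - u) for u in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical test patterns and a hypothetical two-cluster assignment.
patterns = ["0000", "0001", "1110", "1111"]
features = [count_of_ones(p) for p in patterns]
labels = [0, 0, 1, 1]
print(features, silhouette(features, labels))
```

Values close to 1 indicate compact, well-separated clusters, so the same score can be used to compare the output of different clustering algorithms on the same features.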
ISBN (print): 1577358872
As one of the most popular machine learning tools in the field of unsupervised learning, clustering has been widely used in various practical applications. While numerous methods have been proposed for clustering, a commonly encountered issue is that the existing clustering methods rely heavily on local neighborhood information during the optimization process, which leads to suboptimal performance on real-world datasets. Besides, most existing clustering methods use Euclidean distances or densities to measure the similarity between data points. This could constrain the effectiveness of the algorithms for handling datasets with irregular patterns. Thus, a key challenge is how to effectively capture the global structural information in clustering instances to improve the clustering quality. In this paper, we propose a new clustering algorithm, called SEC. This algorithm uses the global structural information extracted from an encoding tree to guide the clustering optimization process. Based on the relation between data points in the instance, a sparse graph of the clustering instance can be constructed. By leveraging the sparse graph constructed, we propose an iterative encoding tree method, where hierarchical abstractions of the encoding tree are iteratively extracted as new clustering features to obtain better clustering results. To avoid the influence of easily misclustered data points located on the boundaries of the clustering partitions, which we call "fringe points", we propose an iterative pre-deletion and reassignment technique such that the algorithm can delete and reassign the "fringe points" to obtain more resilient and precise clustering results. Empirical experiments on both synthetic and real-world datasets demonstrate that our proposed algorithm outperforms state-of-the-art clustering methods and achieves better clustering performances. On average, the clustering accuracy (ACC) is increased by 1.7% and the normalized mutual information (NMI) by 7.9% co
ISBN (print): 1577358872
Motivated by recent work in computational social choice, we extend the metric distortion framework to clustering problems. Given a set of n agents located in an underlying metric space, our goal is to partition them into k clusters, optimizing some social cost objective. The metric space is defined by a distance function d between the agent locations. Information about d is available only implicitly via n rankings, through which each agent ranks all other agents in terms of their distance from her. Still, even though no cardinal information (i.e., the exact distance values) is available, we would like to evaluate clustering algorithms in terms of social cost objectives that are defined using d. This is done using the notion of distortion, which measures how far from optimality a clustering can be, taking into account all underlying metrics that are consistent with the ordinal information available. Unfortunately, the most important clustering objectives (e.g., those used in the well-known k-median and k-center problems) do not admit algorithms with finite distortion. To sidestep this disappointing fact, we follow two alternative approaches: We first explore whether resource augmentation can be beneficial. We consider algorithms that use more than k clusters but compare their social cost to that of the optimal k-clusterings. We show that using exponentially (in terms of k) many clusters, we can get low (constant or logarithmic) distortion for the k-center and k-median objectives. Interestingly, such an exponential blowup is shown to be necessary. More importantly, we explore whether limited cardinal information can be used to obtain better results. Somewhat surprisingly, for k-median and k-center, we show that a number of queries that is polynomial in k and only logarithmic in n (i.e., only sublinear in the number of agents for the most relevant scenarios in practice) is enough to get constant distortion.
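The information model described above can be illustrated with a small sketch: an algorithm sees only each agent's ranking of the other agents, and very different metrics can induce exactly the same rankings. The helper names and the agent positions on a line are hypothetical, and ties are broken by agent index as an assumption.

```python
def line_metric(positions):
    """Distance matrix for agents placed at the given points on a line."""
    return [[abs(p - q) for q in positions] for p in positions]

def rankings_from_metric(d):
    """For each agent i, rank all other agents by increasing distance d[i][j].

    Ties are broken by agent index (sorted is stable).
    """
    n = len(d)
    return [
        sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])
        for i in range(n)
    ]

# Two hypothetical metrics that differ greatly in cardinal terms.
near = line_metric([0, 1, 3])
far = line_metric([0, 1, 100])

print(rankings_from_metric(near))
print(rankings_from_metric(far))
```

Both metrics produce the same ordinal information even though the third agent is much farther away in the second one. An algorithm restricted to rankings cannot tell these cases apart, which is why distortion is measured against all metrics consistent with the rankings.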
ISBN (print): 9798350384826; 9798350384819
Wireless sensor networks (WSNs) play an important role in the Internet of Things (IoT). These are networks of sensor nodes that are clustered to collect and exchange locally sensed data. In each cluster, a cluster head (CH) gathers data from its cluster members, aggregates it, and sends it to the sink node. To serve IoT applications, the sink node then shares this data with the other CHs. Nevertheless, clustering a massive collection of sensor nodes is challenging because these nodes have limited energy resources and are distributed over a vast area. Recently, High Altitude Platform Stations (HAPS) have been shown to improve the connectivity in WSNs by serving as non-terrestrial sink nodes. The quasi-stationary nature of these non-terrestrial platforms can offer vast geographical coverage and thus improve WSNs' transmission reliability. This paper proposes HAPS-greedy clustering (HAPS-GC) of WSNs to support massive IoT applications. In contrast to existing HAPS-based WSN clustering schemes, HAPS-GC considers not only the connectivity between sensor nodes within each cluster but also their connectivity with HAPS. Simulation results show that the proposed HAPS-GC approach can significantly increase the WSN throughput while keeping WSN energy consumption similar to that of existing HAPS-based WSN clustering schemes and of a scenario in which a terrestrial sink node is used.