Anchor-based incomplete multiview clustering has attracted growing interest recently because of its success in effectively partitioning multimodal data. However, due to the absence of label information, the constructed anchors can be mismatched. This Anchor Mismatching Problem (AMP) makes the structure of the generated bipartite graph chaotic and degrades clustering performance. To tackle this issue, we design an algorithm termed Constructing Corresponding Anchors for Incomplete Multiview Clustering (CCA-IMC). Specifically, we first devise a permutation strategy to transform the anchors on each view. Subsequently, we generate the consensus bipartite graph, which is shared by all incomplete views, directly from the transformed anchors rather than by fusing view-specific bipartite graphs. Afterwards, all anchors, the permutation matrices, and the consensus bipartite graph are jointly optimized in one common framework so that they promote each other. In this way, anchors are rearranged towards the correct matching relationship according to the consensus graph structure. In addition, CCA-IMC is proven to have linear time and memory overhead, which allows it to scale to large-scale tasks. Extensive experiments on ten popular datasets demonstrate its superiority over current strong IMC competitors.
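The anchor mismatching idea can be illustrated with a toy sketch. It is not the joint optimization of CCA-IMC: it assumes two views with the same number of anchors, each already holding an anchor-by-sample bipartite graph, and it brute-forces the row permutation of the second graph that agrees best with the first. The names `agreement` and `best_permutation` are hypothetical, and brute force is only feasible for a handful of anchors.

```python
from itertools import permutations

def agreement(z_a, z_b):
    """Sum of elementwise products between two anchor-by-sample graphs."""
    return sum(a * b for row_a, row_b in zip(z_a, z_b) for a, b in zip(row_a, row_b))

def best_permutation(z_ref, z_view):
    """Row order of z_view that agrees best with z_ref (brute force, small m only)."""
    m = len(z_view)
    return max(permutations(range(m)),
               key=lambda perm: agreement(z_ref, [z_view[i] for i in perm]))

# Toy graphs: 3 anchors x 4 samples. View 2 has the same structure with shuffled rows,
# i.e. its anchors are listed in a mismatched order.
z1 = [[0.9, 0.8, 0.0, 0.1],
      [0.1, 0.1, 0.9, 0.0],
      [0.0, 0.1, 0.1, 0.9]]
z2 = [z1[2], z1[0], z1[1]]

perm = best_permutation(z1, z2)          # which row of z2 goes to each position
z2_aligned = [z2[i] for i in perm]       # anchors rearranged to match view 1
```

After alignment, row k of both graphs refers to the same underlying anchor, so the graphs can be combined without mixing up cluster structure.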
Controlling the impact of partial supervision on the outcomes of modeling is of utmost importance in semi-supervised fuzzy clustering. Semi-Supervised Fuzzy C-Means (SSFCMeans), the specific model we consider, uses a single hyperparameter, a scaling factor alpha, to weigh the impact of partially labeled data. This concept became widespread and was reused directly in many works building on SSFCMeans, or even applied to other fuzzy clustering algorithms, such as Possibilistic C-Means. However, none of these works challenged the original interpretation of alpha, which suggests that the impact of partial supervision is directly proportional to the scaling factor. We fill this research gap and thoroughly analyze the relationship. We provide novel explanations of the scaling factor alpha in terms of the key element of fuzzy clustering: the membership values. We prove that the impact of partial supervision is a nonlinear function of alpha. Our approach is rooted in the explainability framework, which distinguishes an interpretation from an explanation and treats the latter as superior. Explaining the scaling factor leads to an explainable impact of partial supervision and enables greater control over it. Finally, building on the novel explanations, we propose a unified, analytically justified framework for selecting the value of the hyperparameter alpha based on cross-validation. With a simulation experiment, we illustrate that the proposed framework enables an extensive analysis of the impact of partial supervision in SSFCMeans.
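The nonlinearity in alpha can be seen in a small sketch. It assumes the commonly cited form of the SSFCMeans objective with fuzzifier 2, in which a supervised term weighted by alpha is added to the usual fuzzy c-means cost; the abstract does not state the objective, so this form, the function name `ssfcm_membership`, and the toy numbers are all assumptions. Under that objective, the membership of a fully labeled sample is a convex combination of the unsupervised membership and the label, with weight alpha / (1 + alpha) on the label.

```python
def ssfcm_membership(sq_dists, labels, alpha):
    """Memberships of one sample across clusters under the assumed SSFCM objective (m = 2).

    sq_dists: squared distances to each cluster centre (all > 0)
    labels:   supervised memberships f_k (all zeros if the sample is unlabeled)
    alpha:    scaling factor weighting the supervised term
    """
    b = 1.0 if sum(labels) > 0 else 0.0            # indicator: is this sample labeled?
    inv = [1.0 / d for d in sq_dists]
    total_inv = sum(inv)
    fcm = [v / total_inv for v in inv]             # plain FCM membership for m = 2
    slack = 1.0 + alpha * (1.0 - b * sum(labels))  # mass not pinned down by the label
    return [(slack * u + alpha * b * f) / (1.0 + alpha) for u, f in zip(fcm, labels)]

# A sample that sits closer to cluster 0 but is labeled as cluster 1.
sq_dists, labels = [1.0, 3.0], [0.0, 1.0]
u_alpha_1 = ssfcm_membership(sq_dists, labels, alpha=1.0)   # label weight 1/2
u_alpha_3 = ssfcm_membership(sq_dists, labels, alpha=3.0)   # label weight 3/4
u_unlabeled = ssfcm_membership(sq_dists, [0.0, 0.0], alpha=3.0)
```

Tripling alpha from 1 to 3 moves the weight on the label from 1/2 to 3/4 rather than tripling it, and an unlabeled sample keeps its plain fuzzy c-means membership regardless of alpha.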
Currently, supervised person re-identification (Re-ID) models trained on labeled datasets can achieve high recognition performance within the same data domain. However, accuracy drops dramatically when these models are directly applied to other unlabeled datasets or natural environments, due to a significant sample distribution gap between the two domains. Unsupervised Domain Adaptation (UDA) methods can address this problem by fine-tuning the model on the target dataset with pseudo-labels generated by a clustering method. Yet these methods are primarily aimed at image-based person Re-ID. This is because background noise and interference are complex and changeable in video scenarios, resulting in large intra-class distances and small inter-class distances, which easily lead to noisy labels. The large domain gap and noisy labels heavily hinder the clustering and training processes in video-based person Re-ID. To address these problems, we propose a novel UDA method for video-based person Re-ID based on Dynamic clustering and Co-segment Attentive Learning (DCCAL). DCCAL includes a Dynamic Clustering (DC) module and a Co-segment Attentive Learning (CAL) module. The DC module adaptively clusters pedestrians within different generation processes to alleviate noisy labels, while the CAL module reduces the domain gap using a co-segmentation-based attention mechanism. Additionally, we introduce a Kullback-Leibler (KL) divergence loss to reduce the discrepancy between the feature distributions of the two domains. Experimental results on two large-scale video-based person Re-ID datasets, MARS and DukeMTMC-VideoReID (DukeV), demonstrate the effectiveness of the method. It outperforms state-of-the-art semi-supervised and unsupervised approaches by 1.1% in Rank-1 and 1.5% in mAP on DukeV, and by 3.1% in Rank-1 and 2.1% in mAP on MARS.
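As a reminder of what a KL divergence loss measures, here is a minimal sketch for two discrete distributions. How DCCAL forms the source and target distributions is not described in the abstract, so the softmax over feature scores, the helper names, and the toy values are assumptions made purely for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical pooled feature scores from the source and target domains.
source = softmax([2.0, 1.0, 0.1])
target = softmax([0.5, 1.5, 1.0])

loss_mismatch = kl_divergence(source, target)   # positive: the distributions differ
loss_match = kl_divergence(source, source)      # zero: identical distributions
```

Minimizing such a term pushes the two distributions towards each other; note that KL divergence is asymmetric, so the choice of which domain plays the role of `p` matters.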
Hierarchical clustering is able to provide partitions of different granularity levels. However, most existing hierarchical clustering techniques perform clustering in the original feature space of the data, which may suffer from overlap, sparseness, or other undesirable characteristics, resulting in noncompetitive performance. In the field of deep clustering, learning representations using pseudo labels has recently become a research hotspot. Yet most existing approaches employ coarse-grained pseudo labels, which may contain noise or incorrect labels. Hence, the learned feature space does not produce a competitive model. In this paper, we introduce the idea of fine-grained labels of supervised learning into unsupervised clustering, giving rise to the enhanced adjacency-constrained hierarchical clustering (ECHC) model. The full framework comprises four steps. One, adjacency-constrained hierarchical clustering (CHC) is used to produce relatively pure fine-grained pseudo labels. Two, those fine-grained pseudo labels are used to train a shallow multilayer perceptron to generate good representations. Three, the corresponding representation of each sample in the learned space is used to construct a similarity matrix. Four, CHC is used to generate the final partition based on the similarity matrix. The experimental results show that the proposed ECHC framework not only outperforms 14 shallow clustering methods on eight real-world datasets but also surpasses current state-of-the-art deep clustering models on six real-world datasets. In addition, on five real-world datasets, ECHC achieves comparable results to supervised algorithms.
Possibilistic c-means (PCM) clustering is an important unsupervised pattern recognition method. However, it still faces major challenges in clustering multidimensional data with multiple characteristics, such as imbalanced sample sizes, imbalanced feature components, noise and outlier corruption, and the sparse distribution of small targets in the feature space caused by the "curse of dimensionality." In view of this, this article proposes a PCM clustering algorithm based on the Mahalanobis-Kernel distance and a suppressed competitive learning strategy. To begin with, the Mahalanobis-Kernel distance, combined with the absolute attribute of possibilistic memberships, is proposed to enhance the intra-class compactness of small targets with sparse distribution and feature imbalance. In addition, to overcome the inherent coincident clustering problem caused by possibilistic memberships, a "suppressed competitive learning" mechanism based on the Mahalanobis-Kernel distance is designed to generate cluster cores and correct the memberships of objects located within the cluster cores, thus purposefully guiding the clustering process. Furthermore, spatial information is introduced through a membership filtering scheme to improve the segmentation of color images with small targets and noise injection. Experimental results show that the proposed algorithm achieves better clustering and segmentation performance than several state-of-the-art fuzzy clustering methods on color images with imbalanced sizes and features and noise injection.
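The "absolute attribute" of possibilistic memberships can be sketched with the classic PCM typicality formula, compared against fuzzy c-means memberships. This sketch uses plain squared Euclidean distances and a fixed scale parameter `eta`; the Mahalanobis-Kernel distance and the suppressed competitive learning step of the proposed algorithm are not reproduced, and the function names are hypothetical.

```python
def pcm_typicality(sq_dist, eta, m=2.0):
    """Classic PCM typicality: depends only on the distance to this one cluster."""
    return 1.0 / (1.0 + (sq_dist / eta) ** (1.0 / (m - 1.0)))

def fcm_membership(sq_dists, m=2.0):
    """Fuzzy c-means memberships: relative, forced to sum to one across clusters."""
    power = 1.0 / (m - 1.0)
    inv = [(1.0 / d) ** power for d in sq_dists]
    total = sum(inv)
    return [v / total for v in inv]

# An outlier that is far from both cluster centres.
outlier_sq_dists = [100.0, 100.0]
typicalities = [pcm_typicality(d, eta=1.0) for d in outlier_sq_dists]
memberships = fcm_membership(outlier_sq_dists)
```

The outlier receives a typicality of about 0.01 for each cluster, while fuzzy c-means still assigns it 0.5 to each, which is why possibilistic memberships are more robust to noise. The same independence between clusters is also what permits coincident clusters, the problem the abstract's suppression mechanism targets.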
Multi-view clustering (MVC), which effectively fuses information from multiple views for better performance, has received increasing attention. Most existing MVC methods assume that multi-view data are fully paired, which means that the mappings of all corresponding samples between views are predefined or given in advance. However, the data correspondence is often incomplete in real-world applications due to data corruption or sensor differences, which is referred to as the data-unpaired problem (DUP) in the multi-view literature. Although several attempts have been made to address the DUP, they suffer from the following drawbacks: 1) most methods focus on the feature representation while ignoring the structural information of multi-view data, which is essential for clustering tasks; 2) existing methods for partially unpaired problems rely on pregiven cross-view alignment information, so they cannot handle fully unpaired problems; and 3) their unavoidable parameters degrade the efficiency and applicability of the models. To tackle these issues, we propose a novel parameter-free graph clustering framework termed unpaired multi-view graph clustering framework with cross-view structure matching (UPMGC-SM). Specifically, unlike existing methods, UPMGC-SM effectively utilizes the structural information from each view to refine cross-view correspondences. Besides, UPMGC-SM is a unified framework for both fully and partially unpaired multi-view graph clustering. Moreover, existing graph clustering methods can adopt UPMGC-SM to enhance their ability to handle unpaired scenarios. Extensive experiments demonstrate the effectiveness and generalization of the proposed framework on both paired and unpaired datasets.
The main goal of Energy Harvesting Wireless Sensor Networks (EH-WSNs) is to increase efficiency in settings where Energy Harvesting (EH) is restricted by environmental resources. To overcome the drawbacks of conventional Wireless Sensor Network (WSN) routing methods, which usually ignore EH, this work presents a multi-hop clustering and renewable energy-based routing protocol designed for EH-WSNs. The proposed method performs both centralized and decentralized clustering based on energy conditions and the amount of harvested energy. The protocol operates in three phases: cluster formation, data transmission, and centralized management. To evaluate the effectiveness of the proposed approach, we analyze three distinct scenarios with different settings. The findings show that the proposed approach greatly lowers the total network energy usage while allowing a higher number of nodes to stay operational. Our approach outperforms AEHAC, CRBS, HUCL, and EADUC in terms of average energy levels, overall efficiency, network stability, and number of live nodes during the simulation. Taken together, the results show that the proposed method consistently improves network efficiency and stability in all assessed scenarios.
In real scenarios, graph-based multiview clustering has become popular owing to its high efficiency in fusing information from multiple views. In practice, multiview graphs offer both consistent and inconsistent cues, as they usually come from heterogeneous sources. Previous methods have illustrated the importance of leveraging multiview consistency and inconsistency for accurate modeling. However, when the graphs are fused, the inconsistent parts are generally ignored, and hence the valuable view-specific attributes are lost. To solve this problem, we propose an accurate complementarity learning (ACL) model for graph-based multiview clustering. ACL explicitly separates the consistent terms, the complementary terms, and the noise and corruption terms in the initial multiview graphs. In contrast to existing models that overlook the complementary information, we argue that the view-specific characteristics extracted from the complementary terms are beneficial for affinity learning. In addition, ACL exploits only the positive parts of the complementary information to preserve the evidence of positive sample relationships, and ignores the negative cues to avoid the vanishing of effective affinity strengths. In this way, the learned affinity matrix properly balances the consistent and complementary information. To solve the ACL model, we introduce an efficient alternating optimization algorithm with a varying penalty parameter. Experiments on synthetic and real-world databases demonstrate the superiority of ACL.
In clustering-based speaker diarization systems, the embedding clusters of distinct speakers exhibit wide variability in size and density, which makes accurate clustering difficult. In spite of this, with the help of the overall distance relationships among speaker embeddings, most of the embeddings can be grouped into the correct cluster by sophisticated offline clustering algorithms. However, in online scenarios, such complete distance relationships cannot be obtained because embeddings arrive incrementally. Consequently, determining the number of clusters and then correctly grouping the embeddings becomes challenging in an online fashion. Furthermore, errors accumulate quickly over time if the online clustering algorithm assigns embeddings to clusters erroneously at the beginning. To address these problems, we design a novel framework for online clustering. To reduce the high variability of speaker embeddings, we propose the clustering guided embedding extractor training (CGEET) algorithm, which encourages the embedding spaces of different speakers to be similar in size in an attempt to simplify the distance relationships among embeddings. The CGEET algorithm can capture the distance information of the entire speaker embedding space and provide it to the online clustering algorithm. With this preliminary information, the distance thresholds guided online clustering (DTGOC) algorithm then processes incoming embeddings using a divide-and-conquer approach. It first handles the embeddings with explicit distance relationships and then searches, in an online fashion, for possible path combinations they have with the remaining embeddings. Moreover, in order to utilize the distance relationships of embeddings that are far apart in time, an online re-clustering strategy is incorporated into the DTGOC algorithm, which can alleviate error accumulation during online clustering. By implementing the above innovations, our proposed online clus
Recently, deep clustering networks, which are able to learn a latent embedding and a clustering assignment simultaneously, have attracted considerable attention. In deep clustering networks, a suitable regularization term not only benefits the training of the neural network but also enhances clustering performance. In this paper, we propose a deep fuzzy clustering network with mixed matrix norm regularization (DFCNR). Specifically, DFCNR uses the weighted intra-class variance as the clustering loss, and the l(1,2) norm and the Frobenius norm of the soft assignment matrix as regularization terms, where minimizing the l(1,2) norm aims to achieve a balanced assignment and maximizing the Frobenius norm aims to achieve a discriminative assignment. Moreover, by solving the convex quadratic constrained optimization problem for the soft assignment, we derive the activation function of the clustering layer. Extensive experiments conducted on several datasets illustrate the superiority of the proposed approach over current methods.
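The roles of the two regularizers can be illustrated on toy soft assignment matrices. The abstract does not define the l(1,2) norm precisely, so this sketch assumes one convention: the l1 mass of each cluster (column sum over samples) followed by the l2 norm across clusters. The function names and matrices are hypothetical, and the network, clustering loss, and derived activation function are not reproduced.

```python
import math

def l12_norm(assign):
    """Assumed convention: l2 norm across clusters of each cluster's total l1 mass."""
    n_clusters = len(assign[0])
    mass = [sum(abs(row[j]) for row in assign) for j in range(n_clusters)]
    return math.sqrt(sum(c * c for c in mass))

def frobenius_norm(assign):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in assign for v in row))

# Rows are samples, columns are clusters; every row sums to one.
balanced_crisp = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
collapsed_crisp = [[1.0, 0.0]] * 4          # every sample in cluster 0
uniform_soft = [[0.5, 0.5]] * 4             # no sample commits to a cluster

l12_balanced = l12_norm(balanced_crisp)     # sqrt(2^2 + 2^2)
l12_collapsed = l12_norm(collapsed_crisp)   # sqrt(4^2 + 0^2)
fro_balanced = frobenius_norm(balanced_crisp)
fro_uniform = frobenius_norm(uniform_soft)
```

Under this convention the collapsed assignment has the larger l(1,2) norm, so minimizing it favours balanced cluster sizes, while the uniform soft assignment has the smaller Frobenius norm, so maximizing it favours confident, discriminative assignments.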