Massive electrical load exhibits many patterns, making it difficult for forecast algorithms to generalise well. Most learning algorithms produce better forecasts for the dominant patterns of weekday consumption than for the less dominant patterns of weekend and holiday consumption. In view of this, there is a need to cluster the load patterns so that learning algorithms can focus on each pattern independently and produce more accurate forecasts in all cases. However, clustering time-series data breaks the time-series dependency, making model training difficult. This paper presents a novel sequence-to-sequence cluster framework to reform the time-series dependency after clustering; this enables independent clusters to be modelled using a Convolutional Neural Network-Gated Recurrent Unit, which learns spatiotemporal features for future forecasts. A real-world dataset by the Korea Power Exchange, composed of nationwide consumption, is used for case studies and experiments. Experimental results verify that the proposed method improves the accuracy of electric load forecasting by about 50%, with a WAPE of 0.67%. The proposed method also speeds up the training process of the forecast algorithm by about 35%, given that only a subset of the dataset is trained on due to clustering. Korea Water Resources Corporation has implemented the proposed method for load forecasting and system marginal price estimation.
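The abstract above reports accuracy as WAPE but does not define it. As a minimal sketch, assuming the commonly used definition (sum of absolute errors divided by the sum of absolute actual values, expressed in percent), the metric could be computed as follows. The function name `wape` and the sample numbers are illustrative and not taken from the paper.

```python
import numpy as np


def wape(actual, forecast):
    """Weighted absolute percentage error, in percent.

    Assumed definition: 100 * sum(|actual - forecast|) / sum(|actual|).
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    if actual.shape != forecast.shape:
        raise ValueError("actual and forecast must have the same shape")
    denom = np.abs(actual).sum()
    if denom == 0:
        raise ValueError("WAPE is undefined when all actual values are zero")
    return 100.0 * np.abs(actual - forecast).sum() / denom


# Illustrative hourly loads (made-up values, not from the dataset).
load = [100.0, 200.0, 300.0]
pred = [110.0, 190.0, 300.0]
print(f"WAPE = {wape(load, pred):.2f}%")
```

Unlike a per-point percentage error, this form weights each error by the size of the load, so low-consumption hours do not dominate the score.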
Graph clustering is one of the most significant, challenging, and valuable topics in the analysis of real complex networks. To detect the cluster configuration accurately and efficiently, we propose a new Markov clustering algorithm based on the limit state of the belief dynamics model. First, we present a new belief dynamics model, which focuses on the beliefs of multicontent and randomly broadcast information. A strict proof is provided for the convergence of nodes' normalized beliefs in complex networks. Second, we introduce a new Markov clustering algorithm (denoted as BMCL) by employing a belief dynamics model, which guarantees the ideal cluster configuration. Following the trajectory of the belief convergence, each node is mapped into the corresponding cluster repeatedly. The proposed BMCL algorithm is highly efficient: its convergence speed reaches O(TN) in sparse networks. Last, we conduct several experiments to evaluate the performance of the proposed methods.
The Multi-Depot Capacitated Arc Routing Problem (MDCARP) is an important combinatorial optimization problem with wide applications in logistics. Large Scale MDCARP (LSMDCARP) often occurs in the real world, as the problem size (e.g., number of edges/tasks) is usually very large in practice. It is challenging to solve LSMDCARP due to the large search space and complex interactions among the depots and the tasks. Divide-and-conquer strategies have shown success in solving large-scale problems by decomposing the problem into smaller sub-problems to be solved separately. However, it is challenging to find an accurate decomposition for LSMDCARP. To address this issue and alleviate the negative effect of inaccurate problem decomposition, this article proposes a new divide-and-conquer strategy for solving LSMDCARP, which introduces a new restricted global optimization stage within the typical dynamic decomposition procedure. Based on the new divide-and-conquer strategy, this article develops a problem-specific Task Moving among Sub-problems (TMaS) process for the global optimization stage and incorporates it into the state-of-the-art RoCaSH algorithm for LSMDCARP. The resultant algorithm, namely RoCaSH2, was compared with the state-of-the-art algorithms on a wide range of LSMDCARP instances, and the results showed that RoCaSH2 can achieve significantly better results than the state-of-the-art algorithms within a much shorter time.
Multiple kernel clustering (MKC) has recently achieved remarkable progress in fusing multisource information to boost clustering performance. However, the $O(n^2)$ memory consumption and $O(n^3)$ computational complexity prohibit these methods from being applied to medium- or large-scale applications, where $n$ denotes the number of samples. To address these issues, we carefully redesign the formulation of subspace segmentation-based MKC, which reduces the memory and computational complexity to $O(n)$ and $O(n^2)$, respectively. The proposed algorithm adopts a novel sampling strategy to enhance the performance and accelerate the speed of MKC. Specifically, we first mathematically model the sampling process and then learn it simultaneously during the procedure of information fusion. In this way, the generated anchor point set can better serve data reconstruction across different views, leading to improved discriminative capability of the reconstruction matrix and boosted clustering performance. Although the integrated sampling process makes the proposed algorithm less efficient than the linear complexity algorithms, the elaborate formulation makes our algorithm straightforward to parallelize. Through the acceleration of GPU and multicore techniques, our algorithm achieves superior performance against the compared state-of-the-art methods on six datasets, with comparable time cost to the linear complexity algorithms.
Efficiently identifying the most important communities and key transition nodes in weighted and unweighted networks is a prevalent problem in a wide range of disciplines. Here, we focus on the optimal clustering using variational kinetic parameters, linked to Markov processes defined on the underlying networks, namely, the slowest relaxation time and the Kemeny constant. We derive novel relations in terms of mean first passage times for optimizing clustering via the Kemeny constant and show that the optimal clustering boundaries have equal round-trip times to the clusters they separate. We also propose an efficient method that first projects the network nodes onto a 1D reaction coordinate and subsequently performs a variational boundary search using a parallel tempering algorithm, where the variational kinetic parameters act as an energy function to be extremized. We find that maximization of the Kemeny constant is effective in detecting communities, while the slowest relaxation time allows for detection of transition nodes. We demonstrate the validity of our method on several test systems, including synthetic networks generated from the stochastic block model and real world networks (Santa Fe Institute collaboration network, a network of co-purchased political books, and a street network of multiple cities in Luxembourg). Our approach is compared with existing clustering algorithms based on modularity and the robust Perron cluster analysis, and the identified transition nodes are compared with different notions of node centrality.
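For readers unfamiliar with the Kemeny constant mentioned above, the following is a minimal sketch of how it can be computed for a plain random walk on a small network. It assumes a discrete-time walk with transition matrix P = D^-1 A and the convention K = sum over i >= 2 of 1 / (1 - lambda_i), where lambda_i are the eigenvalues of P other than 1 (some authors add 1 to this value). The helper names are hypothetical, and this is not the variational clustering procedure of the paper, only the underlying quantity.

```python
import numpy as np


def random_walk_matrix(adjacency):
    """Row-stochastic transition matrix P = D^-1 A of a random walk."""
    a = np.asarray(adjacency, dtype=float)
    degree = a.sum(axis=1)
    if np.any(degree == 0):
        raise ValueError("every node needs at least one edge")
    return a / degree[:, None]


def kemeny_constant(transition):
    """Kemeny constant from the spectrum of an ergodic transition matrix.

    Assumed convention: sum of 1 / (1 - lambda) over all eigenvalues
    except the stationary eigenvalue lambda = 1.
    """
    eigenvalues = np.linalg.eigvals(np.asarray(transition, dtype=float))
    # Drop the single eigenvalue closest to 1 (the stationary mode).
    stationary = np.argmin(np.abs(eigenvalues - 1.0))
    rest = np.delete(eigenvalues, stationary)
    return float(np.real(np.sum(1.0 / (1.0 - rest))))


# Triangle graph: every node is connected to the other two.
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]
print(kemeny_constant(random_walk_matrix(triangle)))
```

Under this convention the value equals the stationary-weighted mean first passage time from any starting node, which is why it is expressible in terms of mean first passage times as the abstract describes.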
Advances in single-cell biotechnologies have enabled single-cell RNA sequencing (scRNA-seq) of gene expression profiles at the cell level, providing an opportunity to study cellular distribution. Although significant effort has been devoted to its analysis, many problems remain in studying cell-type distribution because of the heterogeneity, high dimensionality, and noise of scRNA-seq data. In this study, a multi-view clustering with graph learning algorithm (MCGL) for scRNA-seq data is proposed, which consists of multi-view learning, graph learning, and cell-type clustering. Because a single feature space may be inadequate to characterize the functions of cells comprehensively, MCGL constructs multiple feature spaces and utilizes multi-view learning to characterize scRNA-seq data from different perspectives. MCGL adaptively learns the similarity graphs of cells, which overcomes the dependence on a fixed similarity measure and transforms scRNA-seq analysis into a multi-view clustering problem. MCGL decomposes the networks of cells into view-specific and common networks in multi-view learning, which better characterizes the topological relationship of cells. MCGL simultaneously utilizes multiple types of cell-cell networks and fully exploits the connection relationship between cells through the complementarity between networks to improve clustering performance. The graph learning, graph factorization, and cell-type clustering processes are accomplished simultaneously under one optimization framework. The performance of the MCGL algorithm is validated on ten scRNA-seq datasets of different scales, and experimental results indicate that the proposed algorithm significantly outperforms fourteen state-of-the-art scRNA-seq algorithms.
To protect data privacy, users prefer to store encrypted data in cloud servers. Cloud servers reduce the cost of storage and network bandwidth by eliminating duplicate copies. To address the potential internal data leakage problem, the concept of clustering deviation is proposed for the first time. We improve the DBSCAN algorithm to tolerate clustering deviation. A data deduplication scheme is built upon the new algorithm, which considers users as clustering samples. Instead of immediately re-clustering new users, a certain deviation is tolerated to assign the users to the existing classes. We determine the popularity of the data according to user clustering results and apply different encryption schemes to protect the security of unpopular data more effectively. The performance of the algorithm is analyzed and compared with other methods through experiments, and the results verify the feasibility and efficiency of the proposed deduplication scheme.
Discovering clusters of different sizes, shapes, and densities is a challenging task. DBSCAN can find clusters of different shapes and sizes, but it has trouble finding clusters of different densities because it depends on a global value for its parameter Eps. Several methods have been proposed to tackle this problem, but each has its drawbacks. This paper introduces a new stand-alone method to discover clusters of different densities. The proposed method uses the k-nearest neighbors to compute the local density of each object as the sum of distances to its k1-nearest neighbors, where 0 < k1 < k. Clustering starts from any object, which is called a cluster initiator. Any object that is reachable from a cluster initiator and has a local density similar to that of the cluster initiator is assigned to the same cluster. The method therefore requires a similarity threshold, called SR (Similarity Ratio). The proposed method discovers clusters of different densities, shapes, and sizes. The experimental results show the superior ability of the proposed method to detect clusters of different densities, even with no discernible separations between them.
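The density rule described above can be sketched in a few lines. The local density (sum of distances to the k1 nearest neighbors, with 0 < k1 < k) follows the abstract directly. Two parts are assumptions made only for illustration, since the abstract does not specify them: "reachable" is taken to mean reachable through chains of k-nearest-neighbor links, and two densities count as "similar" when the ratio of the smaller to the larger is at least SR. The function names are hypothetical.

```python
from collections import deque

import numpy as np


def knn(points, k):
    """Indices and distances of the k nearest neighbors of each point."""
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    idx = np.argsort(dist, axis=1)[:, :k]
    return idx, np.take_along_axis(dist, idx, axis=1)


def grow_clusters(points, k, k1, sr):
    """Grow clusters from initiators using a density-similarity test."""
    points = np.asarray(points, dtype=float)
    if not 0 < k1 < k < len(points):
        raise ValueError("need 0 < k1 < k < number of points")
    idx, dist = knn(points, k)
    # Local density as stated in the abstract: sum of distances to the
    # k1 nearest neighbors (a smaller value means a denser region).
    density = dist[:, :k1].sum(axis=1)
    labels = np.full(len(points), -1)
    cluster = 0
    for initiator in range(len(points)):
        if labels[initiator] != -1:
            continue
        labels[initiator] = cluster
        queue = deque([initiator])
        while queue:
            p = queue.popleft()
            for q in idx[p]:  # assumption: reachability via k-NN links
                if labels[q] != -1:
                    continue
                lo = min(density[initiator], density[q])
                hi = max(density[initiator], density[q])
                ratio = 1.0 if hi == 0 else lo / hi  # assumed similarity
                if ratio >= sr:
                    labels[q] = cluster
                    queue.append(q)
        cluster += 1
    return labels, density


# A tight group and a loose group (made-up coordinates).
data = [[0, 0], [0.1, 0], [0.2, 0], [0.3, 0],
        [100, 0], [105, 0], [110, 0], [115, 0]]
labels, density = grow_clusters(data, k=3, k1=2, sr=0.5)
print(labels, density)
```

Because each candidate is compared against the initiator's density rather than a global Eps, groups with very different spacing can each form their own cluster; the all-pairs distance matrix used here is only suitable for small inputs.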
Band selection (BS) effectively reduces the spectral dimension of a hyperspectral image (HSI) by selecting relatively few representative bands, which allows efficient processing in subsequent tasks. Existing unsupervised BS methods based on subspace clustering are built on matrix-based models, where each band is reshaped as a vector. They encode the correlation of data only in the spectral mode (dimension) and neglect strong correlations between different modes, i.e., the spatial modes and the spectral mode. Another issue is that the subspace representation of bands is performed in the raw data space, where the dimension is often excessively high, resulting in less efficient and less robust performance. To address these issues, in this article, we propose a tensor-based subspace clustering model for hyperspectral BS. Our model is developed on the well-known Tucker decomposition. The three factor matrices and a core tensor in our model jointly encode the multimode correlations of HSI, effectively avoiding destruction of the tensor structure and loss of information. In addition, we propose well-motivated heterogeneous regularizations (HRs) on the factor matrices by taking into account the important local and global properties of HSI along three dimensions, which facilitate the learning of the intrinsic cluster structure of bands in the low-dimensional subspaces. Instead of learning the correlations of bands in the original domain, as is common for matrix-based models, our model naturally learns the band correlations in a low-dimensional latent feature space, which is derived from the projections of the two factor matrices associated with the spatial dimensions, leading to a computationally efficient model. More importantly, the latent feature space is learned in a unified framework. We also develop an efficient algorithm to solve the resulting model. Experimental results on benchmark datasets demonstrate that our model yields improved performance compared to the state-of-the-art.
In multi-view clustering, an eigen-decomposition of the Laplacian matrix of the graph is usually necessary. This leads to a significant increase in time cost and also requires post-processing such as $k$-means. In addition, some methods require learning a uniform graph matrix. On large-scale data, this process significantly increases time and memory costs. To address these problems, this paper proposes Fast Multi-view Clustering (FMvC). First, non-negative constraints are added to the objective function from the unified view of relaxed normalized and ratio cuts. Then, graph reconstruction is performed on the similarity matrix using an indicator matrix to ensure that the obtained graph has robust intra-cluster and weak inter-cluster connectivity. In addition, the speed of the method can be further improved by setting a common labeling matrix. Finally, the problem is solved using the alternating direction method of multipliers. Experimental results on eight real-world datasets demonstrate the effectiveness of the proposed algorithm, which consistently outperforms eleven existing baseline algorithms.