Effective resistance (ER) is a fundamental metric for measuring node similarity in a graph, with applications in domains including graph clustering, recommendation systems, link prediction, and graph neural networks. The state-of-the-art algorithm for computing ER relies on a landmark technique, which selects as the landmark a node that all other nodes can reach easily. The performance of this technique depends heavily on the chosen landmark. However, many real-life graphs have no easily reachable landmark node, which can significantly hinder the algorithm's efficiency. To overcome this problem, we propose a novel multiple-landmarks technique that selects a set of landmark nodes Vl such that every other node in the graph can easily reach at least one landmark in Vl. Specifically, we first derive several new formulas, based on the Schur complement, for computing ER with multiple landmarks. These formulas allow us to pre-compute and maintain several small matrices related to Vl as a compact index. With this index, we show that both single-pair and single-source ER queries can be answered efficiently using a newly developed Vl-absorbed random walk sampling technique or a Vl-absorbed push technique. A comprehensive theoretical analysis shows that all proposed index-based algorithms achieve provable performance guarantees for both single-pair and single-source ER queries. Extensive experiments on 5 real-life datasets demonstrate the high efficiency of our multiple-landmarks index techniques. For instance, with a 1.5 GB index, our algorithms can be up to 4 orders of magnitude faster than the state-of-the-art algorithms while achieving the same accuracy on a large road network.
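Two standard facts underlie this abstract: ER can be written with the Moore-Penrose pseudo-inverse of the graph Laplacian, R(s, t) = (e_s - e_t)^T L^+ (e_s - e_t), and taking the Schur complement of the Laplacian onto a node subset (Kron reduction) preserves ER between the kept nodes. The NumPy sketch below checks both on a toy graph. The function names are hypothetical, and the dense O(n^3) algebra is only for illustration; it is not the paper's index, Vl-absorbed random walk sampling, or push technique.

```python
import numpy as np

def laplacian(n, edges):
    """Weighted graph Laplacian L = D - A from a list of (u, v, weight) edges."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L

def effective_resistance(L, s, t):
    """R(s, t) = (e_s - e_t)^T L^+ (e_s - e_t), with L^+ the pseudo-inverse."""
    b = np.zeros(L.shape[0])
    b[s], b[t] = 1.0, -1.0
    return float(b @ np.linalg.pinv(L) @ b)

def schur_complement(L, keep):
    """Eliminate every node outside `keep`: S = L_KK - L_KU L_UU^{-1} L_UK.

    Assumes a connected graph and a non-empty `keep`, so L_UU is invertible.
    """
    keep = list(keep)
    drop = [i for i in range(L.shape[0]) if i not in keep]
    L_kk = L[np.ix_(keep, keep)]
    if not drop:
        return L_kk
    L_ku = L[np.ix_(keep, drop)]
    L_uu = L[np.ix_(drop, drop)]
    # L is symmetric, so L_UK is the transpose of L_KU.
    return L_kk - L_ku @ np.linalg.solve(L_uu, L_ku.T)

# Triangle 0-1-2 with a pendant node 3 attached to node 2 (unit weights).
L = laplacian(4, [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0), (2, 3, 1.0)])
S = schur_complement(L, [0, 3])  # reduced 2x2 Laplacian on nodes 0 and 3
print(effective_resistance(L, 0, 3), effective_resistance(S, 0, 1))
```

Both printed values are 5/3 up to rounding: 2/3 across the triangle plus 1 along the pendant edge. Note that indices into the reduced matrix S are positions within `keep`, not original node IDs.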
Machine learning techniques for Markov random fields are fundamental in various fields, including pattern recognition, image processing, sparse modeling, and earth science, and the Boltzmann machine is one of the most important models among Markov random fields. However, inference and learning in the Boltzmann machine are NP-hard. Developing an effective learning algorithm for the Boltzmann machine is one of the most important challenges in statistical machine learning. In this paper, we study Boltzmann machine learning based on the (first-order) spatial Monte Carlo integration method, referred to as the 1-SMCI learning method, which was proposed in the author's previous paper. In the first part of this paper, we compare the method with maximum pseudo-likelihood estimation (MPLE) using theoretical and numerical approaches, and show that the 1-SMCI learning method is more effective than MPLE. In the latter part, we compare the 1-SMCI learning method with other effective methods, ratio matching and minimum probability flow, in a numerical experiment, and show that the 1-SMCI learning method outperforms them.
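The baseline the abstract compares against, MPLE, is easy to state for a fully visible Boltzmann machine with spins in {-1, +1}. Each conditional is p(s_i | s_-i) = exp(s_i h_i) / (2 cosh h_i) with local field h_i = b_i + sum_j J_ij s_j, and MPLE maximizes the average sum of their logarithms. The sketch below assumes that model (symmetric couplings, zero diagonal, no hidden units) and plain gradient ascent; the function names and step size are hypothetical, and it does not implement the 1-SMCI estimator, which the abstract does not spell out.

```python
import numpy as np

def local_fields(S, J, b):
    """h[m, i] = b_i + sum_j J_ij s_j for each sample m (J symmetric, zero diagonal)."""
    return S @ J + b

def pseudo_log_likelihood(S, J, b):
    """Mean over samples of sum_i log p(s_i | s_-i), spins in {-1, +1}."""
    H = local_fields(S, J, b)
    return float(np.mean(np.sum(S * H - np.log(2.0 * np.cosh(H)), axis=1)))

def mple_gradient(S, J, b):
    """Gradient of the pseudo-log-likelihood with respect to (J, b)."""
    H = local_fields(S, J, b)
    R = S - np.tanh(H)             # s_i minus its conditional mean tanh(h_i)
    G = R.T @ S / S.shape[0]       # G[i, j] = mean of (s_i - tanh h_i) * s_j
    grad_J = G + G.T               # a tied coupling J_ij enters both h_i and h_j
    np.fill_diagonal(grad_J, 0.0)  # no self-couplings
    return grad_J, R.mean(axis=0)

def fit_mple(S, lr=0.05, steps=200):
    """Plain gradient ascent on the (concave) pseudo-log-likelihood."""
    n = S.shape[1]
    J, b = np.zeros((n, n)), np.zeros(n)
    for _ in range(steps):
        gJ, gb = mple_gradient(S, J, b)
        J += lr * gJ
        b += lr * gb
    return J, b
```

Because each conditional only needs the local field, MPLE avoids the intractable partition function; that is what makes it a natural point of comparison for other tractable learning methods.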
In this paper, we study the problem of optimal iterative learning control for general nonlinear discrete-time systems. Based on sufficient conditions for the existence of optimal iterative learning control in such systems, and with practical applications in view, we propose an approximate iterative algorithm and prove that the approximate iterative control converges to the optimal control.
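The abstract does not give the approximate iterative algorithm itself. For readers new to iterative learning control (ILC), the sketch below shows the basic mechanism on a hypothetical scalar nonlinear discrete-time plant, x(t+1) = 0.5 sin(x(t)) + u(t) with output y = x: the same finite-horizon task is repeated, and after each trial the input is corrected with the previous trial's tracking error through a P-type update with learning gain gamma. The plant, the gain, the reference signal, and all names are assumptions for illustration; this is a generic ILC example, not the paper's optimal ILC algorithm.

```python
import math

def simulate(u, x0):
    """Hypothetical plant: x(t+1) = 0.5*sin(x(t)) + u(t), output y(t) = x(t)."""
    xs = [x0]
    for ut in u:
        xs.append(0.5 * math.sin(xs[-1]) + ut)
    return xs  # xs[0] is the initial state, reset identically on every trial

def p_type_ilc(y_ref, gamma=0.8, trials=200):
    """P-type ILC update: u_{k+1}(t) = u_k(t) + gamma * e_k(t+1)."""
    horizon = len(y_ref) - 1
    u = [0.0] * horizon
    max_errors = []
    for _ in range(trials):
        y = simulate(u, x0=y_ref[0])
        e = [y_ref[t] - y[t] for t in range(horizon + 1)]
        max_errors.append(max(abs(v) for v in e[1:]))
        # Learn from this trial: shift the error by one step (relative degree one).
        u = [u[t] + gamma * e[t + 1] for t in range(horizon)]
    return u, max_errors

y_ref = [math.sin(0.3 * t) for t in range(21)]
u, max_errors = p_type_ilc(y_ref)
print(max_errors[0], max_errors[-1])
```

For this plant the input enters with gain 1, so with an identical initial state on every trial and |1 - gamma| < 1, the error at t = 1 contracts by a factor of 0.2 per trial, and the errors at later time steps follow once the earlier states have converged.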
Traditional methods for solving multi-class problems, known as multi-SVMs, combine the results of several decomposed binary SVMs to formulate the final decision function. The prevalent methods are 'one vs. one' and 'one vs. all', which use a voting scheme among the binary classifiers to derive the winning class. However, they do not scale well with the data size and the number of classes. Core Vector Machine (CVM) is a promising technique for scaling up a binary SVM to handle large data sets with a greedy-expansion strategy, in which the kernels must be normalized to ensure the equivalence between the kernel-induced spaces of SVM and the Minimum Enclosing Ball (MEB). The idea behind CVM can also be used to formulate multi-SVM as an MEB problem; building on this, we propose an approximate MEB algorithm with smaller core sets to handle multi-SVM. Experimental results on synthetic and benchmark data sets demonstrate that the proposed method is competitive in both training time and training accuracy.
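The 'one vs. one' voting scheme named in this abstract can be sketched in a few lines. The binary classifiers here are hypothetical stand-ins (each picks the nearer of two 1-D class centers) rather than trained SVMs; the point is only how pairwise decisions are turned into a winning class.

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(x, classes, pairwise):
    """Majority vote over all k(k-1)/2 binary classifiers.

    pairwise[(a, b)] is a callable that returns either a or b for input x.
    Ties are broken in favour of the class listed first in `classes`.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](x)] += 1
    return max(classes, key=lambda c: votes[c])

def nearest_center_pairwise(centers):
    """Hypothetical stand-in for trained binary SVMs: choose the closer class center."""
    def make(a, b):
        return lambda x: a if abs(x - centers[a]) <= abs(x - centers[b]) else b
    return {(a, b): make(a, b) for a, b in combinations(sorted(centers), 2)}

centers = {0: 0.0, 1: 5.0, 2: 10.0}
pairwise = nearest_center_pairwise(centers)
print(one_vs_one_predict(4.0, [0, 1, 2], pairwise))  # 1
```

With k classes, one-vs-one trains and evaluates k(k-1)/2 binary classifiers, which is one reason this family of methods scales poorly with the number of classes.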
ISBN (print): 9781728125848
Recently, algorithms for counting local topological structures, such as triangles, have been widely used in social network analysis, recommendation systems, user profiling, and other fields. One-pass streaming algorithms for counting global and local triangles have been widely studied, but most research focuses on single-machine streaming algorithms in an 'offline + batch processing' mode. Research on distributed online algorithms running on multiple machines, by contrast, is still in its infancy and has not been thoroughly studied. In this paper, we investigate the triangle counting problem in large-scale simple undirected graphs whose edges arrive as a stream. We propose two distributed online streaming algorithms to estimate the global number of triangles, both based on the current best-performing sampling-based streaming algorithm. The core of our approach is a reasonable partitioning of the graph stream, so that each worker independently estimates the number of triangles in a subgraph of the stream. Experimental results show that our algorithms reduce the estimation error and are several times more accurate than state-of-the-art streaming algorithms.
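As a point of reference for the sampling-based streaming estimators this abstract builds on, here is a minimal single-machine sketch of one simple scheme: each arriving edge is kept with probability p, and before deciding, the edge counts the triangles it closes in the sampled graph, each weighted by 1/p^2. A triangle is counted exactly when its last edge arrives and its two earlier edges were both kept, which happens with probability p^2, so the estimate is unbiased. This is not the paper's distributed algorithm or its stream-partitioning scheme, and the function name is hypothetical. It assumes a simple undirected graph in which each edge arrives once.

```python
import random
from collections import defaultdict

def stream_triangle_estimate(edge_stream, p, seed=0):
    """One-pass unbiased estimate of the global triangle count.

    Assumes a simple undirected graph: no duplicate edges in the stream.
    """
    rng = random.Random(seed)
    adj = defaultdict(set)  # adjacency of the sampled subgraph only
    estimate = 0.0
    for u, v in edge_stream:
        if u == v:
            continue  # ignore self-loops
        # Triangles closed by (u, v) whose other two edges were both sampled.
        closed = len(adj[u] & adj[v])
        estimate += closed / (p * p)
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return estimate

k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(stream_triangle_estimate(k4, p=1.0))  # 4.0, exact when every edge is kept
```

Memory is proportional to the number of sampled edges, roughly p times the stream length, which is the knob such estimators trade against variance.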
ISBN (print): 9780982657157
k- and t-optimality algorithms [9, 6] provide solutions to distributed constraint optimization problems (DCOPs) that are optimal in regions characterised by their size and distance, respectively. Moreover, they provide quality guarantees on their solutions. Here we generalise the k- and t-optimal framework to introduce C-optimality, a flexible framework that provides reward-independent quality guarantees for optima in regions characterised by any arbitrary criterion. C-optimality therefore allows us to explore the space of criteria (beyond size and distance) in search of those that lead to better solution quality. We exploit this larger space of criteria to propose a new criterion, the so-called size-bounded-distance criterion, which outperforms k- and t-optimality.
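The idea of optimality relative to a region criterion can be made concrete with a brute-force check on a tiny DCOP: an assignment is C-optimal if no region admitted by the criterion can jointly change its values and strictly increase the total reward. With the criterion "at most k agents" this reduces to k-optimality. The sketch below is an illustration under that reading only: it enumerates regions exhaustively, assumes binary constraints, uses hypothetical names, and does not compute the paper's reward-independent quality guarantees.

```python
from itertools import combinations, product

def total_reward(assignment, constraints):
    """Sum of binary constraint rewards; constraints maps (i, j) -> f(a_i, a_j)."""
    return sum(f(assignment[i], assignment[j]) for (i, j), f in constraints.items())

def is_region_optimal(assignment, domains, constraints, region):
    """True if no joint change of the agents in `region` strictly increases reward."""
    base = total_reward(assignment, constraints)
    for values in product(*(domains[i] for i in region)):
        trial = dict(assignment)
        trial.update(zip(region, values))
        if total_reward(trial, constraints) > base:
            return False
    return True

def is_c_optimal(assignment, domains, constraints, criterion):
    """True if the assignment is optimal in every region accepted by `criterion`."""
    agents = sorted(domains)
    for size in range(1, len(agents) + 1):
        for region in combinations(agents, size):
            if criterion(region) and not is_region_optimal(assignment, domains, constraints, region):
                return False
    return True

def size_criterion(k):
    """k-optimality as a special case: every region of at most k agents."""
    return lambda region: len(region) <= k

def reward(a, b):
    """Coordinating on 1 pays 3, coordinating on 0 pays 1, mismatches pay 0."""
    return {(1, 1): 3, (0, 0): 1}.get((a, b), 0)

# Three agents in a chain 0-1-2 with binary domains.
constraints = {(0, 1): reward, (1, 2): reward}
domains = {0: [0, 1], 1: [0, 1], 2: [0, 1]}
all_zero = {0: 0, 1: 0, 2: 0}
print(is_c_optimal(all_zero, domains, constraints, size_criterion(1)))  # True
print(is_c_optimal(all_zero, domains, constraints, size_criterion(2)))  # False
```

Any predicate over regions can be passed as `criterion`; a distance-based one would need the constraint graph's hop distances, which this sketch leaves out.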
Similarity search in high-dimensional metric spaces is extremely time-consuming. Approximate algorithms have been suggested as a viable way to overcome the high-dimensional indexing problem, and in medium- and high-dimensional metric spaces the permutation index is one of the most effective of them. However, choosing proper permutants remains a challenging problem: the authors of the permutation index method only provide a random selection solution, and there is still no good way to determine a proper value for the parameter k, the number of permutants. In this paper, we address these two problems in turn. First, we propose a permutant selection method that improves indexing accuracy. Second, we provide an efficient approach to determining the number of permutants in a permutation index, which improves search performance without reducing accuracy. We also present empirical evidence that supports our techniques.
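For orientation, the sketch below shows the baseline permutation index this abstract starts from, with randomly selected permutants: each object is represented by the order in which it sees the k permutants (closest first), the query's permutation is compared with every stored permutation using the Spearman footrule, and only the top candidates are checked with the real distance. Euclidean distance, the class and function names, and the `candidates` parameter are assumptions for illustration; the paper's permutant selection method and its way of choosing k are not reproduced.

```python
import math
import random

def permutation(obj, permutants):
    """Indices of the permutants ordered by distance to obj, closest first."""
    return sorted(range(len(permutants)), key=lambda i: math.dist(obj, permutants[i]))

def footrule(p, q):
    """Spearman footrule: total displacement of each permutant between p and q."""
    pos_p = {v: r for r, v in enumerate(p)}
    pos_q = {v: r for r, v in enumerate(q)}
    return sum(abs(pos_p[v] - pos_q[v]) for v in pos_p)

class PermutationIndex:
    def __init__(self, data, k, seed=0):
        rng = random.Random(seed)
        self.data = data
        self.permutants = rng.sample(data, k)  # random selection baseline
        self.perms = [permutation(x, self.permutants) for x in data]

    def search(self, query, candidates=10):
        """Approximate 1-NN: rank by footrule, then verify a few with real distances."""
        qp = permutation(query, self.permutants)
        order = sorted(range(len(self.data)), key=lambda i: footrule(qp, self.perms[i]))
        best = min(order[:candidates], key=lambda i: math.dist(query, self.data[i]))
        return best, math.dist(query, self.data[best])
```

Both knobs the abstract discusses appear here: which objects become permutants (`rng.sample`) and how many there are (`k`). Larger k makes permutations more discriminative but costs more distance computations per query.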
ISBN (print): 9781424451814; 9781424451821
Traditional methods for solving multi-class problems, known as multi-SVMs, combine the results of several decomposed binary SVMs to formulate the final decision function. The prevalent methods are 'one vs. one' and 'one vs. all', which use a voting scheme among the binary classifiers to derive the winning class. However, they do not scale well with the data size and the number of classes. Core Vector Machine (CVM) is a promising technique for scaling up a binary SVM to handle large data sets with a greedy-expansion strategy, in which the kernels must be normalized to ensure the equivalence between the kernel-induced spaces of SVM and the Minimum Enclosing Ball (MEB). The idea behind CVM can also be used to formulate multi-SVM as an MEB problem; building on this, we propose an approximate MEB algorithm with smaller core sets to handle multi-SVM. Experimental results on synthetic and benchmark data sets demonstrate that the proposed method is competitive in both training time and training accuracy.
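CVM solves the MEB problem approximately with core sets. A classic core-set algorithm for a (1 + eps)-approximate MEB is the Badoiu-Clarkson iteration: repeatedly find the point farthest from the current center and move the center toward it by a step of 1/(i + 1). After ceil(1/eps^2) iterations the radius is at most (1 + eps) times optimal, and the farthest points visited form the core set. The sketch below runs this iteration in plain Euclidean space. It is not the authors' multi-class algorithm: there is no kernel-induced feature space and no multi-SVM formulation, and the function name is hypothetical.

```python
import math

def approx_meb(points, eps=0.05):
    """Badoiu-Clarkson (1 + eps)-approximate minimum enclosing ball.

    Returns (center, radius, core_set_indices); every point lies within
    `radius` of `center`.
    """
    c = list(points[0])
    core = {0}
    for i in range(1, math.ceil(1 / eps**2) + 1):
        far = max(range(len(points)), key=lambda j: math.dist(c, points[j]))
        core.add(far)
        # Step a fraction 1/(i + 1) of the way toward the farthest point.
        c = [ci + (pi - ci) / (i + 1) for ci, pi in zip(c, points[far])]
    radius = max(math.dist(c, p) for p in points)
    return c, radius, core

square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
center, radius, core = approx_meb(square, eps=0.05)
print(center, radius, sorted(core))
```

The number of iterations, and therefore the core-set size, depends only on eps and not on the number of points, which is the property CVM relies on to scale to large data sets.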
This paper discusses the computation of matrix chain products of the form $M_1 \times M_2 \times \cdots \times M_n$, where the $M_i$ are matrices. The order in which the matrices are computed affects the number of ...
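Because this abstract is cut off, the paper's own method is unknown. For context, the textbook dynamic program for choosing a multiplication order is sketched below: matrix M_i has shape dims[i-1] x dims[i] (1-based), cost[i][j] is the minimum number of scalar multiplications for the sub-chain M_{i+1} ... M_{j+1} (0-based indices in the code), and the best split point is recorded to rebuild the parenthesization. It runs in O(n^3) time. The function names are hypothetical.

```python
def matrix_chain_order(dims):
    """Classic O(n^3) DP; matrix i (0-based) has shape dims[i] x dims[i + 1].

    Requires at least one matrix (len(dims) >= 2).
    """
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]
    split = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):          # length of the sub-chain
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = float("inf")
            for k in range(i, j):           # split between matrix k and k + 1
                c = cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                if c < cost[i][j]:
                    cost[i][j], split[i][j] = c, k
    return cost[0][n - 1], split

def parenthesize(split, i, j):
    """Rebuild the optimal order for matrices i..j (0-based) as a string."""
    if i == j:
        return f"M{i + 1}"
    k = split[i][j]
    return f"({parenthesize(split, i, k)} × {parenthesize(split, k + 1, j)})"

cost, split = matrix_chain_order([10, 30, 5, 60])
print(cost, parenthesize(split, 0, 2))  # 4500 ((M1 × M2) × M3)
```

In this example, computing (M1 × M2) first costs 10·30·5 + 10·5·60 = 4500 scalar multiplications, while M1 × (M2 × M3) costs 30·5·60 + 10·30·60 = 27000, which shows how strongly the order matters.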
Background: The discovery of patterns in DNA, RNA, and protein sequences has led to the solution of many vital biological problems. For instance, the identification of patterns in nucleic acid sequences has resulted in the determination of open reading frames, identification of promoter elements of genes, identification of intron/exon splicing sites, identification of SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have proven extremely helpful in domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, etc. Motifs are important patterns that are helpful in finding transcriptional regulatory elements and transcription factor binding sites, and in functional genomics, drug design, etc. As a result, numerous papers have been written to solve the motif search problem.
Results: Three versions of the motif search problem have been proposed in the literature: Simple Motif Search (SMS), (l, d)-motif search (or Planted Motif Search (PMS)), and Edit-distance-based Motif Search (EMS). In this paper we focus on PMS. Two kinds of algorithms can be found in the literature for solving the PMS problem: exact and approximate. An exact algorithm always identifies the motifs, whereas an approximate algorithm may fail to identify some or all of them. The exact version of the PMS problem has been shown to be NP-hard, and the exact algorithms proposed in the literature for PMS take time that is exponential in some of the underlying parameters. In this paper we propose a generic technique that can be used to speed up PMS algorithms.
Conclusions: We present a speedup technique that can be used with any PMS algorithm. We have tested our speedup technique on a number of algorithms, and the experimental results show that it is indeed very effective. The implemen
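To make the (l, d)-motif (PMS) problem concrete: given n sequences, find every string of length l that occurs in each sequence with at most d mismatches (Hamming distance). The naive exact algorithm below enumerates all |alphabet|^l candidates, which illustrates why exact PMS algorithms are exponential in l. It is a baseline for tiny inputs only, not the speedup technique proposed in the paper, and the function names are hypothetical.

```python
from itertools import product

def hamming(a, b):
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def occurs_within(motif, seq, d):
    """True if some length-l window of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))

def planted_motif_search(seqs, l, d, alphabet="ACGT"):
    """Exact brute force: test all |alphabet|**l candidate motifs."""
    motifs = []
    for letters in product(alphabet, repeat=l):
        m = "".join(letters)
        if all(occurs_within(m, s, d) for s in seqs):
            motifs.append(m)
    return motifs

print(planted_motif_search(["ACGTT", "TACGA", "GGACG"], l=3, d=0))  # ['ACG']
```

With d = 0 the search reduces to finding l-mers common to all sequences; each increase in d enlarges the set of candidates that qualify, and the |alphabet|^l enumeration dominates the running time as l grows.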