We provide evidence that nonlinear dimensionality reduction, clustering, and data set parameterization can be solved within one and the same framework. The main idea is to define a system of coordinates with an explicit metric that reflects the connectivity of a given data set and that is robust to noise. Our construction, which is based on a Markov random walk on the data, offers a general scheme for simultaneously reorganizing and subsampling graphs and arbitrarily shaped data sets in high dimensions using intrinsic geometry. We show that clustering in embedding spaces is equivalent to compressing operators. The objective of data partitioning and clustering is to coarse-grain the random walk on the data while at the same time preserving a diffusion operator for the intrinsic geometry or connectivity of the data set up to some accuracy. We show that the quantization distortion in diffusion space bounds the error of compression of the operator, thus giving a rigorous justification for k-means clustering in diffusion space and a precise measure of the performance of general clustering algorithms.
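The random-walk construction described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the bandwidth `eps`, the walk length `t`, and the function names are assumptions chosen for illustration. The sketch builds a row-stochastic Markov matrix from pairwise kernel weights and compares the t-step transition rows of two points, weighted by the inverse stationary distribution, which is the usual form of a diffusion distance.

```python
import math

def matmul(X, Y):
    """Multiply two dense matrices stored as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]

def diffusion_distances(points, eps=1.0, t=3):
    """Pairwise diffusion distances after t steps of a random walk on the data.

    Assumptions (not from the abstract): Gaussian kernel with bandwidth eps,
    row normalisation of the kernel, and a fixed integer walk length t >= 1.
    """
    n = len(points)
    # Symmetric kernel weights between every pair of points.
    W = [[math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / eps)
          for q in points] for p in points]
    deg = [sum(row) for row in W]
    # Row-stochastic transition matrix of the random walk.
    P = [[w / d for w in row] for row, d in zip(W, deg)]
    # Stationary distribution of the walk (proportional to the degrees).
    total = sum(deg)
    pi = [d / total for d in deg]
    # t-step transition probabilities.
    Pt = P
    for _ in range(t - 1):
        Pt = matmul(Pt, P)
    # Weighted L2 distance between the t-step rows of each pair of points.
    return [[math.sqrt(sum((Pt[i][z] - Pt[j][z]) ** 2 / pi[z] for z in range(n)))
             for j in range(n)] for i in range(n)]

# Two well-separated groups of points on a line.
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
D = diffusion_distances(points)
```

In this toy data set, points within the same group end up much closer in diffusion distance than points in different groups, which is the property that makes clustering in diffusion space meaningful.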
We introduce a concept of similarity between vertices of directed graphs. Let G(A) and G(B) be two directed graphs with, respectively, n(A) and n(B) vertices. We define an n(B) x n(A) similarity matrix S whose real entry s(ij) expresses how similar vertex j (in G(A)) is to vertex i (in G(B)): we say that s(ij) is their similarity score. The similarity matrix can be obtained as the limit of the normalized even iterates of S(k+1) = B S(k) A(T) + B(T) S(k) A, where A and B are the adjacency matrices of the graphs and S(0) is a matrix whose entries are all equal to 1. In the special case where G(A) = G(B) = G, the matrix S is square and the score s(ij) is the similarity score between the vertices i and j of G. We point out that Kleinberg's "hub and authority" method to identify web pages relevant to a given query can be viewed as a special case of our definition in the case where one of the graphs has two vertices and a unique directed edge between them. In analogy to Kleinberg, we show that our similarity scores are given by the components of a dominant eigenvector of a nonnegative matrix. Potential applications of our similarity concept are numerous. We illustrate an application for the automatic extraction of synonyms in a monolingual dictionary.
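The iteration in this abstract can be sketched directly. The following is a minimal illustration, not the authors' code: the function names, the Frobenius normalisation, and the fixed number of iterations (used in place of a convergence test) are assumptions.

```python
import math

def matmul(X, Y):
    """Multiply two dense matrices stored as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def similarity_matrix(A, B, even_steps=100):
    """Approximate the limit of the normalised even iterates of
    S(k+1) = B S(k) A^T + B^T S(k) A, starting from the all-ones matrix.

    A and B are adjacency matrices (lists of rows) of G(A) and G(B).
    even_steps is an assumed iteration budget, not a value from the paper.
    """
    n_a, n_b = len(A), len(B)
    S = [[1.0] * n_a for _ in range(n_b)]
    At, Bt = transpose(A), transpose(B)
    for _ in range(2 * even_steps):  # an even number of steps in total
        left = matmul(matmul(B, S), At)
        right = matmul(matmul(Bt, S), A)
        S = [[l + r for l, r in zip(lrow, rrow)] for lrow, rrow in zip(left, right)]
        norm = math.sqrt(sum(v * v for row in S for v in row))
        if norm == 0.0:  # a graph without edges gives the zero matrix
            return S
        S = [[v / norm for v in row] for row in S]
    return S

# Self-similarity of the directed path 1 -> 2 -> 3.
path = [[0, 1, 0],
        [0, 0, 1],
        [0, 0, 0]]
S = similarity_matrix(path, path)
```

For this path graph the even iterates converge to a diagonal matrix, so each vertex is scored as similar only to itself.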
In this paper, we present an experimental comparison of various graph-based approximate nearest neighbor (ANN) search algorithms deployed on edge devices for real-time nearest neighbor search applications, such as sma...
A potential maximal clique of a graph is a vertex set that induces a maximal clique in some minimal triangulation of that graph. It is known that if these objects can be listed in polynomial time for a class of graphs, the treewidth and the minimum fill-in are polynomially tractable for these graphs. We show here that the potential maximal cliques of a graph can be generated in polynomial time in the number of minimal separators of the graph. Thus, the treewidth and the minimum fill-in are polynomially tractable for all classes of graphs with a polynomial number of minimal separators. (C) 2002 Elsevier Science B.V. All rights reserved.
Graphs that are used to model real-world entities with vertices and relationships among entities with edges have proven to be a powerful tool for describing real-world problems in *** most real-world scenarios, entities and their relationships are subject to constant *** that record such changes are called dynamic *** recent years, the widespread application scenarios of dynamic graphs have stimulated extensive research on dynamic graph processing systems that continuously ingest graph updates and produce up-to-date graph analytics *** the scale of dynamic graphs becomes larger, higher performance requirements are demanded to dynamic graph processing *** the massive parallel processing power and high memory bandwidth, GPUs become mainstream vehicles to accelerate dynamic graph processing ***-based dynamic graph processing systems mainly address two challenges: maintaining the graph data when updates occur (i.e., graph updating) and producing analytics results in time (i.e., graph computing). In this paper, we survey GPU-based dynamic graph processing systems and review their methods on addressing both graph updating and graph *** comprehensively discuss existing dynamic graph processing systems on GPUs, we first introduce the terminologies of dynamic graph processing and then develop a taxonomy to describe the methods employed for graph updating and graph *** addition, we discuss the challenges and future research directions of dynamic graph processing on GPUs.
ISBN (print): 9781728125848
Recently, algorithms for counting local topological structures, such as triangles, have been widely used in social network analysis, recommendation systems, user profiling, and other fields. One-pass streaming algorithms for counting global and local triangles have been widely studied, but most research focuses on single-machine streaming algorithms in an 'offline + batch processing' mode. Research on distributed online algorithms running on multiple machines is still in its infancy. In this paper, we investigate the triangle counting problem in large-scale simple undirected graphs whose edges arrive as a stream. We propose two distributed online streaming algorithms to estimate the global number of triangles, both based on the best-performing sampling-based streaming algorithm currently available. Our main contribution is a reasonable partition of the graph stream, so that each worker independently estimates the number of triangles in a subgraph of the graph stream. Experimental results show that our algorithms reduce the estimation error and are several times more accurate than state-of-the-art streaming algorithms.
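To make the estimated quantity concrete, here is a minimal exact baseline for counting global triangles over an edge stream. It is not one of the two distributed algorithms from the paper, and it is not a sampling-based estimator: it stores every edge, which is exactly what streaming estimators avoid. The function name and the handling of duplicate edges are assumptions.

```python
from collections import defaultdict

def count_triangles_stream(edges):
    """Exact global triangle count for a simple undirected edge stream.

    Each arriving edge (u, v) closes one triangle per common neighbour of
    u and v among the edges seen so far, so every triangle is counted once,
    when its last edge arrives.
    """
    neighbors = defaultdict(set)
    triangles = 0
    for u, v in edges:
        if u == v or v in neighbors[u]:
            continue  # ignore self-loops and repeated edges (simple graph)
        triangles += len(neighbors[u] & neighbors[v])
        neighbors[u].add(v)
        neighbors[v].add(u)
    return triangles

# The complete graph on four vertices contains four triangles.
k4_stream = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
k4_triangles = count_triangles_stream(k4_stream)
```

A sampling-based estimator would keep only a bounded subset of the edges and rescale the count; this baseline is useful mainly as ground truth when measuring estimation error on small graphs.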
We propose a new model for graph editing problems on intersection graphs called Geometric Graph Edit *** well-studied graph editing problems, adding and deleting vertices and edges are used as graph editing *** a graph...
Graph covers are a way to describe continuous maps (and homeomorphisms) of the Cantor set, more generally than e.g. Bratteli-Vershik systems. Every continuous map on a zero-dimensional compact set can be expressed by ...
A graph is c-closed if every pair of vertices with at least c common neighbors is adjacent. The c-closure of a graph G is the smallest number c such that G is c-closed. Fox et al. [SIAM J. Comput.’20] defined c-closu...
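The definition of c-closure above translates into a short computation. The sketch below is illustrative only: the function name and the edge-list input format are assumptions, and it assumes c ranges over positive integers, so a complete graph is assigned c-closure 1.

```python
from itertools import combinations

def c_closure(n, edges):
    """Smallest positive c such that the graph on vertices 0..n-1 is c-closed.

    A graph is c-closed if every pair of vertices with at least c common
    neighbours is adjacent, so the c-closure is one more than the largest
    number of common neighbours shared by any non-adjacent pair.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    worst = 0
    for u, v in combinations(range(n), 2):
        if v not in adj[u]:
            worst = max(worst, len(adj[u] & adj[v]))
    return worst + 1

# In the 4-cycle, opposite vertices are non-adjacent and share two neighbours.
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
closure_of_cycle = c_closure(4, cycle4)
```

This brute-force check examines every vertex pair, which is fine for small graphs used as examples.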
In this work we design graph neural network architectures that capture optimal approximation algorithms for a large class of combinatorial optimization problems, using powerful algorithmic tools from semidefinite prog...