A topological order of the vertices of a directed acyclic graph G = (V,E) is any total order ord such that if (x, y) is an element of E, then x precedes y in ord. In this paper we consider the dynamic version of this problem, and provide simple algorithms and data structures achieving O(n) amortized time per edge insertion starting from an empty graph, which compares favorably with the trivial O(m+n) time bound per operation obtained by applying the off-line algorithm. The additional space requirement, besides the representation of the graph itself, is O(n). Experimental results show that in practice our algorithm performs orders of magnitude faster than the off-line algorithm.
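To illustrate the setting, here is a minimal Python sketch of one simple way to maintain a topological order under edge insertion: a DFS over the affected region of the order, followed by a local reorder. Class and method names are ours, and no claim is made that this matches the paper's exact algorithm or its amortized accounting.

```python
from collections import defaultdict

class DynamicTopo:
    """Maintain a topological order of a DAG under edge insertions.
    Sketch only: DFS over the affected region, then a local reorder."""

    def __init__(self, vertices):
        self.adj = defaultdict(set)
        self.order = list(vertices)                 # vertices in topological order
        self.pos = {v: i for i, v in enumerate(self.order)}

    def insert_edge(self, x, y):
        """Insert (x, y); return False (and reject the edge) on a cycle."""
        self.adj[x].add(y)
        if self.pos[x] < self.pos[y]:
            return True                             # order already consistent
        lo, hi = self.pos[y], self.pos[x]
        # DFS from y, restricted to vertices at positions <= pos[x]
        visited, stack = set(), [y]
        while stack:
            v = stack.pop()
            if v in visited:
                continue
            visited.add(v)
            if v == x:                              # reached x: (x, y) closes a cycle
                self.adj[x].discard(y)
                return False
            stack.extend(w for w in self.adj[v]
                         if w not in visited and self.pos[w] <= hi)
        # Reorder the region [lo, hi]: unvisited vertices keep their relative
        # order and move first; everything reachable from y moves after x.
        region = self.order[lo:hi + 1]
        shifted = [v for v in region if v not in visited] + \
                  [v for v in region if v in visited]
        self.order[lo:hi + 1] = shifted
        for i in range(lo, hi + 1):
            self.pos[self.order[i]] = i
        return True
```

Inserting an edge that would create a cycle is detected during the restricted DFS and rejected, so the maintained order always remains a valid topological order.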
Computing shortest distances is a central task in many domains. The growing number of applications dealing with dynamic graphs calls for incremental algorithms, as it is impractical to recompute shortest distances from scratch every time updates occur. In this paper, we address the problem of maintaining all-pairs shortest distances in dynamic graphs. We propose efficient incremental algorithms to process sequences of edge deletions/insertions/updates and vertex deletions/insertions. The proposed approach relies on some general operators that can be easily "instantiated" both in main memory and on top of different underlying DBMSs. We provide complexity analyses of the proposed algorithms. Experimental results on several real-world datasets show that current main-memory algorithms soon become impractical, disk-based ones are needed for larger graphs, and our approach significantly outperforms state-of-the-art algorithms.
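The core incremental rule for an edge insertion with nonnegative weights is simple to state: any shortest path improved by a new edge (u, v) must pass through it. A minimal in-memory Python sketch of this textbook rule (ours; the paper's contribution is the general operators and their disk-based instantiations, not this rule itself):

```python
INF = float("inf")

def insert_edge(dist, u, v, w):
    """Update an all-pairs distance matrix in place after inserting
    edge (u, v) with nonnegative weight w. O(n^2) per insertion:
    every improved path a -> b must go a -> u -> v -> b."""
    n = len(dist)
    for a in range(n):
        if dist[a][u] == INF:
            continue
        for b in range(n):
            cand = dist[a][u] + w + dist[v][b]
            if cand < dist[a][b]:
                dist[a][b] = cand
```

Deletions are the harder direction, since a removed edge may invalidate many stored distances; that is where the more elaborate machinery of incremental APSP algorithms comes in.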
In many modern applications, the generated data is a dynamic network: a graph that changes over time through a sequence of update operations (node addition, node deletion, edge addition, edge deletion, and edge weight change). In these networks, it is inefficient to recompute the solution of a data mining/machine learning task from scratch after every update operation. Therefore, in recent years, several so-called dynamical algorithms have been proposed that update the solution instead of computing it from scratch. In this paper, we first formulate this emerging setting and discuss its high-level algorithmic aspects. Then, we review state-of-the-art dynamical algorithms proposed for several data mining and machine learning tasks, including frequent pattern discovery, betweenness/closeness/PageRank centralities, clustering, classification, and regression. This article is categorized under: Technologies > Structure Discovery and Clustering; Technologies > Machine Learning; Fundamental Concepts of Data and Knowledge > Big Data Mining.
Arising in connection with multiobjective selection and archiving, the hypervolume subset selection problem (HSSP) consists in finding a subset of size k <= n of a set X ⊆ R^d of n nondominated points in d-dimensional space that maximizes the hypervolume indicator. The incremental greedy approximation to the HSSP has an approximation guarantee of 1 - 1/e, and is polynomial in n and k, while no polynomial exact algorithms are known for d >= 3. The decremental greedy counterpart has no known approximation guarantee, but is potentially faster for large k, and still leads to good approximations in practice. The computation and update of individual hypervolume contributions are at the core of the implementation of this greedy strategy. In this paper, new algorithms for the computation and update of hypervolume contributions are developed. In three dimensions, updating the total hypervolume and all individual contributions under single-point changes is performed in linear time, while in the 4-D case all contributions are computed in O(n^2) time. As a consequence, the decremental greedy approximation to the HSSP can now be obtained in O(n(n - k) + n log n) and O(n^2(n - k)) time for d = 3 and d = 4, respectively. Experimental results show that the proposed algorithms significantly outperform existing ones.
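For intuition about hypervolume contributions, here is a minimal Python sketch for the easy 2-D case (minimization, nondominated input, reference point dominated by all points; names are ours). The paper's algorithms address the much harder 3-D and 4-D cases and their single-point updates.

```python
def hv2d(points, ref):
    """Total hypervolume and per-point exclusive contributions in 2-D
    (minimization). Sorting by x makes y strictly decreasing for a
    nondominated set, so both quantities come from adjacent rectangles."""
    pts = sorted(points)                    # x ascending => y descending
    rx, ry = ref
    total, contrib = 0.0, []
    prev_y = ry
    for i, (x, y) in enumerate(pts):
        next_x = pts[i + 1][0] if i + 1 < len(pts) else rx
        total += (next_x - x) * (ry - y)            # slab dominated only up to next x
        contrib.append((next_x - x) * (prev_y - y)) # region dominated by this point alone
        prev_y = y
    return total, contrib
```

The decremental greedy strategy then repeatedly removes the point with the smallest contribution and updates the contributions of its two neighbors, which is what makes fast contribution updates the performance bottleneck.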
We consider the problem of a Parameter Server (PS) that wishes to learn a model that fits data distributed on the nodes of a graph. We focus on Federated Learning (FL) as a canonical application. One of the main challenges of FL is the communication bottleneck between the nodes and the parameter server. A popular solution in the literature is to allow each node to do several local updates on the model in each iteration before sending it back to the PS. While this mitigates the communication bottleneck, the statistical heterogeneity of the data owned by the different nodes has proven to delay convergence and bias the model. In this work, we study random walk (RW) learning algorithms for tackling the communication and data heterogeneity problems. The main idea is to leverage available direct connections among the nodes themselves, which are typically "cheaper" than the communication to the PS. In a random walk, the model is thought of as a "baton" that is passed from a node to one of its neighbors after being updated in each iteration. The challenge in designing the RW is the data heterogeneity and the uncertainty about the data distributions. Ideally, we would want to visit more often the nodes that hold more informative data. We cast this problem as a sleeping multi-armed bandit (MAB) to design a near-optimal node sampling strategy that achieves variance-reduced gradient estimates and approaches the optimal sampling strategy sublinearly. Based on this framework, we present an adaptive random walk learning algorithm. We provide theoretical guarantees on its convergence. Our numerical results validate our theoretical findings and show that our algorithm outperforms existing random walk algorithms.
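The "baton" dynamic can be sketched in a few lines of Python. Here, uniform neighbor sampling stands in for the paper's sleeping-MAB node-selection strategy; the function names and the scalar example are ours.

```python
import random

def random_walk_sgd(neighbors, local_grad, start, w0, lr=0.1, steps=1000, rng=None):
    """Random-walk ("baton") learning sketch: the model is updated at the
    current node with that node's local gradient, then passed to a
    uniformly chosen neighbor. Uniform sampling is a simplification of
    the adaptive node-selection strategy described in the abstract."""
    rng = rng or random.Random(0)
    w, node = w0, start
    for _ in range(steps):
        w = w - lr * local_grad(node, w)      # local update at the current node
        node = rng.choice(neighbors[node])    # pass the baton along an edge
    return w
```

With scalar quadratic losses (w - a_v)^2 / 2 per node, the iterate drifts toward an average of the local targets weighted by how often the walk visits each node, which is exactly why biasing the walk toward more informative nodes matters.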
In the process of concept learning, target concepts may have portions with short-term changes, other portions may undergo long-term changes, and yet others may not change at all. For this reason, several local windows need to be handled. We suggest facing this problem, which naturally exists in the field of concept learning, by allocating windows that can adapt their size to portions of the target concept. We propose an incremental decision tree that is updated with incoming examples. Each leaf of the decision tree holds a time window and a local performance measure as the main parameter to be controlled. When the performance of a leaf decreases, the size of its local window is reduced. This learning algorithm, called OnlineTree2, automatically adjusts its internal parameters in order to face the current dynamics of the data stream. Results show that it is comparable to other batch algorithms when facing problems with no concept change, and it is better than the evaluated methods in its ability to deal with concept drift on problems in which concept change occurs at different speeds, noise may be present, and examples may arrive from different areas of the problem domain (virtual drift).
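The window-adjustment idea at a single leaf can be sketched as follows. The thresholds, shrink factor, and names below are ours for illustration, not OnlineTree2's actual parameters.

```python
class AdaptiveLeaf:
    """Per-leaf adaptive window sketch: the leaf keeps recent examples
    and a local accuracy estimate, and shrinks its window when the
    observed performance drops below its best level seen so far."""

    def __init__(self, max_window=200, min_window=20, shrink=0.5, drop=0.1):
        self.window = []
        self.max_window, self.min_window = max_window, min_window
        self.shrink, self.drop = shrink, drop
        self.best_acc = 0.0

    def observe(self, example, correct):
        """Record one (example, was-prediction-correct) pair; return accuracy."""
        self.window.append((example, correct))
        if len(self.window) > self.max_window:
            self.window.pop(0)
        acc = sum(c for _, c in self.window) / len(self.window)
        self.best_acc = max(self.best_acc, acc)
        if acc < self.best_acc - self.drop and len(self.window) > self.min_window:
            # Performance dropped: shrink the window, keeping recent examples,
            # so the leaf forgets the pre-drift portion of the stream faster.
            keep = max(self.min_window, int(len(self.window) * self.shrink))
            self.window = self.window[-keep:]
            self.best_acc = acc
        return acc
```

Because each leaf manages its own window, portions of the concept that drift quickly are tracked with short memories while stable portions keep long ones.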
We present a unified framework for convergence analysis of generalized subgradient-type algorithms in the presence of perturbations. A principal novel feature of our analysis is that perturbations need not tend to zero in the limit. It is established that the iterates of the algorithms are attracted, in a certain sense, to an epsilon-stationary set of the problem, where epsilon depends on the magnitude of perturbations. Characterization of the attraction sets is given in the general (nonsmooth and nonconvex) case. The results are further strengthened for convex, weakly sharp, and strongly convex problems. Our analysis extends and unifies previously known results on convergence and stability properties of gradient and subgradient methods, including their incremental, parallel, and heavy ball modifications.
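The perturbed iteration analyzed in such frameworks can be written, in illustrative notation of ours, as

```latex
x_{k+1} = x_k - \alpha_k \,(g_k + r_k), \qquad g_k \in \partial f(x_k), \qquad \limsup_{k \to \infty} \|r_k\| \le \epsilon,
```

where the perturbations r_k need not vanish in the limit. The attraction result then concerns an epsilon-stationary set, e.g. \{ x : \operatorname{dist}(0, \partial f(x)) \le \epsilon \} in the convex case, rather than the exact stationary set.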
We construct an optimal linear-time algorithm for the maximal planar subgraph problem: given a graph G, find a planar subgraph G' of G such that adding to G' an extra edge of G results in a nonplanar graph. Our solution is based on a fast data structure for incremental planarity testing of triconnected graphs and a dynamic graph search procedure. Our algorithm can be transformed into a new optimal planarity testing algorithm.
This article presents OLOC, an incremental concept formation system that learns and uses overlapping concepts. OLOC learns probabilistic concepts that have overlapping extensions and does so to maximize expected predictive accuracy. When making predictions, OLOC can combine multiple overlapping concepts.
We give the first efficient parallel algorithms for solving the arrangement problem. We give a deterministic algorithm for the CREW PRAM which runs in nearly optimal bounds of O(log n log* n) time and n^2/log n processors. We generalize this to obtain an O(log n log* n)-time algorithm using n^d/log n processors for solving the problem in d dimensions. We also give a randomized algorithm for the EREW PRAM that constructs an arrangement of n lines on-line, in which each insertion is done in optimal O(log n) time using n/log n processors. Our algorithms develop new parallel data structures and new methods for traversing an arrangement.