ISBN (print): 9781450328210
A data-graph computation, popularized by such programming systems as Galois, Pregel, GraphLab, PowerGraph, and GraphChi, is an algorithm that performs local updates on the vertices of a graph. During each round of a data-graph computation, an update function atomically modifies the data associated with a vertex as a function of the vertex's prior data and that of adjacent vertices. A dynamic data-graph computation updates only an active subset of the vertices during a round, and those updates determine the set of active vertices for the next round.

This paper introduces PRISM, a chromatic-scheduling algorithm for executing dynamic data-graph computations. PRISM uses a vertex-coloring of the graph to coordinate updates performed in a round, precluding the need for mutual-exclusion locks or other nondeterministic data synchronization. PRISM maintains the dynamic set of active vertices in a multibag data structure, as an unordered set partitioned by color. We analyze PRISM using work-span analysis. Let G = (V, E) be a degree-Δ graph colored with χ colors, and suppose that Q ⊆ V is the set of active vertices in a round. Define size(Q) = |Q| + Σ_{v∈Q} deg(v), which is proportional to the space required to store the vertices of Q using a sparse-graph layout. We show that a P-processor execution of PRISM performs the updates in Q using O(χ(lg(Q/χ) + lg Δ) + lg P) span and Θ(size(Q) + χ + P) work. These theoretical guarantees are matched by good empirical performance. We modified GraphLab to incorporate PRISM and studied seven application benchmarks on a 12-core multicore machine. PRISM executes the benchmarks 1.2–2.1 times faster than GraphLab's nondeterministic lock-based scheduler while providing deterministic behavior.

This paper also presents PRISM-R, a variation of PRISM that executes dynamic data-graph computations deterministically even when updates modify global variables with associative operations. PRISM-R satisfies the same theoretical bounds as PRISM, but its implementation is more involved, incorporating a multivector data structure to maintain a deterministically ordered set of active vertices partitioned by color.
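To make the chromatic-scheduling round concrete, here is a minimal serial Python sketch, not PRISM's implementation: the dict-of-lists partition stands in for the lock-free multibag, and the update function and activate-all-neighbors policy are illustrative assumptions. Because no two vertices of the same color are adjacent, all updates within one color class could run in parallel without locks; PRISM exploits exactly that freedom.

```python
from collections import defaultdict

def chromatic_round(graph, color, data, update, active):
    """One round of a dynamic data-graph computation under chromatic
    scheduling. Returns the active set for the next round."""
    # Multibag stand-in: partition the active vertices by color.
    multibag = defaultdict(list)
    for v in active:
        multibag[color[v]].append(v)

    next_active = set()
    for c in sorted(multibag):       # colors are processed one at a time
        for v in multibag[c]:        # same-color updates are conflict-free,
            data[v] = update(v, data, graph)  # so this loop may run in parallel
            next_active.update(graph[v])      # simple policy: wake all neighbors
    return next_active

# Example: one smoothing step on a 4-vertex graph with a proper 3-coloring.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
color = {0: 0, 1: 1, 2: 2, 3: 0}
data  = {0: 1.0, 1: 0.0, 2: 0.0, 3: 4.0}

def smooth(v, data, graph):
    nbrs = graph[v]
    return (data[v] + sum(data[u] for u in nbrs)) / (1 + len(nbrs))

active = chromatic_round(graph, color, data, smooth, {0, 3})
```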
ISBN (print): 9781450357999
Scientific computing problems are frequently solved using data-graph computations: algorithms that perform local updates, over many time steps, on application-specific data associated with the vertices of a graph. The data-graph in such computations is commonly a mesh graph, where vertices have positions in 3D space and edges connect physically nearby vertices. A scheduler controls the parallel execution of the algorithm. Two classes of parallel schedulers exist: double-buffering and in-place. Double-buffering schedulers incur no synchronization overheads because they have no read-write conflicts, but they require two copies of the vertex data, as well as a higher iteration count due to a slower convergence rate. Computations for which this difference in convergence rate is significant (e.g., the multigrid method) are frequently performed using an in-place scheduler, which incurs synchronization overheads to avoid read-write conflicts on the single copy of vertex data.

We present LAIKA, a deterministic in-place scheduler that we created using a principled three-step design strategy for high-performance schedulers. LAIKA reorders the input graph along a Hilbert space-filling curve to improve cache locality, and it minimizes parallel coordination overhead by explicitly curbing excess execution parallelism. Consequently, LAIKA has significantly lower scheduling overhead than alternative in-place schedulers and is even faster per iteration than the parallel double-buffered implementation on a reordered input graph. We derive an improved bound on the expected number of cache misses incurred during a traversal of a graph reordered using a space-filling curve. We also prove that on a mesh graph G = (V, E), LAIKA performs O(|V| + |E|) total work and achieves linear expected speedup with P = O(|V|/log²|V|) workers. On 48 cores, LAIKA yields 38.4x parallel speedup and empirically fares well against competing schedulers.
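The locality-reordering step can be illustrated with the classic Hilbert-index construction. The sketch below is not LAIKA's code: it uses the standard 2D index for brevity (LAIKA works with 3D mesh graphs, where the construction is analogous), and `hilbert_reorder` is a hypothetical helper that sorts vertices by their position along the curve.

```python
def hilbert_index(n, x, y):
    """Position of grid point (x, y) along the Hilbert curve of an
    n-by-n grid, where n is a power of two (classic iterative form)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so lower bits are read in canonical order.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_reorder(positions, n):
    """Return vertex ids sorted by Hilbert index of their grid position."""
    return sorted(positions, key=lambda v: hilbert_index(n, *positions[v]))

# Vertices scattered on an 8x8 grid: spatially nearby vertices end up
# adjacent in the new order, so traversals touch memory more locally.
positions = {0: (0, 0), 1: (7, 7), 2: (1, 0), 3: (0, 1), 4: (6, 7)}
print(hilbert_reorder(positions, 8))  # -> [0, 3, 2, 4, 1]
```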