ISBN: 9780897919067 (print)
We present a tool for mesh-partitioning parallelization of numerical programs that work iteratively on an unstructured mesh. This conventional method splits a mesh into sub-meshes, adding some overlap at the boundaries of the sub-meshes. The program is then run in SPMD mode on a distributed-memory parallel architecture. Calls to communication routines must be added at a few carefully selected locations in the code. The tool presented here uses data-dependence information to mechanize the placement of these synchronizations. Additionally, we see that the placement of these synchronizations is not unique, and that performance depends on the choice made.
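The overlap-and-synchronize idea can be illustrated with a small sketch. This is not the tool described in the abstract: it is a serial Python simulation of two sub-meshes of a 1D mesh, and the function names (`smooth`, `run_global`, `run_partitioned`) and the `sync` flag are assumptions made for illustration. Each sub-mesh keeps one overlap node copied from its neighbour, and the copy is refreshed after every sweep.

```python
def smooth(values):
    """One sweep: every interior node becomes the mean of its two neighbours."""
    out = list(values)
    for i in range(1, len(values) - 1):
        out[i] = 0.5 * (values[i - 1] + values[i + 1])
    return out


def run_global(values, sweeps):
    """Reference run on the whole (unpartitioned) mesh."""
    for _ in range(sweeps):
        values = smooth(values)
    return values


def run_partitioned(values, sweeps, cut, sync=True):
    """Run the same sweeps on two sub-meshes with a one-node overlap.

    Sub-mesh A owns nodes [0, cut) and sub-mesh B owns nodes [cut, n).
    Requires 1 <= cut <= n - 1.
    """
    left = list(values[:cut + 1])    # last entry: overlap copy of node `cut`
    right = list(values[cut - 1:])   # first entry: overlap copy of node `cut - 1`
    for _ in range(sweeps):
        left = smooth(left)
        right = smooth(right)
        if sync:
            # Synchronization point: refresh each overlap copy from its owner.
            left[-1] = right[1]
            right[0] = left[-2]
    # Drop the overlap copies and reassemble the owned nodes.
    return left[:-1] + right[1:]


mesh = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
print(run_partitioned(mesh, 3, cut=3) == run_global(mesh, 3))
print(run_partitioned(mesh, 3, cut=3, sync=False) == run_global(mesh, 3))
```

With the synchronization in place the partitioned run reproduces the global run; without it the overlap copies go stale after the first sweep and the results diverge. In a real SPMD program the two assignments under `if sync:` would be message exchanges between processes.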
ISBN: 9781605587080 (print)
This poster is a case study on the application of a novel programming model, called Concurrent Collections (CnC), to the implementation of an asynchronous-parallel algorithm for computing the Cholesky factorization of dense matrices. In CnC, the programmer expresses her computation in terms of application-specific operations, partially ordered by semantic scheduling constraints. We demonstrate the performance potential of CnC by showing that our Cholesky implementation nearly matches or exceeds competing vendor-tuned codes and alternative programming models. We conclude that the CnC model is well suited for expressing asynchronous-parallel algorithms on emerging multicore systems.
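A rough sketch of what "operations partially ordered by scheduling constraints" can look like for Cholesky follows. It does not use the CnC API and is not the poster's implementation: it is plain Python, the tiles are reduced to single scalars for brevity, and the names `cholesky_tasks`, `inputs`, and `run` are hypothetical. Each task runs as soon as the items it reads exist, and every item is written exactly once, so any execution order that respects the data dependences yields the same factor.

```python
import math
import random


def inputs(task):
    """Items a task must find in the store before it may run."""
    kind, k, i, j = task
    if kind == "potrf":
        return [("A", k, k, k)]
    if kind == "trsm":
        return [("L", k, k), ("A", i, k, k)]
    return [("L", i, k), ("L", j, k), ("A", i, j, k)]


def run(task, items):
    """Execute one task; each task writes exactly one new item."""
    kind, k, i, j = task
    if kind == "potrf":      # factor the diagonal entry
        items[("L", k, k)] = math.sqrt(items[("A", k, k, k)])
    elif kind == "trsm":     # solve for one entry of column k
        items[("L", i, k)] = items[("A", i, k, k)] / items[("L", k, k)]
    else:                    # trailing update of entry (i, j)
        items[("A", i, j, k + 1)] = (
            items[("A", i, j, k)] - items[("L", i, k)] * items[("L", j, k)]
        )


def cholesky_tasks(a, seed=0):
    """Factor a symmetric positive-definite matrix by running ready tasks
    in an arbitrary (seeded random) order."""
    n = len(a)
    # ("A", i, j, v) is entry (i, j) after v trailing updates.
    items = {("A", i, j, 0): a[i][j] for i in range(n) for j in range(i + 1)}
    pending = []
    for k in range(n):
        pending.append(("potrf", k, k, k))
        for i in range(k + 1, n):
            pending.append(("trsm", k, i, k))
            for j in range(k + 1, i + 1):
                pending.append(("update", k, i, j))
    rng = random.Random(seed)
    while pending:
        ready = [t for t in pending if all(key in items for key in inputs(t))]
        task = rng.choice(ready)  # any ready task is a legal next step
        run(task, items)
        pending.remove(task)
    return [[items[("L", i, j)] if j <= i else 0.0 for j in range(n)]
            for i in range(n)]


A = [[4, 12, -16], [12, 37, -43], [-16, -43, 98]]
print(cholesky_tasks(A, seed=0))
```

Changing `seed` changes the order in which tasks execute but not the result, which is the property an asynchronous-parallel runtime relies on.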
ISBN: 9781450301190 (print)
A variety of programming models exist to support large-scale, distributed-memory, parallel computation. These programming models have historically targeted coarse-grained applications with natural locality, such as those found in a variety of scientific simulations of the physical world. Fine-grained, irregular, and unstructured applications, such as those found in biology, social network analysis, and graph theory, are less well supported. We propose Active Pebbles, a programming model that allows these applications to be expressed naturally; an accompanying execution model ensures performance and scalability.
ISBN: 9781450311601 (print)
Object-oriented programming languages like Java provide only low-level constructs (e.g., starting a thread) to describe concurrency. High-level abstractions (e.g., thread pools) are merely provided as a library. As a result, a compiler is not aware of the high-level semantics of a parallel library and therefore misses important optimization opportunities. This paper presents a simple source language extension that enables a compiler to perform new optimizations that are particularly effective for parallel code.
ISBN: 9781450301190 (print)
We introduce the main ideas behind RTNCAS, a wait-free, linearizable, and disjoint-access-parallel NCAS library. It focuses on the construction of wait-free data structure operations (DSO) in real-time settings. RTNCAS can conditionally swap multiple independent words (NCAS) atomically. Furthermore, it allows us to implement arbitrary DSO by means of their sequential specification.
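The semantics of an N-word compare-and-swap can be stated compactly in code. The sketch below is only a specification of the observable behaviour: it uses a global lock, so it is neither wait-free nor disjoint-access parallel, and it says nothing about how RTNCAS is built. The `Word` class and the `ncas` signature are assumptions made for illustration.

```python
import threading


class Word:
    """A single shared memory word (hypothetical helper type)."""

    def __init__(self, value):
        self.value = value


# One global lock makes the operation atomic. A wait-free library would
# have to achieve the same effect without any lock.
_ncas_lock = threading.Lock()


def ncas(words, expected, new):
    """Atomically: if every word holds its expected value, store all the
    new values and return True; otherwise change nothing and return False."""
    with _ncas_lock:
        if any(w.value != e for w, e in zip(words, expected)):
            return False
        for w, n in zip(words, new):
            w.value = n
        return True


a, b = Word(1), Word(2)
print(ncas([a, b], [1, 2], [10, 20]))  # both comparisons hold
print(ncas([a, b], [1, 20], [0, 0]))   # first comparison fails, nothing is written
```

A data structure operation that must change several words together (for example, two link fields of a list node) can be expressed as a read of the current values followed by one `ncas` call that is retried on failure.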
ISBN: 9781450392044 (print)
Graph processing, especially high-performance graph traversal, plays an increasingly important role in data analytics. The successor of Sunway TaihuLight, NEW SUNWAY, is equipped with nearly 10 PB of memory and over 40 million cores, which makes it possible to process graphs with hundreds of trillions of edges. However, a graph of such unprecedented scale also brings severe performance challenges, including load imbalance, poor locality, and the irregular memory access of graph traversal workloads. To address the scalability problem, we propose a novel 3-level degree-aware 1.5D graph partitioning, which benefits from both delegated 1D and 2D partitioning. By delegating extremely heavy vertices globally and other heavy vertices on the columns and rows of the process mesh, we break the scalability wall of previous partitioning methods. Together with sub-iteration direction optimization, core-group-aware core subgraph segmenting, and a new on-chip sorting mechanism using RMA, we achieve 180,792 GTEPS on a graph with 281 trillion edges, using 103,912 processors with over 40 million cores. This is 1.75x the performance and 8x the capacity of the previous state of the art, and it conforms to the Graph 500 BFS benchmark[14].
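The first step of a degree-aware scheme, sorting vertices into three levels by degree, might be sketched as follows. The abstract does not give the actual classification criteria, so the two thresholds, the level names, and the function `classify_vertices` are all assumptions made for illustration, not the paper's method.

```python
from collections import Counter


def classify_vertices(edges, heavy_threshold, extreme_threshold):
    """Assign each vertex of an undirected edge list to one of three levels.

    "global":  extremely heavy, delegated to every process
    "row_col": heavy, delegated along one row and column of the process mesh
    "local":   everything else, owned by a single process
    Both thresholds are hypothetical tuning parameters.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    levels = {}
    for vertex, d in degree.items():
        if d >= extreme_threshold:
            levels[vertex] = "global"
        elif d >= heavy_threshold:
            levels[vertex] = "row_col"
        else:
            levels[vertex] = "local"
    return levels


# Vertex 0 is a hub and vertex 1 is moderately connected.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (1, 3)]
print(classify_vertices(edges, heavy_threshold=3, extreme_threshold=5))
```

On a power-law graph only a small fraction of vertices would land in the two delegated levels, which is what makes replicating them affordable.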