ISBN (Print): 9781595936028
Transactions are a simple and powerful mechanism for establishing fault-tolerance. To allow multiple processes to cooperate in a transaction we relax the isolation property and use message passing for communication. We call the new abstraction a speculation.
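As a rough illustration of the idea (a minimal sketch with a hypothetical speculate helper and a dict-based state, not the paper's actual abstraction), a speculation behaves like a transaction whose effects can be rolled back, except that the running computation may already exchange messages with other processes:

import copy
import queue

def speculate(state, work, outbox):
    """Run `work` against `state` speculatively (hypothetical helper).

    Unlike a strictly isolated transaction, `work` may send messages
    through `outbox` while it runs; on failure the state change is
    rolled back (messages already sent would need compensating actions
    in a real system).
    """
    snapshot = copy.deepcopy(state)
    try:
        work(state, outbox)        # may mutate state and communicate
        return True                # commit: keep the new state
    except Exception:
        state.clear()
        state.update(snapshot)     # abort: restore the snapshot
        return False

# Hypothetical usage: debit an account and notify a cooperating process.
msgs = queue.Queue()
account = {"balance": 100}

def debit(state, out):
    state["balance"] -= 30
    out.put("debited 30")          # message passing instead of isolation

committed = speculate(account, debit, msgs)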
ISBN (Print): 9781450344937
The explosive growth of data and the demand for learning useful information from such data have stimulated a cross-disciplinary area named big learning, which leverages a cluster of commodity machines to run machine learning applications on large datasets. Those applications are usually implemented on high-level programming frameworks [1-4] that run atop managed runtimes. Since the applications usually involve many iterations of computation over the input data, the frameworks cache it for performance. In this work, we investigate a set of vector-based machine learning applications in Spark MLlib [3] and find that cached datasets have a strong impact on overall efficiency.
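As a minimal sketch of the caching pattern the abstract refers to (using PySpark with a hypothetical input path, not the MLlib internals studied in the paper):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-sketch").getOrCreate()
sc = spark.sparkContext

# Hypothetical vector dataset; .cache() keeps the parsed vectors in executor
# memory, so each iteration below rescans memory rather than re-reading and
# re-parsing the input files.
points = (sc.textFile("hdfs:///data/points.txt")          # hypothetical path
            .map(lambda line: [float(v) for v in line.split()])
            .cache())

total = 0.0
for _ in range(20):          # iterative computation over the same cached data
    total += points.map(lambda p: sum(p)).reduce(lambda a, b: a + b)

spark.stop()

How such cached, deserialized objects are kept in the managed heap is what gives them the strong impact on overall efficiency that the abstract mentions.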
ISBN (Print): 9781450349826
Many sequential loops are actually scans or reductions and can be parallelized across iterations despite the loop-carried dependences. In this work, we consider the parallelization of such scan/reduction loops, and propose a practical runtime approach called sampling-and-reconstruction to extract the hidden scan/reduction patterns in these loops.
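The paper's sampling-and-reconstruction runtime is not reproduced here, but the property it exploits can be shown with a standard two-phase prefix sum: because the loop-carried dependence is an associative accumulation, per-chunk partial results can be computed independently and then reconciled (a sketch using NumPy and a thread pool):

import numpy as np
from concurrent.futures import ThreadPoolExecutor

def sequential_scan(x):
    # The loop-carried dependence: out[i] depends on out[i-1].
    out = np.empty_like(x)
    acc = 0.0
    for i in range(len(x)):
        acc += x[i]
        out[i] = acc
    return out

def blocked_scan(x, nchunks=4):
    # Phase 1: independent local scans over each chunk (parallelizable).
    chunks = np.array_split(x, nchunks)
    with ThreadPoolExecutor() as pool:
        local = list(pool.map(np.cumsum, chunks))
    # Phase 2: propagate each chunk's running total into the later chunks.
    offset = 0.0
    for part in local:
        part += offset
        offset = part[-1]
    return np.concatenate(local)

x = np.random.rand(1000)
assert np.allclose(sequential_scan(x), blocked_scan(x))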
ISBN (Print): 9781450311601
In this paper, we evaluate the performance and usability of the parallel programming model OpenMP Superscalar (OmpSs), apply it to 10 different benchmarks, and compare its performance with corresponding POSIX threads implementations.
ISBN (Print): 9780897919067
We present a tool for mesh-partitioning parallelization of numerical programs working iteratively on an unstructured mesh. This conventional method splits a mesh into sub-meshes, adding some overlap on the boundaries of the sub-meshes. The program is then run in SPMD mode on a parallel architecture with distributed memory. It is necessary to add calls to communication routines at a few carefully selected locations in the code. The tool presented here uses the data-dependence information to mechanize the placement of these synchronizations. Additionally, we see that there is not a unique solution for placing these synchronizations, and performance depends on this choice.
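As a minimal sketch of the kind of SPMD code and communication placement involved (a 1-D sub-mesh with one overlap cell on each end, written with mpi4py; the script name and sizes are hypothetical, and the tool described above places such calls automatically):

# Run as, e.g., `mpiexec -n 4 python halo.py` (hypothetical script name).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

u = np.zeros(12)        # local sub-mesh; u[0] and u[-1] overlap with neighbours
if rank == 0:
    u[0] = 1.0          # physical boundary condition on the global left edge

for _ in range(100):
    # The inserted communication calls: refresh the overlap cells once per sweep.
    if right != MPI.PROC_NULL:
        u[-1] = comm.sendrecv(u[-2], dest=right, source=right)
    if left != MPI.PROC_NULL:
        u[0] = comm.sendrecv(u[1], dest=left, source=left)
    # Purely local stencil update on the interior cells.
    u[1:-1] = 0.5 * (u[:-2] + u[2:])

Placing the two sendrecv calls immediately before every sweep is the safe default; data-dependence analysis can hoist or merge them, which is where the non-uniqueness of placement and the resulting performance differences come from.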
ISBN (Print): 9781605587080
This poster is a case study on the application of a novel programming model, called Concurrent Collections (CnC), to the implementation of an asynchronous-parallel algorithm for computing the Cholesky factorization of dense matrices. In CnC, the programmer expresses her computation in terms of application-specific operations, partially ordered by semantic scheduling constraints. We demonstrate the performance potential of CnC in this poster by showing that our Cholesky implementation nearly matches or exceeds competing vendor-tuned codes and alternative programming models. We conclude that the CnC model is well-suited for expressing asynchronous-parallel algorithms on emerging multicore systems.
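As a sketch of the operations and the partial order the abstract alludes to (a sequential tiled Cholesky in NumPy; the CnC version runs these same steps asynchronously as soon as their tile dependences are satisfied, which this sketch does not attempt):

import numpy as np

def tiled_cholesky(A, bs):
    """Lower Cholesky factor of a symmetric positive-definite matrix, tile by tile."""
    nt = A.shape[0] // bs
    T = [[A[i*bs:(i+1)*bs, j*bs:(j+1)*bs].copy() for j in range(nt)] for i in range(nt)]
    for k in range(nt):
        # Factor the diagonal tile; depends on all prior updates to T[k][k].
        T[k][k] = np.linalg.cholesky(T[k][k])
        for i in range(k + 1, nt):
            # Triangular solve for each tile below the diagonal; depends only on T[k][k].
            T[i][k] = np.linalg.solve(T[k][k], T[i][k].T).T
        for i in range(k + 1, nt):
            for j in range(k + 1, i + 1):
                # Trailing update; depends on T[i][k] and T[j][k], not on the whole column.
                T[i][j] -= T[i][k] @ T[j][k].T
    L = np.zeros_like(A)
    for i in range(nt):
        for j in range(i + 1):
            L[i*bs:(i+1)*bs, j*bs:(j+1)*bs] = T[i][j]
    return L

# Usage on a random SPD matrix (assumes the size is a multiple of the tile size).
M = np.random.rand(8, 8)
A = M @ M.T + 8 * np.eye(8)
L = tiled_cholesky(A, bs=4)
assert np.allclose(L @ L.T, A)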