ISBN (print): 9781595936028
Transactions are a simple and powerful mechanism for establishing fault tolerance. To allow multiple processes to cooperate in a transaction, we relax the isolation property and use message passing for communication. We call the new abstraction a speculation.
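For concreteness, a speculation can be pictured as a checkpointed region of execution that is rolled back when a fault is detected. Below is a minimal single-process sketch in C using setjmp/longjmp for the control-flow rollback; the names spec_begin and spec_abort are hypothetical, and a real speculation would additionally have to undo memory side effects and compensate for messages already sent to cooperating processes, which is exactly where it departs from an isolated transaction.

```c
/* Minimal sketch of a speculation-style checkpoint/rollback in C.
 * spec_begin/spec_abort are hypothetical names; setjmp/longjmp only
 * restores control flow, not memory state or messages already sent. */
#include <setjmp.h>
#include <stdio.h>

static jmp_buf checkpoint;

#define spec_begin() setjmp(checkpoint)   /* 0 on the speculative path */
#define spec_abort() longjmp(checkpoint, 1)  /* roll back to checkpoint */

int main(void) {
    if (spec_begin() == 0) {
        puts("speculative work; may send messages to other processes");
        spec_abort();   /* a fault was detected: roll back */
    } else {
        puts("rolled back; running recovery/compensation code");
    }
    return 0;
}
```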
ISBN (print): 9781450349826
Many sequential loops are actually scans or reductions and can be parallelized across iterations despite their loop-carried dependences. In this work, we consider the parallelization of such scan/reduction loops and propose a practical runtime approach, called sampling-and-reconstruction, to extract the hidden scan/reduction patterns in these loops.
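As background for why a scan's loop-carried dependence does not preclude cross-iteration parallelism, the sketch below shows the classic two-phase block scan in C with OpenMP. This is not the paper's sampling-and-reconstruction technique, just the standard pattern such techniques ultimately exploit.

```c
/* Two-phase parallel inclusive prefix sum: each thread scans its own
 * block, the per-block totals are scanned sequentially, and each thread
 * then adds the carry-in from all preceding blocks. */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

void prefix_sum(const int *in, int *out, int n) {
    int nthreads = omp_get_max_threads();
    int *block_sum = calloc(nthreads + 1, sizeof *block_sum);

    #pragma omp parallel num_threads(nthreads)
    {
        int t = omp_get_thread_num();
        int lo = (int)((long long)n * t / nthreads);
        int hi = (int)((long long)n * (t + 1) / nthreads);

        /* Phase 1: scan the local block independently. */
        int sum = 0;
        for (int i = lo; i < hi; i++) { sum += in[i]; out[i] = sum; }
        block_sum[t + 1] = sum;

        #pragma omp barrier
        /* Cheap sequential scan of the per-block totals. */
        #pragma omp single
        for (int b = 1; b <= nthreads; b++)
            block_sum[b] += block_sum[b - 1];
        /* implicit barrier at the end of 'single' */

        /* Phase 2: add the carry-in of all preceding blocks. */
        for (int i = lo; i < hi; i++) out[i] += block_sum[t];
    }
    free(block_sum);
}

int main(void) {
    int in[8] = {1, 2, 3, 4, 5, 6, 7, 8}, out[8];
    prefix_sum(in, out, 8);
    for (int i = 0; i < 8; i++) printf("%d ", out[i]); /* 1 3 6 ... 36 */
    putchar('\n');
    return 0;
}
```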
ISBN (print): 9781450311601
In this paper, we evaluate the performance and usability of the parallel programming model OpenMP Superscalar (OmpSs), apply it to 10 different benchmarks, and compare its performance with corresponding POSIX threads implementations.
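For readers unfamiliar with OmpSs: tasks are declared with pragmas carrying dataflow annotations (in/out), and the runtime schedules them according to the resulting dependences. A minimal sketch follows; the array-section syntax is written from memory and should be checked against the OmpSs documentation, and the function names are illustrative.

```c
/* OmpSs-style task with dataflow dependences (sketch). OmpSs programs
 * are compiled with the Mercurium source-to-source compiler. */
#pragma omp task in(a[0;n]) out(b[0;n])
void scale(const float *a, float *b, int n, float f) {
    for (int i = 0; i < n; i++)
        b[i] = f * a[i];
}

int main(void) {
    float a[1024], b[1024];
    for (int i = 0; i < 1024; i++) a[i] = (float)i;

    scale(a, b, 1024, 2.0f);  /* calling a task function spawns a task */
    #pragma omp taskwait       /* wait until all spawned tasks complete */
    return 0;
}
```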
ISBN (print): 9781450344937
The explosive growth of data and the demand for learning useful information from such data have stimulated a cross-disciplinary area named big learning, which leverages a cluster of commodity machines to run machine learning applications on large datasets. Those applications are usually implemented on high-level programming frameworks [1-4] that run atop managed runtimes. Since the applications usually involve many iterations of computation over the input data, the frameworks cache that data for performance. In this work, we investigate a set of vector-based machine learning applications in Spark MLlib [3] and find that cached datasets have a strong impact on overall efficiency.
ISBN (print): 9780897919067
We present a tool for mesh-partitioning parallelization of numerical programs that work iteratively on an unstructured mesh. This conventional method splits a mesh into sub-meshes, adding some overlap on their boundaries. The program is then run in SPMD mode on a parallel architecture with distributed memory, which requires adding calls to communication routines at a few carefully selected locations in the code. The tool presented here uses data-dependence information to mechanize the placement of these synchronizations. We also show that the placement is not unique, and that performance depends on this choice.
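The communication such a tool inserts is essentially a halo exchange across the overlap region between neighboring sub-meshes. Below is a minimal hand-written sketch of that exchange for a 1-D partition, using MPI in C; the function and variable names are illustrative, not the tool's actual output.

```c
/* Halo (overlap) exchange for a 1-D partition: each rank owns
 * u[1..n_local] and keeps one ghost cell at each end (u[0] and
 * u[n_local+1]) refreshed from its neighbors every iteration. */
#include <mpi.h>

void halo_exchange(double *u, int n_local, int rank, int size) {
    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* Send my first owned cell left, receive my right ghost cell. */
    MPI_Sendrecv(&u[1],           1, MPI_DOUBLE, left,  0,
                 &u[n_local + 1], 1, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    /* Send my last owned cell right, receive my left ghost cell. */
    MPI_Sendrecv(&u[n_local],     1, MPI_DOUBLE, right, 1,
                 &u[0],           1, MPI_DOUBLE, left,  1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double u[10 + 2] = {0};        /* 10 owned cells + 2 ghost cells */
    halo_exchange(u, 10, rank, size);

    MPI_Finalize();
    return 0;
}
```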