Task-based programming models like OmpSs-2 and OpenMP provide a flexible data-flow execution model to exploit dynamic, irregular and nested parallelism. Providing an efficient implementation that scales well with smal...
详细信息
Dense linear algebra kernels are fundamental components of many scientific computing applications. In this work we present a novel method of deriving parallel I/O lower bounds for this broad family of programs. Based ...
详细信息
Task-parallel programs often enjoy deadlock freedom under certain restrictions, such as the use of structured join operations, as in Cilk and X10, or the use of asynchronous task futures together with deadlock-avoidin...
详细信息
Hypergraph partitioning is used in many problem domains including VLSI design, linear algebra, Boolean satisfiability, and data mining. Most versions of this problem are NP-complete or NP-hard, so practical hypergraph...
详细信息
It is a challenging task to train large DNN models on sophisticated GPU platforms with diversified interconnect capabilities. Recently, pipelined training has been proposed as an effective approach for improving devic...
详细信息
Binary code analysis is widely used to help assess a program's correctness, performance, and provenance. Binary analysis applications often construct control flow graphs, analyze data flow, and use debugging infor...
详细信息
We introduce Stream-K, a work-centric parallelization of matrix multiplication (GEMM) and related computations in dense linear algebra. Whereas contemporary decompositions are primarily tile-based, our method operates...
详细信息
ISBN:
(纸本)9798400700156
We introduce Stream-K, a work-centric parallelization of matrix multiplication (GEMM) and related computations in dense linear algebra. Whereas contemporary decompositions are primarily tile-based, our method operates by partitioning an even share of the aggregate inner loop iterations among physical processing elements. This provides a near-perfect utilization of computing resources, regardless of how efficiently the output tiling for any given problem quantizes across the underlying processing elements.
Bayesian networks (BNs) are attractive, because they are graphical and interpretable machine learning models. However, exact inference on BNs is time-consuming, especially for complex problems. To improve the efficien...
详细信息
ISBN:
(纸本)9798400700156
Bayesian networks (BNs) are attractive, because they are graphical and interpretable machine learning models. However, exact inference on BNs is time-consuming, especially for complex problems. To improve the efficiency, we propose a fast BN exact inference solution named Fast-BNI on multi-core CPUs. Fast-BNI enhances the efficiency of exact inference through hybrid parallelism that tightly integrates coarse- and fine-grained parallelism. We also propose techniques to further simplify the bottleneck operations of BN exact inference. Fast-BNI source code is freely available at https://***/jjiantong/FastBN.
We present ParGeo, a multicore library for computational geometry algorithms. We describe two of the algorithms from ParGeo, convex hull and the smallest enclosing ball, and present a short evaluation of all implement...
ISBN:
(纸本)9781450392044
We present ParGeo, a multicore library for computational geometry algorithms. We describe two of the algorithms from ParGeo, convex hull and the smallest enclosing ball, and present a short evaluation of all implementations currently in ParGeo.
We introduce a structure-aware parallel technique for context-bounded analysis of concurrent programs. The key intuition consists in decomposing the set of concurrent traces into symbolic subsets that are separately e...
详细信息
ISBN:
(纸本)9781450368186
We introduce a structure-aware parallel technique for context-bounded analysis of concurrent programs. The key intuition consists in decomposing the set of concurrent traces into symbolic subsets that are separately explored by multiple instances of the same decision procedure running in parallel. The decision procedures work on different partitions of the search space without cooperating, whence distribution follows effortlessly. Our experiments on a selection of complex multi-threaded programs show significant analysis speedups and scalability, and greater performance gains than with general-purpose parallel solvers.
暂无评论