We introduce Stream-K, a work-centric parallelization of matrix multiplication (GEMM) and related computations in dense linear algebra. Whereas contemporary decompositions are primarily tile-based, our method operates...
详细信息
We present PARGEO, a multicore library for computational geometry algorithms. We describe two of the algorithms from PARGEO, convex hull and the smallest enclosing ball, and present a short evaluation of all implement...
详细信息
ISBN:
(纸本)9781450392044
We present PARGEO, a multicore library for computational geometry algorithms. We describe two of the algorithms from PARGEO, convex hull and the smallest enclosing ball, and present a short evaluation of all implementations currently in PARGEO.
the proceedings contain 13 papers. the topics discussed include: a guided walk into link key candidate extraction with relational concept analysis;reflections on profiling and cataloguing the content of SPARQL endpoin...
the proceedings contain 13 papers. the topics discussed include: a guided walk into link key candidate extraction with relational concept analysis;reflections on profiling and cataloguing the content of SPARQL endpoints using SPORTAL;reflections on: modeling linked open statistical data;reflections on: DCAT-AP representation of Czech national open data catalog and its impact;reflections on: deep learning for noise-tolerant RDFS reasoning;reflections on: finding melanoma drugs through a probabilistic knowledge graph;reflections on: knowledge graph fact prediction via knowledge-enriched tensor factorization;the semantic sensor network ontology, revamped;and reflections on: knowmore - knowledge base augmentation with structured web markup.
While there has been significant work on parallel graph processing, there has been very surprisingly little work on high-performance hypergraph processing. this paper presents a collection of efficient parallel algori...
详细信息
ISBN:
(纸本)9781450368186
While there has been significant work on parallel graph processing, there has been very surprisingly little work on high-performance hypergraph processing. this paper presents a collection of efficient parallel algorithms for hypergraph processing, including algorithms for betweenness centrality, maximal independent set, k-core decomposition, hyper-trees, hyperpaths, connected components, PageRank, and single-source shortest paths. For these problems, we either provide new parallel algorithms or more efficient implementations than prior work. Furthermore, our algorithms are theoretically-efficient in terms of work and depth. To implement our algorithms, we extend the Ligra graph processing framework to support hypergraphs, and our implementations benefit from graph optimizations including switching between sparse and dense traversals based on the frontier size, edge-aware parallelization, using buckets to prioritize processing of vertices, and compression. Our experiments on a 72-core machine and show that our algorithms obtain excellent parallel speedups, and are *** faster than algorithms in existing hypergraph processing frameworks.
the proceedings contain 45 papers. the topics discussed include: a peta-scalable CPU-GPU algorithm for global atmospheric simulations;adoption protocols for fanout-optimal fault-tolerant termination detection;betweenn...
ISBN:
(纸本)9781450319225
the proceedings contain 45 papers. the topics discussed include: a peta-scalable CPU-GPU algorithm for global atmospheric simulations;adoption protocols for fanout-optimal fault-tolerant termination detection;betweenness centrality: algorithms and implementations;complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU;fast concurrent queues for x86 processors;FASTLANE: improving performance of software transactional memory for low thread counts;Ligra: a lightweight graph processing framework for shared memory;ownership passing: efficient distributed memory programming on multi-core systems;parallel suffix array and least common prefix for the GPU;Streamscan: fast scan algorithms for GPUs without global barrier synchronization;using hardware transactional memory to correct and simplify a readers-writer lock algorithm;and exploring different automata representations for efficient regular expression matching on GPUs.
Chase and Lev's concurrent deque is a key data structure in shared-memory parallelprogramming and plays an essential role in work-stealing schedulers. We provide the first correctness proof of an optimized implem...
详细信息
In the sciences, it is common to use the so-called "big operator" notation to express the iteration of a binary operator (the reducer) over a collection of values. Such a notation typically assumes that the ...
详细信息
We explore the challenges in expressing and managing concurrency in browsers on mobile devices. Browsers are complex applications that implement multiple standards, need to support legacy behavior, and are highly dyna...
详细信息
In this paper we propose a novel approach which automatizes task partitioning in heterogeneous systems. Our framework is based on the Insieme Compiler and Runtime infrastructure [1]. the compiler translates a single-d...
详细信息
ISBN:
(纸本)9781450319225
In this paper we propose a novel approach which automatizes task partitioning in heterogeneous systems. Our framework is based on the Insieme Compiler and Runtime infrastructure [1]. the compiler translates a single-device OpenCL program into a multi-device OpenCL program. the runtime system then performs dynamic task partitioning based on an offline-generated prediction model. In order to derive the prediction model, we use a machine learning approach that incorporates static program features as well as dynamic, input sensitive features. Our approach has been evaluated over a suite of 23 programs and achieves performance improvements compared to an execution of the benchmarks on a single CPU and a single GPU only.
暂无评论