ISBN (print): 9781450344449
Writing high-performance GPU implementations of graph algorithms can be challenging. In this paper, we argue that three optimizations, called throughput optimizations, are key to high performance for this application class. These optimizations describe a large implementation space, making it unrealistic for programmers to implement them by hand. To address this problem, we have implemented these optimizations in a compiler that produces CUDA code from an intermediate-level program representation called IrGL. Compared to state-of-the-art handwritten CUDA implementations of eight graph applications, code generated by the IrGL compiler is up to 5.95x faster (median 1.4x) for five applications and never more than 30% slower for the others. Throughput optimizations contribute an improvement of up to 4.16x (median 1.4x) to the performance of unoptimized IrGL code.
The Galois system can automatically parallelize irregular algorithms written in a serial programming model and execute them efficiently on nonuniform memory access (NUMA) machines. Experimental results for five complex irregular algorithms show that the system scales up to 420x on large NUMA systems at 512 threads.
ISBN (print): 9781450334686
We describe a system that uses automated planning to synthesize correct and efficient parallel graph programs from high-level algorithmic specifications. Automated planning allows us to use constraints to declaratively encode program transformations such as scheduling, implementation selection, and insertion of synchronization. Each plan emitted by the planner satisfies all constraints simultaneously, and corresponds to a composition of these transformations. In this way, we obtain an integrated compilation approach for a very challenging problem domain. We have used this system to synthesize parallel programs for four graph problems: triangle counting, maximal independent set computation, preflow-push maxflow, and connected components. Experiments on a variety of inputs show that the synthesized implementations perform competitively with hand-written, highly-tuned code.
ISBN (print): 9781479910212
Precise pointer analysis is a problem of interest to both the compiler and the program verification community. Flow-sensitivity is an important dimension of pointer analysis that affects the precision of the final result computed. Scaling flow-sensitive pointer analysis to millions of lines of code is a major challenge. Recently, staged flow-sensitive pointer analysis has been proposed, which exploits a sparse representation of program code created by a staged analysis. In this paper we formulate staged flow-sensitive pointer analysis as a graph-rewriting problem. Graph rewriting has already been used for flow-insensitive analysis; however, formulating flow-sensitive pointer analysis as a graph-rewriting problem adds additional challenges due to the nature of flow-sensitivity. We implement our parallel algorithm using Intel Threading Building Blocks and demonstrate considerable scaling (up to 2.6x) for 8 threads on a set of 10 benchmarks. Compared to the sequential implementation of staged flow-sensitive analysis, a single-threaded execution of our implementation performs better on 8 of the benchmarks.
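To make the graph-rewriting view concrete, the sketch below shows a worklist-driven, inclusion-based points-to solver for the simpler flow-insensitive setting that the abstract cites as prior work: load and store constraints rewrite the constraint graph by adding copy edges as points-to facts are discovered. The encoding and all names are illustrative; this is not the paper's staged, flow-sensitive algorithm or its Threading Building Blocks implementation.

```cpp
// Illustrative sketch: inclusion-based points-to propagation viewed as graph
// rewriting. Nodes are pointer variables and abstract objects; a copy edge
// p -> q means pts(p) is included in pts(q). Load (q = *p) and store (*p = q)
// constraints rewrite the graph by adding copy edges as pts(p) grows.
#include <cstdio>
#include <queue>
#include <set>
#include <vector>

struct ConstraintGraph {
    int n;
    std::vector<std::set<int>> pts;        // points-to sets (node ids of objects)
    std::vector<std::set<int>> copyEdges;  // copy edges: src -> dst
    std::vector<std::vector<int>> loads;   // loads[p]  holds q for each  q = *p
    std::vector<std::vector<int>> stores;  // stores[p] holds q for each *p = q

    explicit ConstraintGraph(int nodes)
        : n(nodes), pts(nodes), copyEdges(nodes), loads(nodes), stores(nodes) {}

    // Graph rewriting step: adding a copy edge re-activates its source.
    void addCopy(int src, int dst, std::queue<int>& worklist) {
        if (copyEdges[src].insert(dst).second) worklist.push(src);
    }

    void solve() {
        std::queue<int> worklist;
        for (int v = 0; v < n; ++v) worklist.push(v);
        while (!worklist.empty()) {
            int p = worklist.front(); worklist.pop();
            // Rewrite loads/stores through p into copy edges for each object o.
            for (int o : pts[p]) {
                for (int q : loads[p])  addCopy(o, q, worklist);
                for (int q : stores[p]) addCopy(q, o, worklist);
            }
            // Propagate pts(p) along outgoing copy edges.
            for (int q : copyEdges[p]) {
                bool grew = false;
                for (int o : pts[p]) grew |= pts[q].insert(o).second;
                if (grew) worklist.push(q);
            }
        }
    }
};

int main() {
    // Nodes: 0 = object a, 1 = object b, 2 = p, 3 = q, 4 = r.
    // Program: p = &a; q = &b; r = p; *p = q;  =>  pts(a) should contain b.
    ConstraintGraph g(5);
    g.pts[2].insert(0);
    g.pts[3].insert(1);
    g.copyEdges[2].insert(4);
    g.stores[2].push_back(3);
    g.solve();
    std::printf("pts(a) contains b: %s\n", g.pts[0].count(1) ? "yes" : "no");
}
```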
ISBN (print): 9781450319225
Betweenness centrality is an important metric in the study of social networks, and several algorithms for computing this metric exist in the literature. This paper makes three contributions. First, we show that the problem of computing betweenness centrality can be formulated abstractly in terms of a small set of operators that update the graph. Second, we show that existing parallel algorithms for computing betweenness centrality can be viewed as implementations of different schedules for these operators, permitting all these algorithms to be formulated in a single framework. Third, we derive a new asynchronous parallel algorithm for betweenness centrality that (i) works seamlessly for both weighted and unweighted graphs, (ii) can be applied to large graphs, and (iii) is able to extract large amounts of parallelism. We implemented this algorithm and compared it against a number of publicly available implementations of previous algorithms on two different multicore architectures. Our results show that the new algorithm is the best-performing one in most cases, particularly for large graphs and large thread counts, and is always competitive with other algorithms.
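For reference, the sketch below is the classical sequential Brandes computation for unweighted graphs, whose forward BFS and backward dependency-accumulation phases are what parallel formulations reorganize under different schedules. It is only an illustrative baseline, not the asynchronous algorithm described in the abstract.

```cpp
// Illustrative sketch: Brandes' sequential betweenness centrality for an
// unweighted graph given as adjacency lists.
#include <cstdio>
#include <queue>
#include <stack>
#include <vector>

std::vector<double> betweenness(const std::vector<std::vector<int>>& adj) {
    const int n = static_cast<int>(adj.size());
    std::vector<double> bc(n, 0.0);
    for (int s = 0; s < n; ++s) {
        std::vector<std::vector<int>> pred(n);  // shortest-path predecessors
        std::vector<long long> sigma(n, 0);     // number of shortest paths from s
        std::vector<int> dist(n, -1);
        std::vector<double> delta(n, 0.0);      // dependency of s on each vertex
        std::stack<int> order;                  // vertices in BFS finish order
        std::queue<int> q;
        sigma[s] = 1; dist[s] = 0; q.push(s);
        while (!q.empty()) {                    // forward BFS phase
            int v = q.front(); q.pop();
            order.push(v);
            for (int w : adj[v]) {
                if (dist[w] < 0) { dist[w] = dist[v] + 1; q.push(w); }
                if (dist[w] == dist[v] + 1) { sigma[w] += sigma[v]; pred[w].push_back(v); }
            }
        }
        while (!order.empty()) {                // backward accumulation phase
            int w = order.top(); order.pop();
            for (int v : pred[w])
                delta[v] += (double(sigma[v]) / double(sigma[w])) * (1.0 + delta[w]);
            if (w != s) bc[w] += delta[w];
        }
    }
    // For undirected graphs each pair is counted in both directions; halve if desired.
    return bc;
}

int main() {
    // Path graph 0-1-2-3: the two interior vertices each score 4.0 before halving.
    std::vector<std::vector<int>> adj = {{1}, {0, 2}, {1, 3}, {2}};
    std::vector<double> bc = betweenness(adj);
    for (int v = 0; v < 4; ++v) std::printf("bc[%d] = %.1f\n", v, bc[v]);
}
```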
ISBN (print): 9781450315616
Algorithms in new application areas like machine learning and network analysis use "irregular" data structures such as graphs, trees and sets. Writing efficient parallel code in these problem domains is very challenging because it requires the programmer to make many choices: a given problem can usually be solved by several algorithms, each algorithm may have many implementations, and the best choice of algorithm and implementation can depend not only on the characteristics of the parallel platform but also on properties of the input data such as the structure of the graph. One solution is to permit the application programmer to experiment with different algorithms and implementations without writing every variant from scratch. Auto-tuning to find the best variant is a more ambitious solution. These solutions require a system for automatically producing efficient parallel implementations from high-level specifications. Elixir, the system described in this paper, is the first step towards this ambitious goal. Application programmers write specifications that consist of an operator, which describes the computations to be performed, and a schedule for performing these computations. Elixir uses sophisticated inference techniques to produce efficient parallel code from such specifications. We used Elixir to automatically generate many parallel implementations for three irregular problems: breadth-first search, single-source shortest path, and betweenness centrality computation. Our experiments show that the best generated variants can be competitive with handwritten code for these problems from other research groups; for some inputs, they even outperform the handwritten versions.
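The operator/schedule separation can be illustrated by hand for single-source shortest path, one of the three problems mentioned: in the sketch below, edge relaxation is the operator and the worklist policy is the schedule, with a FIFO and a priority-ordered schedule plugged into the same solver. This is plain C++ written for illustration, not Elixir's specification language or its generated code.

```cpp
// Illustrative sketch of the operator/schedule separation for SSSP: the
// relaxation operator is fixed, while the schedule (worklist policy) varies.
#include <cstdio>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Edge { int dst; int weight; };
using Graph = std::vector<std::vector<Edge>>;
constexpr int INF = std::numeric_limits<int>::max();

struct FifoSchedule {                       // chaotic-relaxation flavour
    std::queue<int> q;
    void push(int v, int /*prio*/) { q.push(v); }
    int pop() { int v = q.front(); q.pop(); return v; }
    bool empty() const { return q.empty(); }
};

struct PrioritySchedule {                   // Dijkstra-like ordering; stale
    using Item = std::pair<int, int>;       // entries are harmless (lazy deletion)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> q;
    void push(int v, int prio) { q.push({prio, v}); }
    int pop() { int v = q.top().second; q.pop(); return v; }
    bool empty() const { return q.empty(); }
};

template <typename Schedule>
std::vector<int> sssp(const Graph& g, int source) {
    std::vector<int> dist(g.size(), INF);
    Schedule worklist;                      // the schedule: decides processing order
    dist[source] = 0;
    worklist.push(source, 0);
    while (!worklist.empty()) {
        int v = worklist.pop();
        for (const Edge& e : g[v]) {        // the operator: relax edges out of v
            int cand = dist[v] + e.weight;
            if (cand < dist[e.dst]) {
                dist[e.dst] = cand;
                worklist.push(e.dst, cand); // re-schedule the affected neighbor
            }
        }
    }
    return dist;
}

int main() {
    Graph g(4);
    g[0] = {{1, 1}, {2, 4}};
    g[1] = {{2, 2}, {3, 6}};
    g[2] = {{3, 3}};
    std::vector<int> d1 = sssp<FifoSchedule>(g, 0);
    std::vector<int> d2 = sssp<PrioritySchedule>(g, 0);
    std::printf("dist[3]: FIFO schedule = %d, priority schedule = %d\n", d1[3], d2[3]);
}
```

Both schedules converge to the same distances; they differ only in how much wasted re-relaxation they perform, which is exactly the kind of trade-off a schedule specification exposes.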
ISBN (digital): 9783642311253; ISBN (print): 9783642311246
Optimistic parallelization is a promising approach for the parallelization of irregular algorithms: potentially interfering tasks are launched dynamically, and the runtime system detects conflicts between concurrent activities, aborting and rolling back conflicting tasks. However, parallelism in irregular algorithms is very complex. In a regular algorithm like dense matrix multiplication, the amount of parallelism can usually be expressed as a function of the problem size, so it is reasonably straightforward to determine how many processors should be allocated to execute a regular algorithm of a certain size (this is called the processor allocation problem). In contrast, parallelism in irregular algorithms can be a function of input parameters, and the amount of parallelism can vary dramatically during the execution of the irregular algorithm. Therefore, the processor allocation problem for irregular algorithms is very difficult. In this paper, we describe the first systematic strategy for addressing this problem. Our approach is based on a construct called the conflict graph, which (i) provides insight into the amount of parallelism that can be extracted from an irregular algorithm, and (ii) can be used to address the processor allocation problem for irregular algorithms. We show that this problem is related to a generalization of the unfriendly seating problem and, by extending Turán's theorem, we obtain a worst-case class of problems for optimistic parallelization, which we use to derive a lower bound on the exploitable parallelism. Finally, using some theoretically derived properties and some experimental facts, we design a quick and stable control strategy for solving the processor allocation problem heuristically.
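As background for the Turán connection (the classical bound, not the paper's extension): the activities that can safely execute in one parallel step form an independent set of the conflict graph G, so the available parallelism is governed by the independence number, which Turán's theorem bounds from below in terms of the average degree.

```latex
% Classical Turan-type lower bound on the independence number of the conflict
% graph G with n activities; alpha(G) is the largest set of activities that
% can run concurrently without conflicts.
\[
  \alpha(G) \;\ge\; \frac{n}{\bar{d} + 1},
  \qquad
  \bar{d} \;=\; \frac{2\,|E(G)|}{n}.
\]
```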
ISBN (print): 9781450304900
Computations on unstructured graphs are challenging to parallelize because dependences in the underlying algorithms are usually complex functions of runtime data values, thwarting static parallelization. One promising general-purpose parallelization strategy for these algorithms is optimistic parallelization. This paper identifies the optimization of optimistically parallelized graph programs as a new application area, and develops the first shape analysis for addressing this problem. Our shape analysis identifies failsafe points in the program after which the execution is guaranteed not to abort and backup copies of modified data are not needed; additionally, the analysis can be used to eliminate redundant conflict checking. It uses two key ideas: a novel top-down heap abstraction that controls state-space explosion, and a strategy for predicate discovery that exploits common patterns of data structure usage. We implemented the shape analysis in TVLA and used it to optimize benchmarks from the Lonestar suite. The optimized programs were executed on the Galois system. The analysis was successful in eliminating all costs related to rollback logging for our benchmarks. Additionally, it reduced the number of lock acquisitions by a factor ranging from 10x to 50x, depending on the application and the number of threads. These optimizations were effective in reducing the running times of the benchmarks by factors of 2x to 12x.
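A schematic of the failsafe-point idea, written directly rather than inferred by an analysis (the node structure and locking discipline below are illustrative, not the Galois API or the analyzed benchmarks): all conflict checks precede any mutation, so the point after the last successful lock acquisition is failsafe and the writes that follow need no backup copies.

```cpp
// Schematic sketch: a cautious operator whose conflict checks (per-node lock
// acquisitions) all happen before any mutation. Once the last lock is held,
// the activity can no longer abort, so no undo log entries are needed.
#include <mutex>

struct Node {
    std::mutex lock;   // stands in for conflict detection on this node
    int value = 0;
};

// Returns false if a conflict was detected; nothing was modified in that case,
// so "rollback" is simply releasing whatever locks were already held.
bool relax_edge(Node& src, Node& dst, int weight) {
    std::unique_lock<std::mutex> l1(src.lock, std::try_to_lock);
    if (!l1.owns_lock()) return false;      // conflict: abort, no undo needed
    std::unique_lock<std::mutex> l2(dst.lock, std::try_to_lock);
    if (!l2.owns_lock()) return false;      // conflict: abort, no undo needed
    // ---- failsafe point: all checks done, execution can no longer abort ----
    if (src.value + weight < dst.value)     // mutations after this point need
        dst.value = src.value + weight;     // no backup copies
    return true;
}

int main() {
    Node a, b;
    a.value = 0; b.value = 10;
    relax_edge(a, b, 3);                    // single-threaded use: b.value becomes 3
    return b.value == 3 ? 0 : 1;
}
```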