Search Results - Inner Mongolia University Library

Operator-level GPU-Accelerated Branch and Bound algorithms

Procedia Computer Science, 2013, vol. 18, pp. 280-289

Authors: I. Chakroun, N. Melab (Université Lille 1, LIFL/UMR CNRS 8022 - INRIA Lille Nord Europe, 59655 Villeneuve d’Ascq cedex, France)

Branch-and-Bound (B&B) algorithms are well-known tree-based exploratory methods for solving NP-hard discrete optimization problems to optimality. The construction of the B&B tree and its exploration are performed using four operators: branching, bounding, selection and pruning. Such algorithms are irregular, which makes their parallel design and implementation on GPU accelerators challenging. Among the few existing related works, we have recently revisited the bounding operator on GPU. The reported results show that speedups of up to ×100 can be obtained on recent GPU cards. In this paper, we address the GPU-based design and implementation of B&B algorithms considering the branching and pruning operators as well as the bounding one. The proposed template transforms the unpredictable and irregular workload associated with the explored B&B tree into regular data-parallel kernels optimized for the SIMD-based execution model of GPUs. Thread divergence and uncoalesced memory accesses are considered in the optimization process. The proposed approach has been evaluated on the Flow-Shop scheduling problem and compared to another GPU-based strategy and to a cluster-of-workstations (COW) based approach. The reported results demonstrate the efficiency of the proposed approach over the two other ones. Speedups of up to ×160 are obtained for large problem instances using an Nvidia Tesla C2050 hardware configuration.

Keywords: irregular algorithms; GPU Computing; Parallel Branch and Bound; Flow-Shop Scheduling Problem
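As a rough illustration of the four operators named in the abstract above, here is a minimal sequential sketch for a small permutation flow-shop instance. It is not the authors' GPU template: the function names, the best-first selection rule, and the simple last-machine lower bound are all assumptions chosen to keep the example short.

```python
import heapq


def completion_times(seq, p):
    """Completion time on each machine after running the jobs in seq in order.

    p[j][k] is the processing time of job j on machine k.
    """
    m = len(p[0])
    c = [0] * m
    for j in seq:
        c[0] += p[j][0]
        for k in range(1, m):
            c[k] = max(c[k], c[k - 1]) + p[j][k]
    return c


def lower_bound(seq, p):
    """Bounding: the last machine still has to process every unscheduled job."""
    remaining = set(range(len(p))) - set(seq)
    return completion_times(seq, p)[-1] + sum(p[j][-1] for j in remaining)


def branch_and_bound(p):
    """Return (best makespan, best job order) for processing-time matrix p."""
    n = len(p)
    best_cost, best_seq = float("inf"), None
    heap = [(lower_bound((), p), ())]
    while heap:
        lb, seq = heapq.heappop(heap)          # selection: best-first
        if lb >= best_cost:                    # pruning: cannot improve
            continue
        if len(seq) == n:                      # complete schedule: lb is its makespan
            best_cost, best_seq = lb, seq
            continue
        for j in range(n):                     # branching: append an unscheduled job
            if j not in seq:
                child = seq + (j,)
                child_lb = lower_bound(child, p)   # bounding
                if child_lb < best_cost:
                    heapq.heappush(heap, (child_lb, child))
    return best_cost, best_seq
```

Calling `branch_and_bound([[3, 2], [1, 4], [2, 1]])` explores a three-job, two-machine instance. The number of children generated and pruned per node depends on the data, which is the kind of irregular workload the abstract refers to.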

Processor Allocation for Optimistic Parallelization of Irregular Programs

12th International Conference on Computational Science and Its Applications (ICCSA)

Authors: Versaci, Francesco; Pingali, Keshav (Univ Padua TU Wien I-35100AOGJ Padua Italy; Univ Texas Austin, Austin, TX, USA)

ISBN: (digital) 9783642311253

ISBN: (print) 9783642311246; 9783642311253

Optimistic parallelization is a promising approach for the parallelization of irregular algorithms: potentially interfering tasks are launched dynamically, and the runtime system detects conflicts between concurrent activities, aborting and rolling back conflicting tasks. However, parallelism in irregular algorithms is very complex. In a regular algorithm like dense matrix multiplication, the amount of parallelism can usually be expressed as a function of the problem size, so it is reasonably straightforward to determine how many processors should be allocated to execute a regular algorithm of a certain size (this is called the processor allocation problem). In contrast, parallelism in irregular algorithms can be a function of input parameters, and the amount of parallelism can vary dramatically during the execution of the irregular algorithm. Therefore, the processor allocation problem for irregular algorithms is very difficult. In this paper, we describe the first systematic strategy for addressing this problem. Our approach is based on a construct called the conflict graph, which (i) provides insight into the amount of parallelism that can be extracted from an irregular algorithm, and (ii) can be used to address the processor allocation problem for irregular algorithms. We show that this problem is related to a generalization of the unfriendly seating problem and, by extending Turan's theorem, we obtain a worst-case class of problems for optimistic parallelization, which we use to derive a lower bound on the exploitable parallelism. Finally, using some theoretically derived properties and some experimental facts, we design a quick and stable control strategy for solving the processor allocation problem heuristically.

Keywords: irregular algorithms; Optimistic parallelization; Automatic parallelization; Amorphous data-parallelism; Processor allocation; Unfriendly seating; Turan's theorem
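The relationship between a conflict graph and processor allocation described in the abstract above can be sketched with a toy simulation. This is only an illustrative model, not the paper's control strategy: it assumes that each round launches `m` randomly chosen tasks, that a task commits only if none of its conflict-graph neighbours has already committed in the same round, and that all names (`optimistic_round`, `abort_ratio`) are hypothetical.

```python
import random


def optimistic_round(conflicts, m, rng):
    """Launch m random tasks; return (committed, aborted) task lists.

    conflicts maps each task to the set of tasks it conflicts with.
    Assumed model: tasks are examined in launch order, and a task aborts
    if one of its neighbours has already committed in this round.
    """
    batch = rng.sample(sorted(conflicts), m)
    committed, aborted, done = [], [], set()
    for task in batch:
        if conflicts[task] & done:
            aborted.append(task)       # conflict detected: roll back
        else:
            committed.append(task)
            done.add(task)
    return committed, aborted


def abort_ratio(conflicts, m, trials=1000, seed=0):
    """Average fraction of launched tasks that abort when using m processors."""
    rng = random.Random(seed)
    total_aborted = sum(len(optimistic_round(conflicts, m, rng)[1])
                        for _ in range(trials))
    return total_aborted / (trials * m)
```

Under this model the committed tasks always form an independent set of the conflict graph. Comparing `abort_ratio` for several values of `m` on the same graph gives a feel for why launching more tasks than the graph can support wastes work.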

Brief Announcement: Processor Allocation for Optimistic Parallelization of Irregular Programs

23rd Annual Symposium on Parallelism in Algorithms and Architectures

Authors: Versaci, Francesco; Pingali, Keshav (Univ Padua, Dept Informat Engn, I-35100 Padua, Italy)

ISBN: (print) 9781450307437

Keywords: irregular algorithms; Optimistic parallelization; Amorphous data-parallelism; Processor allocation; Turan's theorem

Leveraging Data-Structure Semantics for Efficient Algorithmic Parallelism

8th ACM International Conference on Computing Frontiers (CF)

Authors: Cledat, Romain; Ravichandran, Kaushik; Pande, Santosh (Georgia Inst Technol, Coll Comp, Sch Comp Sci, Atlanta, GA 30332, USA)

ISBN: (print) 9781450306980

Irregular or pointer-based structures such as graphs and trees are commonly used in algorithms dealing with sparse data. Given their reliance on pointers, these algorithms are difficult to analyze, and the structure of their memory accesses is obfuscated, which makes the extraction of parallelism difficult. In this work, we present a framework that is capable of reasoning about the semantics of the dynamic data footprints of operations to determine their potential overlap. We leverage the knowledge the programmer has about access patterns for the algorithm but is currently unable to express. This knowledge allows our runtime to make either a parallelization decision or throttle concurrency to improve performance in Software Transactional Memories (STMs) [6]. Our framework relies on programmer-supplied predicates that are appropriately evaluated at runtime and utilized to probabilistically assert certain properties about data footprints. We present simple abstractions and a low-overhead runtime to support our framework. We demonstrate our work by parallelizing a graph-coloring benchmark and by improving the transactional performance of benchmarks from the STAMP suite.

Keywords: Software Transactional Memories; irregular algorithms; Access Pattern Semantics

Towards a Science of Parallel Programming

19th International Conference on Parallel Architectures and Compilation Techniques

Authors: Pingali, Keshav (Univ Texas Austin, Dept Comp Sci, Austin, TX 78712, USA)

How do we give parallel programming a more scientific foundation? In this talk, I will discuss the approach we are taking in the Galois project.

ISBN: (print) 9781450301787

Keywords: irregular algorithms; Graph computations; Operator Formulation of algorithms; Amorphous Data-parallelism; Optimistic Parallelization; Multicore processors

Scalable Communication Protocols for Dynamic Sparse Data Exchange

15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Hoefler, Torsten; Siebert, Christian; Lumsdaine, Andrew (Indiana Univ, Open Syst Lab, Bloomington, IN 47405, USA)

ISBN: (print) 9781605587080

Many large-scale parallel programs follow a bulk synchronous parallel (BSP) structure with distinct computation and communication phases. Although the communication phase in such programs may involve all (or large numbers) of the participating processes, the actual communication operations are usually sparse in nature. As a result, communication phases are typically expressed explicitly using point-to-point communication operations or collective operations. We define the dynamic sparse data-exchange (DSDE) problem and derive bounds in the well-known LogGP model. While current approaches work well with static applications, they run into limitations as modern applications grow in scale, and as the problems that are being solved become increasingly irregular and dynamic. To enable the compact and efficient expression of the communication phase, we develop suitable sparse communication protocols for irregular applications at large scale. We discuss different irregular applications and show the sparsity in the communication for real-world input data. We discuss the time and memory complexity of commonly used protocols for the DSDE problem and develop NBX, a novel fast algorithm with constant memory overhead for solving it. Algorithm NBX improves the runtime of a sparse data-exchange among 8,192 processors on BlueGene/P by a factor of 5.6. In an application study, we show improvements of up to a factor of 28.9 for a parallel breadth-first search on 8,192 BlueGene/P processors.

Keywords: Sparse data exchange; irregular algorithms; Alltoall; Distributed termination; Nonblocking collective operations

Task Pool Teams: a hybrid programming environment for irregular algorithms on SMP clusters

Concurrency and Computation: Practice & Experience, 2006, vol. 18, no. 12, pp. 1575-1594

Authors: Hippold, Judith; Ruenger, Gudula (Tech Univ Chemnitz, Dept Comp Sci, D-09107 Chemnitz, Germany)

Clusters of symmetric multiprocessors (SMPs) are popular platforms for parallel programming since they provide large computational power for a reasonable price. For irregular application programs with dynamically changing computation and data access behavior, a flexible programming model is needed to achieve efficiency. In this paper we propose Task Pool Teams, a hybrid parallel programming environment to realize irregular algorithms on clusters of SMPs. Task Pool Teams combine task pools on single cluster nodes by an explicit message passing layer. They offer load balance together with multi-threaded, asynchronous communication. Appropriate communication protocols and task pool implementations are provided and accessible by an easy-to-use application programmer interface. As application examples we present a branch and bound algorithm and the hierarchical radiosity algorithm. Copyright (c) 2006 John Wiley & Sons, Ltd.

Keywords: irregular algorithms; hybrid programming; SMP clusters

Towards an adaptive task pool implementation

22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS 2008)

Authors: Hofmann, M.; Ruenger, G. (Tech Univ Chemnitz, Dept Comp Sci, Chemnitz, Germany)

ISBN: (print) 9781424416936

Task pools can be used to achieve the dynamic load balancing that is required for an efficient parallel implementation of irregular applications. However, the performance strongly depends on a task pool implementation that is well suited for the specific application. This paper introduces an adaptive task pool implementation that enables a step-wise transition between the common strategies of central and distributed task pools. The influence of the task size on the parallel performance is investigated and it is shown that the adaptive implementation provides the flexibility to adapt to different situations. Performance results from benchmark programs and from an irregular application for anomalous diffusion simulation are presented to demonstrate the need for an adaptive strategy. It is shown that profiling information about the overhead of the task pool implementation can be used to determine an optimal task pool strategy.

Keywords: adaptive software; irregular algorithms; multithreading; parallel computing; profiling; task pools

A comparison of task pools for dynamic load balancing of irregular algorithms

Concurrency and Computation: Practice & Experience, 2004, vol. 16, no. 1, pp. 1-47

Authors: Korch, M; Rauber, T (Univ Bayreuth, Lehrstuhl Angew Informat 2, Fac Math Phys & Comp Sci, D-95440 Bayreuth, Germany)

Since a static work distribution does not allow for satisfactory speed-ups of parallel irregular algorithms, there is a need for a dynamic distribution of work and data that can be adapted to the runtime behavior of the algorithm. Task pools are data structures which can distribute tasks dynamically to different processors where each task specifies computations to be performed and provides the data for these computations. This paper discusses the characteristics of task-based algorithms and describes the implementation of selected types of task pools for shared-memory multiprocessors. Several task pools have been implemented in C with POSIX threads and in Java. The task pools differ in the data structures to store the tasks, the mechanism to achieve load balance, and the memory manager used to store the tasks. Runtime experiments have been performed on three different shared-memory systems using a synthetic algorithm, the hierarchical radiosity method, and a volume rendering algorithm. Copyright (C) 2004 John Wiley & Sons, Ltd.

Keywords: task pools; dynamic task scheduling; irregular algorithms; hierarchical radiosity; volume rendering; performance evaluation; threads
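The task pool idea described in the abstract above can be sketched as a single central pool shared by worker threads. This is a minimal illustration only, not one of the C/POSIX or Java implementations the paper evaluates: `run_task_pool`, `split_sum`, and the convention that a task returns a value plus a list of newly created tasks are assumptions made for the example.

```python
import queue
import threading


def run_task_pool(initial_tasks, execute, num_workers=4):
    """Run tasks from one central pool; execute(task) -> (value, new_tasks)."""
    pool = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            task = pool.get()
            try:
                if task is None:               # sentinel: shut this worker down
                    return
                value, children = execute(task)
                with lock:
                    results.append(value)
                for child in children:         # tasks may create tasks dynamically
                    pool.put(child)
            finally:
                pool.task_done()               # after children are enqueued

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for task in initial_tasks:
        pool.put(task)
    pool.join()                                # waits for dynamically created tasks too
    for _ in threads:
        pool.put(None)
    for t in threads:
        t.join()
    return results


def split_sum(task):
    """Example task: sum the integers in [lo, hi), splitting unevenly."""
    lo, hi = task
    if hi - lo <= 2:
        return sum(range(lo, hi)), []
    mid = lo + (hi - lo) // 3                  # uneven split gives irregular subtasks
    return 0, [(lo, mid), (mid, hi)]
```

For example, `sum(run_task_pool([(0, 100)], split_sum))` adds up the integers below 100 using tasks of varying size. Because children are enqueued before the parent calls `task_done`, `pool.join()` cannot return while work is still pending. A single shared queue is the simplest variant; distributed pools with work stealing trade that simplicity for lower contention.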
