In this paper, we describe a parallel Branch-and-Bound (B&B) algorithm with a history-based domination technique, and we apply it to the Sequential Ordering Problem (SOP). To the best of our knowledge, the propose...
详细信息
ISBN:
(纸本)9781450392044
In this paper, we describe a parallel Branch-and-Bound (B&B) algorithm with a history-based domination technique, and we apply it to the Sequential Ordering Problem (SOP). To the best of our knowledge, the proposed algorithm is the first parallel B&B algorithm that includes a history-based domination technique and is the first parallel B&B algorithm for solving the SOP using a pure B&B approach. the proposed algorithm takes a pool-based approach and employs a collection of novel techniques that we have developed to achieve effective parallel exploration of the solution space, including parallel history domination, history table memory management, and a thread restart technique. the proposed algorithm was experimentally evaluated using the SOPLIB and TSPLIB benchmarks. the results show that using ten threads with a time limit of one hour on the medium-difficulty instances, the proposed algorithm gives a geometric-mean speedup of 19.9 on SOPLIB and 10.23 on TSPLIB, with super-linear speedups up to 65x seen on 17 instances.
the increasing demand in HPC to utilize accelerators has motivated the development of pragma-based directives to target these devices. OmpSs-2 and OpenACC are both directive-based solutions that allow application prog...
详细信息
ISBN:
(纸本)9781450392044
the increasing demand in HPC to utilize accelerators has motivated the development of pragma-based directives to target these devices. OmpSs-2 and OpenACC are both directive-based solutions that allow application programmers to utilize accelerators. the two leverage distinct types of parallelism: task parallelism and data parallelism, respectively. Non-trivial scientific applications can benefit from both types of available parallelism. However, the combination of pragma-based models is difficult to coordinate, as both assume full control and are unaware of each other at runtime. We propose an interoperation mechanism to enable novel composability across pragma-based programming models. We study and propose a clear separation of duties and implement our approach by augmenting the OmpSs-2 programming model, compiler and runtime to support OmpSs-2 + OpenACC programming.
programming languages using functions on collections of values, such as map, reduce, scan and filter, have been used for over fifty years. Such collections have proven to be particularly useful in the context of paral...
详细信息
ISBN:
(纸本)9781450392044
programming languages using functions on collections of values, such as map, reduce, scan and filter, have been used for over fifty years. Such collections have proven to be particularly useful in the context of parallelism because such functions are naturally parallel. However, if implemented naively they lead to the generation of temporary intermediate collections that can significantly increase memory usage and runtime. To avoid this pitfall, many approaches use "fusion" to combine operations and avoid temporary results. However, most of these approaches involve significant changes to a compiler and are limited to a small set of functions, such as maps and reduces. In this paper we present a library-based approach that fuses widely used operations such as scans, filters, and flattens. In conjunction with existing techniques, this covers most of the common operations on collections. Our approach is based on a novel technique which parallelizes over blocks, with streams within each block. We demonstrate the approach by implementing libraries targeting multicore parallelism in two languages: parallel ML and C++, which have very different semantics and compilers. To help users understand when to use the approach, we define a cost semantics that indicates when fusion occurs and how it reduces memory allocations. We present experimental results for a dozen benchmarks that demonstrate significant reductions in both time and space. In most cases the approach generates code that is near optimal for the machines it is running on.
We present a simple yet effective technique for improving performance of lock-based code using the hardware lock elision (HLE) feature in Intel's upcoming Haswell processor. We also describe how to extend Haswell&...
详细信息
ISBN:
(纸本)9781450319225
We present a simple yet effective technique for improving performance of lock-based code using the hardware lock elision (HLE) feature in Intel's upcoming Haswell processor. We also describe how to extend Haswell's HLE mechanism to achieve a similar effect to our lock elision scheme entirely in hardware.
Performance analysis is widely used to identify performance issues of parallel applications. However, complex communications and data dependence, as well as the interactions between different kinds of performance issu...
详细信息
ISBN:
(纸本)9781450392044
Performance analysis is widely used to identify performance issues of parallel applications. However, complex communications and data dependence, as well as the interactions between different kinds of performance issues make high-efficiency performance analysis even harder. Although a large number of performance tools have been designed, accurately pinpointing root causes for such complex performance issues still needs specific in-depth analysis. To implement each such analysis, significant human efforts and domain knowledge are normally required. To reduce the burden of implementing accurate performance analysis, we propose a domain specific programming framework, named PERFLOW. PERFLOW abstracts the step-by-step process of performance analysis as a dataflow graph. this dataflow graph consists of main performance analysis sub-tasks, called passes, which can either be provided by PERFLOW'S built-in analysis library, or be implemented by developers to meet their requirements. Moreover, to achieve effective analysis, we propose a Program Abstraction Graph to represent the performance of a program execution and then leverage various graph algorithms to automate the analysis. We demonstrate the efficacy of PERFLOW by three case studies of real-world applications with up to 700K lines of code. Results show that PERFLOW significantly eases the implementation of customized analysis tasks. In addition, PERFLOW is able to perform analysis and locate performance bugs automatically and effectively.
the Problem-Based Benchmark Suite (PBBS) is a set of benchmark problems designed for comparing algorithms, implementations and platforms. For each problem, the suite defines the problem in terms of the input-output re...
详细信息
ISBN:
(纸本)9781450392044
the Problem-Based Benchmark Suite (PBBS) is a set of benchmark problems designed for comparing algorithms, implementations and platforms. For each problem, the suite defines the problem in terms of the input-output relationship, and supplies a set of input instances along with input generators, a default implementation, code for checking correctness or accuracy, and a timing harness. the suite makes it possible to compare different algorithms, platforms (e.g. CPU vs CPU), and implementations using different programming languages or libraries. the purpose is to better understand how well a wide variety of problems parallelize, and what techniques/algorithms are most effective. the suite was first announced in 2012 with 14 benchmark problems. Here we describe some significant updates. In particular, we have added nine new benchmarks from a mix of problems in text processing, computational geometry and machine learning. We have further optimized the default implementations;several are the fastest available for multicore CPUs, often achieving near perfect speedup on the 72 core machine we test them on. the suite now also supplies significantly larger default test instances, as well as a broader variety, with many derived from real-world data.
In this paper, we evaluate the performance and usability of the parallelprogramming model OpenMP Superscalar (OmpSs), apply it to 10 different benchmarks and compare its performance with corresponding POSIX threads i...
详细信息
ISBN:
(纸本)9781450311601
In this paper, we evaluate the performance and usability of the parallelprogramming model OpenMP Superscalar (OmpSs), apply it to 10 different benchmarks and compare its performance with corresponding POSIX threads implementations.
暂无评论