ISBN (Print): 1595936025
Two major trends are converging to reshape the landscape of concurrent object-oriented programming languages. First, trends in modern architectures (multi-core, accelerators, high performance clusters such as Blue Gene) are making concurrency and distribution inescapable for large classes of OO programmers. Second, experience with first-generation concurrent OO languages (e.g. Java threads and synchronization) has revealed several drawbacks of unstructured threads with lock-based synchronization. X10 is a second-generation OO language designed to address both programmer productivity and parallel performance for modern architectures. It extends sequential Java with a handful of constructs for concurrency and distribution. It introduces a clustered address space to deal with distribution. A computation is thought of as running at multiple places, with many simultaneous activities operating in each place. Objects and activities, once created in a particular place, stay confined to that place. However, a data-structure (object) allocated in one place may contain a reference to an object allocated in another place. (Thus X10 supports a partitioned global address space.) X10 is an explicitly concurrent language. It provides constructs for lightweight asynchrony, making it easy for programmers to write code for target architectures that provide massive parallelism. It provides for recursive fork-join parallelism for structured concurrency. It provides for termination detection so that collections of activities may be reliably sequenced (even if they run across multiple places). It provides for a very simple form of atomic blocks in lieu of locks for mutual exclusion. These constructs can be used to define more sophisticated synchronization constructs such as futures and clocks. X10 supports a rich notion of multi-dimensional index spaces (regions), together with a rich set of operations on regions. Regions are first-class data-structures - they can be produced dynamically, st...
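The abstract above does not show X10 syntax, so as a rough analogue here is a hypothetical plain-Java sketch (using java.util.concurrent) of the pattern it describes: spawn lightweight activities, detect their termination before proceeding, and update shared state atomically. The class and method names are illustrative only and are not X10's constructs.

```java
import java.util.concurrent.Phaser;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical plain-Java analogue of the pattern described above:
// spawn lightweight activities, wait for all of them to terminate
// (in the spirit of X10's finish/async), and update shared state atomically
// (in the spirit of X10's atomic blocks).
public class ForkJoinSketch {
    public static void main(String[] args) {
        AtomicLong sum = new AtomicLong();   // stands in for an atomic update of shared state
        Phaser done = new Phaser(1);         // termination detection for spawned activities

        for (int i = 0; i < 8; i++) {
            final int id = i;
            done.register();                 // one party per spawned activity
            new Thread(() -> {               // a lightweight activity
                long local = work(id);
                sum.addAndGet(local);        // atomic update
                done.arriveAndDeregister();  // signal termination
            }).start();
        }

        done.arriveAndAwaitAdvance();        // block until every spawned activity has terminated
        System.out.println("sum = " + sum.get());
    }

    static long work(int id) { return (long) id * id; }  // placeholder computation
}
```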
The disk subsystem is known to be a major contributor to the overall power consumption of high-end parallel systems. Past research proposed several architectural-level techniques to reduce disk power by taking advantage of idle periods experienced by disks. Although such techniques have been known to be effective in certain cases, they share a common drawback: they operate in a reactive manner, i.e., they control disk power by observing past disk activity (for example, idle and active periods) and estimating future ones. Consequently, they can miss opportunities for saving power and incur significant performance penalties due to inaccuracies in predicting idle and active times. Motivated by this observation, this paper proposes and evaluates a compiler-driven approach to reducing the disk power consumption of array-based scientific applications executing on parallel architectures. The proposed approach exposes disk layout information to the compiler, allowing it to derive the disk access pattern, i.e., the order in which parallel disks are accessed. This paper demonstrates two uses of this information. First, we can implement proactive disk power management, i.e., we can select the most appropriate power-saving strategy and disk-preactivation strategy based on the compiler-predicted future idle and active periods of parallel disks. Second, we can restructure the application code to increase the length of idle disk periods, which leads to better exploitation of available power-saving capabilities. We implemented both these approaches within an optimizing compiler and tested their effectiveness using a set of benchmark codes from the Spec 2000 suite and a disk power simulator. Our results show that the compiler-driven disk power management is very promising. The experimental results also reveal that, although proactive disk power management is very effective, code restructuring for disk power achieves additional energy savings across all the benchmarks tested, and these savings a...
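As a toy illustration of the proactive decision described above, the following hypothetical Java sketch chooses whether to spin a disk down for a compiler-predicted idle interval and when to pre-activate it before the next predicted access. The cost constants and method names are assumptions for illustration, not the paper's actual algorithm or simulator parameters.

```java
// Hypothetical sketch of proactive disk power management driven by
// compiler-predicted idle intervals (not the paper's actual algorithm).
public class ProactiveDiskPower {

    // Assumed cost model: only spin down if the predicted idle time
    // exceeds the break-even cost of spinning down plus spinning back up.
    static final double SPIN_DOWN_SEC = 2.0;
    static final double SPIN_UP_SEC   = 6.0;

    /** Decide whether a disk should be spun down for a predicted idle period. */
    static boolean shouldSpinDown(double predictedIdleSec) {
        return predictedIdleSec > SPIN_DOWN_SEC + SPIN_UP_SEC;
    }

    /** Pre-activate early enough that the disk is ready when the next access arrives. */
    static double preactivationTime(double nextAccessTimeSec) {
        return Math.max(0.0, nextAccessTimeSec - SPIN_UP_SEC);
    }

    public static void main(String[] args) {
        double idle = 45.0;        // compiler-predicted idle interval (seconds)
        double nextAccess = 45.0;  // predicted time of the next access to this disk
        if (shouldSpinDown(idle)) {
            System.out.printf("spin down now; start spin-up at t=%.1fs%n",
                              preactivationTime(nextAccess));
        } else {
            System.out.println("idle period too short; keep the disk spinning");
        }
    }
}
```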
The proceedings contain 20 papers from the 2003 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'03). The topics discussed include: using thread-level speculation to simplify manual parallelization; toward efficient and robust software speculative parallelization on multiprocessors; improving server software support for simultaneous multithreaded processors; programming the FlexRAM parallel intelligent memory system; and automated application-level checkpointing of MPI programs.
ISBN (Print): 9781595936028
The polyhedral model is a well developed formalism and has been extensively used in a variety of contexts, viz. the automatic parallelization of loop programs, program verification, locality, hardware generation and, more recently, the automatic reduction of asymptotic program complexity. Such analyses and transformations rely on certain closure properties. However, the model is limited in expressivity and the need for a more general class of programs is widely recognized. We provide the extension to Z-polyhedra, which are the intersection of polyhedra and lattices. We prove the required closure properties using a novel representation and interpretation of Z-polyhedra. In addition, we also prove closure in the Z-polyhedral model under images by dependence functions---thereby proving that unions of LBLs, widely assumed to be a richer class of sets, are equal to unions of Z-polyhedra. Another corollary of this result is the equivalence of the unions of Z-polyhedra and Presburger sets. Our representation and closure properties constitute the foundations of the Z-polyhedral model. As an example, we present the transformation for automatic reduction of complexity in the Z-polyhedral model.
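For readers unfamiliar with the formalism, the following LaTeX fragment records the standard definition implied by the abstract: a Z-polyhedron is the intersection of a rational polyhedron and an affine integer lattice. The symbols are generic and not necessarily the paper's exact notation.

```latex
% A Z-polyhedron Z is the intersection of a rational polyhedron P
% and an affine integer lattice L.
\[
  P = \{\, x \in \mathbb{Q}^n \mid A x \ge b \,\}, \qquad
  \mathcal{L} = \{\, L z + l \mid z \in \mathbb{Z}^n \,\}, \qquad
  \mathcal{Z} = P \cap \mathcal{L}.
\]
```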
The proceedings contain 28 records. The topics discussed include: compiler techniques for high performance sequentially consistent Java programs; effective communication coalescing for data-parallel applications; a linear-time algorithm for optimal barrier placement; composable memory transactions; static analysis of atomicity for programs with non-blocking synchronization; revocable locks for non-blocking programming; automated type-based analysis of data races and atomicity; scaling model checking of data races using dynamic information; a novel approach for partitioning iteration spaces with variable densities; applications of synchronization coverage; and fault tolerant high performance computing by a coding approach.
ISBN (Print): 1595931899
When a Search Engine responds to your query, thousands of machines from around the world have cooperated to produce your result. With a global reach of hundreds of millions of users, Search Engines are arguably the most commonly used massively-parallel computing systems on the planet. In this talk, we examine Web Search Engines as a case study of parallel programming in a practical context. We focus primarily on the practice of parallel programming, reviewing many ways in which parallel programming is used in a modern Search Engine. We also discuss briefly the principles of parallel programming, listing some of the principles that guide our use of parallelism and speculating a bit on how the mechanics of parallelism might better be automated in our context.
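The talk abstract contains no code; purely as a toy illustration of the fan-out/aggregate style of parallelism it alludes to, here is a hypothetical Java sketch that queries several index shards concurrently and merges whatever returns within a deadline. The shard contents, deadline, and ranking policy are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// Toy scatter-gather: fan a query out to several index shards in parallel
// and merge whatever comes back in time. Everything here is invented.
public class ScatterGather {
    record Hit(String doc, double score) {}

    static List<Hit> searchShard(int shard, String query) {
        // Placeholder for a per-shard index lookup.
        return List.of(new Hit("shard" + shard + "/doc0", 1.0 / (shard + 1)));
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        String query = "parallel programming";

        List<Future<List<Hit>>> futures = new ArrayList<>();
        for (int shard = 0; shard < 4; shard++) {
            final int s = shard;
            futures.add(pool.submit(() -> searchShard(s, query)));
        }

        List<Hit> merged = new ArrayList<>();
        for (Future<List<Hit>> f : futures) {
            try {
                merged.addAll(f.get(100, TimeUnit.MILLISECONDS));
            } catch (TimeoutException e) {
                // A real engine would return partial results; here we skip the slow shard.
            }
        }
        merged.sort((a, b) -> Double.compare(b.score(), a.score()));  // highest score first
        merged.forEach(h -> System.out.println(h.doc() + " " + h.score()));
        pool.shutdown();
    }
}
```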
Tiling has proven to be an effective mechanism to develop high performance implementations of algorithms. Tiling can be used to organize computations so that communication costs in parallel programs are reduced and lo...
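The entry above is truncated, but loop tiling itself is a standard transformation; as an illustration not taken from the paper, here is a Java sketch of a blocked matrix product that improves locality (and, in a parallel setting, can reduce communication) by working on B-by-B tiles.

```java
// Illustrative loop tiling (not from the truncated entry above):
// a blocked matrix multiply that reuses B-by-B tiles while they are cache-resident.
public class TiledMatMul {
    static void multiply(double[][] a, double[][] b, double[][] c, int n, int B) {
        for (int ii = 0; ii < n; ii += B)
            for (int kk = 0; kk < n; kk += B)
                for (int jj = 0; jj < n; jj += B)
                    // Work within one tile; the min() bounds handle n not divisible by B.
                    for (int i = ii; i < Math.min(ii + B, n); i++)
                        for (int k = kk; k < Math.min(kk + B, n); k++) {
                            double aik = a[i][k];
                            for (int j = jj; j < Math.min(jj + B, n); j++)
                                c[i][j] += aik * b[k][j];
                        }
    }
}
```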
ISBN (Print): 1595931899
This work presents a general methodology for estimating the performance of an HPC workload when running on a future hardware architecture. Further, it demonstrates the methodology by estimating the performance of a significant scientific application - the Gyrokinetic Toroidal Code (GTC) - when executing on Sun's proposed next-generation petascale computer architecture. For GTC, we identify the important phases of the iteration and perform low-level analysis that includes instruction tracing and component simulations of processor and memory systems. Low-level analysis is complemented with scalability estimates based on modeling MPI, OpenMP and I/O activity in the code. The work's approach permits accurate end-to-end performance projections from the microarchitecture level to the petascale.
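The paper's actual models are not reproduced here; as a hypothetical sketch of the end-to-end projection style the abstract describes, the following Java fragment combines per-phase compute times scaled by simulated speedups with a simple communication-scaling term. All constants, formulas, and names are illustrative assumptions, not GTC measurements or the authors' model.

```java
// Hypothetical sketch of an end-to-end projection: per-phase compute times are
// scaled by simulated single-node speedups, and communication is added from a
// simple scaling model. The numbers and formulas are illustrative only.
public class PerfProjection {
    static double projectedIterationTime(double[] measuredPhaseSec,
                                         double[] simulatedSpeedup,
                                         int ranks) {
        double compute = 0.0;
        for (int p = 0; p < measuredPhaseSec.length; p++) {
            compute += measuredPhaseSec[p] / simulatedSpeedup[p];  // from component simulation
        }
        // Toy MPI model: a latency term plus a term growing with log2(ranks).
        double comm = 0.002 * Math.log(ranks) / Math.log(2) + 0.0005;
        return compute + comm;
    }

    public static void main(String[] args) {
        double[] phases   = {0.8, 0.5, 0.3};   // measured seconds per phase on today's machine
        double[] speedups = {2.1, 1.4, 3.0};   // projected per-phase speedups on the new core
        System.out.printf("projected iteration time: %.3f s%n",
                          projectedIterationTime(phases, speedups, 4096));
    }
}
```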
ISBN (Print): 1595931899
As part of the DARPA program for High Productivity Computing Systems, the Programming Language Research Group at Sun Microsystems Laboratories is developing Fortress, a language intended to support large-scale scientific computation with the same level of portability that the Java programming language provided for multithreaded commercial applications. One of the design principles of Fortress is that parallelism be encouraged everywhere; for example, it is intentionally just a little bit harder to write a sequential loop than a parallel loop. Another is to have rich mechanisms for encapsulation and abstraction; the idea is to have a fairly complicated language for library writers that enables them to write libraries that present a relatively simple set of interfaces to the application programmer. Thus Fortress is as much a framework for language developers as it is a language for coding scientific applications. We will discuss ideas for using a rich parameterized polymorphic type system to organize multithreading and data distribution on large parallel machines. The net result is similar in some ways to data distribution facilities in other languages such as HPF and Chapel, but more open-ended, because in Fortress the facilities are defined by user-replaceable and -extendable libraries rather than wired into the compiler. A sufficiently rich type system can take the place of certain kinds of flow analysis to guide certain kinds of code selection and optimization, again moving policymaking out of the compiler and into libraries coded in the Fortress source language.
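Fortress syntax is not shown in the abstract; as a loose Java analogue of the "parallelism is encouraged everywhere" principle, the sketch below shows how a single operator separates the parallel and sequential forms of the same reduction. In Fortress the parallel form is the default, whereas Java streams default to sequential, so the analogy is imperfect and the code is illustrative only.

```java
import java.util.stream.IntStream;

// Java analogue (not Fortress syntax): a one-operator difference separates
// the parallel and sequential forms of the same loop. In Fortress the parallel
// form is the default; in Java streams it is the sequential form.
public class ParallelByDefault {
    public static void main(String[] args) {
        long parallelSum = IntStream.range(0, 1_000_000)
                                    .parallel()                  // parallel loop over the index space
                                    .mapToLong(i -> (long) i * i)
                                    .sum();

        long sequentialSum = IntStream.range(0, 1_000_000)       // same loop, sequential by default
                                      .mapToLong(i -> (long) i * i)
                                      .sum();

        System.out.println(parallelSum == sequentialSum);        // prints true
    }
}
```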