ISBN (print): 9781595937957
Increasing the delivered performance of computers by running programs in parallel is an old idea with a new urgency. Multicore chips (chip multiprocessors) have emerged as a way to increase performance wherever chips are used. The talk will focus on the role programming languages and compilers must play in delivering parallel performance to users and applications. The speaker's personal experiences with languages and compilers for high-performance systems will provide the basis for her observations. The talk is intended to encourage the exploration of new approaches.
ISBN (print): 9781595936028
The polyhedral model is a well-developed formalism that has been extensively used in a variety of contexts, viz. the automatic parallelization of loop programs, program verification, locality, hardware generation and, more recently, the automatic reduction of asymptotic program complexity. Such analyses and transformations rely on certain closure properties. However, the model is limited in expressivity, and the need for a more general class of programs is widely recognized. We provide the extension to Z-polyhedra, which are the intersection of polyhedra and lattices. We prove the required closure properties using a novel representation and interpretation of Z-polyhedra. In addition, we also prove closure in the Z-polyhedral model under images by dependence functions, thereby proving that unions of LBLs, widely assumed to be a richer class of sets, are equal to unions of Z-polyhedra. Another corollary of this result is the equivalence of unions of Z-polyhedra and Presburger sets. Our representation and closure properties constitute the foundations of the Z-polyhedral model. As an example, we present the transformation for automatic reduction of complexity in the Z-polyhedral model.
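For concreteness, a minimal one-dimensional example of the objects involved (ours, not drawn from the paper): the even integers in an interval form a Z-polyhedron, the intersection of a polyhedron with an integer lattice.

    % A Z-polyhedron is the intersection of a polyhedron and an integer lattice.
    % Example: the even integers between 0 and N.
    \[ \mathcal{P} = \{\, z \mid 0 \le z \le N \,\}, \qquad
       \mathcal{L} = \{\, 2i \mid i \in \mathbb{Z} \,\} \]
    \[ \mathcal{Z} = \mathcal{P} \cap \mathcal{L}
                   = \{\, z \in \mathbb{Z} \mid \exists\, i \in \mathbb{Z}:\ z = 2i,\ 0 \le z \le N \,\} \]

Plain polyhedra cannot express the "every second point" constraint; the lattice component supplies exactly that extra expressivity.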
ISBN (print): 1595931899
In 2002, we first brought High Performance Computing (HPC) methods to the college classroom as a way to enrich Computational Science education. Through the years, we have continued to help college faculty in science, technology, engineering, and mathematics (STEM) disciplines stay current with HPC methodologies. We have accomplished this by designing and delivering faculty workshops, hosted in a variety of lab settings, and by developing tools supporting the technical infrastructure necessary for HPC education, all without requiring access to traditional HPC computing platforms. In all, we have so far presented 16 professional development workshops for close to 400 predominantly undergraduate STEM faculty. This paper presents the results of an internal formative evaluation by workshop instructors and the materials and tools developed during that process. We did this work as part of the National Computational Science Institute (NCSI) and in collaboration with the following groups: the Minority Serving Institutions - High Performance Computing (MSI-HPC) program of the National Computational Science Alliance; the Consortium for Computing Sciences in Colleges (CCSC); the Center for Excellence in High Performance Computing; the Oklahoma University Supercomputing Symposium series; and the education program of the Supercomputing (SC) conference series. We presented learners with a sequence of interactive, "run it, modify it, build it" open-ended lab exercises drawn from a variety of disciplines. Interactivity means having the ability to change parallel and algorithmic parameters, e.g. running software on more than one machine, using different models, refining the model, changing the problem scale, and using different parallel algorithms. There is a lack of scientific parallel curricula suitable for illustrating Computational Science principles in the classroom. We addressed this need by locating, and where necessary creating, suitable open source software, data-sets, and curricula.
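As a hedged illustration of the kind of "run it, modify it" exercise described (our sketch, not one of the actual workshop labs), the classic MPI midpoint-rule estimate of pi lets learners vary both the process count and the problem scale:

    // Illustrative lab-style exercise (an assumption, not from the workshop
    // materials): estimate pi by midpoint integration of 4/(1+x^2) on [0,1].
    // Learners vary the problem scale (argv[1]) and the number of ranks.
    #include <mpi.h>
    #include <cstdio>
    #include <cstdlib>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const long n = (argc > 1) ? std::atol(argv[1]) : 1000000; // problem scale
        double local = 0.0;
        for (long i = rank; i < n; i += size) {  // cyclic distribution of work
            double x = (i + 0.5) / n;            // midpoint of subinterval i
            local += 4.0 / (1.0 + x * x);
        }
        double sum = 0.0;
        MPI_Reduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0) std::printf("pi ~= %.10f with %d ranks\n", sum / n, size);
        MPI_Finalize();
        return 0;
    }

Running it as, e.g., mpirun -np 4 ./pi 10000000 and then changing -np or the argument exercises exactly the parallel and algorithmic parameters the abstract mentions.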
ISBN (print): 9781581135886
In programming high performance applications, shared address-space platforms are preferable for fine-grained computation, while distributed address-space platforms are more suitable for coarse-grained computation. However, currently only distributed address-space systems scale beyond the low hundreds of processors. In this paper we introduce a hybrid architecture that allows users to trade off local memory usage for coherence communication, making possible larger-scale shared memory architectures. We introduce a programming model and examine possible implementations of hardware mechanisms, evaluating some of the trade-offs inherent in each. Preliminary experiments on an application with particularly fine-grained communication requirements indicate that effective placement of directives can reduce coherence communication by more than a factor of 10 for 64 processors.
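A sketch of what directive placement might look like under such a model; the pragma name and semantics here are our assumption, not the paper's actual syntax:

    // Hypothetical directive (assumed syntax): mark a read-mostly array so the
    // runtime replicates it in local memory instead of keeping it coherent,
    // trading local memory usage for reduced coherence communication.
    void smooth(double* out, const double* in, const double* coeff, int n) {
        #pragma replicate_local(coeff)  // hypothetical hint, not real syntax
        for (int i = 1; i < n - 1; ++i)
            out[i] = coeff[0] * in[i - 1] + coeff[1] * in[i] + coeff[2] * in[i + 1];
    }

The idea is that 'coeff' is read on every iteration but never written during the phase, so a local replica eliminates remote coherence traffic for it entirely.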
ISBN (print): 9781581135886
ARMI is a communication library that provides a framework for expressing fine-grain parallelism and mapping it to a particular machine using shared-memory and message-passing library calls. The library is an advanced implementation of the RMI protocol and handles low-level details such as scheduling incoming communication and aggregating outgoing communication to coarsen parallelism when necessary. These details can be tuned for different platforms to allow user codes to achieve the highest performance possible without manual modification. ARMI is used by STAPL, our generic parallel library, to provide a portable, user-transparent communication layer. We present the basic design as well as the mechanisms used in the current Pthreads/OpenMP and MPI implementations, and in combinations thereof. Performance comparisons between ARMI and explicit use of Pthreads or MPI are given on a variety of machines, including an HP V2200, SGI Origin 3800, IBM Regatta-HPC and IBM RS6000 SP cluster.
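The abstract does not show ARMI's interface, so the following self-contained sketch only illustrates the aggregation idea it describes, with hypothetical names throughout:

    // Illustrative-only sketch of RMI-style aggregation: asynchronous calls are
    // buffered per destination and flushed in batches, coarsening communication.
    // Class and function names are hypothetical, not ARMI's actual interface.
    #include <vector>
    #include <functional>
    #include <cstdio>

    struct AsyncRuntime {
        std::vector<std::vector<std::function<void()>>> outbox; // per-destination buffer
        size_t threshold;                                       // flush after this many calls

        AsyncRuntime(size_t ndest, size_t thresh) : outbox(ndest), threshold(thresh) {}

        void async_rmi(size_t dest, std::function<void()> call) {
            outbox[dest].push_back(std::move(call));
            if (outbox[dest].size() >= threshold) flush(dest);  // batch, don't send eagerly
        }
        void flush(size_t dest) {            // stands in for one aggregated network message
            for (auto& c : outbox[dest]) c();
            outbox[dest].clear();
        }
    };

    int main() {
        AsyncRuntime rt(2, 4);               // 2 destinations, aggregate 4 calls at a time
        for (int i = 0; i < 10; ++i)
            rt.async_rmi(i % 2, [i] { std::printf("remote call %d\n", i); });
        rt.flush(0); rt.flush(1);            // drain any remaining buffered calls
        return 0;
    }

Batching many small invocations into one transfer is the mechanism by which fine-grain user code avoids paying per-message overhead on each call.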
ISBN (print): 9781581135886
Collecting a program's execution profile is important for many reasons: code optimization, memory layout, program debugging and program comprehension. Path-based execution profiles are more detailed than count-based execution profiles, since they present the order of execution of the various blocks in a program: modules, procedures, basic blocks, etc. Recently, online string compression techniques have been employed for collecting compact representations of sequential program executions. In this paper, we show how a similar approach can be taken for shared-memory parallel programs. Our compaction scheme yields one to two orders of magnitude compression compared to the uncompressed parallel program trace on some of the SPLASH benchmarks. Our compressed execution traces contain detailed information about synchronization and control/data flow which can be exploited for post-mortem analysis. In particular, the information in our compact execution traces is useful for accurate data race detection (detecting unsynchronized shared variable accesses that occurred in the execution).
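As a deliberately simplified stand-in for the online compression the abstract refers to (practical schemes are grammar-based, in the spirit of SEQUITUR, and also record synchronization events), run-length encoding a per-thread block trace shows the shape of the approach:

    // Much-simplified sketch: compress a basic-block trace as it is produced.
    // Run-length encoding here stands in for the grammar-based compression a
    // real path-profiling system would use.
    #include <cstdio>
    #include <vector>
    #include <utility>

    struct TraceCompressor {
        std::vector<std::pair<int, long>> runs;  // (block id, repeat count)
        void record(int block) {                 // called on every block execution
            if (!runs.empty() && runs.back().first == block) ++runs.back().second;
            else runs.push_back({block, 1});
        }
    };

    int main() {
        TraceCompressor t;
        int trace[] = {1, 2, 2, 2, 3, 3, 1, 1, 1, 1};  // block ids as executed
        for (int b : trace) t.record(b);
        for (auto& r : t.runs) std::printf("block %d x%ld\n", r.first, r.second);
        return 0;
    }

Unlike plain counts, the compressed form still preserves the order in which blocks executed, which is what makes post-mortem analyses such as race detection possible.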
ISBN (print): 9781581135886
A design pattern is a mechanism for encapsulating the knowledge of experienced designers into a re-usable artifact. Parallel design patterns reflect commonly occurring parallel communication and synchronization structures. Our tools, CO2P3S (Correct Object-Oriented Pattern-based Parallel Programming System) and MetaCO2P3S, use generative design patterns. A programmer selects the parallel design patterns that are appropriate for an application, and then adapts the patterns for that specific application by selecting from a small set of code-configuration options. CO2P3S then generates a custom framework for the application that includes all of the structural code necessary for the application to run in parallel. The programmer is only required to write simple code that launches the application and to fill in some application-specific sequential hook routines. We use generative design patterns to take an application specification (parallel design patterns + sequential user code) and use it to generate parallel application code that achieves good performance in shared-memory and distributed-memory environments. Although our implementations are for Java, the approach we describe is tool- and language-independent. This paper describes generalizing CO2P3S to generate distributed-memory parallel solutions.
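CO2P3S emits Java frameworks; as a language-neutral illustration of the structural-code-plus-hooks split the abstract describes (our sketch, not the tool's actual output), consider:

    // Sketch of the generative-pattern idea: the tool would generate the
    // parallel structural code (here, a fixed fork/join skeleton) and the
    // programmer only fills in a sequential hook routine.
    #include <thread>
    #include <vector>
    #include <cstdio>

    // "Generated" structural code: split iterations [0,n) across workers.
    template <typename Hook>
    void parallel_for(int n, int nworkers, Hook hook) {
        std::vector<std::thread> pool;
        for (int w = 0; w < nworkers; ++w)
            pool.emplace_back([=] {
                for (int i = w; i < n; i += nworkers) hook(i);  // cyclic split
            });
        for (auto& t : pool) t.join();  // the pattern owns all synchronization
    }

    int main() {
        std::vector<double> v(100, 1.0);
        // Application-specific sequential hook the programmer writes:
        parallel_for(static_cast<int>(v.size()), 4, [&](int i) { v[i] *= 2.0; });
        std::printf("v[0] = %.1f\n", v[0]);
        return 0;
    }

The point of the design is that all thread creation and synchronization lives in generated code, so the hook the user writes is purely sequential.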