ISBN: 9781595937957 (print)
Increasing the delivered performance of computers by running programs in parallel is an old idea with a new urgency. Multiple cores (multiprocessors) on a chip have emerged as a way to increase performance wherever chips are used. The talk will focus on the role programming languages and compilers must play in delivering parallel performance to users and applications. The speaker's personal experiences with languages and compilers for high-performance systems will provide the basis for her observations. The talk is intended to encourage the exploration of new approaches.
ISBN: 1595931899 (print)
The proceedings contain 27 papers. The topics discussed include: parallel programming and code selection in Fortress; collective communication on architectures that support simultaneous communication over multiple links; performance evaluation of adaptive MPI; mobile MPI programs in computational grid; RDMA read based rendezvous protocol for MPI over InfiniBand: design alternatives and benefits; programming for parallelism and locality with hierarchically tiled arrays; parallel programming in modern web search engines; on-line automated performance diagnosis on thousands of processes; a case study in top-down performance estimation for a large-scale parallel application; adaptive scheduling with parallelism feedback; predicting bounds on queuing delay for batch-scheduled parallel machines; optimizing irregular shared-memory applications for distributed memory systems; and scalable synchronous queues.
ISBN: 9781595936028 (print)
Transactions are a simple and powerful mechanism for establishing fault tolerance. To allow multiple processes to cooperate in a transaction, we relax the isolation property and use message passing for communication. We call the new abstraction a speculation.
ISBN: 9781595936028 (print)
Our goal in this work was to identify and quantify the overheads of tracing parallel applications. We investigate several different sources of overhead related to tracing: trace instrumentation, periodic writing of trace files to disk, differing trace buffer sizes, system changes, and increasing numbers of processors in the target application. We encountered overheads as large as 26.7% for writing the trace file to disk. We found that buffer sizes can make a difference in the overheads, and that differences in system software can also contribute to the level of the perturbation. Our results show that the overhead of instrumentation correlates strongly with the number of events, while the overhead of writing the trace buffer increases with increasing numbers of processors.
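The two overhead sources named in the abstract (per-event instrumentation and periodic flushing of a trace buffer) can be illustrated with a small sketch. This is not the authors' tracing tool: the `Tracer` class, the `work` loop, and the in-memory `sink` list standing in for a trace file are all hypothetical names chosen for illustration.

```python
import time


class Tracer:
    """Hypothetical buffered tracer: records events, flushes when the buffer fills."""

    def __init__(self, buffer_size, sink):
        self.buffer_size = buffer_size
        self.sink = sink      # a list standing in for the on-disk trace file (assumption)
        self.buffer = []
        self.events = 0       # total events recorded (instrumentation cost scales with this)
        self.flushes = 0      # number of buffer writes (flush cost scales with this)

    def record(self, name):
        self.buffer.append((time.perf_counter(), name))
        self.events += 1
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink.extend(self.buffer)
            self.buffer.clear()
            self.flushes += 1


def work(n, tracer=None):
    """Toy kernel; emits one trace event per iteration when a tracer is given."""
    total = 0
    for i in range(n):
        if tracer is not None:
            tracer.record("iter")
        total += i * i
    return total


def timed(fn, *args):
    """Return (result, elapsed seconds) for one call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def relative_overhead(n, buffer_size):
    """Traced time relative to untraced time, minus one (0.25 means 25% slower)."""
    _, base = timed(work, n)
    tracer = Tracer(buffer_size, [])
    _, traced = timed(work, n, tracer)
    tracer.flush()
    return (traced - base) / base, tracer
```

Calling `relative_overhead(100_000, 1_000)` returns a machine-dependent overhead figure together with the tracer, whose `events` and `flushes` counters show how buffer size trades the number of writes against memory held in the buffer.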
ISBN: 9781595936028 (print)
Implicit Parallelism with Ordered Transactions (IPOT) is an extension of sequential or explicitly parallel programming models to support speculative parallelization. The key idea is to specify opportunities for parallelization in a sequential program using annotations similar to transactions. Unlike explicit parallelism, IPOT annotations do not require the absence of data dependence, since the parallelization relies on runtime support for speculative execution. IPOT as a parallel programming model is determinate, i.e., program semantics are independent of the thread scheduling. For optimization, non-determinism can be introduced selectively. We describe the programming model of IPOT and an online tool that recommends boundaries of ordered transactions by observing a sequential execution. On three example HPC workloads we demonstrate that our method is effective in identifying opportunities for fine-grain parallelization. Using the automated task recommendation tool, we were able to perform the parallelization of each program within a few hours.
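The idea of speculating past data dependences while keeping sequential semantics can be sketched as a toy simulation. This is not the IPOT runtime or its annotation syntax: `TrackedState`, `run_speculative`, and the three tasks are hypothetical, and the speculative phase is run in a plain loop rather than on real threads. Each task executes against a shared snapshot while its reads and writes are tracked; tasks then commit in program order, and a task that read a key written by an earlier task is re-executed.

```python
class TrackedState:
    """View over a base dict that records reads and buffers writes."""

    def __init__(self, base):
        self._base = base
        self.reads = set()
        self.writes = {}

    def __getitem__(self, key):
        if key in self.writes:          # read-your-own-write, not a dependence
            return self.writes[key]
        self.reads.add(key)
        return self._base[key]

    def __setitem__(self, key, value):
        self.writes[key] = value


def run_sequential(tasks, state):
    """Reference semantics: run tasks one after another."""
    state = dict(state)
    for task in tasks:
        task(state)
    return state


def run_speculative(tasks, state):
    """Speculate on a snapshot, then commit in program order with re-execution."""
    committed = dict(state)
    snapshot = dict(committed)

    # Speculative phase: tasks are independent here and could run concurrently.
    attempts = []
    for task in tasks:
        view = TrackedState(snapshot)
        task(view)
        attempts.append(view)

    # Ordered commit phase.
    dirty = set()       # keys written by tasks that have already committed
    retries = 0
    for task, view in zip(tasks, attempts):
        if view.reads & dirty:          # speculation read a stale value
            view = TrackedState(committed)
            task(view)
            retries += 1
        committed.update(view.writes)
        dirty.update(view.writes)
    return committed, retries


def inc_a(s):
    s["a"] = s["a"] + 1


def double_b(s):
    s["b"] = s["b"] * 2


def sum_ab(s):
    s["c"] = s["a"] + s["b"]    # depends on both earlier tasks
```

Because commits happen in program order and stale speculations are redone, the final state matches the sequential run regardless of how the speculative phase is scheduled, which is the sense in which such a model stays determinate.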
ISBN: 9781595936028 (print)
Increasing system and algorithmic complexity, combined with a growing number of tunable application parameters, poses significant challenges for analytical performance modeling. We propose a series of robust techniques to address these challenges. In particular, we apply statistical techniques such as clustering, association, and correlation analysis to understand the application parameter space better. We construct and compare two classes of effective predictive models: piecewise polynomial regression and artificial neural networks. We compare these techniques with theoretical analyses and experimental results. Overall, both regression and neural networks are accurate, with median error rates ranging from 2.2 to 10.5 percent. The comparable accuracy of these models suggests that differentiating features will arise from ease of use, transparency, and computational efficiency.
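As a rough illustration of piecewise polynomial regression and a median error metric, the sketch below fits the simplest case: two degree-1 polynomials split at a known breakpoint. It is not the authors' model; the breakpoint, the synthetic runtime data, and all function names are assumptions made for the example.

```python
from statistics import mean, median


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_piecewise(xs, ys, breakpoint):
    """Fit one line to points left of the breakpoint and one to the rest."""
    left = [(x, y) for x, y in zip(xs, ys) if x < breakpoint]
    right = [(x, y) for x, y in zip(xs, ys) if x >= breakpoint]
    return fit_line(*zip(*left)), fit_line(*zip(*right))


def predict(model, breakpoint, x):
    slope, intercept = model[0] if x < breakpoint else model[1]
    return slope * x + intercept


def median_relative_error(model, breakpoint, xs, ys):
    """Median of |prediction - observed| / |observed| over the data set."""
    return median(
        abs(predict(model, breakpoint, x) - y) / abs(y) for x, y in zip(xs, ys)
    )


# Synthetic data (assumption): runtime grows as 2x + 1 below size 8,
# then as 5x - 23 from size 8 onward, e.g. once a working set outgrows a cache.
BREAK = 8
sizes = list(range(1, 16))
runtimes = [2 * x + 1 if x < BREAK else 5 * x - 23 for x in sizes]

model = fit_piecewise(sizes, runtimes, BREAK)
```

On real measurements the breakpoint would itself have to be chosen, for example by trying candidate splits and keeping the one with the lowest error, and higher polynomial degrees would need a general least-squares solver.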
ISBN: 1595936025 (print)
This full-day tutorial will teach the attendees about Cluster OpenMP and the tools that are available to assist the programmer in debugging and tuning. Cluster OpenMP is an Intel programming system that allows the user to run an OpenMP program on a cluster of computers without a common hardware shared memory. The tutorial will consist of a short tutorial on OpenMP, a longer description of Cluster OpenMP, its concepts, mechanisms and tools, a set of short hands-on porting exercises for the participants, and a set of exercises with the Cluster OpenMP debugging and tuning tools.