ISBN: 0769521517 (print)
In the era of future embedded systems the designer is confronted with multi-processor systems, for both performance and energy reasons. Exploiting (sub)task-level parallelism is becoming crucial because instruction-level parallelism alone is insufficient. The challenge is to build compiler tools that support the exploration of task-level parallelism in programs. To achieve this goal, we have designed an analysis framework to evaluate the potential parallelism of sequential object-oriented programs. Parallel-performance and data-access analysis are the crucial techniques for estimating the effects of transformations. We have implemented support for platform-independent data-access analysis and profiling of Java programs, as an extension to our earlier parallel-performance analysis framework. The toolkit comprises automated design-time analysis for performance and data-access characterisation, program instrumentation, program-profiling support, and post-processing analysis. We demonstrate the usability of our approach on a number of realistic Java applications.
ISBN: 0769521517 (print)
The pipeline is a simple and intuitive structure for speeding up many problems. Novice parallel programmers are usually taught this structure early on. However, expert parallel programmers typically eschew using the pipeline in coarse-grained applications because it has three serious problems that make it difficult to implement efficiently. First, processors are idle when the pipeline is not full. Second, load balancing is crucial to obtaining good speedup. Third, it is difficult to incrementally incorporate more processors into an existing pipeline. Instead, experts recast the problem as a master/slave structure, which does not suffer from these problems. This paper details a transformation that allows programs written in a pipeline style to execute using the master/slave structure. Parallel programmers can benefit from both the intuitive simplicity of the pipeline and the efficient execution of a master/slave structure. This is demonstrated by performance results from two applications.
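The pipeline-to-master/slave idea described in this abstract can be sketched as follows; this is an illustrative reconstruction under assumed names (`Item`, `run_pipeline`), not the paper's implementation. Each work item records which stage it needs next; idle workers pull any ready item from a shared queue, apply one stage, and re-enqueue it, so load balances automatically and workers can be added or removed without restructuring the pipeline.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// A work item carries its payload and the index of its next pipeline stage.
struct Item { int value; std::size_t stage; };

// Master/slave execution of a pipeline: instead of binding one thread per
// stage, every worker pulls whichever item is ready and applies its next
// stage. A worker retires when the queue is momentarily empty; any item
// held by another worker is re-enqueued by that worker, which keeps
// looping until its own work is done, so every item still completes.
std::vector<int> run_pipeline(const std::vector<std::function<int(int)>>& stages,
                              std::vector<int> inputs, unsigned nworkers) {
    std::queue<Item> work;
    for (int v : inputs) work.push({v, 0});
    std::vector<int> results;
    std::mutex m;
    auto worker = [&] {
        for (;;) {
            Item it;
            {
                std::lock_guard<std::mutex> lk(m);
                if (work.empty()) return;          // nothing ready: retire
                it = work.front();
                work.pop();
            }
            it.value = stages[it.stage](it.value); // apply one stage
            ++it.stage;
            std::lock_guard<std::mutex> lk(m);
            if (it.stage == stages.size()) results.push_back(it.value);
            else work.push(it);                    // re-enqueue for next stage
        }
    };
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < nworkers; ++i) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
    return results;
}
```

Note how the number of workers is independent of the number of stages, which is exactly what makes incremental addition of processors easy in the master/slave form.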
ISBN: 0769521517 (print)
We present a case study of the alternatives and design trade-offs encountered when adapting an established numerical library into a form compatible with modern component-software implementation practices. Our study will help scientific software users, authors, and maintainers develop their own roadmaps for shifting to component-oriented software. The primary library studied, LSODE, and the issues involved in the adaptation are typical of many commonly used numerical libraries. We examine the adaptation of a related library, CVODE, and compare the impact of the two different designs on applications. The LSODE-derived components solve models composed with CCA components developed independently at the Argonne and Oak Ridge National Laboratories. The resulting applications run in the Ccaffeine framework implementation of the Common Component Architecture specification. We provide CCA-style interface specifications appropriate to solvers for linear equations, ordinary differential equations (ODEs), and differential-algebraic equations (DAEs).
ISBN: 076951880X (print)
Dynamic Parallel Schedules (DPS) is a high-level framework for developing parallel applications on distributed-memory computers (e.g. clusters of PCs). Its model relies on compositional, customizable split-compute-merge graphs of operations (directed acyclic flow graphs). The graphs and the mapping of operations to processing nodes are specified dynamically at runtime. DPS applications are pipelined and multithreaded by construction, ensuring a maximal overlap of computation and communication. DPS applications can call parallel services exposed by other DPS applications, enabling the creation of reusable parallel components. The DPS framework relies on a C++ class library. Thanks to its dynamic nature, DPS offers new perspectives for the creation and deployment of parallel applications running on server clusters.
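The split-compute-merge pattern that DPS composes into flow graphs can be sketched minimally with plain C++ threads (this is not the DPS API, which schedules operations across cluster nodes dynamically; names here are illustrative): a split operation partitions the input, compute operations run concurrently, and a merge operation folds the partial results.

```cpp
#include <algorithm>
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// Minimal split-compute-merge sketch: split the input into chunks, launch
// one asynchronous compute per chunk, then merge the partial sums.
// Illustrative only; DPS expresses this as a directed acyclic flow graph.
int split_compute_merge(const std::vector<int>& data, std::size_t parts) {
    std::size_t chunk = (data.size() + parts - 1) / parts;      // split
    std::vector<std::future<int>> partials;
    for (std::size_t i = 0; i < data.size(); i += chunk) {
        std::size_t end = std::min(i + chunk, data.size());
        partials.push_back(std::async(std::launch::async, [=, &data] {
            return std::accumulate(data.begin() + i,
                                   data.begin() + end, 0);      // compute
        }));
    }
    int total = 0;
    for (auto& f : partials) total += f.get();                  // merge
    return total;
}
```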
ISBN: 076951880X (print)
In previous work we have introduced JavaSymphony, a system whose purpose is to simplify the development of distributed and parallel Java applications. JavaSymphony is a Java library that allows developers to control parallelism, load balancing, and locality at a high level. Objects can be explicitly distributed and migrated within virtual architectures, which impose a virtual hierarchy on a distributed system of physical computing nodes. In this paper we present the design of the JavaSymphony Runtime System and the JavaSymphony Shell. Moreover, we discuss details of an agent-based implementation of the JavaSymphony Runtime System, which comprises the Network Agent, Object Agent, and Event Agent. We present a detailed comparison of the functionality provided by JavaSymphony with that of several related systems.
ISBN: 076951880X (print)
This article describes and compares two parallel implementations of Branch-and-Bound skeletons. Using the C++ programming language, the user has to specify the type of the problem, the type of the solution, and the specific characteristics of the Branch-and-Bound technique. This information is combined with the provided resolution skeletons to obtain a distributed-memory and a shared-memory parallel program. MPI has been used to develop the message-passing algorithm, and OpenMP has been chosen for the shared-memory one. Computational results for the 0/1 Knapsack Problem on a Sunfire 6800 SMP, an Origin 3000, and a PC cluster are presented.
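The division of labour in such a skeleton can be sketched sequentially: the skeleton owns the search loop, the incumbent, and the pruning test, while the user supplies the problem type, the branching rule, and the bounding function. The sketch below (hypothetical names; a trivially loose bound) illustrates this for the 0/1 knapsack problem; the paper's skeletons distribute the same search with MPI and OpenMP.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// User-supplied problem type for the 0/1 knapsack instance.
struct KnapsackItem { int weight, value; };

// Skeleton-style branch-and-bound: explore the take/skip tree, keeping the
// best feasible value found (the incumbent) and pruning subtrees whose
// upper bound cannot beat it. Sequential sketch, not the paper's code.
static void branch(const std::vector<KnapsackItem>& items, std::size_t i,
                   int cap, int val, int& best) {
    if (val > best) best = val;                  // update incumbent
    if (i == items.size()) return;
    // User-supplied upper bound: value so far plus every remaining value
    // (deliberately loose, but always an over-estimate, so pruning is safe).
    int bound = val;
    for (std::size_t j = i; j < items.size(); ++j) bound += items[j].value;
    if (bound <= best) return;                   // prune: cannot improve
    if (items[i].weight <= cap)                  // branch: take item i
        branch(items, i + 1, cap - items[i].weight,
               val + items[i].value, best);
    branch(items, i + 1, cap, val, best);        // branch: skip item i
}

int knapsack(const std::vector<KnapsackItem>& items, int capacity) {
    int best = 0;
    branch(items, 0, capacity, 0, best);
    return best;
}
```

In the parallel variants, independent subtrees of this search become the units of work handed to MPI processes or OpenMP threads, with the incumbent shared to keep pruning effective.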
ISBN: 076951880X (print)
Parallel skeletons have been proposed as a possible programming model for parallel architectures. One of the problems with this approach is the choice of the skeleton best suited to the characteristics of the algorithm/program to be developed or parallelized, and of the target architecture, in terms of the performance of the parallel implementation. Another problem arising with the parallelization of legacy codes is the attempt to minimize the effort needed for program comprehension, and thus to achieve the minimum restructuring of the sequential code when producing the parallel version. In this paper we propose automated program comprehension at the algorithmic level as a driving feature in the task of selecting the parallel skeleton best suited to the characteristics of the algorithm/program and of the target architecture. Algorithmic concept recognition can automate or support the generation of parallel code through instantiation of the selected parallel skeleton(s) with template-based transformations of recognized code segments.
ISBN: 076951880X (print)
The proceedings contain 9 papers. The topics discussed include: supporting peer-2-peer interactions in the consumer grid; DPS - dynamic parallel schedules; ParoC++: a requirement-driven parallel object-oriented programming language; on the implementation of JavaSymphony; compiler and runtime support for running OpenMP programs on Pentium- and Itanium-architectures; SMP-aware message passing programming; a comparison between MPI and OpenMP branch-and-bound skeletons; and algorithmic concept recognition support for skeleton based parallel programming.