ISBN: 3540643591 (print)
Programmers and users of compute-intensive scientific applications often do not want to (or even cannot) code load balancing and fault tolerance into their programs. The PBEAM system [18] uses a global virtual name space to provide migration and rollback transparency in user space for distributed groups of processes on workstations. System calls are interposed and their parameters translated between the name spaces. Unlike other migration mechanisms, PBEAM does not require applications to be written for a specific programming model or communication library. In this paper we describe the design and implementation of a separate system call interposition process [3] that accesses the application via the debugging interface. The main advantage of this approach is that it can handle even unmodified (e.g. commercially bought) application programs. We compare measured performance figures with those of previous similar approaches [15, 20].
ISBN: 3540643591 (print)
The increasing size and complexity of high-performance applications have motivated a new round of innovation in the configuration, build, and launch of applications for large computing platforms, especially heterogeneous multicomputers. This paper describes the software technology of the Talaris(TM) Environment, created by Mercury Computer Systems, Inc. to enable a new generation of tools that construct and initiate applications for large distributed and parallel computer systems. The Talaris Environment provides an extensible framework for cooperating tools that share application configuration information. Tools developed by Mercury for the Environment focus on high-performance embedded DSP applications that run on Mercury's RACE(R) series multicomputer systems. Additional tools under development by Mercury and other organizations support other target systems and programming interfaces, including UNIX workstation networks, the IBM SP/2, real-time DSP platforms, the Message Passing Interface (MPI), and POSIX. Development of the Talaris Environment has been funded in part by the Defense Advanced Research Projects Agency (DARPA) under the "Bridging the Gap" and "Three Steps" programs. The Talaris Environment is currently available in connection with these DARPA programs.
ISBN: 3540649522 (print)
On-line tools for parallel and distributed programs require a facility to observe and possibly manipulate the programs' run-time behavior, a so-called monitoring system. Currently, most tools use proprietary monitoring techniques that are incompatible with each other and usually apply only to specific target platforms. The On-line Monitoring Interface Specification (OMIS) is the first specification of a universal interface between different tools and a monitoring system, thus enabling interoperable, portable and uniform tool environments. The paper gives an introduction to the basic concepts of OMIS and presents the design and implementation of an OMIS-compliant monitoring system (OCM).
ISBN: 0818684038 (print)
There has been relatively little analytical work on processor optimizations for multimedia applications. With the introduction of MMX by Intel, it is clear that this is an area of increasing importance. Building on previous work [4, 5, 6, 7, 13, 14], we propose optimizations for multimedia architectures that support independent parallel execution of instructions within dynamically assembled traces, resulting in dramatic performance improvements. Specifically, we propose instruction scheduling and register renaming algorithms that are simplified owing to constraints on trace formation. In addition, we suggest specific instruction pool and trace cache parameters. We constructed a simulator to measure the benefits of these processor optimizations for multimedia applications. The simulated machine, which could fetch/decode 2 instructions per cycle, performed better than a superscalar machine that could fetch/decode 8 instructions per cycle. Execution rates as high as 7.3 instructions per cycle were achieved for the benchmarks simulated, assuming 16 instructions per trace.
ISBN: 3540643591 (print)
Checkpointing distributed applications involving mobile hosts is an important task, both to reduce rollback during recovery from a failure and to manage voluntary disconnections. In this paper we show the basic characteristics a checkpointing protocol needs in order to work with mobile hosts, namely, a reduced number of checkpoints, the use of incremental checkpointing, and consistent global checkpoints built on the fly. These characteristics must be implemented using as little control information as possible while ensuring little rollback. A comparative analysis of the performance of some interesting communication-induced checkpointing protocols, adapted to a mobile setting, is presented. The analysis has been carried out using discrete event simulation, and several models have been considered for host mobility.
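As a rough illustration of the incremental checkpointing idea mentioned in this abstract, the following Python sketch stores only the entries that changed since the previous checkpoint and rebuilds a state by replaying the stored deltas. It is a generic sketch, not one of the protocols compared in the paper; the class name `IncrementalCheckpointer`, the dictionary model of process state, and the assumption that values are immutable are all illustrative.

```python
from typing import Any, Dict, List, Optional, Tuple


class IncrementalCheckpointer:
    """Hypothetical helper: keeps a list of deltas instead of full snapshots."""

    def __init__(self) -> None:
        self._last: Dict[str, Any] = {}  # state as of the latest checkpoint
        self.deltas: List[Tuple[Dict[str, Any], List[str]]] = []

    def checkpoint(self, state: Dict[str, Any]) -> int:
        """Record what changed since the last checkpoint; return the delta size."""
        changed = {
            key: value
            for key, value in state.items()
            if key not in self._last or self._last[key] != value
        }
        removed = [key for key in self._last if key not in state]
        self.deltas.append((changed, removed))
        self._last = dict(state)  # shallow copy; values are assumed immutable
        return len(changed) + len(removed)

    def restore(self, upto: Optional[int] = None) -> Dict[str, Any]:
        """Rebuild the state after the first `upto` checkpoints (all if None)."""
        state: Dict[str, Any] = {}
        for changed, removed in self.deltas[:upto]:
            state.update(changed)
            for key in removed:
                state.pop(key, None)
        return state
```

The value returned by `checkpoint` is the number of entries that would have to be transferred, which is the quantity an incremental scheme tries to keep small on a low-bandwidth mobile link.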
ISBN: 3540649522 (print)
The proceedings contain 133 papers. The topics discussed include: very distributed media stories: presence, time, imagination; random number generation and simulation on vector and parallel computers; quantum cryptography on optical fiber networks; heterogeneous HPC environments; HPcc as high performance commodity computing on top of integrated Java, CORBA, COM and Web standards; a parallel-system design toolset for vision and image processing; achieving portability and efficiency through automatic optimisation: an investigation in parallel image processing; verifying a performance estimator for parallel DBMSs; a universal infrastructure for the run-time monitoring of parallel and distributed applications; a graphical tool for the visualization and animation of communicating sequential processes; Net-dbx: a Java powered tool for interactive debugging of MPI programs across the Internet; and analysing an SQL application with a BSPlib call-graph profiling tool.
ISBN: 3540643591 (print)
Finite automata are among the most extensively studied and well-understood models of computation. They have wide-ranging applications, for example in image compression, protocol validation, game theory and computational biology, to mention only some recent ones. Here we attempt to present a comprehensive survey of parallel algorithms for many fundamental computational problems on finite automata. It is well known that fundamental analysis problems involving deterministic finite automata (DFA) have polynomial-time algorithms, but the problems become hard when the input automata are nondeterministic (NFA). A similar difference is observed for parallel algorithms: most problems with DFA as input have NC algorithms, while such algorithms are unlikely with NFA as input.
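One standard way to see why DFA problems parallelize well is the membership test: each input symbol induces a function on the state set, and because function composition is associative, the per-symbol functions can be combined in a balanced tree of logarithmic depth. The Python sketch below simulates that reduction sequentially; the example automaton (even number of 1s) and the names `compose`, `tree_reduce`, and `accepts` are illustrative assumptions, not material from the survey.

```python
from typing import Dict, List, Tuple

StateMap = Tuple[int, ...]  # StateMap[s] is the state reached from state s

# Hypothetical example DFA over {0, 1}: accepts words with an even number of 1s.
NUM_STATES = 2
START = 0
ACCEPTING = {0}
DELTA: Dict[str, StateMap] = {
    "0": (0, 1),  # reading '0' leaves the state unchanged
    "1": (1, 0),  # reading '1' flips the state
}


def compose(first: StateMap, second: StateMap) -> StateMap:
    """Return the map that applies `first` and then `second`."""
    return tuple(second[first[s]] for s in range(len(first)))


def tree_reduce(maps: List[StateMap]) -> StateMap:
    """Combine maps pairwise, level by level, preserving left-to-right order.

    All pairs within one level are independent, so each level could run in
    parallel; here the levels are simply executed one after another.
    """
    if not maps:
        return tuple(range(NUM_STATES))  # identity map for the empty word
    while len(maps) > 1:
        next_level = [
            compose(maps[i], maps[i + 1]) for i in range(0, len(maps) - 1, 2)
        ]
        if len(maps) % 2 == 1:
            next_level.append(maps[-1])  # odd element carries over unchanged
        maps = next_level
    return maps[0]


def accepts(word: str) -> bool:
    """Decide membership by reducing the per-symbol state maps."""
    total = tree_reduce([DELTA[symbol] for symbol in word])
    return total[START] in ACCEPTING
```

For an NFA the same trick needs relations (Boolean matrices) instead of functions, which is where the cost grows.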
ISBN: 0818684038 (print)
In this paper, we present an adaptive version of our previously proposed quality equalizing (QE) load balancing strategy that attempts to maximize the performance of parallel branch-and-bound (B&B) by adapting to application and target computing system characteristics. Adaptive QE (AQE) incorporates the following salient adaptive features: (1) Anticipatory quantitative and qualitative load balancing mechanisms. (2) Regulation of load information exchange overhead. (3) Deterministic load balancing in extended neighborhoods instead of just immediate neighborhoods as in non-adaptive QE. (4) Randomized global load balancing to fetch work from outside the extended neighborhood. AQE yields speedup improvements of up to 80%, and 15% on average, compared with QE for several real-world mixed-integer programming (MIP) problems, and near-ideal speedups for two of the largest problems in the MIPLIB benchmark suite on an IBM SP2 system.
ISBN: 3540643591 (print)
This paper presents a new approach to parallel heuristic algorithms based on adaptive parallelism. Adaptive parallelism is used to dynamically adjust the degree of parallelism of the application with respect to the system load. The approach demonstrates that high-performance computing using heterogeneous workstations combined with massively parallel machines is feasible for solving large assignment problems. The fault-tolerant algorithm allows a minimal loss of computation in case of failures. The proposed algorithm exploits the properties of this class of applications in order to reduce its own complexity. The parallel heuristic algorithm combines different search strategies: simulated annealing and tabu search. Encouraging results have been obtained in solving the quadratic assignment problem. We have improved the best known solutions for some large real-world problems.
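For readers unfamiliar with the quadratic assignment problem (QAP), the sketch below shows the objective function and a small sequential search that mixes a simulated annealing acceptance rule with a short tabu list of recent swaps. It is a textbook-style illustration only: it is not the paper's parallel algorithm, and the function names, parameter defaults, and the tiny `FLOW`/`DIST` instance are assumptions made for the example.

```python
import math
import random
from collections import deque
from typing import List, Tuple

Matrix = List[List[int]]


def qap_cost(flow: Matrix, dist: Matrix, perm: List[int]) -> int:
    """Cost of placing facility i at location perm[i]."""
    n = len(perm)
    return sum(
        flow[i][j] * dist[perm[i]][perm[j]] for i in range(n) for j in range(n)
    )


def anneal_with_tabu(
    flow: Matrix,
    dist: Matrix,
    steps: int = 2000,
    start_temp: float = 10.0,
    cooling: float = 0.995,
    tabu_len: int = 5,
    seed: int = 0,
) -> Tuple[List[int], int]:
    """Swap-based local search: annealing acceptance plus a short tabu list."""
    rng = random.Random(seed)
    n = len(flow)  # requires n >= 2
    perm = list(range(n))
    cost = qap_cost(flow, dist, perm)
    best_perm, best_cost = perm[:], cost
    tabu: deque = deque(maxlen=tabu_len)  # recently accepted swaps
    temp = start_temp
    for _ in range(steps):
        i, j = sorted(rng.sample(range(n), 2))
        if (i, j) not in tabu:
            perm[i], perm[j] = perm[j], perm[i]
            new_cost = qap_cost(flow, dist, perm)
            delta = new_cost - cost
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                cost = new_cost
                tabu.append((i, j))
                if cost < best_cost:
                    best_perm, best_cost = perm[:], cost
            else:
                perm[i], perm[j] = perm[j], perm[i]  # reject: undo the swap
        temp = max(temp * cooling, 1e-9)
    return best_perm, best_cost


# Hypothetical 4-facility instance (symmetric, zero diagonal).
FLOW: Matrix = [[0, 3, 0, 2], [3, 0, 0, 1], [0, 0, 0, 4], [2, 1, 4, 0]]
DIST: Matrix = [[0, 22, 53, 53], [22, 0, 40, 62], [53, 40, 0, 55], [53, 62, 55, 0]]
```

Recomputing the full cost after every swap is quadratic in the problem size; practical QAP codes usually evaluate the change caused by a swap incrementally instead.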
ISBN: 3540649522 (print)
High Performance Fortran (HPF) is the de facto standard language for writing data parallel programs. For applications that use indirect addressing on distributed arrays, HPF compilers have limited capabilities for optimizing such codes on distributed memory architectures, especially for optimizing communication and reusing communication schedules across subroutine boundaries. This paper describes a dynamic approach for optimizing unstructured communication in codes with indirect addressing. The basic idea is that runtime data reflecting the communication patterns is reused whenever possible. The user only has to specify which data in the program must be traced for modifications. The experiments and results show the effectiveness of the chosen approach.
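The reuse idea in this abstract can be pictured with a small Python sketch: an index array is "traced" by bumping a version counter on every write, and the communication schedule derived from it is rebuilt only when that version has changed. This is an illustrative model and not the paper's HPF runtime; `TracedArray`, `ScheduleCache`, the block distribution, and the rank numbering are assumptions.

```python
from typing import Dict, List, Optional


class TracedArray:
    """Hypothetical index array whose writes are traced via a version counter."""

    def __init__(self, values: List[int]) -> None:
        self._values = list(values)
        self.version = 0

    def __getitem__(self, position: int) -> int:
        return self._values[position]

    def __setitem__(self, position: int, value: int) -> None:
        self._values[position] = value
        self.version += 1  # any modification invalidates cached schedules

    def __len__(self) -> int:
        return len(self._values)


class ScheduleCache:
    """Builds a gather schedule for indirect accesses and reuses it if valid."""

    def __init__(self, block_size: int, my_rank: int) -> None:
        self.block_size = block_size  # elements per process (block distribution)
        self.my_rank = my_rank
        self.builds = 0  # how many times the schedule was actually computed
        self._array: Optional[TracedArray] = None
        self._version = -1
        self._schedule: Dict[int, List[int]] = {}

    def schedule(self, index: TracedArray) -> Dict[int, List[int]]:
        """Return {owner rank: global indices to fetch}, rebuilding if stale."""
        if self._array is not index or self._version != index.version:
            self._schedule = self._build(index)
            self._array = index
            self._version = index.version
            self.builds += 1
        return self._schedule

    def _build(self, index: TracedArray) -> Dict[int, List[int]]:
        schedule: Dict[int, List[int]] = {}
        for position in range(len(index)):
            global_index = index[position]
            owner = global_index // self.block_size
            if owner != self.my_rank:  # only off-process elements need messages
                schedule.setdefault(owner, []).append(global_index)
        return schedule
```

Calling `schedule` repeatedly with an unmodified index array returns the cached result, which corresponds to skipping the analysis phase when a subroutine is entered again with the same access pattern.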