ISBN: 3540440496 (print)
We describe the use of temporal logic formulas as runtime assertions in a parallel debugging environment. The user asserts the expected system behavior of a message passing program by one or several such formulas. Through "macro-stepping", the debugger lets the user interactively elaborate the execution tree (i.e., the set of possible execution paths) that arises from the use of non-deterministic communication operations. In each macro-step, a temporal logic checker verifies that the previously asserted temporal formulas are not violated by the current program state. Our approach thus introduces powerful runtime assertions into parallel and distributed debugging by incorporating ideas from the model checking of temporal formulas.
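To make the idea of checking an asserted formula after every macro-step concrete, here is a minimal sketch in Python. It is not the debugger described in the abstract: the class `AlwaysAssertion`, the function `run_macro_steps`, and the dictionary representation of a program state are all assumptions made for illustration, and only the simplest temporal operator ("always p") is covered.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, int]  # hypothetical snapshot of program state after a macro-step


class AlwaysAssertion:
    """Hypothetical runtime assertion for the temporal formula 'always p'."""

    def __init__(self, name: str, predicate: Callable[[State], bool]) -> None:
        self.name = name
        self.predicate = predicate
        self.violated_at: Optional[int] = None  # index of the first violating step

    def check(self, step: int, state: State) -> bool:
        # Record only the first violation; 'always p' stays violated afterwards.
        if self.violated_at is None and not self.predicate(state):
            self.violated_at = step
        return self.violated_at is None


def run_macro_steps(states: List[State], assertions: List[AlwaysAssertion]) -> List[str]:
    """Check every assertion after each macro-step; return names of violated ones."""
    for step, state in enumerate(states):
        for assertion in assertions:
            assertion.check(step, state)
    return [a.name for a in assertions if a.violated_at is not None]


# One execution path: the number of undelivered messages after each macro-step.
path = [{"pending": 0}, {"pending": 2}, {"pending": 5}]
bounded = AlwaysAssertion("pending_bounded", lambda s: s["pending"] <= 3)
violated = run_macro_steps(path, [bounded])
```

A real checker would also need operators such as "eventually" and "until", and would be run once per branch of the execution tree rather than on a single path.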
ISBN: 9783540437925 (print)
The proceedings contain 101 papers. The special focus in this conference is on Grid Architectures, Load Balancing, Performance Analysis, Prediction, Parallel Non-numerical Algorithms and Parallel Programming. The topics include: interrupt and cancellation as synchronization methods; applications of virtual data in the LIGO experiment; a parallel system architecture based on dynamically configurable shared memory clusters; simultaneous allocation and scheduling with exclusion and precedence relations algorithm; a greedy approach for a time-dependent scheduling problem; dedicated scheduling of biprocessor tasks to minimize mean flow time; heterogeneous dynamic load balancing with a scheme based on the Laplacian polynomial; task scheduling for dynamically configurable multiple SMP clusters based on extended DSC approach; processing time and memory requirements for multi-instalment divisible job processing; estimating execution time of distributed applications; evaluation of parallel programs by measurement of its granularity; the performance of different communication mechanisms and algorithms used for parallelization of molecular dynamics code; benchmarking tertiary storage systems with file fragmentation; FEM computations on clusters using different models of parallel programming; parallel skeletons for tabu search method based on search strategies and neighborhood partition; a new parallel approach for multi-dimensional packing problems; three parallel algorithms for simulated annealing; solving the flow shop problem by parallel simulated annealing; automated verification of infinite state concurrent systems; criteria of satisfiability for homogeneous systems of linear Diophantine constraints; and irregular and out-of-core parallel computing on clusters.
ISBN: 3540440496 (print)
Today, parallel programming is dominated by message passing libraries such as MPI. Algorithmic skeletons are intended to simplify parallel programming by increasing the expressive power. The idea is to offer typical parallel programming patterns as polymorphic higher-order functions that are efficiently implemented in parallel. The approach presented here integrates the main features of existing skeleton systems. Moreover, it does not come with a new programming language or language extension, which parallel programmers may hesitate to learn; instead, it is offered in the form of a library that can easily be used by, e.g., C and C++ programmers. A major technical difficulty is to simulate the main requirements for a skeleton implementation, namely higher-order functions, partial applications, and polymorphism, as efficiently as possible in an imperative programming language. Experimental results based on a draft implementation of the suggested skeleton library show that this can be achieved without a significant performance penalty.
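The notion of a skeleton as a polymorphic higher-order function combined with partial application can be sketched as follows. This is an illustration in Python rather than the C/C++ library the abstract describes; the names `map_skeleton`, `fold_skeleton`, and `scale` are hypothetical, and a thread pool stands in for a real distributed implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import Callable, Iterable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def map_skeleton(f: Callable[[A], B], items: Iterable[A], workers: int = 4) -> List[B]:
    """Apply f to every item; the pool decides which worker handles which item."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(f, items))  # pool.map preserves input order


def fold_skeleton(op: Callable[[B, A], B], items: Iterable[A], initial: B) -> B:
    """Sequential reduction; a parallel version would require op to be associative."""
    acc = initial
    for item in items:
        acc = op(acc, item)
    return acc


def scale(factor: int, x: int) -> int:
    return factor * x


# Partial application fixes the first argument, yielding a one-argument function
# that can be handed to the skeleton.
doubled = map_skeleton(partial(scale, 2), [1, 2, 3, 4])
total = fold_skeleton(lambda a, b: a + b, doubled, 0)
```

The caller never mentions threads or messages; the parallel structure lives entirely inside the skeleton, which is the point of the approach.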
ISBN: 3540440496 (print)
In this paper, we propose a blocking algorithm for a parallel one-dimensional fast Fourier transform (FFT) on clusters of PCs. Our proposed parallel FFT algorithm is based on the six-step FFT algorithm. The six-step FFT algorithm can be altered into a block nine-step FFT algorithm to reduce the number of cache misses. The block nine-step FFT algorithm improves performance by utilizing the cache memory effectively. We use the block nine-step FFT algorithm to design the parallel one-dimensional FFT algorithm. Because our proposed parallel FFT algorithm uses cyclic distribution, all-to-all communication is required only once. Moreover, the input data and output data can both be given in natural order. We achieved performance of over 1.3 GFLOPS on an 8-node dual Pentium III 1 GHz PC SMP cluster.
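For readers unfamiliar with the six-step formulation that the abstract builds on, the following sequential Python sketch shows its structure for a length N = n1 * n2 transform: transpose, short transforms, twiddle multiplication, transpose, short transforms, transpose. It does not reproduce the block nine-step variant or the parallel data distribution from the paper. The function names are assumptions, and a naive O(n^2) DFT stands in for the short row FFTs to keep the example self-contained.

```python
import cmath
from typing import List, Sequence

Matrix = List[List[complex]]


def dft(v: Sequence[complex]) -> List[complex]:
    """Naive O(n^2) DFT, used here in place of an optimized short FFT."""
    n = len(v)
    return [
        sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def six_step_fft(x: Sequence[complex], n1: int, n2: int) -> List[complex]:
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # View x as an n2 x n1 row-major matrix: a[j2][j1] = x[j1 + j2 * n1].
    a = [list(x[r * n1:(r + 1) * n1]) for r in range(n2)]
    b = transpose(a)                       # step 1: n1 x n2
    c = [dft(row) for row in b]            # step 2: n1 transforms of length n2
    c = [                                  # step 3: twiddle factors w_N^(j1 * k2)
        [c[j1][k2] * cmath.exp(-2j * cmath.pi * j1 * k2 / n) for k2 in range(n2)]
        for j1 in range(n1)
    ]
    d = transpose(c)                       # step 4: n2 x n1
    e = [dft(row) for row in d]            # step 5: n2 transforms of length n1
    f = transpose(e)                       # step 6: n1 x n2, f[k1][k2] = X[k2 + k1 * n2]
    return [value for row in f for value in row]
```

Both input and output are in natural order, and the result can be compared against `dft(x)` on small inputs. In a distributed setting the transposes are where communication would occur.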
Matrix partitioning problems that arise in the efficient estimation of sparse Jacobians and Hessians can be modeled using variants of graph coloring problems. In a previous work [6], we argue that distance-2 and distanc...
The SB-PRAM is a parallel architecture which uses i) multithreading in order to hide latency, ii) a pipelined combining butterfly network in order to reduce hot spots and iii) address hashing in order to randomize net...
ISBN: 0780375963 (print)
A new algorithm for reducing the division operation to a series of smaller divisions is introduced. Partitioning the dividend into segments, we perform divisions, shifts, and accumulations taking into account the weight of dividend bits. Each partial division can be performed by any existing division algorithm. From an algorithmic point of view, computational complexity analysis is performed in comparison with existing algorithms. From an implementation point of view, since the division can be performed by any existing divider, the designer can choose the divider that best meets the specifications. Two possible implementations of the algorithm, a sequential one and a parallel one, are derived, with several variations, allowing performance, cost, and cost/performance trade-offs.
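The general idea of dividing segment by segment, with shifts and accumulation reflecting the weight of each segment, can be illustrated with ordinary long division in base 2^w. This Python sketch is an assumption-laden stand-in and not the algorithm of the paper: the function name `segmented_divide` and the parameter `segment_bits` are hypothetical, it is purely sequential, and the built-in `divmod` plays the role of "any existing divider".

```python
from typing import List, Tuple


def segmented_divide(dividend: int, divisor: int, segment_bits: int = 8) -> Tuple[int, int]:
    """Divide by processing the dividend in segments of segment_bits bits."""
    if dividend < 0 or divisor <= 0 or segment_bits <= 0:
        raise ValueError("expects dividend >= 0, divisor > 0, segment_bits > 0")

    # Partition the dividend into segments, least significant first, then reverse.
    mask = (1 << segment_bits) - 1
    segments: List[int] = []
    x = dividend
    while True:
        segments.append(x & mask)
        x >>= segment_bits
        if x == 0:
            break
    segments.reverse()

    quotient = 0
    remainder = 0
    for segment in segments:
        # Shift the running remainder by one segment and append the next segment.
        partial = (remainder << segment_bits) | segment
        # Smaller division: partial < divisor * 2**segment_bits, so q fits in one segment.
        q, remainder = divmod(partial, divisor)
        # Accumulate the partial quotient at its weight.
        quotient = (quotient << segment_bits) + q
    return quotient, remainder
```

For example, `segmented_divide(1000003, 97)` returns the same pair as `divmod(1000003, 97)`. Each inner division handles an operand only `segment_bits` bits wider than the divisor, which is what allows a small divider to be reused.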
ISBN: 9783540361800 (print)
The proceedings contain 12 papers. The special focus in this conference is on Job Scheduling Strategies for Parallel Processing. The topics include: a self-tuning job scheduler family with dynamic policy switching; preemption based backfill; job scheduling for the BlueGene/L system; selective reservation strategies for backfill job scheduling; multiple-queue backfilling scheduling with priorities and reservations for parallel systems; scheduling jobs on parallel systems using a relaxed backfill strategy; the impact of more accurate requested runtimes on production job scheduling performance; economic scheduling in grid computing; a protocol for negotiating service level agreements and coordinating resource management in distributed systems; local versus global schedulers with processor co-allocation in multicluster systems; practical heterogeneous placeholder scheduling in overlay metacomputers; and current activities in the scheduling and resource management area of the Global Grid Forum.
ISBN: 3540440496 (print)
The computer graphics industry, and in particular those involved with films, games and virtual reality, continues to demand more realistic computer generated images. Despite the ready availability of modern high performance graphics cards, the complexity of the scenes being modeled and the high fidelity required of the images mean that rendering such images is still simply not possible in a reasonable time, let alone in real time, on a single computer. Two approaches may be considered in order to achieve such realism in real time: parallel processing and visual perception. Parallel processing has a number of computers working together to render a single image, which appears to offer almost unlimited performance; however, enabling many processors to work efficiently together is a significant challenge. Visual perception, on the other hand, takes into account that it is the human who will ultimately be looking at the resultant images, and while the human eye is good, it is not perfect. Exploiting knowledge of the human visual system can save significant rendering time by simply not computing those parts of a scene that the human will fail to notice. A combination of these two approaches may indeed enable us to achieve realistic rendering in real time.
ISBN: 3540440496 (print)
A demand-driven parallelization of the ray tracing algorithm is presented. Correctness and optimality of a perfect load balancing algorithm for image space subdivision are proved and its exact message complexity is given. An integration of antialiasing into the load balancing algorithm is proposed. A distributed object database allows rendering of complex scenes which cannot be stored in the memory of a single processor. Each processor maintains a permanent subset of the object database as well as a cache for temporary storage of other objects. Using object bounding boxes and a bounding hierarchy in combination with the LRU (Least Recently Used) caching policy reduces the number of requests for missing data to a necessary minimum. The proposed parallelization is simple and robust. It should be easy to implement with any sequential ray tracer and any message-passing system. Our implementation is based on POV-Ray and PVM.
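The per-processor storage scheme described above, a permanent subset plus an LRU cache for other objects, can be sketched in Python as follows. This is an illustrative assumption, not the authors' POV-Ray/PVM implementation: the class `ObjectStore`, the callback `fetch_remote`, and the request counter are hypothetical, and bounding boxes and the bounding hierarchy are omitted.

```python
from collections import OrderedDict
from typing import Callable, Dict, Hashable


class ObjectStore:
    """Hypothetical per-processor view of a distributed object database."""

    def __init__(
        self,
        permanent: Dict[Hashable, object],
        cache_size: int,
        fetch_remote: Callable[[Hashable], object],
    ) -> None:
        self.permanent = dict(permanent)   # objects this processor always holds
        self.cache: "OrderedDict[Hashable, object]" = OrderedDict()
        self.cache_size = cache_size
        self.fetch_remote = fetch_remote   # stands in for a message to the owner
        self.requests = 0                  # number of remote requests issued

    def get(self, object_id: Hashable) -> object:
        if object_id in self.permanent:
            return self.permanent[object_id]
        if object_id in self.cache:
            self.cache.move_to_end(object_id)  # mark as most recently used
            return self.cache[object_id]
        self.requests += 1
        obj = self.fetch_remote(object_id)
        self.cache[object_id] = obj
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)     # evict the least recently used entry
        return obj


store = ObjectStore({1: "sphere"}, cache_size=2, fetch_remote=lambda i: f"object-{i}")
for object_id in (1, 2, 3, 2, 4, 2, 3):
    store.get(object_id)
```

In this access sequence, object 3 is evicted when object 4 arrives because object 2 was touched more recently, so the final access to object 3 triggers a new remote request.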