In this paper we analyse a very simple dynamic work-stealing algorithm. In the work-generation model, there are n generators which are arbitrarily distributed among a set of n processors. During each time step, with probability λ, each generator generates a unit-time task which it inserts into the queue of its host processor. After the new tasks are generated, each processor removes one task from its queue and services it. Clearly, the work-generation model allows the load to grow more and more imbalanced, so, even when λ<1, the system load can be unbounded. The natural work-stealing algorithm that we analyse works as follows. During each time step, each empty processor sends a request to a randomly selected other processor. Any non-empty processor having received at least one such request in turn decides (again randomly) in favour of one of the requests. The number of tasks which are transferred from the non-empty processor to the empty one is determined by the so-called work-stealing function f. We analyse the long-term behaviour of the system as a function of λ and f. We show that the system is stable for any constant generation rate λ<1 and for a wide class of functions f. We give a quantitative description of the functions f which lead to stable systems. Furthermore, we give upper bounds on the average system load (as a function of f and n).
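The model described in this abstract can be explored with a small simulation. The following is a minimal sketch, not the authors' code: the function name `simulate`, the `hosts` mapping, the default placement of all generators on processor 0, and the ordering of the three phases within a step (generate, service, steal) are assumptions made for illustration. The stealing function `f` is passed in by the caller, and its result is clamped to the victim's current load.

```python
import random


def simulate(n, lam, f, steps, hosts=None, seed=0):
    """Simulate the work-generation model with random work stealing.

    n     -- number of processors (and of generators)
    lam   -- per-step generation probability of each generator
    f     -- stealing function: load of the victim -> number of tasks to move
    hosts -- hosts[g] is the processor that generator g feeds (assumed layout)
    Returns the list of queue lengths after `steps` time steps.
    """
    rng = random.Random(seed)
    if hosts is None:
        hosts = [0] * n  # assumption: the most imbalanced placement
    load = [0] * n

    for _ in range(steps):
        # Phase 1: each generator creates a task with probability lam.
        for g in range(n):
            if rng.random() < lam:
                load[hosts[g]] += 1

        # Phase 2: each non-empty processor services one task.
        for p in range(n):
            if load[p] > 0:
                load[p] -= 1

        # Phase 3: each empty processor asks one random *other* processor.
        requests = {}
        if n > 1:
            for p in range(n):
                if load[p] == 0:
                    victim = rng.randrange(n - 1)
                    if victim >= p:
                        victim += 1  # skip p itself
                    requests.setdefault(victim, []).append(p)

        # Each non-empty victim accepts exactly one request at random.
        transfers = []
        for victim, thieves in requests.items():
            if load[victim] > 0:
                thief = rng.choice(thieves)
                k = max(0, min(f(load[victim]), load[victim]))
                transfers.append((victim, thief, k))
        for victim, thief, k in transfers:
            load[victim] -= k
            load[thief] += k

    return load
```

For example, `simulate(8, 0.9, lambda load: load // 2, 1000)` runs 1000 steps with a "steal half" function; whether a particular `f` keeps the load bounded is exactly the question the paper addresses, so the sketch is only a way to experiment with the model.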
ISBN: 0780372476 (print)
In this paper we propose a model to predict the performance of synchronous discrete event simulation. The model considers parameters including the number of active objects per cycle, event execution granularity and communication cost. We derive a single formula that predicts the performance of synchronous simulation. We have benchmarked several VHDL circuits on an SGI Origin 2000. The benchmark results show that the prediction model explains more than 90% of parallel simulation execution time. We also measure the effect of computation granularity on performance. The benchmark results show that although higher granularity can yield better speedup because computation dominates communication, the computational granularity cannot overshadow the inherent synchronization cost. This model can be used to predict the speedup expected for synchronous simulation, and to decide whether it is worthwhile to use synchronous simulation before actually implementing it.
The optimum receiver to detect the bits of multiple CDMA users has exponential complexity in the number of active users in the system. Previous work showed that the successive and parallel soft interference cancellers correspond to nonlinear programming relaxations of the optimum multiuser detection problem. We use this approximation method combined with the slowest descent approach to improve the performance of soft interference cancellers. The aim is to achieve a performance closer to the performance of the optimum receiver without significantly compromising the low complexity of the resulting receiver. We derive the resulting detectors and evaluate their performance. Results show that they can achieve near-optimum performance and outperform several previously proposed multiuser detectors.
In general, coupling modes in active database systems are divided into immediate, deferred and detached modes, but this classification does not suffice to clearly separate the various execution models in a concurrent active rule system. The ambiguity is apparent in that the same coupling mode (e.g. immediate) has different semantics in different active database systems. To clear up this ambiguity, we classify the coupling modes into syn-coupling and asyn-coupling modes, according to the key issue in parallel programming languages, synchrony versus asynchrony. Rule execution semantics for the various coupling modes are distinctly defined, which benefits both the implementation and the usage of the active rule system. After presenting the graph-based rule system (E-RG) and its execution model, we show various strategies for constructing the syn-coupling and asyn-coupling modes in the E-RG rule system, based on the semantics of the coupling modes.
ISBN: 0769509908 (print)
Complexity in real codes is sometimes due to the use of multi-vector data structures, but there are not many compile-time approaches dealing with this problem. Moreover, current compilation techniques only analyze single vectors. This paper describes how performance can be improved if semantic bindings are taken into account during parallelization. Our approach is a first step towards converging from the data-parallel paradigm to automatic parallelization, by reducing the number of directives in the code. We apply a multi-loop analysis and a sparse privatization to replace the owner-computes rule. Additionally, our support will be able to parallelize loops with some levels of indirection on the left-hand side. In this paper, we also present three alternatives for storing the sending information, and two algorithms for calculating coordinates from pointers. Both issues are critically important when the parallelized algorithm requires sparse communication.
ISBN: 0769509878 (print)
We present a computation model to describe a clustered memory hierarchy of distributed shared memory machines. The computation model includes the access to shared data stored in different levels of the hierarchy as well as the transfer of entire blocks of data between different levels of the memory. Pure shared memory machines and pure message passing machines can be expressed within the model. As an example, we use the model to analyze a hierarchical matrix multiplication algorithm.
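The abstract does not spell out the hierarchical matrix multiplication it analyzes. As a rough illustration of why block transfers matter in such a model, the sketch below shows a generic single-level tiled multiplication, in which each `block` x `block` tile stands in for a unit of data moved between two memory levels. The function name `matmul_blocked` and the `block` parameter are assumptions for illustration, not the paper's algorithm.

```python
def matmul_blocked(a, b, block):
    """Multiply matrices a (n x m) and b (m x p), given as lists of rows,
    visiting the operands tile by tile so each tile is reused while it is
    'resident' in the faster memory level."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i0 in range(0, n, block):          # tile rows of a and c
        for k0 in range(0, m, block):      # tile along the shared dimension
            for j0 in range(0, p, block):  # tile columns of b and c
                for i in range(i0, min(i0 + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(k0, min(k0 + block, m)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(j0, min(j0 + block, p)):
                            row_c[j] += aik * row_b[j]
    return c
```

A multi-level variant would apply the same tiling recursively, one tile size per level of the hierarchy; the tile sizes would then be chosen from the capacities and transfer costs the model assigns to each level.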
This paper addresses parallel programming paradigms for nonlinear, explicit finite element simulations primarily employed for crashworthiness and occupant safety simulations in the automotive industry. The reliance of industrial design on computer simulation and state-of-the-art high performance computing architectures will be discussed as a motivation for the need for parallel implementations of such codes. Concrete descriptions of parallelisation strategies using shared-memory micro-tasking, message-passing, and High Performance Fortran will be given for the industrial simulation code PAM-CRASH(TM) and the related code PAM-SAFE(TM), together with performance results on a variety of parallel platforms. (C) 2000 Elsevier Science Ltd. All rights reserved.
A number of interesting models have been proposed and used to support coordination languages and systems. In this introductory paper, we first present a number of important concepts that form a context for classification and comparison of various coordination models and languages, and their applications. Next, we review three models and their associated languages, representing three different approaches to coordination. We illustrate the application of each model and language by using it to solve the classical dining philosophers problem. This paper ends with an overview of the rest of the papers that appear in this special issue. (C) 1998 Elsevier Science B.V. All rights reserved.
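For readers unfamiliar with the dining philosophers problem used as the running example, the sketch below shows one well-known solution with plain threads and locks. It does not use any of the three coordination models the paper reviews; it swaps in ordinary lock ordering (each philosopher always picks up the lower-numbered fork first) purely to illustrate the problem. The function name `dine` and its parameters are assumptions, and at least two philosophers are assumed.

```python
import threading


def dine(num_philosophers=5, meals=3):
    """Run the dining philosophers with a global fork ordering.

    Acquiring forks in increasing index order rules out the circular
    wait that causes deadlock. Returns how many meals each one ate.
    Assumes num_philosophers >= 2.
    """
    forks = [threading.Lock() for _ in range(num_philosophers)]
    eaten = [0] * num_philosophers

    def philosopher(i):
        left, right = i, (i + 1) % num_philosophers
        first, second = min(left, right), max(left, right)
        for _ in range(meals):
            with forks[first]:
                with forks[second]:
                    eaten[i] += 1  # only thread i writes slot i

    threads = [
        threading.Thread(target=philosopher, args=(i,))
        for i in range(num_philosophers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return eaten
```

In a coordination language the same constraint would typically be expressed outside the philosopher code, in the coordinator that connects the processes, which is the contrast the paper draws between its three approaches.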