One of the promises of parallelized discrete-event simulation is that it might provide significant speedups over sequential simulation. In reality, high performance cannot be achieved unless the system is fine-tuned to balance computation, communication, and synchronization requirements. In this paper, we discuss our experiments in automated load balancing using the SPEEDES simulation framework. Specifically, we examine three mapping algorithms that use run-time measurements. Using simulation models of queuing networks and the National Airspace System, we investigate (i) the use of run-time data to guide mapping, (ii) the utility of considering communication costs in a mapping algorithm, (iii) the degree to which computational 'hot-spots' ought to be broken up in the linearization, and (iv) the relative execution costs of the different algorithms. We compare the performance of the three algorithms using results from the Intel Paragon.
The simulation of computational fluid dynamics problems in two and more dimensions involves computations of multiple degrees of freedom, such as the components of velocity, which are an obvious source of parallelism. ...
A load distribution system is proposed to enable a single Time Warp program to execute in background, spreading over a collection of possibly heterogeneous workstations (including multiprocessor hosts), utilizing whatever otherwise unused CPU cycles are available. The system uses a simple processor allocation policy to dynamically add or delete hosts from the set of processors utilized by the Time Warp program during its execution. A load balancing algorithm is used that allocates logical processes (LPs) to processors, taking into account other computations executing on the host from the system or other user applications. A clustering mechanism is used to group collections of logical processes together, reducing process migration overheads and helping to retain locality of communication for simulations containing large numbers of LPs. An initial, prototype implementation of the load distribution system is described that executes on a homogeneous network of Silicon Graphics workstations. Initial experiments indicate this approach shows promise in enabling efficient execution of Time Warp programs 'in background' on distributed computing platforms.
We present an Incremental State Saving technique for which the state saving calls are inserted automatically by directly editing the application executable. This method has the advantage of being easy to use since it is fully automatic, and has good performance since it adds overhead only where state is being modified. Since the editing happens on executable code, the method is independent of the compiler, and allows third party libraries to be used. None of the previous incremental state saving methods has both of these features. We find that Automatic Incremental State Saving is preferable to copy state saving when less than 15% of the state is modified in each event. This technique allows us to efficiently parallelize existing simulations, and makes Time Warp more accessible to non-Time Warp experts.
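The difference between copy state saving and incremental state saving can be illustrated with a minimal Python sketch. It is not the paper's method, which inserts the saving calls by editing the executable; here the logging is done by hand through a hypothetical `write` method, and the class names `CopyStateSaver` and `IncrementalStateSaver` are assumptions made for illustration only.

```python
import copy

_MISSING = object()  # sentinel: the variable did not exist before the write


class CopyStateSaver:
    """Copy state saving: snapshot the entire state before every event."""

    def __init__(self, state):
        self.state = state
        self.snapshots = []

    def begin_event(self):
        # Cost is proportional to the size of the whole state.
        self.snapshots.append(copy.deepcopy(self.state))

    def write(self, key, value):
        self.state[key] = value

    def rollback(self):
        """Undo the most recent event by restoring its snapshot."""
        self.state.clear()
        self.state.update(self.snapshots.pop())


class IncrementalStateSaver:
    """Incremental state saving: log only the old values that get overwritten."""

    def __init__(self, state):
        self.state = state
        self.logs = []

    def begin_event(self):
        self.logs.append([])  # one undo log per event

    def write(self, key, value):
        # Cost is paid only where state is actually modified.
        self.logs[-1].append((key, self.state.get(key, _MISSING)))
        self.state[key] = value

    def rollback(self):
        """Undo the most recent event by replaying its log backwards."""
        for key, old in reversed(self.logs.pop()):
            if old is _MISSING:
                del self.state[key]
            else:
                self.state[key] = old
```

In this sketch the incremental saver pays a small per-write cost while the copy saver pays a per-event cost proportional to the full state, which is the trade-off behind a break-even point such as the 15% figure reported in the abstract.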
ISBN: 0818675152 (print)
The proceedings contain 45 papers. The topics discussed include: use case maps for attributing behavior to system architecture; modeling real-time distributed software systems; the integration of real-time applications into the global command and control system; implementing distributed real-time control systems in a functional programming language; modeling and simulation in reactive systems; entity-life modeling in a distributed environment; an environment for incremental development of distributed extensible asynchronous real-time systems; development and validation of network clock measurement techniques; inter- and intra-processor synchronizations in multiprocessor real-time kernel; and behavior analysis of parallel, real-time and embedded systems for monitoring and optimizing industrial processes.
The event horizon is a very important concept that applies to both parallel and sequential discrete-event simulations. By exploiting the event horizon, parallel simulations can process events optimistically in a risk-free manner (i.e., without requiring antimessages) using adaptable 'breathing' time cycles with variable time widths. Additionally, by exploiting the event horizon, one can significantly reduce the overhead of event list management that is common to virtually every discrete-event simulation. This paper is a continuation of work previously reported at PADS94. In that report, a complete mathematical formulation of the event horizon was derived under equilibrium conditions using the hold model. Various forms of the beta density function were consequently used to verify the predicted results of the analytic model. This second report describes how the concept of the event horizon can also be applied to event list management. By exploiting the event horizon, the performance of several priority queue data structures is improved, including linked lists, various binary trees, and heaps. A somewhat detailed description of these modified data structures along with other relevant background information is provided for completeness. Performance results for each priority queue data structure are presented.
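One way to picture event-horizon-based event list management is the Python sketch below. It rests on two assumptions not stated in the abstract: that the event horizon is the earliest time stamp among events generated since the last merge, and that new events can be held in an unsorted secondary list until the next pending event would cross that horizon. The class `EventHorizonQueue` and its methods are hypothetical and are not the data structures described in the paper.

```python
import math
from collections import deque
from heapq import merge


class EventHorizonQueue:
    """Priority queue of time stamps with a sorted primary list and an
    unsorted secondary list of newly generated events."""

    def __init__(self):
        self.primary = deque()    # sorted, ascending time stamps
        self.secondary = []       # unsorted, newly generated events
        self.horizon = math.inf   # earliest time stamp in `secondary`
        self.merges = 0           # number of sort-and-merge cycles so far

    def insert(self, timestamp):
        # Constant-time insertion: no sorting until the horizon is crossed.
        self.secondary.append(timestamp)
        self.horizon = min(self.horizon, timestamp)

    def pop(self):
        """Return the earliest pending time stamp.
        Raises IndexError if the queue is empty."""
        if not self.primary or self.horizon < self.primary[0]:
            self._merge()
        return self.primary.popleft()

    def _merge(self):
        # Sort the new events once per cycle and merge them into `primary`.
        if not self.secondary:
            return
        self.secondary.sort()
        self.primary = deque(merge(self.primary, self.secondary))
        self.secondary = []
        self.horizon = math.inf
        self.merges += 1
```

Under a hold-model workload (pop one event, insert one new event at a later time), the sketch sorts and merges only when the next event in the primary list lies beyond the horizon, so several events are typically served per merge cycle.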
For the performability evaluation of complex soft real-time systems, simulation often remains the only feasible method. Simulation experiments tend to be very time consuming if rare events have to be considered. This pa...
The paper presents a modeling and simulation method to evaluate the performance of distributed computer control systems (DCCSs). Task response time, resource utilization, and network delay are considered as performanc...
ISBN: 9780897918138 (print)
The single-disk, D-head model of parallel I/O was introduced by Agarwal and Vitter to analyze algorithms for problem instances that are too large to fit in primary memory. Subsequently Vitter and Shriver proposed a more realistic model in which the disk space is partitioned into D disks, with a single head per disk. To date, each problem for which there is a known optimal algorithm for both models has the same asymptotic bounds on both models. Therefore, it has been unknown whether the models are equivalent or whether the single-disk model is strictly more powerful. In this paper we provide evidence that the single-disk model is strictly more powerful. We prove a lower bound on any general simulation of the single-disk model on the multi-disk model and establish randomized and deterministic upper bounds. Let N be the problem size and let T be the number of parallel I/Os required by a program on the single-disk model. Then any simulation of this program on the multi-disk model will require Ω(T log(N/D) / log log(N/D)) parallel I/Os. This lower bound holds even if replication is allowed in the multi-disk model. We also show an O(log D / log log D) randomized upper bound and an O(log D (log log D)^2) deterministic upper bound. These results exploit an interesting analogy between the disk models and the PRAM and DCM models of parallel computation.
ITL and Tempura are used for the formal specification and the simulation, respectively, of a large-scale system, namely the general-purpose multithreaded dataflow processor EP/3. The paper shows that this processor can be ...