ISBN: 9780818675393 (print)
Many systems rely on the ability to roll back (or restore) parts of the system state to undo or recover from undesired or erroneous computations. Examples of such systems include fault-tolerant systems with checkpointing, editors with undo capabilities, transaction and database systems, and optimistically synchronized parallel and distributed simulations. An essential part of such systems is the state saving mechanism. It should not only allow efficient state saving but also support efficient state restoration in case of rollback. Furthermore, it is often a requirement that this mechanism be transparent to the user. In this paper we present a method to implement a transparent incremental state saving mechanism in an optimistically synchronized parallel discrete event simulation system based on the Time Warp mechanism. The usefulness of this approach is demonstrated by simulations of large, detailed, realistic FCA and DCA-like cellular phone systems.
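The core idea of incremental state saving can be illustrated with an undo log. Before each write, the old value of the variable is recorded together with the timestamp of the current event. A rollback then replays the log backwards. The minimal Python sketch below assumes a dictionary-like state and uses hypothetical names (`IncrementalState`, `rollback`, `fossil_collect`). It is not the mechanism described in the paper; here "transparency" is only approximated by intercepting item assignment.

```python
class IncrementalState:
    """Hypothetical state container that logs the old value of every write.

    Because writes go through __setitem__, model code that uses ordinary
    item assignment is checkpointed without explicit save calls.
    """

    _MISSING = object()  # marks keys that did not exist before a write

    def __init__(self, **initial):
        self._data = dict(initial)
        # (virtual_time, key, old_value) in processing order; timestamps are
        # non-decreasing because events after a rollback start at >= t.
        self._log = []
        self.now = 0.0  # timestamp of the event currently being processed

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        # Save only the variable being overwritten, not a full copy of the state.
        self._log.append((self.now, key, self._data.get(key, self._MISSING)))
        self._data[key] = value

    def rollback(self, t):
        """Undo, newest first, every write made by events with timestamp >= t."""
        while self._log and self._log[-1][0] >= t:
            _, key, old = self._log.pop()
            if old is self._MISSING:
                del self._data[key]
            else:
                self._data[key] = old

    def fossil_collect(self, gvt):
        """Drop entries older than global virtual time; they can never be undone."""
        self._log = [entry for entry in self._log if entry[0] >= gvt]

    def snapshot(self):
        return dict(self._data)


# Two events update a cell's state, then a straggler with timestamp 2.0 arrives.
s = IncrementalState(calls=0, free_channels=10)
s.now = 1.0
s["calls"] = 1
s["free_channels"] = 9
s.now = 2.0
s["calls"] = 2
s["free_channels"] = 8
s.rollback(2.0)
print(s.snapshot())  # {'calls': 1, 'free_channels': 9}
```

Saving only the overwritten variables keeps the per-event cost proportional to the number of writes rather than to the size of the state. The trade-off is that a rollback must walk the log instead of copying back a single snapshot.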
ISBN: 9780818675393 (print)
We investigate conservative parallel discrete event simulations of logic circuits on shared-memory multiprocessors. For a first estimate of the possible speedup, we extend the critical path analysis technique with partitioning strategies. To account for the overhead of managing data structures, we use a simulation on an ideal parallel machine (PRAM). This simulation can be executed directly on the SB-PRAM prototype, yielding both an implementation and a basis for data structure optimizations. One of the major tools for these optimizations is the SB-PRAM's hardware support for parallel prefix operations. Our reimplementation of the PTHOR program on the SB-PRAM yields substantially higher speedups than before.
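Plain critical path analysis bounds the achievable speedup by the ratio of total work to the length of the longest chain of dependent events. This bound assumes unlimited processors and zero overhead. The Python sketch below computes that basic bound only. The function name and input format are hypothetical, and it does not include the partitioning extension or the PRAM overhead model described above.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def critical_path_speedup(cost, preds):
    """Return (total_work, critical_path_length, speedup_bound).

    cost:  event -> execution cost (every event must appear here)
    preds: event -> events that must finish before it can start
    """
    graph = {ev: preds.get(ev, ()) for ev in cost}
    finish = {}
    for ev in TopologicalSorter(graph).static_order():
        # An event starts as soon as its last predecessor has finished.
        start = max((finish[p] for p in graph[ev]), default=0)
        finish[ev] = start + cost[ev]
    total = sum(cost.values())
    length = max(finish.values())
    return total, length, total / length


# Gate events a and b feed c, which feeds d.
cost = {"a": 1, "b": 1, "c": 2, "d": 1}
preds = {"c": ["a", "b"], "d": ["c"]}
print(critical_path_speedup(cost, preds))  # (5, 4, 1.25)
```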
ISBN: 9780818675393 (print)
One way to reduce the time spent simulating VHDL designs is to parallelize the simulation. In this paper, we describe the implementation of an object-oriented Time Warp simulator for VHDL in an actor-based environment. The actor model of computation allows fine-grained parallelism to be exploited in a truly asynchronous manner and allows computation to overlap with communication. Some preliminary results obtained by simulating a set of multipliers and some ISCAS benchmark circuits are provided. In addition, we demonstrate the importance of placing processes according to circuit partitioning techniques for improving runtime and scalability. Results are reported on a Sun SPARCServer 1000 and an Intel Paragon.
Performance monitoring in the Annai tool environment for distributed-memory parallel computing systems is achieved through a flexible combination of different types of instrumentation. Latent instrumentation, placed in the communication library or inserted into executables by the compilation system, can be dynamically configured during program execution. Analysis of the subject program and knowledge of the associated intrusion and costs allow Annai to guide the user in selecting appropriate instrumentation. Instrumentation processing and transport costs are also explicitly available as part of Annai/PMA performance analysis and visualizations, so that they can be clearly accounted for and their potential impact on program execution considered. While it is not possible to completely eliminate or compensate for instrumentation intrusion, Annai provides an integrated environment in which intrusion is explicitly recognized and can be minimized as part of detailed parallel program performance analysis.
ISBN: 0818672358 (print)
A network of workstations (NOW) has become an important distributed platform for large-scale scientific computations. A practical NOW system is heterogeneous and nondedicated: computing power varies among the workstations, and multiple jobs may interact with each other during execution. I present the design and implementation of a simulation system for a nondedicated heterogeneous NOW. The simulator gives users many options for specifying and quantifying system architectures, network heterogeneity, and time-sharing factors, such as the speeds of different processors, memory organizations, network topology, communication structures, and workload distributions. The simulator also supports the execution of message-passing parallel programs written in C using the PVM library. The software structure of the simulator is well modularized and highly extensible, which makes it easy to integrate other existing processor, memory, and network simulators.
ISBN: 9780818675393 (print)
In this paper we study message flow processes in distributed simulators of open queueing networks. We develop and study queueing models for distributed simulators with maximum lookahead sequencing. We characterize the "external" arrival process and the message feedback process in the simulator of a simple queueing network with feedback. We show that a certain "natural" modelling construct for the arrival process is exactly correct, whereas an "obvious" model for the feedback process is wrong; we then show how to develop the correct model. Our analysis sheds light on the stability of distributed simulators of queueing networks with feedback. We show how the stability of such simulators depends on the parameters of the queueing network.
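The following is background only. It states a standard queueing-theory fact about the simulated system, not the paper's model of the simulator's message flow. Consider a single queue with external arrival rate λ and service rate μ, where a served customer is fed back with probability p. Its total arrival rate Λ satisfies a traffic equation, and the queue is stable only if its utilization ρ is below one. These symbols are introduced here for illustration.

```latex
\Lambda = \lambda + p\,\Lambda
\quad\Longrightarrow\quad
\Lambda = \frac{\lambda}{1-p},
\qquad
\rho = \frac{\Lambda}{\mu} = \frac{\lambda}{\mu\,(1-p)} < 1 .
```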
ISBN: 9780818675393 (print)
A simulation-oriented language can significantly enhance the usability of parallel discrete event simulation (PDES) by hiding the complexities of the synchronization protocol used to ensure that events are processed in the correct order. The higher-level interface that such a language presents to the user also enables optimizations, such as granularity control, that are difficult and cumbersome to perform with current parallel simulators. APOSTLE is a new high-level simulation-oriented language for PDES, and in this paper we report that the APOSTLE granularity control mechanism reduced simulation run-times by as much as 80%. We also report that APOSTLE achieved a parallel speed-up of around 9 on 16 processors relative to its optimized sequential implementation, and a parallel speed-up of around 6 on 16 processors relative to MODSIM II. Overall, we believe that the widespread success of PDES can only be achieved using a simulation-oriented language, and that APOSTLE has made a significant contribution towards this goal.
ISBN: 9780818675393 (print)
Based on a linear ordering of vertices in a directed graph, a linear-time partitioning algorithm for parallel logic simulation is presented. Unlike most other partitioning algorithms, the proposed algorithm preserves circuit concurrency by assigning to processors circuit gates that can be evaluated at about the same time. As a result, the concurrency preserving partitioning (CPP) algorithm can provide better load balancing throughout the course of a parallel simulation. This is especially important when the algorithm is used together with Time Warp simulation, where a high degree of concurrency can lead to fewer rollbacks and better performance. The algorithm consists of three phases, and three conflicting goals can be considered separately in each phase so as to reduce computational complexity. A parallel gate-level circuit simulator is implemented on an Intel Paragon machine to evaluate the performance of the CPP algorithm. The results are compared with those of two other partitioning algorithms and show that reasonable speedup can be achieved with the algorithm.
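A deliberately simplified Python sketch can illustrate the level-by-level idea behind preserving concurrency. It levelizes the circuit along a topological (linear) order and then deals the gates of each level round-robin across processors, so that every processor has work at each step. This is an assumption-laden stand-in, not the three-phase CPP algorithm. It ignores the cut size and the other goals the paper balances, and the function names `levelize` and `spread_levels` are hypothetical.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+


def levelize(fanin):
    """Depth of each gate: 0 for primary inputs, else 1 + the deepest fan-in."""
    level = {}
    for g in TopologicalSorter(fanin).static_order():
        level[g] = 1 + max((level[f] for f in fanin.get(g, ())), default=-1)
    return level


def spread_levels(fanin, nprocs):
    """Deal the gates of each level round-robin over processors, so that every
    processor holds gates that become active at roughly the same time."""
    by_level = defaultdict(list)
    for g, lvl in levelize(fanin).items():
        by_level[lvl].append(g)
    part, turn = {}, 0
    for lvl in sorted(by_level):
        for g in sorted(by_level[lvl]):
            part[g] = turn % nprocs
            turn += 1
    return part


# Inputs a-d; g1 = f(a, b), g2 = f(c, d), g3 = f(g1, g2).
fanin = {"g1": ["a", "b"], "g2": ["c", "d"], "g3": ["g1", "g2"]}
print(spread_levels(fanin, 2))
# {'a': 0, 'b': 1, 'c': 0, 'd': 1, 'g1': 0, 'g2': 1, 'g3': 0}
```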
ISBN: 9780818675393 (print)
This paper describes two forms of feedback in the simulation runtime of VHDL circuits that greatly influence performance. While circuit feedback and strongly connected components have been observed and documented as detrimental to the efficiency of conservative parallel discrete event simulation (PDES), their influence has never been quantified. Moreover, in this study, the phenomenon of induced feedback was observed to diminish speedup to the same degree as explicit feedback. In this paper, the influence of feedback on simulation runtime is analyzed and an algorithm for its elimination is presented. In addition, a metric for quantifying feedback is introduced. By measuring feedback, it is possible to balance its influence on simulation runtime against that of other factors (e.g., load balance, number of processors, machine granularity) through a cost-based partitioning approach. This paper reports significant runtime improvements for three circuits due to the prevention of feedback by the partitioning algorithm presented. In addition, a strong correlation between the feedback metric and conservative parallel simulation overhead is demonstrated.
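The abstract does not define induced feedback or the metric. One plausible reading, assumed here, is that partitioning can create cycles between logical processes even in a circuit with no feedback of its own. For example, if partition 0 drives partition 1 and partition 1 drives back into partition 0, a conservative simulator running on the partitions sees a loop. The Python sketch below only detects such a cycle in the partition graph. It is not the paper's feedback metric or its elimination algorithm, and the function names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def partition_graph(fanin, part):
    """Collapse gates into partitions: partition Q depends on P if a gate
    in P drives a gate in Q."""
    deps = {p: set() for p in set(part.values())}
    for g, inputs in fanin.items():
        for src in inputs:
            if part[src] != part[g]:
                deps[part[g]].add(part[src])
    return deps


def has_induced_feedback(fanin, part):
    """True if the partition graph contains a cycle."""
    try:
        # static_order() is lazy; consuming it triggers cycle detection.
        tuple(TopologicalSorter(partition_graph(fanin, part)).static_order())
    except CycleError:
        return True
    return False


# An acyclic chain a -> g1 -> g2 -> g3.
fanin = {"g1": ["a"], "g2": ["g1"], "g3": ["g2"]}
print(has_induced_feedback(fanin, {"a": 0, "g1": 0, "g2": 1, "g3": 0}))  # True
print(has_induced_feedback(fanin, {"a": 0, "g1": 0, "g2": 1, "g3": 1}))  # False
```

The same acyclic chain yields a cycle or no cycle depending only on where g3 is placed. This is why, under this reading, a partitioner can prevent such feedback.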
ISBN: 9780818675393 (print)
Synchronization is often the dominant cost in conservative parallel simulation, particularly in simulations of parallel computers, in which low-latency simulated communication requires frequent synchronization. We present and evaluate local barriers and predictive barrier scheduling, two techniques for reducing synchronization overhead in the simulation of message-passing multicomputers. Local barriers use nearest-neighbor synchronization to reduce waiting time at synchronization points. Predictive barrier scheduling, a novel technique that schedules synchronizations using both compile-time and runtime analysis, reduces the frequency of synchronization operations. In contrast to other work in this area, both techniques reduce synchronization overhead without decreasing the accuracy of network simulation. The techniques were evaluated by comparing their performance with that of periodic global synchronization. Experiments show that local barriers improve performance by up to 24% for communication-bound applications, while predictive barrier scheduling improves performance by up to 65% for applications with long local computation phases. Because the two techniques are complementary, we advocate a combined approach. This work was done in the context of parallel PROTEUS, a new parallel simulator of message-passing multicomputers.
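A single synchronization round shows why nearest-neighbor synchronization can reduce waiting time. Under a global barrier, every node waits for the slowest node in the machine. Under a local barrier, it waits only for the slowest of its neighbors. The Python sketch below is a hypothetical cost model that assumes a fixed neighbor list and one round. It ignores how delays propagate across successive rounds and is not the parallel PROTEUS implementation.

```python
def barrier_waits(arrive, neighbors):
    """Total idle time for one synchronization round.

    arrive[i]:    time at which node i reaches the synchronization point
    neighbors[i]: nodes that node i waits for under a local barrier
    Returns (idle time with a global barrier, idle time with local barriers).
    """
    slowest = max(arrive)
    global_idle = sum(slowest - t for t in arrive)
    local_idle = 0
    for i, t in enumerate(arrive):
        # A node is released once it and all of its neighbors have arrived.
        release = max([t] + [arrive[j] for j in neighbors[i]])
        local_idle += release - t
    return global_idle, local_idle


# Four nodes in a ring; node 3 reaches the synchronization point late.
ring = [[1, 3], [0, 2], [1, 3], [2, 0]]
print(barrier_waits([1, 2, 3, 10], ring))  # (24, 17)
```

Only node 1, which does not neighbor the late node, benefits in this example. Larger machines with sparser neighborhoods would typically isolate a late node from more of the others.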