Optimistic and conservative simulation algorithms have been effective for speeding up the execution of many simulation programs. Event stepped techniques have also been shown to be effective for certain types of probl...
ISBN:
(纸本)9780769511047
Optimistic and conservative simulation algorithms have been effective for speeding up the execution of many simulation programs. Event stepped techniques have also been shown to be effective for certain types of problems. This paper presents a conjecture that sparse systems are effectively simulated by conservative and optimistic techniques, where sparseness is characterized by the ratio of output generating states to non-output generating states. The speedup results for a representative sparse simulation are shown to support this conjecture.
This paper describes a parallel proximity detection algorithm and illustrates its application to the problem of conflict detection in an aviation simulation. The algorithm invokes a previously designed sequential func...
ISBN:
(纸本)9780769511047
This paper describes a parallel proximity detection algorithm and illustrates its application to the problem of conflict detection in an aviation simulation. The algorithm invokes a previously designed sequential function in parallel, using spatial information acquired during the traversal of a quad tree, to keep the separate invocations of the function as independent as possible. The method is generally applicable to any function (not just conflict detection) whose arguments are spatially organized. Empirical results show that a single-threaded version of the algorithm sped up the simulation by 57%, while a four-threaded parallel version extracted 30% of the remaining additional speedup. These results are even more noteworthy given that the architecture of the simulation remains intact: we only replace the invocation mechanism for one function.
In a distributedsimulation, simulation components of various types are executed at geographically different locations, forming a simulation federation to create a common virtual environment. Under the High Level Arch...
详细信息
ISBN:
(纸本)9780769511047
In a distributedsimulation, simulation components of various types are executed at geographically different locations, forming a simulation federation to create a common virtual environment. Under the High Level Architecture (HLA), information that will be produced and consumed by a simulation component is defined in its object model, and how that information is produced and consumed is well encapsulated inside the simulation component's implementation. However, in the current implementation of the HLA's Runtime Infrastructure (RTI), information hiding between groups of simulation components in a simulation federation is not addressed. In this paper, we discuss how hierarchical federations architectures can be used to tackle this problem. The hierarchical federations architecture adopted in this paper differs from the existing architectures in that it is based on a hybrid approach for inter-operability between simulation federations. To demonstrate the information hiding using the architecture, a distributed semiconductor supply-chain simulation is also described in the paper.
A new parallel algorithm for simulating Ising spin systems is presented. The sequential prototype is the n-fold way algorithm [2], which is efficient but is hard to parallelize using conservative methods. Our parallel...
详细信息
ISBN:
(纸本)9780769511047
A new parallel algorithm for simulating Ising spin systems is presented. The sequential prototype is the n-fold way algorithm [2], which is efficient but is hard to parallelize using conservative methods. Our parallel algorithm is optimistic. Unlike other optimistic algorithms, e.g., Time Warp, our algorithm is synchronous. It also belongs to the class of simulations known as “relaxation” [3]; hence it is named “synchronous relaxation.” We derive performance guarantees for this algorithm. If N is the number of PEs, then under weak assumptions we show that the number of correct events processed per unit of time is, on average, at least of order N/ log N. All communication delays, processing time, and busy waits are taken into account.
Optimistic techniques can improve the performance of discrete-event simulations, but one area where optimistic simulators have been unable to show performance improvement is in the simulation of parallel programs. Unf...
ISBN:
(纸本)9780769511047
Optimistic techniques can improve the performance of discrete-event simulations, but one area where optimistic simulators have been unable to show performance improvement is in the simulation of parallel programs. Unfortunately parallel program simulation using direct execution is difficult: the use of direct execution implies that the memory and computation requirements of the simulator are at least as large as that of the target application, which restricts the target systems and application problem sizes that can be studied. Memory usage is especially important for optimistic simulators due to the need for periodic state-saving and rollback. In our research we addressed this problem and have implemented a simulation library running a Time-Warp-based optimistic engine that uses direct execution to simulate and predict the performance of parallel MPI programs while attaining good simulation speedup. For programs with data sets too large to be directly executed with our optimistic simulator, we reduced the memory and computational needs of these programs by utilizing a static task graph and code-slicing methodology; an approach which also exhibited good performance speedup.
With fixed lookahead information in a simulation model, the overhead of asynchronous conservative parallelsimulation lies in the mechanism used for propagating time updates in order for logical processes to safely ad...
ISBN:
(纸本)9780769511047
With fixed lookahead information in a simulation model, the overhead of asynchronous conservative parallelsimulation lies in the mechanism used for propagating time updates in order for logical processes to safely advance their local simulation clocks. Studies have shown that a good scheduling algorithm should preferentially schedule processes containing events on the critical path. This paper introduces a lock-free algorithm for scheduling logical processes in conservative parallel discrete-event simulation on shred-memory multiprocessor machines. The algorithm uses fetch & add operations that help avoid inefficiencies associated with using locks. The lock-free algorithm is robust. Experiments show that, compared with the scheduling algorithm using locks, the lock-free algorithm exhibits better performance when the number of logical processes assigned to each processor is small or when the workload becomes significant. In models with large number of logical processes, our algorithm shows only modest increase in execution time due to the overhead in the algorithm for extra bookkeeping.
This paper addresses the issue of efficient and accurate performance prediction of large-scale message-passing applications on high performance architectures using simulation. Such simulators are often based on parall...
ISBN:
(纸本)9780769511047
This paper addresses the issue of efficient and accurate performance prediction of large-scale message-passing applications on high performance architectures using simulation. Such simulators are often based on parallel discrete event simulation, typically using the conservative protocol to synchronize the simulation threads. The paper considers how a compiler can be used to automatically extract information about the lookahead present in the application, and how this can be used to improve the performance of the null protocol used for synchronization. These techniques are implemented in the MPI-Sim simulator and dHPF compiler, which had previously been extended to work together for optimizing the simulation of local computational components of an application. The results shows that the availability of lookahead information improves the runtime of the simulator by factors ranging from 9% up to two orders of magnitude, with 30-60% improvements being typical for the real-world codes. The experiments also show that these improvements are directly correlated with reductions in the number of null messages required by the simulations.
In this work we present two different applications implemented on the neurocomputer Totem Nc3001 from Neuricam Inc. The goal of the experimentation is to test, on real problems, the performance of this powerful parall...
详细信息
ISBN:
(纸本)0769509878
In this work we present two different applications implemented on the neurocomputer Totem Nc3001 from Neuricam Inc. The goal of the experimentation is to test, on real problems, the performance of this powerful parallel unit consisting of 32 Digital Signal Processors (DSPs) and to evaluate its suitability to neural network applications. The first problem implemented is a typical classification algorithm in which the network recognises which points belong to different regions inside a 2D space. The second problem is more computationally heavy and consists of a network able to reproduce the eye movements, if properly stimulated. A comparison is reported between Matlab implementations or handwritten code run on workstations and the performance obtained from the Totem chip.
Great effort has been devoted to the design of optimized checkpointing strategies for optimistic parallel discrete event simulators. On the other hand, there is less work being done in the direction of improving the e...
详细信息
ISBN:
(纸本)076951104X
Great effort has been devoted to the design of optimized checkpointing strategies for optimistic parallel discrete event simulators. On the other hand, there is less work being done in the direction of improving the execution mode of any single checkpoint operation. Specifically, checkpoint operations are typically charged to the CPU, thus leading to freezing of the simulation application while checkpointing is in progress, i.e. the execution mode of the checkpointing protocol is typically synchronous. In this paper, we focus on improvements of the execution mode and present a software architecture, designed for Myrinet-based networks of workstations (NOWs), to avoid application freezing during any checkpoint operation, thus moving the execution itself towards an asynchronous mode. This is done by charging checkpoint operations to a hardware component that is distinct from the CPU, namely a DMA (direct memory access) engine. On the other hand, totally asynchronous checkpointing could suffer from data inconsistency whenever the content of a state buffer is accessed for further modifications while a checkpoint operation involving it has not yet completed. To avoid this, the architecture includes functionalities for resynchronization on demand. We have used these functionalities to implement an execution mode of the checkpointing protocol that we refer to as semi-asynchronous. By the results of an experimental study, we argue that the semi-asynchronous mode can be an effective solution to almost completely remove the delay associated with any checkpoint operation from the completion time of the simulation.
The authors discuss a novel direction for evaluating system dependability. The research efforts outlined are part of a special focus concerning the dependability of complex distributed systems at the Department of Com...
详细信息
ISBN:
(纸本)0769510868
The authors discuss a novel direction for evaluating system dependability. The research efforts outlined are part of a special focus concerning the dependability of complex distributed systems at the Department of Computer Engineering of the Federal Armed Forces University in Munich, Germany. Our approach emphasizes the behavior of components after they have been embedded in a system. We use the notion of effective values for the analysis and estimation of key dependability parameters. The concept is targeted to unify performance characterization with the dependability. Besides analysis, we report on some simulation results, which were in favor of our assumptions.
暂无评论