This paper describes a parallel proximity detection algorithm and illustrates its application to the problem of conflict detection in an aviation simulation. The algorithm invokes a previously designed sequential func...
详细信息
ISBN:
(纸本)0769511058
This paper describes a parallel proximity detection algorithm and illustrates its application to the problem of conflict detection in an aviation simulation. The algorithm invokes a previously designed sequential function in parallel, using spatial information acquired during the traversal of a quad tree, to keep the separate invocations of the function as independent as possible. The method is generally applicable to any function (not just conflict detection) whose arguments are spatially organized. Empirical results shoe that a single-threaded version of the algorithm sped up the simulation by 57%, while a four-threaded parallel version extracted 30% of the remaining additional speedup. These results are even more noteworthy given that the architecture of the simulation remains intact: we only replace the invocation mechanism for one function.
A new parallel algorithm for simulating Ising spin systems is presented. The sequential prototype is the n-fold way algorithm [2], which is efficient but is hard to parallelize using conservative methods. Our parallel...
详细信息
ISBN:
(纸本)076951104X;0769511058
A new parallel algorithm for simulating Ising spin systems is presented. The sequential prototype is the n-fold way algorithm [2], which is efficient but is hard to parallelize using conservative methods. Our parallel algorithm is optimistic. Unlike other optimistic algorithms, e.g., Time Warp, our algorithm is synchronous. It also belongs to the class of simulations known as "relaxation" [3];hence it is named "synchronous relaxation." We derive performance guarantees for this algorithm. If N is the number of PEs, then under weak assumptions we show that the number of correct events processed per unit of time is, on average, at least of order N/logN. All communication delays, processing time, and busy waits are taken into account.
In this paper we study the influence of spatio-temporal correlations on the dynamic runtime behaviour of the optimistic parallel Time Warp simulation method. By using the Ising spin model, we show experimentally that ...
详细信息
ISBN:
(纸本)0769511058
In this paper we study the influence of spatio-temporal correlations on the dynamic runtime behaviour of the optimistic parallel Time Warp simulation method. By using the Ising spin model, we show experimentally that the distribution of the number of rolled back events behaves as a power-law distribution over a large range of sub-critical Ising temperatures and decays exponentially for super-critical Ising temperatures. For critical Ising temperatures, where long-range correlations occur, the computational complexity of Time Warp and physical complexity of the Ising spin model are entangled and contribute both to the runtine behavior in a nonlinear way.
Networks of workstations have become a popular architecture for distributedsimulation due to their high availability as opposed to specialized multiprocessor computers. Networks of workstations are also a well-suited...
详细信息
ISBN:
(纸本)076951104X;0769511058
Networks of workstations have become a popular architecture for distributedsimulation due to their high availability as opposed to specialized multiprocessor computers. Networks of workstations are also a well-suited framework for distributedsimulation systems based on the High Level Architecture (HLA). However, using workstations in a distributedsimulation system may eventually effect the availability of computing resources for the users who need their computers as working tools. Thus, for coarse grained distributedsimulation it may be desirable to let the users control to what extent their workstations should participate in a distributedsimulation. In this paper, we present a resource sharing system (RSS) that provides a client user interface on each potentially participating workstation. With the RSS clients, users of workstations can control the availability of their computer for the HLA simulation federation. An RSS manager keeps track of available computing resources and balances the participating HLA federates among the available workstations.
Optimistic techniques can improve the performance of discrete-event simulations, but one area where optimistic simulators have been unable to show performance improvement is in the simulation of parallel programs. Unf...
详细信息
ISBN:
(纸本)0769511058
Optimistic techniques can improve the performance of discrete-event simulations, but one area where optimistic simulators have been unable to show performance improvement is in the simulation of parallel programs. Unfortunately, parallel program simulation using direct execution is difficult;the use of direct execution implies that the memory and computation requirements of the simulator are at least as large as that of the target application, which restricts the target systems and application problem sizes that can be studied. Memory usage is especially important for optimistic simulators due to the need for periodic state-saving and rollback. In our research we addressed this problem and have implemented a simulation library running a Time-Warp-based optimistic engine that uses direct execution to simulate and predict the performance of parallel MPI programs while attaining good simulation speedup. For programs with data sets too large to be directly executed with our optimistic simulator, we reduced the memory and computational needs of these programs by utilizing a static task graph and code-slicing methodology, an approach which also exhibited good performance speedup.
With fixed lookahead information in a simulation model, the overhead of asynchronous conservative parallelsimulation lies in the mechanism used for propagating time updates in order for logical processes to safety ad...
详细信息
ISBN:
(纸本)076951104X;0769511058
With fixed lookahead information in a simulation model, the overhead of asynchronous conservative parallelsimulation lies in the mechanism used for propagating time updates in order for logical processes to safety advance their local simulation clocks. Studies have shown that a good scheduling algorithm should preferentially schedule processes containing events on the critical path. This paper introduces a lock-free algorithm for scheduling logical processes in conservative parallel discrete-event simulation on shared-memory multiprocessor machines. The algorithm uses fetch&add operations that help avoid inefficiencies associated with using locks. The lock-free algorithm is robust. Experiments show that, compared with the scheduling algorithm using locks, the lock-free algorithm exhibits better performance when the number of logical processes assigned to each processor is small or when the workload becomes significant. In models with large number of logical processes, our algorithm shows only modest increase in execution time due to the overhead in the algorithm for extra bookkeeping.
Strong reasons exist for executing a large-scale discrete-event simulation on a cluster of processor nodes (each of which may be a shared-memory multiprocessor or a uniprocessor). This is the architecture of the large...
详细信息
ISBN:
(纸本)076951104X;0769511058
Strong reasons exist for executing a large-scale discrete-event simulation on a cluster of processor nodes (each of which may be a shared-memory multiprocessor or a uniprocessor). This is the architecture of the largest scale parallel machines, and so the largest simulation problems can only be solved this way. It is a common architecture even in less esoteric settings, and is suitable for memory-bound simulations. This paper describes our approach to porting the SSF simulation kernel to this architecture, using the Message Passing Interface (MPI) system. The notable feature of this transformation is to support an efficient two-level synchronization and communication scheme that addresses cost discrepancies between shared-memory and distributed memory. In the initial implementation, we use a globally synchronous approach between distributed-memory noes, and an asynchronous shared-memory approach within a SMP cluster. The SSF API reflects inherently shared-memory assumptions;we report therefore on our approach for porting an SSF kernel to a cluster of SMP nodes. Experimental results on two architectures are described, for a model of TCP/IP traffic flows over a hierarchical network. The performance on a distributed network of commodity SMPs connected through ethernet is seen to frequently exceed performance on a Sun shared-memory multiprocessor.
This paper addresses the issue of efficient and accurate performance prediction of large-scale message-passing applications on high performance architectures using simulation. Such simulators are often based on parall...
详细信息
ISBN:
(纸本)076951104X;0769511058
This paper addresses the issue of efficient and accurate performance prediction of large-scale message-passing applications on high performance architectures using simulation. Such simulators are often based on parallel discrete event simulation, typically using the conservative protocol to synchronize the simulation threads. The paper considers how a compiler can be used to automatically extract information about the lookahead present in the application, and how this can be used to improve the performance of the null protocol used for synchronization. These techniques are implemented in the MPI-Sim simulator and dHPF compiler, which had previously been extended to work together for optimizing the simulation of local computational components of an application. The results show that the availability of lookahead information improves the runtime of the simulator by factors of magnitude, with 30-60% improvements being typical for the real-world codes. The experiments also show that these improvements are directly correlated with reductions in the number of null messages required by the simulations.
Recently a new class of synchronization algorithms for parallel discrete event simulation has been proposed, namely the near perfect state information algorithms, which are based on a notion of error potential to cont...
详细信息
ISBN:
(纸本)076951104X;0769511058
Recently a new class of synchronization algorithms for parallel discrete event simulation has been proposed, namely the near perfect state information algorithms, which are based on a notion of error potential to control the optimism of event execution. An algorithms of this class, called elastic time algorithm (ETA), has been instantiated. In this algorithm, the error potentials computed using temporal information (next event timestamp, simulation clocks etc.) and is then translated into event execution delay based on a constant factor. In this paper we present a scaled version of ETA (SEAT), in which the error potential is translated into event execution delay based on both a constant factor and an additional scaling factor determined dynamically as a function of the event granularity. We have implemented versions of ETA and SETA for a cluster of PCs connected by a Myrinet switch and we have established in an empirical study that SETA outperforms ETA if there is difference in the granularity of different event types.
The Data Distribution Management (DDM) service is one of the six services provided in the Runtime Infrastructure (RTI) of High Level Architecture (HLA). Its purpose is to perform data filtering and reduce irrelevant d...
详细信息
ISBN:
(纸本)0769511058
The Data Distribution Management (DDM) service is one of the six services provided in the Runtime Infrastructure (RTI) of High Level Architecture (HLA). Its purpose is to perform data filtering and reduce irrelevant data communicated between federates. The two DDM schemes proposed for RTI, region-based and grid-based DDM, are oriented to send as little irrelevant data to subscribers as possible, but only manage to filter part of this information and some irrelevant data is still being communicated. In a previous paper [3], we employed intelligent agents to perform data filtering in HLA, implemented an agent-based DDM in RTI (ARTI) and compared it with the other two filtering mechanisms. This paper reports on additional experiments, results and analysis using two scenarios, the AWACS sensing aircraft simulation and the air traffic control simulation scenario Experimental results show that compared with other mechanisms, the agent-based approach communicates only relevant data and minimizes network communication, and is also comparable in terms of time efficiency. Some guidelines on when the agent-based scheme can be used are also given.
暂无评论