ISBN: (print) 076951104X; 0769511058
Networks of workstations have become a popular architecture for distributed simulation because they are far more widely available than specialized multiprocessor computers. Networks of workstations are also a well-suited framework for distributed simulation systems based on the High Level Architecture (HLA). However, using workstations in a distributed simulation system may affect the availability of computing resources for the users who need their computers as working tools. Thus, for coarse-grained distributed simulation it may be desirable to let users control the extent to which their workstations participate in a distributed simulation. In this paper, we present a resource sharing system (RSS) that provides a client user interface on each potentially participating workstation. With the RSS clients, workstation users can control the availability of their computers for the HLA simulation federation. An RSS manager keeps track of available computing resources and balances the participating HLA federates among the available workstations.
ISBN: (print) 0769511058
Optimistic techniques can improve the performance of discrete-event simulations, but one area where optimistic simulators have been unable to show performance improvement is the simulation of parallel programs. Unfortunately, parallel program simulation using direct execution is difficult; the use of direct execution implies that the memory and computation requirements of the simulator are at least as large as those of the target application, which restricts the target systems and application problem sizes that can be studied. Memory usage is especially important for optimistic simulators because of the need for periodic state saving and rollback. In our research we addressed this problem and implemented a simulation library running a Time-Warp-based optimistic engine that uses direct execution to simulate and predict the performance of parallel MPI programs while attaining good simulation speedup. For programs with data sets too large to be directly executed with our optimistic simulator, we reduced the memory and computational needs of these programs by using a static task graph and code-slicing methodology, an approach that also exhibited good speedup.
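The memory cost of state saving and rollback mentioned in the abstract above can be made concrete with a minimal sketch. This is a textbook-style Time Warp illustration, not the library described in the paper: the class name `OptimisticLP`, its methods, and the choice to copy the full state before every event are assumptions made for illustration, and anti-messages and fossil collection are omitted.

```python
import copy


class OptimisticLP:
    """Hypothetical logical process that executes events optimistically."""

    def __init__(self, state):
        self.state = state
        self.lvt = 0          # local virtual time
        self.history = []     # (timestamp, handler, state saved before the event)

    def process(self, timestamp, handler):
        redo = []
        if timestamp < self.lvt:
            # Straggler: undo every event with a later timestamp by
            # restoring the state that was saved before it ran.
            while self.history and self.history[-1][0] > timestamp:
                ts, h, saved = self.history.pop()
                self.state = saved
                redo.append((ts, h))
            redo.reverse()
        # Execute the new event, then re-execute the rolled-back events.
        for ts, h in [(timestamp, handler)] + redo:
            self.history.append((ts, h, copy.deepcopy(self.state)))
            h(self.state)
            self.lvt = ts


lp = OptimisticLP({"log": []})
for ts in (10, 30, 20):   # the event at time 20 arrives late
    lp.process(ts, lambda s, t=ts: s["log"].append(t))
```

Every processed event keeps a full copy of the state in `history`, which is why the memory footprint of an optimistic simulator grows beyond that of the directly executed target program.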
ISBN: (print) 076951104X; 0769511058
With fixed lookahead information in a simulation model, the overhead of asynchronous conservative parallel simulation lies in the mechanism used for propagating time updates so that logical processes can safely advance their local simulation clocks. Studies have shown that a good scheduling algorithm should preferentially schedule processes containing events on the critical path. This paper introduces a lock-free algorithm for scheduling logical processes in conservative parallel discrete-event simulation on shared-memory multiprocessor machines. The algorithm uses fetch&add operations that help avoid inefficiencies associated with using locks. The lock-free algorithm is robust. Experiments show that, compared with the scheduling algorithm using locks, the lock-free algorithm exhibits better performance when the number of logical processes assigned to each processor is small or when the workload becomes significant. In models with a large number of logical processes, our algorithm shows only a modest increase in execution time due to the overhead of extra bookkeeping in the algorithm.
ISBN: (print) 076951104X; 0769511058
Strong reasons exist for executing a large-scale discrete-event simulation on a cluster of processor nodes (each of which may be a shared-memory multiprocessor or a uniprocessor). This is the architecture of the largest-scale parallel machines, and so the largest simulation problems can only be solved this way. It is a common architecture even in less esoteric settings, and is suitable for memory-bound simulations. This paper describes our approach to porting the SSF simulation kernel to this architecture, using the Message Passing Interface (MPI) system. The notable feature of this transformation is its support for an efficient two-level synchronization and communication scheme that addresses cost discrepancies between shared memory and distributed memory. In the initial implementation, we use a globally synchronous approach between distributed-memory nodes, and an asynchronous shared-memory approach within an SMP cluster. The SSF API reflects inherently shared-memory assumptions; we therefore report on our approach for porting an SSF kernel to a cluster of SMP nodes. Experimental results on two architectures are described, for a model of TCP/IP traffic flows over a hierarchical network. The performance on a distributed network of commodity SMPs connected through Ethernet is seen to frequently exceed performance on a Sun shared-memory multiprocessor.
ISBN: (print) 076951104X; 0769511058
This paper addresses the issue of efficient and accurate performance prediction of large-scale message-passing applications on high performance architectures using simulation. Such simulators are often based on parallel discrete event simulation, typically using the conservative protocol to synchronize the simulation threads. The paper considers how a compiler can be used to automatically extract information about the lookahead present in the application, and how this can be used to improve the performance of the null protocol used for synchronization. These techniques are implemented in the MPI-Sim simulator and dHPF compiler, which had previously been extended to work together for optimizing the simulation of local computational components of an application. The results show that the availability of lookahead information improves the runtime of the simulator by factors of magnitude, with 30-60% improvements being typical for the real-world codes. The experiments also show that these improvements are directly correlated with reductions in the number of null messages required by the simulations.
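The link between lookahead and null-message counts reported in the abstract above can be illustrated with a toy model. The sketch below is not MPI-Sim or dHPF code: the function name and the two-process setup are assumptions. It assumes two logical processes in a cycle with no real events, where each null message lets the receiver advance its clock to the sender's clock plus the lookahead.

```python
def null_messages_to_reach(target_time, lookahead):
    """Count null messages two idle LPs exchange until both reach target_time."""
    clocks = [0, 0]   # local simulation clock of each logical process
    sent = 0
    sender = 0
    while min(clocks) < target_time:
        receiver = 1 - sender
        # A null message promises that the sender will send nothing with a
        # timestamp earlier than its own clock plus its lookahead.
        promise = clocks[sender] + lookahead
        clocks[receiver] = max(clocks[receiver], promise)
        sent += 1
        sender = receiver   # the processes take turns
    return sent


small = null_messages_to_reach(target_time=10, lookahead=1)
large = null_messages_to_reach(target_time=10, lookahead=5)
```

In this toy setting, a larger lookahead lets each null message advance the clocks further, so fewer messages are needed to cover the same span of simulated time.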
ISBN: (print) 076951104X; 0769511058
Recently a new class of synchronization algorithms for parallel discrete event simulation has been proposed, namely the near perfect state information algorithms, which are based on a notion of error potential to control the optimism of event execution. An algorithm of this class, called the elastic time algorithm (ETA), has been instantiated. In this algorithm, the error potential is computed using temporal information (next event timestamp, simulation clocks, etc.) and is then translated into an event execution delay based on a constant factor. In this paper we present a scaled version of ETA (SETA), in which the error potential is translated into an event execution delay based on both a constant factor and an additional scaling factor determined dynamically as a function of the event granularity. We have implemented versions of ETA and SETA for a cluster of PCs connected by a Myrinet switch, and we have established in an empirical study that SETA outperforms ETA when different event types differ in granularity.
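The difference between the two delay rules in the abstract above can be written down in a few lines. The abstract gives neither the exact formula for the error potential nor the form of the scaling function, so both are taken as inputs here; the function names and the example scaling function (granularity relative to a reference of 100 time units) are purely hypothetical.

```python
def eta_delay(error_potential, constant_factor):
    """ETA-style rule: delay proportional to the error potential."""
    return constant_factor * error_potential


def seta_delay(error_potential, constant_factor, granularity, scaling_fn):
    """SETA-style rule: the same delay, additionally scaled by a factor
    computed from the granularity of the event about to be executed."""
    return constant_factor * scaling_fn(granularity) * error_potential


# Hypothetical scaling function, chosen only for this example.
relative_to_reference = lambda g: g / 100.0

d_eta = eta_delay(4.0, 0.5)
d_seta = seta_delay(4.0, 0.5, 200.0, relative_to_reference)
```

With a single event granularity the scaling factor is constant and the two rules differ only by that constant, which is consistent with the abstract's observation that SETA pays off when event types differ in granularity.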
ISBN: (print) 0769511058
The Data Distribution Management (DDM) service is one of the six services provided in the Runtime Infrastructure (RTI) of the High Level Architecture (HLA). Its purpose is to perform data filtering and reduce the irrelevant data communicated between federates. The two DDM schemes proposed for the RTI, region-based and grid-based DDM, aim to send as little irrelevant data to subscribers as possible, but they filter out only part of it, and some irrelevant data is still communicated. In a previous paper [3], we employed intelligent agents to perform data filtering in HLA, implemented an agent-based DDM in RTI (ARTI), and compared it with the other two filtering mechanisms. This paper reports on additional experiments, results, and analysis using two scenarios: the AWACS sensing aircraft simulation and the air traffic control simulation. Experimental results show that, compared with the other mechanisms, the agent-based approach communicates only relevant data and minimizes network communication, and is also comparable in terms of time efficiency. Some guidelines on when the agent-based scheme can be used are also given.
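One way irrelevant data can slip through a grid-based filter is shown in the sketch below. It is a simplified illustration, not RTI code: regions are assumed to be axis-aligned rectangles `(xmin, ymin, xmax, ymax)` with integer coordinates, and the function names and cell size are assumptions.

```python
import math


def overlaps(a, b):
    """Exact matching: do two rectangles actually intersect?"""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def cells(region, cell_size):
    """Set of grid cells (column, row) that a rectangle touches."""
    xmin, ymin, xmax, ymax = region
    cols = range(xmin // cell_size, math.ceil(xmax / cell_size))
    rows = range(ymin // cell_size, math.ceil(ymax / cell_size))
    return {(c, r) for c in cols for r in rows}


def grid_match(update, subscription, cell_size):
    """Grid-based matching: deliver if the regions share any grid cell."""
    return bool(cells(update, cell_size) & cells(subscription, cell_size))


update_region = (0, 0, 2, 2)
subscription_region = (5, 5, 8, 8)
exact = overlaps(update_region, subscription_region)
coarse = grid_match(update_region, subscription_region, cell_size=10)
```

The two regions do not intersect, yet both fall in the same 10-by-10 cell, so the grid test would still deliver the update. Shrinking the cells reduces such false matches at the cost of managing more cells.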
ISBN: (print) 0769511058
One of the six categories of management services provided in the Run Time Infrastructure (RTI) to federated simulations is Time Management. Currently, it provides only two message ordering policies: time stamp ordering and receive ordering. Temporal anomalies that occur during the execution of a federation because of heterogeneous latencies in the communication network are not handled by receive ordering. While time stamp ordering eliminates the temporal anomalies entirely, it incurs high communication latency and a large bandwidth requirement. This paper presents a new time management mechanism that provides a less costly message ordering service, namely causal ordering, to federates. It does not require the specification of lookahead, and it allows federates that do not require stringent message ordering properties to achieve much more efficient execution. A series of experiments has been carried out to benchmark the performance of this new time management mechanism, and the results show that it incurs a slight overhead compared with the receive ordering mechanism but achieves a significant performance improvement over the time stamp ordering mechanism.
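Causal ordering, as named in the abstract above, can be illustrated with the standard vector-clock delivery rule for causal broadcast. The abstract does not say which mechanism the paper uses, so this is a textbook sketch under that assumption; the class name `CausalReceiver` and its interface are hypothetical.

```python
class CausalReceiver:
    """Holds back messages until everything they causally depend on is delivered."""

    def __init__(self, num_processes):
        self.delivered = [0] * num_processes  # messages delivered per sender
        self.pending = []                     # received but not yet deliverable
        self.log = []                         # payloads in delivery order

    def _deliverable(self, sender, clock):
        # Next message expected from the sender, and no missing dependency
        # on any other process.
        return clock[sender] == self.delivered[sender] + 1 and all(
            clock[k] <= self.delivered[k]
            for k in range(len(clock)) if k != sender
        )

    def receive(self, sender, clock, payload):
        self.pending.append((sender, clock, payload))
        progress = True
        while progress:
            progress = False
            for msg in list(self.pending):
                s, c, p = msg
                if self._deliverable(s, c):
                    self.pending.remove(msg)
                    self.delivered[s] += 1
                    self.log.append(p)
                    progress = True


# Federate 1 sent "reply" after seeing "update" from federate 0,
# but the network delivers "reply" first.
r = CausalReceiver(3)
r.receive(1, [1, 1, 0], "reply")
r.receive(0, [1, 0, 0], "update")
```

No lookahead or global time stamp is needed: the receiver delays "reply" only until the "update" it depends on has arrived, and unrelated messages are never held back.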
ISBN: (print) 0769513484
The use of Web caches makes it possible to offer better quality of service to Internet users. Because it is difficult to obtain a real network of caches to experiment with, most research projects use Web cache simulators. This paper presents a Web cache simulator capable of executing simulations of distributed and cooperative Web cache servers. The simulator allows the impact of configuration parameters on cache performance to be measured, making it possible to design better cooperative cache networks. As an example of the simulator's use, simulations based on the Brazilian National Research Network (RNP) are presented.
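The kind of question such a simulator answers can be shown with a very small trace-driven sketch. It is not the simulator from the paper: the LRU replacement policy, the sibling-lookup rule, and all names below are assumptions chosen for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used URL."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def lookup(self, url):
        if url in self.items:
            self.items.move_to_end(url)   # mark as most recently used
            return True
        return False

    def store(self, url):
        self.items[url] = True
        self.items.move_to_end(url)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the oldest entry


def simulate(trace, capacity, cooperative):
    """Replay (cache_name, url) requests and count where each was served."""
    caches = {}
    counts = {"local": 0, "sibling": 0, "origin": 0}
    for name, url in trace:
        cache = caches.setdefault(name, LRUCache(capacity))
        if cache.lookup(url):
            counts["local"] += 1
            continue
        siblings = [c for n, c in caches.items() if n != name]
        if cooperative and any(c.lookup(url) for c in siblings):
            counts["sibling"] += 1
        else:
            counts["origin"] += 1
        cache.store(url)
    return counts


trace = [("A", "/x"), ("B", "/x"), ("A", "/x")]
with_coop = simulate(trace, capacity=2, cooperative=True)
without_coop = simulate(trace, capacity=2, cooperative=False)
```

Comparing the two runs shows the effect of one configuration parameter (cooperation on or off) on how many requests must go to the origin server.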
ISBN: (print) 076951104X; 0769511058
Great effort has been devoted to the design of optimized checkpointing strategies for optimistic parallel discrete event simulators. On the other hand, there is less work on improving the execution mode of any single checkpoint operation. Specifically, checkpoint operations are typically charged to the CPU, which freezes the simulation application while checkpointing is in progress, i.e., the execution mode of the checkpointing protocol is typically synchronous. In this paper we focus on improvements of the execution mode and present a software architecture, designed for Myrinet-based Networks of Workstations (NOWs), that avoids application freezing during any checkpoint operation, thus moving the execution itself towards an asynchronous mode. This is done by charging checkpoint operations to a hardware component distinct from the CPU, namely a DMA engine. However, totally asynchronous checkpointing could suffer from data inconsistency whenever the content of a state buffer is accessed for further modifications while a checkpoint operation involving it is not yet completed. To avoid this, the architecture includes functionalities for resynchronization on demand. We have used these functionalities to implement an execution mode of the checkpointing protocol that we refer to as semi-asynchronous. Based on the results of an experimental study, we argue that the semi-asynchronous mode can be an effective solution for almost completely removing the delay associated with any checkpoint operation from the completion time of the simulation.
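The semi-asynchronous mode described above can be mimicked in software as a rough analogy. In the sketch below a background thread stands in for the DMA engine, which is an assumption made purely for illustration; the class and method names are hypothetical, and the paper's architecture is hardware-assisted rather than thread-based.

```python
import threading


class SemiAsyncCheckpointer:
    """Copies state in the background; callers resynchronize before writing."""

    def __init__(self):
        self.snapshots = []
        self._done = threading.Event()
        self._done.set()   # no copy in progress initially

    def checkpoint(self, state):
        """Start copying the state buffer without blocking the caller."""
        self.resync()      # allow only one copy in flight at a time
        self._done.clear()

        def copy():
            self.snapshots.append(bytes(state))
            self._done.set()

        threading.Thread(target=copy).start()

    def resync(self):
        """Resynchronization on demand: wait for the copy to complete."""
        self._done.wait()


state = bytearray(b"abc")
cp = SemiAsyncCheckpointer()
cp.checkpoint(state)       # returns immediately; the copy proceeds in parallel
# ... event processing that does not touch `state` could overlap here ...
cp.resync()                # required before modifying the state buffer
state[0] = ord("x")
```

The application blocks only if it tries to modify the buffer before the copy has finished, which is the situation the resynchronization functionality is meant to guard against.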