ISBN: (print) 1565550552
A number of optimistic synchronization schemes for parallel simulation rely upon a global synchronization. The problem is to determine when every processor has completed all its work, and there are no messages in transit in the system that will cause more work. Most previous solutions to the problem have used distributed termination algorithms, which are inherently serial; other parallel mechanisms may be inefficient. In this paper we describe an efficient parallel algorithm derived from a common `barrier' synchronization algorithm used in parallel processing. The algorithm's principal attractions are speed and generality - it is designed to be used in contexts more general than parallel discrete-event simulation. To establish our claim to speed, we compare our algorithm's performance with the standard barrier algorithm, and find that its additional costs are not excessive. Our experiments are conducted using up to 256 processors on the Intel Touchstone Delta.
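The termination condition the abstract describes - every processor idle and no messages in transit - reduces to a counting check made at the barrier. The sketch below illustrates that check only; the names and structure are illustrative and not the paper's actual algorithm:

```python
def globally_terminated(snapshots):
    """Decide global termination from per-processor snapshots taken at a barrier.

    snapshots: list of (is_idle, msgs_sent, msgs_received) tuples, one per
    processor. The system has terminated only when every processor is idle
    AND every message ever sent has been received (nothing in transit).
    """
    all_idle = all(idle for idle, _, _ in snapshots)
    total_sent = sum(sent for _, sent, _ in snapshots)
    total_received = sum(recv for _, _, recv in snapshots)
    return all_idle and total_sent == total_received
```

A single snapshot can be fooled by a message counted as sent in one barrier round but not yet received, so practical schemes repeat the check across consecutive rounds until the counts stabilize.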
ISBN: (print) 9780769528984
Simulation is a low-cost and safe alternative for solving complex problems in various areas. To promote reuse and interoperability of simulation applications and link geographically dispersed simulation components, distributed simulation was introduced. The High Level Architecture (HLA) is the IEEE standard for distributed simulation. To optimize communication efficiency between simulation components, HLA defines a Data Distribution Management (DDM) service group for filtering out unnecessary data exchange. It relies on the computation of overlap between update and subscription regions, which is called matching. In this paper we propose an efficient sort-based DDM matching algorithm for HLA applications with a large spatial environment. A theoretical analysis of our algorithm concludes that it should have good storage and computational scalability. The experimental results have verified the theoretical conclusions by showing that our algorithm has a much smaller storage requirement than the original sort-based matching algorithm and generally has the best computational performance when compared with the region-based and the original sort-based matching algorithms.
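The core idea behind sort-based DDM matching - sort region endpoints and sweep them to find overlaps, instead of comparing every update region against every subscription region - can be sketched in one dimension as follows. This is an illustrative reconstruction of the general technique, not the paper's specific algorithm:

```python
def sort_based_matching(updates, subscriptions):
    """1-D sweep over sorted interval endpoints.

    updates, subscriptions: lists of (region_id, lo, hi) intervals.
    Returns the set of (update_id, subscription_id) pairs whose
    intervals overlap. Endpoint sorting makes the cost O(n log n)
    plus output size, versus O(n^2) for brute-force comparison.
    """
    events = []
    for uid, lo, hi in updates:
        events.append((lo, 0, 'U', uid))   # 0 = region opens
        events.append((hi, 1, 'U', uid))   # 1 = region closes
    for sid, lo, hi in subscriptions:
        events.append((lo, 0, 'S', sid))
        events.append((hi, 1, 'S', sid))
    events.sort()  # at equal coordinate, opens (0) precede closes (1)
    active_u, active_s, matches = set(), set(), set()
    for _, kind, side, rid in events:
        if kind == 0:
            if side == 'U':
                matches.update((rid, s) for s in active_s)
                active_u.add(rid)
            else:
                matches.update((u, rid) for u in active_u)
                active_s.add(rid)
        else:
            (active_u if side == 'U' else active_s).discard(rid)
    return matches
```

For multi-dimensional routing spaces, a matching pair must typically overlap in every dimension, so a sweep like this is run per dimension and the results intersected.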
In order to guarantee correctness of simulations, conventional parallel Discrete Event simulation models impose a sequential mode of execution on the events belonging to a logical process (LP). This constraint, which ...
In a distributed simulation, simulation components of various types are executed at geographically different locations, forming a simulation federation to create a common virtual environment. Under the High Level Architecture (HLA), information that will be produced and consumed by a simulation component is defined in its object model, and how that information is produced and consumed is well encapsulated inside the simulation component's implementation. However, in the current implementation of the HLA's Runtime Infrastructure (RTI), information hiding between groups of simulation components in a simulation federation is not addressed. In this paper, we discuss how hierarchical federation architectures can be used to tackle this problem. The hierarchical federation architecture adopted in this paper differs from existing architectures in that it is based on a hybrid approach for interoperability between simulation federations. To demonstrate information hiding using the architecture, a distributed semiconductor supply-chain simulation is also described in the paper.
ISBN: (print) 9781450393393
The increase in complexity, diversity and scale of high performance computing environments, as well as the increasing sophistication of parallel applications and algorithms, call for productivity-aware programming languages for high-performance computing. Among them, the Chapel programming language stands out as one of the more successful approaches based on the Partitioned Global Address Space programming model. Although Chapel is designed for productive parallel computing at scale, the question of its competitiveness with well-established conventional parallel programming environments arises. To this end, this work compares the performance of Chapel-based fractal generation on shared- and distributed-memory platforms with corresponding OpenMP and MPI+X implementations. The parallel computation of the Mandelbrot set is chosen as a test-case for its high degree of parallelism and its irregular workload. Experiments are performed on a cluster composed of 192 cores using the French national testbed Grid'5000. Chapel as well as its default tasking layer demonstrate high performance in a shared-memory context, while Chapel competes with hybrid MPI+OpenMP in a distributed-memory environment.
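The Mandelbrot test-case is easy to reproduce in miniature. The sketch below (plain Python with a thread pool rather than Chapel, OpenMP or MPI, so it shows the task decomposition, not real speedup) computes escape-time counts row by row; the irregular per-row cost is exactly the load-balancing challenge the abstract refers to:

```python
from concurrent.futures import ThreadPoolExecutor

def escape_time(c, max_iter=100):
    """Iterations until |z| exceeds 2 under z -> z*z + c.

    Returning max_iter means the point is treated as inside the set.
    """
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return n
    return max_iter

def mandelbrot_grid(xs, ys, max_iter=100, workers=4):
    """Compute the grid row by row; each row is an independent task."""
    def row(y):
        return [escape_time(complex(x, y), max_iter) for x in xs]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(row, ys))
```

Points near the set boundary run the full iteration budget while points far outside escape almost immediately, so a static row partition can leave workers idle; dynamic scheduling (as in Chapel's tasking layer or OpenMP's `schedule(dynamic)`) is the usual remedy.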
In traditional optimistic distributed simulation protocols, a logical process (LP) receiving a straggler rolls back and sends out anti-messages. The receiver of an anti-message may also roll back and send out more anti-messages. So a single straggler may result in a large number of anti-messages and multiple rollbacks of some LPs. In our protocol, an LP receiving a straggler broadcasts its rollback. On receiving this announcement, other LPs may roll back, but they do not announce their rollbacks. So each LP rolls back at most once in response to each straggler. Anti-messages are not used. This eliminates the need for output queues and results in simple memory management. It also eliminates the problem of cascading rollbacks and echoing, and results in faster simulation. All this is achieved by a scheme for maintaining transitive dependency information. The cost incurred includes the tagging of each message with extra dependency information and the increased processing time upon receiving a message. We also present the similarities between the two areas of distributed simulation and distributed recovery. We show how the solutions for one area can be applied to the other.
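The bookkeeping the abstract describes - tagging each message with transitive dependency information so an LP can decide locally whether a broadcast rollback affects it - can be sketched roughly as follows. Class and field names are illustrative, not the paper's:

```python
class LP:
    """Simplified transitive-dependency tracking for one logical process.

    dep[k] holds the latest virtual time of LP k that this LP's current
    state (transitively) depends on; -1.0 means no dependency yet.
    """
    def __init__(self, lp_id, n_lps):
        self.id = lp_id
        self.now = 0.0
        self.dep = {k: -1.0 for k in range(n_lps)}

    def send(self, receive_time):
        # Every outgoing message carries the sender's full dependency
        # vector, including the sender's own current virtual time.
        tag = dict(self.dep)
        tag[self.id] = self.now
        return (receive_time, tag)

    def receive(self, msg):
        recv_time, tag = msg
        self.now = recv_time
        for k, t in tag.items():   # fold in the sender's dependencies
            self.dep[k] = max(self.dep[k], t)

    def must_roll_back(self, announced_lp, rollback_time):
        # On hearing "LP j rolled back to time t", roll back iff we
        # depend on a state of j newer than t. No anti-messages needed.
        return self.dep[announced_lp] > rollback_time
```

Because the dependency vector grows with the number of LPs, each message carries O(n) extra data; that is the tagging and processing cost the abstract acknowledges.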
ISBN: (print) 0769516084
Of critical importance to any real-time system is the issue of predictability. We divide overall system predictability into two parts: algorithmic and systemic. Algorithmic predictability is concerned with ensuring that the parallel simulation engine and model, from a complexity point of view, are able to consistently yield results within a real-time deadline. Systemic predictability is concerned with ensuring that OS scheduling, interrupts and virtual memory overheads are consistent over a real-time period. To provide a framework for investigating systemic predictability, we define a new class of parallel simulation called Extreme Simulation or XSim. An XSim is any analytic parallel simulation that is able to generate a statistically valid result by a real-time deadline. Typically, this deadline is between 10 and 100 milliseconds. XSims are expected to provide decision support to existing complex, real-time systems. As a new design and implementation methodology for realizing XSims, we embed a state-of-the-art optimistic simulator into the Linux operating system. In this operating environment, OS scheduling and interrupts are disabled. Given a 50 millisecond model completion deadline, we observe that the XSim has a systemic predictability measure of 98%, compared with only 56% for the same Time Warp system operating at user level.
We propose a computing technique for efficient parallel simulation of compute-intensive DEVS models on the IBM Cell processor, combining multi-grained parallelism and various optimizations to speed up the event execut...
Can parallel simulations efficiently exploit a network of workstations? Why haven't PDES models followed standard modeling methodologies? Will the field of PDES survive, and if so, in what form? Researchers in the PDES field have addressed these questions and others in a series of papers published in the last few years [1,2,3,4]. The purpose of this paper is to shed light on these questions by documenting an actual case study of the development of an optimistically synchronized PDES application on a network of workstations. This paper is unique in that its focus is not necessarily on performance, but on the whole process of developing a model, from the physical system being simulated, through its conceptual design, validation, implementation, and, of course, its performance. This paper also presents the first reported performance results indicating the impact of risk on performance. The results suggest that the optimal value of risk is sensitive to the latency parameters of the communications network.