ISBN: 9781565550278 (print)
In this paper, we consider the problem of partitioning a conservative parallel simulation for execution on a multi-computer. The synchronization protocol makes use of null messages [6]. We propose the use of a simulated annealing algorithm with an adaptive search schedule to find good (sub-optimal) partitions. The paper discusses the algorithm and its implementation, and reports performance results for simulations of a partitioned FCFS queueing network model executed on an iPSC/860 hypercube. The results are compared with those obtained from a random partitioning. They show that a partition found by our simulated annealing algorithm reduces the running time of the simulations by 25-35% relative to a random partition of the model.
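To make the partitioning idea concrete, the following is a minimal sketch of simulated annealing applied to assigning logical processes to processors. It is not the paper's algorithm: the abstract does not give the cost function or the adaptive search schedule, so this sketch assumes an edge-cut cost (number of communication links crossing processors) and swaps in a plain geometric cooling schedule. All names and parameters (`cut_cost`, `anneal_partition`, `t0`, `alpha`, `moves_per_temp`) are hypothetical.

```python
import math
import random


def cut_cost(edges, part):
    """Assumed cost: number of links whose endpoints sit on different processors."""
    return sum(1 for u, v in edges if part[u] != part[v])


def anneal_partition(n_nodes, edges, n_parts, t0=2.0, alpha=0.95,
                     moves_per_temp=200, t_min=1e-3, seed=0):
    """Search for a low-cut, balanced assignment of nodes to n_parts processors."""
    rng = random.Random(seed)
    # Balanced starting point: round-robin assignment, then shuffled.
    part = [i % n_parts for i in range(n_nodes)]
    rng.shuffle(part)
    cost = cut_cost(edges, part)
    best, best_cost = part[:], cost
    t = t0
    while t > t_min:
        for _ in range(moves_per_temp):
            # Swapping two nodes' processors keeps the partition sizes unchanged.
            a, b = rng.sample(range(n_nodes), 2)
            if part[a] == part[b]:
                continue
            part[a], part[b] = part[b], part[a]
            new_cost = cut_cost(edges, part)
            delta = new_cost - cost
            # Metropolis rule: always accept improvements, sometimes accept worse moves.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = part[:], cost
            else:
                part[a], part[b] = part[b], part[a]  # undo the rejected swap
        t *= alpha  # geometric cooling (a fixed schedule, not the adaptive one)
    return best, best_cost
```

A real partitioner for a null-message simulation would likely weight the cost by message traffic and lookahead rather than counting links, and would recompute the cost incrementally instead of from scratch after every move.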
ISBN: 9780769511047 (print)
With fixed lookahead information in a simulation model, the overhead of asynchronous conservative parallel simulation lies in the mechanism used for propagating time updates so that logical processes can safely advance their local simulation clocks. Studies have shown that a good scheduling algorithm should preferentially schedule processes containing events on the critical path. This paper introduces a lock-free algorithm for scheduling logical processes in conservative parallel discrete-event simulation on shared-memory multiprocessor machines. The algorithm uses fetch-and-add operations that help avoid the inefficiencies associated with using locks. The lock-free algorithm is robust. Experiments show that, compared with the scheduling algorithm using locks, the lock-free algorithm exhibits better performance when the number of logical processes assigned to each processor is small or when the workload becomes significant. In models with a large number of logical processes, our algorithm shows only a modest increase in execution time due to the overhead of extra bookkeeping in the algorithm.
We divide potential NII (National Information Infrastructure) services into five broad areas: collaboration and televirtuality; InfoVISiON (Information, Video, Imagery, and Simulation on Demand) and digital libraries; commerce; metacomputing; and WebTop productivity services. The last of these denotes the broad suite of tools we expect to be offered on the Web in a general environment we term WebWindows. We review current and future World Wide Web technologies that could underlie these services. In particular, we suggest an integration framework, WebWork, for high performance (parallel and distributed) computing and the NII. We point out that pervasive WebWork and WebWindows technologies will enable, facilitate and substantially accelerate such complex software processes on the NII. We briefly analyze seven broad application areas: society; business enterprises; health care; defense command and control, and crisis management; education; collaboratory; manufacturing. We contrast their use of NII services with a more detailed examination of the manufacture of complex systems, such as aircraft and automobiles. This application stresses the NII, but there is a remarkable opportunity to develop new manufacturing practices that offer cost savings and reduced time to market.
Many mapping schemes have been proposed in previous research on parallel proxy servers. These schemes are mainly URL-based, and therefore cannot fully benefit from the persistent connection feature of HTTP/1.1. We propose a site-based mapping scheme that forwards all requests targeting the same Web site to the same proxy server. The scheme thus allows the proxy to use a single persistent connection to serve many client requests. The major advantage of the scheme is the reduction in the number of connection establishments, which saves network bandwidth and reduces user-experienced latency. Simulation results show that the proposed site-based scheme eliminates 40%-70% of the connection setups and teardowns when compared to a traditional URL-based mapping scheme.
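The difference between URL-based and site-based mapping can be sketched in a few lines. The abstract does not say how requests are mapped to proxies, so this sketch assumes a simple hash-modulo mapping; the function names (`url_based_proxy`, `site_based_proxy`) and the use of SHA-256 are illustrative choices, not the paper's design.

```python
import hashlib
from urllib.parse import urlsplit


def _bucket(key, n_proxies):
    """Map a string key to a proxy index with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_proxies


def url_based_proxy(url, n_proxies):
    """URL-based mapping: pages of one site may be spread over many proxies."""
    return _bucket(url, n_proxies)


def site_based_proxy(url, n_proxies):
    """Site-based mapping: every request for a given host goes to one proxy."""
    host = urlsplit(url).hostname or ""
    return _bucket(host, n_proxies)
```

Because every URL on a host hashes to the same index under `site_based_proxy`, that proxy can keep one persistent connection open to the origin server and reuse it, whereas `url_based_proxy` may send two objects from the same page to different proxies, each of which opens its own connection.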
ISBN: 9780769516080 (print)
This paper focuses on conservative simulation using distributed-shared memory for inter-processor communication. JavaSpaces, a special service of Java Jini, provides a shared persistent memory for simulation message communication among processors. Two benchmark programs written using our SPaDES/Java parallel simulation library are used. The first program is a linear pipeline system representing a loosely-coupled open system. The second, the PHOLD program, represents a strongly-connected closed system. Experiments are carried out using a cluster of Pentium II PCs. We use a combination of Wood-Turner carrier-null, flushing and demand-driven algorithms for null message synchronization. To optimize message communication, we replace the SPaDES/Java inter-processor communication implemented using Java's Remote Method Invocation (RMI) with one JavaSpace. For PHOLD (16x16, 16) running on eight processors, this change reduces simulation runtime by more than half, reduces null message overhead by a further 15%, and more than doubles the event rate. Based on our memory analysis methodology, the memory cost of null message synchronization for PHOLD is less than 9% of the total memory needed by the simulation.
A decentralized control system is composed of more than two subsystems, which communicate with each other to control the whole system. In this paper, the mechanical constraint information of a parallel manipulator is analyzed. When some joints of a multi-degree-of-freedom parallel manipulator are set to be passive, excessive interference force can be reduced. We treat the case where one joint of a three-link arm is set to be passive. The influence of the passive joint on the tip of the arm is derived from the configuration and the calculated reference torque. In the active joints, the interference force information from the other joints is used to compensate for the effect of the passive joint. This information is also used for switching passive joints, so that singular points are avoided. Taking advantage of a parallel manipulator with redundant drive joints, fault compensation is achieved by extending the communication among subsystems. A comparison with centralized control, carried out by simulation, demonstrates the usefulness of the decentralized control system.
ISBN: 9780769506678 (print)
In a large-scale distributed simulation with thousands of dynamic objects, efficient communication of data among these objects is an important issue. The broadcasting mechanism specified by the Distributed Interactive Simulation (DIS) standards is not suitable for large-scale distributed simulations. In the High Level Architecture (HLA) paradigm, the Runtime Infrastructure (RTI) provides a set of services, such as data distribution management (DDM), among federates. The goal of the DDM module in the RTI is to make data communication more efficient by sending data only to those federates that need it, as opposed to the broadcasting mechanism employed by DIS. DDM schemes have appeared in the literature. In this paper, we discuss grid-based DDM and develop a DDM model that uses grids for matching the publishing/subscription regions and for data filtering. We show that an appropriate choice of the grid-cell size is crucial in obtaining good performance. We develop an analytical model and derive a formula for identifying the optimal cell size in grid-based DDM.
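Grid-based matching, and why the cell size matters, can be illustrated with a small sketch. This is not the paper's model: it assumes two-dimensional axis-aligned regions given as `(xmin, ymin, xmax, ymax)` with closed bounds, and the names `cells_for` and `grid_match` are hypothetical. A publisher is routed to every subscriber that shares at least one grid cell with it.

```python
import math
from collections import defaultdict


def cells_for(region, cell_size):
    """Return the set of grid cells touched by a closed rectangular region."""
    xmin, ymin, xmax, ymax = region
    x0, x1 = math.floor(xmin / cell_size), math.floor(xmax / cell_size)
    y0, y1 = math.floor(ymin / cell_size), math.floor(ymax / cell_size)
    return {(cx, cy) for cx in range(x0, x1 + 1) for cy in range(y0, y1 + 1)}


def grid_match(publications, subscriptions, cell_size):
    """Map each publishing federate to the subscribers sharing a grid cell with it."""
    subs_by_cell = defaultdict(set)
    for federate, region in subscriptions.items():
        for cell in cells_for(region, cell_size):
            subs_by_cell[cell].add(federate)
    routes = {}
    for federate, region in publications.items():
        targets = set()
        for cell in cells_for(region, cell_size):
            targets |= subs_by_cell.get(cell, set())
        targets.discard(federate)  # a federate does not send to itself
        routes[federate] = targets
    return routes
```

In this sketch, large cells are cheap to maintain but produce false matches: two regions that do not overlap still match if they fall in the same cell, so unneeded data is sent. Small cells filter more precisely but each region registers in many more cells. That trade-off is the reason an optimal cell size exists.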
This paper discusses distributed checkpointing with "Time Warp techniques", a typical uncoordinated checkpointing technique that is often used in parallel and distributed simulations. Relaxing an assumption of the previous model of Soliman et al., we present a discrete-time model in which the number of checkpoints each process can hold is finite. In addition, we propose an adaptive distributed checkpointing technique that gives an effective time arrangement of checkpoints for a recovery point distribution, and we give numerical examples.
ISBN: 9780769516080 (print)
This paper presents two new versions of the Critical Channel Traversing (CCT) algorithm. CCT is a conservative parallel discrete event simulation algorithm that has been shown to achieve very high performance when used in a wide area computer network simulator. The first of the new algorithms, called simple sender side CCT, is similar to the original, but busy waiting is eliminated. Results presented show that simple sender side CCT avoids performance problems that can be caused by busy waiting. The second new algorithm, called receive side CCT, employs a different strategy for updating channel clocks and determining which objects should be scheduled on critical channels. Performance results show that this version provides better scaling with respect to the connectivity of the model, at the expense of some added complexity.
This paper presents the design and implementation of a reinforcement learning agent that automatically selects appropriate loop scheduling algorithms for parallel loops embedded in time-stepping scientific applications executing on clusters. There may be a number of such loops in an application, and the loops may have different load balancing requirements. Further, loop characteristics may also change as the application progresses. Following a model-free learning approach, the learning agent assigned to a loop selects from a library the best scheduling algorithm for the loop during the lifetime of the application. The utility of the learning agent is demonstrated by its successful integration into the simulation of wave packets - an application arising from quantum mechanics. Results of statistical analysis using pairwise comparison of means on the running time of the simulation with and without the learning agent validate the effectiveness of the agent in improving the parallel performance of the simulation.
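As a rough illustration of model-free selection among loop scheduling algorithms, the sketch below uses an epsilon-greedy rule over measured loop times. The abstract does not name the learning method or the scheduling library, so the epsilon-greedy rule, the class `SchedulerSelector`, and the algorithm labels used with it are all assumptions made for the example, not the paper's agent.

```python
import random


class SchedulerSelector:
    """Epsilon-greedy choice among scheduling algorithms, one instance per loop."""

    def __init__(self, algorithms, epsilon=0.1, seed=0):
        self.algorithms = list(algorithms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.mean_time = {a: 0.0 for a in self.algorithms}
        self.count = {a: 0 for a in self.algorithms}

    def choose(self):
        """Pick the algorithm to use for the next time step of the loop."""
        untried = [a for a in self.algorithms if self.count[a] == 0]
        if untried:
            return untried[0]  # try every algorithm once before exploiting
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.algorithms)  # occasional exploration
        return min(self.algorithms, key=self.mean_time.get)  # fastest so far

    def update(self, algorithm, elapsed):
        """Record the measured loop time using an incremental mean."""
        self.count[algorithm] += 1
        n = self.count[algorithm]
        self.mean_time[algorithm] += (elapsed - self.mean_time[algorithm]) / n
```

In a time-stepping application, each parallel loop would call `choose()` before executing a step and `update()` with the measured wall-clock time afterwards. Continued exploration lets the selector react if loop characteristics change as the application progresses, although a plain running mean adapts slowly; a recency-weighted average would be one alternative.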