ISBN: 1581137664 (print)
We describe a unique approach to improving the performance of a Web clipping portal by exploiting inherent parallelism in the syntax of widely used markup languages, and by employing a parallel computing platform as an in-line proxy between the handheld mobile device and a Web server on the Internet.
ISBN: 0769523838 (print)
In this paper, a new event scheduling mechanism, XEQ, and a new rollback procedure, rb-messages, are proposed for use in optimistic logic simulation. We incorporate both of these techniques in a simulator, XTW. XTW groups LPs into clusters and makes use of a multi-level queue, XEQ, to schedule events in the cluster. XEQ has an O(1) event scheduling time complexity. Our new rollback mechanism replaces the use of anti-messages by an rb-message and eliminates the need for an output queue at each LP. Experimental comparisons to Time Warp reveal superior performance on the part of XTW, while experimental results over large circuits (5-million-gate to 25-million-gate) show that XTW scales well with both the size of circuits and the number of processors.
ISBN: 9781565550278 (print)
This paper examines the cost/performance of simulating a hypothetical target parallel computer using a commercial host parallel computer. We address the question of whether parallel simulation is simply faster than sequential simulation, or if it is also more cost-effective. To answer this, we develop a performance model of the Wisconsin Wind Tunnel (WWT), a system that simulates cache-coherent shared-memory machines on a message-passing Thinking Machines CM-5. The performance model uses Kruskal and Weiss's fork-join model to account for the effect of event processing time variability on WWT's conservative fixed-window simulation algorithm. A generalization of Thiebaut and Stone's footprint model accurately predicts the effect of cache interference on the CM-5. The model is calibrated using parameters extracted from a fully parallel simulation (p=N), and validated by measuring the speedup as the number of processors (p) ranges from one to the number of target nodes (N). Together with simple cost models, the performance model indicates that for target system sizes of 32 nodes and larger, parallel simulation is more cost-effective than sequential simulation. The key intuition behind this result is that large simulations require large memories, which dominate the cost of a uniprocessor; parallel computers allow multiple processors to simultaneously access this large memory.
ISBN: 1565550277 (print)
One of the significant difficulties in partitioning logic circuits for distributed simulation is the lack of a priori knowledge concerning the evaluation frequency of individual circuit elements. A number of researchers have resorted to pre-simulation to estimate these evaluation frequencies. In this paper we empirically investigate the wisdom of relying on pre-simulation results, and evaluate the degree to which early evaluation frequencies predict later evaluation frequencies. The results show that, for simulations that use random input vectors, pre-simulation has clear merit in predicting circuit element evaluation frequency. This supports the use of pre-simulation as an input to circuit partitioning algorithms.
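One way to picture the question of whether early evaluation frequencies predict later ones is to split an evaluation trace at a cutoff time and correlate the per-element counts on each side. The sketch below is only an illustration under assumptions: the trace format (a list of `(time, element)` pairs), the function names `split_counts` and `pearson`, and the use of Pearson correlation as the measure are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def split_counts(trace, cutoff):
    """Count evaluations per circuit element before and after `cutoff`.

    `trace` is an assumed format: an iterable of (time, element) pairs.
    """
    early, late = Counter(), Counter()
    for time, element in trace:
        if time < cutoff:
            early[element] += 1
        else:
            late[element] += 1
    return early, late


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def early_late_correlation(trace, cutoff):
    """Correlate per-element counts in the pre-simulation window with the rest."""
    early, late = split_counts(trace, cutoff)
    elements = sorted(set(early) | set(late))
    return pearson([early[e] for e in elements], [late[e] for e in elements])
```

A correlation close to 1 on such a trace would suggest that the counts from the early window are a usable weight for a partitioning algorithm; the paper's own methodology and metrics may differ.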
ISBN: 0769519849 (print)
The proceedings contain 16 papers. The topics discussed include: distributed parallel computing using Windows desktop systems; a load balancing tool for distributed parallel loops; a diskless checkpointing algorithm for super-scale architectures applied to the fast Fourier transform; dynamic replication to improve input/output scalability of genomic alignment; towards a grid-based architecture for traditional Chinese medicine; issues in runtime algorithm selection for grid environments; resource co-allocation for parallel tasks in computational grids; scalable state replication with weak consistency; runtime support for changing the communication model in large scale applications; impact of admission and cache replacement policies on response times of jobs on data grids; and distributing simulation work based on component activity: a new approach to partitioning hierarchical DEVS models.
Compared to highly optimized optimistic simulators which use local event queues for individual processors on a shared-memory computer, we demonstrate that employing a single global event queue drastically reduces the number of rollbacks, brings down the storage requirements, and achieves superior load balance. On a bus-based Silicon Graphics multiprocessor, these virtues consistently translated into faster execution times and higher speedups on those synthetic networks of medium- to coarse-grained logical processes which were ridden with rollbacks and load imbalance on local-queue-based simulators. A dynamic randomization-based load distribution scheme for local-event-queue simulators is also shown to be an effective improvement.
Rapid progress in the design of fast CPU chips has outstripped progress in memory and cache performance. Optimistic algorithms would seem to be more vulnerable to poor memory performance because they require extra memory for state saving and anti-messages. We examine the performance of both optimistic and conservative protocols in controlled experiments to evaluate the effects of memory speed and cache size, using a variety of applications.
In message passing environments, the message send time is dominated by overheads that are relatively independent of the message size. Therefore, fine-grained applications (such as Time-Warp simulators) suffer high overheads because of frequent communication. In this paper, we investigate the optimization of the communication subsystem of Time-Warp simulators using dynamic message aggregation. Under this scheme, Time-Warp messages that have the same destination LP and occur in close temporal proximity are dynamically aggregated and sent as a single physical message. Several aggregation strategies are developed that attempt to minimize the communication overhead without harming the progress of the simulation (because of messages being delayed). The performance of the strategies is evaluated for a network of workstations and an SMP, using a number of applications that have different communication behavior.
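The aggregation idea can be sketched as a per-destination buffer that is flushed as one physical message once it has been open for a fixed window. This is a minimal sketch under assumptions, not one of the paper's strategies: the class name `Aggregator`, the fixed `window` parameter, and the explicit `now` clock argument are hypothetical, and the `sent` list stands in for a real transport.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Aggregator:
    """Buffers messages per destination LP and sends each buffer as one batch."""

    window: float  # hypothetical fixed aggregation window, in clock units
    sent: list = field(default_factory=list)  # stand-in for the physical transport
    _buffers: dict = field(default_factory=lambda: defaultdict(list))
    _opened: dict = field(default_factory=dict)  # dest -> time its buffer was opened

    def send(self, dest, payload, now):
        """Queue a logical message; flush any buffers whose window has expired."""
        self.flush_expired(now)
        if dest not in self._opened:
            self._opened[dest] = now
        self._buffers[dest].append(payload)

    def flush_expired(self, now):
        """Flush every buffer that has been open for at least `window`."""
        expired = [d for d, t in self._opened.items() if now - t >= self.window]
        for dest in expired:
            self.flush(dest)

    def flush(self, dest):
        """Send all buffered messages for `dest` as a single physical message."""
        batch = self._buffers.pop(dest, [])
        self._opened.pop(dest, None)
        if batch:
            self.sent.append((dest, batch))
```

A fixed window is the simplest possible policy. It trades per-message overhead against added latency, and in an optimistic simulator a delayed message can cause extra rollbacks at the receiver, which is presumably why the paper studies several strategies rather than one.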
ISBN: 1565550552 (print)
We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallel simulation of continuous-time Markov chains. This paper reviews the basic method and compares four different methods based on uniformization, evaluating their strengths and weaknesses as a function of problem characteristics. The methods vary in their use of optimism, logical aggregation, communication management, and adaptivity. Performance evaluation is conducted on the Intel Touchstone Delta multiprocessor, using up to 256 processors.
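For readers unfamiliar with uniformization, the textbook sequential form can be sketched as follows: event times come from a single Poisson process of rate Λ, chosen at least as large as the largest exit rate, and at each event the chain moves according to the matrix P = I + Q/Λ, where self-loops act as pseudo-events. The code is an illustrative sketch of that standard construction only, not any of the four parallel methods compared in the paper; the function name `uniformized_path` and the list-of-lists generator `Q` are assumptions, and at least one state is assumed to have a positive exit rate.

```python
import random


def uniformized_path(Q, state, t_end, rng):
    """Sample one CTMC trajectory on [0, t_end] using uniformization.

    Q is the generator matrix as a list of rows (off-diagonals >= 0, rows sum to 0).
    Returns a list of (time, state) pairs, including pseudo-events (self-loops).
    """
    n = len(Q)
    # Uniformization rate: at least the largest exit rate of any state.
    rate = max(-Q[i][i] for i in range(n))
    # Embedded discrete-time chain P = I + Q / rate; each row sums to 1.
    P = [
        [(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
        for i in range(n)
    ]
    t = 0.0
    path = [(0.0, state)]
    while True:
        # Event times form a Poisson process that does not depend on the state.
        t += rng.expovariate(rate)
        if t > t_end:
            return path
        state = rng.choices(range(n), weights=P[state])[0]
        path.append((t, state))
```

Because the event times do not depend on the current state, they can be generated ahead of the state trajectory, which is the property that makes uniformization attractive as a synchronization basis.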
ISBN: 0769513484 (print)
We present a novel synchronisation algorithm for distributed discrete-event simulation (DDES), called the Area Virtual Time (AVT) algorithm. We first expose two orthogonal ideas: the synchronisation policy for DDES, which is either conservative or optimistic, and the time-keeping mechanism, which is based on either Local or Global Virtual Times. The AVT algorithm is based on a network of virtual time regions, which is a happy medium between the Local Virtual Time (LVT) and the Global Virtual Time (GVT). The AVT algorithm permits the different parts of the simulation model to run under either LVT or GVT time-keeping mechanisms. This is particularly suited to models which are less than homogeneous. In those cases, either mapping the models entirely to one of the time-keeping schemes would not be efficient, or the real-time nature of the interfaces precludes the use of GVT in those parts of the model. Our results demonstrate that the AVT algorithm advances the simulation time faster than either the LVT or the GVT schemes, and is less sensitive to variations in some key model and communication parameters - a desirable property in distributed computation.