Computer numerical simulation is widely applied in engineering and social fields, where it has shown great value. Small-scale simulation applications can be processed on a traditional simulation computer, but as the problem size increases, sequential simulation processing can no longer meet the requirements. Dynamic real-time simulation and super-real-time simulation require a high-performance simulation computer. In this paper we first analyse the structure of a classical simulation computer, the AD-100 developed by ADI Inc., and then propose a novel simulation computer structure that adopts MPP technology. At the end of the paper, an experimental result is given to test the feasibility of parallel simulation processing.
ISBN: 0818678763 (print)
We describe in this paper a partial evaluator for a parallel programming language. The parallel language we present combines lambda calculus with a message-passing communication mechanism. By improving some techniques originally used for partial evaluation of sequential languages and introducing some new methods, we successfully solve the problems caused by internal semantic differences between lambda calculus and message passing in our partial evaluator for the parallel language.
The multithreaded processor, called Rhamma, uses a fast context switch to bridge latencies caused by memory accesses or by synchronization operations. Load/store, synchronization, and execution operations of different threads of control are executed simultaneously by appropriate functional units. A fast context switch is performed whenever a functional unit comes across an operation that is destined for another unit. The overall performance depends on the speed of the context switch. We present two techniques to reduce the context switch cost to at most one processor cycle: a context switch is explicitly coded in the opcode, and a context switch buffer is used. The load/store unit shows up as the principal bottleneck. We evaluate four implementation alternatives of the load/store unit to increase processor performance.
A synchronous checkpointing algorithm coordinates a set of processes in taking checkpoints in such a way that the set of local checkpoints always forms part of a consistent global system state. Whenever a process p requests to take a checkpoint, a set of processes, called the cohort set of p, must be checked, and some of them may also have to take checkpoints in order to preserve system consistency. Although several synchronous checkpointing algorithms have been proposed in the literature, most of them do not address performance. In this paper we propose an efficient distributed algorithm for synchronous checkpointing. A proof of correctness and an analysis of the algorithm's efficiency are presented. It is shown that the algorithm has better message and time complexity than the existing algorithms. The method proposed in this paper can also be applied to enhance the performance of rollback operations, which always require synchronization of the interdependent processes.
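To make the idea of a cohort set concrete, here is a minimal sketch of one common way to decide which processes must checkpoint together. It assumes the classical dependency rule (if p checkpoints, every process from which p has received a message since its last checkpoint must checkpoint too), applied transitively. The `received_from` map and the function name are hypothetical, and this is not the algorithm proposed in the paper.

```python
from collections import deque
from typing import Dict, Set


def processes_to_checkpoint(initiator: str,
                            received_from: Dict[str, Set[str]]) -> Set[str]:
    """Return the initiator plus every process it transitively depends on.

    received_from[p] is the (assumed) set of processes that sent p a message
    since p's last checkpoint.
    """
    must_checkpoint = {initiator}
    queue = deque([initiator])
    while queue:
        p = queue.popleft()
        for sender in received_from.get(p, set()):
            if sender not in must_checkpoint:
                must_checkpoint.add(sender)
                queue.append(sender)
    return must_checkpoint


# p received from q, q received from r; s received from p but p does not depend on s.
deps = {"p": {"q"}, "q": {"r"}, "s": {"p"}}
result = processes_to_checkpoint("p", deps)
```

In this example `s` is left out because nothing the initiator has received originates from it, which is the kind of saving a cohort-based scheme aims for.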
Although MPEG-1 Video is a promising and the most widely used moving picture compression standard, encoding moving pictures at a reasonable frame size and quality requires substantial computational resources. In this paper, we propose and implement an efficient scheme for parallelizing the MPEG-1 Video encoding algorithm on Ethernet-connected workstations, the most widely available computing environment today. In this scheme, slice-level, frame-level, and GOP (Group of Pictures)-level parallelism are identified as the attractive forms of parallelism that can be exploited on Ethernet-connected workstations. Three efficient parallel implementation schemes that take the communication characteristics of Ethernet-connected workstations into account are also proposed and evaluated experimentally. A series of experiments using thirty workstations shows that the MPEG-1 Video encoding time can be reduced in proportion to the number of workstations used for encoding, although the speedup graphs show a saturation point.
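As a rough illustration of GOP-level parallelism, the sketch below splits a frame sequence into fixed-size GOPs and assigns them to workers round-robin. The GOP size, the helper names `split_into_gops` and `assign_round_robin`, and the round-robin policy are all assumptions made for illustration; the abstract does not specify the scheduling scheme the paper uses.

```python
from typing import Dict, List


def split_into_gops(num_frames: int, gop_size: int) -> List[range]:
    """Partition frame indices 0..num_frames-1 into consecutive GOPs."""
    return [range(start, min(start + gop_size, num_frames))
            for start in range(0, num_frames, gop_size)]


def assign_round_robin(gops: List[range], num_workers: int) -> Dict[int, List[range]]:
    """Assign GOP i to worker i mod num_workers."""
    plan: Dict[int, List[range]] = {w: [] for w in range(num_workers)}
    for i, gop in enumerate(gops):
        plan[i % num_workers].append(gop)
    return plan


gops = split_into_gops(num_frames=30, gop_size=12)
plan = assign_round_robin(gops, num_workers=2)
```

With 30 frames and a GOP size of 12, the last GOP is short (frames 24 to 29), so the two workers receive uneven loads. Imbalance of this kind is one plausible contributor to speedup saturation, alongside communication cost on a shared network.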
Current advances in high-speed networks and WWW technologies have made network computing a cost-effective, high-performance computing alternative. New software tools are being developed to use the network computing environment efficiently. Our project, called the Virtual Distributed Computing Environment (VDCE), is a high-performance computing environment that allows users to write and evaluate networked applications for different hardware and software configurations using a web interface. In this paper we present the software architecture of VDCE, emphasizing application development and specification, scheduling, and execution/runtime aspects.
ISBN: 0818678763 (print)
In this paper we study the parallel aspects of PCGLS, a basic iterative method whose main idea is to organize the computation of the preconditioned conjugate gradient method applied to the normal equations, and of the Incomplete Modified Gram-Schmidt (IMGS) preconditioner, for solving sparse least squares problems on massively parallel distributed memory computers. The performance of these methods on this kind of architecture is always limited by the global communication required for the inner products. We describe the parallelization of PCGLS and the IMGS preconditioner with two improvements: one is to assemble the results of a number of inner products collectively, and the other is to create situations where communication can be overlapped with computation. A theoretical model of the computation and communication phases is presented, which allows us to determine the number of processors that minimizes the runtime. Several numerical experiments on the Parsytec GC/PowerPlus are presented.
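The first improvement, assembling several inner products collectively, can be sketched as follows. Each simulated processor packs its local partial dot products into one tuple, and a single global reduction sums the tuples. The `all_reduce_sum` function is a stand-in for a real collective operation, and the data layout is an assumption for illustration rather than the paper's implementation.

```python
from typing import List, Sequence, Tuple

LocalPair = Tuple[Sequence[float], Sequence[float]]


def local_dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Dot product of the vector pieces owned by one processor."""
    return sum(a * b for a, b in zip(x, y))


def all_reduce_sum(partials: List[Tuple[float, ...]]) -> Tuple[float, ...]:
    """Stand-in for ONE global reduction: element-wise sum over processors."""
    return tuple(sum(column) for column in zip(*partials))


def batched_inner_products(pairs_per_proc: List[List[LocalPair]]) -> Tuple[float, ...]:
    """Compute several global inner products with a single reduction."""
    partials = [tuple(local_dot(x, y) for x, y in pairs)
                for pairs in pairs_per_proc]
    return all_reduce_sum(partials)


# r = [1, 2, 3, 4] and s = [1, 1, 1, 1], split across two simulated processors.
proc0 = [([1.0, 2.0], [1.0, 2.0]), ([1.0, 2.0], [1.0, 1.0])]  # local parts of (r, r) and (r, s)
proc1 = [([3.0, 4.0], [3.0, 4.0]), ([3.0, 4.0], [1.0, 1.0])]
rr, rs = batched_inner_products([proc0, proc1])
```

Computing the two inner products separately would need two global reductions. Packing them means the communication start-up cost is paid once, at the price of a slightly longer message.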
Reliable multicast services in a group of autonomous distributed processes/sites are desirable for maintaining a consistent state of shared information accessed by transactions in distributed systems. Many existing protocols are complicated and therefore expensive and inefficient with respect to the availability of distributed systems. This paper discusses the design and implementation of new multicast communication services based on a logical token ring. They provide total ordering, atomicity of multicast messages, and membership and fault-tolerance services in the presence of fail-stop site failures and network partitioning. A unique feature of the protocol is that all members of the group know exactly who holds the token and are therefore able to detect the right order of a multicast message, which reduces the synchronization overhead, prevents possible token loss, and minimizes control messages. The services are implemented using a finite state machine approach and are highly efficient compared with related services in the same network settings.
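The way a token can yield total ordering is sketched below in a deliberately simplified form. The token carries the next sequence number, only the holder may stamp a multicast, and receivers deliver strictly in sequence order. The `Token` and `Member` classes and `stamp_and_pass` are hypothetical names, and the sketch omits membership, failure handling, and partitioning, so it should not be read as the protocol described in the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Token:
    next_seq: int = 0  # sequence number for the next multicast
    holder: int = 0    # ring index of the member currently holding the token


@dataclass
class Member:
    delivered: List[str] = field(default_factory=list)
    pending: Dict[int, str] = field(default_factory=dict)
    expected: int = 0

    def receive(self, seq: int, msg: str) -> None:
        """Buffer the message, then deliver everything that is now in order."""
        self.pending[seq] = msg
        while self.expected in self.pending:
            self.delivered.append(self.pending.pop(self.expected))
            self.expected += 1


def stamp_and_pass(token: Token, ring_size: int, msg: str) -> Tuple[int, str]:
    """The token holder stamps msg, then passes the token to the next member."""
    stamped = (token.next_seq, msg)
    token.next_seq += 1
    token.holder = (token.holder + 1) % ring_size
    return stamped


token = Token()
first = stamp_and_pass(token, ring_size=3, msg="a")
second = stamp_and_pass(token, ring_size=3, msg="b")

receiver = Member()
receiver.receive(*second)  # arrives early and is buffered
receiver.receive(*first)   # fills the gap; both are delivered in order
```

Because every member applies the same delivery rule to the same sequence numbers, all members deliver messages in the same order regardless of network arrival order.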
ISBN: 0818678763 (print)
This paper presents a new structured parallel programming model, "SEQ OF PAR", based on the Communication Closed Layer (CCL) principle of causal composition for parallel programs and the Bird-Meertens formalism (BMF) of locality-based parallel computation. The model is intended to support more general, architecture-independent parallel programming. It provides a structured approach to integrating task (or process) parallelism and data parallelism in one framework. The well-founded algebra of CCL and BMF also makes it possible to derive, optimize, and verify parallel programs through algebraic transformations. Experimental results show that adopting this programming model is very promising for obtaining efficient, portable parallel code.
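One loose reading of "a sequence of parallel layers" can be sketched with BMF-style map and reduce. Each layer is applied to all partitions in parallel, and the layers are composed sequentially. The names `par` and `seq_of_par` are hypothetical, and the sketch ignores the communication inside a layer that CCL permits, so it only hints at the model's shape rather than reproducing it.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, List

Partition = List[int]
Layer = Callable[[Partition], Partition]  # one phase, applied to each partition


def par(layer: Layer, partitions: List[Partition]) -> List[Partition]:
    """Run one layer on every partition in parallel (order is preserved)."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(layer, partitions))


def seq_of_par(layers: List[Layer], partitions: List[Partition]) -> List[Partition]:
    """Compose parallel layers sequentially: each layer finishes before the next starts."""
    for layer in layers:
        partitions = par(layer, partitions)
    return partitions


def square_all(part: Partition) -> Partition:
    return [x * x for x in part]


def local_sum(part: Partition) -> Partition:
    return [sum(part)]


out = seq_of_par([square_all, local_sum], [[1, 2], [3, 4]])
total = reduce(lambda acc, part: acc + part[0], out, 0)  # final reduction over partitions
```

Because each layer is a pure function of its partition, adjacent layers could be fused (square and sum in one pass) without changing the result. This is a small instance of the algebraic transformations the abstract mentions.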
ISBN: 0818678763 (print)
Snapshot algorithms are fundamental for many distributed applications and must often be executed repeatedly. We present three snapshot algorithms. The first is based on the assumption of global time; it computes channel states using several schemes. Taking a consistent cut at a global time instant, we show that the algorithm is applicable to existing snapshot algorithms. The second is a real token-passing-based algorithm for non-FIFO asynchronous distributed systems. Its control message complexity is O(n). The last algorithm is the repeated version of the second one. Using this algorithm, processes can concurrently obtain consistent global states at their convenience.