ISBN: (print) 0818678763
This paper characterizes the structure and resource requirements of the NAS Parallel Benchmarks (NPB), a popular benchmark suite used to evaluate various parallel computers. The phase parallel model is used to obtain parameter values for memory, I/O, and communication latency and bandwidth requirements. These quantitative parameters are useful in the design and evaluation of various parallel computers. The results of this study are being used in designing Dawning 2000, NCIC's second-generation MPP.
ISBN: (print) 1932415610
Cluster computing provides an attractive low-cost parallel computing platform, consisting of multiple workstations interconnected through commodity network technology, as a replacement for an expensive parallel computer. Recent advances in processor technologies have made it possible to build clusters of multi-processor machines, usually symmetric multiprocessors (SMPs), more economically. In cluster computing, message passing has been the most widely used programming method because of its simple and efficient programming model and the availability of programming tools such as the Message Passing Interface (MPI). Java technologies, meanwhile, have gained attention as an alternative platform for distributed computing thanks to their platform-neutral features. In this paper, we present VCluster, a Java-based portable parallel runtime system for cluster computing, and describe its architecture and features.
SRAM (static random access memory)-based pipelined algorithmic solutions have become competitive alternatives to TCAMs (ternary content addressable memories) for high-throughput IP lookup. Multiple pipelines can be utilized in parallel to improve the throughput further. However, several challenges must be addressed to make such solutions feasible. First, the memory distribution over different pipelines, as well as across different stages of each pipeline, must be balanced. Second, the traffic among these pipelines should be balanced. Third, the intra-flow packet order (i.e. the sequence) must be preserved. In this paper, we propose a parallel SRAM-based multi-pipeline architecture for IP lookup. A two-level mapping scheme is developed to balance the memory requirement among the pipelines as well as across the stages in each pipeline. To balance the traffic, we propose an early caching scheme to exploit the data locality inherent in the architecture. Our technique uses neither a large reorder buffer nor complex reorder logic. Instead, a flow-aware queuing scheme exploiting the flow information is used to maintain the intra-flow sequence. Extensive simulation using real-life traffic traces shows that the proposed architecture with 8 pipelines can achieve a throughput of up to 10 billion packets per second, i.e. 3.2 Tbps for minimum size (40 bytes) packets, while preserving intra-flow packet order. (c) 2009 Elsevier Inc. All rights reserved.
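The line-rate figure quoted in the abstract above can be checked with a few lines of arithmetic. This is only a sanity check of the stated numbers (10 billion packets per second, 40-byte packets), assuming 1 Tbps = 10^12 bits per second; the function name is chosen here for illustration and is not from the paper.

```python
# Sanity check of the throughput quoted in the abstract.
PACKETS_PER_SECOND = 10e9   # 10 billion packets per second (from the abstract)
MIN_PACKET_BYTES = 40       # minimum packet size in bytes (from the abstract)


def throughput_tbps(pps: float, packet_bytes: int) -> float:
    """Convert a packet rate into terabits per second (1 Tbps = 1e12 bit/s)."""
    return pps * packet_bytes * 8 / 1e12


if __name__ == "__main__":
    print(throughput_tbps(PACKETS_PER_SECOND, MIN_PACKET_BYTES))
```

With the figures above, the packet rate and the bit rate agree: 10^10 packets/s × 320 bits/packet = 3.2 × 10^12 bit/s.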
ISBN: (print) 0818678763
In this paper we describe a partial evaluator for a parallel programming language. The language combines the lambda calculus with a message-passing communication mechanism. By improving some techniques originally used for partial evaluation of sequential languages and introducing some new methods, we solve the problems that the internal semantic differences between lambda calculus and message passing cause in our partial evaluator for the parallel language.
The Parallel Virtual Machine (PVM) message passing system developed at Oak Ridge National Laboratories has gained widespread acceptance and usage in the parallel programming community. The authors describe the results...
A number of highly efficient parallel FEM solutions exist for distributed memory systems, but the move to adaptive parallel FEM simulations leads, in all probability, to more dynamic behaviour with respect to data placement and load balancing. A shared-memory architecture therefore seems to be a more appropriate basis for efficient implementations. This work presents a parallelized CG method for shared memory systems, implemented on a 4-processor SMP system, that makes explicit use of shared memory to enhance the communication between different domains. It is based on an idea for implementing parallelization on distributed memory systems and represents an appropriate modification of that method. The results show that increased synchronization expense can partially offset the advantages of shared memory communication, depending on the levels of refinement and the number of processors.
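For orientation, the following is a minimal sketch of the plain sequential conjugate gradient (CG) iteration for a symmetric positive definite system. It is not the parallelized method of the abstract above; the function names and the dense list-of-rows matrix format are assumptions made for illustration. In a domain-decomposed version, the matrix-vector product and the dot products are the steps where data is exchanged and where synchronization is typically needed.

```python
def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Dense matrix-vector product; A is a list of rows."""
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)                  # residual r = b - A x, with x = 0
    p = list(r)                  # first search direction is the residual
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:  # stop once the residual norm is small enough
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)          # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old               # makes the new direction A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    # Small SPD example; the exact solution is x = (1/11, 7/11).
    print(conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```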
We introduce a new parallel programming paradigm, namely synchronous parallel critical sections. Such parallel critical sections must be seen in the context of switching between synchronous and asynchronous modes of computation. Thread farming makes it possible to generate groups of threads that solve independent subproblems asynchronously and in parallel. In contrast, synchronous parallel critical sections make it possible to organize groups of asynchronous parallel threads so that they execute certain tasks jointly and synchronously. We show how the PRAM language Fork95 can be extended by a construct join supporting parallel critical sections. We explain its semantics and implementation, and discuss possible applications.
ISBN: (print) 0818678763
Snapshot algorithms are fundamental for many distributed applications and must often be executed repeatedly. We present three snapshot algorithms. The first is based on the assumption of global time; it computes channel states using several schemes. By taking a consistent cut at a global time instant, we show that the algorithm is applicable to existing snapshot algorithms. The second is a real token-passing-based algorithm for non-FIFO asynchronous distributed systems; its control-message complexity is O(n). The last algorithm is the repeated version of the second. Using this algorithm, processes can obtain consistent global states concurrently, at their convenience.
In the past couple of years, significant progress has been made in the development of message-passing libraries for parallel and distributed computing, and in the area of high-speed networking. These advances in computing technology have also led to a tremendous increase in the amount of data being manipulated and produced by scientific and commercial application programs. Despite their popularity, message-passing libraries provide only part of the support necessary for most high performance distributed computing applications: support for high speed parallel I/O is still lacking. In this paper, we provide an overview of the conceptual design of a parallel and distributed I/O file system, the Virtual Parallel File System (VIP-FS), and describe its implementation. VIP-FS makes use of message-passing libraries to provide a parallel and distributed file system which can execute over multi-processor machines or heterogeneous network environments.
ISBN: (print) 0818678763
We present a fast parallel algorithm running in O(log² n) time on a CREW PRAM with O(n) processors for finding the kth longest path in a given tree of n vertices (with Θ(n²) intervertex distances). Our algorithm is obtained by efficient parallelization of a sequential algorithm which is a variant of both Megiddo et al.'s algorithm [12] and Fredrickson et al.'s algorithm [3], based on centroid decomposition of the tree and a succinct representation of the set of intervertex distances. With the same time and space bounds as the best known result, our sequential algorithm maintains a shorter length of the decomposition tree.
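To make the problem statement concrete, here is a naive reference sketch that enumerates all n(n-1)/2 intervertex distances of a tree and picks the kth largest. It is a brute-force baseline, not the centroid-decomposition algorithm of the abstract above; the function name, the (u, v, weight) edge format, and the use of weighted edges are assumptions made for illustration.

```python
def kth_longest_path(n, edges, k):
    """Return the k-th largest intervertex distance in a tree (k is 1-based).

    n     -- number of vertices, labelled 0 .. n-1
    edges -- list of (u, v, weight) tuples forming a tree
    """
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    distances = []
    for src in range(n):
        # Depth-first traversal from src; tree paths are unique, so the
        # first time a vertex is reached its distance is final.
        dist = {src: 0}
        stack = [src]
        while stack:
            u = stack.pop()
            for v, w in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + w
                    stack.append(v)
        # Count each unordered pair once.
        distances.extend(d for v, d in dist.items() if v > src)

    distances.sort(reverse=True)
    return distances[k - 1]


if __name__ == "__main__":
    # Path 0-1-2-3 with edge weights 1, 2, 3.
    tree = [(0, 1, 1), (1, 2, 2), (2, 3, 3)]
    print(kth_longest_path(4, tree, 1))
```

This baseline materializes all Θ(n²) distances, which is exactly the cost that a succinct representation of the distance set is meant to avoid.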