ISBN: 9781538643686 (print)
On a GPU cluster, the high ratio of computing power to communication bandwidth makes scaling breadth-first search (BFS) on a scale-free graph extremely challenging. By separating high and low out-degree vertices, we present an implementation with scalable computation and a model for scalable communication for BFS and direction-optimized BFS. Our communication model uses global reduction for high-degree vertices, and point-to-point transmission for low-degree vertices. Leveraging the characteristics of degree separation, we reduce the graph size to one third of the conventional edge list representation. With several other optimizations, we observe linear weak scaling as we increase the number of GPUs, and achieve 259.8 GTEPS on a scale-33 Graph500 RMAT graph with 124 GPUs on the latest CORAL early access system.
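The degree-separation idea in this abstract can be illustrated with a small single-process sketch. It is not the authors' implementation: the `threshold` cutoff, the function names, and the use of an integer bitmask to stand in for a global reduction are all assumptions made for illustration.

```python
from collections import defaultdict


def split_by_out_degree(edges, threshold):
    """Split vertices into (high, low) sets by out-degree.

    `threshold` is a hypothetical cutoff; the abstract does not state one.
    """
    out_degree = defaultdict(int)
    vertices = set()
    for src, dst in edges:
        out_degree[src] += 1
        vertices.update((src, dst))
    high = {v for v in vertices if out_degree[v] >= threshold}
    return high, vertices - high


def combine_high_degree_updates(per_rank_masks):
    """Simulate a global reduction: bitwise OR of each rank's visited bitmask."""
    combined = 0
    for mask in per_rank_masks:
        combined |= mask
    return combined


edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 0)]
high, low = split_by_out_degree(edges, threshold=3)
print(high, low)
print(bin(combine_high_degree_updates([0b001, 0b100, 0b001])))
```

In a real multi-GPU setting the OR-combination would be a collective operation across ranks, while updates for the low-degree set would travel as point-to-point messages to the owning rank.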
ISBN: 0818675829 (print)
Dedicated Cluster Parallel Computers (DCPCs) are emerging as low-cost, high-performance environments for many important applications in science and engineering. A significant class of applications that perform well on a DCPC are coarse-grain applications that involve large amounts of file I/O. Current research in parallel file systems for distributed systems is providing a mechanism for adapting these applications to the DCPC environment. We present the Parallel Virtual File System (PVFS), a system that provides disk striping across multiple nodes in a distributed parallel computer and file partitioning among tasks in a parallel program. PVFS is unique among similar systems in that it uses a streams-based approach that represents each file access with a single set of request parameters and decouples the number of network messages from the details of the file's striping and partitioning. PVFS also provides support for efficient collective file accesses and allows overlapping file partitions. We present results of early performance experiments that show PVFS achieves excellent speedups in accessing moderately sized file segments.
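As a rough illustration of disk striping across nodes, the sketch below maps a logical file offset to a node and a node-local offset under plain round-robin striping. This is a generic scheme assumed for illustration, not PVFS's actual layout or request format; `locate`, `stripe_size`, and `num_nodes` are hypothetical names.

```python
def locate(offset, stripe_size, num_nodes):
    """Map a logical byte offset to (node index, offset within that node's file).

    Assumes round-robin striping: stripe 0 on node 0, stripe 1 on node 1, ...
    """
    stripe_index = offset // stripe_size
    node = stripe_index % num_nodes
    # Each full round of num_nodes stripes adds one stripe to every node.
    local_offset = (stripe_index // num_nodes) * stripe_size + offset % stripe_size
    return node, local_offset


for logical in (0, 64, 300):
    print(logical, locate(logical, stripe_size=64, num_nodes=4))
```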
ISBN: 9781424443475 (print)
This paper is concerned with distributed receding horizon prediction for continuous-time linear stochastic systems with multiple sensors. A distributed fusion with the weighted sum structure is applied to the optimal local receding horizon predictors. The distributed prediction algorithm represents the optimal linear fusion by weighting matrices under the minimum mean square criterion. The algorithm has a parallel structure and allows parallel processing of observations, which makes it reliable: if some sensors become faulty, the remaining fault-free sensors can continue the fusion estimation. The derivation of equations for the error cross-covariances between the local predictors is the key contribution of this paper. An example demonstrates the effectiveness of the distributed receding horizon predictor.
ISBN: 9781424425105 (print)
In view of the problems of centralized search engines, and based on an analysis of topic map, OAI, mobile agent, and P2P technologies, we design and implement a Scalable Distributed Retrieval Model based on Topic Map and Mobile Agent. The model can effectively improve resource utilization, increase search depth, overcome server failure and network heterogeneity, and has a strong capacity for expansion, fault tolerance, and parallel processing.
ISBN: 0818684038 (print)
Results on how to place a limited number of resources in two-dimensional torus-based parallel systems are described. The resources are placed so that every non-resource node is within a given distance d from some resource node. It is proved that the proposed methods are optimal in terms of reducing the maximum distance between the resource and the non-resource nodes. Simulation results show that the proposed methods are superior to the existing methods in terms of the average message latency.
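The distance-d coverage condition can be checked mechanically. The sketch below measures hop distance on a 2D torus with wraparound and verifies whether a placement covers every node within d hops. The example placement (nodes where `(r + 2*c) % 5 == 0` on a 5×5 torus) is a textbook-style construction chosen for illustration and is not claimed to be the paper's method; all names are hypothetical.

```python
def torus_distance(a, b, rows, cols):
    """Hop distance between nodes a and b on a rows x cols torus (wraparound links)."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    return min(dr, rows - dr) + min(dc, cols - dc)


def covers(resources, rows, cols, d):
    """True if every node is within distance d of at least one resource node."""
    return all(
        min(torus_distance((r, c), res, rows, cols) for res in resources) <= d
        for r in range(rows)
        for c in range(cols)
    )


rows = cols = 5
# Illustrative placement: one resource per row and per column.
placement = [(r, c) for r in range(rows) for c in range(cols) if (r + 2 * c) % 5 == 0]
print(placement, covers(placement, rows, cols, d=1))
```

With d = 1 each resource can serve itself plus four neighbors, so five resources are the minimum for 25 nodes, and this placement meets that bound.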
ISBN: 0818684038 (print)
Advances in optical technology have increased interest in multiprocessor architectures based on lightwave networks because of the vast bandwidth available. In this paper we propose a passive star multi-hop lightwave network called stack-Kautz, based on the Kautz graph. We show that this architecture is very cost-effective with respect to its resource requirements. We also propose control protocols for accessing the optical passive star couplers, which improve on the bit complexity of the control sequence proposed in the literature for the Partitioned Optical Passive Star network. Finally, we show through simulation that these control protocols efficiently implement shortest path routing on the stack-Kautz network.
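Shortest-path routing on a plain Kautz graph can be sketched with the usual overlap rule. The sketch assumes one common convention: vertices are words of fixed length with no two equal consecutive symbols, and each arc drops the first symbol and appends a new one. It does not model the stack-Kautz couplers or the control protocols from the paper, and the function names are hypothetical.

```python
def is_kautz_word(word):
    """A Kautz word has no two equal consecutive symbols."""
    return all(a != b for a, b in zip(word, word[1:]))


def kautz_route(src, dst):
    """Return the vertex sequence from src to dst using the longest overlap.

    The route reuses the longest suffix of src that is also a prefix of dst,
    then shifts in the remaining symbols of dst one hop at a time.
    """
    if len(src) != len(dst) or not (is_kautz_word(src) and is_kautz_word(dst)):
        raise ValueError("src and dst must be Kautz words of equal length")
    k = len(src)
    overlap = next(l for l in range(k, -1, -1) if src[k - l:] == dst[:l])
    path = [src]
    for symbol in dst[overlap:]:
        path.append(path[-1][1:] + symbol)
    return path


print(kautz_route("012", "120"))
print(kautz_route("010", "121"))
```

Every intermediate word stays a valid Kautz word, because a zero-length overlap implies the first symbol of `dst` differs from the last symbol of `src`.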
The paper deals with the problem of parallel external integer sorting in the context of a class of heterogeneous clusters. We explore some techniques inherited from the homogeneous and in-core cases to show how they c...
This paper proposes a distributed, dynamic processor sharing scheme for torus-connected multicomputer systems. It is applicable to database query and on-line transaction processing applications. In such a system, each processor can process small transaction tasks locally and support parallel execution of large transaction tasks in a time-sharing fashion. Distributed management of processors is achieved by our scheme based on the distributed submesh table (DST), which describes how processors are clustered and how many time slices each cluster can provide.
ISBN: 0818684038 (print)
Many applications require the derivatives of functions defined by computer programs. Automatic differentiation (AD) is a means of developing code to compute the derivatives of complicated functions accurately and efficiently, without the difficulties associated with developing correct code by hand. We discuss some of the issues involved in developing automatic differentiation tools for parallel programming environments.
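For readers unfamiliar with automatic differentiation, a minimal forward-mode sketch using dual numbers is shown below. It is a sequential toy that supports only addition and multiplication, intended to show how derivatives propagate alongside values; it is not one of the tools discussed in the paper, and the names `Dual` and `derivative` are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """A value paired with its derivative with respect to the input."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative 0)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def derivative(f, x):
    """Evaluate df/dx at x by seeding the input derivative with 1."""
    return f(Dual(x, 1.0)).deriv


def poly(x):
    return 3 * x * x + 2 * x + 1


print(derivative(poly, 2.0))
```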
ISBN: 0818675829 (print)
TOP-C is a task-oriented parallel C interface. It presents a master-slave task architecture that greatly eases the parallelization of code. It is intended for applications where a compiler would have difficulty recognizing opportunities for data-parallelism. The model has been implemented for both shared memory processors and networks of workstations. There is also a sequential version useful during development, which runs the same application code. Ease of use has been a strong motivation behind its design. For this reason, TOP-C is organized in an SPMD style, with one primary subroutine call to invoke it. Its main features are: (a) task-parallelism, (b) a single shared, global data structure, and (c) restricted master-slave communication.
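The three features listed in this abstract can be pictured with a small pattern sketch. It is written in Python with threads rather than in C, and it does not use or reproduce the TOP-C API; `master_worker`, `do_task`, and `update_shared` are hypothetical names. Workers only compute results, and only the master touches the shared data structure.

```python
from concurrent.futures import ThreadPoolExecutor


def master_worker(task_inputs, do_task, update_shared, shared, workers=4):
    """Run do_task on each input in parallel; the master alone updates `shared`."""
    task_inputs = list(task_inputs)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so inputs and results pair up.
        for task_input, result in zip(task_inputs, pool.map(do_task, task_inputs)):
            update_shared(shared, task_input, result)
    return shared


def square(n):
    return n * n


def record(shared, task_input, result):
    shared[task_input] = result


print(master_worker(range(5), square, record, {}))
```

Setting `workers=1` gives a sequential run of the same application code, loosely mirroring the sequential version mentioned in the abstract.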