ISBN: 0818678704 (print)
As the resolution of simulation models increases, scientific visualization algorithms that take advantage of the large memory and parallelism of Massively Parallel Processors (MPPs) are becoming increasingly important. For large applications, rendering on the MPP tends to be preferable to rendering on a graphics workstation because of the MPP's abundant resources: memory, disk, and numerous processors. The challenge becomes developing algorithms that can exploit these resources while minimizing overhead, typically communication costs. This paper describes recent efforts in parallel rendering for polygonal primitives as well as parallel volumetric techniques. It presents rendering algorithms, developed for MPPs, for polygonal, spherical, and volumetric data. The polygon algorithm uses a data-parallel approach, whereas the sphere and volume renderers use a MIMD approach. Implementations of these algorithms are presented for the Thinking Machines Corporation CM-5 and the Cray Research Inc. T3D.
The parentheses-matching problem is of crucial importance in constructing an expression tree in order to evaluate, for instance, arithmetic expressions. A new parallel algorithm is introduced in this paper to solve the parentheses-matching problem optimally (in O(log_2 n) parallel time with O(n/log_2 n) processors) on an EREW-PRAM model. An algorithm for an input string of n parentheses with a maximum nesting level of log n is also presented.
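For orientation, the problem itself can be stated with a short sequential sketch. This is not the paper's EREW-PRAM algorithm; it is the textbook stack-based matcher, plus the prefix-sum nesting depth that parallel formulations commonly start from. The function names `match_parentheses` and `nesting_depths` are illustrative assumptions.

```python
from itertools import accumulate


def match_parentheses(s):
    """Return a list where entry i is the index of the partner of s[i].

    Sequential O(n) reference, assumed here only to illustrate the problem;
    the abstract's algorithm is parallel and is not reproduced.
    """
    match = [-1] * len(s)
    stack = []  # indices of '(' that are still open
    for i, ch in enumerate(s):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at index {i}")
            j = stack.pop()
            match[i], match[j] = j, i
        else:
            raise ValueError(f"unexpected character {ch!r} at index {i}")
    if stack:
        raise ValueError(f"unmatched '(' at index {stack[-1]}")
    return match


def nesting_depths(s):
    """Nesting depth of every parenthesis, computed from a prefix sum.

    '(' counts +1 and ')' counts -1; matching parentheses share a depth.
    """
    sums = accumulate(1 if ch == "(" else -1 for ch in s)
    # A ')' closes the level that was open just before it, hence the +1.
    return [d if ch == "(" else d + 1 for ch, d in zip(s, sums)]


if __name__ == "__main__":
    print(match_parentheses("(()())"))
    print(nesting_depths("(()())"))
```

The prefix sum is the part that parallelizes naturally, which is one reason depth-based formulations appear in PRAM treatments of this problem.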
ISBN: 0818678704 (print)
Much work has been done to implement declarative languages in parallel form. Most implementations tend to resort to imperative features for some purposes, particularly for describing the parallelism. We propose parallel computation on associative networks, a machine-independent parallel programming model, for automatic extraction of the available inherent parallelism and optimization of declarative programs. Associative networks are used to represent both program-like and data-like information. The computation follows the transformation style of information processing. All computational mechanisms are oriented toward processing incomplete information and perform parallel partial evaluation. This partial evaluation is the basis of the proposed technique for automatically transforming, optimizing, and parallelizing declarative programs.
This paper develops a genetic algorithm for communication network design that minimizes total link cost subject to constraints such as diameter and two-connectivity. Two parallel genetic algorithms, one at the level of partitioning requirements and one at the level of dividing the population, are proposed and implemented on a transputer-based parallel network with various virtual network topologies. The ring-ring topology gives the best performance for the parallel genetic algorithm at the level of partitioning requirements, and the torus topology is the most suitable for the parallel genetic algorithm at the level of dividing the population.
In this paper, a parallel and fault-tolerant LAN (P_FTLAN) with dual communication subnetworks is presented to improve the reliability of LANs. Its function modes, technical characteristics, hardware and software architectures, and some key implementation techniques, such as logical addresses and parallel mechanisms, are described in detail. Our prototype system and analysis results suggest that the scheme presented in the paper not only provides an effective approach to highly reliable LANs, but can also improve their performance greatly.
In this paper, we give efficient parallel and distributed algorithms for the topological sort problem on acyclic graphs with n vertices. Our parallel algorithm solves the problem on a CREW PRAM in O(log^2 n) time with O(M(n)/log n) processors, where M(n) denotes the number of processors needed to multiply two n × n integer matrices over the integer ring. The best known upper bound of M(n) is O(n^2.376). The parallel algorithm can also solve the problem on processor arrays with reconfigurable bus systems in O(1) time and O(n^3) processors. Our distributed algorithm solves the topological sort problem on an arbitrary asynchronous network with communication complexity O(n^2).
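The link between topological sorting and matrix multiplication can be illustrated with a small sequential sketch. It is an assumption about the general idea, not the paper's algorithm: the reflexive transitive closure is obtained by repeated Boolean squaring of the adjacency matrix, and vertices are then ordered by how many vertices can reach them. In an acyclic graph, an edge u → v implies that v has strictly more such predecessors than u, so the ordering is a valid topological order. The names `transitive_closure` and `topological_order` are illustrative.

```python
def transitive_closure(n, edges):
    """Reflexive transitive closure by repeated Boolean matrix squaring."""
    reach = [[i == j for j in range(n)] for i in range(n)]
    for u, v in edges:
        reach[u][v] = True
    covered = 1  # reach currently captures paths of length <= covered
    while covered < n:
        # One Boolean matrix product; this is the step a parallel
        # machine would perform with a fast matrix multiplication.
        reach = [
            [any(reach[i][k] and reach[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        covered *= 2
    return reach


def topological_order(n, edges):
    """Order vertices of a DAG by their number of predecessors in the closure."""
    reach = transitive_closure(n, edges)
    # Column sum: how many vertices (including v itself) can reach v.
    predecessors = [sum(reach[u][v] for u in range(n)) for v in range(n)]
    return sorted(range(n), key=lambda v: (predecessors[v], v))


if __name__ == "__main__":
    print(topological_order(4, [(3, 1), (1, 0), (2, 0)]))
```

The sketch assumes the input really is acyclic; it does not detect cycles.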
Recently, high-performance computer architecture has focused on dynamic scheduling techniques to issue and execute multiple operations concurrently. These designs are complex and have frequently shown disappointing performance. A complementary approach is the use of static scheduling techniques to exploit the same parallelism. In this paper we describe some of the tradeoffs between static and dynamic scheduling techniques and show that, with appropriate scheduling, low-complexity designs using only static scheduling have significant advantages over high-complexity designs using dynamic scheduling in real systems.
ISBN: 0818678704 (print)
A massively parallel computer handles massive amounts of data with simultaneous access requests from multiple processors, and therefore must have a large-capacity secondary storage system of very high concurrency. Such a storage system should consist of many disks connected in parallel. With such large-scale parallel disk systems, access load balancing is extremely important for the effective operation of all disks. In this paper, we propose a parallel file access method named DECODE (Dynamic Express Changing Of Data Entry), which performs dynamic load balancing over all disks according to the load status of each disk. DECODE achieves load balancing by redirecting each write to a disk with low load. The efficiency of this method was verified by a preliminary performance evaluation using software simulation under various access conditions, varying the access pattern and the access size. The effects of some parameters used in this method were also evaluated.
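The core idea of directing writes toward lightly loaded disks can be sketched as follows. This is a minimal illustration under stated assumptions, not the DECODE method itself: load is modeled as pending bytes per disk, and a mapping table records where each block was last written. The class `LeastLoadedWriter` and its methods are hypothetical names.

```python
class LeastLoadedWriter:
    """Toy model of load-aware write placement (hypothetical, not DECODE)."""

    def __init__(self, num_disks):
        self.load = [0] * num_disks  # assumed load metric: pending bytes
        self.location = {}           # block id -> disk holding its latest copy

    def write(self, block_id, size):
        """Send the write to the currently least-loaded disk and record it."""
        disk = min(range(len(self.load)), key=lambda d: self.load[d])
        self.location[block_id] = disk
        self.load[disk] += size
        return disk

    def complete(self, disk, size):
        """Mark `size` bytes as finished on `disk`, lowering its load."""
        self.load[disk] -= size

    def read(self, block_id):
        """Look up which disk holds the latest copy of a block."""
        return self.location[block_id]


if __name__ == "__main__":
    w = LeastLoadedWriter(3)
    for block, size in [("a", 10), ("b", 4), ("c", 1), ("d", 2)]:
        print(block, "->", w.write(block, size))
```

Because the target disk changes with load, reads must go through the mapping table rather than a fixed placement function; that indirection is the price of dynamic balancing in this sketch.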
CP-PACS (Computational Physics by Parallel Array Computer System) is a massively parallel processor with 2048 Processing Units built at the Center for Computational Physics, University of Tsukuba. The node processor of CP-PACS is a RISC microprocessor enhanced by a Pseudo Vector Processing feature, which can realize high-performance vector processing. The interconnection network is a 3-dimensional Hyper-Crossbar Network, which has high flexibility and embeddability for various network topologies and communication patterns. The theoretical peak performance of the whole system is 614.4 GFLOPS. In this paper, we give an overview of the CP-PACS architecture and several of its special architectural characteristics. A performance evaluation on the parallel LINPACK benchmark is also shown.
Inter-module bandwidth is one of the major constraints on the performance of current and future parallel systems. In this paper, we propose and evaluate several high-performance bus-based parallel architectures, including bus-based cyclic networks (BCNs) and quotient cyclic networks (BQCNs), which are particularly efficient in view of their respective inter-module communication patterns. The inter-cluster connection in a BCN is defined on a set of nodes whose addresses are cyclic shifts of one another. The node degree of a basic BCN is 3, while those of BQCNs and enhanced BCNs can vary from a small constant (e.g., 2) to as large as required, thus providing flexibility and an effective tradeoff between cost and performance. A variety of algorithms can be performed efficiently on these networks, demonstrating the versatility of BCNs and BQCNs.
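The phrase "addresses are cyclic shifts of one another" can be made concrete with a small sketch that groups fixed-length addresses into cyclic-shift classes. The address format (tuples of digits in a given radix) and the function names are assumptions for illustration; the abstract does not specify how BCN addresses are encoded or how buses are attached to each class.

```python
from itertools import product


def cyclic_shifts(addr):
    """All rotations of an address given as a tuple of digits."""
    return [addr[i:] + addr[:i] for i in range(len(addr))]


def canonical(addr):
    """Smallest rotation, used as a representative of the shift class."""
    return min(cyclic_shifts(addr))


def shift_classes(radix, length):
    """Group every address of the given radix and length by shift class."""
    classes = {}
    for addr in product(range(radix), repeat=length):
        classes.setdefault(canonical(addr), []).append(addr)
    return classes


if __name__ == "__main__":
    for rep, members in shift_classes(2, 3).items():
        print(rep, members)
```

In this illustration, each class is the set of nodes that one inter-cluster connection would join; class sizes vary because addresses such as all-zero are their own only rotation.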