作者:
Chang, JMIIT
Comp Engn Program Chicago IL 60616 USA
This paper presents the design and evaluation of a novel submesh allocation algorithm called the Optimized Buddy System which is derived from conventional binary buddy system. Our scheme is made feasible by using a bi...
详细信息
ISBN:
(纸本)0818682596
This paper presents the design and evaluation of a novel submesh allocation algorithm called the Optimized Buddy System which is derived from conventional binary buddy system. Our scheme is made feasible by using a bit-map to represent the status of the processors. The proposed scheme can be mapped into hardware directly to take the advantage of the speed of a pure combinational-logic implementation. Although the buddy system may allocate a submesh that is much larger than the requested size, the logic that finds a free submesh can be augmented by a "bit-flipper" to relinquish the unused portion at the end of the submesh. This effectively eliminates internal fragmentation. The hardware complexity of proposed scheme (i.e. and-gate tree, or-gate tree and bit-flipper) is O(m x n), where m x n is the total number of the processors. The unique blind spot issue of proposed system is also investigated. Simulation results show that the allocation efficiency of the proposed scheme is similar to the one that uses best-fit approach.
This paper is concerned with the transparent parallelisation of declarative database queries, based on theoretical principles. We have designed an entire database architecture suitable for use on any general-purpose p...
详细信息
ISBN:
(纸本)0818682596
This paper is concerned with the transparent parallelisation of declarative database queries, based on theoretical principles. We have designed an entire database architecture suitable for use on any general-purpose parallel machine. This architecture addresses the shortcomings in flexibility and scalability of commercial parallel databases. A substantial benefit is that the mathematical principles underlying our framework allow provably correct parallel evaluations and optimisations, using compile-time transformations. We address parallelism in a language-independent way through the choice of monoids as a formulation for expressing and modelling queries. Queries expressed in our declarative language are transformed into applications of a higher-order function, the monoid homomorphism. The evaluation of this function is partitioned at run-time, giving a tree-like processor topology, the depth and breadth of which can be varied with a declarative execution plan. Leaf nodes within the evaluation tree operate on their own data partitions and forward results to the apropriate interior nodes. Due to the nature of our language, the functions that are necessary to combine results from independent parallel evaluations are generated automatically at compile-time from a monoid definition dictionary, additions to which can be made to extend the system's data types. We have built a complete prototype of our system, which uses Swiss Radio Corporation's entire recorded music catalogue, on a general-purpose AP1000, 128-cell parallel computer at the IFPC.(1)
The proceedings contains 24 papers from the Second internationalsymposium on parallel Symbolic Computation PASCO '97 held in Maui, Hawaii from July 20-22, 1997. Topics discussed include: linear Diophantine equati...
详细信息
The proceedings contains 24 papers from the Second internationalsymposium on parallel Symbolic Computation PASCO '97 held in Maui, Hawaii from July 20-22, 1997. Topics discussed include: linear Diophantine equations;rectangular matrix multiplications;matrix inversion;parallel Cost-based Abductive Reasoning (PARCAR) system;object graph rewriting;Block-Wiedemann algorithm;parallel logic programming systems;Andorra model;functional programming;parallel algorithms;logic programming;polynomial factorization;multipoint polynomial evaluation;and fine-grain implicit parallelism.
This paper presents a strategy for efficiently rendering time-varying volume dates on a distributed-memory parallel computer. Visualizing time-varying volume data take both large storage space and long computation tim...
详细信息
ISBN:
(纸本)0818682596
This paper presents a strategy for efficiently rendering time-varying volume dates on a distributed-memory parallel computer. Visualizing time-varying volume data take both large storage space and long computation time. Instead of employing all processors to render one volume at a time, a pipelined rendering approach partitions processors into groups so that multiple volumes can be rendered concurrently. The overall rendering time is greatly minimized because rendering is overlapped with I/O required to load the volume data sets. Moreover, parallelization overhead may be reduced as a result of partitioning the processors. We modify can existing parallel volume renderer to exploit various levels of rendering parallelism and to study how the partitioning of processors may lead to optimal rendering performance. We find that two factors affecting the overall execution time are resource utilization efficiency anal pipeline startup latency. The optimal partitioning configuration is the one that balances these two factors. Tests on Intel Paragon computers show that in general optimal partitionings do exist for a given rendering task and result in 40-50% saving in overall rendering time.
In this paper, a model is presented for representing and comparing workloads, based on the way they would exercise parallel machines. This workload characterization is derived from parallel instruction centroid and pa...
详细信息
In this paper, a model is presented for representing and comparing workloads, based on the way they would exercise parallel machines. This workload characterization is derived from parallel instruction centroid and parallel workload similarity. The centroid is a simple measure that aggregates average parallelism, instruction mix, and critical path length. When captured with abstracted information about communication requirements, the result is a powerful tool in understanding the requirements of workloads and their potential performance on target machines. The workload similarity is based on measuring the normalized Euclidean distance (ned) between workload centroids. It will be shown that this workload representation method outperforms comparable ones in accuracy, as well as in time and space requirements. Analysis of the NAS parallel Benchmarks and their performance is presented to demonstrate some of the applications, such as performance prediction with good accuracy, and insight provided by this model.
Recent advances in network technology and the higher levels of circuit integration due to VLSI have led to widespread interest in the use of multiprocessor systems in solving many practical problems. As the hardware c...
详细信息
Recent advances in network technology and the higher levels of circuit integration due to VLSI have led to widespread interest in the use of multiprocessor systems in solving many practical problems. As the hardware continues to diminish in size and cost, new possibilities are being created for systems that are heterogeneous by design. parallel multiprocessor architectures are now feasible and provide a valid solution to the throughput rates demands of the increasing sophistication of control and/or instrumentation systems. Increasing the number of processors and the complexity of the problems to be solved makes programming multiprocessor systems more difficult and error-prone. This paper describes some parts already implemented (mainly the scheduler) of a software development tool for heterogeneous multiprocessor system that will perform automatically: code generation, execution time estimation, scheduling and handles the communication primitive insertion.
In this paper, we present a new algorithm to solve the integral knapsack problem on the hypercube. The main idea is to use the fact that the precedence graph of the dynamic programming function of the knapsack problem...
详细信息
In this paper, we present a new algorithm to solve the integral knapsack problem on the hypercube. The main idea is to use the fact that the precedence graph of the dynamic programming function of the knapsack problem is an irregular mesh. We propose a scheduling algorithm for irregular meshes on the hypercube. The efficiency of the algorithm is independent on the number of processors. We also present some improvements for the solution of the O/I knapsack problem on the hypercube.
We present two algorithms to minimize the amount of synchronization added when parallelizing a loop with loop-carried dependences. In contrast to existing schemes, our algorithms add lesser synchronization, while pres...
详细信息
We present two algorithms to minimize the amount of synchronization added when parallelizing a loop with loop-carried dependences. In contrast to existing schemes, our algorithms add lesser synchronization, while preserving the parallelism that can be extracted from the loop. Our first algorithm uses an interval graph representation of the dependence `overlap' to find a synchronization placement in time almost linear in the number of dependences. Although this solution may be suboptimal, it is still better than that obtained using existing methods, which first eliminate redundant dependences and then synchronize the remaining ones. Determining the optimal synchronization is an NP-complete problem. Our second algorithm therefore uses integer programming to determine the optimal solution. We first use a polynomial-time algorithm to find a minimal search space that must contain the optimal solution. Then, we formulate the problem of choosing the minimal synchronization from the search space as a set-cover problem, and solve it exactly using 0-1 integer programming. We show the performance impact of our algorithms by synchronizing a set of synthetic loops on an 8-processor Convex Exemplar. The greedily synchronized loops ran between 7% and 22% faster than those synchronized by the best existing algorithm. Relative to the same base, the optimally synchronized loops ran between 10% and 22% faster.
In this paper a parallel and fault-tolerant LAN (P_FTLAN) with dual communication subnetworks is presented to improve LANs' reliability. Its function modes, technical characters, hardware and software architecture...
详细信息
ISBN:
(纸本)0818678704
In this paper a parallel and fault-tolerant LAN (P_FTLAN) with dual communication subnetworks is presented to improve LANs' reliability. Its function modes, technical characters, hardware and software architectures and some key, implementation techniques, such as logical addresses and parallel mechanisms, are described in details. Our prototype system and analyzing results suggest that the scheme presented in the paper not only provides an effective approach to high reliable LANs, but also can improve their performance greatly.
The article presents TOPAS, a programming environment for visualization, animation and investigation of algorithms for mapping graphs of parallel programs into graphs of parallel computing systems implemented in Java ...
详细信息
The article presents TOPAS, a programming environment for visualization, animation and investigation of algorithms for mapping graphs of parallel programs into graphs of parallel computing systems implemented in Java and accessible on the WWW.
暂无评论