For instruction-level parallel machines, it is essential to extract instructions that can execute in parallel from a program by code scheduling. In this paper, we propose a new code scheduling technique using an extension of the program dependence graph (PDG). This technique parallelizes non-numerical programs, producing better machine code than that created by percolation scheduling.
A new loop scheduling scheme called multithreaded self-scheduling (MSS) for distributed shared-memory multiprocessors is proposed. Based on the principles of multithreading, MSS attempts to hide remote memory access latencies by switching between multiple thread contexts. Consequently, loops scheduled using MSS can obtain better performance than with single-threaded approaches. In this paper, a series of simulation results corresponding to various parameter changes is presented, which provides a measure of the effectiveness of MSS under different boundary conditions and suggests ways for further improvement.
We present a parallel method for finding the convex hull of a set of discs in the CREW PRAM model. We show that the convex hull of $n$ discs can be computed in $O(\log^{1+\epsilon} n)$ time using $O(n/\log^{\epsilon} n)$ processors, where $\epsilon$ is any positive constant. We also show that it can be constructed in $O(\log n \log\log n)$ time using $O(n \log n)$ processors. The first result is cost-optimal, and the second runs faster. The main technique used in the algorithm is a complex divide-and-conquer scheme.
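The cost comparison between the two results can be checked by multiplying the processor count by the running time. The derivation below is a worked sketch; it assumes the standard $\Omega(n \log n)$ sequential lower bound for convex hull construction, which the abstract does not state explicitly.

```latex
\begin{align*}
\text{cost}_1 &= O\!\left(\frac{n}{\log^{\epsilon} n}\right) \cdot O\!\left(\log^{1+\epsilon} n\right)
               = O(n \log n), \\
\text{cost}_2 &= O(n \log n) \cdot O(\log n \,\log\log n)
               = O\!\left(n \log^{2} n \,\log\log n\right).
\end{align*}
```

Under that assumption, the first bound matches the sequential lower bound, while the second spends more total work in exchange for a shorter running time.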
This paper presents a new data dependence checking technique called the variable tracking technique (VTT). It is a single-pass data dependence checking method which locates dependent statements in a serial computer program. VTT produces a schedule which lists the operations in the source code in groups; the operations within a group can be executed concurrently. The user is not required to provide a profile of the program to the compiler, hence VTT is suitable for applications which automate the process of exploiting parallelism. Here we describe the use of this technique in gacc, a parallelising compiler which compiles C functions to field programmable gate array (FPGA) circuits. The results presented in this paper show that VTT has been instrumental in gaining improved performance from a parallelising compiler which automates the process of executing the computationally intensive portion of the program in hardware.
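The idea of a single pass that tracks variables and emits groups of concurrently executable operations can be illustrated with a small sketch. This is not the authors' VTT algorithm, whose details the abstract does not give; it is a generic level-assignment scheme, and the statement format, `group_statements`, and `PROGRAM` are hypothetical names introduced for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical statement format: (label, variables written, variables read).
Statement = Tuple[str, Set[str], Set[str]]


def group_statements(stmts: List[Statement]) -> List[List[str]]:
    """Assign each statement to the earliest group that respects
    read-after-write, write-after-write and write-after-read ordering."""
    last_write: Dict[str, int] = {}  # variable -> group of its latest write
    last_read: Dict[str, int] = {}   # variable -> latest group that read it
    groups: List[List[str]] = []

    for label, writes, reads in stmts:
        level = 0
        # A read must come after the latest write of the same variable.
        for var in reads:
            level = max(level, last_write.get(var, -1) + 1)
        # A write must come after earlier writes and earlier reads.
        for var in writes:
            level = max(level,
                        last_write.get(var, -1) + 1,
                        last_read.get(var, -1) + 1)

        while len(groups) <= level:
            groups.append([])
        groups[level].append(label)

        for var in reads:
            last_read[var] = max(last_read.get(var, -1), level)
        for var in writes:
            last_write[var] = level

    return groups


# Illustrative program: s1: a=1; s2: b=2; s3: c=a+b; s4: d=a*2; s5: a=c+d
PROGRAM: List[Statement] = [
    ("s1", {"a"}, set()),
    ("s2", {"b"}, set()),
    ("s3", {"c"}, {"a", "b"}),
    ("s4", {"d"}, {"a"}),
    ("s5", {"a"}, {"c", "d"}),
]

if __name__ == "__main__":
    for index, group in enumerate(group_statements(PROGRAM)):
        print(index, group)
```

Each statement is visited once, and only two small tables keyed by variable name are kept, which is what makes the pass single-sweep. Statements placed in the same group have no dependences on one another.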
ISBN: 0780320182 (print)
To achieve good performance, the signature file approach must support parallel database processing. Therefore, in this paper we propose a horizontally-divided parallel signature file method (HPSF) using extendible hashing and frame-slicing techniques. In addition, we propose a heuristic processor allocation method that assigns signatures uniformly to a given number of processors. To show the efficiency of HPSF, we evaluate its performance in terms of retrieval time, storage overhead, and insertion time. Finally, we show from the performance results that HPSF outperforms the conventional parallel signature file methods in both retrieval performance and insertion time.
Parallel software design techniques based on client-server process models have been proposed to support the development of deadlock-free systems. Deadlock freedom can be guaranteed where no client-server cycles occur in process graphs. Hierarchical composition rules are presented which allow the designer more freedom, including the use of cycles at a higher level. The incorporation of these design rules into a software development methodology, PARSE, is described. When PARSE is used in this manner, it provides the parallel software engineer with a powerful software development framework and permits direct design verification.
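The basic condition, that no client-server cycle occurs in the process graph, can be checked mechanically. The sketch below is a plain depth-first cycle detection over a hypothetical adjacency-list encoding (each client maps to the servers it calls). It does not model the hierarchical composition rules or the PARSE methodology, and the function and process names are invented for illustration.

```python
from typing import Dict, List

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished


def has_client_server_cycle(graph: Dict[str, List[str]]) -> bool:
    """Return True if the directed client -> server graph contains a cycle."""
    colour: Dict[str, int] = {}

    def visit(process: str) -> bool:
        colour[process] = GREY
        for server in graph.get(process, []):
            state = colour.get(server, WHITE)
            if state == GREY:
                return True  # back edge: a chain of requests loops on itself
            if state == WHITE and visit(server):
                return True
        colour[process] = BLACK
        return False

    return any(colour.get(p, WHITE) == WHITE and visit(p) for p in graph)


# Hypothetical process graphs.
ACYCLIC = {"ui": ["db", "log"], "db": ["log"], "log": []}
CYCLIC = {"a": ["b"], "b": ["c"], "c": ["a"]}

if __name__ == "__main__":
    print(has_client_server_cycle(ACYCLIC))
    print(has_client_server_cycle(CYCLIC))
```

A design whose graph passes this check satisfies the acyclicity condition stated above; a design that fails it would need the higher-level composition rules to justify the cycle.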
This paper describes a floorplan design approach that combines a heuristic graph bipartitioning procedure with a slicing tree representation in the physical design of VLSI systems. The description of the circuit to be floorplanned contains a set of functional modules, each having a number of possible dimensions, and a net-list containing the connectivity information. The slicing tree representation supports efficient tree traversal operations using recursion for obtaining area-efficient floorplans. The slicing paradigm also eliminates cyclical conflicts in module placement and hence ensures better routability.
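A recursive traversal of a slicing tree can be sketched as follows. This is a generic illustration under stated assumptions, not the paper's procedure: the `Node` class, the "V"/"H" cut labels, and the example modules are hypothetical, and the bipartitioning step that would build the tree is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Shape = Tuple[int, int]  # (width, height)


@dataclass
class Node:
    cut: Optional[str] = None             # "V", "H", or None for a leaf module
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    shapes: Optional[List[Shape]] = None  # candidate dimensions of a leaf


def prune(shapes: List[Shape]) -> List[Shape]:
    """Keep only shapes not dominated in both width and height."""
    kept: List[Shape] = []
    for width, height in sorted(set(shapes)):
        if not kept or height < kept[-1][1]:
            kept.append((width, height))
    return kept


def feasible_shapes(node: Node) -> List[Shape]:
    """Recursively combine child shapes according to the cut direction."""
    if node.cut is None:
        return prune(node.shapes or [])
    left_shapes = feasible_shapes(node.left)
    right_shapes = feasible_shapes(node.right)
    combos: List[Shape] = []
    for w1, h1 in left_shapes:
        for w2, h2 in right_shapes:
            if node.cut == "V":   # children placed side by side
                combos.append((w1 + w2, max(h1, h2)))
            else:                 # "H": children stacked vertically
                combos.append((max(w1, w2), h1 + h2))
    return prune(combos)


def best_shape(root: Node) -> Shape:
    """Pick the feasible bounding box with the smallest area."""
    return min(feasible_shapes(root), key=lambda s: s[0] * s[1])


# Hypothetical modules, each with two possible dimensions except B.
A = Node(shapes=[(2, 4), (4, 2)])
B = Node(shapes=[(2, 2)])
C = Node(shapes=[(3, 1), (1, 3)])
ROOT = Node(cut="H", left=Node(cut="V", left=A, right=B), right=C)

if __name__ == "__main__":
    print(feasible_shapes(ROOT))
    print(best_shape(ROOT))
```

Because every internal node is a single horizontal or vertical cut, each subtree can be evaluated independently of its siblings, which is what makes the recursion straightforward.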
ISBN: 0780320182 (print)
The performance and cost-performance benefits of parallel systems make them attractive platforms for many applications. Unfortunately, these benefits are offset by the difficulty of programming parallel computers. Programming tools are therefore key to achieving greater success in developing applications for parallel architectures. This paper describes a new tool, VPEcons, for parallel program development. It uses graphics to assist in the design of parallel programs. To facilitate the portability of the constructor, a VPEcons Builder has also been developed. It is a tool for creating basic component blocks and binding an existing language to the blocks created. The usefulness of the constructor is demonstrated with a parallel discrete-event simulation example and by comparing it with other visual parallel programming tools.
This paper presents a parallel computation model for time-periodic nonlinear electromagnetic field analysis in the frequency domain using the harmonic balance finite element method (HBFEM). Unlike the traditional HBFEM technique, which requires large memory and long CPU time, the proposed model divides the global system matrix into a number of matrices in the frequency domain. Each computation unit has exactly the same number of elements and unknown values. The work involved in calculating the element matrices is equal, so the load can be well balanced and the maximum speed-up is M times if M processors are available (M is the number of harmonics considered in the electromagnetic field). The model is well suited to MIMD parallel computers or multiple computers connected by a local area network.
Orthogonal fat-trees are a type of interconnection network with several desirable characteristics: short distance between processors, constant degree of the switching elements, uniform traffic load, symmetry, and recursive scalability. We first show how to build two-level orthogonal fat-trees, where each node has a fixed degree and there is a maximum distance of two between any two leaves. We then show how to provide fault tolerance by including redundant paths at the cost of reducing the number of leaves. Finally, we show how to construct large orthogonal fat-trees from two-level fat-trees recursively.