Scheduling parallel programs in multiprocessing systems is considered. The goal is to improve the job turnaround time and speedup. As most programs are not fully parallelizable, parallel programs consisting of serial,...
详细信息
ISBN:
(纸本)0818621583
Scheduling parallel programs in multiprocessing systems is considered. The goal is to improve the job turnaround time and speedup. As most programs are not fully parallelizable, parallel programs consisting of serial, partially parallel, and (perfectly) parallel stages, are modeled by task graphs which depict the parallelism profiles. In another perspective, the study is intended to be a queuing counterpart of Amdahl's law. Various two-stage cases are investigated with FCFS or priority scheduling policies, and the optimal strategy is determined. Results indicate that, with nonhomogeneous parallelism profiles, it would be a better processor allocation policy to give a preferential treatment to serial stages. The study might hopefully improve the design of parallel operating system as well as parallel algorithms, databases, and parallelizing compilers.
Loop-iteration level parallelism is one of the most common forms of parallelism being exploited by optimizing compilers and parallel machines. In the present study, the authors selected six large application programs ...
详细信息
ISBN:
(纸本)0818621583
Loop-iteration level parallelism is one of the most common forms of parallelism being exploited by optimizing compilers and parallel machines. In the present study, the authors selected six large application programs and used an execution-driven simulation technique from MaxPar to identify and measure the effectiveness of concurrent DOACROSS loop execution. It was found that executing DOACROSS loops serially can significantly degrade the performance for some of the programs. The authors also measured and studied the characteristics of those cross-iteration dependences in DOACROSS loops and measured the capability of a state-of-the-art parallellizing compiler, KAP, in identifying and eliminating cross-iteration dependences.
An approach to reversibility at the machine language level is discussed. The focus is upon instruction set design and associated machine language programming techniques. Sample programs run on a simulator developed as...
详细信息
ISBN:
(纸本)0780300335
An approach to reversibility at the machine language level is discussed. The focus is upon instruction set design and associated machine language programming techniques. Sample programs run on a simulator developed as proof of concept for a reversible system demonstrates the viability of reversible programs and possible application areas.
The author presents a novel protocol for run-time detection of data races in executions of shared-memory programs with nested fork-join parallelism and no other inter-thread synchronization. This protocol has signific...
详细信息
ISBN:
(纸本)0818621583
The author presents a novel protocol for run-time detection of data races in executions of shared-memory programs with nested fork-join parallelism and no other inter-thread synchronization. This protocol has significantly smaller worst-case run-time overhead than previous techniques. The worst-case space required by this protocol when monitoring an execution of a program P is O(V N), where V is the number of shared variables in P, and N is the maximum dynamic nesting of parallel constructs in P's execution. The worst-case time required to perform any monitoring operation is O(N). It is formally proved that the proposed protocol always reports a non-empty subset of the data races in a monitored program execution, and how this property leads to an effective debugging strategy is described.
This paper presents a comparison between two different approaches for parallel computation of structural optimization problems. The first approach decomposes the original structure into several substructures and assig...
详细信息
ISBN:
(纸本)0791807487
This paper presents a comparison between two different approaches for parallel computation of structural optimization problems. The first approach decomposes the original structure into several substructures and assigns each substructure to one processor. Each processor handles the finite element calculation of its assigned substructure independently with limited communication between processors. The second approach decomposes the computation of constraint's gradient evaluation. When a constraint's gradient evaluation is needed the main processor decomposes it into several computation tasks, then assigns the computation tasks to the other available processors. To compare the speedup of the two approaches some numerical examples on the CRAY X-MP multi-processor system are presented.
The authors evaluate a class of sorting algorithms which can be adapted to a faulty network with nearest neighbor interconnections by determining a suitable indexing scheme. A worst case sorting time of O(N) is proved...
详细信息
ISBN:
(纸本)0818690895
The authors evaluate a class of sorting algorithms which can be adapted to a faulty network with nearest neighbor interconnections by determining a suitable indexing scheme. A worst case sorting time of O(N) is proved for these sorters. Simulation results show that the average sorting time of the fault-tolerant sorters is only slightly higher than O(√N), and therefore is comparable to that of non-fault-tolerant sorting algorithms. This algorithmic approach does not require additional wiring for reconfiguration, and hence the amount of additional circuitry required for fault-tolerance is very small. An efficient procedure for calculating an indexing scheme is presented, and simulation results are shown. Furthermore, an efficient strategy for testing the network is proposed.
We present the results of a large empirical study of the running time of Shellsort. A previous study reported by Knuth for several increment sequences gave running times between THETA(N1.25) and THETA(N1.28) or THETA(...
详细信息
We present the results of a large empirical study of the running time of Shellsort. A previous study reported by Knuth for several increment sequences gave running times between THETA(N1.25) and THETA(N1.28) or THETA(Nlog2/N), but these forms are not very accurate especially for large permutations. Our results give a running time of THETA(N5/4) for all of the increment sequences suggested by Knuth and THETA(N7/6) for an increment sequence suggested by Sedgewich. Our fits are accurate to within 1 % for 250 less-than-or-equal-to N < 100 000 and about 0.1 % for larger N. Much of the error in the fits appears to be related to the error in the measured data. These results are significant because they suggest that O(log N) increment sequences with THETA(N(k))worst-case running times have THETA(N(k+1)/2) average-case running times and that there is no increment sequence for which Shellsort is O(N log N), even on average.
This paper proposes a scheme for the automatic detection of parallelism among coarse grain tasks (macrotasks) in a Fortran program. In this scheme parallelism is represented as the EXECUTION START CONDITION for each m...
详细信息
This paper proposes a scheme for the automatic detection of parallelism among coarse grain tasks (macrotasks) in a Fortran program. In this scheme parallelism is represented as the EXECUTION START CONDITION for each macrotask in a program. This condition is represented by a logical expression that states when, in relation to other macrotasks, the execution of a certain macrotask can be started. This representation aids in scheduling a set of macrotasks onto processors. To construct such an expression, a novel concept called co-control dependence is introduced. Analyzing co-control dependence together with control dependence and data dependences among macrotasks, the scheme detects and describes successfully the parallelism among all macrotasks in a program.
A sequence α = (al, . . ., an) is k-sorted if and only if for all 1 &le i, j &le n, i i &le aj. A k-sorting algorithm called k-Bubblesort is first designed, since it is a generalization of Bubblesort. Nex...
详细信息
A sequence α = (al, . . ., an) is k-sorted if and only if for all 1 &le i, j &le n, i i &le aj. A k-sorting algorithm called k-Bubblesort is first designed, since it is a generalization of Bubblesort. Next, a k-sorting algorithm called k-Quicksort is designed, as well as an algorithm for transforming a roughly sorted sequence into a less roughly sorted sequence. It is shown that k-Quicksort is time-optimal within a constant factor if k satisfies a certain condition. Parallel implementation of k-Bubblesort and k-Quicksort are also discussed.
Proposed PARCS -tools (PARCS - 'Parallel Asynchronous Recursive Controlled systems') are intended for programming parallel systems with dynamically changed structures and designed according to the PARCS - tech...
详细信息
Proposed PARCS -tools (PARCS - 'Parallel Asynchronous Recursive Controlled systems') are intended for programming parallel systems with dynamically changed structures and designed according to the PARCS - technology of programming. programming system PARCS supports algorithms design and realization for parallel information processing and is based on extensions of basic algorithmic languages (C, Pascal, Fortran, Modula -2).
暂无评论