Code scheduling to exploit instruction level parallelism (ILP) is a critical problem in compiler optimization research in light of the increased use of long-instruction-word machines. Unfortunately optimum scheduling ...
详细信息
ISBN:
(纸本)9780818676413
Code scheduling to exploit instruction level parallelism (ILP) is a critical problem in compiler optimization research in light of the increased use of long-instruction-word machines. Unfortunately optimum scheduling is computationally intractable, and one must resort to carefully crafted heuristics in practice. If the scope of application of a scheduling heuristic is limited to basic blocks, considerable performance loss may be incurred at block boundaries. To overcome this obstacle, basic blocks can be coalesced across branches to form larger regions such as super blocks. In the literature, these regions are typically scheduled using algorithms that are either oblivious to profile information (under the assumption that the process of forming the region has fully utilized the profile information), or use the profile information as an addendum to classical scheduling techniques. We believe that even for the simple case of linear code regions such as super blocks, additional performance improvement can be gained by utilizing the profile information in scheduling as well. We propose a general paradigm for converting any profile-insensitive list scheduler to a profile-sensitive scheduler. Our technique is developed via a theoretical analysis of a simplified abstract model of the general problem of profile-driven scheduling over any acyclic code region, yielding a scoring measure for ranking branch instructions.
Predicated execution offers new approaches to exploiting instruction-level parallelism (ILP), but it also presents new challenges for compiler analysis and optimization. In predicated code, each operation is guarded b...
详细信息
ISBN:
(纸本)9780818676413
Predicated execution offers new approaches to exploiting instruction-level parallelism (ILP), but it also presents new challenges for compiler analysis and optimization. In predicated code, each operation is guarded by a boolean operand whose run-time value determines whether the operation is executed or nullified. While research has shown the utility of predication in enhancing ILP, there has been little discussion of the difficulties surrounding compiler support for predicated execution. Conventional program analysis tools (e.g. data flow analysis) assume that operations execute unconditionally within each basic block and thus make incorrect assumptions about the run-rime behavior of predicated code. These tools can be modified to be correct without requiring predicate analysis, but this yields overly-conservative results in crucial areas such as scheduling and register allocation. To generate high-quality code for machines offering predicated execution, a compiler must incorporate information about relations between predicates into its analysis. We present new techniques for analyzing predicated code. Operations which compute predicates are analyzed to determine relations between predicate values. These relations are captured in a graph-based data structure, which supports efficient manipulation of boolean expression representing facts about predicated code. This approach forms the basis for predicate-sensitive data flow analysis. Conventional data flow algorithms can be systematically upgraded to be predicate sensitive by incorporating information about predicates. Predicate-sensitive data flow analysis yields significantly more accurate results than conventional data flow analysis when applied to predicated code.
An upper bound for the trade-off between space and routing time and space needed to store routing information in the processors and the packets is proven. This result is a consequence of a new strategy for the simulat...
详细信息
An upper bound for the trade-off between space and routing time and space needed to store routing information in the processors and the packets is proven. This result is a consequence of a new strategy for the simulation between arbitrary networks. Finally, several techniques are applied for the simulation of networks to demonstrate space-efficiently routing strategies for vertex-symmetric networks.
We describe a new randomized parallel algorithm for computing the planar convex hull of n points on a p processor-memory pair machine communicating through a network for which O(log p) routing is possible. We show tha...
详细信息
We describe a new randomized parallel algorithm for computing the planar convex hull of n points on a p processor-memory pair machine communicating through a network for which O(log p) routing is possible. We show that the algorithm has optimal O(n log n/p) performance provided that p = o(n/log3n). We report extensive computational experience with the algorithm which supports its theoretical claims.
Emerging applications in multi-media and the Human Genome Project require storage and searching of large databases of strings - a task for which parallelism seems the only hope. In this paper, we consider the parallel...
详细信息
Emerging applications in multi-media and the Human Genome Project require storage and searching of large databases of strings - a task for which parallelism seems the only hope. In this paper, we consider the parallelism in some of the fundamental problems in compressing strings and in matching large dictionaries of patterns against texts. We present the first work-optimal algorithms for these well-studied problems including the classical dictionary matching problem, optimal compression with a static dictionary and the universal data compression with dynamic dictionary of Lempel and Ziv. All our algorithms are randomized and they are of the Las Vegas type. Furthermore, they are fast, working in time logarithmic in the input size. Additionally, our algorithms seem suitable for a distributed implementation.
We implemented and measured several methods to perform BMMC permutations on the MasPar MP-2. Our results indicate that, except for certain types of permutations or very high virtual processor ratios, the best method o...
详细信息
We implemented and measured several methods to perform BMMC permutations on the MasPar MP-2. Our results indicate that, except for certain types of permutations or very high virtual processor ratios, the best method overall is the naive method but with virtual-processor numbers computed in Gray-code order. For some permutations, however, the naive method performs very poorly;the best method in these cases is an adaptation of the block BMMC algorithm for parallel disk systems in which the processor elements are treated as independent devices.
Consider the following NP-hard problems: Given a graph G, find the minimum 2-edge connected and 2-vertex connected subgraphs spanning all vertices of G. The past few years have produced exciting sequential algorithms ...
详细信息
ISBN:
(纸本)9780897917179
Consider the following NP-hard problems: Given a graph G, find the minimum 2-edge connected and 2-vertex connected subgraphs spanning all vertices of G. The past few years have produced exciting sequential algorithms for approximating such minimum subgraphs [6, 7]. The approximation factors are improved from 2 down to 5/4 and 3/2 respectively. Yet the techniques involved are all based on augmenting depth-first-search trees and no similar progress has been carried to the parallel context. This paper presents NC algorithms to achieve approximation factors of 3/2 + Ε and 7/4 + Ε respectively without computing depth-first-search trees.
This paper provides a new approach to labeling the connected components of an n × n image on a scan line array processor (comprised of n processing elements). Variations of this approach yield an algorithm guaran...
详细信息
ISBN:
(纸本)9780897917179
This paper provides a new approach to labeling the connected components of an n × n image on a scan line array processor (comprised of n processing elements). Variations of this approach yield an algorithm guaranteed to complete in o(n lg n) time as well as algorithms likely to approach O(n) time for all or most images. The best previous solutions require using a more complicated architecture or require Ω(n lg n) time. We also show that on a restricted version of the architecture, any algorithm requires Ω(n lg n) time in the worst case.
This paper addresses the question of the future applicability of bus-based multiprocessors, by studying the execution characteristics of a number of data parallel numerical and scientific applications, as the system p...
详细信息
ISBN:
(纸本)9780897917179
This paper addresses the question of the future applicability of bus-based multiprocessors, by studying the execution characteristics of a number of data parallel numerical and scientific applications, as the system parameters of processor speed, bus bandwidth, and cache size are scaled (holding the number of processors fixed). This study focuses on 'asymptotic' results. Specifically, it addresses whether the application eventually become communication bound as the machine and the application are scaled. Essentially, it addresses the question of whether the machine becomes unsuitable as a parallel computing platform for the application. A transient analysis on the rate which the asymptotic results take hold is also performed.
Techniques for quickly executing lengthy computations by the use of molecular parallelism are described. It is demonstrated that molecular computations can be done using short DNA strands by more or less conventional ...
详细信息
ISBN:
(纸本)9780897917179
Techniques for quickly executing lengthy computations by the use of molecular parallelism are described. It is demonstrated that molecular computations can be done using short DNA strands by more or less conventional biotechnology engineering techniques within a small number of laboratory steps. Two abstract models of molecular computation are proposed. The first, the parallel Associative Memory Model, is a very high level model which includes a parallel Associative Matching operation, that appears to improve the power of molecular parallelism beyond the operations previously considered by Lipton (1994). A Recombinant DNA Model is also proposed which is a low level model that allows operations that are abstractions of very well understood recombinant DNA operations and provides a representation, herein called complex, for the relevant structural properties of DNA.
暂无评论