We present two algorithms to minimize the amount of synchronization added when parallelizing a loop with loop-carried dependences. In contrast to existing schemes, our algorithms add less synchronization while preserving the parallelism that can be extracted from the loop. Our first algorithm uses an interval graph representation of the dependence "overlap" to find a synchronization placement in time almost linear in the number of dependences. Although this solution may be suboptimal, it is still better than that obtained using existing methods, which first eliminate redundant dependences and then synchronize the remaining ones. Determining the optimal synchronization is an NP-complete problem. Our second algorithm therefore uses integer programming to determine the optimal solution. We first use a polynomial-time algorithm to find a minimal search space that must contain the optimal solution. We then formulate the problem of choosing the minimal synchronization from the search space as a set-cover problem and solve it exactly using 0-1 integer programming. We show the performance impact of our algorithms by synchronizing a set of synthetic loops on an 8-processor Convex Exemplar. The greedily synchronized loops ran between 7% and 22% faster than those synchronized by the best existing algorithm. Relative to the same base, the optimally synchronized loops ran between 10% and 22% faster.
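The abstract does not spell out the interval-based placement, but its dependence "overlap" idea resembles classic greedy point-stabbing over intervals. The sketch below is only an illustrative reconstruction, assuming each loop-carried dependence is modeled as an interval of statement positions and a synchronization point placed inside that interval covers it; the function name and data layout are hypothetical, and the paper's full problem (which is NP-complete) is richer than this simplified case.

```python
from typing import List, Tuple


def place_synchronizations(dependences: List[Tuple[int, int]]) -> List[int]:
    """Greedy point-stabbing over dependence intervals (illustrative only).

    Each dependence is an interval (source_stmt, sink_stmt); a single
    synchronization point placed inside the interval covers it.  Sorting
    by right endpoint and stabbing greedily gives a minimum stabbing set
    for pure intervals, in O(d log d) time for d dependences.
    """
    points: List[int] = []
    last = None
    for lo, hi in sorted(dependences, key=lambda iv: iv[1]):
        if last is None or lo > last:   # not yet covered by an earlier point
            points.append(hi)           # stab at the right endpoint
            last = hi
    return points


if __name__ == "__main__":
    deps = [(1, 4), (2, 6), (5, 7), (3, 4)]
    print(place_synchronizations(deps))  # -> [4, 7]
```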
Author(s):
Lenke, M. (LRR-TUM)
Lehrstuhl für Rechnertechnik und Rechnerorganisation, Institut für Informatik, Technische Universität München, 80290 München, Germany
Typical applications of the so-called Grand Challenges need massively parallel computer system architectures. Tools like parallel debuggers, performance analysers and visualizers help the code designer to develop efficient parallel algorithms. Such tools merely support the development cycle. But technical and scientific engineers who make use of parallel high-performance computing applications, e.g. numerical simulation algorithms in computational fluid dynamics (CFD), must be supported in their engineering work by another kind of tool. A tool for the application cycle is required, because the old, conventional arrangements for the application cycle rely on strictly sequential procedures. They are a heritage of traditional work on former vector computers. That formative influence is still felt in today's arrangements for the application cycle; it prevents more efficient engineering work and, therefore, must be overcome. New tool conceptions have to be introduced to enable on-line interaction between the technical and scientific engineers and their running parallel simulation. VIPER stands for VIsualization of parallel numerical simulation algorithms for Extended Research and offers physical parameters of the mathematical model and parameters of the numerical method as objects of a graphical user tool interface for online observation and online modification. A special client-server-client process architecture implementation enables technical and scientific engineers sitting at their graphics workstation to interact with their parallel simulation algorithms running on a remote parallel computer system. The VIPER prototype is applied to ParNsflex, a parallel Navier-Stokes solver for real-world aerodynamic problems. A Paragon XP/S was selected as the test parallel computer system. A first evaluation indicates the superiority of the VIPER conception over conventional procedures. Copyright (C) 1996 Published by Elsevier Science L
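As a rough illustration of the online-steering idea behind the client-server-client architecture, the sketch below lets one process observe and modify a running "simulation's" parameters while it executes. Everything here (queue-based transport, parameter names, process layout) is a hypothetical stand-in for illustration, not the VIPER implementation.

```python
import multiprocessing as mp
import time


def simulation(param_q: "mp.Queue", out_q: "mp.Queue") -> None:
    """Long-running stand-in simulation that accepts parameter updates online."""
    params = {"relaxation": 0.5}
    for step in range(5):
        while not param_q.empty():        # apply any pending steering requests
            params.update(param_q.get())
        out_q.put((step, dict(params)))   # expose current state for observation
        time.sleep(0.1)


if __name__ == "__main__":
    param_q, out_q = mp.Queue(), mp.Queue()
    sim = mp.Process(target=simulation, args=(param_q, out_q))
    sim.start()
    time.sleep(0.25)
    param_q.put({"relaxation": 0.8})      # the "engineer" modifies a parameter online
    sim.join()
    while not out_q.empty():
        print(out_q.get())
```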
We study the scalability of 2-D discrete wavelet transform algorithms on fine-grained parallel architectures. The principal operation in the 2-D DWT is the filtering operation used to implement the filter banks of the...
ISBN (print): 354061656X
This paper aims to describe synthetically the integration and use of parallelism in relational databases on MIMD parallel architecture models. More precisely, after exposing the main goals of parallel relational databases, we highlight that it is essential to exploit recent parallel architectures to obtain high performance. Parallelization of database programs requires the use of data placement approaches and data partitioning strategies, which make it possible to extract levels, forms and types of parallelism. As for the inter-operation parallelization phase, the key problem of optimization, we describe one-phase and two-phase inter-operation parallelization strategies. This leads to unsolved problems which constitute a challenge for future parallel relational database systems.
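To make the data-partitioning point concrete, here is a minimal sketch of hash partitioning two relations on a join key so that each node can join its partitions independently (intra-operation parallelism). The schema, node count and helper names are assumptions for illustration only, not a placement strategy from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, str]  # (key, payload); schema is purely illustrative


def hash_partition(rows: Iterable[Row], n_nodes: int) -> Dict[int, List[Row]]:
    """Assign each row to a node by hashing its key.

    Rows of both join operands that share a key land on the same node,
    so the join can then run on every node independently.
    """
    parts: Dict[int, List[Row]] = defaultdict(list)
    for key, payload in rows:
        parts[hash(key) % n_nodes].append((key, payload))
    return parts


if __name__ == "__main__":
    r = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
    s = [(2, "x"), (4, "y")]
    pr, ps = hash_partition(r, 2), hash_partition(s, 2)
    for node in range(2):
        # local join on each node; the union of local results equals the global join
        local = [(k, a, b)
                 for k, a in pr.get(node, [])
                 for k2, b in ps.get(node, [])
                 if k == k2]
        print(node, local)
```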
We present deterministic algorithms for integer sorting and on-line packet routing on arrays with reconfigurable optical buses. The main objective is to identify the mechanisms specific to this type of architecture wh...
An extension of a general-purpose programming language (gpPL) is presented. It enables parallelism, persistence and query optimization based on sets. The authors demonstrate that in gpPLs the primitive "set" can be generalised for the needs of database and expert system applications. Side-effect-free declarative queries, based on set expressions, can be optimized and executed in parallel. Individual optimization and parallelization are integral parts of the language system and compiler. Very different combinations of persistent or volatile, parallel or sequential, and optimized or non-optimized implementations are possible. This is eased by the fact that a great part of the implementation is located outside the compiler with the help of predefined interfaces. Different algebras, optimizers or algorithms can be considered. The same program can be executed without modification on various systems or platforms.
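The abstract's central claim, that side-effect-free set expressions can be parallelized without changing the program, can be illustrated with a small sketch. The predicate, data and process-pool execution below are hypothetical stand-ins; they merely show that sequential and partition-parallel evaluations of the same declarative set expression agree.

```python
from multiprocessing import Pool


def expensive_predicate(x: int) -> bool:
    # stand-in for a side-effect-free filter in a declarative set expression
    return x % 7 == 0


def query_sequential(data: set) -> set:
    return {x for x in data if expensive_predicate(x)}


def query_parallel(data: set, workers: int = 4) -> set:
    # the same declarative query, evaluated over the elements by a worker pool;
    # because the predicate has no side effects, the result is identical
    items = list(data)
    with Pool(workers) as pool:
        mask = pool.map(expensive_predicate, items)
    return {x for x, keep in zip(items, mask) if keep}


if __name__ == "__main__":
    data = set(range(10_000))
    assert query_sequential(data) == query_parallel(data)
    print(len(query_sequential(data)))
```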
This paper presents the design of a dedicated parallel architecture for connected component analysis. Categorized among one-dimensional array processors, for an image of n×n pixels, the proposed architecture has n-1 linear processing elements (PEs), n² CAM memory modules, and a tree structure of (n/2)-1 switches allowing communication through the global bus in O(log n) units of propagation time. Well suited for low- and intermediate-level vision, this architecture allows sequential processing through its line structure, which is perfectly adapted to real-time image analysis from any interlaced-mode video signal. This paper presents the algorithms for connected component labeling and for area and perimeter determination, all of which are in O(n log n). The performance of the proposed architecture is compared with other architecture types. The simulation results, the possibility of implementation, and future work are discussed.
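For reference, the sketch below is a plain sequential two-pass (union-find) connected component labeling routine; it shows the operation the proposed PE/CAM architecture accelerates, not the hardware algorithm itself, and the 4-connectivity choice is an assumption.

```python
import numpy as np


def label_components(img: np.ndarray) -> np.ndarray:
    """Two-pass connected component labeling (4-connectivity) with union-find."""
    parent = {}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    labels = np.zeros(img.shape, dtype=int)
    nxt = 1
    rows, cols = img.shape
    # first pass: assign provisional labels and record label equivalences
    for r in range(rows):
        for c in range(cols):
            if not img[r, c]:
                continue
            up = labels[r - 1, c] if r and img[r - 1, c] else 0
            left = labels[r, c - 1] if c and img[r, c - 1] else 0
            if up and left:
                labels[r, c] = min(up, left)
                union(up, left)
            elif up or left:
                labels[r, c] = up or left
            else:
                labels[r, c] = nxt
                parent[nxt] = nxt
                nxt += 1
    # second pass: replace provisional labels by their equivalence-class roots
    for r in range(rows):
        for c in range(cols):
            if labels[r, c]:
                labels[r, c] = find(labels[r, c])
    return labels


if __name__ == "__main__":
    img = np.array([[1, 1, 0, 1],
                    [0, 1, 0, 1],
                    [0, 0, 0, 1]])
    print(label_components(img))
```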