High-performance architectures rely upon powerful optimizing and parallelizing compilers to maximize performance. Such compilers need accurate program analysis to enable their performance-enhancing transformations. In the domain of program analysis for parallelization, pointer analysis is a difficult and increasingly common problem. When faced with dynamic, pointer-based data structures, existing solutions are either too limited in the types of data structures they can analyze, or require too much effort on the part of the programmer. In this paper we present a powerful description language for expressing the aliasing properties of dynamic data structures. Such descriptions provide the compiler with better information during alias analysis, and require only minimal effort from the programmer. Ultimately, this enables a more accurate program analysis, and an increased application of performance-enhancing transformations.
The effectiveness of residue code checking for on-line error detection in parallel two's complement multipliers has up to now only been evaluated experimentally for a few architectures. In this paper a formal analysis is given for most of the current multiplication schemes. Based on this analysis it is shown which check bases are appropriate, and how the original scheme has to be extended for complete error detection at the input registers and Booth recoding circuitry. In addition, we argue that the hardware overhead for checking can be reduced by approximately one half if a small latency in error detection is acceptable. Schemes for structuring the checking logic in order to guarantee it to be self-testing, and thus achieve the totally self-checking goal for the overall circuit, are also derived.
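The basic idea behind residue code checking can be illustrated with a short software sketch. This is not the checking hardware analyzed in the paper: the function names, the operand values, and the choice of check base 3 are assumptions made only for illustration. The check relies on the identity (a * b) mod m = ((a mod m) * (b mod m)) mod m, so a product whose residue disagrees with the residue predicted from the operands must be erroneous.

```python
def residue_check(a: int, b: int, product: int, base: int = 3) -> bool:
    """Return True if `product` is consistent with a * b modulo `base`.

    Python's % yields a non-negative result for a positive base, so
    negative (two's complement style) operands are handled consistently.
    """
    expected = ((a % base) * (b % base)) % base
    return product % base == expected


def flip_bit(value: int, bit: int) -> int:
    """Model a single-bit fault by inverting one bit of `value`."""
    return value ^ (1 << bit)


# Hypothetical operands, chosen only for the demonstration.
a, b = -37, 21
good = a * b              # fault-free product
bad = flip_bit(good, 4)   # product with one corrupted bit

print(residue_check(a, b, good))  # fault-free product passes the check
print(residue_check(a, b, bad))   # corrupted product fails the check
```

A single-bit fault changes the product by plus or minus a power of two, and no power of two is divisible by 3, so base 3 catches every single-bit error in the result word. Which bases suit a given multiplier architecture is the question the paper's formal analysis addresses.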
This paper deals with performance measurement and evaluation of digital neuro-computers. We discuss the constraints introduced by hardware implementations. A revised definition of computer speed-up is then proposed, taking into account both the traditional notion of parallelization speed-up and the algorithmic precision of the machines. Finally we show, using the example of the Kohonen feature map algorithm, how a fair benchmarking procedure can be established to evaluate digital neuro-computers more meaningfully than with the traditional Mcups metric.
We present a new mapping strategy of the dynamic space warping algorithm (DSWA) onto a micro-grained array processor (MGAP). This new mapping strategy reduces the communication complexity between processing elements and increases the performance due to data pipelining and interleaving. The DSWA, which can be applied to image recognition, originally needs a four-dimensional array. In practice, however, this four-dimensional algorithm must be mapped onto a two-dimensional array processor. A previous mapping used O(NW) processors to compute the distance between an N×N input image and a reference image with the warping distance W in O(NW) time. The new mapping scheme uses O(N^2) processors to generate each computation result in O(N + W^2) time. We also show the experimental results and a performance comparison between the Connection Machine (CM) 200 and the MGAP.
"First-generation" scalable parallel libraries have been achieved, and are maturing, within the Multicomputer Toolbox. The Toolbox includes sparse, dense, and iterative linear algebra, a stiff ODE/DAE solver, and an open software technology for additional numerical algorithms. We have devised C-based strategies for useful classes of distributed data structures, including distributed matrices and vectors. The underlying Zip code message passing system has enabled process-grid abstractions of multicomputers, communication contexts, and process groups, all characteristics needed for building scalable libraries and scalable application software. A data-distribution-independent approach to building scalable libraries is needed so that applications do not unnecessarily have to redistribute data at high expense. We discuss the strategy used for implementing data-distribution-independent mappings. We also indicate that data-distribution-independent algorithms are sometimes more efficient than fixed-data-distribution counterparts, because redistribution of data can be avoided, and that this question is strongly application dependent.
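One way to picture a data-distribution-independent routine is a library function that receives the index mapping as a parameter instead of hard-coding a layout. The sketch below is a single-process simulation written in Python rather than the Toolbox's C. Every name in it (`block_mapping`, `cyclic_mapping`, `scatter`, `distributed_dot`) is hypothetical and does not come from the Multicomputer Toolbox.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A mapping sends a global index to (owning process, local index).
Mapping = Callable[[int], Tuple[int, int]]


def block_mapping(n: int, p: int) -> Mapping:
    """Contiguous blocks of ceil(n / p) entries per process."""
    size = -(-n // p)  # ceiling division
    return lambda g: (g // size, g % size)


def cyclic_mapping(p: int) -> Mapping:
    """Round-robin assignment of entries to processes."""
    return lambda g: (g % p, g // p)


def scatter(values: Sequence[int], p: int, mapping: Mapping) -> List[Dict[int, int]]:
    """Split a global vector into per-process local pieces under `mapping`."""
    parts: List[Dict[int, int]] = [dict() for _ in range(p)]
    for g, v in enumerate(values):
        owner, local = mapping(g)
        parts[owner][local] = v
    return parts


def distributed_dot(x_parts: List[Dict[int, int]], y_parts: List[Dict[int, int]]) -> int:
    """Dot product that only requires x and y to share the same mapping.

    Each process multiplies its local entries; the outer sum stands in
    for the reduction a message passing layer would perform.
    """
    return sum(sum(xp[i] * yp[i] for i in xp) for xp, yp in zip(x_parts, y_parts))


x = list(range(1, 9))
y = [2] * 8
for mapping in (block_mapping(len(x), 3), cyclic_mapping(3)):
    print(distributed_dot(scatter(x, 3, mapping), scatter(y, 3, mapping)))
```

Because `distributed_dot` never inspects how indices were assigned, a caller whose data is already laid out cyclically does not need to redistribute it into blocks first, which is the cost the abstract says such libraries should avoid.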
In this paper we present a constraint model for pipelined digital signal processors, based on mathematical programming. Our intention is to maximize the utilization of allocated hardware components by minimizing both the clock-cycle time and the period of cyclic innermost loop schedules within a single approach. The constraint formulations are developed for a very general multiphase clocking environment with an FPGA (Field Programmable Gate Array) target architecture.
Inspired by the harmony between the basic functional elements of biological neural networks and their natural operating media, we have been seeking ways to implement artificial neural networks (ANNs) using the intrinsic functionality of the most commonly available devices in an electronics technology, in contrast to the method of hardware compilation of software-simulation modules. In the case of MOS technology, we employ a quadratic functional equation similar to that found in standard MOS transistors to implement synapses in ANNs. A structure has previously been proposed to implement a MOS device with an externally controllable threshold voltage to be employed as a synapse. In the present work, we develop and compare practical architectures within which these synapses can be utilized optimally. A simulator and suitable training algorithms have been developed to simulate different hardware-based architectures.
A major emphasis within the computational electromagnetics (CEM) community concerns the solution of Maxwell's differential equations using finite-difference time-domain (FDTD) techniques. Because of the computational time and memory requirements associated with these time-stepping algorithms, their application to very large problems has been somewhat limited. To alleviate these computational obstacles, some efforts have previously been aimed at the implementation of space-parallelism (the concurrent computation of unknowns at different points in the spatial mesh using multiple processors) in the FDTD algorithms. For these schemes, however, communication and synchronization requirements have limited the amount of computational speed-up provided by the use of additional processors. This limited potential for enhanced computational efficiency implies that if full exploitation of the capabilities of emerging multiple instruction multiple data (MIMD) architectures is to be realized, approaches must be developed which represent a drastic departure from traditional FDTD techniques. The aim of the paper is to present such a computational strategy. Unlike traditional approaches, which use space-parallelism, this methodology exploits the decoupling mechanism of an eigenvalue-eigenvector (EE) decomposition of the FDTD matrix to allow the efficient implementation of time-parallelism (the simultaneous computation of field values at multiple time steps). The resulting algorithm is highly coarse-grained and has minimal communication and synchronization requirements.
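The decoupling that makes time-parallelism possible can be seen on a toy linear recurrence u(n+1) = A u(n). If A = V diag(lambda) V^-1, then u(n) = V diag(lambda^n) V^-1 u(0), so the state at any step follows directly from the initial state without stepping through the earlier ones. The sketch below assumes a hand-picked symmetric 2x2 matrix with a known decomposition; it is not an FDTD update matrix, and it is not the paper's algorithm.

```python
# Toy update matrix with a hand-computed eigen-decomposition (an assumption
# for illustration only; a real FDTD matrix is far larger).
A = [[2.0, 1.0], [1.0, 2.0]]
EIGENVALUES = [3.0, 1.0]
V = [[1.0, 1.0], [1.0, -1.0]]        # columns are eigenvectors of A
V_INV = [[0.5, 0.5], [0.5, -0.5]]    # inverse of V


def matvec(m, v):
    """Multiply a small dense matrix by a vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def step_sequentially(u0, n_steps):
    """Conventional time stepping: each state depends on the previous one."""
    states = [list(u0)]
    for _ in range(n_steps):
        states.append(matvec(A, states[-1]))
    return states


def state_at(u0, n):
    """State at step n computed directly from u0 via the decomposition.

    Calls for different n share no data beyond u0, so they could be
    assigned to different processors.
    """
    coeffs = matvec(V_INV, u0)                                  # project onto eigenvectors
    scaled = [lam ** n * c for lam, c in zip(EIGENVALUES, coeffs)]
    return matvec(V, scaled)                                    # back to the original basis


u0 = [1.0, 0.0]
reference = step_sequentially(u0, 5)
for n in range(6):
    print(n, state_at(u0, n), reference[n])
```

Each call to `state_at` is independent of the others, which is consistent with the abstract's description of a coarse-grained algorithm with little communication or synchronization.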
Author:
Siegl, Kurt
Johannes Kepler University Linz, A-4040 Austria
ISBN:
(print) 0897915895
||MAPLE|| (pronounced: parallel Maple) is a portable system for parallel symbolic computation. The system is built as an interface between the parallel declarative programming language Strand and the sequential computer algebra system Maple, thus providing the elegance of Strand and the power of the existing sequential algorithms in Maple. The implementation of different parallel programming paradigms shows that it is fairly easy to parallelize even complex algebraic algorithms using this system. Sample applications (among them algorithms solving multivariate nonlinear equation systems) are implemented on various parallel architectures. For example, the complex and important problem of real root isolation has been parallelized in a straightforward way using a generic Strand program of fewer than 20 lines of code and a slight modification of 5 lines in the original sequential Maple source. Even with such a simple modification we gained a speed-up of 5, which is better than those reported by others in the literature.