An application-specific multiprocessor system is investigated for real-time implementation of the hierarchical block matching algorithm. The proposed architecture is based on parallel processing units and local memori...
详细信息
An application-specific multiprocessor system is investigated for real-time implementation of the hierarchical block matching algorithm. The proposed architecture is based on parallel processing units and local memories which are globally preloaded via a common bus. The performance is estimated for the data transfer and the parallel computation time schedule.< >
A method of implementing neural networks on programmable, parallel machines is presented. The method is applicable to multilayer connectionist networks and two dimensional, single-instruction multiple-data stream proc...
详细信息
A method of implementing neural networks on programmable, parallel machines is presented. The method is applicable to multilayer connectionist networks and two dimensional, single-instruction multiple-data stream processor arrays. A detailed description for a mapping of a multilayer perceptron with a back-propagation learning algorithm is provided. The mapping includes partitioning of inputs larger than the processor array. The performance of the method is evaluated using the Nettalk network, and is compared to that of other methods. In particular, it is shown that the implementation of the method on the Hughes Systolic/Cellular machine results in a processing rate equal to 100 million connections per second (MCPS).< >
A unified algebraic basis for transforming algorithms to achieve unlimited parallelism is provided. In the case of recursive algorithms, a certain class of algebraic structures is shown to be sufficient for the applic...
详细信息
A unified algebraic basis for transforming algorithms to achieve unlimited parallelism is provided. In the case of recursive algorithms, a certain class of algebraic structures is shown to be sufficient for the application of the look-ahead computation. This approach is generalized to matrix-based algorithms where the operations form a semiring. Previously known approaches to speeding up recursions are shown to be special instances of the given approach. A block processing realization corresponding to the logarithmic look-ahead computation is given which is efficient in its area and time requirements.< >
The authors show how the data parallelprogramming model can be applied to high level vision tasks needed for object recognition. The architecture and programming model of the Connection Machine System are reviewed. U...
详细信息
The authors show how the data parallelprogramming model can be applied to high level vision tasks needed for object recognition. The architecture and programming model of the Connection Machine System are reviewed. Utilities for representing and manipulating sets of data, the primary representation outside of the image plane, are described using communications primitives, especially segmented scans. Several algorithms for matching and evidence accumulation, which are constructed from the utilities, are compared. The techniques emphasize the use of sorting and sparse representations of space in order to limit the combinatorial processing requirements of high-level vision.< >
The QR algorithm has proven to be the most successful sequential algorithm for finding all eigenvalues of a dense nonsymmetric matrix. Recently, several parallel implementations of the QR algorithm have been proposed....
详细信息
The QR algorithm has proven to be the most successful sequential algorithm for finding all eigenvalues of a dense nonsymmetric matrix. Recently, several parallel implementations of the QR algorithm have been proposed. These implementations have all been based on the column-wrapped storage scheme. While column-wrapped storage has been used to obtain asymptotically 100% efficiency for many algorithms for solving linear systems, eigenvalue algorithms using such storage have achieved, at best, asymptotically constant efficiency. The authors present a parallel implementation of the QR algorithm that makes use of an alternate storage scheme that allows them to achieve asymptotically 100% efficiency. They also present a technique, which they call bundling, that further improves the observed efficiency of the parallel implementation.< >
A method for parallelizing the image reconstruction of a fan beam computed tomography (CT) scanner and a processor array adapted for computing the back-projection reconstruction algorithm (BPR) are presented. By makin...
详细信息
A method for parallelizing the image reconstruction of a fan beam computed tomography (CT) scanner and a processor array adapted for computing the back-projection reconstruction algorithm (BPR) are presented. By making use of some inherent parallelism in PBR, a modular ring architecture in multiple of eight processors is shown to be well suited for a class of CT algorithms. The parallel architecture encompasses a reconfigurable binary tree to accommodate a wide range of processor interconnections needed for different image-processing schemes, ranging from front-end filtering, calibration, and fast Fourier transform to postprocessing such as artifact removal and image compression.< >
There are efficient sequential algorithms that use linear programming (LP) for computing maximum weight matchings. Finding a deterministic parallel algorithm for computing maximum weight matchings in complete graphs h...
详细信息
There are efficient sequential algorithms that use linear programming (LP) for computing maximum weight matchings. Finding a deterministic parallel algorithm for computing maximum weight matchings in complete graphs has been an open problem for some time. Since LP is known to be P-complete, then, by the parallel computation thesis, it is unlikely that there exists an NC algorithm that uses LP to solve the maximum weight matching problem. The authors present an LP-based parallel algorithm for maximum weight matching in a complete weighted graph. The algorithm is designed for the EREW PRAM model of parallel computation, and runs in O(n/sup 3//p+n/sup 2/logn) time for p >
The complexity of most machine learning techniques can be improved by transforming iterative components into their parallel equivalent. The parallel architecture of the Connection Machine provides a platform for the i...
详细信息
The complexity of most machine learning techniques can be improved by transforming iterative components into their parallel equivalent. The parallel architecture of the Connection Machine provides a platform for the implementation and evaluation of parallel learning techniques. The architecture of the Connection Machine is described along with limitations of the language interface that constrain the implementation of learning programs. Connection Machine implementations of two learning programs, perceptron and AQ, are described, and their computational complexity is compared to that of the corresponding sequential versions using actual runs on the Connection Machine. Techniques for parallelizing ID3 are also analyzed, and the advantages and disadvantages of parallel implementation on the Connection Machine are discussed in the context of machine learning.< >
Efficient, numerically stable and highly concurrent algorithms for adaptive multichannel least-squares lattice filters are presented. The require O(pm/sup 2/) orthogonal operations to update the linear prediction and ...
详细信息
Efficient, numerically stable and highly concurrent algorithms for adaptive multichannel least-squares lattice filters are presented. The require O(pm/sup 2/) orthogonal operations to update the linear prediction and estimation errors as well as all reflection coefficients for a m-channel and p-tap problem. parallel array architectures for high-speed implementations are discussed.< >
The authors describe the development of parallel implementations for the bin packing problem. Bin packing algorithms are studied to understand various resource allocation issues and the impact of the different packing...
详细信息
The authors describe the development of parallel implementations for the bin packing problem. Bin packing algorithms are studied to understand various resource allocation issues and the impact of the different packing heuristics on the packing efficiency. The seemingly serial nature of the bin packing simulation has prohibited previous experimentations from going beyond sizes of several thousands bins. The authors show that by adopting fairly simple data parallel algorithms a linear speedup is possible. Sizes of up to hundreds of thousands of bins have been simulated for different parameters and heuristics. The authors focussed on the well known first-fit heuristic. They also considered another potentially superior heuristic, k-delayed best fit.< >
暂无评论