Matching is an important part of a model-based object recognition system. Matching is a difficult task, for a number of reasons. First, in a number of recognition systems matching is formulated as a combinatorial prob...
ISBN: 0818656026 (print)
Barrier algorithms are central to the performance of numerous algorithms on scalable, high-performance architectures. Numerous barrier algorithms have been suggested and studied for Non-Uniform Memory Access (NUMA) architectures, but less work has been done for Cache Only Memory Access (COMA) or attraction memory [1] architectures such as the KSR-1. In this paper, we present two new barrier algorithms that offer the best performance we have recorded on the KSR-1 distributed cache multiprocessor. We discuss the trade-offs and the performance of seven algorithms on two architectures. The new barrier algorithms adapt well to a hierarchical caching memory model and take advantage of parallel communication offered by most multiprocessor interconnection networks. Performance results are shown for a 256-processor KSR-1 and a 20-processor Sequent Symmetry.
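For readers unfamiliar with barrier synchronization, the following is a minimal sketch of a generic centralized sense-reversing barrier. It is not one of the algorithms from the paper: the class name `SenseReversingBarrier` and the helper `run_phases` are hypothetical, and the sketch uses Python threads with a condition variable in place of the cache-level spinning a KSR-1 implementation would use.

```python
import threading


class SenseReversingBarrier:
    """Generic centralized sense-reversing barrier (illustrative only)."""

    def __init__(self, parties):
        self.parties = parties
        self.count = parties      # threads still expected in this episode
        self.sense = False        # global sense, flipped once per episode
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            # The value the global sense will take when this episode ends.
            local_sense = not self.sense
            self.count -= 1
            if self.count == 0:
                # Last arrival: reset the counter and release all waiters.
                self.count = self.parties
                self.sense = local_sense
                self.cond.notify_all()
            else:
                # Earlier arrivals block until the sense flips.
                while self.sense != local_sense:
                    self.cond.wait()


def run_phases(num_threads=4, num_phases=3):
    """Run workers through several phases, recording the phase of each step."""
    barrier = SenseReversingBarrier(num_threads)
    log = []
    log_lock = threading.Lock()

    def worker():
        for phase in range(num_phases):
            with log_lock:
                log.append(phase)
            barrier.wait()        # nobody starts phase + 1 early

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Every thread contends for the same counter here, which is the kind of hot spot that hierarchical or tree-structured barriers are typically designed to avoid.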
A novel reconfigurable architecture based on a Multi-Ring Multiprocessor Network is described. The reconfigurable architecture is shown to combine low network diameter with a low degree of connectivity for each node i...
This work deals with the evaluation of hardware implementations of image processing algorithms for real-time applications, using SRAM-based Field Programmable Gate Arrays. We discuss a generic architectural model adapted ...
ISBN: 0818656026 (print)
Parallel algorithms developed for CAD problems today suffer from three important drawbacks. First, they are machine specific and tend to perform poorly on architectures other than the one for which they were designed. Second, they cannot use the latest advances in improved versions of the sequential algorithms for solving the problem. Third, the quality of results degrades significantly during parallel execution. In this paper we address these three problems for an important CAD application: standard cell placement. We have developed a new parallel placement algorithm that is portable across a range of MIMD parallel architectures. The algorithm is part of the ProperCAD project, which allows the development and implementation of a parallel algorithm such that it can be executed on a wide variety of parallel machines without any change to the source. The parallel placement algorithm is based on an existing implementation of the sequential simulated annealing algorithm, TimberWolfSC 6.0 [1].
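As background on the sequential technique mentioned above, here is a minimal sketch of simulated annealing applied to a toy one-row placement problem. It is not TimberWolfSC and not the ProperCAD algorithm: the cost function (net span in a single row), the swap move, the cooling parameters, and the names `wirelength` and `anneal_placement` are all assumptions chosen for illustration.

```python
import math
import random


def wirelength(order, nets):
    """Sum over nets of the span between leftmost and rightmost cell."""
    pos = {cell: i for i, cell in enumerate(order)}
    return sum(max(pos[c] for c in net) - min(pos[c] for c in net)
               for net in nets)


def anneal_placement(cells, nets, t_start=5.0, t_end=0.01,
                     alpha=0.95, moves_per_temp=50, seed=0):
    """Toy annealer: swap two cells, accept by the Metropolis criterion."""
    rng = random.Random(seed)
    order = list(cells)
    cost = wirelength(order, nets)
    best, best_cost = order[:], cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            i, j = rng.sample(range(len(order)), 2)
            order[i], order[j] = order[j], order[i]
            new_cost = wirelength(order, nets)
            delta = new_cost - cost
            # Always accept improvements; accept uphill moves with
            # probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = order[:], cost
            else:
                order[i], order[j] = order[j], order[i]  # undo the swap
        t *= alpha  # geometric cooling schedule (an assumption)
    return best, best_cost
```

In this sketch each move depends on the cost left by the previous one, which hints at why parallel execution can affect result quality when processors evaluate moves against stale placement state.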
ISBN: 0818656026 (print)
Segmentation and other image processing operations rely on convolution calculations with heavy computational and memory access demands. This paper presents an analysis of a texture segmentation application containing a 96x96 convolution. Sequential execution required several hours on single-processor systems, with over 99% of the time spent performing the large convolution. 70% to 75% of execution time is attributable to cache misses within the convolution. We implemented the same application on CM-5, iPSC/860 and PVM distributed memory multicomputers, tailoring the parallel algorithms to each machine's architecture. Parallelization significantly reduced execution time, taking 49 seconds on a 512-node CM-5 and 6.5 minutes on a 32-node iPSC/860.
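To make the cost of a large convolution concrete, the following is a minimal sketch of a direct ("valid" region) 2D convolution in pure Python. It is not the paper's implementation: the function names `convolve2d_valid` and `multiply_adds` are hypothetical, and the example uses a tiny kernel where the abstract describes a 96x96 one.

```python
def convolve2d_valid(image, kernel):
    """Direct 2D convolution over positions where the kernel fits fully."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for r in range(oh):
        for c in range(ow):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    # Index the kernel in reverse: true convolution flips it.
                    acc += image[r + i][c + j] * kernel[kh - 1 - i][kw - 1 - j]
            out[r][c] = acc
    return out


def multiply_adds(ih, iw, kh, kw):
    """Number of multiply-add steps the direct method performs."""
    return (ih - kh + 1) * (iw - kw + 1) * kh * kw


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
result = convolve2d_valid(image, kernel)
```

Each output pixel costs `kh * kw` multiply-adds, so a 96x96 kernel needs 9216 of them per pixel. The inner loops also sweep a 96-row window of the image for every output, which is consistent with the cache-miss share reported above.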
We have continued our study of a parallel perturbative learning method [Alspector et al., 1993] and implications for its implementation in analog VLSI. Our new results indicate that, in most cases, a single parallel p...
An ASIC has been designed to perform functions including digital quadrature demodulation and signal detection on an intermediate frequency signal sampled at 400 MHz in electronic warfare receivers. This performance is achieved through a fully pipelined, parallel architecture implemented on a GaAs gate array. The hardware complexity is minimized by careful cost-performance tradeoffs in the design of the algorithms.
Scalable parallel computer architectures provide the computational performance demanded by advanced biological computing problems. NIH has developed a number of parallel algorithms and techniques useful in determining biological structure and function. These applications include processing electron micrographs to determine the three-dimensional structure of viruses, calculating the solvent-accessible surface area of proteins to predict the three-dimensional conformation of these molecules from their primary structure, and searching for homologous DNA sequences in large genetic databases. Timing results demonstrate substantial performance improvements with parallel implementations compared with conventional sequential systems.
Currently, many parallel algorithms are defined for shared-memory architectures. The preferred machine model for designing these algorithms is the PRAM. However, this model does not take into account properties of exi...