ISBN: 0780335295 (print)
This paper presents a comparative analysis of the execution times of low-level vision algorithms on two different SIMD parallel machines. The algorithms are part of the DARPA Image Understanding benchmark, a widely accepted platform for comparing the performance of parallel systems in computer vision. The two architectures represent opposite approaches to granularity within the SIMD paradigm: one is a coarse-grain array of floating-point processors, the other a fine-grain array of single-bit processing elements. For this reason, the algorithms were implemented on both systems with each machine's specific features taken into account. The paper offers insights into implementation issues and a comparative analysis of the measured execution times.
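To make the granularity contrast concrete, the following is a minimal sketch of bit-serial addition, the usual way an array of single-bit processing elements handles multi-bit pixel values. It is an illustration only and not taken from the paper: the function name `bit_serial_add` and the default 8-bit width are assumptions, and the inner loop merely simulates what all processing elements would do in lockstep.

```python
def bit_serial_add(a, b, bits=8):
    """Add two equal-length lists of unsigned ints one bit plane at a time.

    Hypothetical illustration: each list index stands for one single-bit
    processing element (PE) holding one pixel.
    """
    if len(a) != len(b):
        raise ValueError("operand arrays must have the same length")
    carry = [0] * len(a)   # one carry bit per PE
    total = [0] * len(a)   # accumulated sum per PE
    for k in range(bits):              # one SIMD step per bit plane
        for i in range(len(a)):        # conceptually, all PEs act at once
            x = (a[i] >> k) & 1
            y = (b[i] >> k) & 1
            s = x ^ y ^ carry[i]                        # full-adder sum bit
            carry[i] = (x & y) | (carry[i] & (x ^ y))   # full-adder carry
            total[i] |= s << k
    return total  # result wraps modulo 2**bits


sums = bit_serial_add([1, 200, 255], [2, 100, 1])
```

In this sketch an 8-bit addition costs eight steps regardless of the number of pixels, whereas a coarse-grain floating-point processor would finish each addition in one instruction but handle many pixels in sequence.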
The paper presents a new testing method applicable to VLSI arrays made up of microcomputers as processing elements. A system with single instruction, multiple data (SIMD) processing is assumed, in which the computing elements are connected by a regular interconnection network. A new fault model for the array is presented. Faults are defined at a functional level, which allows a systematic test generation procedure to be derived. This procedure is independent of array implementation details and still retains a SIMD characterization. Testing is performed by sequences of instructions. Test sequences are defined using two ordering criteria. The first criterion establishes the external observability and controllability of the instructions. The second criterion uses instruction cardinality as a metric for evaluating inspection complexity. Algorithms and procedures for the correct execution of functional testing are presented. An example of the application of the proposed technique to an existing parallel scheme made of complex microprocessors is described. The criteria for structuring the test procedure lead to an optimization of fault coverage and a reduction of ambiguity.
In recent years, many hardware realizations of neural networks and algorithms have been developed. These developments have now resulted in the first commercially available products, usually distributed by American...
ISBN: 9780889866058 (print)
This paper presents a new processing cell circuit suitable for use in massively parallel fine-grain processor arrays oriented towards image processing applications. The design, based on dynamic logic, is efficient for both local and global operations. We discuss design trade-offs and provide a detailed description of the architecture. A cellular processor array based on the presented design can operate in both discrete- and continuous-time domains. Asynchronous execution of global operations significantly increases overall performance. Simulation results indicate performance ranging from 1.1 MOPS/cell (unsigned products) to 2900 MOPS/cell (asynchronous binary processing).
Although massively parallel arrays for spatially mapped applications have been proposed since the 1950s (42) and built since the 1960s (12), there have been very few systematic empirical studies that cover more than a small fraction of the design space. The problems have included the lack of a test suite of non-trivial application codes; inadequate language support; the difficulty of balancing evaluation performance with flexibility; and the difficulty of balancing test suite portability with accuracy of evaluation. We describe an environment that addresses these problems. A realistic workload has been constructed, including a series of applications currently used as building blocks in vision research. Both flexibility in architectural parameter selection and simulation efficiency are maintained with a novel technique that combines virtual machine emulation with trace-driven simulation. The trade-off between fairness to diverse target architectures and programmability of the test suite is addressed through the use of operator and application libraries for a small set of critical functions. We also present examples of the types of results we are obtaining, including the effects of changing ALU designs and datapath widths, finding critical points in register set and cache sizes, the benefits of various types of router networks, and the performance cost of processor virtualization.
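The general idea of pairing emulation with trace-driven simulation can be sketched as follows: a program is executed once on a virtual machine that records an architecture-independent trace, and the trace is then replayed under different architectural parameters to estimate cost. This is a toy sketch under stated assumptions, not the environment described in the abstract: the names `TraceEvent`, `emulate`, and `replay`, the two-operation instruction set, and the cycle cost model are all hypothetical.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class TraceEvent:
    op: str    # hypothetical operation name: "add" or "mul"
    bits: int  # operand width in bits


def emulate(program):
    """Run the program once on a toy virtual machine and record a trace."""
    acc = 0
    trace = []
    for op, operand, bits in program:
        if op == "add":
            acc += operand
        elif op == "mul":
            acc *= operand
        else:
            raise ValueError(f"unknown op: {op}")
        trace.append(TraceEvent(op, bits))
    return acc, trace


def replay(trace, datapath_width):
    """Estimate cycles for a recorded trace under an assumed cost model."""
    cycles = 0
    for ev in trace:
        passes = ceil(ev.bits / datapath_width)
        # Assumed costs: add takes one cycle per pass, mul takes passes squared.
        cycles += passes if ev.op == "add" else passes * passes
    return cycles


program = [("add", 3, 16), ("mul", 4, 16), ("add", 1, 8)]
result, trace = emulate(program)
# The same trace is reused for every candidate datapath width.
estimates = {width: replay(trace, width) for width in (1, 8, 16)}
```

Because the functional result is computed only once, sweeping a parameter such as datapath width requires only the cheap replay step, which is one way flexibility and simulation efficiency can coexist.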