ISBN: 0780372417 (print)
We propose a solution to two problems induced by the growing complexity of machine vision systems: (1) the need for a robust, open and flexible framework to control various kinds of descriptive and operational knowledge; and (2) the need for an architecture that offers parallel processing and scales easily with evolving underlying hardware. We propose an agent society, implemented in the Java language and organized as an irregular pyramid, for two reasons: (1) an agent provides an abstraction that encapsulates reactive or cognitive processing; and (2) the pyramid offers a formal graph-based approach to ensure global and distributed goal satisfaction. The evaluation of the architecture, performed on an X-scanner breast image, shows good-quality results and parallel processing abilities.
A new VLSI architecture for the computation of the three-dimensional discrete cosine transform (3D DCT) for compression of integral 3D images is proposed. The 3D DCT is decomposed into 1D DCTs computed in each of the three dimensions. The architecture is a parallel structure which computes an N×N×N-point DCT by computing N N×N 2D DCTs in parallel and feeding each of the computed 2D DCT coefficients into a final 1D DCT block. The architecture uses 5N²/2 multiplier-accumulators to evaluate N×N×N-point DCTs at a rate of N complete 3D DCT coefficients per clock cycle, where N is even. The architecture is regular and modular, and as such it is suitable for VLSI implementation. The proposed architecture has a better area-time performance than previously reported 3D DCT architectures. In addition, the proposed architecture reduces the initial delay by a factor of N.
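The decomposition described above can be illustrated in software. The following is a minimal sketch, not the authors' hardware design: it assumes an unnormalized DCT-II, and the function names (`dct_1d`, `dct_2d`, `dct_3d`, `dct_3d_direct`) are hypothetical. It computes one 2D DCT per N×N slice (the step the architecture runs in parallel), then a final 1D DCT across slices, and includes a direct triple-sum version for comparison.

```python
import math


def dct_1d(x):
    """Unnormalized DCT-II of a sequence (assumed variant; the abstract names none)."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def dct_2d(plane):
    """2D DCT of an N x N slice: 1D DCTs along rows, then along columns."""
    rows = [dct_1d(r) for r in plane]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # Transpose back so the result is indexed [row][column] again.
    return [list(r) for r in zip(*cols)]


def dct_3d(cube):
    """Separable 3D DCT: N independent 2D DCTs, then a final 1D DCT across slices."""
    n = len(cube)
    planes = [dct_2d(p) for p in cube]  # these N transforms are independent
    out = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for j in range(n):
        for c in range(n):
            line = dct_1d([planes[i][j][c] for i in range(n)])
            for i in range(n):
                out[i][j][c] = line[i]
    return out


def dct_3d_direct(cube):
    """Reference 3D DCT by the direct triple sum, used only to check dct_3d."""
    n = len(cube)

    def w(i, k):
        return math.cos(math.pi * (i + 0.5) * k / n)

    return [
        [
            [
                sum(
                    cube[i][j][c] * w(i, u) * w(j, v) * w(c, t)
                    for i in range(n)
                    for j in range(n)
                    for c in range(n)
                )
                for t in range(n)
            ]
            for v in range(n)
        ]
        for u in range(n)
    ]
```

The direct form needs on the order of N³ multiplications per coefficient, while the separable form reuses the 2D results across the final 1D pass, which is the property the row-column decomposition exploits.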
We propose a sampled-analog rank-order filter (ROF) architecture of complexity O(n²). It yields a very compact structure because the devices used are essentially of minimum geometry. Because its sole active building block is the simple CMOS inverter, the circuit exhibits excellent low-voltage compatibility. Furthermore, it can support a rail-to-rail input range. It is inherently fast due to fully parallel signal processing, and the speed is expected to increase with technological scaling at the same rate as purely digital circuitry. Finally, it supports full programmability of the rank by means of an analog reference voltage. The ROF is based on a pair of multiple-winners-take-all (mWTA) circuits and a set of AND gates. The paper includes a description of the architecture and a detailed analysis of the mWTA. The most relevant design issues are addressed, and experimental results obtained from a fabricated ROF are presented.
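For readers unfamiliar with the operation itself, a rank-order filter outputs the sample of a chosen rank within each sliding window (rank 1 is the minimum, the middle rank is the median, and the highest rank is the maximum). The sketch below is a purely behavioral software model under that definition; it does not model the mWTA-based analog circuit, and the function name and parameters are hypothetical.

```python
def rank_order_filter(signal, window, rank):
    """Return, for each sliding window, the sample with the given rank.

    rank is 1-based: 1 selects the minimum, `window` selects the maximum.
    This is a behavioral model only; it sorts each window in software.
    """
    if not 1 <= rank <= window:
        raise ValueError("rank must satisfy 1 <= rank <= window")
    if window > len(signal):
        raise ValueError("window is longer than the signal")
    return [
        sorted(signal[i:i + window])[rank - 1]
        for i in range(len(signal) - window + 1)
    ]
```

Changing `rank` at call time plays the role that the analog reference voltage plays in the abstract: the same structure yields a minimum, median, or maximum filter.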
ISBN: 0780372417 (print)
A signal processing technique is presented for analysing both transient natural excitations and the response of vibrating structures; it is capable of determining the time variations in the amplitude and frequency content of these structures. The technique uses an impulse-invariant transformation to obtain an equivalent model of the vibrating structure, represented by a parallel-form realization of second-order subsystems that correspond to different modes of vibration and take as input a stationary, independent and identically distributed sequence. The method was applied to detect changes in the dynamic characteristics of a vibrating structure: a multi-story concrete building subjected to earthquake ground motion.
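The modelling idea can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each mode is assumed to be an underdamped second-order system with impulse response h(t) = exp(-σt)·sin(ω_d·t)/ω_d, and impulse invariance is taken in its textbook form h[n] = T·h(nT), which gives a two-pole digital section per mode. The function names, the mode parameters, and the sampling step are hypothetical.

```python
import math


def mode_coefficients(freq_hz, damping, dt):
    """Impulse-invariant coefficients of one underdamped second-order mode.

    Recurrence: y[n] = b1*x[n-1] + a1*y[n-1] + a2*y[n-2].
    """
    if not 0.0 <= damping < 1.0:
        raise ValueError("this sketch assumes an underdamped mode")
    wn = 2.0 * math.pi * freq_hz          # natural frequency (rad/s)
    sigma = damping * wn                  # decay rate
    wd = wn * math.sqrt(1.0 - damping ** 2)  # damped frequency (rad/s)
    r = math.exp(-sigma * dt)             # discrete pole radius
    theta = wd * dt                       # discrete pole angle
    return dt * r * math.sin(theta) / wd, 2.0 * r * math.cos(theta), -r * r


def simulate_parallel(modes, excitation, dt):
    """Sum the outputs of the second-order sections (parallel-form realization)."""
    coeffs = [mode_coefficients(f, z, dt) for f, z in modes]
    state = [[0.0, 0.0] for _ in coeffs]  # [y[n-1], y[n-2]] for each mode
    x_prev = 0.0
    out = []
    for x in excitation:
        total = 0.0
        for (b1, a1, a2), s in zip(coeffs, state):
            y = b1 * x_prev + a1 * s[0] + a2 * s[1]
            s[1], s[0] = s[0], y
            total += y
        out.append(total)
        x_prev = x
    return out
```

Driving `simulate_parallel` with a unit impulse reproduces the sampled continuous-time impulse response of each mode; driving it with a random sequence gives a synthetic response whose modal frequencies and damping are set by `modes`.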
The MPEG-4 object-based profiles pose highly varying computational demands. To enable real-time and power-efficient decoding of these profiles, parallelization of the algorithm is a necessity. Here, sensible system partitioning is of paramount concern in order to keep communication and synchronization overhead low. It is shown that bit-stream-level decoding in particular is considerably more complex than in earlier, frame-based video coding standards. Therefore, it is best executed on a dedicated unit. Based on the data dependencies between individual decoding operations, an interface between bit-stream-level and higher-level decoding is derived which causes minimal synchronization overhead. An optimized software implementation of Main Profile bit-stream-level decoding shows that its computational demands for arbitrarily shaped Video Objects are 3.8-4.2 times as high as for frame-based video. Under worst-case conditions, 547.5 MIPS are required for a Main@L2 decoder on a 32-bit MIPS RISC processor.
ISBN: 0780370937 (print)
To meet the increasing requirement for high-speed switches, a multiple input-queued (MIQ) switch is explored. Rather than addressing the scheduling problem, on which many researchers have focused heavily, a proposed dynamic queue allocation algorithm is used to handle non-uniform or hot-spot traffic. Although the performance of the original algorithm was analyzed by N.K. Sharma and M.R. Pinnu (see Parallel Computing, vol.23, p.777-81, 1997) and the orderly property was enhanced by us, Wu and Lin (see Parallel Computing, vol.24, p.2143-8, 1998), the correctness of the algorithm has not yet been proven. In this report, with the help of the inherent properties of FIFO queues, we prove that the algorithm, and the related MIQ switch, is free from deadlock.
ISBN: 9789812792037 (digital)
ISBN: 9789810244811 (print)
ICA3PP 2000 was an important conference that brought together researchers and practitioners from academia, industry and governments to advance the knowledge of parallel and distributed computing. The proceedings constitute a well-defined set of innovative research papers in two broad areas of parallel and distributed computing: (1) architectures, algorithms and networks; (2) systems and applications.
ISBN: 3540675531 (print)
In this work a High Level Software Synthesis (HLSS) methodology is presented. HLSS allows the automatic generation of a parallel program starting from a sequential C program. HLSS deals with a significant class of iterative algorithms, those expressible through nested loops with affine dependencies, and integrates several techniques to achieve the final parallel program. The computational model of the System of Affine Recurrence Equations (SARE) is used. As a first step in HLSS, the iterative C program is converted into SARE form; parallelism is extracted from the SARE through allocation and scheduling functions, which are represented as unimodular matrices and are determined by means of an optimization process. A clustering phase is applied to fit the parallel program onto a parallel machine with a fixed amount of resources (number of processors, main memory, communication channels). Finally, the parallel program to be executed on the target parallel system is generated.
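The role of a unimodular matrix as a combined scheduling and allocation function can be illustrated with a small sketch. Everything concrete here is an assumption for illustration and does not come from the paper: the 2D loop nest, the dependence vectors (1, 0) and (0, 1), the matrix T = [[1, 1], [0, 1]] (first row taken as the schedule, second row as the allocation), and all function names.

```python
def det(m):
    """Integer determinant by cofactor expansion (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** c * m[0][c] * det([row[:c] + row[c + 1:] for row in m[1:]])
        for c in range(len(m))
    )


def is_unimodular(m):
    """A square integer matrix is unimodular when its determinant is +1 or -1."""
    square = all(len(row) == len(m) for row in m)
    integral = all(isinstance(v, int) for row in m for v in row)
    return square and integral and abs(det(m)) == 1


def map_point(m, point):
    """Apply the transformation: iteration point -> (time, processor, ...)."""
    return tuple(sum(a * p for a, p in zip(row, point)) for row in m)


def is_causal(schedule_row, dependences):
    """Each dependence must advance time by at least one step."""
    return all(
        sum(a * d for a, d in zip(schedule_row, dep)) >= 1 for dep in dependences
    )


# Hypothetical example: a 3 x 3 iteration space with uniform dependences.
T = [[1, 1], [0, 1]]            # time = i + j, processor = j (assumed)
DEPENDENCES = [(1, 0), (0, 1)]  # assumed dependence vectors
POINTS = [(i, j) for i in range(3) for j in range(3)]
MAPPED = [map_point(T, p) for p in POINTS]
```

Because T is unimodular, the mapping is a bijection on integer points, so no two iterations are assigned the same (time, processor) pair; the causality check confirms that every dependence is satisfied by the schedule.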
ISBN: 0769507166 (print)
In this paper we present an approach to determine scheduling functions suitable for the design of processor arrays. The considered scheduling functions support a subsequent LSGP partitioning of the processor array by allowing the tasks of those processors of the full-size array that are mapped onto one processor of the partitioned array to be executed in an arbitrary order. Several constraints are derived to ensure the causality of computations and to prevent access conflicts to both modules and registers. We propose an optimization problem that generates the scheduling functions and outline its implementation as an integer linear program. The proposed methods are also applicable to the mapping of algorithms onto parallel architectures. In this case, the scheduling function produces identical, independent small threads which can be combined to utilize the target architecture as much as possible.
ISBN: 3540675531 (print)
High computational demands are one of the main reasons for the use of parallel architectures such as clusters of PCs. Many parallel programs, however, suffer from severe inefficiencies when executed on such a loosely coupled architecture, for a variety of reasons. One of the most important is frequent access to remote memories. In this article, we present a hybrid event-driven monitoring system which uses a hardware monitor to observe all of the underlying transactions on the network and to deliver information about the run-time behavior of parallel programs to tools for performance analysis and debugging. This monitoring system is targeted at cluster architectures with NUMA characteristics.