ISBN: 0780365429 (print)
This paper adopts a transformational programming approach for deriving massively parallel algorithms from functional specifications. It briefly describes a framework for relating key higher-order functions such as map, reduce, and scan to communicating processes with different configurations. The parallelisation of many interesting functional algorithms can then be systematically synthesized by combining "off the shelf" parallel implementations of instances of these higher-order functions. Efficiency in the final message-passing algorithms is achieved by exploiting data parallelism, for generating the intermediate results in parallel, and functional parallelism, for processing intermediate results in stages such that the output of one stage is simultaneously input to the next. This approach is illustrated through a case study that tests whether all the elements of a given list are distinct. The Bird-Meertens formalism is used to carry out the algebraic transformations concisely.
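The case study can be pictured as a composition of the three higher-order functions the abstract names. The sketch below is only an illustration under assumptions: the function name `all_distinct` and this particular decomposition are hypothetical and are not taken from the paper. It runs sequentially; it only shows the stages that a parallel derivation would map onto processes.

```python
from functools import reduce
from itertools import accumulate


def all_distinct(xs):
    """Return True when no element of xs occurs twice (hypothetical sketch)."""
    # scan: build every non-empty prefix of xs; these are the
    # intermediate results that could be generated in parallel.
    prefixes = accumulate(([x] for x in xs), lambda acc, x: acc + x)
    # map: for each prefix, check that its last element is new.
    checks = map(lambda p: p[-1] not in p[:-1], prefixes)
    # reduce: combine the per-prefix checks with logical "and".
    return reduce(lambda a, b: a and b, checks, True)
```

Because the map stage treats each prefix independently and the reduce stage uses an associative operator, each stage is a candidate for the data-parallel and pipelined treatment the abstract describes.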
In this paper, we present a parallel algorithm for solving the congruent region problem of locating all the regions congruent to a test region in a planar figure on a mesh-connected computer (MCC). Given a test region ...
Full adders are important elements in applications such as DSP architectures and microprocessors. We propose a technique to build a total of 41 new 10-transistor full adders using novel XOR and XNOR gates in combination with existing ones. We have done over 10000 HSPICE simulation runs of all the different adders under different input patterns, frequencies, and load capacitances. Almost all of the new adders consume less power at high frequencies, while three new adders consistently consume on average 10% less power and have higher speed compared with the previous 10-transistor full adder and the conventional 28-transistor CMOS adder.
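For readers unfamiliar with XOR-based full adders, the logic function can be sketched as follows. This is a behavioural illustration only, assuming one common decomposition (an XOR stage followed by a second XOR for the sum and a two-way selection for the carry); it does not model transistor-level circuits, and it is not claimed to be any of the 41 designs in the paper.

```python
from itertools import product


def full_adder(a, b, cin):
    """Behavioural 1-bit full adder built around an XOR stage (sketch)."""
    h = a ^ b                # first XOR stage (its complement is the XNOR)
    s = h ^ cin              # sum bit: second XOR stage
    cout = cin if h else a   # carry: select cin when a != b, otherwise a
    return s, cout


# Check the logic against ordinary integer addition for all 8 input patterns.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin
```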
Although arrays of SIMD PEs can be built with very high operating frequencies, problems exist in keeping the array busy. The inherent mismatch between host and array makes it difficult to maintain high array utilization: either the rate of instruction issue is very low or PE data locality is compromised, which has the same effect. Our solution is based on an array control unit (ACU) design that expands macro instructions in two stages, first by data tile and then into microinstructions. The expansion itself solves the issue problem; decoupling the expansion modalities maintains data locality. Several issues involving host/ACU interaction need to be resolved to effect this solution.
This paper presents a parallel implementation of a CBIR system which deals with an image database composed of data from over 29 million two-dimensional RGB images, equivalent to 1.45 TB of graphical data. The application has been designed for a distributed-memory multiprocessor environment and has been implemented on a cluster of twenty-five PCs using MPI. The paradigm that best fits the problem's needs is a farm-based solution: a master process distributes the workload among the slave processes and, when these have finished, the master collects the partial results computed by each slave process. In order to evaluate this solution, the experimental results have been compared with those achieved using a Silicon Graphics Origin 2000, a shared-memory machine with eight processors. This paper analyzes the performance offered by both approaches from the viewpoints of speed, price, and scalability, and presents the conclusions that can be drawn from the comparison of results.
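The farm paradigm described above can be sketched in a few lines. This is an illustrative stand-in, not the paper's implementation: it uses a Python thread pool instead of MPI, and the names `worker` and `master`, the feature-vector "signatures", and the squared Euclidean distance are all assumptions made for the example.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def worker(chunk, query, k):
    """Partial result: the k entries of this chunk closest to the query."""
    scored = (
        (sum((a - b) ** 2 for a, b in zip(signature, query)), image_id)
        for image_id, signature in chunk
    )
    return heapq.nsmallest(k, scored)


def master(database, query, k=2, n_workers=4):
    """Distribute chunks to workers, then merge their partial results."""
    chunks = [database[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(lambda c: worker(c, query, k), chunks)
        # The master only merges at most k candidates per worker.
        return heapq.nsmallest(k, (item for part in partials for item in part))


# Hypothetical toy database of (image_id, signature) pairs.
database = [(0, (0, 0)), (1, (5, 5)), (2, (1, 0)), (3, (9, 9)), (4, (0, 2))]
best = master(database, query=(0, 0), k=2)
```

Each worker returns only its local top-k list, so the data the master has to collect stays small regardless of how large each worker's share of the database is.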
ISBN: 9643600572 (print)
Five different 32-bit integer multipliers are compared on various performance measures. The multipliers compared are the array multiplier, the modified Booth (radix-4) multiplier, the optimized Wallace tree multiplier, the combined modified Booth-Wallace tree multiplier, and the twin pipe serial parallel multiplier. The comparison is based on results obtained by synthesizing all multiplier architectures for an FPGA.
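As background for the modified Booth (radix-4) entry in the list above, the recoding step can be sketched in software. This is a generic textbook-style illustration, not the architecture evaluated in the paper; the function names are hypothetical, and `bits` is assumed to be even.

```python
def booth_radix4_digits(y, bits=32):
    """Recode a two's-complement integer into radix-4 digits in {-2..2}."""
    u = y & ((1 << bits) - 1)      # two's-complement bit pattern of y
    digits = []
    prev = 0                       # implicit bit to the right of bit 0
    for i in range(0, bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x, y, bits=32):
    """Multiply by summing one partial product per radix-4 digit."""
    return sum(d * x * 4 ** i for i, d in enumerate(booth_radix4_digits(y, bits)))
```

The recoding yields `bits / 2` digits, so a 32-bit multiplier needs 16 partial products instead of 32, each of which is 0, ±x, or ±2x.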
Based on statistical learning theory, the Support Vector Machine (SVM) is a novel neural network method for solving image classification problems. It has been proven to obtain the optimal decision hyperplane and is also independent of the dimensionality of the problem. The decision function is constructed with the support vectors obtained during the learning process. Each pixel block in the training database is processed as an input vector; the learning process selects, among the input vectors, those that will construct the solution (the support vectors), together with the weights and the threshold of the neural network. SVM does not need a test database, and the solution depends entirely on the training database. The aim of our work is to exploit the regularities of the SVM decision function in an integrated vision system. The application of our vision system is object detection and localization. We use the SVM classifier as the main module of the system. In order to reduce the classification computation time, we propose a parallel implementation on an FPGA programmed with VHDL.
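The regularity of the SVM decision function mentioned above is easy to see in its standard form, a weighted sum of kernel evaluations against the support vectors plus a threshold. The sketch below is a generic illustration with made-up support vectors and weights and a linear kernel chosen for simplicity; none of these values come from the paper.

```python
def linear_kernel(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def decision(x, support_vectors, coeffs, bias, kernel=linear_kernel):
    """Sign of sum_i coeff_i * K(sv_i, x) + bias.

    coeffs holds the signed weights (alpha_i * y_i) found during training.
    """
    total = sum(c * kernel(sv, x) for sv, c in zip(support_vectors, coeffs))
    return 1 if total + bias >= 0 else -1


# Hypothetical trained model: two support vectors with opposite labels.
svs = [(1, 1), (-1, -1)]
weights = [0.5, -0.5]
label = decision((2, 0), svs, weights, bias=0.0)
```

Every term of the sum depends on a single support vector, so the kernel evaluations are independent of one another, which is the kind of structure a parallel hardware implementation can exploit.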
In this paper we describe the design and implementation of an efficient and compact image processing library for a digital still camera based on the Siemens TriCore microcontroller-DSP processor. The library is designed for both off-line use (e.g. NT-based Pentium platforms) and on-line use (TriCore implementation). To satisfy the constraints of embedded systems, the library was designed to operate on an input image using the concept of band processing. In this method, the input image is divided into an appropriate number of data bands (strips). The image bands are then processed separately using a pipeline of band-based operators. The processed bands are then collected into a single output image. Most of the operators incorporated in the library take advantage of the band processing mechanism and operate on a stream of such image bands. This scheme not only reduces the memory space requirements but also lends itself to multithreading and parallel processing implementations, with potential for even faster performance. The library was optimized in terms of code size (31 kilobytes) and processing speed (1.98 sec. on a 1008×800 input image in the acquisition mode of operation) to meet the current requirements of a size less than 250 kilobytes and a processing speed of less than 2 seconds/image.
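The band processing scheme described above can be sketched as follows. The operators `invert` and `threshold`, the band height, and the list-of-rows image representation are assumptions chosen for illustration; they are not the library's actual API. The sketch uses point operators only; operators that read neighbouring pixels would additionally need overlapping rows between bands.

```python
def bands(image, band_height):
    """Yield consecutive horizontal strips of the image (a list of rows)."""
    for top in range(0, len(image), band_height):
        yield image[top:top + band_height]


def invert(band):
    """Hypothetical point operator: invert 8-bit pixel values."""
    return [[255 - p for p in row] for row in band]


def threshold(band, level=128):
    """Hypothetical point operator: binarize 8-bit pixel values."""
    return [[255 if p >= level else 0 for p in row] for row in band]


def process(image, band_height, operators):
    """Run each band through the operator pipeline and collect the output."""
    output = []
    for band in bands(image, band_height):
        for op in operators:
            band = op(band)      # only one band is held at a time
        output.extend(band)
    return output


image = [[0, 200], [100, 255], [130, 10]]
result = process(image, band_height=2, operators=[invert, threshold])
```

Only one band is resident while the pipeline runs, which is how the approach keeps working memory small; bands are also independent units of work, so they could be handed to separate threads.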
This report presents an adaptive speech analysis method designed specifically for a speech recognition system. The speech recognition system consists of an adaptive speech analysis stage, a self-organized clustering/pseudo-labeling method, and a DTW (dynamic time warping) stage. All methods are redesigned as fully parallel and pipelined mechanisms. In the speech analysis method, adaptive ARMA lattice modelling is introduced to reduce distortion, noise, and disturbance. In addition, the speech analysis remains robust in conditions where an adaptive method is usually considered sensitive with respect to convergence and parameter estimation. Speech recognition results obtained with the recognition system, including the robust adaptive speech analyzer, are shown.
Motivated by the objective of finding a reduced-complexity implementation of the EM (estimate and maximize) algorithm, the authors move the concept of alternating projection (AP), reported by Ziskind and Wax (1988), to a specific architecture for array processing in communications. Directions of arrival (DOAs) are estimated by scanning the scenario with a dedicated beamvector, proving that low-resolution procedures with constraints, working in parallel, may enhance the performance of high-resolution methods. Since no matrix inverse is involved, the method is robust and copes with fully coherent sources (specular multipath) and fast updates. The procedure proved to be useful for adaptive beamforming in either point-to-point or mobile communications. While preserving the EM performance, the array processing architecture offers a wide range of possibilities in updating and framing, making the best use of hardware resources.