In this paper, we present a C-to-DFG generation algorithm for coarse-grained reconfigurable processors in the multimedia application field. The algorithm exploits the operation parallelism available in sequential code and maximizes it through loop unrolling and scalar replacement. Loop unrolling increases the size of the basic block and fully exposes the intrinsic data parallelism. Scalar replacement eliminates memory access instructions from the basic block while preserving data dependencies. For mapping kernels, the three parts of each DFG correspond to the three sub-components of the reconfigurable unit. Experiments evaluating the degree of parallelism in the DFGs suggest 5.2x to 120.4x speedups on four kernels from common multimedia algorithms.
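The two transformations named in this abstract can be illustrated on a toy kernel. The sketch below is not the paper's algorithm or one of its four kernels; the 3-tap moving sum and both function names are assumptions chosen for illustration, and Python stands in for the C source the paper targets. The transformed version unrolls the loop by two and keeps previously loaded elements in scalars, so each trip performs two new loads instead of six and produces two independent outputs.

```python
def smooth_baseline(x):
    """3-tap moving sum: every iteration reloads three array elements."""
    y = [0] * (len(x) - 2)
    for i in range(len(x) - 2):
        y[i] = x[i] + x[i + 1] + x[i + 2]
    return y


def smooth_transformed(x):
    """Same kernel, unrolled by 2 with scalar replacement of the x[] loads."""
    n = len(x) - 2
    y = [0] * n
    if n <= 0:
        return y
    a, b = x[0], x[1]          # scalars carry loaded values across iterations
    i = 0
    while i + 1 < n:           # unrolled body: two outputs per trip
        c = x[i + 2]           # the only two new loads in this trip
        d = x[i + 3]
        y[i] = a + b + c
        y[i + 1] = b + c + d   # does not depend on y[i], so both sums can run in parallel
        a, b = c, d            # reuse c and d instead of reloading them
        i += 2
    if i < n:                  # remainder iteration when n is odd
        y[i] = a + b + x[i + 2]
    return y
```

Both functions return the same list for any input; the larger loop body of the second one is what gives a data-flow graph more operations to schedule side by side.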
Subword parallelism can efficiently improve the performance of multimedia applications. Two different control mechanisms for subword parallel adder design, carry truncation and carry elimination, are proposed in this paper. The carry truncation mechanism achieves subword partitioning by inserting kill logic into the carry propagation chain, while the carry elimination mechanism employs control logic at the subword boundary bit positions. Based on these two mechanisms, we implement several representative adder algorithms. The experimental results show that, for all the adder algorithms, the proposed carry elimination mechanism has on average 8% less delay than the carry truncation mechanism. However, except for the Kogge-Stone and Brent-Kung adders, the carry elimination mechanism requires more gates and consumes more power than the carry truncation mechanism. This paper also compares the performance of the different adder algorithms.
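The effect a subword parallel adder must achieve, independent sums in each lane with no carry crossing a lane boundary, can be shown with a software analogue. The sketch below is not either of the paper's hardware mechanisms; it is the well-known masking trick for packed addition, and the function names and default widths are assumptions for illustration.

```python
def lane_msb_mask(lane_bits, word_bits):
    """Mask with a 1 at the most significant bit of every lane."""
    mask = 0
    for shift in range(lane_bits - 1, word_bits, lane_bits):
        mask |= 1 << shift
    return mask


def subword_add(a, b, lane_bits=8, word_bits=32):
    """Add packed lanes of a and b; carries never cross a lane boundary."""
    word_mask = (1 << word_bits) - 1
    high = lane_msb_mask(lane_bits, word_bits)
    # Add the lower bits of each lane; the cleared MSB absorbs the carry.
    low = (a & ~high & word_mask) + (b & ~high & word_mask)
    # Restore each lane's MSB as a XOR b XOR carry-in, dropping the carry-out.
    return (low ^ ((a ^ b) & high)) & word_mask
```

With 8-bit lanes, `subword_add(0x00FF00FF, 0x00010001)` wraps each `0xFF + 0x01` lane to zero, whereas the same call with `lane_bits=16` lets the carry move into the upper byte of each 16-bit lane.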
ISBN: 9783642041457 (print)
Many machine vision applications deal with depth estimation in a scene. Disparity map recovery from a stereo image pair has been extensively studied by the computer vision community. Previous methods are mainly restricted to software-based techniques on general-purpose architectures, which have relatively high execution times because of the computationally complex algorithms involved. In this paper, a new hardware module suitable for real-time disparity map computation is realized. It enables a hardware-based, occlusion-aware, parallel-pipelined design, implemented on a single FPGA device with a typical operating frequency of 511 MHz. It provides accurate disparity map computation at a rate of 768 frames per second, given a stereo image pair with a disparity range of 80 pixels and a spatial resolution of 640x480 pixels. The proposed method allows a fast disparity map computation module to be built that is suitable for real-time stereo vision applications.
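The figures quoted in this abstract imply a throughput that can be checked with simple arithmetic. The sketch below uses only the numbers stated above; the assumption that every pixel is compared against all 80 disparity candidates is ours and is not confirmed by the abstract.

```python
WIDTH, HEIGHT = 640, 480          # spatial resolution from the abstract
FRAMES_PER_SECOND = 768           # reported frame rate
DISPARITY_RANGE = 80              # reported disparity range in pixels
CLOCK_HZ = 511_000_000            # reported typical operating frequency

pixels_per_second = WIDTH * HEIGHT * FRAMES_PER_SECOND
# Assumption: each output pixel evaluates every disparity candidate.
candidates_per_second = pixels_per_second * DISPARITY_RANGE

clock_cycles_per_pixel = CLOCK_HZ / pixels_per_second
candidates_per_cycle = candidates_per_second / CLOCK_HZ

print(f"pixels per second:      {pixels_per_second:,}")
print(f"clock cycles per pixel: {clock_cycles_per_pixel:.2f}")
print(f"candidates per cycle:   {candidates_per_cycle:.1f}")
```

Under that assumption the design has to evaluate dozens of disparity candidates in every clock cycle, which is consistent with the parallel-pipelined structure the abstract describes.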
ISBN: 9781424441662 (print)
Real-time simulation is an important issue in the design of power electronic systems, especially in the context of hardware-in-the-loop (HIL) simulation. This paper is concerned with the development of a real-time simulation environment that is low-cost and can be easily set up in an educational laboratory. Any real-time simulation environment needs three essential components: a mechanism to accept a description of the system to be simulated, a digital hardware platform to carry out the simulation, and real-time software to manage the simulation. This paper addresses these three issues from the viewpoint of an educational laboratory setup. Real-time simulation has typically been carried out on complex and specialized multiprocessor systems running dedicated real-time software. However, the current availability of low-cost, high-speed multi-core digital processor systems has made it possible to use standard computing hardware for this purpose. This has been aided further by the availability of real-time operating systems with multi-core execution capability. This paper discusses the issues involved in setting up a real-time simulator based on multi-core processors and presents the details of an educational laboratory setup. As an example, the paper shows the simulation of an induction motor drive system. Experimental plots are presented. A timing analysis of the simulation is also presented, along with timing accuracy measurements.
One of the critical issues in floorplanning is to minimize area and/or wire length of a given design with millions of transistors while considering other factors which may influence the success of design flow or even ...
High-performance, flexible, configurable extract instructions targeted at stream cipher processing are proposed in this paper, based on an analysis of the structures and operating characteristics of more than forty public stream cipher algorithms. The extract instructions are designed to support four different data widths, and ten parallel extract modes are exploited through instruction-level parallelism based on a VLIW architecture. Furthermore, the corresponding reconfigurable hardware circuit is implemented. By configuring the hardware circuit, extraction with different data widths and different parallel modes can be performed efficiently, so the circuit can serve as an important acceleration unit in dedicated stream cipher processing.
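The abstract does not define the extract operation, its four data widths, or its ten parallel modes. One plausible reading, common in stream cipher software, is gathering selected bits of a register (for example, tap positions of a shift register) into a packed value. The sketch below illustrates only that reading; the function name, argument layout, and bit ordering are assumptions and do not describe the proposed instructions.

```python
def extract_bits(state, positions):
    """Gather the bits of `state` at `positions` into a packed integer.

    positions[0] becomes bit 0 of the result, positions[1] becomes bit 1,
    and so on. This ordering is an illustrative choice.
    """
    out = 0
    for k, pos in enumerate(positions):
        out |= ((state >> pos) & 1) << k
    return out


# Hypothetical 8-bit register and tap positions, for illustration only.
register = 0b10110010
taps = [1, 4, 7]
packed = extract_bits(register, taps)
```

In software this loop costs one shift, mask, and OR per selected bit, which is the kind of overhead a dedicated extract unit would remove.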
ISBN: 9780981738178 (print)
We present an efficient implementation of a high-performance parallel framework for Agent Based Modelling (ABM), exploiting the parallel architecture of the Graphics Processing Unit (GPU). It provides a mapping between formal agent specifications, with C-based scripting, and optimised NVIDIA Compute Unified Device Architecture (CUDA) code. The mapping of agent data structures and agent communication is described, and our work is evaluated through a number of simple interacting agent examples. Compared with an alternative single-machine CPU implementation, a speedup of up to 250 times is reported.
ISBN: 9781424451081 (print)
The PermaSense project has set the ambitious goal of gathering real-time environmental data for high-mountain permafrost in unattended operation over multiple years. This paper discusses the specialized sensing and data recovery architecture tailored to meet the precision, reliability and durability requirements of scientists utilizing the data for model validation. We present a custom sensor interface board including specialized sensors and redundancy features for end-to-end data validation. Aspects of high-quality data acquisition, design for reliability by strict separation of operating phases, and analysis of energy efficiency are discussed. The system integration using the Dozer protocol scheme achieves a best-in-class average power consumption of 148 µA, considerably exceeding the lifetime requirement.
ISBN: 9783642049293 (print)
As the amount of available RDF data continues to increase steadily, there is growing interest in developing efficient methods for analyzing such data. While recent efforts have focused on developing efficient methods for traditional data processing, analytical processing, which typically involves more complex queries, has received much less attention. The use of cost-effective parallelization techniques such as Google's Map-Reduce offers significant promise for achieving Web-scale analytics. However, currently available implementations are designed for simple data processing on structured data. In this paper, we present a language, RAPID, for scalable ad-hoc analytical processing of RDF data on Map-Reduce frameworks. It builds on Yahoo's Pig Latin by introducing primitives based on a specialized join operator, the MD-join, for expressing analytical tasks in a manner that is more amenable to parallel processing, as well as primitives for coping with the semi-structured nature of RDF data. Experimental evaluation results demonstrate significant performance improvements for analytical processing of RDF data over existing Map-Reduce based techniques.
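The MD-join mentioned above can be sketched in a few lines. The version below follows the usual relational description of the operator (a base table, a detail table, a set of aggregates, and a matching condition) and is a single-machine illustration only; it does not show RAPID's syntax or its Map-Reduce execution, and the function signature, triple layout, and sample data are assumptions.

```python
def md_join(base_rows, detail_rows, aggregates, theta):
    """Extend each base row with aggregates over its matching detail rows.

    base_rows:   list of dicts, one per group of interest
    detail_rows: list of dicts to aggregate over
    aggregates:  dict mapping output column name -> function(list of rows)
    theta:       predicate theta(base_row, detail_row) -> bool
    """
    result = []
    for b in base_rows:
        matches = [r for r in detail_rows if theta(b, r)]
        out = dict(b)
        for name, fn in aggregates.items():
            out[name] = fn(matches)
        result.append(out)
    return result


# Hypothetical RDF-like triples (subject, predicate, object).
triples = [
    {"s": "ex:alice", "p": "ex:spent", "o": 30},
    {"s": "ex:alice", "p": "ex:spent", "o": 12},
    {"s": "ex:bob", "p": "ex:spent", "o": 7},
    {"s": "ex:bob", "p": "ex:knows", "o": "ex:alice"},
]
subjects = [{"s": "ex:alice"}, {"s": "ex:bob"}, {"s": "ex:carol"}]

report = md_join(
    subjects,
    triples,
    {
        "purchases": len,
        "total": lambda rows: sum(r["o"] for r in rows),
    },
    lambda b, r: r["s"] == b["s"] and r["p"] == "ex:spent",
)
```

Because every base row is processed independently, the outer loop is the natural unit to distribute, and a base row with no matching triples (here `ex:carol`) still appears in the output with empty aggregates.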