ISBN: 1932415424 (print)
Advanced multimedia applications (e.g. based on MPEG-4) will consist of multiple scalable multimedia objects. This scalability enables the application to adapt to different processing capabilities at the end-user platforms, hence providing the user with a certain Quality of Service (QoS). Moreover, thanks to the current evolution in run-time reconfigurable computing platforms, reconfigurability is becoming increasingly viable and supports adaptation to the varying complexity of the applications. This paper describes the problem of mapping HW/SW tasks of scalable applications onto run-time reconfigurable platforms, built upon an instruction set processor and run-time reconfigurable hardware, in such a way that the overall quality of service of the applications is maximised. This QoS-aware hardware/software (HW/SW) partitioning problem is formulated as an NP-hard optimisation problem, for which an approximate solution using a two-step heuristic algorithm is proposed. Experimental results on a prototype platform show the importance of QoS requirements in taking run-time HW/SW partitioning decisions.
ISBN: 193241505X (print)
The long-term goal of our work described in this paper is the development of a biologically-inspired cellular fault-tolerant hardware system. The basic structure of the system is a multi-cellular embryonic array that is capable of achieving self-diagnostics, self-repair and fault recovery. The 'nucleolus' of the cell is a general-purpose function unit comprising a 2-to-1 multiplexer and a D-type flip-flop. Input and configuration data to the function unit are provided by the DNA segment memory and I/O router. A diagnostic logic monitors the error-free operation of the cell. When an error is detected, the diagnostic logic requests the reconfiguration unit to kill the cell, transferring its function to a fault-free neighbouring cell. Once permission is granted, the faulty cell is eliminated and becomes transparent. The functionality of each cell will shift in the array until a spare cell is found. Finally, the whole embryonic array recovers. Fault-free operation is then continued.
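The recovery behaviour described in this abstract can be illustrated with a small behavioural model. This is a minimal sketch, not the authors' hardware design: the names `remap_functions` and `function_unit` are hypothetical, the array is assumed to be one-dimensional with spare cells at the end, and the flip-flop is assumed to register the multiplexer output.

```python
def remap_functions(num_functions, cell_ok):
    """Assign logical functions 0..num_functions-1 to fault-free cells in order.

    Faulty cells are skipped (they become "transparent"), so every function
    after a fault shifts along the array until a spare cell absorbs the shift.
    Returns a list mapping each physical cell to a function index or None.
    """
    mapping = [None] * len(cell_ok)
    next_function = 0
    for cell, ok in enumerate(cell_ok):
        if ok and next_function < num_functions:
            mapping[cell] = next_function
            next_function += 1
    if next_function < num_functions:
        raise RuntimeError("not enough fault-free cells: array cannot recover")
    return mapping


def function_unit(a, b, select, state):
    """One clock tick of a cell: a 2-to-1 multiplexer feeding a D-type flip-flop.

    Assumption: the flip-flop sits at the multiplexer output.
    Returns (registered output seen this cycle, new flip-flop state).
    """
    mux_out = b if select else a
    return state, mux_out


if __name__ == "__main__":
    cells = [True] * 5                 # four working cells plus one spare
    print(remap_functions(4, cells))   # [0, 1, 2, 3, None]
    cells[1] = False                   # diagnostic logic flags cell 1 as faulty
    print(remap_functions(4, cells))   # [0, None, 1, 2, 3]
```

In this model the faulty cell keeps its physical position but carries no function, which corresponds to the "transparent" state in the abstract; recovery fails only when faults outnumber spares.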
This paper describes a reconfigurable architecture based on field-programmable gate-array (FPGA) technology for monitoring and analyzing network traffic at increasingly high network data rates. Our approach maps the performance-critical tasks of packet classification and flow monitoring into reconfigurable hardware, such that multiple flows can be processed in parallel. We explore the scalability of our system, showing that it can support flows at multi-gigabit rates; this is faster than most software-based solutions, where acceptable data rates are typically no more than 100 million bits per second.
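As a software reference for the two tasks named in this abstract, the sketch below classifies packets into flows and keeps per-flow counters. It is an illustrative assumption, not the paper's hardware architecture: the 5-tuple flow key, the `FlowMonitor` class and the packet dictionary layout are all hypothetical.

```python
from collections import defaultdict, namedtuple

# Assumed flow definition: the conventional 5-tuple.
FlowKey = namedtuple("FlowKey", "src_ip dst_ip src_port dst_port protocol")


class FlowMonitor:
    """Software model of packet classification plus per-flow statistics."""

    def __init__(self):
        # One counter record per flow, created on first sight of the flow.
        self.stats = defaultdict(lambda: {"packets": 0, "bytes": 0})

    def classify(self, packet):
        """Map a packet (a dict of header fields) to its flow key."""
        return FlowKey(
            packet["src_ip"],
            packet["dst_ip"],
            packet["src_port"],
            packet["dst_port"],
            packet["protocol"],
        )

    def observe(self, packet):
        """Update the counters of the flow that this packet belongs to."""
        key = self.classify(packet)
        entry = self.stats[key]
        entry["packets"] += 1
        entry["bytes"] += packet["length"]
        return key
```

Because each flow only touches its own counter record, updates to different flows are independent of one another, which is the property that lets a hardware implementation process multiple flows in parallel.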
ISBN: 193241505X (print)
The routing architecture of a reconfigurable device is the single most important factor in determining both logic density and overall performance. Previous research efforts have focused on minimising interconnect delays and area requirements. For traditional single-context configuration subsystems this is acceptable, but as dynamic reconfiguration becomes increasingly important, the impact of routing architecture on reconfiguration overheads must be taken into account. This paper presents a detailed, simulation-based study of the relationship between routing architectures and reconfiguration overheads, and investigates the benefits of partitionable semi-global interconnect and just-in-time reconfiguration, a novel block-oriented configuration control strategy. Results indicate that changes in connection block design, coupled with more flexible long-distance interconnect, can dramatically reduce overheads in systems which exploit dynamic reconfiguration.
ISBN: 1932415424 (print)
Platform FPGAs incorporate many different components, such as processor core(s), reconfigurable logic, memory, etc., onto a single chip. When an application is synthesized on platform FPGAs, part of it can be executed using hardware implementations on the FPGA or software implementations on the processor core(s). As the connections between different components on the devices are realized using FPGA routing resources, the designer has many choices for configuring the hardware components to execute the software. We show that these design choices have a profound impact on the energy performance of the software programs. We propose a hybrid design approach for energy-efficient application synthesis on platform FPGAs. It consists of a bottom-up process, which performs simulation-based performance modeling, and a top-down process, which performs analytical performance optimization. The execution of an FFT software program on a state-of-the-art platform FPGA under various hardware choices is used to illustrate the bottom-up process. For the top-down process, we map a beamforming application onto hardware and software components based on the results from the bottom-up process. Energy reduction of up to 46% is observed for the beamforming application using the proposed design approach.
ISBN: 1932415424 (print)
Recently, energy dissipation for computations on FPGAs has become an important performance metric. In this paper, we summarize our recent efforts in developing an algorithm-level design methodology for optimizing the energy performance of FPGA-based implementations. For kernels, our design methodology consists of four steps: domain selection, domain-specific energy modeling, domain-space exploration and low-level simulation. To achieve system-level energy efficiency, we outline a design methodology that integrates the kernel-level design methodology. Both design methodologies can be used to achieve not only energy efficiency but also latency, area, and power efficiency. We consider signal processing kernels as illustrative examples and demonstrate energy- and time-efficient algorithms and implementations for these on FPGAs. Examples of energy performance gains obtained through algorithmic optimizations include a 29%-51% improvement for a matrix multiplication kernel, a 57%-78% improvement for an FFT kernel and a 10%-60% improvement for a floating-point LU decomposition kernel over state-of-the-art implementations. Similarly, an improvement of 41% to 46% in energy performance was achieved by the system-level design approach over a greedy approach for an MVDR adaptive beamforming application. Finally, we briefly describe a high-level tool for obtaining parameterized and energy-efficient designs on FPGAs.
ISBN: 9781932415742 (print)
Many scientific and engineering applications are data intensive. Their data-flow is often the performance-limiting factor. When these applications are implemented on a reconfigurable system, the use of long routing resources usually limits the performance. This paper compares cellular design methods for reconfigurable devices which reduce the need for long paths. Two different approaches are presented: a heterogeneous approach with centralised control, and a homogeneous approach with distributed control, both performing full-search motion vector estimation. Our heterogeneous approach, which is based around the NIOS II processor, is easy to develop; when extended with custom instructions, it can process a 16 by 16 pixel macroblock in about 23 msec and uses 5099 LEs (Logic Elements). Our homogeneous approach, which contains one or more copies of an optimised cell and a system control block, can process a macroblock with a larger search area in 4.6 msec using only 950 LEs. Additionally, the homogeneous approach can exploit more parallelism and shows nearly linear scaling to cope with larger frame sizes, search areas and frame rates.
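Both designs in this abstract compute full-search motion vector estimation. The following is a minimal software sketch of that computation, assuming the common sum-of-absolute-differences (SAD) matching cost; the abstract does not state the cost function, and the function names and the default search range are illustrative only.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n-by-n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n=16, search=4):
    """Exhaustively test every displacement within +/-search pixels.

    Frames are lists of rows of integer pixel values.
    Returns (dx, dy, cost) for the lowest-cost candidate, or None if no
    candidate block lies fully inside the reference frame.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            # Skip candidates that fall partly outside the reference frame.
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    if best is None:
        return None
    return best[1], best[2], best[0]
```

Every candidate displacement is evaluated independently of the others, so the work grows with the square of the search range but parallelises naturally, which is consistent with the scaling behaviour reported for the homogeneous approach.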
ISBN: 193241505X (print)
In this paper, we present a unified estimation technique to find lower bounds on the number of LUT blocks and the number of micro-registers that can be obtained by any partitioning or synthesis method, respectively, without performing any actual synthesis or design space exploration. Lower bound estimation is very important in the sense that it greatly helps to evaluate the results of previous work and even of future work. Some experimental results on lower bound estimation are shown and compared with previous results published in the literature.
ISBN: 193241505X (print)
The 2-D discrete cosine transform (DCT) is an integral part of video and image processing; it is used in both the JPEG and MPEG encoding standards. As streaming video is brought to mobile devices, it becomes important to be able to calculate the DCT in an energy-efficient manner. In this paper, we present a new algorithm and processing element (PE) architecture for computing the DCT with a linear array of PEs. This design is optimized for energy efficiency. We analyze the energy, area, and latency tradeoffs available with this design and then compare its energy dissipation, area, and latency to those of Xilinx's optimized IP core.
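For reference, the 2-D DCT itself can be computed with the textbook row-column decomposition shown below. This is not the new algorithm proposed in the paper, which the abstract does not describe; it is a plain orthonormal DCT-II sketch with illustrative function names, included only to make the transform concrete.

```python
import math


def dct_1d(vector):
    """Orthonormal 1-D DCT-II of a sequence of numbers."""
    n = len(vector)
    result = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, v in enumerate(vector)
        )
        result.append(scale * total)
    return result


def dct_2d(block):
    """2-D DCT via row-column decomposition: transform every row, then
    every column of the intermediate result."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # `cols` is indexed [horizontal frequency][vertical frequency];
    # transpose back so the result is indexed [vertical][horizontal].
    return [list(row) for row in zip(*cols)]
```

With this scaling, a constant 8 by 8 block of ones transforms to a single DC coefficient of 8 with all other coefficients zero up to rounding error. The separable structure means the same 1-D transform is reused for rows and columns.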
In this paper, a new project named Context Switching Reconfigurable Hardware for Communication Systems (COSRECOS) is introduced. The project started in autumn 2009 and consists of applying reconfigurable hardware technology (Field Programmable Gate Arrays, FPGAs) to the design of high-performance run-time reconfigurable computing architectures for communication systems. The overall goal of the project is to contribute to making run-time reconfigurable systems more feasible in general. This includes introducing architectures for reducing reconfiguration time as well as undertaking tool development. Case studies of applications in network and communication systems will be a part of the project. The paper describes how we plan to address the challenge of changing hardware configurations while a system is in operation. An overview of promising initial approaches is also included.
Proceedings of the 2011 International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA'11, ISBN: 1-60132-177-5), Editor: Toomas P. Plaks. Associate Editors: Shiu-Kai Chin, Pedro C. Diniz, William L. Harrison, Roman Lysecky, pp. 255-262, Las Vegas, USA, 2011. ERSA'11, The International Conference on Engineering of Reconfigurable Systems and Algorithms: http://***/ersa11