ISBN (print): 9783031150746; 9783031150739
Functional languages used as input specifications for HLS tools can express data dependencies but contain no notion of time or execution order. In this paper, we propose a method to add this notion to the functional description using the dataflow model SDF-AP. SDF-AP consists of patterns that express consumption and production, which we can use to enforce resource usage. We created an HLS tool that can synthesize parallel hardware, both data path and control path, based on repetition expressed in higher-order functions, combined with specified SDF-AP patterns. Our HLS tool, based on Template Haskell, generates an Abstract Syntax Tree from the given patterns and the functional description, and uses the Clash compiler to generate VHDL/Verilog. Case studies show consistent resource consumption and temporal behavior for our HLS tool. A comparison with a commercially available HLS tool shows that our tool outperforms it in terms of latency and sometimes in resource consumption. The method and tool presented in this paper offer more transparency to the developer and allow the synthesized hardware to be specified more accurately than is possible with the pragmas of the Vitis HLS tool.
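As a loose illustration of the idea in this abstract (not the authors' tool, which is built on Template Haskell and Clash), the Python sketch below shows how a consumption pattern could attach timing to a repetition expressed with a higher-order function. The function name `schedule_map`, the list-of-counts encoding of a pattern, and the "units = largest entry" resource estimate are all assumptions made for this example.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def schedule_map(
    f: Callable[[A], B], xs: Sequence[A], pattern: Sequence[int]
) -> Tuple[List[List[B]], int]:
    """Apply f to every element of xs, consuming pattern[i] elements in cycle i.

    Hypothetical encoding: pattern is a list of per-cycle token counts.
    Returns the per-cycle results and the number of parallel copies of f
    needed, taken here to be the largest entry of the pattern.
    """
    if sum(pattern) != len(xs):
        raise ValueError("pattern must consume every input exactly once")
    cycles: List[List[B]] = []
    pos = 0
    for count in pattern:
        # All elements consumed in the same cycle are processed in parallel.
        cycles.append([f(x) for x in xs[pos:pos + count]])
        pos += count
    units = max(pattern, default=0)
    return cycles, units


def square(x: int) -> int:
    return x * x


# One element per cycle: four cycles, one function unit.
serial, serial_units = schedule_map(square, [1, 2, 3, 4], [1, 1, 1, 1])
# All four elements in one cycle: one cycle, four function units.
parallel, parallel_units = schedule_map(square, [1, 2, 3, 4], [4])
```

The same functional description (`map square`) yields a slow, small schedule or a fast, large one depending only on the pattern, which is the trade-off the abstract describes in terms of latency and resource consumption.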
ISBN (print): 3540296433
To efficiently utilize the functionality of dynamic reconfigurable computing systems, it is imperative that a software-oriented approach for modeling complex hardware/software systems be adopted. To achieve this, we need to enhance the simulation environment to understand these dynamic reconfiguration requirements during system-level modeling. We present a flexible simulation model which is able to produce multiple views/contexts for a given problem. We use this model to examine each view for the mapping space, bandwidth, reconfiguration requirements, and configuration patterns of each computational problem.
The emergence of programmable logic devices as single-chip heterogeneous processing platforms for digital signal processing applications poses challenges concerning rapid implementation and high-level optimisation of algorithms on these platforms. This paper describes Abhainn, a rapid implementation methodology and toolsuite for translating an algorithmic expression of the system to a working solution on FPGA-centric embedded platforms. Two particular focuses for Abhainn are the automated but configurable realisation of inter-processor communication fabrics, and the establishment of novel dedicated hardware component design methodologies allowing algorithm-level transformation for system optimisation. This paper outlines the approaches employed in these areas and demonstrates their effectiveness on high-end signal processing beamforming applications. (c) 2007 Elsevier B.V. All rights reserved.
ISBN (print): 3540364102
We introduce low-overhead power optimization techniques to reduce leakage power in embedded processors. Our techniques improve on previous work by a) taking into account the idle time distribution of different execution units, and b) using instruction decode and control dependencies to wake up the gated (but needed) units as soon as possible. We take into account the idle time distribution per execution unit to detect an idle period as soon as possible. This in turn increases our leakage power savings. In addition, we use information already available in the processor to predict when a gated execution unit will be needed again. This results in early and less costly reactivation of gated execution units. We evaluate our techniques for a representative subset of MiBench benchmarks and for a processor using a configuration similar to Intel's XScale processor. We show that our techniques reduce leakage power considerably while maintaining performance.
ISBN (print): 354026969X
In this paper, we investigate the collapsing of eight multi-operand addition related operations into a single, common (3:2) counter array. We consider for this unit multiplication in integer and fractional representations, and the Sum of Absolute Differences (SAD) in unsigned, signed magnitude and two's complement notation. Furthermore, the unit also incorporates a Multiply-Accumulate (MAC) unit for two's complement notation. The proposed multiple operation unit was constructed around 10 element arrays that can be reduced using well-known counter techniques, and which are fed with the necessary data to perform the proposed eight operations. It is estimated that 6/8 of the basic (3:2) counter array is shared by the operations. The results obtained for the presented unit indicate that it is capable of processing a 4x4 SAD macro-block in 36.35 ns and takes 30.43 ns to process the rest of the operations using a Virtex-II Pro xc2vp100-7ff1696 FPGA device.
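For readers unfamiliar with the building block named in this abstract, the Python sketch below models a word-wide (3:2) counter (a carry-save adder) and uses it to reduce the sixteen absolute differences of a 4x4 SAD block. It is a software illustration only, assuming non-negative integer inputs; the function names are hypothetical and it does not reproduce the paper's shared array or its signed number formats.

```python
from typing import Iterable, List, Sequence, Tuple


def counter_3_2(a: int, b: int, c: int) -> Tuple[int, int]:
    """Bitwise (3:2) counter: compress three words into a sum and a carry word.

    For non-negative integers, a + b + c == partial_sum + carry.
    """
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum, carry


def multi_operand_add(operands: Iterable[int]) -> int:
    """Reduce many operands with (3:2) counters, then do one final addition."""
    terms: List[int] = list(operands)
    while len(terms) > 2:
        a, b, c = terms.pop(), terms.pop(), terms.pop()
        # Three operands go in, two come out, so the list shrinks by one.
        terms.extend(counter_3_2(a, b, c))
    return sum(terms)  # final carry-propagate addition of at most two terms


def sad_4x4(block_a: Sequence[Sequence[int]], block_b: Sequence[Sequence[int]]) -> int:
    """Sum of Absolute Differences over two equally sized unsigned blocks."""
    diffs = [
        abs(x - y)
        for row_a, row_b in zip(block_a, block_b)
        for x, y in zip(row_a, row_b)
    ]
    return multi_operand_add(diffs)


reference = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
zeros = [[0] * 4 for _ in range(4)]
sad = sad_4x4(reference, zeros)
```

Because multiplication, SAD and MAC all end in a multi-operand addition of this kind, one counter array can in principle serve all of them, which is the sharing the abstract estimates at 6/8 of the array.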
When designing DSP applications for implementation on field programmable gate arrays (FPGAs), it is often important to minimize consumption of limited FPGA resources while satisfying real-time performance constraints....
The work presents a modeling and analysis framework for heterogeneous industrial networks architectures which is based on a tight integration of a network simulator with embedded software, middleware and a real-time o...
This paper describes an energy-aware methodology that identifies custom instructions for critical code segments, given the available data bandwidth constraint between custom logic and a base processor. Our approach en...
Designers of the upcoming digital-centric More-than-Moore systems are lacking a common design and simulation environment able to efficiently manage all the multi-disciplinary aspects of its components of various natur...
In this paper, a flexible HW architecture for video-based driver assistance applications is presented. It comprises a customizable and extensible processor template and several task-specific HW accelerators. The propo...