ISBN: 1932415424 (print)
The concepts and options for configuration management (CM) within CARMA (Comprehensive Approach to Reconfigurable Management Architecture), a new framework under development at Florida, are discussed. The CM in CARMA uses a Board Interface Module (BIM) to provide abstracted user access to RC boards and hardware independence to higher layers. Four distributed CM schemes are also proposed: master-worker (MW), client-server (CS), client-broker (CB), and peer-to-peer (PP). On a cache miss, the local CM communicates with one or more remote CMs to obtain the configuration over the data network. CMs also manage FPGA resource access, configuration relocation, transformation, defragmentation, and caching.
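The cache-miss behavior described in this abstract can be illustrated with a minimal sketch. It is not CARMA's implementation: the class `ConfigManager`, its methods, and the `remote_fetches` counter are hypothetical names chosen for illustration, and the sketch does not distinguish between the MW, CS, CB, and PP schemes. It only shows a local manager that serves a configuration from its cache and, on a miss, asks its remote managers in turn.

```python
class ConfigManager:
    """Hypothetical configuration manager with a local cache (illustration only)."""

    def __init__(self, name, store=None, peers=None):
        self.name = name
        self.cache = dict(store or {})   # configuration id -> bitstream bytes
        self.peers = list(peers or [])   # remote managers reachable over the network
        self.remote_fetches = 0          # number of configurations fetched remotely

    def lookup(self, config_id):
        """Answer a remote request from the local cache only (None on a miss)."""
        return self.cache.get(config_id)

    def get(self, config_id):
        """Return a configuration, asking remote managers on a cache miss."""
        if config_id in self.cache:
            return self.cache[config_id]          # cache hit
        for peer in self.peers:                   # cache miss: query remotes in order
            data = peer.lookup(config_id)
            if data is not None:
                self.remote_fetches += 1
                self.cache[config_id] = data      # keep a local copy for next time
                return data
        raise KeyError(config_id)                 # no manager holds this configuration


remote = ConfigManager("remote", store={"fft": b"bits"})
local = ConfigManager("local", peers=[remote])
first = local.get("fft")    # miss: fetched from the remote manager
second = local.get("fft")   # hit: served from the local cache
```

In this sketch the second request does not touch the network, which is the benefit the caching role of the CM is meant to provide.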
ISBN: 1932415742 (print)
Gillespie's Stochastic Simulation Algorithm (SSA) is an important tool for accurately modeling spatially homogeneous chemical, biological, and ecological systems. The SSA provides an accurate representation of models affected by the presence of stochastic fluctuations or noise, and properly models systems with small populations of chemical species. Unfortunately, the utility of the SSA is limited by its high computational complexity and long simulation time for large-scale systems. Presented here is a novel approach for improving the performance of the SSA using a custom hardware accelerator implemented in a Field Programmable Gate Array (FPGA). The performance of software and hardware-accelerated implementations is compared on identical chemically reacting systems. The measurements demonstrate that the hardware-accelerated system shows around a 1.5x improvement over the fastest known software implementation. Furthermore, the work demonstrates the applicability of a reconfigurable computing approach to accelerating stochastic simulation.
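For readers unfamiliar with the SSA, the following is a minimal software sketch of Gillespie's direct method. It is not the accelerator described in the abstract, nor the software baseline it was compared against; the function name `gillespie_direct`, its parameters, and the single-species decay example are assumptions made for illustration.

```python
import math
import random


def gillespie_direct(x0, stoich, propensity, t_end, rng):
    """Simulate one trajectory with Gillespie's direct method.

    x0         -- initial species counts
    stoich     -- one state-change vector per reaction
    propensity -- function mapping the state to a list of reaction propensities
    t_end      -- simulation stops once the next event would occur after t_end
    rng        -- random.Random instance
    """
    t = 0.0
    x = list(x0)
    trajectory = [(t, tuple(x))]
    while True:
        a = propensity(x)
        a0 = sum(a)
        if a0 <= 0.0:
            break                                   # no reaction can fire
        # Waiting time to the next reaction is exponential with rate a0.
        t += -math.log(1.0 - rng.random()) / a0
        if t > t_end:
            break
        # Pick reaction j with probability a[j] / a0.
        threshold = rng.random() * a0
        acc = 0.0
        for j, aj in enumerate(a):
            acc += aj
            if threshold < acc:
                break
        x = [xi + d for xi, d in zip(x, stoich[j])]
        trajectory.append((t, tuple(x)))
    return trajectory


# Hypothetical example: decay A -> (nothing) with rate constant 1.0, 10 molecules.
decay = gillespie_direct([10], [[-1]], lambda x: [1.0 * x[0]], 1e9, random.Random(0))
```

Each iteration draws two random numbers and recomputes every propensity, which is the per-event cost that makes large systems slow in software.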
ISBN: 1932415742 (print)
In this paper, an original method for the synthesis of one part of a block turbo decoder is presented. Starting from abstract specifications, the Madeo framework proposes an object-oriented method to synthesize them onto FPGA structures. The method relies on the capability of the Madeo framework to manage high-level expressions. The paper shows how this can be achieved in the case of a turbo decoder, especially for the syndrome computation and the search for the least reliable symbols. This kind of application is a good candidate for Madeo because it deals with non-standard arithmetic (Galois field GF(128)). The synthesis results using this approach are compared to those from a more traditional design using SystemC.
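To make the "non-standard arithmetic" concrete, here is a minimal sketch of GF(128) multiplication and a syndrome computation built on it. The abstract does not state the field polynomial, the code, or the decoder's data format, so everything specific here is an assumption: the primitive polynomial x^7 + x + 1, the primitive element alpha = 2, a binary code with hard-decision bits, and the function names `gf128_mul`, `gf128_pow`, and `syndrome`.

```python
def gf128_mul(a, b, poly=0x83):
    """Multiply two GF(2^7) elements; poly = x^7 + x + 1 is an assumed choice."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1                  # multiply a by x
        if a & 0x80:
            a ^= poly            # reduce modulo the field polynomial
    return result


def gf128_pow(a, n):
    """Raise a to the n-th power by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf128_mul(result, a)
    return result


def syndrome(bits, j, alpha=2):
    """S_j = sum over set bits i of alpha^(i*j), for a hard-decision binary word."""
    s = 0
    for i, bit in enumerate(bits):
        if bit:
            s ^= gf128_pow(alpha, (i * j) % 127)   # alpha has order 127
    return s


# Hypothetical received word of length 127 with a single bit set at position 5.
word = [0] * 127
word[5] = 1
s1 = syndrome(word, 1)   # equals alpha^5 for a single set bit
```

Operations like these map poorly onto ordinary integer arithmetic units, which is consistent with the abstract's argument for synthesizing them from high-level expressions.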
ISBN: 9781932415742 (print)
In this paper, we present a hardware-based dynamic scheduling technique that can be used to schedule irregularly structured task systems onto a Dynamically reconfigurable Hybrid Architecture (DHRA). The microarchitecture that implements this technique is presented along with the major characteristics of the scheduling approach. To study the effectiveness of the scheduling technique, we have developed a functional simulation model for the proposed hybrid architecture as well as the hardware-based task scheduler. Empirical results from the resulting simulations indicate that the proposed method can reduce application execution time by an average of 17% compared to a previous method in cases where the queue search depth is small.
ISBN: 1932415742 (print)
It has been observed that fine-grained filter structures are mapped efficiently to XPP, but the compiler is unable to decompose the arrays declared within the filter body in StreamIt code, which results in the generation of extra event control logic and makes it difficult for the XPP mapper to route the application. Another observation is that XPP is suitable for arithmetic operations, but the conditional constructs in the StreamIt code are not mapped efficiently to XPP. When XPP acts as a co-processor, with a host processor controlling the application flow and assigning computational tasks to XPP, a partitioner can be developed to divide the StreamIt-generated code of the application between the host processor and the XPP. Another approach to compilation for XPP is to generate NML code directly from the SIR representation.
ISBN: 1932415424 (print)
The current paradigm shift in signal processing system design favors the deployment of heterogeneous systems that may combine DSPs, FPGAs and other digital processing elements. Consequently, there has been an expanded interest in design exploration methodologies to explore the design space for optimal mixed-signal signal processing designs. The emergence of field programmable analog arrays (FPAAs) has provided the resources to realize these methodologies. This paper presents a multi-objective system level synthesis methodology that explores the design space for mixed-signal signal processing solutions.
ISBN: 1932415424 (print)
This paper proposes a novel interconnection structure for reconfigurable cell arrays used in digital signal processing applications. The structure provides two routing mechanisms: local connections between neighboring cells, and a global H-tree linking every cell in the array. This scheme exploits the symmetry and modularity of digital signal processing algorithms, and allows users to map critical computations by hand if desired. The H-tree contains a hierarchy of crossbar switches; switches on upper levels manipulate data in larger units than switches on lower levels. The structure is also pipelined for high throughput. A 32-bit, 512-point Fast Fourier Transform is routed by hand onto the cell array to demonstrate the efficiency of the design.
ISBN: 1932415424 (print)
The enumeration of design options for 2D fast Fourier transforms (FFTs), based on multiple copies of existing 1D FFT IP cores, is discussed. An optimal design that maximizes the throughput subject to hardware resource constraints is selected. The properties of optimal designs are summarized to illustrate the relationship between the number of memory ports and throughput. Each design option is identified by six numbers: three for the first stage and three for the second stage. Each group of three numbers gives the number of FFT IP cores, the IP architecture, and the memory mode of the phase factors. The IPs in each stage are assumed to be of the same type (AR and MMph) so as to simplify the controller complexity.
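The six-number encoding of a design option can be pictured as a small data structure. This is an illustrative sketch only: the type names `StageOption` and `DesignOption`, the field names, and the integer codes used for architecture and memory mode are hypothetical, since the abstract does not list the actual values.

```python
from typing import NamedTuple


class StageOption(NamedTuple):
    """Three numbers describing one stage (the codes are hypothetical placeholders)."""
    num_cores: int          # number of 1D FFT IP cores in this stage
    architecture: int       # code for the IP architecture (AR in the abstract)
    phase_memory_mode: int  # code for the phase-factor memory mode (MMph)


class DesignOption(NamedTuple):
    """A 2D FFT design option: one StageOption per stage."""
    stage1: StageOption
    stage2: StageOption

    def flatten(self):
        """Return the six identifying numbers, first stage first."""
        return tuple(self.stage1) + tuple(self.stage2)


# Hypothetical option: 4 cores in stage one, 2 in stage two, same codes in both.
option = DesignOption(StageOption(4, 1, 0), StageOption(2, 1, 0))
six_numbers = option.flatten()
```

Because every IP within a stage shares one architecture and memory mode, a single triple per stage is enough, which keeps the enumeration space small.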
ISBN: 1932415742 (print)
A lookup table (LUT) based packing mechanism is presented as part of the application-specific reconfigurable architecture design methodology proposed in earlier work. In addition to the routability-driven cost metrics defined by other researchers, the packing mechanism prioritizes nets that lead to a reduction of input/output pins and that are within the fan-in/fan-out distance. Existing approaches treat the number of intersecting nets as a positive gain during packing and ignore the growth in wiring requirements within the cluster being formed. The new packing algorithm employs an average interconnection length estimate obtained through Rent's Rule in order to incorporate the wiring requirement into the cost function. This approach provides on average a 28% reduction in the number of nets and a 26% reduction in the number of tracks used compared to state-of-the-art approaches. The results will lead to significant savings in switching complexity; hence they contribute to a reduction in power consumption and an increase in processing speed.
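As background, Rent's Rule relates the number of external terminals T of a block of logic to the number of elements g it contains: T = t * g^p, where t is the average number of terminals per element and p is the Rent exponent. The sketch below evaluates only this basic relation. It does not reproduce the paper's interconnection length estimate or its cost function, and the function name `rent_terminals` and the sample values are assumptions.

```python
def rent_terminals(num_elements, terminals_per_element, rent_exponent):
    """Estimate external terminals of a cluster with Rent's Rule: T = t * g**p."""
    return terminals_per_element * num_elements ** rent_exponent


# Hypothetical cluster sizes with 4 terminals per element and exponent 0.5.
estimates = [rent_terminals(g, 4, 0.5) for g in (1, 4, 16, 64)]
```

With an exponent below 1, terminal demand grows more slowly than cluster size, which is why packing decisions that account for it can reduce the wiring exposed outside each cluster.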
This paper presents a novel instruction cell-based reconfigurable computing architecture for low-power applications, hereafter referred to as the reconfigurable instruction cell array (RICA). For the development of the RICA, a top-down, software-driven approach was taken, which proved to be one of the key design decisions for a flexible, easy-to-program, low-power architecture. These features make RICA an architecture that inherently meets the main design requirements of modern low-power devices. Results show that it consumes considerably less power than leading VLIW and low-power digital signal processors while still maintaining their throughput performance.