ISBN: 9781605589114 (print)
Various multimedia communication systems based on 3D-Audio algorithms have been proposed by researchers from the acoustic data processing domain. However, all systems reported in the literature follow a PC-based approach that introduces processing bottlenecks and excessive power consumption. To alleviate these problems, we propose a reconfigurable 3D-Audio processor that can record and render sound sources concurrently. Audio recording and rendering are performed by two hardware accelerators that exploit the beamforming and Wave Field Synthesis algorithms. The theoretical scalability of the proposed processor is explored for systems with different microphone and loudspeaker array configurations. A working FPGA prototype is compared against a software implementation on a Core2 Duo system. Results suggest that the proposed reconfigurable hardware solution can process data up to 2.4x faster than the software approach, while power consumption is approximately 7 W according to the Xilinx XPower report.
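The abstract names beamforming as the recording algorithm but does not say which variant is used. As a rough illustration only, the sketch below shows delay-and-sum beamforming, the simplest form, on a toy signal. The function names, the integer sample delays, and the three-microphone input are all assumptions made for this example and are not taken from the paper.

```python
def delay_and_sum(signals, delays):
    """Average the channels after advancing each one by its sample delay.

    signals: list of equal-length sample lists, one per microphone.
    delays:  per-microphone arrival delay in whole samples
             (hypothetical values, assumed known in advance).
    """
    n = len(signals[0])
    out = [0.0] * n
    for channel, d in zip(signals, delays):
        # Advance the channel by d samples so all copies line up.
        for t in range(n - d):
            out[t] += channel[t + d]
    return [v / len(signals) for v in out]


def shifted(samples, d):
    """Return samples delayed by d samples, zero-padded at the front."""
    return [0.0] * d + samples[:len(samples) - d]


# Toy input: one pulse reaches three microphones 0, 2 and 4 samples late.
pulse = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
mics = [shifted(pulse, d) for d in (0, 2, 4)]

aligned = delay_and_sum(mics, [0, 2, 4])    # steered at the source
unaligned = delay_and_sum(mics, [0, 0, 0])  # no steering
```

With the correct delays the three copies of the pulse add coherently, so the peak keeps its full amplitude; without steering the energy is spread over three separate samples.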
ISBN: 9781450301466 (print)
Memories play a key role in FPGAs in the forms of both programming bits and embedded memory blocks. FPGAs using non-volatile memories have been the focus of attention because of their zero boot-up delay, real-time reconfigurability, and superior energy efficiency. This paper presents a novel three-dimensional (3D) non-volatile FPGA architecture (3D-NonFAR) using phase change memory (PCM) and 3D die stacking techniques. Basic structures in a conventional FPGA architecture are renovated with PCM, and components are repartitioned and reorganized in 3D-NonFAR to allow an efficient 3D integration of PCM elements. 3D-NonFAR not only preserves the advantages of existing non-volatile FPGAs, but also provides high integration density, high performance, and bit-level programmability, which enable PCM as a universal memory replacement in FPGAs. Evaluation results show that 3D-NonFAR has a smaller footprint, higher performance, and lower power consumption compared with other FPGA counterparts. Copyright 2010 ACM.
ISBN: 9781595939340 (print)
The proceedings contain 24 papers. The topics discussed include: designing with extreme parallelism; high-quality, deterministic parallel placement for FPGAs on commodity hardware; enforcing long-path timing closure for FPGA routing with path searches on clamped lexicographic spirals; mapping for better than worst-case delays in LUT-based FPGA designs; a complexity-effective architecture for accelerating full-system multiprocessor simulations using FPGAs; efficient ASIP design for configurable processors with fine-grained resource sharing; pattern-based behavior synthesis for FPGA resource reduction; modeling routing demand for early-stage FPGA architecture development; and trace-based framework for concurrent development of process and FPGA architecture considering process variation and reliability.
ISBN: 9781605589114 (print)
This paper presents the implementation of a high-resolution time-to-digital converter (TDC) on a dynamically reconfigurable FPGA. The TDC architecture is based on the Vernier method using two ring oscillators with slightly different frequencies. The proposed oscillators can be calibrated with picosecond resolution by taking advantage of partial reconfiguration, and can be recalibrated over time. The results obtained on a Xilinx Virtex-II Pro FPGA show that the proposed TDC implementation can achieve unprecedented resolutions (on FPGA) as low as 5 ps and precisions up to 25 ps.
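For readers unfamiliar with the Vernier method, the following sketch models the general principle in software: a slow oscillator starts on the start event, a slightly faster one starts on the stop event, and the number of cycles until the fast edge catches up gives the interval in units of the period difference. The function name and the 1005 ps / 1000 ps periods are hypothetical values chosen so that the step is 5 ps; they are not the paper's circuit parameters.

```python
def vernier_measure(interval_ps, t_slow_ps, t_fast_ps, max_cycles=100_000):
    """Idealized Vernier measurement using integer picoseconds.

    Returns (cycles, estimate_ps), where estimate_ps is the measured
    interval quantized to the period difference t_slow_ps - t_fast_ps.
    """
    if t_fast_ps >= t_slow_ps:
        raise ValueError("the fast oscillator must have the shorter period")
    step_ps = t_slow_ps - t_fast_ps  # resolution of the measurement
    for n in range(max_cycles + 1):
        slow_edge = n * t_slow_ps                # started at time 0 (start event)
        fast_edge = interval_ps + n * t_fast_ps  # started at the stop event
        if fast_edge <= slow_edge:               # fast oscillator has caught up
            return n, n * step_ps
    raise RuntimeError("edges did not coincide within max_cycles")


# Hypothetical periods giving a 5 ps step.
cycles, estimate = vernier_measure(42, t_slow_ps=1005, t_fast_ps=1000)
```

In this idealized model the estimate overshoots the true interval by less than one step, which is why the period difference sets the resolution and why calibrating the two oscillators precisely matters.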
ISBN: 9781605589114 (print)
The decoding operation is one of the major performance bottlenecks in network coding applications. To address the problem caused by decoding delay, this paper proposes high-performance decoding logic on a field-programmable gate array (FPGA). A Galois field arithmetic logic unit (GF ALU) is implemented with full parallelization. We claim that the hardware complexity is reduced by the use of log and anti-log tables. In addition, fast arithmetic is achieved by the parallelized GF ALU architecture, which allows the calculations for one row of a matrix to be performed concurrently. Decoders for four different sizes of the coefficient matrix have been implemented, with the degree of parallelism preserved for each size. The performance is evaluated by comparison with the decoding operation on both an ARM processor emulator and a real ARM processor. Using a modern Xilinx Virtex-5 device, decoding times of 3.5 ms for the 16 x 16 matrix and 190.5 ms for the 128 x 128 matrix have been achieved at an operating frequency of 50 MHz, corresponding to speedups of 12.7 and 21.7, respectively.
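The log and anti-log table technique mentioned above can be sketched in software as follows. The abstract does not state the field size or polynomial, so this example assumes GF(2^8) with the commonly used primitive polynomial 0x11D and generator 2; the names gf_mul, gf_div, and eliminate_row are hypothetical and do not describe the paper's hardware.

```python
PRIM_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed, not from the paper)

# Build the anti-log (EXP) and log (LOG) tables for generator 2.
EXP = [0] * 510   # doubled length so index sums need no modulo
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:          # reduce modulo the primitive polynomial
        x ^= PRIM_POLY
for i in range(255, 510):
    EXP[i] = EXP[i - 255]


def gf_mul(a, b):
    """Multiply in GF(2^8): one addition of logs and one table lookup."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    """Divide in GF(2^8) by subtracting logs."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[LOG[a] - LOG[b] + 255]


def eliminate_row(target, pivot_row, factor):
    """One Gaussian-elimination row update: target + factor * pivot_row.

    Addition in GF(2^8) is XOR. Each element is independent of the others,
    which is what makes a one-row update easy to parallelize.
    """
    return [t ^ gf_mul(factor, p) for t, p in zip(target, pivot_row)]
```

The tables turn every multiplication into an integer addition plus a lookup, replacing a shift-and-reduce multiplier with small memories.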
ISBN: 9781605589114 (print)
Many compute-bound software applications have seen order-of-magnitude speedups using application-specific accelerators built on specialized architectures such as field-programmable gate arrays. These architectures are particularly good at implementing systems of recurrence equations realized as systolic arrays. We pursue high-level synthesis tools for recurrence equations that can search the space of possible parallel array designs to optimize various design criteria. Most existing approaches produce an array that is latency-space optimal. We target applications that operate on a large collection of small inputs, e.g. a database of biological sequences. For these applications, overall throughput, rather than latency per input, is the most important measure of performance. In this work we introduce a new design space exploration procedure to optimize the throughput of a systolic array subject to the area and bandwidth constraints of an FPGA device. We show that the throughput of an array depends on the maximum number of lattice points executed by any processor in the array, which is determined solely by the array's projection vector. We describe a bounded search process to find throughput-optimized projection vectors, discovering a range of array designs that are optimal for inputs of various sizes. We have written a tool in C++ that accepts recurrence descriptions and produces throughput-optimized array mappings. We have applied our technique to the banded Smith-Waterman and Nussinov RNA folding algorithms, and present novel arrays that are 4-33x and 2-14x faster, respectively, than the currently used latency-optimized array.
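To make the role of the projection vector concrete, the toy sketch below counts how many lattice points of a rectangular 2D iteration domain land on each processor when points lying along the same projection direction are mapped to the same processor. The rectangular domain, the function names, and the processor labeling are assumptions for illustration; the paper's actual search procedure and its area and bandwidth constraints are not reproduced here.

```python
from collections import Counter
from math import gcd


def points_per_processor(rows, cols, projection):
    """Count lattice points assigned to each processor.

    Points (i, j) and (i + a, j + b) share a processor for projection
    vector (a, b). The value b*i - a*j is constant along that direction,
    so it serves as a (hypothetical) processor label.
    """
    a, b = projection
    if (a, b) == (0, 0) or gcd(a, b) != 1:
        raise ValueError("projection vector must be nonzero and primitive")
    return Counter(b * i - a * j for i in range(rows) for j in range(cols))


def max_load(rows, cols, projection):
    """Largest number of lattice points executed by any one processor."""
    return max(points_per_processor(rows, cols, projection).values())


# Compare three candidate projections on a 4 x 4 domain.
loads = {p: max_load(4, 4, p) for p in [(1, 0), (1, 1), (1, 2)]}
sizes = {p: len(points_per_processor(4, 4, p)) for p in [(1, 0), (1, 1), (1, 2)]}
```

On this toy domain the projection (1, 2) lowers the maximum per-processor load from 4 to 2 at the cost of using 10 processors instead of 4, which illustrates the kind of trade-off a throughput-oriented search has to weigh.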
ISBN: 9781605589114 (print)
This work explores architectural ideas to build a scalable programmable quantum gate array (PQGA) by exploiting unique quantum effects such as superposition and entanglement/teleportation. In contrast to prior studies, in which a quantum computing machine is implemented either as an ASIC-like special-purpose chip tailored for a specific algorithm or as a general-purpose processor based on the von Neumann model, we propose a PQGA architecture that is reconfigurable for different domain-specific applications with high logic density. The PQGA architecture is novel in several aspects, among them its interconnect network, which is built with "virtual wires" implemented with quantum entanglement. In this work, we propose various designs for the logic block and interconnect network, and design strategies to construct large designs in PQGA. Our goal is to investigate new architectural ideas based on reconfigurable computing methods in order to overcome the primary scalability challenges of reliability, communication, and quantum resource distribution that plague current proposals for large-scale quantum computing. Leveraging the extensive groundwork in quantum computing and algorithm design, we provide estimation results to show that our proposed PQGA architecture can achieve not only high performance but also scalability. Finally, we benchmark our proposed PQGA architecture against previous quantum computer architectures and show on average a 3x improvement in logic density for the well-known Shor's quantum factoring algorithm.
ISBN: 9781605584102 (print)
This paper describes a bus-mastering implementation of the PCI Express protocol using a Xilinx FPGA. While the theoretical peak performance of PCI Express is quite high, attaining that performance is a complex endeavor on top of an already complex protocol. The implementation is described and its performance is analyzed. Source code is offered for free download via the web. Copyright 2009 ACM.
ISBN: 9781605584102 (print)
This article presents the performance evaluation of two new diagonal routing tracks in FPGAs. We discuss the issues of automatic detailed architecture generation and propose changes to conventional placement and routing to better suit these architectures. We conduct a series of experiments on these architectures with the MCNC benchmarks, varying key parameters over practical ranges, and conclude that the results agree well with the theoretical predictions. Copyright 2009 ACM.