ISBN: 9781605589114 (print)
For the growing market of smartphones, mobile internet devices, and ultra-mobile PCs, mainstream vendors propose two approaches: one based on an ARM SoC, and the other based on a power-efficient x86 processor. However, each approach has its own limitation: the ARM-based approach lacks application software, while the x86-based approach does not support flexible SoC extension. To overcome these limitations, we propose the PKUnity86 SoC architecture, which is based on the AMBA bus architecture to support fast IP integration. Furthermore, it contains a reduced AMD Geode GX2 processor and several specific designs to support Microsoft Windows and exploit the massive PC software resources. This paper presents two FPGA prototypes of PKUnity86: P86-Core and P86-Min. For P86-Core, which verifies the core of PKUnity86, we change the RTL code of the reduced Geode GX2 to make it FPGA-synthesizable and implement it on a Xilinx Virtex-4 LX200 FPGA device. We connect the FPGA board to a Geode SP4GX22 motherboard so that we can perform full-system emulation. For P86-Min, which verifies the minimum set of PKUnity86, we implement the RTL code on two Xilinx Virtex-4 LX200 FPGA devices and emulate the full system on a single FPGA board. In addition, we adopt a hardware-software co-development methodology and employ various debug tools to facilitate building P86-Min. Each prototype reaches its own compatibility goal: P86-Core supports Windows XP and previous versions, and P86-Min supports Windows 98 and previous versions. The evaluation results show that PKUnity86 achieves Windows compatibility with small hardware overheads and no performance loss.
ISBN: 9781605589114 (print)
We propose a non-deterministic finite automaton (NFA) based architecture for regexp scanners on FPGA, called CES: the Character Class with Constraint Repetition (CCR) based regExp Scanner. CES is designed to realize a new MIN-MAX counting algorithm, which can solve both the character class ambiguity problem and the overlapped matching problem. CES also supports non-regular Perl grammars such as zero-width patterns and back-references. We propose a CCR syntax tree and its parsing scheme to map a Perl or POSIX regexp rule to a CES topology. The interconnection patterns and operational parameters of the CCR modules (CCRMs), which are the building blocks of CES, can be easily configured by regular memory writes when regexp rules change, without re-synthesis of low-level logic. For implementation, the character classes of CCRs are stored in Block RAMs. The MIN-MAX algorithm uses two counters, MIN and MAX, to resolve the character class ambiguity problem. Two checkpoint counters are employed to implement overlapped matching detection. CES topologies optimized for different types of rules can run in different Partial Reconfigurable Regions (PRRs) and can be swapped on the fly by a PRR controller. We developed a tool chain to automate the CES implementation on a Virtex-5 LX110T device. This device can host up to 3000 CCRMs and runs at an estimated throughput of 1.996 Gbps in simulation, and at 863 Mbps between a PC and the Virtex-5 board in real tests. The Snort and SpamAssassin rule sets can be parsed and mapped in milliseconds. Once a base CES architecture is synthesized, the physical reconfiguration of a CES on the Virtex-5 LX110T chip can be done in less than a second.
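To make the two problems named in this abstract concrete, here is a minimal software sketch of scanning a sequence of constrained-repetition character classes such as `[ab]{1,2}b`. It is not the MIN-MAX algorithm or the CES hardware, whose details the abstract does not give. It is a generic set-of-states NFA simulation in which each state carries a repetition count; the names `CCR` and `scan` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CCR:
    """One character class with a bounded repetition, e.g. [ab]{1,2}."""
    chars: frozenset
    lo: int  # minimum repetitions
    hi: int  # maximum repetitions


def scan(pattern, text):
    """Return the end offsets (exclusive) of every match of `pattern` in `text`.

    A state is (index of the current CCR, repetitions consumed so far).
    Keeping a *set* of states lets one input character count for either of
    two overlapping classes (class ambiguity) and lets match attempts that
    start at different offsets coexist (overlapped matching).
    """
    def closure(states):
        # Once a CCR has consumed at least `lo` characters, the next may begin.
        seen = set(states)
        stack = list(states)
        while stack:
            i, n = stack.pop()
            if i < len(pattern) and n >= pattern[i].lo:
                nxt = (i + 1, 0)
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    start = closure({(0, 0)})
    active = set()
    ends = []
    for pos, ch in enumerate(text):
        active |= start  # a new match attempt may begin at every offset
        stepped = set()
        for i, n in active:
            if i < len(pattern) and ch in pattern[i].chars and n < pattern[i].hi:
                stepped.add((i, n + 1))
        active = closure(stepped)
        if any(i == len(pattern) for i, _ in active):
            ends.append(pos + 1)
    return ends
```

For the pattern `[ab]{1,2}b` and the input `abb`, `scan` reports matches ending at offsets 2 and 3: the middle `b` is consumed as the final `b` in one attempt and as part of `[ab]{1,2}` in another. A hardware design would replace the explicit state set with a fixed number of counters, which is where a counting scheme like the one in the abstract comes in.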
We present a hardware JPEG2000 decoder architecture based on the DCI specification, which can decode digital cinema frames without accessing any external memory. Besides, an innovative method is proposed to implement ...
ISBN: 9781605589114 (print)
"Open Source", ubiquitous in the software community, has grown to become vital in application domains served by reconfigurable computing. But what exactly is "Open Source", and what are the values and pitfalls it brings? This workshop draws together technologists from academia and industry to share their experiences, opinions, and lessons learned.
ISBN: 9781605589114 (print)
This paper presents the implementation of a high-resolution time-to-digital converter (TDC) on a dynamically reconfigurable FPGA. The TDC architecture is based on the Vernier method, using two ring oscillators with slightly different frequencies. The proposed oscillators can be calibrated with picosecond resolution by taking advantage of partial reconfiguration, and can moreover be recalibrated over time. The results obtained on a Xilinx Virtex-II Pro FPGA show that the proposed TDC implementation can achieve resolutions unprecedented on an FPGA, as low as 5 ps, and precisions up to 25 ps.
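The Vernier principle mentioned above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the paper's circuit: the function name `vernier_measure` and the oscillator periods used in the example are hypothetical, and the model ignores jitter and calibration. A slow oscillator starts on the START event, a fast one on the STOP event, and the number of cycles until the fast edges catch up with the slow edges encodes the interval in units of the period difference.

```python
def vernier_measure(interval_ps, slow_period_ps, fast_period_ps, max_cycles=1_000_000):
    """Estimate a START-to-STOP interval with the Vernier method.

    All times are integers in picoseconds. The slow oscillator produces edges
    at n * slow_period_ps (counted from START); the fast oscillator produces
    edges at interval_ps + n * fast_period_ps (it starts at STOP).
    The first cycle n at which the fast edge is no later than the slow edge
    gives the estimate n * (slow_period_ps - fast_period_ps).
    """
    if fast_period_ps >= slow_period_ps:
        raise ValueError("the fast oscillator must have the shorter period")
    resolution_ps = slow_period_ps - fast_period_ps
    for n in range(max_cycles + 1):
        slow_edge = n * slow_period_ps
        fast_edge = interval_ps + n * fast_period_ps
        if fast_edge <= slow_edge:
            return n * resolution_ps
    raise ValueError("no coincidence within max_cycles")
```

With hypothetical periods of 1005 ps and 1000 ps, the resolution is 5 ps, so `vernier_measure(42, 1005, 1000)` returns 45: the interval is rounded up to the next multiple of the period difference. In this model, the smaller the difference between the two periods, the finer the resolution and the longer the conversion takes.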
ISBN: 9781424463916 (print)
Specific architectures for different low-level vision modalities have been developed and described using reconfigurable hardware. Each of them tries to solve a single low-level vision problem: optical flow, disparity, segmentation, tracking, etc. We introduce a novel architecture that includes multiple processing engines in a massively parallel low-level vision processing engine of very high complexity and performance. Our design is able to process input images and simultaneously extract different visual features such as multi-scale stereo, optical flow, and local contrast descriptors such as local orientation, energy, or phase. The latest hardware design techniques have been employed in order to achieve the presented system, with more than 2000 basic processing elements running in parallel. We have based our system on a harmonic filter image decomposition model built on Gabor-like filters. It has been validated in multiple scenarios in previous works, and it allows sharing hardware resources among different vision modalities on the same chip. In this paper we present an FPGA-based implementation of this intensive processing engine as well as the design techniques employed. The circuit processes input frames of 512x512 pixels at 28 frames per second.
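As a rough illustration of how Gabor-like filters yield local energy and phase, the sketch below applies a quadrature pair (even and odd filters) to a 1-D signal. It is a simplified, assumption-laden example: the paper works on 2-D images with oriented filters in hardware, and the function names and parameter values here are hypothetical.

```python
import math


def gabor_pair(radius, sigma, freq):
    """Return even (cosine) and odd (sine) 1-D Gabor kernels of length 2*radius+1."""
    xs = range(-radius, radius + 1)
    envelope = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in xs]
    even = [g * math.cos(2.0 * math.pi * freq * x) for g, x in zip(envelope, xs)]
    odd = [g * math.sin(2.0 * math.pi * freq * x) for g, x in zip(envelope, xs)]
    return even, odd


def local_energy_phase(signal, center, even, odd):
    """Correlate the window around `center` with both kernels.

    Energy is the squared magnitude of the two responses; phase is their angle.
    The caller must keep `center` at least `radius` samples away from each end.
    """
    radius = len(even) // 2
    window = signal[center - radius:center + radius + 1]
    c = sum(w * k for w, k in zip(window, even))
    s = sum(w * k for w, k in zip(window, odd))
    return c * c + s * s, math.atan2(s, c)
```

For a cosine centered on the sample of interest the phase comes out near 0, and for a sine of the same frequency it comes out near pi/2, while the energy stays positive in both cases. Because both responses come from the same filter pair, several features can share one filtering stage, which matches the resource-sharing idea described in the abstract.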
Triple Modular Redundancy (TMR) protects the functionality of FPGAs against single event upsets (SEUs). Each logic block is implemented three times with a 2-out-of-3 voter at the output. Thu...
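The 2-out-of-3 voter described here is a bitwise majority function. The following is a minimal software sketch of that function, assuming the three replica outputs are available as integers; the name `tmr_vote` is hypothetical and the snippet does not model the FPGA implementation discussed in the paper.

```python
def tmr_vote(a, b, c):
    """Bitwise 2-out-of-3 majority vote over three replica outputs."""
    return (a & b) | (a & c) | (b & c)


def flip_bit(value, bit):
    """Model a single event upset by inverting one bit of a replica output."""
    return value ^ (1 << bit)
```

If all replicas compute `0b1010` and an upset flips bit 2 in one of them, `tmr_vote(0b1010, 0b1010, flip_bit(0b1010, 2))` still returns `0b1010`. A fault that hits the same bit in two replicas is not masked by this voter.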
ISBN: 9781605589114 (print)
The decoding operation is one of the major performance bottlenecks in network coding applications. To address the problem caused by decoding delay, this paper proposes high-performance decoding logic on a field-programmable gate array (FPGA). A Galois field arithmetic logic unit (GF ALU) is implemented with full parallelization. We claim that the hardware complexity is reduced by the use of log and anti-log tables. In addition, fast arithmetic is achieved by the parallelized GF ALU architecture, which allows the calculations for one row of a matrix to be performed concurrently. Decoders for four different sizes of the coefficient matrix have been implemented, with the degree of parallelism preserved for each size. The performance is evaluated by comparison with the decoding operation on both an ARM processor emulator and a real ARM processor. Using a modern Xilinx Virtex-5 device, decoding times of 3.5 ms for a 16 x 16 matrix and 190.5 ms for a 128 x 128 matrix have been achieved at an operating frequency of 50 MHz, corresponding to speedups of 12.7 and 21.7, respectively.
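The log and anti-log table technique mentioned above turns Galois field multiplication into an addition of table indices. The sketch below shows the idea in software, assuming the field GF(2^8) with the primitive polynomial 0x11D and generator 2; the abstract does not state which field or polynomial the paper uses, and the names `gf_mul`, `gf_inv`, and `row_add_scaled` are hypothetical.

```python
PRIM_POLY = 0x11D  # assumed: x^8 + x^4 + x^3 + x^2 + 1

# EXP[i] = 2^i in GF(2^8); LOG is the inverse mapping for nonzero elements.
# EXP is doubled in length so that LOG[a] + LOG[b] never needs a modulo.
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM_POLY
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    """Multiply two field elements with two log lookups and one anti-log lookup."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    """Multiplicative inverse of a nonzero field element."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return EXP[255 - LOG[a]]


def row_add_scaled(target, source, coeff):
    """Return target + coeff * source, element by element (addition is XOR).

    Each element is independent of the others, which is what makes a whole
    row suitable for concurrent evaluation in hardware.
    """
    return [t ^ gf_mul(coeff, s) for t, s in zip(target, source)]
```

In a Gaussian-elimination style decoder, `row_add_scaled` would be the inner step applied to every row below (or above) the pivot. In this software model the per-element independence is only implicit; a parallel GF ALU would evaluate all elements of the row at once.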