ISBN: 9781450326711 (print)
Latency insensitive communication offers many potential benefits for FPGA designs, including easier timing closure by enabling automatic pipelining, and easier interfacing with embedded NoCs. However, it is important to understand the costs and trade-offs associated with any new design style. This paper presents optimized implementations of latency insensitive communication building blocks, quantifies their overheads in terms of area and frequency, and provides guidance to designers on how to generate high-speed and area-efficient latency insensitive systems.
ISBN: 9781450343541 (print)
Increasing computational power enables various new applications that were previously runtime-prohibitive. FPGAs are one such source of computational power, offering both reconfigurability and energy efficiency. In this paper, we demonstrate the feasibility of eyeglasses-free displays through FPGA acceleration. Specifically, we propose several techniques to accelerate sparse matrix-vector multiplication and the L-BFGS iterative optimization algorithm, taking the characteristics of FPGAs into account. The experimental results show an overall speedup of 12.78X for the eyeglasses-free display application.
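For readers unfamiliar with the kernel named in this abstract, the following is a minimal software sketch of sparse matrix-vector multiplication. It assumes the common compressed sparse row (CSR) layout; the abstract does not say which storage format or hardware mapping the authors use, and the function name `spmv_csr` and the sample matrix are illustrative only.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form (assumed layout)."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        # Non-zeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Illustrative 3x3 matrix:
# A = [[2, 0, 1],
#      [0, 3, 0],
#      [4, 0, 5]]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]
y = spmv_csr(values, col_idx, row_ptr, x)
```

The inner loop reads `x` at indices that depend on the data, which is one reason this kernel is usually limited by memory access rather than arithmetic.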
Locality exploitation is essential to asymptotic energy minimization for gate array netlist evaluation. Naive implementations that ignore locality, including flat crossbars and simple processors based on monolithic me...
ISBN: 9780897919784 (print)
While reconfigurable computing promises to deliver incomparable performance, it is still a marginal technology due to the high cost of developing and upgrading applications. Hardware virtualization can be used to significantly reduce both these costs. In this paper we describe the benefits of hardware virtualization, and show how it can be achieved using a combination of pipeline reconfiguration and run-time scheduling of both configuration streams and data streams. The result is PipeRench, an architecture that supports robust compilation and provides forward compatibility. Our preliminary performance analysis predicts that PipeRench will outperform commercial FPGAs and DSPs in both overall performance and in performance per mm².
A fundamental feature of Dynamically Reconfigurable FPGAs (DRFPGAs) is that the logic and interconnect are time-multiplexed. Thus, for a circuit to be implemented on a DRFPGA, it needs to be partitioned such that each subcircuit can be executed at a different time. In this paper, the partitioning of sequential circuits for execution on a DRFPGA is studied. To determine how to correctly partition a sequential circuit, and what the costs of doing so are, we propose a new gate-level model that handles time-multiplexed computation. We also introduce an enhanced force directed scheduling (FDS) algorithm to partition sequential circuits that finds a correct partition with low logic and communication costs, under the assumption that maximum performance is desired. We use our algorithm to partition seven large ISCAS'89 sequential benchmark circuits. The experimental results show that the enhanced FDS reduces communication costs by 27.5% with only a 1.1% increase in the gate cost compared to traditional FDS.
ISBN: 9781605584102 (print)
We present in this paper the first reported FPGA implementation of the Position Specific Iterated BLAST (PSI-BLAST) algorithm. The latter is a heuristic biological sequence alignment algorithm that is widely used in the bioinformatics and computational biology world in order to detect weak homologs. The architecture of our FPGA implementation is parameterized in terms of sequence lengths, scoring matrix, gap penalties and cut-off and threshold values. It is composed of various blocks, each of which performs one step of the algorithm in parallel. This results in high performance implementations, which easily outperform equivalent software implementations by one order of magnitude or more. Furthermore, the core was captured in an FPGA-platform-independent language, namely the Handel-C language, to which no specific resource inference or placement constraints were applied. This makes our core portable across different FPGA families and architectures. Copyright 2009 ACM.
ISBN: 1595932925 (print)
The aim of this paper is to propose a real-time reconfigurable (RTR) micro-FPGA using a new non-volatile memory. Magnetic tunneling junctions (MTJ) used in magnetic random access memories (MRAM) are compatible with classical CMOS processes. Moreover, the remanent property of such a memory could limit the configuration time and power consumption required at each power-up of the die. Nevertheless, each configuration memory point has to be readable independently of the others, which is why the approach differs from the classical memory array one. Copyright 2006 ACM.
ISBN: 9780897919784 (print)
Carry chains are an important consideration for most computations, including those mapped to FPGAs. Current FPGAs dedicate a portion of their logic to support these demands via a simple ripple carry scheme. In this paper we demonstrate how more advanced carry constructs can be embedded into FPGAs, providing significantly higher performance carry computations. We redesign the standard ripple carry chain to reduce the number of logic levels in each cell. We also develop entirely new carry structures based on high performance adders such as Carry Select, Carry Lookahead, and Brent-Kung. Overall, these optimizations achieve a speedup in carry performance of 3.8 times over current architectures.
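To illustrate the difference between two of the adder styles named in this abstract, here is a small bit-level software model of a ripple carry adder and a textbook carry select adder. It is a sketch of the general techniques only, not the cell designs proposed in the paper; the helper names (`to_bits`, `from_bits`) and the block size of 4 are assumptions made for the example.

```python
def to_bits(n, width):
    """Little-endian bit list of n, truncated to `width` bits."""
    return [(n >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Integer value of a little-endian bit list."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(a_bits, b_bits, carry_in=0):
    """The carry passes through every cell in turn, so delay grows with width."""
    total, carry = [], carry_in
    for a, b in zip(a_bits, b_bits):
        total.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return total, carry


def carry_select_add(a_bits, b_bits, block=4):
    """Each block computes sums for carry-in 0 and 1; the real carry selects one."""
    total, carry = [], 0
    for i in range(0, len(a_bits), block):
        a_blk, b_blk = a_bits[i:i + block], b_bits[i:i + block]
        sum0, carry0 = ripple_carry_add(a_blk, b_blk, 0)
        sum1, carry1 = ripple_carry_add(a_blk, b_blk, 1)
        # In hardware this choice is a multiplexer driven by the incoming carry.
        total += sum1 if carry else sum0
        carry = carry1 if carry else carry0
    return total, carry


a, b, width = 183, 109, 8
ripple_sum, ripple_carry = ripple_carry_add(to_bits(a, width), to_bits(b, width))
select_sum, select_carry = carry_select_add(to_bits(a, width), to_bits(b, width))
```

Both functions return the same bits; the carry select version trades duplicated logic for a shorter carry path, since only the block-level multiplexers sit on the critical path between blocks.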
ISBN: 9781450311557 (print)
In current countermeasure design trends against differential power analysis (DPA), security at gate level is required in addition to the security algorithm. Several dual-rail pre-charge logics (DPL) have been proposed to achieve this goal. Designs using ASIC can attain this goal owing to its backend design restrictions on placement and routing. However, implementing these designs on field-programmable gate arrays (FPGA) without information leakage is still a problem because of the difficulty involved in the restrictions on placement and routing on FPGA. This paper describes our novel masked dual-rail pre-charged memory approach, called "intra-masking dual-rail memory on LUT," and its implementation on FPGA for tamper-resistant AES. In the proposed design, all unsafe nodes, such as unmasking and masking, and the dual-rail memory and buses are packed into a single LUT. This makes them balanced and independent of the placement and routing tools. The design is independent of the cryptographic algorithm, and hence, it can be applied to available cryptographic standards such as DES or AES as well as future standards. It requires no special placement or route constraints in its implementation. A correlation power analysis (CPA) attack on 1,000,000 traces of AES implementation on FPGA showed that the secret information is well protected against first-order side-channel attacks. Even though the number of LUTs used for memory in this implementation is seven times greater than that of the conventional unprotected single-rail memory table-lookup AES and three times greater than the implementation based on a composite field, it requires a smaller number of LUTs than all other advanced tamper-resistant implementations such as the wave dynamic differential logic, masked dual-rail pre-charge logic, and threshold.
Field-programmable gate arrays (FPGAs) are an increasingly popular choice of platform for the implementation of cryptographic systems. Until recently, designers using FPGAs had less than optimal choices for a source of truly random bits. In this paper we extend a technique that uses on-chip jitter and PLLs to a much larger class of FPGAs that do not contain PLLs. Our design uses only the Configurable Logic Blocks (CLBs) common to all FPGAs, and has a self-testing capability. Using the intrinsic jitter contained in digital circuits, we produce random bits at speeds of up to 0.5 Mbits/second with good statistical characteristics. We discuss the engineering challenges of extracting random bits from digital circuits, and we report the results of running standard statistical tests (NIST) on the output generated by our system.
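As an illustration of the kind of statistical check this abstract refers to, below is a minimal sketch of the frequency (monobit) test from the NIST SP 800-22 suite. The abstract does not list which NIST tests were run, so the choice of this test is an assumption, as is the function name `monobit_p_value`; the 10-bit sample is far too short for a meaningful verdict and is there only to show the arithmetic.

```python
import math


def monobit_p_value(bits):
    """Frequency (monobit) test: p-value for the balance of ones and zeros."""
    n = len(bits)
    # Map 1 -> +1 and 0 -> -1, then sum; a balanced sequence sums near zero.
    s_n = sum(1 if bit else -1 for bit in bits)
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


sample = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]  # six ones, four zeros
p_sample = monobit_p_value(sample)
p_stuck = monobit_p_value([1] * 100)  # a generator stuck at 1
```

A sequence is commonly treated as passing this test when the p-value is at least 0.01; a heavily biased source such as the all-ones input above yields a p-value far below that threshold.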