ISBN: 9781424410590 (print)
Large-scale protein sequence comparison is an important but compute-intensive task in molecular biology. The popular BLASTP software for this task has become a bottleneck for proteomic database search. One third of this software's time is spent executing the Smith-Waterman dynamic programming algorithm. This work describes a novel FPGA design for banded Smith-Waterman, an algorithmic variant tuned to the needs of BLASTP. This design has been implemented in Mercury BLASTP, our FPGA-accelerated version of the BLASTP algorithm. We show that Mercury BLASTP runs 6-16 times faster than software BLASTP on a modern CPU while delivering 99% identical results.
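To make the idea of banding concrete, the following is a minimal software sketch of a banded Smith-Waterman score computation. It is not the paper's FPGA design: the function name, the band definition (cells with |i - j| <= band), the linear gap penalty, and the match/mismatch scores are all assumptions chosen for brevity, whereas BLASTP itself uses a substitution matrix and affine gap costs.

```python
def banded_smith_waterman(a, b, band=3, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b, filling only cells with |i - j| <= band.

    Hypothetical illustration: simple match/mismatch scores and a linear gap
    penalty stand in for the substitution matrix and affine gaps used by BLASTP.
    """
    neg_inf = float("-inf")
    prev = {}  # scores of the previous row, keyed by column index
    best = 0
    for i in range(1, len(a) + 1):
        cur = {}
        lo = max(1, i - band)
        hi = min(len(b), i + band)
        for j in range(lo, hi + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The diagonal neighbour is always in the band or on the zero boundary.
            diag = prev.get(j - 1, 0) + sub
            # Neighbours outside the band are treated as unreachable.
            up = prev.get(j, neg_inf) + gap
            left = cur.get(j - 1, neg_inf) + gap
            score = max(0, diag, up, left)
            cur[j] = score
            best = max(best, score)
        prev = cur
    return best
```

The work per row is proportional to the band width rather than to the length of the second sequence, which is the property that makes the banded variant attractive. The cost is that an alignment lying outside the band is missed entirely.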
ISBN: 9781424403127 (print)
Hard disk storage capacity has continued to rise while the cost per megabyte continues to fall. This, combined with increased use of digital storage for documents, photography, and video in both home and business settings, has led to an increased need for reliable data storage systems. Redundant arrays of inexpensive disks (RAID) have proven to offer the best characteristics for reliable storage. However, to date, cost and circuit complexity have limited RAID-based systems to single-disk erasure tolerance. FPGAs allow us to overcome these difficulties and support more complex storage algorithms. This paper introduces an efficient FPGA-based hardware RAID 6 accelerator that provides uninterrupted access during all single and double disk erasures and during recovery.
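Double-erasure tolerance can be illustrated with the widely used P/Q parity scheme over GF(2^8). The sketch below is a software illustration only and assumes that scheme: the abstract does not say which code the accelerator implements, and the reduction polynomial 0x11d, the generator 2, and all function names are assumptions made here.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) with reduction polynomial 0x11d (assumed)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return r

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def gf_inv(a):
    # For nonzero a, a^255 == 1, so a^254 is the multiplicative inverse.
    return gf_pow(a, 254)

def parity(disks):
    """Return (P, Q) for a list of equal-length byte strings, one per data disk."""
    n = len(disks[0])
    p, q = bytearray(n), bytearray(n)
    for i, d in enumerate(disks):
        coeff = gf_pow(2, i)  # generator 2 raised to the disk index
        for k, byte in enumerate(d):
            p[k] ^= byte
            q[k] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)

def recover_two(disks, x, y, p, q):
    """Rebuild data disks x and y (entries set to None) from the survivors, P and Q."""
    n = len(p)
    survivors = [d if d is not None else bytes(n) for d in disks]
    pxy, qxy = parity(survivors)  # parity of the surviving data only
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    denom_inv = gf_inv(gx ^ gy)
    dx, dy = bytearray(n), bytearray(n)
    for k in range(n):
        a = p[k] ^ pxy[k]  # equals Dx ^ Dy
        b = q[k] ^ qxy[k]  # equals g^x*Dx ^ g^y*Dy
        dx[k] = gf_mul(denom_inv, b ^ gf_mul(gy, a))
        dy[k] = a ^ dx[k]
    return bytes(dx), bytes(dy)
```

Losing one data disk plus P, or one data disk plus Q, reduces to simpler cases of the same algebra and is omitted here. The per-byte Galois-field multiplications are the kind of operation that maps well onto parallel hardware.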
ISBN: 9781424410590 (print)
An integrated platform for fast genetic operators is presented to support intrinsic evolution on Xilinx Virtex II Pro field-programmable gate arrays (FPGAs). Dynamic bitstream compilation is achieved by directly manipulating the bitstream using a layered design. Experimental results on a case study have shown that a full design as well as a full repair is achievable using this platform with an average time of 0.4 microseconds to perform the genetic mutation, 0.7 microseconds to perform the genetic crossover, and 5.6 milliseconds for one input pattern intrinsic evaluation. This represents a performance advantage of three orders of magnitude over JBITS and more than seven orders of magnitude over the Xilinx design tool driven flow for realizing intrinsic genetic operators on a Virtex II Pro device.
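The two genetic operators named above can be sketched on an opaque bit string. This is a generic illustration under stated assumptions, not the platform's method: a real Virtex II Pro bitstream has frame structure that the paper's layered design accounts for, while the hypothetical `mutate` and `crossover` helpers below treat the genome as plain bytes.

```python
import random

def mutate(genome: bytes, rng: random.Random) -> bytes:
    """Flip one randomly chosen bit of the genome (bit-flip mutation)."""
    pos = rng.randrange(len(genome) * 8)
    out = bytearray(genome)
    out[pos // 8] ^= 1 << (pos % 8)
    return bytes(out)

def crossover(a: bytes, b: bytes, rng: random.Random) -> bytes:
    """Single-point crossover: high-order bits from a, the low `point` bits from b."""
    assert len(a) == len(b)
    nbits = len(a) * 8
    point = rng.randrange(1, nbits)  # cut strictly inside the genome
    mask = (1 << point) - 1
    ia = int.from_bytes(a, "big")
    ib = int.from_bytes(b, "big")
    child = (ia & ~mask) | (ib & mask)
    return child.to_bytes(len(a), "big")
```

Passing a seeded `random.Random` instance keeps runs reproducible, which helps when comparing evolved configurations.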
ISBN: 9781424410590 (print)
The traditional approach to FPGA packing and CLB-level placement has been shown to yield significantly worse quality than approaches which allow BLEs to move during placement. In practice, however, modern FPGA architectures require expensive DRC checks which can render full BLE-level placement impractical. We address this problem by proposing a novel clustering framework that uses physical information to produce better initial packings which can, in turn, reduce the amount of BLE-level placement that is required. We quantify our packing technique across accepted benchmarks and show that it produces results with 16% less wire length, 19% smaller minimum channel widths, and 8% less critical path delay, on average, than leading methods.
ISBN: 9781424403127 (print)
This work presents a modular FPGA-based architecture that solves the eigenvalue problem using the Jacobi method. This method computes the eigenvalues and eigenvectors concurrently. The main contributions of this work are a low execution time compared with other sequential algorithms and minimal consumption of internal FPGA resources, mainly due to the use of the CORDIC algorithm. Two CORDIC modules have been designed to solve the trigonometric operations involved. A parallel CORDIC architecture is proposed as it is the best option to compute the eigenvalues with this method. Both CORDIC modules can work in rotation and vector mode. The whole system has been described in VHDL, with an effort to optimize the design.
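The two CORDIC modes mentioned above can be sketched in floating point as follows. This is only an illustration of the iteration, not the paper's VHDL modules: hardware would use fixed-point shifts and adds with a precomputed arctangent table, and the function names and the iteration count of 32 are assumptions made here.

```python
import math

def _gain(iterations):
    """Accumulated scale factor of the micro-rotations."""
    g = 1.0
    for i in range(iterations):
        g *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return g

def cordic_rotate(x, y, angle, iterations=32):
    """Rotation mode: rotate (x, y) by `angle` radians (|angle| below about 1.74)."""
    z = angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        t = 2.0 ** -i  # a right shift by i bits in fixed-point hardware
        x, y = x - d * y * t, y + d * x * t
        z -= d * math.atan(t)
    g = _gain(iterations)
    return x / g, y / g

def cordic_vector(x, y, iterations=32):
    """Vector mode: drive y to zero; return (magnitude, angle) for x > 0."""
    z = 0.0
    for i in range(iterations):
        d = -1.0 if y >= 0 else 1.0
        t = 2.0 ** -i
        x, y = x - d * y * t, y + d * x * t
        z -= d * math.atan(t)
    return x / _gain(iterations), z
```

In a Jacobi sweep, vector mode can supply the angle needed to zero an off-diagonal element and rotation mode can apply that angle to the affected rows and columns, which is one reason a pair of CORDIC units suits the method.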
ISBN: 9781424419609 (print)
This short paper describes a remote laboratory facility for Platform FPGA education. With the addition of an inexpensive piece of hardware, many commercial off-the-shelf FPGA development boards can be made suitable for use in a remote laboratory. The hardware and software required to implement a remote laboratory have been developed and a remote laboratory facility deployed at the University of North Carolina at Charlotte. Advantages, concerns, and actual costs are reported. The experience of using this facility in a senior/first-year graduate-level Platform FPGA course is also described. Although these data are preliminary, survey results and first-hand experience with the laboratory were very encouraging and suggest that further studies on student learning are warranted.
ISBN: 9781467381239 (print)
Microarchitecture optimization is essential in processor design to achieve target system performance. Given the register transfer level (RTL) model from a real chip design, this paper proposes the MOFPGA system, which uses field-programmable gate array (FPGA) prototyping as an effective method for fine-grain microarchitecture optimization. It is a fast, reconfigurable, and visible platform with zero impact on the performance of the monitored processor. MOFPGA implements a complete computing platform equipped with a modern out-of-order processor and is able to achieve a 60 MHz processor frequency. Besides general FPGA implementation techniques such as multi-port SRAM design and gate-clock conversion, extensive optimization efforts are made to improve the FPGA performance when mapping such a large core. To our knowledge, MOFPGA is the first published FPGA system that implements a modern out-of-order processor running at such a high frequency and can report real SPEC CPU2000 evaluation results.
ISBN: 9781424419609 (print)
In this paper, we present the ReCoBus-Builder tool chain that simplifies the generation of dynamically reconfigurable systems to almost a push-button process. The generated systems provide one or more resource areas that will be used by different partially reconfigurable modules at runtime. It is possible to integrate multiple partially reconfigurable modules into the same resource area at the same time, and these modules can communicate via a fixed bus infrastructure or dedicated point-to-point links with other parts of the system. This allows building encapsulated modules that will be integrated into the system by linking together bitstreams at runtime. We will demonstrate that bitstream linking can further be used to speed up the design process of static-only systems by eliminating long synthesis runs or place-and-route steps when only small portions of a design are exchanged.
ISBN: 9781467381239 (print)
This paper presents a compact FPGA-based implementation of the AES encryption standard, specifically designed for processing two independent 128-bit input blocks in feedback modes. This configuration is particularly focused on the Counter with CBC-MAC Protocol (CCMP), but can also be adapted to other AES-based encryption-authentication protocols requiring the processing of two independent data streams. Most state-of-the-art solutions implementing CCMP use large datapaths, sometimes with separate encryption datapaths for the different data streams, leading to low resource efficiency. The work proposed herein suggests that, with adequate FPGA component usage and proper data scheduling, a very compact and efficient dual AES core can be derived, particularly on FPGAs. Overall, the proposed structure allows for a throughput of 1.7 Gbps while achieving a Throughput/Slice efficiency of 24.22 Mbps/Slice, 47% higher than the existing related state of the art.
ISBN: 9781479900046 (print)
FPGA designs often contain significant amounts of logic, such as a board support package, that remains unaltered throughout the design process. However, during normal operation, standard FPGA implementation tools re-implement the entire system, including the unchanged logic, adding to the turnaround time of design iterations. Recently, FPGA implementation flows have appeared that allow preserving parts of a previously implemented design. In this study, we evaluate the potential speedups in implementation time achievable through preserving the unchanging portion of a design's implementation. We perform these evaluations using Xilinx Partitions, Xilinx SmartGuide, and the HMFlow rapid implementation tool.