ISBN: 9781424410590 (print)
Large-scale protein sequence comparison is an important but compute-intensive task in molecular biology. The popular BLASTP software for this task has become a bottleneck for proteomic database search. One third of this software's time is spent executing the Smith-Waterman dynamic programming algorithm. This work describes a novel FPGA design for banded Smith-Waterman, an algorithmic variant tuned to the needs of BLASTP. This design has been implemented in Mercury BLASTP, our FPGA-accelerated version of the BLASTP algorithm. We show that Mercury BLASTP runs 6-16 times faster than software BLASTP on a modern CPU while delivering 99% identical results.
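As an illustration of the banded variant mentioned above, the following is a minimal software sketch, not the Mercury BLASTP design. It assumes a simple match/mismatch score and a linear gap penalty (BLASTP itself uses substitution matrices and affine gaps), and the function name and parameters are hypothetical.

```python
def banded_smith_waterman(a, b, band=3, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b, filling only cells with |i - j| <= band.

    Assumes gap < 0, so untouched (zero) cells outside the band can never
    contribute more than the local-alignment floor of 0.
    """
    n, m = len(a), len(b)
    prev = [0] * (m + 1)   # previous DP row
    best = 0
    for i in range(1, n + 1):
        curr = [0] * (m + 1)
        lo = max(1, i - band)      # leftmost column inside the band
        hi = min(m, i + band)      # rightmost column inside the band
        for j in range(lo, hi + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score = max(
                0,                   # start a new local alignment
                prev[j - 1] + sub,   # diagonal: align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            curr[j] = score
            best = max(best, score)
        prev = curr
    return best
```

Restricting the fill to a band reduces the work from O(nm) to roughly O(n * band) cells, at the cost of missing alignments that stray further from the main diagonal.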
QR decomposition, especially by means of Householder transformations, is often used to solve least squares problems. A matrix to be decomposed with this method is usually very large, often large enough that it cannot fit into the main memory of a workstation, let alone the internal memory of a current FPGA. Efficient out-of-core algorithms have been developed to address the factorization of large matrices. This paper describes the application of variants of Householder QR decomposition on FPGA-based systems. More specifically, issues in applying out-of-core algorithms to the relatively small internal memory architecture of FPGAs are investigated.
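For readers unfamiliar with the method, here is a small in-core sketch of least squares via Householder reflections. It is not one of the out-of-core variants the paper studies; it assumes a dense matrix with full column rank that fits in memory, and the function name is hypothetical.

```python
import math

def householder_least_squares(A, b):
    """Solve min ||A x - b||_2 for an m x n matrix A (m >= n, full column rank)."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]   # working copy, reduced to upper triangular form
    y = b[:]                    # becomes Q^T b
    for k in range(n):
        # Householder vector v that zeroes column k below the diagonal.
        norm = math.sqrt(sum(R[i][k] ** 2 for i in range(k, m)))
        if norm == 0.0:
            continue
        alpha = -math.copysign(norm, R[k][k])   # sign chosen to avoid cancellation
        v = [R[i][k] for i in range(k, m)]
        v[0] -= alpha
        vtv = sum(x * x for x in v)
        # Apply H = I - 2 v v^T / (v^T v) to the remaining columns of R.
        for j in range(k, n):
            f = 2.0 * sum(v[i - k] * R[i][j] for i in range(k, m)) / vtv
            for i in range(k, m):
                R[i][j] -= f * v[i - k]
        # Apply the same reflection to the right-hand side.
        f = 2.0 * sum(v[i - k] * y[i] for i in range(k, m)) / vtv
        for i in range(k, m):
            y[i] -= f * v[i - k]
    # Back substitution on the top n x n triangle of R.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = y[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / R[i][i]
    return x
```

For example, fitting a line through the points (0, 1), (1, 3), (2, 5), (3, 7) with rows of the form [1, t] recovers an intercept near 1 and a slope near 2.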
A high-speed FPGA off-loading engine for license plate detection, intended to help avoid traffic accidents, is proposed. A complicated algorithm is written in Handel-C, and parallel processing is explicitly utilized at every level of the implementation: an input image is segmented into 16 areas, and each area is processed in parallel by a multiple calculation unit executing pipeline processing and a distributed memory module. A prototype circuit implemented on a general-purpose FPGA board achieved 4.16 times the performance of software execution on a Pentium-III desktop PC. The highest performance reported in the literature, 100 frames per second, can be achieved.
FPGA CAD tools require wirelength predictions to make informed decisions through the clustering, placement and routing stages towards power-, area- or delay-based design goals. Unfortunately, there has been minimal work devoted to estimating individual wirelengths early in the CAD flow. Rent's rule can be used to generate a wirelength distribution but cannot be used to predict the lengths of individual wires. Hence, this paper explores "structural metrics" that have been found to possess strong predictive qualities in the ASIC domain. To our knowledge, this is the first study of the application of these metrics in the FPGA CAD flow. Results show that the studied metrics capture characteristics of the placement optimization carried out by VPR and hence are good indicators of post-placement wirelengths.
In recent years, IP protection of FPGA hardware designs has become a requirement for many IP vendors. To this end, solutions have been proposed based on the idea of bitstream encryption, symmetric-key primitives, and the use of Physical Unclonable Functions (PUFs). In this paper, we propose new protocols for the IP protection problem on FPGAs based on public-key (PK) cryptography, analyze the advantages and costs of such an approach, and describe a PUF intrinsic to current FPGAs based on SRAM properties. A major advantage of using PK-based protocols is that they do not require the private key stored in the FPGA to leave the device, thus increasing security. This added security comes at the cost of additional hardware resources, but it does not cause significant performance degradation.
This paper introduces a software-supported methodology for exploring and evaluating 3D FPGA architectures. Two new CAD tools are developed: (i) 3DPRO for placement and routing on 3D FPGAs and (ii) 3DPower for power/energy estimation on such architectures. We mainly focus our exploration on the total number of layers and the amount of vertical interconnects (or vias). The efficiency of the proposed architecture is evaluated by making an exhaustive exploration of via connections under the Energy×Delay Product criterion. Experimental results demonstrate the effectiveness of our solution, considering the 20 largest MCNC benchmarks. Considering 3D architectures with 4 layers and two scenarios of fabricated via densities (30% and 70%), we achieve an average decrease in the delay, the wire length, and the energy consumption of 18%, 17%, and 31%, respectively, as compared to 2D FPGAs. We also achieved high utilization of via links.
An integrated platform for fast genetic operators is presented to support intrinsic evolution on Xilinx Virtex II Pro Field-Programmable Gate Arrays (FPGAs). Dynamic bitstream compilation is achieved by directly manipulating the bitstream using a layered design. Experimental results on a case study have shown that a full design as well as a full repair is achievable using this platform with an average time of 0.4 microseconds to perform the genetic mutation, 0.7 microseconds to perform the genetic crossover, and 5.6 milliseconds for one input pattern intrinsic evaluation. This represents a performance advantage of three orders of magnitude over JBITS and more than seven orders of magnitude over the Xilinx design tool driven flow for realizing intrinsic genetic operators on a Virtex II Pro device.
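The two genetic operators named above can be sketched on a plain byte string as follows. This is only a conceptual illustration: it does not model the Virtex II Pro bitstream format or the platform's layered design, and the function names, the single-bit mutation, and the byte-aligned single-point crossover are all assumptions.

```python
import random

def mutate(bits, rng):
    """Return a copy of bits (a bytearray) with one randomly chosen bit flipped."""
    out = bytearray(bits)
    pos = rng.randrange(len(out) * 8)   # bit index within the whole string
    out[pos // 8] ^= 1 << (pos % 8)
    return out

def crossover(parent1, parent2, point):
    """Single-point crossover at byte offset point; returns two children."""
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2
```

Passing an explicit `random.Random` instance keeps runs reproducible, which helps when comparing evolved configurations.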
A method is described for enumerating the frequencies of DNA subsequences on a system comprising a host computer and a field-programmable gate array (FPGA) board with one FPGA. Frequencies of subsequences with lengths of up to K_0 + K_1 + K_2 (24 in the current implementation) are enumerated in three phases. In these three phases, subsequences with lengths of up to K_0, K_0 + K_1, and K_0 + K_1 + K_2, respectively, are enumerated; these three phases are executed simultaneously on a pipelined circuit, resulting in high performance. The enumeration of frequent subsequences in databases, which are becoming larger and larger, will enable subsequences that are unique and/or repeatedly used in many parts of the sequences to be found.
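A plain software reference for the counting task might look like the sketch below. It assumes that "subsequence" means a contiguous substring and counts every length from 1 up to a maximum in one pass per length; it does not reproduce the three-phase pipelined circuit, and the function name is hypothetical.

```python
from collections import Counter

def subsequence_frequencies(seq, max_len):
    """Count every contiguous subsequence of seq with length 1..max_len."""
    counts = Counter()
    for k in range(1, max_len + 1):
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts
```

Memory grows quickly with the maximum length (up to 4^k distinct DNA subsequences of length k), which is one reason a phased approach is attractive for lengths such as 24.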
A new scalable systolic hardware architecture for RSA cryptosystems is presented. The kernel of the architecture can operate with different input precisions, which enables area-time tradeoffs in the design. The add-shift Montgomery algorithm is used for modular multiplication. Unlike previous approaches, after the add operation the result is shifted to the previous systole to divide by the radix. This simplifies the structure of the processing elements. The R-L binary Montgomery exponentiation algorithm is used. The square and multiply operations are performed in parallel. The architecture is implemented in Xilinx Virtex-5 FPGA (Field-Programmable Gate Array) chips for different radixes. The DSP48E slices in the FPGA chips are used to increase the throughput of the design. The results are compared with the literature. It is seen that the highest performance per area is obtained with the radix-2^16 design.
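The two algorithms named above can be sketched in software as follows. This is a radix-2 illustration only, not the systolic radix-2^16 architecture; it assumes an odd modulus n, and the function names are hypothetical.

```python
def mont_mul(a, b, n, bits):
    """Add-shift Montgomery product: a * b * 2^(-bits) mod n.

    Requires n odd and a, b < n < 2^bits.
    """
    s = 0
    for i in range(bits):
        if (a >> i) & 1:
            s += b        # add step for bit i of a
        if s & 1:
            s += n        # make s even so the shift is an exact division
        s >>= 1           # shift step: divide by the radix (2)
    if s >= n:
        s -= n            # single final correction, since s < 2n
    return s

def mont_pow(base, exp, n):
    """Right-to-left binary exponentiation in the Montgomery domain (n odd)."""
    bits = n.bit_length()
    r2 = pow(2, 2 * bits, n)               # R^2 mod n, with R = 2^bits
    x = mont_mul(base % n, r2, n, bits)    # base * R mod n
    acc = mont_mul(1, r2, n, bits)         # R mod n, i.e. 1 in Montgomery form
    while exp:
        if exp & 1:
            acc = mont_mul(acc, x, n, bits)   # multiply
        x = mont_mul(x, x, n, bits)           # square
        exp >>= 1
    return mont_mul(acc, 1, n, bits)       # convert back out of Montgomery form
```

In the right-to-left form, the multiply and the square within one iteration both read the same old value of `x` and do not depend on each other, which is what allows hardware to run them in parallel.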
In this paper we present a new hardware design pattern for improving memory transfers to external dynamic memory in Altera's SOPC-builder tool by reusing the standard DMA IP core for all bulk memory transfers without the need for a CPU. The presented approach doubles the data throughput without the need for extra system resources. In addition, it is more effective for choosing optimal clock settings for the different components of the system on a programmable chip. The benefits and limitations of this new approach are illustrated with a real-world example: a bitplane assembler for scalable wavelet-based video. The new design is. times faster with the same clock settings as the original design and uses about 100 fewer logic elements. Applying our new approach also has a positive impact on energy consumption.