ISBN: 9781424410590 (print)
Large-scale protein sequence comparison is an important but compute-intensive task in molecular biology. The popular BLASTP software for this task has become a bottleneck for proteomic database search. One third of this software's time is spent executing the Smith-Waterman dynamic programming algorithm. This work describes a novel FPGA design for banded Smith-Waterman, an algorithmic variant tuned to the needs of BLASTP. This design has been implemented in Mercury BLASTP, our FPGA-accelerated version of the BLASTP algorithm. We show that Mercury BLASTP runs 6-16 times faster than software BLASTP on a modern CPU while delivering 99% identical results.
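Banded Smith-Waterman restricts the dynamic-programming recurrence to cells within a fixed distance of a chosen diagonal. The Python sketch below illustrates the idea under simplifying assumptions: a linear gap penalty and fixed match/mismatch scores stand in for the affine gaps and substitution matrix a real protein search would use, and the function name `banded_smith_waterman` and its parameters `diag` (band centre, expressed as the offset j - i) and `width` (band half-width) are hypothetical. It is not the Mercury BLASTP hardware design.

```python
def banded_smith_waterman(query: str, subject: str, diag: int = 0, width: int = 3,
                          match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score using only cells (i, j) with |(j - i) - diag| <= width."""
    n, m = len(query), len(subject)
    # H[i][j]: best score of a local alignment ending at query[i-1] and subject[j-1].
    # Cells outside the band are never computed and keep the value 0, which the
    # local-alignment recurrence treats as "start a new alignment here".
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        # Columns of row i that fall inside the band, clipped to the matrix.
        lo = max(1, i + diag - width)
        hi = min(m, i + diag + width)
        for j in range(lo, hi + 1):
            s = match if query[i - 1] == subject[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align query[i-1] with subject[j-1]
                          H[i - 1][j] + gap,     # gap in subject
                          H[i][j - 1] + gap)     # gap in query
            best = max(best, H[i][j])
    return best
```

Because each query row touches at most 2 * width + 1 cells, the work per row is bounded independently of the subject length; this is one plausible reason a fixed-size array of processing elements can suffice in hardware. For example, `banded_smith_waterman("ACGT", "TACGT", diag=1, width=0)` follows only the diagonal offset by one and finds the four-residue match.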
ISBN: 9789090304281 (print)
The latest published studies with extensive explorations of look-up table and cluster sizes are now more than a decade old. However, CMOS technology as well as CAD and transistor modeling tools have improved so much since then that it is reasonable to wonder whether the conclusions of those studies still hold. One of the major difficulties in conducting such studies, especially in academia, is producing credible delay and area models. In this paper, we take advantage of a recently developed architecture modeling tool to re-evaluate the effect of the various cluster parameters on the FPGA. We considerably extend the exploration space beyond that of the classic studies to include sparse crossbars and fracturable LUTs, and present some results that go against the current tenets of FPGA architecture.
A multi-threaded microprocessor with a customisable instruction set, the CUStomisable Threaded ARchitecture (CUSTARD), is proposed. CUSTARD features include design space exploration and a compiler for automatic selection of custom instructions. Custom instructions, optimised for a specific application, accelerate frequently performed computations by implementing them as dedicated hardware. Field-programmable gate array (FPGA) implementations of CUSTARD are evaluated using media and cryptography benchmarks and compared with the commercial MicroBlaze processor. An area overhead as low as 28% for four interleaved threads and a speedup of up to 355% over a processor without custom instructions are demonstrated.
ISBN: 9781424438914 (print)
Many applications in image processing have high inherent parallelism. By fully exploiting this parallelism, FPGAs have shown very high performance despite their low operating frequency. In recent microprocessors, the parallelism can also be exploited through multiple cores that support improved SIMD instructions, though programmers have to use these instructions explicitly to achieve high performance. Recent GPUs support a large number of cores and have the potential for high performance in many applications. However, the cores are grouped, and data transfer between the groups is very limited. Programming tools have been developed for FPGAs, for SIMD instructions on CPUs and for the large number of cores on GPUs, but it is still difficult to achieve high performance on these platforms. In this paper, we compare the performance of FPGA, GPU and CPU using three image processing applications (two-dimensional filters, stereo vision and k-means clustering) and make clear which platform is faster under which conditions.
ISBN: 9781424403127 (print)
In recent years FPGAs have become very important for electronic designs: they are very flexible, provide high configurability and allow short turnaround times. Especially for Rapid Prototyping (RP), another feature plays an important role: their nearly infinite reprogrammability. However, handling these devices in the engineering process is not easy. Therefore, our approach presents an efficient, flexible and versatile FPGA configuration methodology based on partial bitstream merging at design time.
ISBN: 9781424410590 (print)
QR decomposition, especially by means of Householder transformations, is often used to solve least squares problems. The matrices to be decomposed with this method are usually very large, often too large to fit into the main memory of a workstation, let alone the internal memory of a present-day FPGA. Efficient out-of-core algorithms have been developed to address the factorization of large matrices. This paper describes the application of variants of Householder QR decomposition on FPGA-based systems. More specifically, it investigates the issues in applying out-of-core algorithms to the relatively small internal memory architecture of FPGAs.
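For reference, the in-core building block that out-of-core variants reorganize is ordinary Householder QR. The sketch below is an unblocked textbook formulation in pure Python, not the paper's FPGA design, and the name `householder_lstsq` is hypothetical. It solves a full-column-rank least-squares problem by applying each reflection to the right-hand side as well, so Q is never formed explicitly. Out-of-core versions typically process the matrix in column panels that fit in fast memory; that blocking is not shown here.

```python
import math


def householder_lstsq(A, b):
    """Solve min ||A x - b||_2 for a full-column-rank m x n matrix A (m >= n)."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]   # working copy, overwritten with R
    y = b[:]                    # overwritten with Q^T b
    for k in range(n):
        # Householder vector v mapping column k (rows k..m-1) onto alpha * e_1;
        # the sign of alpha is chosen opposite to x[0] to avoid cancellation.
        x = [R[i][k] for i in range(k, m)]
        alpha = -math.copysign(math.sqrt(sum(t * t for t in x)), x[0])
        v = x[:]
        v[0] -= alpha
        vnorm2 = sum(t * t for t in v)
        if vnorm2 == 0.0:
            continue  # column is already zero on and below the diagonal
        # Apply H = I - 2 v v^T / (v^T v) to the trailing columns of R ...
        for j in range(k, n):
            s = 2.0 * sum(v[i - k] * R[i][j] for i in range(k, m)) / vnorm2
            for i in range(k, m):
                R[i][j] -= s * v[i - k]
        # ... and to the right-hand side.
        s = 2.0 * sum(v[i - k] * y[i] for i in range(k, m)) / vnorm2
        for i in range(k, m):
            y[i] -= s * v[i - k]
    # Back-substitution on the leading n x n upper triangle: R x = y[:n].
    xs = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = y[i] - sum(R[i][j] * xs[j] for j in range(i + 1, n))
        xs[i] = acc / R[i][i]
    return xs
```

Fitting the line through (0, 1), (1, 3), (2, 5), (3, 7) with `householder_lstsq([[1, 0], [1, 1], [1, 2], [1, 3]], [1, 3, 5, 7])` recovers intercept 1 and slope 2 up to rounding. Each reflection reads and rewrites the whole trailing submatrix, which is why memory capacity, rather than arithmetic, becomes the constraint once the matrix exceeds on-chip storage.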
ISBN: 9781424403127 (print)
This paper presents an implementation of a high-performance network application layer parser in FPGAs. At the core of the architecture reside a pattern matcher and a parser. The pattern matcher scans for patterns in high-speed TCP data streams. The parser core augments each pattern found with semantic information determined from the pattern's location within the data stream. The packet payload parser can provide a higher level of understanding of a data stream for many network applications, such as high-performance XML parsers and content-based/aware routers. Additionally, a TCP processor allows stateful packet payload parsing of up to 8 million simultaneous TCP flows. The payload parser has been implemented in a Xilinx Virtex-E 2000 FPGA on the Field-programmable Port Extender platform. The parsing module runs at 200 MHz and parses raw data at 6.4 Gbps. The payload parser, integrated with the TCP processor, runs at 100 MHz for a throughput of 3.2 Gbps.
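The two throughput figures are mutually consistent: both correspond to 32 bits of stream data per clock cycle. The abstract does not state the datapath width, so this is only an inference from the arithmetic.

```latex
\[
\frac{6.4\,\text{Gbps}}{200\,\text{MHz}} = 32\ \text{bits/cycle},
\qquad
32\ \text{bits/cycle} \times 100\,\text{MHz} = 3.2\,\text{Gbps}
\]
```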
ISBN: 9781424419609 (print)
In this paper we present the design and implementation of an FPGA-based floating-point adder with three inputs. The design uses a five-stage pipeline in order to distribute the critical paths and maximize performance. We examine the data dependencies to minimize the number of pipeline stages and to reduce resource usage. Our design is parameterisable in order to cope with different floating-point formats, including the standard IEEE 754 formats and custom configurations. With the 32-bit single-precision format, the proposed design can operate at 143 MHz on a Xilinx Virtex-II Pro XC2VP30-7.
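The abstract does not describe the datapath, so the following is only a behavioural reference model under stated assumptions, not the paper's five-stage pipeline. One way to specify a three-operand adder is "form the exact sum, then round once to the target format"; hardware would reach such a result by aligning significands to the largest exponent and keeping guard and sticky bits, while this sketch simply uses Python's exact `fractions.Fraction` arithmetic. The parameter `mant_bits` (significand width including the hidden bit) loosely mirrors the paper's parameterisable formats, and all function names are hypothetical. Whether the paper's design rounds once or twice is not stated in the abstract.

```python
from fractions import Fraction


def round_to_format(x: Fraction, mant_bits: int) -> Fraction:
    """Round x to the nearest value with a mant_bits-bit significand
    (hidden bit included), ties to even.

    Simplification: the exponent range is unbounded, so overflow,
    underflow and subnormals are not modelled.
    """
    if x == 0:
        return Fraction(0)
    sign = -1 if x < 0 else 1
    x = abs(x)
    # Estimate e so that x / 2**e lies in [2**(mant_bits-1), 2**mant_bits),
    # then correct the estimate, which can be off by one.
    e = x.numerator.bit_length() - x.denominator.bit_length() - mant_bits
    while x / Fraction(2) ** e >= 2 ** mant_bits:
        e += 1
    while x / Fraction(2) ** e < 2 ** (mant_bits - 1):
        e -= 1
    q = round(x / Fraction(2) ** e)  # round() on a Fraction rounds half to even
    return sign * q * Fraction(2) ** e


def add3_single_rounding(a, b, c, mant_bits=24):
    """Hypothetical reference model: exact three-operand sum, rounded once."""
    return round_to_format(Fraction(a) + Fraction(b) + Fraction(c), mant_bits)


def add3_cascaded(a, b, c, mant_bits=24):
    """Two chained two-input adders: (a + b) is rounded before c is added."""
    ab = round_to_format(Fraction(a) + Fraction(b), mant_bits)
    return round_to_format(ab + Fraction(c), mant_bits)
```

With `mant_bits=24` (the single-precision significand), summing 1, 2**-24 and 2**-24 gives 1 + 2**-23 under a single rounding, whereas the cascaded version returns 1: the intermediate 1 + 2**-24 is an exact tie and rounds to even before the third operand arrives.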
ISBN: 9781424419609 (print)
This short paper describes a remote laboratory facility for Platform FPGA education. With the addition of an inexpensive piece of hardware, many commercial off-the-shelf FPGA development boards can be made suitable for use in a remote laboratory. The hardware and software required to implement a remote laboratory have been developed, and a remote laboratory facility has been deployed at the University of North Carolina at Charlotte. Advantages, concerns and actual costs are reported. The experience of using this facility in a senior/first-year graduate-level Platform FPGA course is also described. Although these data are preliminary, survey results and first-hand experience with the laboratory were very encouraging and suggest that further studies on student learning are warranted.
ISBN: 9781424403127 (print)
Block matching motion estimation accounts for a large part of the processing time in video encoding, so accelerating this process is essential to achieve real-time video coding. The best motion vector is obtained by the full-search block matching algorithm, which usually has to be implemented in hardware. In recent years, several FPGA-based designs have been proposed, since these devices support a large number of processing elements operating in parallel. In this paper, a survey of recent architectures that perform the full-search block matching algorithm in FPGAs is presented, together with a comparison in terms of frames per second achieved, hardware cost in CLB slices and system frequency.
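The full-search algorithm is simple to state: for a block of the current frame, every candidate displacement within a search range of plus or minus p pixels is evaluated, and the one with the lowest matching cost becomes the motion vector. The sketch below assumes the sum of absolute differences (SAD) as the cost, a common choice that the abstract does not specify; the function name `full_search` and its parameters `n` (block size) and `p` (search range) are hypothetical, and frames are plain lists of pixel rows.

```python
def full_search(cur, ref, bx, by, n=4, p=2):
    """Motion vector (dx, dy) and its SAD for the n x n block of `cur` whose
    top-left corner is (bx, by), found by exhaustive search in `ref`."""
    h, w = len(ref), len(ref[0])
    best_mv, best_sad = None, None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + n > w or y + n > h:
                continue  # candidate block would fall outside the reference frame
            # Sum of absolute differences between the current and candidate blocks.
            sad = sum(abs(cur[by + r][bx + c] - ref[y + r][x + c])
                      for r in range(n) for c in range(n))
            if best_sad is None or sad < best_sad:
                best_mv, best_sad = (dx, dy), sad
    return best_mv, best_sad
```

Each block requires up to (2p + 1)^2 SAD evaluations of n^2 absolute differences each, and the candidates are independent of one another. That independence is what lets an FPGA evaluate many candidates at once in parallel processing elements, at the cost of the slices counted in the survey's comparison.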