ISBN: (print) 9780769549699; 9781467360050
Compared with a typical software development flow, the productivity of developing FPGA-based compute applications remains much lower. Although high-level synthesis (HLS) tools may partly alleviate this shortcoming, the lengthy low-level FPGA implementation process remains a major obstacle to high-productivity computing because it limits the number of compile-debug-edit cycles per day. Furthermore, high-level application developers often lack the intimate hardware engineering experience needed to achieve high performance on FPGAs, which undermines the usefulness of FPGAs as accelerators. To address these productivity and performance problems, a high-level synthesis methodology that uses soft coarse-grained reconfigurable arrays (SCGRAs) as an intermediate compilation step is presented. Instead of compiling high-level applications directly into circuits implemented on the FPGA, the compilation process is reduced to an operation scheduling task targeting the SCGRA. The softness of the SCGRA also allows domain-specific design of the processing elements, while allowing a highly optimized SCGRA array to be developed by a separate hardware design team. An SCGRA operating at over 400 MHz on a commercial FPGA is presented here. Compared with commercial high-level synthesis tools, the proposed design methodology achieved a 0.8-21x speedup in application run time, while application compilation time was reduced by 10-100x.
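To illustrate what "reducing compilation to an operation scheduling task" can look like, here is a minimal sketch of greedy list scheduling of a data-flow graph onto a fixed number of processing elements. It is not the scheduler from the paper: the function name `list_schedule`, the one-cycle latency for every operation, and the absence of any routing or communication cost between processing elements are all simplifying assumptions made for illustration.

```python
def list_schedule(deps, num_pes):
    """Greedy list scheduling (illustrative sketch, not the paper's scheduler).

    deps: dict mapping each operation name to a list of predecessor names.
    num_pes: number of processing elements available in each cycle.
    Returns a dict mapping operation name -> (cycle, pe_index).
    Assumptions: every operation takes one cycle and results are visible
    to all processing elements in the next cycle (no routing cost).
    """
    if num_pes < 1:
        raise ValueError("need at least one processing element")
    pending = dict(deps)
    schedule = {}
    cycle = 0
    while pending:
        # An operation is ready once all predecessors ran in earlier cycles.
        ready = sorted(op for op, preds in pending.items()
                       if all(p in schedule for p in preds))
        if not ready:
            raise ValueError("dependency cycle or unknown predecessor")
        # Issue at most num_pes ready operations in this cycle.
        for pe, op in enumerate(ready[:num_pes]):
            schedule[op] = (cycle, pe)
            del pending[op]
        cycle += 1
    return schedule


# Hypothetical four-operation graph: c needs a and b, d needs c.
example = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
print(list_schedule(example, num_pes=2))
```

Because the target array is fixed, a scheduler of this kind only assigns operations to cycles and processing elements, which is much cheaper than running FPGA place-and-route for every edit.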
Numerous application areas, including bioinformatics and computational biology, demand increasing amounts of processing capability. In many cases, the computation cores and data types are well suited to field-programmable gate arrays (FPGAs). The challenge is identifying the design techniques that can extract the high performance potential of the FPGA fabric.
Screening is an important task that converts a continuous-tone image into a binary image with pure black and white pixels. The main contribution of this paper is a new algorithm for cluster-dot screening using a local exhaustive search. The new algorithm generates 2-cluster, 3-cluster, and 4-cluster binary images, in which all dots have at least 2, 3, and 4 pixels, respectively. The key idea of the screening method is to repeat a local exhaustive search that finds the best binary pattern in small windows of size k x k in a binary image. The experimental results show that the local exhaustive search produces high-quality, sharp cluster-dot binary images. We also present a hardware algorithm to accelerate the computation. Our hardware algorithm for a round of the local exhaustive search runs in O(k^2) clock cycles, while the software implementation runs in O(2^(k^2) w^2) time, where (2w + 1) x (2w + 1) is the size of the Gaussian filter. Thus, from a theoretical point of view, our hardware algorithm achieves a speedup factor of O(w^2). To show that our hardware algorithm is fast in practice, we have implemented it on an FPGA. Our hardware algorithm achieved a speedup factor of up to 229 over the software implementation.
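The c-cluster property described above (every dot has at least c pixels) can be checked with a simple flood fill. The sketch below is illustrative only: the function name `min_cluster_size`, the use of 4-connectivity to define a dot, and the convention that dot pixels are stored as 1 are assumptions, since the abstract does not state them.

```python
def min_cluster_size(binary, value=1):
    """Return the size of the smallest connected cluster of `value` pixels.

    binary: list of rows, each a list of 0/1 pixels.
    Clusters are 4-connected (an assumption, the abstract does not say).
    Returns None when the image contains no pixel equal to `value`.
    """
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    smallest = None
    for y in range(h):
        for x in range(w):
            if binary[y][x] != value or seen[y][x]:
                continue
            # Flood-fill one cluster and count its pixels.
            stack = [(y, x)]
            seen[y][x] = True
            size = 0
            while stack:
                cy, cx = stack.pop()
                size += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and binary[ny][nx] == value and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            smallest = size if smallest is None else min(smallest, size)
    return smallest


def is_c_cluster(binary, c):
    """True when every dot in the image has at least c pixels."""
    smallest = min_cluster_size(binary)
    return smallest is None or smallest >= c
```

Under these assumptions, a candidate window pattern in the search could be rejected whenever `is_c_cluster` fails for the resulting image.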
The main contribution of this paper is a new approach to FM screening, which we call the Local Exhaustive Search (LES) method, together with ways to accelerate the computation using an FPGA. FM screening, as opposed to conventional AM screening, keeps a unit dot size when converting an original gray-scale image into the binary image for printing. FM screening aims to generate moire-free binary images that reproduce the continuous tone and fine details of original photographic images. Our basic approach to FM screening is to generate a binary image whose projected image onto human eyes is very close to the original image. The projected image is computed by applying a Gaussian filter to the binary image. LES performs an exhaustive search for each of the small square subimages in the binary image and replaces the subimage with the best binary pattern. The exhaustive search is repeated until no more improvement is possible. The experimental results show that LES produces a high-quality, sharp binary image. We also implemented LES on an FPGA to accelerate the computation and achieved a speedup factor of up to 51 over the software implementations.
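The LES loop described above can be sketched in a few lines of software. This is a minimal, unoptimized illustration and not the authors' implementation: the squared-error measure, the Gaussian `sigma`, the zero padding at image borders, the threshold-based starting image, the raster-order sliding of k x k windows, and all function names are assumptions made for the sketch.

```python
import itertools
import math


def gaussian_kernel(w, sigma=1.0):
    """Return a normalized (2w+1) x (2w+1) Gaussian kernel as nested lists."""
    raw = [[math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
            for dx in range(-w, w + 1)] for dy in range(-w, w + 1)]
    total = sum(map(sum, raw))
    return [[v / total for v in row] for row in raw]


def total_error(gray, binary, kernel):
    """Squared error between the Gaussian-filtered binary image and gray."""
    h, wd = len(gray), len(gray[0])
    w = len(kernel) // 2
    err = 0.0
    for y in range(h):
        for x in range(wd):
            acc = 0.0
            for dy in range(-w, w + 1):
                for dx in range(-w, w + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < wd:  # outside counts as 0
                        acc += kernel[dy + w][dx + w] * binary[yy][xx]
            err += (acc - gray[y][x]) ** 2
    return err


def local_exhaustive_search(gray, k=2, w=1, max_rounds=10):
    """Repeat exhaustive k x k window searches until no window improves."""
    h, wd = len(gray), len(gray[0])
    kernel = gaussian_kernel(w)
    # Assumed starting point: simple thresholding of the gray image.
    binary = [[1 if v >= 0.5 else 0 for v in row] for row in gray]
    best = total_error(gray, binary, kernel)
    for _ in range(max_rounds):
        improved = False
        for top in range(h - k + 1):
            for left in range(wd - k + 1):
                cells = [(top + i, left + j)
                         for i in range(k) for j in range(k)]
                keep = [binary[y][x] for y, x in cells]
                # Try all 2^(k*k) patterns for this window.
                for pattern in itertools.product((0, 1), repeat=k * k):
                    for (y, x), bit in zip(cells, pattern):
                        binary[y][x] = bit
                    err = total_error(gray, binary, kernel)
                    if err < best - 1e-12:
                        best, keep, improved = err, list(pattern), True
                # Leave the best pattern found so far in the window.
                for (y, x), bit in zip(cells, keep):
                    binary[y][x] = bit
        if not improved:
            break
    return binary, best
```

This version recomputes the error over the whole image for every candidate pattern, which is far more work than necessary; it is meant only to make the search structure concrete, and it shows why the per-window cost grows as 2^(k*k) and motivates hardware acceleration.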
The main contribution of this work is to present several hardware implementations of an "n choose k" counter (C(n, k) counter for short), which lists all n-bit numbers with (n - k) 0's and k 1's, and to show their applications. We first present the concepts of C(n, k) counters and their efficient implementations on an FPGA. We then evaluate their performance in terms of the number of slices used and the clock frequency on the Xilinx Virtex-II family FPGA XC2V3000-4. As a real-life application, we use a C(n, k) counter to accelerate a digital halftoning method that generates a binary image reproducing an original gray-scale image. This method repeatedly replaces the image pattern in a small square region of a binary image with the best one. By using a partial exhaustive search based on a C(n, k) counter, we accelerated the task of finding the best image pattern and achieved a speedup factor of more than 2.5 over the simple exhaustive search.
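For readers who want a software analogue of a C(n, k) counter, the sequence of all n-bit numbers with exactly k 1's can be generated with the well-known bit trick often called Gosper's hack. This is not one of the hardware designs from the paper, whose details the abstract does not give; the function name `n_choose_k_patterns` is hypothetical.

```python
def n_choose_k_patterns(n, k):
    """Yield every n-bit integer with exactly k one bits, in increasing order.

    Software sketch using Gosper's hack; not the paper's hardware counter.
    """
    if k < 0 or k > n:
        return
    if k == 0:
        yield 0
        return
    x = (1 << k) - 1           # smallest pattern: k ones in the low bits
    limit = 1 << n
    while x < limit:
        yield x
        c = x & -x             # lowest set bit of x
        r = x + c              # carry ripples through the lowest run of ones
        x = (((r ^ x) >> 2) // c) | r   # move the leftover ones to the bottom


for value in n_choose_k_patterns(4, 2):
    print(format(value, "04b"))
```

Restricting a window search to patterns with a fixed number of 1's shrinks the candidate set from 2^n to C(n, k), which is the idea behind the partial exhaustive search mentioned above.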