ISBN: 9781424419920 (print)
A polynomic-curve-based representation system is a three-dimensional system that renders images using polynomic curves instead of triangle meshes. Such a system renders a scene so that the computational cost depends more on the image size than on the scene complexity. Moreover, it allows the user to describe a scene using a minimal amount of data compared with traditional methods. Accordingly, we have implemented hardware that demonstrates how greatly this kind of application can be accelerated with a suitable solution. The hardware was implemented on an FPGA; as a result, execution times were greatly reduced, showing a very promising speedup over a software-only solution.
ISBN: 9781424406067 (print)
Advances in FPGA technologies allow the design of highly complex systems using on-chip FPGA resources and intellectual property (IP) cores. Furthermore, it is possible to build multiprocessor systems using hard-core or soft-core processors, increasing the range of applications that can be implemented on an FPGA. This paper presents an implementation of a symmetric multiprocessor (SMP) system on an FPGA using a vendor-provided soft-core processor and a new set of software libraries developed specifically for writing applications for this kind of system. Experimental results show how this approach can improve the performance of parallelizable software applications.
ISBN: 9781424406067 (print)
This paper shows that, under certain conditions, digital arithmetic circuits do not satisfy the commutative property in terms of power consumption. That is, the power consumed by the operation A×B differs from that consumed by B×A. As a consequence, it is possible to save power simply by permuting the circuit inputs whenever any of the following three conditions is present: a) the data to be processed has a strong temporal correlation; b) the delays of the circuit paths are highly unbalanced; c) one of the inputs is communicated by broadcast while the other is local. To verify these hypotheses, several binary multipliers were constructed and measured. The power consumption reduction ranged from 12% to 28% on Virtex FPGAs.
ISBN: 9781424406067 (print)
This work presents a novel, accurate, and fast post-layout logic perturbation method for improving LUT-based FPGA routing without affecting the placement. ATPG-based rewiring techniques are used to design the Rewiring engine, which is embedded into VPR, currently the most powerful academic FPGA CAD tool. Compared with VPR's high-quality results, our method reduces critical path delay by up to 31.74% (10% on average) without disturbing placement or sacrificing area. The CPU time used by the Rewiring engine is only 5% of the total time consumed by VPR's placement and routing. All the benchmark circuits can be placed and routed within 3 minutes, which is much faster than the SPFD approach. This paper also analyzes the power of ATPG-based rewiring techniques in LUT-based FPGAs. Experimental results show that 3% of all nets can be replaced by their alternative wires to improve FPGA performance.
A significant development in the history of semiconductor devices is the invention of programmable logic devices. The intent of this paper is to take a comprehensive look into the world of programmable logic as one of the semi-custom alternatives facing the systems designer.
In this paper, we investigate the mechanism of soft error generation and propagation in asynchronous circuits which are implemented on FPGAs. The effects of the soft errors on Quasi-delay-insensitive (QDI) asynchronou...
ISBN: 9781467381239 (print)
Packet classification is a kernel application performed at network routers. Many classification engines are optimized for prefix and exact match, while a range-to-prefix translation can lead to rule set expansion. Under a limited power budget, it is challenging to achieve high classification throughput. In this paper, we present a high-performance and power-efficient packet classification engine on FPGA. We construct a modular Processing Element (PE); each PE compares a stride of the input packet header against a stride of a range boundary. We concatenate multiple PEs into a systolic array. Efficient power optimization techniques, including self-enabled power gating and entropy-based scheduling, are explored on our architecture. Experimental results show that, for 4K 15-field rule sets, our prototype on a state-of-the-art FPGA can achieve a throughput of 250 Million Packets Per Second (MPPS). Using the proposed power optimization techniques, our classification engine consumes 30% of the power without sacrificing throughput.
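The stride-wise comparison described in this abstract can be illustrated with a minimal software sketch. It is not the authors' hardware design: the function names, the 16-bit field width, the 4-bit stride, and the three-valued comparison state passed between stages are all assumptions made for illustration. Each call to `pe_compare` plays the role of one PE, and chaining the calls from the most significant stride to the least significant one mimics a one-dimensional pipeline of PEs.

```python
def split_strides(value, width, stride):
    """Split a width-bit integer into stride-bit chunks, most significant first."""
    if width % stride != 0:
        raise ValueError("width must be a multiple of stride")
    mask = (1 << stride) - 1
    return [(value >> shift) & mask for shift in range(width - stride, -1, -stride)]


def pe_compare(state, header_stride, bound_stride):
    """One hypothetical PE stage.

    state is -1 (header already smaller), +1 (already larger),
    or 0 (all more significant strides were equal).
    """
    if state != 0:
        return state  # an earlier, more significant stride already decided
    if header_stride < bound_stride:
        return -1
    if header_stride > bound_stride:
        return 1
    return 0


def compare(value, bound, width=16, stride=4):
    """Chain the PE stages to compare a header field against one boundary."""
    state = 0
    for h, b in zip(split_strides(value, width, stride),
                    split_strides(bound, width, stride)):
        state = pe_compare(state, h, b)
    return state


def in_range(value, low, high, width=16, stride=4):
    """A field matches a range rule when low <= value <= high."""
    return (compare(value, low, width, stride) >= 0
            and compare(value, high, width, stride) <= 0)
```

For example, `in_range(80, 80, 443)` evaluates the range directly on its two boundaries, so no range-to-prefix expansion of the rule is needed in this sketch.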
ISBN: 9798350341515 (print)
Homomorphic Encryption (HE) is a promising technique to guarantee the security and privacy of Machine Learning (ML) applications in the cloud. Rotation is a key operation in HE-based ML; however, its high computational complexity and memory bandwidth requirements severely limit its performance. This work proposes a low-latency HE rotation accelerator targeting HBM-enabled FPGAs. First, we identify memory inefficiencies due to the access patterns of various sub-routines in rotation. We propose a dynamic data layout technique that converts large-stride memory accesses to unit-stride accesses to improve bandwidth utilization. We leverage this technique to develop an FPGA accelerator that supports rotation for various HE parameter settings. The accelerator utilizes an optimized dataflow and an architecture specially designed to perform the dynamic data layout. We evaluate the accelerator on an AMD U280 FPGA. Our design achieves up to 2.1x speedup compared with two commonly used static layout approaches and up to 1.47x speedup compared with a state-of-the-art GPU implementation across various rotation benchmarks.
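The idea of turning large-stride accesses into unit-stride accesses can be shown with a small sketch. It illustrates only a single, fixed reordering of a flat array; the paper's technique changes the layout dynamically between sub-routines, and its actual mapping is not given in the abstract. The function names below are hypothetical.

```python
def to_unit_stride(data, stride):
    """Reorder data so that elements originally `stride` apart become adjacent."""
    if len(data) % stride != 0:
        raise ValueError("length must be a multiple of stride")
    return [data[i]
            for start in range(stride)
            for i in range(start, len(data), stride)]


def strided_read(data, start, stride):
    """Original access pattern: every stride-th element, beginning at start."""
    return data[start::stride]


def unit_stride_read(reordered, start, stride):
    """Same elements after reordering, now read as one contiguous block."""
    group = len(reordered) // stride
    return reordered[start * group:(start + 1) * group]
```

With `data = list(range(12))` and a stride of 4, `strided_read(data, 1, 4)` and `unit_stride_read(to_unit_stride(data, 4), 1, 4)` return the same elements, but the second read touches consecutive addresses, which is the pattern that burst-oriented memories such as HBM serve most efficiently.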
In this work we revisit the Atari 2600, the first widely popular home video game, recreating it on programmable hardware. The console is designed from scratch using the Verilog Hardware Description Language (HDL...
ISBN: 9781479968480 (print)
This paper proposes a parallel fixed-point radial basis function (RBF) artificial neural network (ANN), implemented on a field-programmable gate array (FPGA) and trained with a least mean square (LMS) algorithm. The processing time and occupied area were analyzed for various fixed-point formats. The precision of the ANN response in the hardware implementation was also analyzed for nonlinear classification, using the XOR gate, and for interpolation, using the sine function. The entire project was developed using the System Generator platform (Xilinx), with a Virtex-6 xc6vcx240t-1ff1156 as the target FPGA.
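A minimal software sketch of an RBF network trained with LMS on the XOR problem is given below. It is sequential rather than parallel and is not the paper's architecture: the Gaussian basis function, the placement of the four centers on the XOR inputs, the width `sigma`, the step size `mu`, the epoch count, and the 12-bit fractional format are all assumed values chosen for illustration. Fixed-point arithmetic is emulated by rounding every stored value to a multiple of 2^-12.

```python
import math

FRAC_BITS = 12  # assumed fixed-point format: 12 fractional bits


def quantize(x, frac_bits=FRAC_BITS):
    """Round x to the nearest value representable with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def rbf_features(x, centers, sigma):
    """Gaussian basis responses of input x to every center, quantized."""
    feats = []
    for c in centers:
        dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        feats.append(quantize(math.exp(-dist_sq / (2.0 * sigma ** 2))))
    return feats


def predict(weights, feats):
    """Linear output layer: quantized weighted sum of the basis responses."""
    return quantize(sum(w * f for w, f in zip(weights, feats)))


def train_lms(samples, targets, centers, sigma=0.5, mu=0.5, epochs=200):
    """LMS rule: w <- w + mu * error * feature, applied sample by sample."""
    weights = [0.0] * len(centers)
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            feats = rbf_features(x, centers, sigma)
            error = t - predict(weights, feats)
            weights = [quantize(w + mu * error * f)
                       for w, f in zip(weights, feats)]
    return weights


XOR_INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
XOR_TARGETS = [0, 1, 1, 0]

weights = train_lms(XOR_INPUTS, XOR_TARGETS, centers=XOR_INPUTS)
outputs = [predict(weights, rbf_features(x, XOR_INPUTS, 0.5)) for x in XOR_INPUTS]
labels = [1 if y >= 0.5 else 0 for y in outputs]
```

Changing `FRAC_BITS` and rerunning the training gives a rough feel for the precision trade-off that the abstract studies across fixed-point formats.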