ISBN: 9781424438914 (print)
Random number generators play an important role in the field of cryptography and security. It is often required that a random number generator consist of digital logic blocks only, so that it can be implemented on reconfigurable platforms. Since randomness cannot be proved by statistical tests, there is a need for a provably secure hardware random number generator. In order to provide a proof of security, an experimental investigation of various physical effects on reconfigurable platforms is needed. In this paper we focus on the effect of narrow-transition suppression in logic gates. The estimation of this effect may be crucial for the validity of the security proof of an RNG design. We explain our views on how experiments on FPGAs should be performed and describe the measurement setup. We show that up to 98% of the transitions are suppressed in our experimental FPGA setup.
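As a rough illustration of what narrow-transition suppression means, the sketch below treats a gate as an inertial filter: two signal edges that are closer together than a minimum pulse width cancel each other and never reach the output. The function names, the 100 ps threshold, and the edge times are hypothetical values chosen for the example; they are not taken from the paper's measurement setup.

```python
def suppress_narrow_pulses(edge_times, min_width):
    """Return the edges that survive an inertial filter.

    edge_times: sorted times (any unit) at which the input toggles.
    min_width:  shortest pulse the gate can propagate (assumed value).
    """
    surviving = []
    for t in edge_times:
        if surviving and t - surviving[-1] < min_width:
            # The pulse between the last surviving edge and this edge is
            # too narrow, so both edges are dropped.
            surviving.pop()
        else:
            surviving.append(t)
    return surviving


def suppressed_fraction(edge_times, min_width):
    """Fraction of input transitions that do not reach the output."""
    if not edge_times:
        return 0.0
    kept = suppress_narrow_pulses(edge_times, min_width)
    return 1.0 - len(kept) / len(edge_times)


# Hypothetical edge times in picoseconds: the 20 ps pulse is filtered out.
edges_ps = [0, 1000, 1020, 2000]
print(suppress_narrow_pulses(edges_ps, 100))  # [0, 2000]
print(suppressed_fraction(edges_ps, 100))     # 0.5
```

In this toy model the suppressed fraction depends only on the spacing of the edges relative to the threshold; a real FPGA measurement would also have to account for routing delays and measurement bandwidth.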
ISBN: 9781424438914 (print)
This paper presents an FPGA implementation of a low-cost 8-bit reconfigurable processor core for media processing applications. The core is optimized to provide all basic arithmetic and logic functions required by media processing and other domains, and to be easily integrable into a 2D array. The paper investigates the feasibility of the core as a potential soft processing architecture for FPGA platforms. The core was synthesized on the entire Virtex FPGA family to evaluate its overall performance, scalability and portability. A special feature of the proposed architecture is its simple programming model, which allows low-level programming. Throughput results for popular benchmarks coded using the programming model and a cycle-accurate simulator are presented.
Nowadays, FPGAs are integrated in high-performance computing systems, servers, or even used as accelerators in System-on-Chip (SoC) platforms. Since the execution is performed in hardware, FPGA gives much higher perfo...
ISBN: 9781538685174 (digital)
ISBN: 9781538685174 (print)
Networked embedded systems have seen tremendous growth, with many more complex critical and non-critical systems exchanging information over networks of various types. At each node, information is processed by the network stack before the application sees the data. Large portions of the stack are in software, resulting in significant and non-deterministic delays. While hybrid compute platforms like the Xilinx Zynq can accelerate processing tasks through offloading to programmable logic, the delays incurred due to connectivity can significantly impact overall application latency. In this paper, we present a smart network interface approach for the Xilinx Zynq platform based on datapath extensions within the otherwise standard Ethernet interface. We show that this approach improves computation offload latency by 24-27% and throughput by 37% for a complex computational kernel.
This tutorial describes the Why and How of the new 65-nm families of Virtex-5 FPGAs. It describes several aspects of the technology that affect speed, density, and power consumption. The basic device structure and pac...
ISBN: 9782839918442 (print)
Sharing multi-cycle hardware blocks like the DSP48E1 primitive in Xilinx FPGAs can result in significant resource savings, but complicates scheduling. For high throughput, DSP blocks must be pipelined, which results in a high initiation interval (II) for resource-shared implementations. In this paper, we propose a resource reduction technique that minimises DSP block usage while also offering improved II over traditional approaches. This is integrated in a high-level tool which takes datapath descriptions in C and generates synthesisable Verilog RTL with different levels of resource sharing. We demonstrate significantly improved throughput compared to traditional resource sharing while achieving resource reduction compared to resource-unconstrained and HLS implementations. The approach explores an otherwise infeasible design space between resource-unconstrained and traditional resource sharing methods.
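To make the trade-off between sharing and initiation interval concrete, the sketch below computes the standard resource-constrained lower bound on II used in modulo scheduling: the number of operations of one type divided by the number of units that can execute them, rounded up. This is a textbook bound and not the paper's technique; the function name `resource_min_ii` and the operation counts are assumptions made for the example.

```python
def resource_min_ii(num_ops, num_units, unit_ii=1):
    """Lower bound on the initiation interval imposed by shared units.

    num_ops:   operations per iteration that need this unit type.
    num_units: physical units available (e.g. DSP blocks).
    unit_ii:   cycles between successive issues to one unit
               (1 for a fully pipelined unit; assumed here).
    """
    if num_units < 1:
        raise ValueError("at least one unit is required")
    total_slots = num_ops * unit_ii
    # Integer ceiling division avoids floating-point rounding.
    return (total_slots + num_units - 1) // num_units


# Hypothetical datapath with 8 multiplications per iteration.
for units in (1, 2, 3, 8):
    print(units, resource_min_ii(8, units))
```

With a single shared unit the bound is 8 cycles per iteration, and it only reaches 1 when every operation has its own unit, which is the resource-unconstrained end of the design space described in the abstract.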
Authors: Ye, A.; Rose, J.
Ryerson Univ, Dept Elect & Comp Engn, Toronto, ON M5B 2K3, Canada
Univ Toronto, Edward S Rogers Sr Dept Elect & Comp Engn, Toronto, ON M5S 3G4, Canada
As the logic capacity of field-programmable gate arrays (FPGAs) increases, there has been a corresponding increase in the variety of FPGA building blocks. From a mere collection of conventional logic blocks, FPGAs can now include digital signal processors, multipliers, multi-bit addressable memory cells and even processor cores. One of the common characteristics of these new building blocks is their multi-bit design, where each block is designed specifically to process several bits of data at a time. This multi-bit processing paradigm is significantly different from the single-bit processing design of the conventional FPGA logic blocks, as it creates differentiation in signals through its bussed structures. Consequently, the correlation between the positions of the signals in buses and the connectivity of these signals is examined. On the basis of correlation measurements, a multi-bit routing architecture is then proposed along with its routing tool. It is experimentally shown that, compared with the conventional routing architectures, the multi-bit architecture requires 6-12% less area to implement. In particular, it needs 27% fewer routing switches to connect its multi-bit blocks to their routing tracks and 18% less configuration memory to store the configuration information.
ISBN: 9782839918442 (print)
Image features are broadly used in embedded computer vision applications, from object detection and tracking to motion estimation and 3D reconstruction. Efficient feature extraction and description are crucial due to the real-time requirements of such applications over a constant stream of input data. High-speed computation typically comes at the cost of high power dissipation, yet embedded systems are often highly power constrained, making discovery of power-aware solutions especially critical for these systems. In this paper, we present a power and performance evaluation of three low cost feature detection and description algorithms implemented on various embedded systems (embedded CPUs, GPUs and FPGAs). We show that FPGAs in particular offer attractive solutions for both performance and power and describe several design techniques utilized to accelerate feature extraction and description algorithms on low-cost Zynq SoC FPGAs.
ISBN: 9781424438914 (print)
The interconnection networks used by current fine-grain FPGAs are not scalable to very large array sizes. To address this issue, we apply the GALS (Globally Asynchronous and Locally Synchronous) paradigm to build scalable FPGAs. The logic resources are divided into locally synchronous tiles, with asynchronous communication among different tiles. To route the asynchronous communications, we build a serial network-on-chip. Targeting streaming applications, we propose a design flow that maps user applications to our new FPGA architecture. To validate our architecture and design flow, we build an emulation prototype and develop a JPEG baseline encoder as the case study. We have successfully demonstrated the concept and predict a maximum frequency of 224 MHz for designs mapped to the sFPGA2 architecture.
ISBN: 9781424403127 (print)
Configurable architectures offer the unique opportunity of customizing the storage allocation to meet specific applications' needs. In this paper we describe a compiler approach to map the arrays of a loop-based computation to internal memories of a configurable architecture with the objective of minimizing the overall execution time. We present an algorithm that considers the data access patterns of the arrays along the critical path of the computation as well as the available storage and memory bandwidth. We demonstrate experimental results of the application of this approach for a set of kernel codes when targeting a field-programmable gate array (FPGA). The results reveal that our algorithm outperforms naive and custom data layouts for these kernels by an average of 33% and 15% in terms of execution time, while taking into account the available hardware resources.
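The mapping problem can be pictured with a deliberately simple greedy heuristic: rank arrays by how many accesses they receive per word of storage and place the densest ones in on-chip memory until capacity runs out. This is a stand-in for illustration only. It ignores the critical-path and bandwidth considerations that the paper's algorithm takes into account, and the function name `map_arrays_greedy`, the array names, and all sizes are hypothetical.

```python
def map_arrays_greedy(arrays, on_chip_capacity):
    """Choose arrays to place in on-chip memory.

    arrays: dict mapping array name -> (size_in_words, access_count).
    on_chip_capacity: total on-chip words available (assumed value).
    Returns the list of array names placed on chip, in placement order.
    """
    # Rank by accesses per word, most heavily used first.
    ranked = sorted(
        arrays,
        key=lambda name: arrays[name][1] / arrays[name][0],
        reverse=True,
    )
    placed = []
    free_words = on_chip_capacity
    for name in ranked:
        size = arrays[name][0]
        if size <= free_words:
            placed.append(name)
            free_words -= size
    return placed


# Hypothetical kernel with three arrays and 200 words of on-chip storage.
kernel_arrays = {
    "coeff": (100, 1000),  # small and heavily accessed
    "frame": (400, 800),   # too large to fit
    "lut": (50, 150),
}
print(map_arrays_greedy(kernel_arrays, 200))  # ['coeff', 'lut']
```

Arrays that are not placed would remain in external memory in this model, which is where a layout-aware algorithm can gain execution time over a naive assignment.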