ISBN: 9781424403127 (print)
Modern FPGAs provide increased gate counts with decreased power consumption. IP cores, together with embedded processors and memory, provide a great opportunity to implement System-on-Chip (SoC) designs on configurable devices. Networks-on-Chip (NoC) is an emerging style of SoC design, introduced to overcome the communication and performance bottlenecks of a shared-bus approach. The Multi Local Port Router (MLPR) presents a novel alternative to the traditional NoC design. This methodology offers numerous advantages, including bandwidth optimization and reduced network area and power consumption, which ultimately improve the performance of the NoC system. Unlike in bus-based systems, communication in NoCs has until now been between pairs of cores, with no scope for multicasting. In this research, we advance a step further in the pursuit of a high-performance FPGA-based NoC system. We exploit the multicast nature of various application task graphs and present a novel, improved MLPR architecture with broadcast capability. We present the modified architecture, the decoding scheme, and the stripped-down crosspoint matrix, which result in reduced logic usage and increased performance. We report synthesis and simulation results.
ISBN: 9781424403127 (print)
This paper presents the architecture, design, validation, and prototyping of inverse transforms and quantization, intra prediction, motion compensation, and the loop filter for a main profile H.264/AVC decoder. These architectures were designed to reach high throughput and to be easily integrated with the other H.264/AVC modules. The architectures, all fully H.264/AVC compliant, were completely described in VHDL and validated through simulation and then prototyping. They were prototyped on a Digilent XUP V2P board containing a Xilinx Virtex-II Pro XC2VP30 FPGA. The post place-and-route synthesis results indicate that the designed architectures can process 114 million samples per second and, in the worst case, 64 HDTV frames (1920x1080) per second, allowing their use in H.264/AVC decoders targeting real-time HDTV applications.
ISBN: 9789090304281 (print)
In this paper, we present a framework for the seamless utilization of hardware accelerators in heterogeneous SoCs to speed up the processing of Spark data analytics applications.
ISBN: 9781538685174 (digital)
ISBN: 9781538685174 (print)
The power consumption of digital circuits, e.g., Field-Programmable Gate Arrays (FPGAs), is directly related to their operating supply voltages. However, chip vendors usually introduce a conservative voltage guardband below the standard nominal level to ensure the correct functionality of the design in worst-case process and environmental scenarios. For instance, this voltage guardband has been empirically measured to be 12%, 20%, and 16% of the nominal level in commercial CPUs [1], Graphics Processing Units (GPUs) [2], and Dynamic RAMs (DRAMs) [3], respectively. In many real-world applications, this guardband is extremely conservative, and eliminating it can yield significant power savings without any overhead. Motivated by these studies, we aim to extend the undervolting technique to commercial FPGAs. Toward this goal, we will practically demonstrate the voltage guardband for a representative Xilinx FPGA, with a preliminary focus on on-chip memories, or Block RAMs (BRAMs).
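As a rough illustration of why removing the guardband matters, the sketch below estimates the dynamic-power saving from lowering the supply voltage by the guardband fractions quoted above. It assumes the common first-order CMOS model in which dynamic power scales with the square of the supply voltage at fixed frequency and capacitance; the abstract itself does not state this model, static power is ignored, and the function name `dynamic_power_saving` is hypothetical.

```python
def dynamic_power_saving(guardband_fraction):
    """Fractional dynamic-power saving when the supply voltage is lowered
    by `guardband_fraction` of its nominal value.

    Assumption (not from the abstract): dynamic power is proportional to
    V^2 at fixed clock frequency and switched capacitance.
    """
    if not 0.0 <= guardband_fraction < 1.0:
        raise ValueError("guardband_fraction must be in [0, 1)")
    scaled_voltage = 1.0 - guardband_fraction  # V / V_nominal
    return 1.0 - scaled_voltage ** 2


if __name__ == "__main__":
    # Guardband fractions quoted in the abstract for CPUs, GPUs, and DRAMs.
    for device, guardband in (("CPU", 0.12), ("GPU", 0.20), ("DRAM", 0.16)):
        saving = dynamic_power_saving(guardband)
        print(f"{device}: {saving:.1%} estimated dynamic-power saving")
```

Under this simplified model, guardbands of 12%, 20%, and 16% correspond to dynamic-power savings of roughly 23%, 36%, and 29%. Measured savings on real devices depend on leakage, workload, and temperature.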
ISBN: 9789090304281 (print)
The benefits of customising the precision throughout an FPGA design according to a design tolerance are well known. However, customising the precision of a design at run-time has the potential for an even greater performance impact. In this paper, we add the ability to dynamically choose the internal precision of a datapath. This enables a result that is at least as accurate as the worst case under standard precisions, whilst internally operating at a lower precision. We demonstrate this technique on fused floating-point dot-product circuits. We show that circuits whose inputs have a wide dynamic range can achieve substantial resource savings. We provide examples with savings of up to 75% of the DSPs and 16% of the ALMs over an optimised fused dot-product design.
ISBN: 9781467381239 (print)
We propose embedding hard NoCs on FPGAs to improve system-level communication, as detailed in our previous studies [1-6]. This demo paper outlines the three main design and simulation tools that we have been using to experiment with embedded NoCs on FPGAs.
ISBN: 9781424419609 (print)
This paper presents a non-monolithic, top-down reconfigurable multiplier suitable for embedding in an FPGA structure. It is constructed of four individual partitions that can operate as separate multipliers but can also be concatenated to form a larger multiplier with increased precision and sign-handling ability. The number of possible operation modes is limited in order to keep the reconfiguration overhead low. A small set of control signals determines behavior and mode selection. Inactive partitions are disconnected from the supply to save power. EMMA (Embedded Multi-precision Multiplier Array) can compute signed two's complement products at up to 32 x 16-bit precision when all partitions are active and concatenated, or up to four separate 16 x 8-bit multiplications running simultaneously.
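The arithmetic behind concatenating four 16 x 8-bit partitions into one 32 x 16-bit multiplier can be sketched as follows. This is only the standard partial-product identity for two's complement operands, not a model of the EMMA circuit or its control signals; the function name `mul_32x16_from_partitions` and the way operands are split are assumptions made for illustration.

```python
def mul_32x16_from_partitions(a, b):
    """Signed 32 x 16-bit product built from four 16 x 8-bit partial products.

    a: signed 32-bit integer, b: signed 16-bit integer.
    The upper halves carry the sign; the lower halves are unsigned.
    """
    if not -(1 << 31) <= a < (1 << 31):
        raise ValueError("a must fit in 32-bit two's complement")
    if not -(1 << 15) <= b < (1 << 15):
        raise ValueError("b must fit in 16-bit two's complement")

    a_hi = a >> 16      # signed upper 16 bits (arithmetic shift)
    a_lo = a & 0xFFFF   # unsigned lower 16 bits
    b_hi = b >> 8       # signed upper 8 bits
    b_lo = b & 0xFF     # unsigned lower 8 bits

    # One 16 x 8 product per partition, each weighted by its bit position.
    return (
        ((a_hi * b_hi) << 24)
        + ((a_hi * b_lo) << 16)
        + ((a_lo * b_hi) << 8)
        + (a_lo * b_lo)
    )
```

Because `a = a_hi * 2**16 + a_lo` and `b = b_hi * 2**8 + b_lo`, the weighted sum equals `a * b` for every operand pair in range. The same four partial products, left unweighted and unsummed, correspond to the four independent 16 x 8-bit multiplications.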
ISBN: 9781424410590 (print)
A high-speed FPGA offloading engine for detecting the license plate itself, with the aim of avoiding traffic accidents, is proposed. A complex algorithm is written in Handel-C, and parallel processing is explicitly exploited at every level of the implementation: an input image is segmented into 16 areas, and each area is processed in parallel by a multiple calculation unit that performs pipelined processing and a distributed memory module. A prototype circuit implemented on a general-purpose FPGA board achieved 4.16 times the performance of software execution on a Pentium-III desktop PC. The highest performance in the literature, 100 frames per second, can be achieved.
ISBN: 9781424438914 (print)
Restricted Boltzmann Machines (RBMs), the building block for newly popular Deep Belief Networks (DBNs), are a promising new tool for machine learning practitioners. However, future research in applications of DBNs is hampered by the considerable computation that training requires. In this paper, we describe a novel architecture and FPGA implementation that accelerates the training of general RBMs in a scalable manner, with the goal of producing a system that machine learning researchers can use to investigate ever-larger networks. Our design uses a highly efficient, fully pipelined architecture based on 16-bit arithmetic for performing RBM training on an FPGA. We show that only 16-bit arithmetic precision is necessary, and we consequently use embedded hardware multiply-and-add (MADD) units. We present performance results showing that a speedup of 25-30x can be achieved over an optimized software implementation on a high-end CPU.
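To make the idea of 16-bit multiply-and-add concrete, here is a minimal sketch of a fixed-point dot product of the kind that computes a hidden unit's weighted input in an RBM. The abstract specifies only "16-bit arithmetic"; the Q8.8 split (`FRAC_BITS = 8`), the saturating conversion, and the names `to_fixed` and `madd_dot` are assumptions for illustration, not the paper's design.

```python
FRAC_BITS = 8  # assumed Q8.8 format; the abstract only says "16-bit"
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def to_fixed(x):
    """Quantize a float to a saturating signed 16-bit fixed-point integer."""
    q = int(round(x * (1 << FRAC_BITS)))
    return max(INT16_MIN, min(INT16_MAX, q))


def madd_dot(weights, inputs):
    """Dot product via repeated multiply-and-add on 16-bit operands.

    Each 16 x 16 product is accumulated at full width, as a hardware MADD
    unit with a wide accumulator would, then rescaled to a float once.
    """
    acc = 0
    for w, v in zip(weights, inputs):
        acc += to_fixed(w) * to_fixed(v)
    return acc / float(1 << (2 * FRAC_BITS))
```

For values that are exactly representable in the assumed format, such as `madd_dot([0.5, -1.25], [2.0, 4.0])`, the result matches the floating-point dot product; otherwise the error is bounded by the quantization step of the chosen format.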