In recent years, hardware accelerators have shown significant advantages in the field of deep neural networks. The architecture used in such systems has high memory utilization. To make the most of memory technology i...
Authors:
Di Renzo, Marco (Université Paris-Saclay, CNRS, CentraleSupélec, Laboratoire des Signaux et Systèmes, Gif-sur-Yvette 91192, France; King's College London, Centre for Telecommunications Research, Department of Engineering, London WC2R 2LS, United Kingdom)
Stacked intelligent metasurface (SIM) is an emerging technology that capitalizes on reconfigurable metasurfaces for several applications in wireless communications. SIM is considered an enabler for integrating communi...
Object detection and tracking is among the most important vision algorithms in intelligent environments, and applications based on it are the driving force behind the growing deployment of edge devices. As FPGAs can impl...
Reconfigurable intelligent surfaces (RIS) make the channel more controllable by manipulating the propagation environment, so that the wireless medium becomes smart compared to conventional wireless communication sys...
ISBN: 9781450396608 (print)
The proceedings contain 18 papers. The topics discussed include: performance evaluation on GPU-FPGA accelerated computing considering interconnections between accelerators; verifying hardware optimizations for efficient acceleration; accelerating decision tree ensemble with guided branch approximation; a hardware/software co-design approach to prototype 6G mobile applications inside the GNU Radio SDR ecosystem using FPGA hardware accelerators; meta-programming design-flow patterns for automating reusable optimizations; a single-source C++20 HLS flow for function evaluation on FPGA and beyond; memory and energy efficient memory model and instruction set architectures for tree data structures; stream computation of 3D approximate convex hulls with an FPGA; and FPGA-dedicated network vs. server network for pipelined computing with multiple FPGAs.
ISBN: 9798350373738 (print)
The recent development of increasingly successful artificial intelligence applications has created an unprecedented demand for optimized hardware for data-intensive computing. The currently employed device architecture, which relies on the separation between computing and memory units, requires frequent data transfers, causing energy inefficiencies and increased latency. A promising approach to this problem is the Logic-in-Memory (LiM) architecture [1], which aims to drastically reduce data transfer by relying on computing units that can also store information locally. A LiM-enabling technology should be CMOS-compatible and fully scalable. In recent years it was shown that, by resorting to an independently gated Schottky-Barrier Field Effect Transistor (SB-FET), it is possible to obtain reconfigurable transistors that show both n- and p-type behavior without the need for doping [2]: while slightly increasing the complexity of the single transistor, this design makes it possible to realize polymorphic logic gates with a reduced footprint with respect to the classic CMOS architecture [3]. When equipped with a memory element, such as a ferroelectric layer, this approach could prove promising for the realization of densely packed, adaptable LiM hardware. Here we explore the integration of a thin Hf0.5Zr0.5O2 (HZO) layer into a Double-Top-Gated (DTG) SB-FET, a device in which two polarity gates (PG) are placed over the two Schottky barriers. This design is explicitly chosen to obtain a ferroelectric layer only in proximity to the Schottky barrier areas, as shown in Fig. 1a, by exploiting the property of HZO of crystallizing into such a phase only when in contact with specific metals, like Pd or TiN. This allows us to investigate how a ferroelectric HZO phase can influence the injection of carriers without modifying the channel transport properties. HZO also offers the advantage of being ALD- and CMOS-compatible and has already been shown to be promising fo
Pruning has become an extremely powerful and effective technique to compress and accelerate sophisticated deep neural networks on resource-constrained platforms. Existing pruning and quantization methods aim to reduce...
ISBN: 9798350356830 (digital)
ISBN: 9798350356847 (print)
Modern applications in the fields of machine learning and scientific computing benefit from custom hardware configurations. In these cases, rather than designing or remapping the algorithms onto new accelerators, we propose flexible Processing Elements (PEs) that can adapt or reconfigure themselves according to the data type and compute type of the workloads. In this paper, we propose FlexPE, a framework that can generate reconfigurable and flexible (shrinkable or expandable) PEs according to the workload. It can generate a Field Programmable Gate Array (FPGA)-based custom accelerator in Register Transfer Language (RTL) with exactly the type of computations/data types required for the workload, so that idle resources are not instantiated. FlexPE is evaluated on AMD-Xilinx ZedBoard and ZCU-104 FPGAs on PolyBench and MachSuite workloads. Compared to related work, this approach reduces resource usage by nearly 35x and increases throughput by 81x and 43x on average on the two FPGAs.
The two largest barriers to adoption of FPGA platforms for HPC applications are the difficulty of programming FPGAs and the performance gap when compared to GPUs. To address the first barrier, new ecosystems like Inte...
In this paper, Rössler's attractor and the circular shifting method are used to create a reconfigurable hardware-based encryption scheme for protecting speech communication. In order to rectify the limitations of sof...