Edge computing bridges the gap between end devices on the Internet of Things (IoT) and central, server-based IT services by accelerating tasks which otherwise could not be executed on the IoT device itself. The Fast F...
FPGAs, configurable integrated circuits (ICs), are a valid alternative to common microcontroller solutions for application-specific (ASIC) embedded applications. FPGAs are applied in the prototyping step, befor...
ISBN: 9798350322255 (print)
Arbitrary-precision integer multiplication is the core kernel of many applications, including scientific computing and cryptographic algorithms. Existing accelerators for arbitrary-precision integer multiplication include CPUs, GPUs, FPGAs, and ASICs. To leverage the hardware's intrinsic low-bit (32/64-bit) function units, arbitrary-precision integer multiplication can be computed using Karatsuba or Schoolbook decomposition: the two large operands are split into several small operands, generating a set of low-bit multiplications that can be processed either spatially or sequentially on low-bit function units, e.g., CPU vector instructions, GPU CUDA cores, or FPGA digital signal processing (DSP) blocks. Among these accelerators, reconfigurable computing platforms such as FPGA accelerators promise both good energy efficiency and flexibility. We implement the state-of-the-art (SOTA) FPGA accelerator and compare it with the SOTA libraries on CPUs and GPUs. Surprisingly, we find that the FPGA has the lowest energy efficiency, i.e., 0.29x that of the CPU and 0.17x that of the GPU at the same fabrication generation. Therefore, key questions arise: Where do the energy efficiency gains of CPUs and GPUs come from? Can reconfigurable computing do better? If so, how can that be achieved? We first identify that the biggest energy efficiency gains of CPUs and GPUs come from their dedicated vector units, i.e., vector instruction units in CPUs and CUDA cores in GPUs. FPGAs use DSPs and lookup tables (LUTs) to compose the needed computation, which incurs overhead compared to using vector units directly. New reconfigurable computing platforms, e.g., "FPGA+vector units", are a novel and feasible solution to improve energy efficiency. In this paper, we propose to map arbitrary-precision integer multiplication onto such an "FPGA+vector units" platform, i.e., the AMD/Xilinx Versal ACAP architecture, a heterogeneous reconfigurable computing pla...
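The Karatsuba and Schoolbook decompositions described above can be illustrated in software. The following Python sketch is not the paper's Versal mapping: it assumes 32-bit limbs and counts how many 32x32-bit multiplications (the work a low-bit function unit such as a DSP block or vector lane would perform) each method issues. The function names and the choice of the subtractive Karatsuba variant are illustrative assumptions.

```python
import random

LIMB_BITS = 32                      # assumed width of one hardware multiplier input
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(x: int, n: int) -> list[int]:
    """Split a non-negative integer into n little-endian 32-bit limbs."""
    return [(x >> (LIMB_BITS * i)) & LIMB_MASK for i in range(n)]


def schoolbook(a: int, b: int, n: int) -> tuple[int, int]:
    """Multiply two n-limb operands using all n*n limb-by-limb products.

    Returns (product, number of 32x32-bit multiplications issued).
    """
    a_limbs, b_limbs = to_limbs(a, n), to_limbs(b, n)
    acc = 0
    for i, ai in enumerate(a_limbs):
        for j, bj in enumerate(b_limbs):
            # Each partial product fits a 32x32 -> 64-bit unit; shift it into place.
            acc += (ai * bj) << (LIMB_BITS * (i + j))
    return acc, n * n


def karatsuba(a: int, b: int, n: int) -> tuple[int, int]:
    """Recursive (subtractive) Karatsuba on n-limb operands.

    Using |a_hi - a_lo| instead of a_lo + a_hi keeps every recursive operand
    within its limb budget, so each base case is a genuine 32x32 multiply.
    Returns (product, number of 32x32-bit multiplications issued).
    """
    if n == 1:
        return a * b, 1
    half = n // 2                   # limbs in the low halves
    hi_n = n - half                 # limbs in the high halves (>= half)
    shift = LIMB_BITS * half
    low_mask = (1 << shift) - 1
    a_lo, a_hi = a & low_mask, a >> shift
    b_lo, b_hi = b & low_mask, b >> shift

    z0, m0 = karatsuba(a_lo, b_lo, half)
    z2, m2 = karatsuba(a_hi, b_hi, hi_n)
    da, db = a_hi - a_lo, b_hi - b_lo
    zm, mm = karatsuba(abs(da), abs(db), hi_n)
    sign = -1 if (da < 0) != (db < 0) else 1
    # a_lo*b_hi + a_hi*b_lo == z0 + z2 - (a_hi - a_lo)*(b_hi - b_lo)
    z1 = z0 + z2 - sign * zm
    return z0 + (z1 << shift) + (z2 << (2 * shift)), m0 + m2 + mm


if __name__ == "__main__":
    n = 8                           # 8 limbs = 256-bit operands
    a, b = random.getrandbits(32 * n), random.getrandbits(32 * n)
    p_s, mults_s = schoolbook(a, b, n)
    p_k, mults_k = karatsuba(a, b, n)
    assert p_s == p_k == a * b
    print(mults_s, mults_k)         # 64 27
```

For 256-bit operands, Schoolbook issues 64 limb products while this Karatsuba variant issues 27, paying instead with extra additions and subtractions. Schoolbook's products are fully independent of one another, which suits a spatial layout across many units; Karatsuba trades that regularity for fewer multiplications.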
ISBN: 9798350359114 (print)
When an application is accelerated with a Coarse-Grained Reconfigurable Architecture (CGRA), it is compiled into a Data Flow Graph (DFG). In conventional CGRA frameworks, only one DFG is accelerated in each epoch. Consequently, single-context CGRAs cannot fully utilize hardware resources when executing multi-kernel applications. In this paper, we propose a dynamic partial reconfigurable CGRA framework for multi-kernel applications. The modeled CGRA can flexibly partition hardware resources and execute multiple DFGs in parallel by implementing dynamic partial reconfiguration (DPR). A multi-kernel scheduler based on integer linear programming (ILP) builds a timetable for the execution state of the application, and an incremental mapper compiles DFGs according to the timetable. Compared with the baseline, TRAM, our framework achieves an average throughput increase of 67.30% and a utilization increase of 32.46% for a single task with multiple kernels, and an average execution time reduction of 55.71% and an average utilization increase of 70.43% for applications with multiple tasks.
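The contrast between single-context execution and partitioned execution of several DFGs can be made concrete with a small, hypothetical Python sketch. It does not implement the paper's ILP scheduler or incremental mapper: a greedy "launch a kernel as soon as enough PEs are free" rule stands in for the ILP, kernels are assumed independent, reconfiguration time is ignored, and the `Kernel` type, kernel sizes, and 16-PE array are invented for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Kernel:
    name: str
    pes: int        # processing elements the kernel's mapped DFG occupies
    cycles: int     # execution time of the kernel once mapped


Timetable = list[tuple[str, int, int]]   # (kernel name, start, end)


def single_context(kernels: list[Kernel]) -> Timetable:
    """Baseline: the array holds one DFG per epoch, so kernels run back to back."""
    table, t = [], 0
    for k in kernels:
        table.append((k.name, t, t + k.cycles))
        t += k.cycles
    return table


def partitioned(kernels: list[Kernel], total_pes: int) -> Timetable:
    """Spatial sharing: launch each kernel as soon as enough PEs are free.

    Greedy stand-in for an ILP scheduler; kernels are assumed independent
    and reconfiguration/mapping time is ignored.
    """
    if any(k.pes > total_pes for k in kernels):
        raise ValueError("a kernel needs more PEs than the array has")
    pending = sorted(kernels, key=lambda k: -k.pes)   # place large DFGs first
    running: list[tuple[int, int, str]] = []           # heap of (end, pes, name)
    free, t, table = total_pes, 0, []
    while pending or running:
        waiting = []
        for k in pending:                              # fill free PEs at time t
            if k.pes <= free:
                free -= k.pes
                heapq.heappush(running, (t + k.cycles, k.pes, k.name))
                table.append((k.name, t, t + k.cycles))
            else:
                waiting.append(k)
        pending = waiting
        end, pes, _ = heapq.heappop(running)           # next kernel to finish
        t, free = end, free + pes                      # its PEs can be reconfigured
    return table


def makespan(table: Timetable) -> int:
    return max(end for _, _, end in table)


def utilization(kernels: list[Kernel], table: Timetable, total_pes: int) -> float:
    """Fraction of PE-cycles doing useful work over the schedule's makespan."""
    busy = sum(k.pes * k.cycles for k in kernels)
    return busy / (total_pes * makespan(table))


if __name__ == "__main__":
    # Hypothetical kernels on a hypothetical 16-PE (4x4) array.
    kernels = [Kernel("A", 8, 100), Kernel("B", 4, 60),
               Kernel("C", 4, 80), Kernel("D", 6, 50)]
    base, part = single_context(kernels), partitioned(kernels, 16)
    print(makespan(base), makespan(part))              # 290 130
    print(f"{utilization(kernels, base, 16):.2f} {utilization(kernels, part, 16):.2f}")
```

With these invented inputs, sharing the array cuts the makespan from 290 to 130 cycles because kernels B and C occupy the PEs that D releases while A is still running. A real scheduler would also have to model data dependencies between DFGs and the cost of each partial reconfiguration, which is where an exact ILP formulation earns its keep over a greedy rule.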
This project involves the design and implementation of a reconfigurable antenna system tailored explicitly to 5G applications. The proposed slotted meander reconfigurable patch antenna for 5G communication has been de...
With the emergence of the Internet of Things, there is a growing demand for distributed and federated processing across several cooperating devices. One current trend is to use devices as processing hubs, providing se...
ISBN: 9798350344196 (print)
Hyperspectral image analysis is an extremely complex procedure from a computational point of view, mainly due to the high dimensionality of the data. This computational cost is a significant disadvantage in applications that require a real-time response, such as fire monitoring, prevention and monitoring of natural disasters, chemical spills, and other environmental pollutants. Dimensionality reduction decreases the size of the image while preserving the important discriminating features, by eliminating redundant data or noise. Owing to their reduced size, weight, and power consumption compared to other high-performance computing systems, reconfigurable hardware solutions such as field-programmable gate arrays (FPGAs) have in recent years become one of the standard options for the fast processing of remotely sensed hyperspectral images. In this paper, we implement an optimized hardware version of the Fast Independent Component Analysis (FastICA) algorithm for dimensionality reduction of hyperspectral images on an FPGA device. Our implementation achieves a 34x speedup over its equivalent software version, meeting the real-time processing requirement given the capture time of the AVIRIS spectral sensor.
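For reference, the textbook symmetric FastICA algorithm (tanh nonlinearity, i.e. the logcosh contrast, preceded by PCA whitening to k components) can be sketched in NumPy as below. This is a floating-point software sketch under assumed conventions (pixels as rows, spectral bands as columns), not the optimized FPGA design described above; the function names and default parameters are illustrative.

```python
import numpy as np


def whiten(X: np.ndarray, k: int) -> np.ndarray:
    """Center a (pixels x bands) matrix and project it onto its top-k PCA
    directions scaled to unit variance, so the result has identity covariance."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / Xc.shape[0]
    eigval, eigvec = np.linalg.eigh(cov)            # eigenvalues in ascending order
    top = np.argsort(eigval)[::-1][:k]
    K = eigvec[:, top] / np.sqrt(eigval[top])       # bands x k whitening matrix
    return Xc @ K                                   # pixels x k


def sym_decorrelate(W: np.ndarray) -> np.ndarray:
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W


def fastica(X: np.ndarray, k: int, max_iter: int = 200,
            tol: float = 1e-6, seed: int = 0) -> np.ndarray:
    """Symmetric FastICA with g = tanh.

    Returns a (pixels x k) matrix of estimated independent components.
    """
    Z = whiten(X, k)
    n = Z.shape[0]
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((k, k)))
    for _ in range(max_iter):
        G = np.tanh(Z @ W.T)                        # g(w_i^T z) for every pixel
        G_prime = 1.0 - G ** 2                      # g'(w_i^T z)
        # Fixed-point update: w_i <- E[z g(w_i^T z)] - E[g'(w_i^T z)] w_i
        W_new = sym_decorrelate(G.T @ Z / n - G_prime.mean(axis=0)[:, None] * W)
        # Converged when every row keeps its direction (sign flips allowed).
        change = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if change < tol:
            break
    return Z @ W.T
```

A hyperspectral cube of shape (rows, cols, bands) would be flattened with `cube.reshape(-1, bands)` before calling `fastica`, and the returned components reshaped back to (rows, cols, k). The whitening step alone already reduces the band count to k; the ICA iterations then rotate that subspace toward statistically independent components. The dense matrix products in each iteration are the natural targets for hardware acceleration.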
Multiple-Input Multiple-Output (MIMO) technology is extensively used in various communication scenarios, posing challenges to the flexibility and real-time performance of its hardware solutions. In this paper, we pro...
The computational requirements of modern workloads such as artificial intelligence (AI) and machine learning (ML) necessitate hardware acceleration and partial reconfiguration (PR) on a platform like a fi...
Due to recent advances in computer architecture and the proliferation of heterogeneous computing devices, there is an increasing demand for portable and efficient programming applications. These applicat...