Search results - Inner Mongolia University Library

Proceedings of the 23rd Conference on Design, Automation and Test in Europe

Authors: Xifan Tang; Edouard Giacomin; Patsy Cadareanu; Ganesh Gore; Pierre-Emmanuel Gaillardon. Affiliation: University of Utah

ISBN: (print) 9783981926347

The shift from centralized cloud to edge computing demands hardware systems with data processing capability at ultra-low power. Reconfigurable solutions such as field-programmable gate arrays (FPGAs) offer high flexibility in terms of hardware implementation and are thus popular for use in many edge computing systems. However, breaking through the energy wall of FPGAs is a challenge, as low-power operation often requires compromising performance. In this paper, we study a low-power, high-performance FPGA architecture exploiting Resistive Random Access Memory (RRAM) technology. To perform a comprehensive analysis, we introduce a novel design flow which can rapidly prototype FPGA fabrics from which accurate area, delay, and power results can be obtained. Based on full-chip layouts and SPICE simulations, we show that RRAM-based FPGAs can achieve improvements of up to 8%/22%/16% in area/delay/power compared to SRAM-based counterparts at nominal voltage. Even when operated at a near-Vt supply, the proposed RRAM-based FPGA can improve the Energy-Delay Product by about 2x without any delay overhead, when compared to an SRAM-based FPGA. In addition, Monte Carlo simulations show that the proposed RRAM-based FPGA architecture stays robust under different CMOS process corners as well as under a 30% RRAM resistance standard deviation.

Keywords: low-power design; field-programmable gate arrays; resistive memories

FPGA-Based Architecture for Medium Access Techniques in Broadband PLC

IEEE ACCESS, 2018, vol. 6, pp. 9534-9542

Authors: Poudereux, Pablo; Hernandez, Alvaro; Cruz-Roldan, Fernando; Mateos, Raul. Affiliations: Univ Alcala, Dept Elect, E-28805 Alcala De Henares, Spain; Univ Alcala, Signal Theory & Commun Dept, E-28805 Alcala De Henares, Spain

In this paper, two real-time architectures of medium access techniques useful for future generations of wireline and wireless communication systems are presented. One architecture is based on the discrete cosine transform (DCT), while the second approach implements a filter-bank multi-carrier (FBMC) system. A comparative analysis, in terms of resource consumption, performance, and precision, is shown. The comparison considers a floating-point model, a fixed-point model, and experimental tests. These models make it possible to evaluate the effect of the fixed-point precision in the implementation and, in turn, to verify the correctness of the developed architecture. The simulation models and the experimental tests have been carried out in different practical environments in order to achieve a further analysis. The two proposed architectures have been implemented on a field-programmable gate array (FPGA) device. Furthermore, the architectures have been included as advanced peripherals in a system-on-chip, which also integrates a soft microprocessor to monitor the whole system and manage the data transfers. As a communication scenario, the proposed architectures have been particularized to operate in real time while meeting all timing requirements defined by a broadband power line communications standard. For that case, the system has achieved a desired transmission rate of 62.5 Ms/s at the converters, providing mean squared errors, at the output for an ideal channel, below $3 \times 10^{-5}$ for both the DCT and FBMC approaches, whereas each transmitter/receiver requires around 50% of the DSP cells available in the Xilinx XC6VLX240T FPGA, the most demanded resource in the device.

Keywords: field-programmable gate arrays; multi-carrier communication (MCM); filter-bank multicarrier (FBMC) systems; broadband power-line communications; discrete cosine transform (DCT)
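
The abstract above compares a floating-point model with a fixed-point model by their mean squared error. The following is only a minimal sketch of that kind of comparison, not the authors' models: it assumes an unnormalised DCT-II, emulates fixed-point rounding with Python floats, and uses hypothetical names (`dct2`, `dct2_fixed`, `quantize`, `mse`) and word lengths chosen purely for illustration.

```python
import math


def dct2(x):
    """Floating-point reference: unnormalised DCT-II of the sequence x."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def quantize(value, frac_bits):
    """Round value to the nearest multiple of 2**-frac_bits (assumed format)."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def dct2_fixed(x, frac_bits):
    """Same DCT-II, with inputs, coefficients and outputs quantized."""
    n = len(x)
    out = []
    for k in range(n):
        acc = 0.0
        for i in range(n):
            coeff = quantize(math.cos(math.pi * (i + 0.5) * k / n), frac_bits)
            acc += quantize(x[i], frac_bits) * coeff
        out.append(quantize(acc, frac_bits))
    return out


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


# Illustrative input: 16 samples of a sine wave (an assumption, not paper data).
samples = [math.sin(0.3 * i) for i in range(16)]
reference = dct2(samples)
for bits in (8, 12, 16):
    print(bits, mse(reference, dct2_fixed(samples, bits)))
```

Sweeping `frac_bits` in this way is one simple method of choosing a word length that keeps the error under a target bound before committing to a hardware implementation.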

Algorithms for Multiplierless Multiple Constant Multiplication in Online Arithmetic

CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2018, vol. 37, no. 11, pp. 5127-5142

Authors: Joseph, Georgina Binoy; Devanathan, R. Affiliations: Toc H Inst Sci & Technol, Dept Elect & Commun Engn, Kochi, Kerala, India; Hindustan Inst Technol & Sci, Sch Elect Engn, Madras, Tamil Nadu, India

Online arithmetic operators offer advantages of reduction in resource utilization and interconnection complexity besides providing pipelining at digit level. Multiplierless constant coefficient multiplication using the shift-and-add technique is widely used in digital signal processing applications. This paper proposes a novel bit serial adaptation of the parallel shift-and-add algorithm to online arithmetic. The proposed multipliers use right shifts instead of the traditional left shifts resulting in causal online implementations. Graph-based and hybrid algorithms are developed for the estimation of the distance of a constant from a set of constants in terms of the number of additions and for the synthesis of online multiple constant multipliers under area and online delay constraints. The computational complexity of the algorithms is determined. Results of implementation on randomly generated constant sets and FIR filter instances show substantial improvements in the number of operations required using the distance heuristic. Further, it is shown that the proposed techniques and algorithms result in significant savings in resource utilization, logic depth, and clock frequency compared to parallel and digit-serial algorithms.

Keywords: Online arithmetic; Digital signal processing; Constant coefficient multiplication; Graph-based algorithms; field-programmable gate arrays
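
The shift-and-add technique mentioned in the abstract above replaces a multiplication by a fixed constant with a few shifts and additions or subtractions. The sketch below shows only the conventional parallel, left-shift form, using a canonical signed-digit (CSD) recoding of the constant. It does not reproduce the paper's right-shift online-arithmetic adaptation or its graph-based algorithms, and the function names are hypothetical.

```python
def csd_digits(c):
    """Recode a non-negative integer c as canonical signed digits.

    Returns a list of (shift, sign) pairs with sign in {+1, -1} such that
    c == sum(sign * 2**shift). No two returned digits are adjacent.
    """
    digits = []
    shift = 0
    while c != 0:
        if c & 1:
            # c mod 4 == 1 -> digit +1, c mod 4 == 3 -> digit -1,
            # so the remainder becomes divisible by 4.
            digit = 2 - (c & 3)
            digits.append((shift, digit))
            c -= digit
        c >>= 1
        shift += 1
    return digits


def multiply_by_constant(x, c):
    """Compute x * c using only shifts, additions and subtractions."""
    acc = 0
    for shift, sign in csd_digits(c):
        term = x << shift
        acc = acc + term if sign > 0 else acc - term
    return acc


# 7 = 8 - 1, so x * 7 needs one subtraction instead of two additions.
print(csd_digits(7))               # [(0, -1), (3, 1)]
print(multiply_by_constant(5, 7))  # 35
```

The number of non-zero digits minus one is the adder count for a single constant. Multiple constant multiplication lowers the total further by sharing intermediate terms among constants, which is the part the paper's distance heuristic addresses.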

Onboard Processing With Hybrid and Reconfigurable Computing on Small Satellites

PROCEEDINGS OF THE IEEE, 2018, vol. 106, no. 3, pp. 458-470

Authors: George, Alan D.; Wilson, Christopher M. Affiliations: Univ Pittsburgh, Dept Elect & Comp Engn Dept, NSF Ctr Space High Performance & Resilient Comp, Pittsburgh, PA 15261, USA; Univ Pittsburgh, NSF Ctr Space High Performance & Resilient Comp, Pittsburgh, PA 15261, USA

Due to the increasing demands of onboard sensor and autonomous processing, one of the principal needs and challenges for future spacecraft is onboard computing. Space computers must provide high performance and reliability (which are often at odds), using limited resources (power, size, weight, and cost), in an extremely harsh environment (due to radiation, temperature, vacuum, and vibration). As spacecraft shrink in size, while assuming a growing role for science and defense missions, the challenges for space computing become particularly acute. For example, processing capabilities on CubeSats (smaller class of SmallSats) have been extremely limited to date, often featuring microcontrollers with performance and reliability barely sufficient to operate the vehicle let alone support various sensor and autonomous applications. This article surveys the challenges and opportunities of onboard computers for small satellites (SmallSats) and focuses upon new concepts, methods, and technologies that are revolutionizing their capabilities, in terms of two guiding themes: hybrid computing and reconfigurable computing. These innovations are of particular need and value to CubeSats and other SmallSats. With new technologies, such as CHREC Space Processor (CSP), we demonstrate how system designers can exploit hybrid and reconfigurable computing on SmallSats to harness these advantages for a variety of purposes, and we highlight several recent missions by NASA and industry that feature these principles and technologies.

Keywords: Fault-tolerant systems; field-programmable gate arrays; radiation effects; reconfigurable architectures; satellites; space radiation

A Resistive RAM-Based FPGA Architecture Equipped With Efficient Programming Circuitry

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2018, vol. 65, no. 7, pp. 2196-2209

Authors: Khaleghi, Behnam; Asadi, Hossein. Affiliations: Sharif Univ Technol, Dept Comp Engn, Data Storage Networks & Proc DSN Lab, Tehran 111558639, Iran; Sharif Univ Technol, Dept Comp Engn, Tehran 111558639, Iran

Although considerable effort has been put into the application of Non-Volatile Memories (NVMs) in field-programmable gate arrays (FPGAs), previously suggested designs are not mature enough to substitute the state-of-the-art SRAM-based counterparts, mainly due to inefficient building blocks and/or the overhead of the programming structure, which can impair their potential benefits. In this paper, we present a Resistive Random Access Memory (RRAM)-based FPGA architecture employing efficient Switch Box (SB) and Look-Up Table (LUT) designs with programming circuitry integrated in both SB and LUT designs, which creates area- and power-efficient programmable components while precluding performance overhead to these blocks. In addition, we present an efficient scheme to load the configuration bitstream into the memory elements, which makes the configuration time comparable to that of SRAM-based FPGAs. Besides, we investigate the correct functionality and reliability of the programming structure subject to fluctuations in attributes of RRAM cells. Experiments using the Versatile Place and Route (VTR) tool with the obtained characteristics of the proposed blocks demonstrate that the average area and delay of the proposed FPGA architecture are 59.4% and 20.1% less than those of conventional SRAM-based FPGAs. Compared with a recent RRAM-based architecture, the proposed architecture improves the area and power by 49.7% and 33.8% while keeping the delay intact.

Keywords: Emerging non-volatile memory; resistive random access memory; field-programmable gate arrays; programming circuitry; resistive fluctuation

Real-Time Finite-Element Simulation of Electromagnetic Transients of Transformer on FPGA

IEEE TRANSACTIONS ON POWER DELIVERY, 2018, vol. 33, no. 4, pp. 1991-2001

Authors: Liu, Peng; Dinavahi, Venkata. Affiliation: Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB T6G 2R3, Canada

The computation of the electromagnetic transients in a power transformer with nonlinear material using the finite element method (FEM) is so dense that the traditional nonlinear solver employing the Newton-Raphson method can hardly execute in real time. In this paper, we emulate the finite-element computation of electromagnetic transients of a transformer in real time for the first time. The transmission line modeling (TLM) method employed in the FEM successfully decoupled the nonlinear elements from the linear network so the nonlinearities could be solved individually, which is perfect for parallel processing. The parallelism of the TLM-FE solution is sufficiently explored and realized on a field-programmable gate array with deep data pipelining, and the implementation can execute in real time and provide detailed field information of the transformer during the transients. The proposed noniterative field-circuit coupling enabled the transformer to interface with an external network, and the comparison with commercial FEM software proved the accuracy and computational efficiency of the real-time FE model.

Keywords: Electromagnetic transients; field-programmable gate arrays; field-circuit coupling; finite element method; parallel processing; power transformer; real-time systems; transmission line modeling

Enhanced Model and Real-Time Simulation Architecture for Modular Multilevel Converter

IEEE TRANSACTIONS ON POWER DELIVERY, 2018, vol. 33, no. 1, pp. 466-476

Authors: Ashourloo, Mojtaba; Mirzahosseini, Ramin; Iravani, Reza. Affiliation: Univ Toronto, Dept Elect & Comp Engn, Toronto, ON M5S 3G4, Canada

This paper presents i) an equivalent model of the half-bridge modular multilevel converter (HB-MMC) which is suitable for real-time applications, ii) a hybrid central-processing unit/field-programmable gate array (CPU/FPGA)-based architecture for real-time simulation of electromagnetic transients of systems which include HB-MMC, and iii) a novel arrangement for sorting results referred to as the "sub-module (SM) rank list", which tackles the bottleneck for parallel implementation of the MMC arm model solver on the FPGA. The Adams-Bashforth (AB) method is used for numerical integration of the HB-SM capacitor model. The second-order AB method provides a constant admittance matrix of the HB-MMC and, thus, reduces computational burden while offering the same accuracy as that of the widely used Trapezoidal method. The CPU/FPGA-based architecture is optimized to obtain maximum parallelism of the HB-MMC model implementation, adopting a standard, single-precision, floating-point computational engine. The proposed sorting arrangement is independent of the utilized sorting algorithm, and its application to the odd-even bubble sorting scheme is presented in this paper. The proposed architecture offers a simulation time-step of 825 ns while including the sorting module as the SM capacitor voltage-balancing control unit. This enables accurate analysis of MMC controls based on either software-in-the-loop or hardware-in-the-loop approaches. Performance and accuracy of the MMC model and the hybrid CPU/FPGA-based architecture are evaluated based on a set of case studies on a 401-level HB-MMC-based HVDC station and verified based on offline simulation results in the PSCAD/EMTDC environment.

Keywords: Equivalent circuits; field-programmable gate arrays; HVDC converters; real-time systems
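
The odd-even bubble sorting scheme named in the abstract above is usually described as odd-even transposition sort: compare-and-swap steps alternate between even-indexed and odd-indexed neighbour pairs, and all pairs within one phase are independent, which is why the scheme maps well to parallel hardware. The sketch below is a sequential software model of that generic algorithm only. It does not implement the paper's "SM rank list", and the function name and sample voltages are made up for illustration.

```python
def odd_even_transposition_sort(values):
    """Return a sorted copy of values using odd-even transposition sort.

    Each phase touches disjoint neighbour pairs, so in hardware every
    compare-and-swap of a phase could run in the same clock cycle.
    n phases are enough to sort n elements.
    """
    data = list(values)
    n = len(data)
    for phase in range(n):
        start = phase % 2  # even phases: pairs (0,1),(2,3)...; odd: (1,2),(3,4)...
        for i in range(start, n - 1, 2):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data


# Hypothetical sub-module capacitor voltages (arbitrary illustrative numbers).
voltages = [1.98, 2.03, 1.95, 2.01, 2.00]
print(odd_even_transposition_sort(voltages))
```

In a voltage-balancing controller the sorted order would decide which sub-modules to insert or bypass next. That selection logic is outside this sketch.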

Mapping a Guided Image Filter on the HARP Reconfigurable Architecture Using OpenCL

ALGORITHMS, 2019, vol. 12, no. 8, p. 149

Authors: Faict, Thomas; D'Hollander, Erik H.; Goossens, Bart. Affiliations: Univ Ghent, Dept Elect & Informat Syst, B-9052 Ghent, Belgium; Univ Ghent, IMEC IPI, Dept Telecommun & Informat Proc, B-9000 Ghent, Belgium

Intel recently introduced the Heterogeneous Architecture Research Platform, HARP. In this platform, the Central Processing Unit and a field-programmable gate array are connected through a high-bandwidth, low-latency interconnect and both share DRAM memory. For this platform, Open Computing Language (OpenCL), a High-Level Synthesis (HLS) language, is made available. By making use of HLS, a faster design cycle can be achieved compared to programming in a traditional hardware description language. This, however, comes at the cost of having less control over the hardware implementation. We will investigate how OpenCL can be applied to implement a real-time guided image filter on the HARP platform. In the first phase, the performance-critical parameters of the OpenCL programming model are defined using several specialized benchmarks. In a second phase, the guided image filter algorithm is implemented using the insights gained in the first phase. Both a floating-point and a fixed-point implementation were developed for this algorithm, based on a sliding window implementation. This resulted in a maximum floating-point performance of 135 GFLOPS, a maximum fixed-point performance of 430 GOPS and a throughput of HD color images at 74 frames per second.

Keywords: field-programmable gate arrays; OpenCL; high-performance computing; guided image filter

A Monolithic 3D Hybrid Architecture for Energy-Efficient Computation

IEEE TRANSACTIONS ON MULTI-SCALE COMPUTING SYSTEMS, 2018, vol. 4, no. 4, pp. 533-547

Authors: Yu, Ye; Jha, Niraj K. Affiliation: Princeton Univ, Dept Elect Engn, Princeton, NJ 08544, USA

The exponentially increasing performance of chip multiprocessors (CMPs) predicted by Moore's Law is no longer due to the increasing clock rate of a single CPU core, but on account of the increase of core counts in the CMP. More transistors are integrated within the same footprint area as the technology node shrinks to deliver higher performance. However, this is accompanied by higher power dissipation that usually exceeds the coping capability of inexpensive cooling techniques. This Power Wall prevents the chip from running at full speed with all the devices powered-on. This is known as the dark silicon problem. Another major bottleneck in CMP development is the imbalance between the CPU clock rate and memory access speed. This Memory Wall keeps the CPU from fully utilizing its compute power. To address both the Power and Memory Walls, we propose a monolithic 3D hybrid architecture that consists of a multi-core CPU tier, a fine-grain dynamically reconfigurable (FDR) field-programmable gate array (FPGA) tier, and multiple resistive RAM (RRAM) tiers. The FDR tier is used as an accelerator. It uses the concept of temporal logic folding to localize on-chip communication. The RRAM tiers are connected to the CPU and FDR tiers through an efficient memory interface that takes advantage of the tremendous bandwidth available from monolithic inter-tier vias and hides the latency of large data transfers. We evaluate the architecture on two types of benchmarks: compute-intensive and memory-intensive. We show that the architecture reduces both power and energy significantly at a better performance for both types of applications. Compared to the baseline, our architecture achieves an average of 43.1x and 2.5x speedup on compute-intensive and memory-intensive benchmarks, respectively. The power and energy consumption are reduced by 5.0x and 40.5x, respectively, for compute-intensive applications, and 2.0x and 4.2x, respectively, for memory-intensive applications. This translates to

Keywords: Dynamic reconfiguration; field-programmable gate arrays; hybrid architecture; monolithic 3D integration; memory-processor interface

Implementation and optimisation of pulse compression algorithm on open CL-based FPGA

JOURNAL OF ENGINEERING-JOE, 2019, vol. 2019, no. 21, pp. 7752-7754

Authors: Feng, Yingxu; Hu, Shanqing; Li, Xingming; Yu, Jiacheng. Affiliations: Beijing Inst Technol, Sch Informat & Elect, Beijing Key Lab Embedded Real Time Informat Proc, Beijing 100081, Peoples R China; Beijing Sci & Technol Leike Elect Informat Techno, Beijing 10081, Peoples R China

As Moore's law meets bottlenecks, the demand for heterogeneous parallel processing systems is increasing. Field-programmable gate arrays (FPGAs) are becoming more efficient acceleration devices due to their powerful processing performance, and the CPU + FPGA architecture under the OpenCL framework has become the trend of heterogeneous parallel processing systems. This study focuses on the optimisation of the pulse compression algorithm in an FPGA based on OpenCL, which plays an important role in modern radar signal processing systems. By using a double cache for ping-pong storage of data between the matched filter and the inverse fast Fourier transform (IFFT), an optimised pipelined processing method is proposed. The method is verified on an Arria 10 GX1150 FPGA with two groups of 2 GB DDR3; the results show that the proposed method achieves a 2.89x performance improvement over the conventional implementation.

Keywords: optimisation; field programmable gate arrays; pulse compression; matched filters; cache storage; fast Fourier transforms; microprocessor chips; inverse transforms; parallel architectures; OpenCL framework; heterogeneous parallel processing systems; pulse compression algorithm; optimised processing method; Arria 10 GX1150 FPGA; open CL-based FPGA; Moore's law; field-programmable gate arrays; efficient acceleration devices; processing performance; CPU-FPGA architecture; radar signal processing systems; double cache; ping-pang storage; DDR3; matched filter; IFFT; storage capacity 2 Gbit
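
The abstract above places a matched filter in front of an IFFT, which is the usual frequency-domain form of pulse compression: transform the received signal, multiply by the conjugate spectrum of the transmitted pulse, and transform back. The sketch below illustrates only that arithmetic in plain Python with a small recursive radix-2 FFT. It is not the authors' OpenCL kernel, it omits the ping-pong buffering and pipelining, and the chirp parameters and function names are assumptions made for the example.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def pulse_compress(received, reference):
    """Cross-correlate received with reference via FFT, multiply, IFFT."""
    n = 1
    while n < len(received) + len(reference) - 1:
        n *= 2  # zero-pad so positive lags do not wrap around
    rx_spec = fft(list(received) + [0j] * (n - len(received)))
    ref_spec = fft(list(reference) + [0j] * (n - len(reference)))
    product = [r * h.conjugate() for r, h in zip(rx_spec, ref_spec)]
    return [v / n for v in fft(product, inverse=True)]


# Illustrative 16-sample linear-FM chirp, echoed back with a 20-sample delay.
chirp = [cmath.exp(1j * cmath.pi * k * k / 16) for k in range(16)]
echo = [0j] * 64
for k, sample in enumerate(chirp):
    echo[20 + k] = sample

magnitudes = [abs(v) for v in pulse_compress(echo, chirp)]
print(magnitudes.index(max(magnitudes)))  # 20, the echo delay in samples
```

The compressed output concentrates the pulse energy at the lag equal to the echo delay, which is what makes a long low-power chirp behave like a short high-power pulse for range measurement.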
