Search Results - Inner Mongolia University Library

Shrinking FPGA Static Power via Machine Learning-Based Power Gating and Enhanced Routing

IEEE ACCESS, 2021, Vol. 9, pp. 115599-115619

Authors: Seifoori, Zeinab; Asadi, Hossein; Stojilovic, Mirjana (Sharif Univ Technol Dept Comp Engn Tehran *** Iran; Ecole Polytech Fed Lausanne EPFL Sch Comp & Commun Sci CH-1015 Lausanne Switzerland)

Despite FPGAs rapidly evolving to support the requirements of the most demanding emerging applications, their high static power consumption, concentrated within the routing resources, still presents a major hurdle for low-power applications. Augmenting the FPGAs with power-gating ability is a promising way to effectively address the power-consumption obstacle. However, the main challenge when implementing power gating is in choosing the clusters of resources in a way that would allow the most power-saving opportunities. In this paper, we take advantage of machine learning approaches, such as K-means clustering, to propose efficient algorithms for creating power-gating clusters of FPGA routing resources. In the first group of proposed algorithms, we employ K-means clustering and exploit the utilization pattern of routing resources. In the second group of algorithms, we enhance the power-gating efficiency by minimizing the power overhead introduced by power-gating logic and by taking into account the size of routing multiplexers, which influences the power-gating efficiency. Finally, we enhance and further develop the baseline FPGA routing algorithm to be aware and take advantage of power gating opportunities. The experimental results on Titan benchmark suite and the latest Intel Stratix-IV FPGA architecture in VTR 8.0 show that our approaches achieve an improvement of about 70%, on average, in reducing the FPGA static power consumption over the best power-gating approaches proposed in the previous studies.

Keywords: field programmable gate arrays; Routing; Power demand; Clustering algorithms; Machine learning algorithms; Switches; Benchmark testing; field-programmable gate arrays; static power consumption; power gating; routing algorithm; machine learning
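The abstract above mentions K-means clustering over the utilization pattern of routing resources. As a rough illustration only, the following sketch runs plain Lloyd's K-means on made-up utilization vectors. The data, the feature choice (one utilization rate per benchmark design), and the evenly spaced initialisation are assumptions for this example and are not taken from the paper, which also considers power-gating overhead and multiplexer size.

```python
def dist2(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iters=100):
    """Plain Lloyd's K-means; returns (centroids, labels)."""
    # Deterministic initialisation (a simplification): evenly spaced samples.
    centroids = [list(points[i * len(points) // k]) for i in range(k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return centroids, labels


# Hypothetical data: utilization rate of six routing multiplexers
# across three benchmark designs (values invented for illustration).
mux_usage = [
    (0.9, 0.8, 0.9), (0.8, 0.9, 0.7), (1.0, 0.9, 0.8),  # heavily used
    (0.1, 0.0, 0.2), (0.0, 0.1, 0.0), (0.2, 0.1, 0.1),  # rarely used
]

centroids, labels = kmeans(mux_usage, k=2)

# Group multiplexer indices by cluster; each group would be a candidate
# set of resources sharing one power-gating control.
clusters = {}
for mux_id, lab in enumerate(labels):
    clusters.setdefault(lab, []).append(mux_id)
print(clusters)
```

Resources that tend to be idle together land in the same cluster, which is the intuition behind gating them as a unit.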

Square Kilometre Array Low Atomic commercial off-the-shelf correlator and beamformer

JOURNAL OF ASTRONOMICAL TELESCOPES INSTRUMENTS AND SYSTEMS, 2022, Vol. 8, No. 1

Authors: Hampson, Grant A.; Bunton, John D.; Humphrey, David; Bengston, Keith J.; Jourjon, Guillaume; Bolin, Andrew B.; Chen, Yuqing; Troup, Euan R.; Babich, Giles C.; van Aardt, Jason C. (CSIRO Space & Astron Marsfield NSW Australia)

The Square Kilometre Array Low is a next generation radio telescope, consisting of 512 antenna stations spread over 65 km, to be built in Western Australia. The correlator and beamformer (CBF) design is central to the telescope signal processing. CBF receives 6 Tera-bits-per-second (Tbps) of station data continuously and processes it in real time with a compute load of 2 Peta-operations-per-second (Pops). The correlator calculates up to 22 million cross products between all pairs of stations, whereas the beamformers (BFs) coherently sum station data to form more than 500 beams. The output of the correlator is up to 7 Tbps, and the BF 2 Tbps. The design philosophy, called "Atomic COTS," is based on commercial off-the-shelf (COTS) hardware. Data routing is implemented in network switches programmed using the Programming Protocol-Independent Packet Processors (P4) language and the signal processing occurs in COTS field-programmable gate array (FPGA) cards. The P4 language allows routing to be determined from the metadata in the Ethernet packets from the stations. That is, metadata describing the contents of the packet determines the routing. Each FPGA card inputs a fraction of the overall bandwidth for all stations and then implements the processing needed to generate complete science data products. Generation of complete science products in a single FPGA is named here as Atomic processing. A Tango distributed control system configures the multitude of processing modes as well as maintaining the overall health of the CBF system hardware. The resulting 6 Tbps in and 9 Tbps out, 2 Pops Atomic COTS network attached accelerator occupies five racks and consumes 60 kW. (C) The Authors. Published by SPIE under a Creative Commons Attribution 4.0 International License.

Keywords: antenna arrays; correlators; beam steering; accelerator architectures; communication switching; field-programmable gate arrays
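The figures quoted in the abstract above can be sanity-checked with a little arithmetic. The sketch below counts distinct station pairs using the standard n(n-1)/2 formula and adds up the stated output rates. The abstract does not say how its "up to 22 million cross products" figure is composed, so no attempt is made to derive it here.

```python
stations = 512

# Distinct pairs of stations (each pair is one correlation baseline).
station_pairs = stations * (stations - 1) // 2

# Data rates as stated in the abstract, in Tbps.
input_tbps = 6
correlator_out_tbps = 7
beamformer_out_tbps = 2
total_out_tbps = correlator_out_tbps + beamformer_out_tbps

print(f"station pairs: {station_pairs}")
print(f"input: {input_tbps} Tbps, total output: {total_out_tbps} Tbps")
```

The summed output rate matches the "9 Tbps out" stated at the end of the abstract.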

Optimized FPGA-based elliptic curve cryptography processor for high-speed applications

INTEGRATION-THE VLSI JOURNAL, 2011, Vol. 44, No. 4, pp. 270-279

Authors: Jarvinen, Kimmo (Aalto Univ Sch Sci & Technol Dept Informat & Comp Sci FIN-00076 Aalto Finland)

In this paper, we introduce an FPGA-based processor for elliptic curve cryptography on Koblitz curves. The processor specifically targets applications requiring very high speed. The processor is optimized for performing scalar multiplications, which are the basic operations of every elliptic curve cryptosystem, only on one specific Koblitz curve; support for other curves is achieved by reconfiguring the FPGA. We combine efficient methods from various recent papers into a very efficient processor architecture. The processor includes carefully designed processing units dedicated for different parts of the scalar multiplication in order to increase performance. The computation is pipelined providing simultaneous processing of up to three scalar multiplications. We provide experimental results on an Altera Stratix II FPGA demonstrating that the processor computes a single scalar multiplication on average in 11.71 µs and achieves a throughput of 235,550 scalar multiplications per second on NIST K-163. (C) 2010 Elsevier B.V. All rights reserved.

Keywords: Elliptic curve cryptography; field-programmable gate arrays; Koblitz curve; Parallelism
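The latency and throughput figures in the abstract above can be related through Little's law (average items in flight = throughput × latency). The short calculation below treats the quoted 11.71 µs as the average latency of one scalar multiplication, which is an interpretation of the abstract and not something it states in those terms.

```python
latency_s = 11.71e-6        # average time per scalar multiplication (from the abstract)
throughput_per_s = 235_550  # scalar multiplications per second (from the abstract)

# Average spacing between completed results, in microseconds.
interval_us = 1e6 / throughput_per_s

# Little's law: average number of multiplications in the pipeline at once.
in_flight = throughput_per_s * latency_s

print(f"one result every {interval_us:.2f} us")
print(f"average operations in flight: {in_flight:.2f}")
```

Under that reading, the average number of operations in flight comes out a little below three, which agrees with the abstract's statement that up to three scalar multiplications are processed simultaneously.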

A Flexible Heterogeneous Hardware/Software Solution for Real-Time HD H.264 Motion Estimation

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2008, Vol. 18, No. 12, pp. 1781-1785

Authors: Urban, Fabrice; Poullaouec, Ronan; Nezan, Jean-Francois; Deforges, Olivier (Thomson R&D Content Delivery & Compress Lab F-35576 Cesson Sevigne France; CNRS UMR 6164 Inst Elect & Telecommun Rennes Image Grp Lab Rennes France)

Quarter-pixel accuracy and variable block-size significantly enhance the compression performance of the MPEG-4 AVC/H.264 video compression standard over its predecessors, but also significantly increase computation requirements. First, a digital signal processor (DSP)-based solution that achieves real-time integer motion estimation is proposed. Fractional-pixel refinement is too computationally intensive to be efficiently processed on a software-based processor. To address this restriction, a flexible and low complexity VLSI subpixel refinement coprocessor is designed. Thanks to an improved datapath, a high throughput is achieved with low logic resources. Finally, a heterogeneous (DSP-field-programmable gate array) solution to handle real-time motion estimation with variable block-size and fractional-pixel accuracy for high-definition video is studied. This solution, combining programmability and efficiency, achieves motion estimation of 720p sequences at up to 60 fps.

Keywords: Digital signal processors; field-programmable gate arrays; H.264; motion estimation; parallel processing; real-time

Partitioned state encoding for low power in FPGAs

ELECTRONICS LETTERS, 2005, Vol. 41, No. 17, pp. 948-949

Authors: Mengibar, L; Entrena, L; Lorenz, AG; Millán, ES (Univ Carlos III Madrid Dept Tecnol Elect Grp Microelect E-28911 Madrid Spain)

The problem of finite state machine (FSM) encoding for low power in field-programmable gate arrays (FPGAs) is addressed. In this technology, one-hot encoding is typically recommended for large FSMs and binary encoding for small FSMs. A partitioned encoding approach is proposed which uses a combination of both binary encoding and zero-one-hot encoding with intermediate code size. Experimental results demonstrate that the proposed encoding approach can produce significant power savings.

Keywords: binary codes; field programmable gate arrays; finite state machines; low-power electronics; FPGA; binary encoding; field-programmable gate arrays; finite state machine encoding; partitioned state encoding; zero-one-hot encoding
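For readers unfamiliar with the encodings named in the abstract above, the sketch below generates the three basic state codes for a small FSM. It uses the common definition of zero-one-hot (one state is all zeros, every other state sets exactly one bit). The function names are made up for this example, and the paper's partitioning scheme itself is not reproduced because the abstract does not describe it in detail.

```python
def binary_codes(n):
    """Minimum-width binary codes for n states."""
    width = max(1, (n - 1).bit_length())
    return [format(i, f"0{width}b") for i in range(n)]


def one_hot_codes(n):
    """One flip-flop per state; exactly one bit set in each code."""
    return [format(1 << i, f"0{n}b") for i in range(n)]


def zero_one_hot_codes(n):
    """Like one-hot, but one state is all zeros, so n states need n-1 bits."""
    if n < 2:
        raise ValueError("zero-one-hot needs at least two states")
    width = n - 1
    return ["0" * width] + [format(1 << i, f"0{width}b") for i in range(width)]


# Compare code widths for a hypothetical 5-state machine.
for name, codes in [
    ("binary", binary_codes(5)),
    ("one-hot", one_hot_codes(5)),
    ("zero-one-hot", zero_one_hot_codes(5)),
]:
    print(f"{name:13s} width={len(codes[0])} codes={codes}")
```

The widths (3, 5 and 4 bits for five states) show why combining binary and zero-one-hot codes gives a code size between the two extremes.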

Scalable and modular algorithms for floating-point matrix multiplication on reconfigurable computing systems

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2007, Vol. 18, No. 4, pp. 433-448

Authors: Zhuo, Ling; Prasanna, Viktor K. (Univ So Calif Dept Elect Engn Syst Los Angeles CA 90089 USA)

The abundant hardware resources on current reconfigurable computing systems provide new opportunities for high-performance parallel implementations of scientific computations. In this paper, we study designs for floating-point matrix multiplication, a fundamental kernel in a number of scientific applications, on reconfigurable computing systems. We first analyze design trade-offs in implementing this kernel. These trade-offs are caused by the inherent parallelism of matrix multiplication and the resource constraints, including the number of configurable slices, the size of on-chip memory, and the available memory bandwidth. We propose three parameterized algorithms which can be tuned according to the problem size and the available hardware resources. Our algorithms employ a linear array architecture with simple control logic. This architecture effectively utilizes the available resources and reduces routing complexity. The Processing Elements (PEs) used in our algorithms are modular so that it is easy to embed floating-point units into them. Experimental results on a Xilinx Virtex-II Pro XC2VP100 show that our algorithms achieve good scalability and high sustained GFLOPS performance. We also implement our algorithms on Cray XD1. XD1 is a high-end reconfigurable computing system that employs both general-purpose processors and reconfigurable devices. Our algorithms achieve a sustained performance of 2.06 GFLOPS on a single node of XD1.

Keywords: scientific computing; field-programmable gate arrays; reconfigurable hardware; computations on matrices; parallel algorithms

Experimental Validation of a Novel Method for Harmonic Mitigation for a Three-Phase Five-Level Cascaded H-Bridges Inverter

IEEE TRANSACTIONS ON INDUSTRY APPLICATIONS, 2019, Vol. 55, No. 6, pp. 6089-6101

Authors: Schettino, Giuseppe; Viola, Fabio; Di Tommaso, Antonino Oscar; Livreri, Patrizia; Miceli, Rosario (Univ Palermo Dept Engn I-90133 Palermo Italy)

In modern high-power electrical drives, the efficiency of the system is a crucial constraint. Moreover, the efficiency of power converters plays a fundamental role in modern applications that also require limited weight, such as electric vehicles and novel more-electric aircraft. The reduction of losses pushes for systems with a dc bus and a high number of dc/ac converters, widespread in the vehicle, not burdened by a too expensive data processing system. The purpose of this article is to help reduce losses by proposing an innovative selective harmonic mitigation method based on the identification of the working areas where the reference harmonics present lower amplitudes. In particular, the main objective is to find a new way to calculate the control angles in real-time operation without solving nonlinear equations, whose resolution would require expensive controllers. Through a very simple approach, the polynomial equations, which drive the control angles, were detected for a three-phase five-level cascaded H-bridge inverter and implemented in a digital system for real-time operation with a low computational cost. As a result, a comparison between the simulation and experimental behavior is presented. In the last part of this article, a real electric machine is driven by considering the appropriate working areas and current harmonics are also evaluated.

Keywords: Cascade systems; field-programmable gate arrays; inverters; multilevel systems; power conversion harmonics

FPGA-based test bed for measurement of AM/AM and AM/PM distortion and modeling memory effects in RF PAs

INTEGRATION-THE VLSI JOURNAL, 2016, Vol. 52, pp. 291-300

Authors: Cruz Nunez-Perez, Jose; Ricardo Cardenas-Valdez, Jose; Montoya-Villegas, Katherine; Apolinar Reynoso-Hernandez, J.; Raul Loo-Yau, Jose; Gontrand, Christian; Tlelo-Cuautle, Esteban (Inst Politecn Nacl Ctr Invest & Desarrollo Tecnol Digital IPN CITEDI Tijuana 22150 Baja California Mexico; Ctr Sci Res & Higher Educ Ensenada CICESE Dept Elect & Telecommun Ensenada 22860 Baja California Mexico; Inst Politecn Nacl IPN CINVESTAV Ctr Invest & Estudios Avanzados Zapopan 45019 Jalisco Mexico; Inst Natl Sci Appl Lyon INSA Lyon UMR CNRS 5270 INL F-69621 Villeurbanne France; INAOE Puebla 72840 Mexico)

Using a field-programmable gate array (FPGA) development board, a digital signal processor (DSP) builder, and the phase-to-amplitude conversion principle, a low-cost system for measuring the amplitude-to-amplitude (AM/AM) and amplitude-to-phase (AM/PM) distortion curves of radio frequency (RF) power amplifiers (PAs) is presented. The state of the art based on the measurements and preliminary studies of AM/AM and AM/PM distortion curves is discussed. A full digital control of the test bed simulated/emulated in Matlab/Simulink is introduced to recalculate the known AM/AM and AM/PM measurements stored as look-up table (LUT). Finally, the low-cost system comprises the memory polynomial model (MPM) that involves the nonlinearity order and memory effects of real PAs. (C) 2015 Elsevier B.V. All rights reserved.

Keywords: AM/AM; AM/PM; field-programmable gate arrays; Memory modeling; Memory polynomial model; Power amplifier; RF; Test bed

Analysis of a Real-Time FSO System Utilizing Xia and SRRC Pulses in Multi-Band Carrier-Less Amplitude and Phase Modulation

IEEE ACCESS, 2024, Vol. 12, pp. 21004-21011

Authors: Haigh, Paul Anthony; Abadi, Mojtaba Mansour; Ghassemlooy, Zabih; Quang, Nguyen The; Thai Le, Son; Hung, Nguyen Tan (Newcastle Univ Sch Engn Newcastle Upon Tyne NE1 7RU Northd England; Northumbria Univ Opt Commun Res Grp Newcastle Upon Tyne NE1 8ST England; Le Quy Don Tech Univ Dept Commun Hanoi 100000 Vietnam; Nokia Bell Labs Holmdel NJ 07974 USA; Univ Da Nang Adv Inst Sci & Technol Da Nang 550000 Vietnam)

In this paper, we investigate the impact of two pulse shapes on the performance of a real-time free-space optical (FSO) communication link. The two candidate pulse shapes are the square-root raised cosine (SRRC) and Xia pulses, which are tested as the basis function for multi-band carrier-less amplitude and phase modulation. We first develop a real-time system based on a Xilinx Zynq ZCU102 system-on-chip platform utilising a high-resolution analogue-to-digital-converter. We then generate multi-band carrier-less amplitude and phase modulation formats using it and test the error vector magnitude whilst varying parameters. We emulate the fog environment utilising neutral density filters and evaluate the error performance of the link under increasingly poor visibility conditions. We show that, contrary to previous reports, the SRRC pulse shape offers superior performance over the first-order Xia pulse in the FSO environment operating at data rates exceeding 1 Gb/s.

Keywords: Advanced modulation formats; carrier-less amplitude and phase modulation; communication systems; field-programmable gate arrays; free-space optics

Embedded architecture for noise-adaptive video object detection using parameter-compressed background modeling

JOURNAL OF REAL-TIME IMAGE PROCESSING, 2017, Vol. 13, No. 2, pp. 397-414

Authors: Ratnayake, Kumara; Amer, Aishy (Concordia Univ Dept Elect & Comp Engn Montreal PQ Canada)

Video processing algorithms are computationally intensive and place stringent requirements on performance and efficiency of memory bandwidth and capacity. As such, efficient hardware accelerations are inevitable for fast video processing systems. In this paper, we propose resource- and power-optimized FPGA-based configurable architecture for video object detection by integrating noise estimation, Mixture-of-Gaussian background modeling, motion detection, and thresholding. Due to large amount of background modeling parameters, we propose a novel Gaussian parameter compression technique suitable for resource- and power-constraint embedded video systems. The proposed architecture is simulated, synthesized and verified for its functionality, accuracy and performance on a Virtex-5 FPGA-based embedded platform by directly interfacing to a digital video input. Intentional exploitation of heterogeneous resources in FPGAs, and advanced design techniques such as heavy pipelining and data parallelism yield real-time processing of HD-1080p video streams at 30 frames per second. Objective and subjective evaluations to existing hardware-based methods show that the proposed architecture obtains orders of magnitude performance improvements, while utilizing minimal hardware resources. This work is an early attempt to devise a complete video surveillance system onto a stand-alone resource-constraint FPGA-based smart camera.

Keywords: field-programmable gate arrays; FPGA; Video signal processing; Noise; Moving objects; Motion detection; Thresholding; Gaussian background update; Gaussian parameter compression
