Search Results - Inner Mongolia University Library

HLS-Based Large Scale Self-Organizing Feature Maps

IEEE ACCESS, 2024, vol. 12, pp. 142459-142474

Authors: Porrmann, Florian; Hagemeyer, Jens; Porrmann, Mario. Affiliations: Bielefeld Univ, Cognitron & Sensor Syst Grp, CITEC, D-33615 Bielefeld, Germany; Osnabruck Univ, Inst Comp Sci, Comp Engn Grp, D-49090 Osnabruck, Germany

The Self-Organizing Map (SOM) algorithm is a clustering algorithm used in a wide variety of application domains. Over the last few decades, it has been accelerated using various hardware architectures, including FPGAs, CPUs, and GPUs. This publication presents a High-Level Synthesis-based implementation that utilizes multiple processing elements to realize a high-performance system architecture. An extensive design space exploration was conducted to evaluate the performance range of the architecture. For this, vector dimensions ranging from 8 up to 512 and map sizes from 16x16 to 512x512 were used. The evaluation was performed using two different AMD/Xilinx UltraScale+ FPGA systems, the VCU128 PCIe-based accelerator card and the ZCU106 stand-alone evaluation kit. From the achieved results, it can be seen that the performance scales nearly linearly for a given vector dimension when the map size is increased. In addition, the energy efficiency for both FPGAs was analyzed, revealing that while the ZCU106 is less powerful in terms of raw compute power, it requires up to 4x less power and, depending on the configuration, can be 2x more energy efficient compared to the VCU128. One of the main reasons for this is that it does not require a dedicated host system but utilizes its internal ARM cores. Finally, a comparison against state-of-the-art SOM implementations was conducted. The proposed design achieves a speed-up of up to 458.7, 1,630.4, and 4.9 compared to other CPU, GPU, and FPGA realizations, respectively.

Keywords: Neurons; Computer architecture; Clustering algorithms; Training; Graphics processing units; Statistical analysis; Space exploration; Heterogeneous networks; field programmable gate array; hardware acceleration; machine learning; reconfigurable architectures; reconfigurable computing; heterogeneous computing; heterogeneous architectures; self-organizing feature maps; optimization; design space exploration
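
As a rough illustration of the SOM algorithm that the abstract above refers to, the following software sketch performs one training step: it finds the best matching unit (BMU) for an input vector and pulls nearby neurons toward that input. It is not the authors' HLS design. The function names, the Gaussian neighborhood, the parameters `lr` and `sigma`, and the toy 4x4 map are assumptions made for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the neuron whose weight vector is closest to x."""
    best, best_dist = (0, 0), math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_step(weights, x, lr=0.5, sigma=1.0):
    """Move every neuron toward x, weighted by its grid distance to the BMU."""
    br, bc = best_matching_unit(weights, x)
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            grid_d2 = (r - br) ** 2 + (c - bc) ** 2
            h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # assumed Gaussian neighborhood
            for i in range(len(w)):
                w[i] += lr * h * (x[i] - w[i])
    return br, bc


# Hypothetical toy configuration: 4x4 map, 8-dimensional vectors.
rng = random.Random(0)
som = [[[rng.random() for _ in range(8)] for _ in range(4)] for _ in range(4)]
sample = [rng.random() for _ in range(8)]
bmu = train_step(som, sample)
```

The distance search and the weight update are independent per neuron, which is presumably what makes the algorithm a good fit for multiple parallel processing elements.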

Modeling and simulation of three-phase IGBT full-bridge inverter circuit based on FPGA

COMPUTERS & ELECTRICAL ENGINEERING, 2024, vol. 118, Part A

Authors: Zhang, Junchu; Yan, Jianjun; Zhang, Yongming. Affiliations: East China Univ Sci & Technol, Sch Mech & Power Engn, Shanghai 200237, Peoples R China; East China Univ Sci & Technol, Shanghai Key Lab Intelligent Sensing & Detect Tech, Shanghai 200237, Peoples R China; Shanghai Inst Elect Engn, Sch Elect Engn, Shanghai 201306, Peoples R China

The field of motor drive makes extensive use of electronic power modeling and simulation of three-phase IGBT full-bridge inverter circuits. The accuracy and computational efficiency of these models have a direct impact on the dependability of the motor control system. The majority of earlier research focused solely on static processes of turning on and off in IGBTs, disregarding the transient processes that occur when three-phase IGBT full-bridge inverter circuits are switched at high frequencies. This has an impact on the circuits' accuracy in real-time simulation. Therefore, this paper proposes and builds a field-programmable gate array (FPGA)-based steady-state and transient dual-phase three-phase IGBT full-bridge inverter circuit model for the static and transient characteristics of the insulated gate bipolar transistor (IGBT) element in the circuit. Depending on whether or not the switching states of the six IGBTs in the three-phase IGBT full-bridge inverter circuit are altered, the simulation process is split into steady-state and transient phases. In the steady-state phase with large step size, the circuit is discretized using the binary L/C approach. In the transient phase, the transient process is divided into several time intervals with small step sizes. Real-time simulation waveforms are generated by interleaving and combining the multistage fitting method's solution of the circuit's transient waveforms at tiny step lengths with the steady-state phase. Finally, in order to demonstrate the accuracy of the circuit model in this work, the simulation results of the two-stage three-phase IGBT full-bridge inverter circuit model based on FPGA are compared with those of the conventional ideal model for waveform comparison and data analysis.

Keywords: field programmable gate array; Three-phase inverter circuit; Switch characteristics; Real-time simulation

Hardware Architectures for Computing Eigendecomposition-Based Discrete Fractional Fourier Transforms with Reduced Arithmetic Complexity

CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2024, vol. 43, no. 1, pp. 593-614

Authors: Bispo, Breno C.; de Oliveira Neto, Jose R.; Lima, Juliano B. Affiliations: Univ Fed Pernambuco, Dept Elect & Syst, Recife, PE, Brazil; Univ Fed Pernambuco, Dept Mech Engn, Recife, PE, Brazil

The fractional Fourier transform (FrFT) is a useful mathematical tool for signal and image processing. In some applications, the eigendecomposition-based discrete FrFT (DFrFT) is suitable due to its properties of orthogonality, additivity, reversibility and approximation of the continuous FrFT. Although recent studies have introduced reduced arithmetic complexity algorithms for DFrFT computation, which are attractive for real-time and low-power practical scenarios, reliable hardware architectures in this context remain a gap in the literature. In this paper, we present two hardware architectures based on the referred algorithms to obtain N-point DFrFT (N = 4L, L is a positive integer). We validate and compare the performance of such architectures by employing field-programmable gate array implementations, co-designed with an embedded hard processor unit. In particular, we carry out computer experiments where synthesis, error and latency analyses are performed, and consider an application related to compact signal representation.

Keywords: field programmable gate array; Discrete fractional Fourier transform; Hardware implementation; Arithmetic complexity

Enhanced Real-Time Multiuser Uplink UWOC System Based on Hybrid Multiple Access and SGD-PID Power Control Algorithm

JOURNAL OF LIGHTWAVE TECHNOLOGY, 2025, vol. 43, no. 1, pp. 190-197

Authors: Li, Xiao; Xia, Yu; Gui, Liangqi; Li, Han; Lang, Liang. Affiliations: Huazhong Univ Sci & Technol, Sch Cyber Sci & Engn, Res Ctr 6G Mobile Commun, Wuhan 430074, Peoples R China; Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Peoples R China

Non-orthogonal multiple access (NOMA) has been widely regarded as the most promising technique for achieving high spectral efficiency in optical communication systems. However, the practical implementation of power domain NOMA faces challenges related to inter-user interference and decoding complexity, limiting its multiplexing capability to a pair of users. In this paper, we experimentally demonstrate a hybrid multiple access scheme in a four-user underwater wireless optical communication (UWOC) system. Specifically, power domain NOMA is employed to multiplex two users within a user pair (UP), while time division multiple access (TDMA) is utilized for each UP. To validate the efficacy of the hybrid multiple access technique, robust watertight transceivers are designed and implemented in a 10-m outdoor pool. A calculation method based on the channel condition is first introduced to retain the received power within a proper range when users in the UP are randomly positioned. In addition, an adaptive stochastic gradient descent-based proportional-integral-derivative (SGD-PID) algorithm is proposed for practical scenarios where determining the system and channel parameters is difficult. Experimental results show that the proposed adaptive power control schemes can effectively enhance system performance under different channel conditions experienced by users. The UWOC system achieves a data rate of 30 Mbps for each user, maintaining bit error rates (BERs) below the forward error correction (FEC) threshold. The results highlight the remarkable potential of the hybrid multiple access scheme along with our proposed adaptive power control algorithm.

Keywords: NOMA; Time division multiple access; Synchronization; Power control; Uplink; Resource management; Real-time systems; field programmable gate array; hybrid multiple access scheme; non-orthogonal multiple access; power control; underwater wireless optical communication; user pairing

Remote Power Side-Channel Attacks on FPGAs

IEEE DESIGN & TEST, 2025, vol. 42, no. 1, pp. 13-19

Authors: Zhao, Mark; Suh, G. Edward. Affiliations: Stanford Univ, Dept Elect Engn, Stanford, CA 94305, USA; Nvidia, Westford, MA 01886, USA; Cornell Univ, Sch Elect & Comp Engn, Ithaca, NY 14853, USA

Editor's notes: Side-channel attacks, like those exploiting power and timing, are generally thought to require physical access to a system. This research challenges that idea by demonstrating how such attacks can be carried out on remote systems without physical access. It emphasizes the need to rethink how modern shared-FPGA systems are designed, prioritizing security as a core consideration. - Jeyavijayan Rajendran, Texas A&M University, USA

Keywords: field programmable gate arrays; Logic Circuits; Monitoring; Power demand; Logic gates; Switches; Channel models; Side-channel attacks; field programmable gate array; ring oscillator; power analysis attack; side-channel attack; system-on-chip

High-Throughput, Sorted QR Accelerator for Non-Linear Processing in Open-RAN Systems

IEEE ACCESS, 2024, vol. 12, pp. 44564-44572

Authors: Thomas, Thomas James; Filo, Marcin; Nikitopoulos, Konstantinos. Affiliations: Univ Surrey, Inst Commun Syst, 5G & 6G Innovat Ctr, Wireless Syst Lab, Guildford GU27XH, England

Open Radio Access Networks (Open-RANs) require cost- and energy-efficient solutions to facilitate their large-scale deployment. A significant concern in multiple-input multiple-output (MIMO) systems employing traditional linear processing is the substantial number of radio frequency (RF) chains at the base station (BS), which is required to ensure accurate decoding of the spatially multiplexed streams. Recently, however, practical non-linear approaches, which facilitate near-optimal parallelizable tree searches, have been successfully implemented on actual systems and have demonstrated the capability to considerably reduce the required RF chains without affecting user performance. Similar to QR decomposition (QRD), which is used to perform channel inversion in linear systems, these non-linear approaches employ a sorted QRD (SQRD) to curtail the search complexity. However, this can be a significant bottleneck for general software-based non-linear solutions, preventing them from fully exploiting their gains. To address the latency limitations of SQRD, this work presents a high-throughput hardware accelerator based on reformulating the underlying Modified Gram-Schmidt (MGS) process to extract more parallelism than previous designs. Implementations of the proposed architecture demonstrate at least 2-fold improvements in the achievable throughput and processing latency over existing 4 x 4 and 8 x 8 field-programmable gate array (FPGA) implementations and can be scaled up to 16 x 16 MIMO systems. Furthermore, the proposed accelerator was integrated with a software framework that can considerably offload the processing burden for a higher number of streams under strict latency conditions.

Keywords: Streams; MIMO communication; Software; Throughput; Complexity theory; Sorting; Hardware acceleration; Open radio access network; multiple-input multiple-output; sorted QR decomposition; non-linear detection; field programmable gate array
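
The sorted QR decomposition named in the abstract above can be sketched in software as a textbook modified Gram-Schmidt loop with a column-sorting step. This is only a reference sketch under stated assumptions, not the reformulated accelerator from the paper: it uses real-valued matrices (MIMO channels are complex), assumes full column rank, and the name `sorted_qr_mgs` is hypothetical.

```python
import math


def sorted_qr_mgs(H):
    """Sorted QR via modified Gram-Schmidt: column perm[j] of H equals (Q R)[:, j]."""
    m, n = len(H), len(H[0])
    Q = [row[:] for row in H]              # columns are orthonormalized in place
    R = [[0.0] * n for _ in range(n)]
    perm = list(range(n))                  # perm[j] = original index of column j
    norms = [sum(Q[r][c] ** 2 for r in range(m)) for c in range(n)]
    for i in range(n):
        # Sorting step: bring the remaining column with the smallest norm to position i.
        k = min(range(i, n), key=lambda c: norms[c])
        if k != i:
            for r in range(m):
                Q[r][i], Q[r][k] = Q[r][k], Q[r][i]
            for r in range(n):
                R[r][i], R[r][k] = R[r][k], R[r][i]
            perm[i], perm[k] = perm[k], perm[i]
            norms[i], norms[k] = norms[k], norms[i]
        # Normalize column i (assumes full column rank, so the norm is nonzero).
        R[i][i] = math.sqrt(sum(Q[r][i] ** 2 for r in range(m)))
        for r in range(m):
            Q[r][i] /= R[i][i]
        # MGS update: remove the new direction from every remaining column.
        for j in range(i + 1, n):
            R[i][j] = sum(Q[r][i] * Q[r][j] for r in range(m))
            for r in range(m):
                Q[r][j] -= R[i][j] * Q[r][i]
            norms[j] -= R[i][j] ** 2
    return Q, R, perm
```

The inner loop over `j` has no dependency between columns, which is the kind of parallelism a hardware design can exploit; the outer loop over `i` is the sequential part.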

Toward Universal Multiplexer Multiply-Accumulate Architecture in Stochastic Computing

IEEE ACCESS, 2025, vol. 13, pp. 33874-33882

Authors: Lee, Yang Yang; Halim, Zaini Abdul; Ab Wahab, Mohd Nadhir; Almohamad, Tarik Adnan. Affiliations: Univ Sains Malaysia, Sch Elect & Elect Engn, Nibong Tebal 14300, Penang, Malaysia; Univ Sains Malaysia, Sch Comp Sci, Gelugor 11800, Penang, Malaysia; Karabuk Univ, Fac Engn, Dept Elect Elect Engn, TR-78050 Karabuk, Turkiye

Stochastic Computing (SC) has recently gained traction due to its inherent error tolerance and extremely simple arithmetic hardware, making it particularly effective for accelerating modern applications such as neural networks on resource-constrained devices. Traditional SC architectures often adopt binary computing principles, relying on dedicated hardware, i.e., AND gates for multiplication and multiplexers (MUX) for addition. However, SC's mathematical foundation enables the fusion of complex operations into remarkably simple hardware. Several SC studies demonstrated the potential of MUX-based architectures to perform multiply-and-accumulate (MAC) operations, but existing designs face correlation complications, scaling problems, and limited application scope. This paper introduces an auxiliary logic block to address the complexity of MUX select inputs, significantly enhancing the scalability of MUX-based MAC operations. The proposed approach has been validated through SC image processing tasks, including grayscale conversion and Sobel edge detection, achieving up to 75% reduction in hardware resource utilization on field-programmable gate arrays (FPGAs) and up to 96% improvement in computational accuracy compared to traditional AND/XNOR-based SC multipliers.

Keywords: Binary sequences; Logic gates; Hardware; Arithmetic; Correlation; Logic; Finite impulse response filters; field programmable gate arrays; Adders; Accuracy; Stochastic computing; stochastic multiplier; image processing; field programmable gate array; multiply-accumulate hardware
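
The traditional SC primitives mentioned in the abstract above (an AND gate for multiplication, a MUX for addition) can be simulated in a few lines. The sketch below covers only that conventional baseline with unipolar bitstreams, not the auxiliary logic block the paper proposes. All function names, the stream length, and the seed are assumptions chosen for illustration.

```python
import random


def to_stream(p, length, rng):
    """Encode a probability p in [0, 1] as a unipolar stochastic bitstream."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def stream_value(stream):
    """Decode a bitstream as the fraction of ones."""
    return sum(stream) / len(stream)


def sc_multiply(a, b):
    """Bitwise AND of two uncorrelated streams approximates the product."""
    return [x & y for x, y in zip(a, b)]


def sc_scaled_add(a, b, select):
    """A 2-to-1 MUX with a p=0.5 select stream approximates (a + b) / 2."""
    return [y if s else x for x, y, s in zip(a, b, select)]


rng = random.Random(42)          # fixed seed so the sketch is repeatable
n = 20000                        # hypothetical stream length
a = to_stream(0.5, n, rng)
b = to_stream(0.4, n, rng)
sel = to_stream(0.5, n, rng)
product = stream_value(sc_multiply(a, b))         # close to 0.5 * 0.4
average = stream_value(sc_scaled_add(a, b, sel))  # close to (0.5 + 0.4) / 2
```

Both results are only statistical estimates, and the AND-gate product is accurate only when the two input streams are uncorrelated, which is one source of the correlation issues the abstract mentions.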

NOMA Assisted Real-Time Downlink UWOC System Using a Software-Configurable Driver Under Different Channel Conditions

JOURNAL OF LIGHTWAVE TECHNOLOGY, 2024, vol. 42, no. 14, pp. 4860-4873

Authors: Li, Xiao; Gui, Liangqi; Xia, Yu; Li, Yinan; Yang, Xiaojiao; Li, Hao; Lang, Liang. Affiliations: Huazhong Univ Sci & Technol, Res Ctr Mobile Commun 6G, Sch Cyber Sci & Engn, Wuhan 430074, Peoples R China; Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Peoples R China; Xian Inst Space Radio Technol, Xian 710100, Peoples R China

In this article, we demonstrate a real-time, multi-user downlink underwater wireless optical communication (UWOC) system for practical applications via field programmable gate arrays (FPGAs). The performance of the established UWOC system is experimentally investigated under diverse channel conditions. The established UWOC system utilizes arrayed light emitting diodes (LEDs) as the transmitter and employs optical superimposition-based non-orthogonal multiple access (NOMA). The feasibility of employing the arrayed LEDs as the transmitter under different channel conditions is validated by simulating the optical intensity distributions of LED groups. Both the simulation and experimental results reveal that the bit error rate (BER) of user 1 (higher power user) decreases with increasing power allocation ratio (PAR), while the BER of user 2 (lower power user) increases with the increase in PAR. Uniform post-equalization is employed to extend the bandwidth of each LED group. In addition, a metal-oxide-semiconductor field-effect transistor (MOSFET)-based driver circuit and a remaining carrier sweep-out circuit unit (CSCU) are proposed to facilitate independent control over different LED groups and enhance the response speed of LEDs, which outperform the bias-tee and the single MOSFET-based driver circuit. Experimental results also indicate that NOMA exhibits superior spectral efficiency compared to conventional time division multiple access (TDMA). A PAR of approximately 2:1 is appropriate to ensure both users operate within the forward error correction (FEC) threshold. Based on the proposed schemes, the real-time UWOC system achieves a data rate of 40 Mbps for both users with BER below the FEC limit under different channel conditions. The results highlight the significant potential of the designed UWOC system to effectively meet diverse real-time, multi-user UWOC application requirements.

Keywords: Light emitting diodes; Optical scattering; Optical transmitters; Optical attenuators; NOMA; Adaptive optics; Bandwidth; field programmable gate array; light emitting diode driver circuit; non-orthogonal multiple access (NOMA); underwater channel condition; underwater wireless optical communication (UWOC)
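
To make the superimposition idea in the abstract above concrete, here is a noise-free toy sketch of two-user power-domain NOMA with successive interference cancellation (SIC). It is not the authors' system. It assumes on-off keying, treats the 2:1 ratio as a ratio of received intensity levels (the abstract does not say how PAR maps to levels), and the names `superimpose` and `sic_decode` are hypothetical.

```python
def superimpose(bit1, bit2, level1=2.0, level2=1.0):
    """Add the on-off-keyed intensities of the high-power and low-power users."""
    return level1 * bit1 + level2 * bit2


def sic_decode(received, level1=2.0, level2=1.0):
    """Decode user 1 first, cancel its contribution, then decode user 2."""
    # User 1 is detected with user 2's signal treated as interference.
    bit1 = 1 if received > (level1 + level2) / 2 else 0
    # Subtract the reconstructed user 1 signal and threshold what is left.
    residual = received - level1 * bit1
    bit2 = 1 if residual > level2 / 2 else 0
    return bit1, bit2
```

With equal levels the four superimposed intensities would collide, which gives some intuition for why the allocation ratio affects the two users' error rates in opposite directions.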

Resource-aware Montgomery modular multiplication optimization for digital signal processing

JOURNAL OF SYSTEMS ARCHITECTURE, 2024, vol. 151

Authors: Tao, Qiqi; Li, Liying; Zhou, Junlong; Cao, Guitao; Meng, Dan. Affiliations: East China Normal Univ, Shanghai Key Lab Trustworthy Comp, Shanghai 200062, Peoples R China; East China Normal Univ, MoE Engn Res Ctr SW HW Codesign Technol & Applica, Shanghai 200062, Peoples R China; Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China; Southeast Univ, Natl Mobile Commun Res Lab, Nanjing 211111, Peoples R China

Homomorphic encryption is an important technology for protecting data privacy, and the performance of modular multiplication directly affects the efficiency of homomorphic encryption. Currently, there are numerous FPGA-based acceleration techniques targeting modular multiplication. However, many of these implementations require substantial hardware resources or suffer from resource imbalance, which leads to lower throughput. Therefore, we present a novel FPGA-based implementation of Montgomery Modular Multiplication aimed at addressing these challenges. Our design employs a suitable radix bit width and word size based on the digital signal processing (DSP) bit width rather than the conventional binary powers of two. We aim to instantiate more modular multipliers using limited resources while minimizing latency. We also introduce a novel DSP cascade structure, called parallel grouping cascade DSP, which reduces the number of clock cycles of internal multipliers. To balance the ratio of lookup table (LUT) and DSP usage, we also use multipliers implemented in the LUT to replace some DSPs. Our results, implemented on a Xilinx Virtex-7 field-programmable gate array (FPGA), demonstrate more than 27% improvement in throughput on 1024-bit modular multiplication and more than 70% improvement on 2048-bit compared to the best previous state-of-the-art references.

Keywords: Cryptosystem; Large integer arithmetic; field programmable gate array; Digital signal processing; Montgomery modular multiplication
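
For readers unfamiliar with the operation named in the abstract above, the following sketch shows textbook Montgomery modular multiplication on Python integers. It does not model the paper's radix choice, word size, or DSP cascade; it is the plain single-reduction form with R = 2^k. The function names are hypothetical, and `pow(n, -1, m)` requires Python 3.8 or later.

```python
def montgomery_params(n, k):
    """Return n' = -n^{-1} mod 2^k for an odd modulus n < 2^k."""
    if n % 2 == 0 or n >= (1 << k):
        raise ValueError("n must be odd and smaller than 2**k")
    return (-pow(n, -1, 1 << k)) % (1 << k)


def redc(t, n, k, n_prime):
    """Montgomery reduction: return t * 2^{-k} mod n for 0 <= t < n * 2^k."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask
    u = (t + m * n) >> k          # exact division: low k bits of t + m*n are zero
    return u - n if u >= n else u


def mont_mul(a, b, n, k):
    """Compute a * b mod n by way of the Montgomery domain."""
    n_prime = montgomery_params(n, k)
    a_bar = (a << k) % n          # convert operands into Montgomery form
    b_bar = (b << k) % n
    prod_bar = redc(a_bar * b_bar, n, k, n_prime)
    return redc(prod_bar, n, k, n_prime)   # convert back to ordinary form
```

The point of the method is that `redc` needs only multiplications, masks, and shifts, with no division by `n`, which is why it maps well onto multiplier blocks in hardware. In practice the conversions into and out of Montgomery form are amortized over many multiplications rather than repeated per call as in this sketch.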

VLSI Architectures and Hardware Implementation of Ultra Low-Latency and Area-Efficient Pietra-Ricci Index Detector for Spectrum Sensing

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2024, vol. 71, no. 5, pp. 2348-2361

Authors: Pereira, Elivander Judas Tadeu; Guimaraes, Dayan Adionel; Shrestha, Rahul. Affiliations: Inatel, Natl Inst Telecommun, BR-37540000 Santa Rita do Sapuca, Brazil; IIT Mandi, Sch Comp & Elect Engn, Mandi 175005, India

The Pietra-Ricci index detector (PRIDe) has been recently proposed as one of the simplest techniques for centralized, data-fusion cooperative spectrum sensing, attaining robustness against time-varying signal and noise levels, constant false alarm rate, and high detection power. In this paper, we propose the design and implementation of the PRIDe detector, targeting field programmable gate array (FPGA) and application-specific integrated circuit (ASIC) solutions. Novel approaches are proposed for computing the PRIDe's test statistic, including the absolute value of complex quantities, the complex multiplier-accumulator, and the spectrum occupancy decision. The absolute value operation, which is critical to the computational cost of the PRIDe test statistic, applies the coordinate rotation digital computer (CORDIC) algorithm as a low-latency and resource-efficient option. Register transfer level (RTL) and Monte Carlo simulations show that the resulting ultra-low latency PRIDe detector architectures attain no performance loss with respect to floating-point simulations. One of the two proposed ASIC design versions of the PRIDe sensor occupies 34.9% lower area compared to the most area-efficient sensor reported in the literature, whereas the other one is $5.7\times$ faster than the fastest state-of-the-art sensor. In a nutshell, the proposed detector architecture delivers the highest area and power efficiencies, considering the scaled values of area-time product (ATP) and power-delay product (PDP) metrics, in comparison to implementations reported to date.

Keywords: Cognitive radio; coordinate rotation digital computer; field programmable gate array; application-specific integrated circuit; Pietra-Ricci index detector; spectrum sensing
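
The abstract above states that the absolute value of complex quantities is computed with CORDIC. A generic vectoring-mode CORDIC magnitude can be sketched as follows. This is a floating-point illustration of the general technique, not the paper's fixed-point architecture, and the function name and the default of 16 iterations are assumptions.

```python
import math


def cordic_magnitude(re, im, iterations=16):
    """Approximate sqrt(re**2 + im**2) with CORDIC in vectoring mode."""
    x, y = abs(re), abs(im)       # fold the input into the first quadrant
    gain = 1.0
    for i in range(iterations):
        step = 2.0 ** -i          # a right shift by i bits in fixed-point hardware
        if y >= 0:
            x, y = x + y * step, y - x * step   # rotate clockwise
        else:
            x, y = x - y * step, y + x * step   # rotate counterclockwise
        gain *= math.sqrt(1.0 + step * step)    # accumulated CORDIC gain
    return x / gain
```

Each iteration uses only additions and shifts; the gain depends solely on the iteration count, so in hardware it would be a precomputed constant rather than something evaluated per sample as in this sketch.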
