Search results - Inner Mongolia University Library

Programmable Metasurface Hybrid MIMO Beamforming: Channel Estimation, Data Transmission, and System Implementations at 28 GHz

IEEE SYSTEMS JOURNAL, 2023, vol. 17, no. 1, pp. 1270-1281

Authors: Li, Yueheng; Long, Xueyun; de Oliveira, Lucas Giroto; Eisenbeis, Joerg; Alabd, Mohamad Basim; Bettinga, Sven; Wan, Xiang; Cui, Tie Jun; Zwick, Thomas. Affiliations: Karlsruhe Inst Technol, D-76131 Karlsruhe, Germany; Southeast Univ, Nanjing 12579, Jiangsu, Peoples R China

With the development of wireless communication generations, multiple-input and multiple-output (MIMO) architectures have proven to be a solution to the higher required data rates for the increasing number of mobile users. As a matter of fact, better earnings can be gathered from a low-cost system with effective functions, which results in the hybrid beamforming architecture combining digital chains with large-scale antenna arrays. The programmable metasurface (PM) is a promising antenna array concept that realizes reconfigurable beamforming. It is raised to be an attractive antenna array architecture due to its low price and power consumption. However, experimental investigations combining PMs with modern wireless communication systems have not been intensively studied. Its challenges and difficulties of signal processing and system implementation remain uncertainties to be discovered. In this article, a PM hybrid MIMO beamforming system including the aforementioned important topics is presented as follows. First of all, the hybrid beamforming channel estimation algorithm adapted for PM is created by merging analog beam training and digital interleaved orthogonal frequency division multiplexing. Afterward, data transmissions based on the estimated channel state information utilizes variant signal recovery methods to examine the channel estimation accuracy and system feasibilities. To practically analyze the proposed system, the aforementioned aspects are implemented into system-level experiments using three PMs operating at 28 GHz for downlink wireless communication in both single-user and multiuser scenarios. Proper results are delivered, which successfully prove the PM hybrid MIMO beamforming system functionalities.

Keywords: Array signal processing; MIMO communication; Channel estimation; Signal processing algorithms; Antenna arrays; Training; Symbols; Communication systems; Signal processing

WAFER-SCALE INTEGRATION AND 2-LEVEL PIPELINED IMPLEMENTATIONS OF SYSTOLIC ARRAYS

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 1984, vol. 1, no. 1, pp. 32-63

Authors: KUNG, HT; LAM, MS. Affiliation: Department of Computer Science, Carnegie-Mellon University, Pittsburgh, Pennsylvania 15213, USA

Two important issues in systolic array designs are addressed: How is fault tolerance provided in systolic arrays to enhance the yield of wafer-scale integration implementations? And, how are efficient systolic arrays with two levels of pipelining designed? (The first level refers to the pipelined organization of the array at the cellular level, and the second refers to the pipelined functional units inside the cells.) The fault-tolerant scheme proposed replaces defective cells with clocked delays. This has the distinct characteristic that data can flow through the array with faulty cells at the original clock speed. It is shown that both the defective cells under this fault-tolerant scheme and the second-level pipeline stages can simply be modeled as additional delays in the data paths of “generic” systolic designs. The mathematical notion of a cut is introduced to solve the problem of how to allow for these extra delays while preserving the correctness of the original systolic array designs. The results obtained by applying these techniques are encouraging. When applied to systolic arrays without feedback cycles, the arrays can tolerate large numbers of failures (with the addition of very little hardware) while maintaining the original throughput. Furthermore, all of the pipeline stages in the cells can be kept fully utilized through the addition of a small number of delay registers. However, adding delays to systolic arrays with cycles typically induces a significant decrease in throughput. In response to this, a new class of systolic algorithms has been derived in which the data cycle around a ring of processing cells. The systolic ring architecture has the property that its performance degrades gracefully as cells fail. Use of the cut theory and ring architectures for arrays with feedback gives effective fault-tolerant and two-level pipelining schemes for most systolic arrays. As a side effect of developing the ring architecture approach, several new systolic al

Keywords: INTEGRATED CIRCUIT MANUFACTURE

Operation Merging for Hardware Implementations of Fast Polar Decoders

JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2019, vol. 91, no. 9, pp. 995-1007

Authors: Ercan, Furkan; Tonnellier, Thibaud; Condo, Carlo; Gross, Warren J. Affiliations: McGill Univ, Dept Elect & Comp Engn, Montreal, PQ, Canada; McGill Univ, ISIP Lab, Montreal, PQ, Canada; Huawei Paris Res Ctr, Commun Algorithms Design Team, Boulogne, France

Polar codes are a class of linear block codes that provably achieves channel capacity. They have been selected as a coding scheme for the control channel of enhanced mobile broadband (eMBB) scenario for 5th generation wireless communication networks (5G) and are being considered for additional use scenarios. As a result, fast decoding techniques for polar codes are essential. Previous works targeting improved throughput for successive-cancellation (SC) decoding of polar codes are semi-parallel implementations that exploit special maximum-likelihood (ML) nodes. In this work, we present a new fast simplified SC (Fast-SSC) decoder architecture. Compared to a baseline Fast-SSC decoder, our solution is able to reduce the memory requirements. We achieve this through a more efficient memory utilization, which also enables to execute multiple operations in a single clock cycle. Finally, we propose new special node merging techniques that improve the throughput further, and detail a new Fast-SSC-based decoder architecture to support merged operations. The proposed decoder reduces the operation sequence requirement by up to 39%, which enables to reduce the number of time steps to decode a codeword by 35%. ASIC implementation results with 65 nm TSMC technology show that the proposed decoder has a throughput improvement of up to 31% compared to previous Fast-SSC decoder architectures.

Keywords: Polar codes; Wireless communications; Successive cancellation decoding; Throughput; 5G

Novel FPGA implementations of Walsh-Hadamard transforms for signal processing

Authors: Amira, A.; Bouridane, A.; Milligan, P.; Roula, M. Affiliation: School of Computer Science, Queen's University, Belfast BT7 1NN, United Kingdom

The paper describes two approaches suitable for a field-programmable gate-array (FPGA) implementation of fast Walsh-Hadamard transforms. These transforms are important in many signal-processing applications including speech compression, filtering and coding. Two novel architectures for the fast Hadamard transforms using both a systolic architecture and distributed arithmetic techniques are presented. The first approach uses the Baugh-Wooley multiplication algorithm for a systolic architecture implementation. The second approach is based on both a distributed arithmetic ROM and accumulator structure, and a sparse matrix-factorisation technique. Implementations of the algorithms on a Xilinx FPGA board are described. The distributed arithmetic approach exhibits better performances when compared with the systolic architecture approach.

Keywords: Field programmable gate arrays
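
As a software illustration of the transform named in this abstract, the following is a minimal Python sketch of the textbook in-place fast Walsh-Hadamard transform (natural Hadamard ordering, unnormalized). It is not the systolic or distributed-arithmetic architecture of the paper; the function name `fwht` is a hypothetical choice made for this example. The butterfly stages correspond to one standard sparse factorization of the Hadamard matrix.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform (natural/Hadamard order).

    `fwht` is an illustrative name, not taken from the paper.
    The input length must be a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    # log2(n) butterfly stages; each stage uses only additions and subtractions.
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


if __name__ == "__main__":
    x = [1, 0, 1, 0, 0, 1, 1, 0]
    print(fwht(x))
```

Because the Hadamard matrix is its own inverse up to a factor of n, applying `fwht` twice returns the input scaled by the length, which gives a quick self-check.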

Generating High-Performance Number Theoretic Transform Implementations for Vector Architectures

IEEE High Performance Extreme Computing Virtual Conference (HPEC)

Authors: Zhang, Naifeng; Ebel, Austin; Neda, Negar; Brinich, Patrick; Reynwar, Benedict; Schmidt, Andrew G.; Franusich, Mike; Johnson, Jeremy; Reagen, Brandon; Franchetti, Franz. Affiliations: Carnegie Mellon Univ, Pittsburgh, PA 15213, USA; New York Univ, New York, NY, USA; Drexel Univ, Philadelphia, PA 19104, USA; USC Informat Sci, Marina Del Rey, CA, USA; SpiralGen Inc, Pittsburgh, PA, USA

ISBN: (print) 9798350308600

Fully homomorphic encryption (FHE) offers the ability to perform computations directly on encrypted data by encoding numerical vectors onto mathematical structures. However, the adoption of FHE is hindered by substantial overheads that make it impractical for many applications. Number theoretic transforms (NTTs) are a key optimization technique for FHE by accelerating vector convolutions. Towards practical usage of FHE, we propose to use SPIRAL, a code generator renowned for generating efficient linear transform implementations, to generate high-performance NTT on vector architectures. We identify suitable NTT algorithms and translate the dataflow graphs of those algorithms into SPIRAL's internal mathematical representations. We then implement the entire workflow required for generating efficient vectorized NTT code. In this work, we target the Ring Processing Unit (RPU), a multi-tile long vector accelerator designed for FHE computations. On average, the SPIRAL-generated NTT kernel achieves a 1.7x speedup over naive implementations on RPU, showcasing the effectiveness of our approach towards maximizing performance for NTT computations on vector architectures.

Keywords: Fully homomorphic encryption; number theoretic transform; SPIRAL; code generation; vectorization
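
To make concrete the abstract's statement that NTTs accelerate vector convolutions, here is a small scalar Python sketch of a recursive radix-2 NTT and a cyclic convolution built on it. It is not SPIRAL-generated code and does not model the RPU. The function names and the toy parameters (modulus 17, with 2 as a primitive 8th root of unity) are assumptions chosen for this example, and the modular inverse via `pow(x, -1, p)` needs Python 3.8 or later.

```python
def ntt(a, omega, p):
    """Recursive radix-2 number theoretic transform modulo a prime p.

    `omega` must be a primitive len(a)-th root of unity mod p, and
    len(a) must be a power of two. Names are illustrative only.
    """
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(a)
    omega_sq = omega * omega % p
    even = ntt(a[0::2], omega_sq, p)
    odd = ntt(a[1::2], omega_sq, p)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % p
        out[k] = (even[k] + t) % p
        # omega^(n/2) == -1 mod p, so the second half uses a subtraction.
        out[k + n // 2] = (even[k] - t) % p
        w = w * omega % p
    return out


def intt(a, omega, p):
    """Inverse NTT: forward transform with omega^-1, then scale by n^-1."""
    n_inv = pow(len(a), -1, p)
    return [x * n_inv % p for x in ntt(a, pow(omega, -1, p), p)]


def cyclic_convolve(a, b, omega, p):
    """Cyclic convolution mod p via pointwise products in the NTT domain."""
    fa, fb = ntt(a, omega, p), ntt(b, omega, p)
    return intt([x * y % p for x, y in zip(fa, fb)], omega, p)


if __name__ == "__main__":
    # Toy parameters: 2 has multiplicative order 8 modulo 17.
    print(cyclic_convolve([1, 2, 3, 4, 0, 0, 0, 0],
                          [1, 1, 0, 0, 0, 0, 0, 0], 2, 17))
```

Real FHE parameter sets use far larger moduli and transform lengths; the toy values here only keep the arithmetic easy to check by hand.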

Interactive optimization and massively parallel implementations of video compression algorithms

International Conference on Image Processing (ICIP-96)

Authors: Nicolas, H; Jordan, F. Affiliation: ECOLE POLYTECH FED LAUSANNE, CRAY RES, CH-1015 LAUSANNE, SWITZERLAND

ISBN: (print) 0780332598

This paper presents a new technique which allows interactive optimization of video compression algorithms using massively parallel computers such as the CRAY T3D. This work aims to exploit as much as possible the parallel nature of digital image processing algorithms to obtain almost real-time computing with the flexibility of a software implementation. Thanks to this low computation time, interactive tools have been developed which allow easy and fast visual evaluation of image quality. This leads to significant productivity gain when developing new video compression techniques. Our approach has been validated on advanced region-based video compression algorithms. The interactive facilities offered by the proposed technique permit the accurate optimization of the algorithm parameters in few minutes, where several days were previously needed. Depending on the complexity of the compression algorithms, 8-12 images are compressed, decompressed and visualized per second.

Keywords: Image compression

DESIGN AND IMPLEMENTATIONS SERIES VI

IEEE COMMUNICATIONS MAGAZINE, 2010, vol. 48, no. 12, pp. 74-75

Authors: Loreto, Salvatore; Moore, Sean. Affiliations: NomadicLab at Ericsson Research, Finland; Centripetal Networks Inc.

Welcome to the sixth installment of the Design and Implementation (D&I) Series. The D&I Series was created with a specific goal of addressing the needs of the Communications Society's industry members by p...

Keywords: Special issues and sections; Testing; Design methodology; Web services; IP networks

SCALABLE PARALLEL IMPLEMENTATIONS OF PERCEPTUAL GROUPING ON CONNECTION MACHINE CM-5

Conference C on Signal Processing and Conference D on Parallel Computing, at the 12th IAPR International Conference on Pattern Recognition

Authors: PRASANNA, VK; WANG, CL. Affiliation: UNIV SO CALIF, DEPT EE SYST, LOS ANGELES, CA 90089

ISBN: (print) 0818662751

Perceptual grouping is a key step in vision to organize image data into structural hypotheses to be used for high level analysis. In this paper, we propose data allocation and load balancing strategies which reduce the communication cost and evenly distribute the grouping operations among the processors. These techniques result in scalable algorithms for performing perceptual grouping on CM-5. The performance of our algorithms depends only on the total grouping operations generated by the image data and is independent of the distribution of the data among the processors. Our implementations show that given a 1K × 1K input image, extraction of line segments and several perceptual grouping steps can be performed in 5.0 seconds using a partition of CM-5 having 32 processing nodes. A serial implementation of these steps on a Sun Sparc 400 takes more than 2 minutes. © 1994 IEEE.

Keywords: Parallel processing systems

Generic High-Speed Design with Low-Area Implementations of Statistical Operations Based on an FPGA Device

26th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, SYNASC 2024

Authors: Chehaitly, Mouhamad; Querol, Jorge; Vummadisetty, Praveen; Chatzinotas, Symeon. Affiliation: University of Luxembourg, SnT, Luxembourg, Luxembourg

ISBN: (print) 9798331532833

Artificial Intelligence has emerged as a transformative technology, revolutionizing numerous industries by enabling advanced automation, predictive analytics, and decision-making capabilities. For that Artificial Intelligence overruns many domains like telecommunication, smart manufacturing industry, autonomous machines, Automated Disease Diagnosis in Medical Imaging, defense, and others. On the other hand, the hardware implementation of Artificial Intelligence comes with certain challenges and constraints, especially in a critical area, which leverages machine learning algorithms and real-time data analysis to optimize production processes and improve overall efficiency. Statistical operations play a crucial role in various machine learning algorithms to understand, process data, or make predictions to optimize models. So, in this work, we developed a high-speed and low-area design and implemented statistical operations for image or signal processing using an FPGA Device. To enhance the performance, we develop different hardware architectures based on different levels of parallelism to process the statistical operations to compute the Mean, Variance, and RMS (Root Mean Square). These generic architectures work in parallel/pipeline architectures with and without memory. The proposed architectures implement an FPGA target (Intel/Altera Agilex 7: AGMH039R47A2E1V) using Altera Quartus prime pro edition version 23.4 and achieve an ultra-high throughput with low-area consumption compared to the state-of-art methods. For 480×640 image size, the mean calculation architecture involves 1498 logic registers, 1912 slice LUT, and just 29kbits memory and it operates at a maximum frequency of 406.5MHz. Additionally, for an 8×8 image size, we need 33 clock cycles to achieve the mean calculation and 33+1 clock cycles to complete the variance calculation, compared to other approaches that require more than 64 clock cycles. © 2024 IEEE.

Keywords: Digital storage
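
For readers unfamiliar with the three statistics named in this abstract, the sketch below shows one common single-pass formulation in Python: accumulate the sum and the sum of squares, then derive the mean, the population variance, and the RMS from those two accumulators. This is a software illustration only and makes no claim about the paper's FPGA architectures; the function name `streaming_stats` is hypothetical.

```python
import math


def streaming_stats(samples):
    """Single-pass mean, population variance, and RMS.

    Illustrative only: keeps two running accumulators (sum and sum of
    squares), so no sample needs to be stored.
    """
    n = 0
    total = 0
    total_sq = 0
    for x in samples:
        n += 1
        total += x
        total_sq += x * x
    if n == 0:
        raise ValueError("at least one sample is required")
    mean = total / n
    mean_sq = total_sq / n
    variance = mean_sq - mean * mean  # E[x^2] - (E[x])^2
    rms = math.sqrt(mean_sq)
    return mean, variance, rms


if __name__ == "__main__":
    print(streaming_stats([1, 2, 3, 4]))
```

The `E[x^2] - (E[x])^2` form can lose precision in floating point when the variance is small relative to the mean; with integer pixel values the accumulators stay exact until the final divisions.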

Creating C++ IP for high performance hardware implementations of FFTs

DesignCon 2010

Authors: Takach, Andres. Affiliations: C-based Design, Mentor Graphics Corporation, United States; OSCI's Synthesis Working Group, United States

ISBN: (print) 9781617385469

The increasing level of circuit integration is enabling the use of more complex digital signal processing algorithms in modern applications. The fast Fourier transform (FFT) is one of the most widely used signal processing algorithms. There is no single hardware implementation that fits all needs and in fact higher performance FFTs are used in applications as it becomes feasible to do so. This paper presents a family of register based architectures that can be obtained from a highly parameterizable C++ specification. The size, numerical precision, radix algorithm and parallelism of both computation and I/O transfers are parameterized in the specification. Optimized RTL targeted to a specific ASIC or FPGA technology can be generated from that specification using a high-level synthesis (HLS) flow. The generated RTL is then verified against the original C++ specification using an automated verification environment.

Keywords: Specifications
