ISBN: 9798331532833 (digital)
ISBN: 9798331532840 (print)
Artificial Intelligence (AI) has emerged as a transformative technology, revolutionizing numerous industries by enabling advanced automation, predictive analytics, and decision-making capabilities. As a result, AI now pervades many domains, such as telecommunication, smart manufacturing, autonomous machines, automated disease diagnosis in medical imaging, defense, and others. However, the hardware implementation of AI comes with certain challenges and constraints, especially in critical areas that rely on machine learning algorithms and real-time data analysis to optimize production processes and improve overall efficiency. Statistical operations play a crucial role in many machine learning algorithms, which use them to understand and process data or to make predictions that optimize models. In this work, we develop a high-speed, low-area design that implements statistical operations for image or signal processing on an FPGA device. To enhance performance, we develop several hardware architectures, based on different levels of parallelism, that compute the Mean, Variance, and RMS (Root Mean Square). These generic architectures operate in parallel/pipelined fashion, with and without memory. The proposed architectures are implemented on an FPGA target (Intel/Altera Agilex 7: AGMH039R47A2E1V) using Altera Quartus Prime Pro Edition version 23.4 and achieve ultra-high throughput with low area consumption compared to state-of-the-art methods. For a 480×640 image, the mean calculation architecture uses 1498 logic registers, 1912 slice LUTs, and just 29 kbits of memory, and it operates at a maximum frequency of 406.5 MHz. Additionally, for an 8×8 image, the mean calculation takes 33 clock cycles and the variance calculation takes 33+1 clock cycles, whereas other approaches require more than 64 clock cycles.
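The three statistics named in this abstract can all be derived from two running sums, which is one reason they suit streaming hardware. The following is a minimal software sketch of that idea, not the authors' architecture: the function name `streaming_stats` and the `lanes` parameter (a hypothetical number of samples consumed per step) are assumptions introduced here for illustration, and the abstract does not state what degree of parallelism its designs use.

```python
import math


def streaming_stats(pixels, lanes=1):
    """One-pass mean, population variance and RMS from two running sums.

    `lanes` is a hypothetical parallelism factor: how many samples are
    consumed in each accumulate step. It only affects the step count.
    """
    if lanes < 1:
        raise ValueError("lanes must be at least 1")
    total = 0      # running sum of x
    total_sq = 0   # running sum of x squared
    count = 0
    steps = 0
    for start in range(0, len(pixels), lanes):
        group = pixels[start:start + lanes]
        total += sum(group)
        total_sq += sum(p * p for p in group)
        count += len(group)
        steps += 1
    if count == 0:
        raise ValueError("at least one sample is required")
    mean = total / count
    mean_sq = total_sq / count
    variance = mean_sq - mean * mean   # E[x^2] - (E[x])^2
    rms = math.sqrt(mean_sq)
    return mean, variance, rms, steps


if __name__ == "__main__":
    image = list(range(64))            # stand-in for an 8x8 image
    print(streaming_stats(image, lanes=2))
```

With integer pixel values the accumulators stay exact until the final division, so the subtraction in the variance formula does not lose precision during accumulation. Once both sums are available, the variance needs only one further subtract-and-multiply stage after the mean.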
The rapid emergence of 5G communications technology and standardization has seen an accelerated transfer of theoretical concepts to advanced development and implementation. Not only are 5G baseband signal processing algorithms becoming more important, but the co-design and implementation of the corresponding circuits, architectures, and platforms are also becoming necessary due to the rapid standardization of 5G communications. This timely overview paper introduces circuits and systems (CAS) for key enabling technologies of the new 5G era: massive MIMO, mmWave baseband systems, NOMA schemes, advanced channel coding, and so on. The state-of-the-art research progress in these areas is summarized so that interested readers can initiate discussion on the limitations of existing solutions and on open research problems that call for innovative solutions, especially in the CAS area. We hope this paper can bridge the gap between theoretical investigation and practical implementation for 5G communications.
Motivated by challenges from today's fast-evolving wireless communication standards and soaring silicon design cost, it is important to design a flexible hardware platform that can be dynamically reconfigured to adapt to current operating scenarios, provide seamless handover between different communication networks, and extend the longevity of advanced systems. Moreover, increasingly sophisticated baseband processing algorithms pose stringent requirements of real-time processing for hardware implementations, especially for power-budget limited mobile terminals. With existing hardware platforms such as Application-Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), and Digital Signal Processors (DSPs), the contradictory design requirements of flexibility, computational performance, and hardware efficiency cannot be attained at the same time. To achieve a balance between the aforementioned design requirements, a coarse-grained dynamically reconfigurable cell array architecture is proposed. The architecture is constructed from an array of heterogeneous function units interconnected through a hierarchical on-chip network. The adopted in-cell configuration scheme enables fast context switching between standards and between computational tasks during run-time. Although the cell array is a generic hardware platform, this thesis focuses on the architectural development of the cell array tailored specifically for digital baseband processing of contemporary wireless communication systems. Various degrees of flexibility among operating scenarios, algorithms, tasks, and supporting standards are exploited. Besides, high hardware efficiency is attained by conducting algorithm-architecture, hardware-software, and processing-memory co-design. In this thesis, flexibility, performance and efficiency of the proposed architecture are demonstrated through two case studies. First, the cell array is deployed in a digital front-end receiver, aiming t
Fetal electrocardiogram (ECG) extraction from non-invasive biopotential recordings is a long-standing research topic. Despite the significant number of algorithms presented in the scientific literature, it is difficult to find information about embedded hardware implementations able to provide real-time support for the required features, bridging the gap between theory and practice. This article presents the NInFEA (non-invasive fetal ECG analysis) tool, an embedded hardware/software framework based on the hybrid dual-core OMAP-L137 low-power processor for the real-time evaluation of fetal ECG extraction algorithms. The hybrid platform, which includes a digital signal processor (DSP) and a general-purpose processor (GPP), achieves better performance than single-core architectures. The GPP provides a portable graphical user interface, whereas the DSP is extensively used for advanced signal processing tasks. As a case study, three state-of-the-art fetal ECG extraction algorithms have been ported onto NInFEA, along with some support routines needed to provide the additional information required by clinicians and supported by the user interface. NInFEA can be regarded both as a reference design for similar applications and as a common embedded low-power testbed for real-time fetal ECG extraction algorithms.
ISBN: 9780769551173 (print)
Next-generation radar systems place high performance demands on the signal processing chain. Examples include advanced image-creating sensor systems in which complex calculations must be performed on huge sets of data in real time. Manycore architectures are gaining attention as a means of meeting the computational requirements of complex radar signal processing by exploiting the massive parallelism inherent in the algorithms in an energy-efficient manner. In this paper, we evaluate a manycore architecture, namely a 16-core Epiphany processor, by implementing two significantly large case studies, viz. an autofocus criterion calculation and the fast factorized back-projection algorithm, both key components in modern synthetic aperture radar systems. The implementation results from the two case studies are compared on the basis of achieved performance and programmability. One of the Epiphany implementations demonstrates the usefulness of the architecture for the streaming-based algorithm (the autofocus criterion calculation) by achieving a speedup of 8.9x over a sequential implementation on a state-of-the-art general-purpose processor of a later silicon technology generation and operating at a 2.7x higher clock speed. In the other case study, a highly memory-intensive algorithm (fast factorized back-projection), the Epiphany architecture shows a speedup of 4.25x. For embedded signal processing, low power dissipation is as important as computational performance. In our case studies, the Epiphany implementations of the two algorithms are, respectively, 78x and 38x more energy efficient.
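The speedup and energy figures above can be related through the identity energy = power × time. The short sketch below works that arithmetic under a loudly stated assumption: that "more energy efficient" means the ratio of energy consumed per completed run against the same reference processor used for the speedup. The abstract does not define the metric, so the derived power ratios are illustrative only, and the function name `implied_power_ratio` is hypothetical.

```python
def implied_power_ratio(speedup, energy_gain):
    """Average power ratio (reference / Epiphany) implied by the two figures.

    Assumes energy_gain = E_ref / E_epi and speedup = t_ref / t_epi, so that
    energy_gain = (P_ref * t_ref) / (P_epi * t_epi) = power_ratio * speedup.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return energy_gain / speedup


if __name__ == "__main__":
    # Figures quoted in the abstract: (speedup, energy-efficiency gain).
    for name, speedup, gain in [("autofocus", 8.9, 78.0),
                                ("back-projection", 4.25, 38.0)]:
        print(name, round(implied_power_ratio(speedup, gain), 2))
```

Under that assumption, both case studies imply a similar power ratio of a little under 9, which would be consistent with the two implementations drawing roughly the same average power on each platform.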
Next-generation radar systems place high performance demands on the signal processing chain. Examples include advanced image-creating sensor systems in which complex calculations must be performed on huge sets of data in real time. Manycore architectures are gaining attention as a means of meeting the computational requirements of complex radar signal processing by exploiting the massive parallelism inherent in the algorithms in an energy-efficient manner. In this paper, we evaluate a manycore architecture, namely a 16-core Epiphany processor, by implementing two significantly large case studies, viz. an autofocus criterion calculation and the fast factorized back-projection algorithm, both key components in modern synthetic aperture radar systems. The implementation results from the two case studies are compared on the basis of achieved performance and programmability. One of the Epiphany implementations demonstrates the usefulness of the architecture for the streaming-based algorithm (the autofocus criterion calculation) by achieving a speedup of 8.9x over a sequential implementation on a state-of-the-art general-purpose processor of a later silicon technology generation and operating at a 2.7x higher clock speed. In the other case study, a highly memory-intensive algorithm (fast factorized back-projection), the Epiphany architecture shows a speedup of 4.25x. For embedded signal processing, low power dissipation is as important as computational performance. In our case studies, the Epiphany implementations of the two algorithms are, respectively, 78x and 38x more energy efficient.
Floating-point division is a very costly operation in FPGA designs. High-frequency implementations of the classic digit-recurrence algorithms for division have long latencies (of the order of the number of fraction bits) and consume large amounts of logic. Additionally, these implementations require significant routing resources, making timing closure difficult in complete designs. In this paper we present two multiplier-based architectures for division which make efficient use of the DSP resources in recent Altera FPGAs. By balancing resource usage between logic, memory and DSP blocks, the presented architectures maintain high frequencies in full designs. Additionally, compared to classical algorithms, the proposed architectures have significantly lower latencies. The architectures target faithfully rounded results, similar to most elementary function implementations for FPGAs, but can also be transformed into correctly rounded architectures with a small overhead. The presented architectures are built using the Altera DSP Builder Advanced framework and will be part of the default blockset.
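To illustrate what "multiplier-based" division can look like, here is a minimal sketch of Newton-Raphson reciprocal iteration, a well-known technique that replaces digit-by-digit recurrence with a few multiplications. The abstract does not say which multiplier-based method the authors use, so this is a swapped-in textbook technique rather than their architecture; the function name `nr_divide` and the iteration count are assumptions made for illustration.

```python
import math


def nr_divide(a, b, iterations=4):
    """Approximate a / b using only multiplies and subtracts per iteration.

    Illustrative Newton-Raphson reciprocal; the result is close to, but not
    guaranteed to be, the correctly rounded quotient.
    """
    if b == 0:
        raise ZeroDivisionError("division by zero")
    sign = -1.0 if (a < 0) != (b < 0) else 1.0
    a, b = abs(a), abs(b)
    # Normalise the divisor: b = m * 2**e with 0.5 <= m < 1.
    m, e = math.frexp(b)
    # Classic linear starting estimate of 1/m on [0.5, 1).
    x = 48.0 / 17.0 - (32.0 / 17.0) * m
    # Each step roughly squares the relative error: x <- x * (2 - m * x).
    for _ in range(iterations):
        x = x * (2.0 - m * x)
    # Undo the normalisation: a / b = a * (1/m) * 2**(-e).
    return sign * math.ldexp(a * x, -e)


if __name__ == "__main__":
    print(nr_divide(1.0, 3.0), 1.0 / 3.0)
    print(nr_divide(-7.5, 2.5))
```

Because the error shrinks quadratically, the number of iterations grows only with the logarithm of the target precision, which is the usual argument for the lower latency of multiplier-based schemes compared with digit recurrence.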
ISBN: 0819463922 (print)
The proceedings contain 29 papers. The topics discussed include: optimization of spanning tree adders; estimating adders for a low density parity check decoder; sublinear constant multiplication algorithms; new identities and transformations for hardware power operators; interconnection scheme for networks of online modules; reconfigurable architecture for the efficient solution of large-scale non-Hermitian eigenvalue problems; high-resolution iris image reconstruction from low-resolution imagery; using mean-squared error to assess visual image quality; time-frequency analysis of classical and quantum noise; application of time-frequency analysis methods to speaker verification; time-frequency decomposition based on information; time-frequency approximations with applications to filtering, modulation, and propagation; and on the development of a high-order texture analysis using the PWD and Rényi entropy.
ISBN: 9780819481597 (print)
Working with the Naval Research Laboratory, Celestech has implemented advanced non-linear hyperspectral image (HSI) processing algorithms optimized for Graphics Processing Units (GPUs). These algorithms have demonstrated performance improvements of nearly 2 orders of magnitude over optimal CPU-based implementations. The paper briefly covers the architecture of the NVIDIA GPU to provide a basis for discussing GPU optimization challenges and strategies. The paper then covers the optimization approaches employed to extract performance from the GPU implementation of Dr. Bachmann's algorithms, including memory utilization and process thread optimization considerations. The paper goes on to discuss strategies for deploying GPU-enabled servers into enterprise service-oriented architectures. Also discussed is Celestech's ongoing work on middleware frameworks that provide an optimized multi-GPU utilization and scheduling approach supporting multiple GPUs both in a single computer and across multiple computers. This paper is a complementary work to the paper submitted by Dr. Charles Bachmann entitled "A Scalable Approach to Modeling Nonlinear Structure in Hyperspectral Imagery and Other High-Dimensional Data Using Manifold Coordinate Representations". Dr. Bachmann's paper covers the algorithmic and theoretical basis for the HSI processing approach.