ISBN (print): 9780769551173
Next-generation radar systems place high performance demands on the signal processing chain. Examples include advanced image-creating sensor systems, in which complex calculations must be performed on huge data sets in real time. Manycore architectures are gaining attention as a means of meeting the computational requirements of complex radar signal processing by exploiting the massive parallelism inherent in the algorithms in an energy-efficient manner. In this paper, we evaluate a manycore architecture, namely a 16-core Epiphany processor, by implementing two significantly large case studies, viz. an autofocus criterion calculation and the fast factorized back-projection algorithm, both key components of modern synthetic aperture radar systems. The implementation results from the two case studies are compared on the basis of achieved performance and programmability. One of the Epiphany implementations demonstrates the usefulness of the architecture for the streaming-based algorithm (the autofocus criterion calculation) by achieving a speedup of 8.9x over a sequential implementation on a state-of-the-art general-purpose processor of a later silicon technology generation operating at a 2.7x higher clock speed. In the other case study, a highly memory-intensive algorithm (fast factorized back-projection), the Epiphany architecture shows a speedup of 4.25x. For embedded signal processing, low power dissipation is as important as computational performance. In our case studies, the Epiphany implementations of the two algorithms are, respectively, 78x and 38x more energy efficient.
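The speedup and energy-efficiency figures can be related with a little arithmetic. Assuming that energy equals average power times runtime, that both processors run the same workload, and that the energy figures use the same general-purpose processor baseline as the speedups (the abstract states none of this explicitly and reports no power numbers), the energy gain is the speedup multiplied by the power ratio. Dividing one by the other therefore gives an implied power ratio. The function name `implied_power_ratio` is hypothetical.

```python
# Figures reported in the abstract: (speedup, energy-efficiency gain) of the
# Epiphany implementations. The common baseline is an assumption.
REPORTED = {
    "autofocus criterion calculation": (8.9, 78.0),
    "fast factorized back-projection": (4.25, 38.0),
}


def implied_power_ratio(speedup: float, energy_gain: float) -> float:
    """Assumes E = P * t on both platforms for the same workload, so
    E_cpu / E_epi = (P_cpu / P_epi) * (t_cpu / t_epi), hence
    P_cpu / P_epi = energy_gain / speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return energy_gain / speedup


if __name__ == "__main__":
    for name, (speedup, gain) in REPORTED.items():
        print(f"{name}: implied power ratio ~ {implied_power_ratio(speedup, gain):.2f}x")
```

Under these assumptions both case studies imply a power ratio of roughly 8.8x to 8.9x in favour of the Epiphany chip. If the energy figures were measured against a different baseline, the ratio would change accordingly.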
This paper proposes two efficient architectures for hardware implementation of the Advanced Encryption Standard (AES) algorithm. The composite field arithmetic for implementing the SubBytes (S-box) and InvSubBytes (Inverse S-box) transformations, investigated by several authors, is used as the basis for deriving the proposed architectures. For encryption, the first architecture uses an optimized S-box followed by a bit-wise implementation of MixColumns and AddRoundKey; for decryption, it uses an optimized Inverse S-box followed by a bit-wise implementation of InvMixColumns and AddMixRoundKey. The proposed S-box and Inverse S-box used in this architecture are designed as a cascade of three blocks. In the second proposed architecture, block III of the proposed S-box is combined with the MixColumns and AddRoundKey transformations to form an integrated unit for encryption. An integrated unit for decryption, combining block III of the proposed InvSubBytes with InvMixColumns and AddMixRoundKey, is formed along similar lines. The delays of the proposed architectures for VLSI implementation are found to be the shortest among state-of-the-art implementations of AES operating in non-feedback mode. Iterative and fully unrolled sub-pipelined designs, including the key schedule, are implemented on FPGA and ASIC. The proposed designs are efficient in terms of the Kgates/(Gbit/s) ratio compared with a few recent state-of-the-art ASIC (0.18-µm CMOS standard cell) designs, and in terms of throughput per area (TPA) for FPGA implementations.
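When verifying an S-box datapath, a plain software reference model is useful as a test oracle. The sketch below computes SubBytes directly as the multiplicative inverse in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1, followed by the AES affine transform with constant 0x63, and derives InvSubBytes as the inverse permutation. It deliberately uses direct GF(2^8) arithmetic rather than the composite-field decomposition the paper builds on. Any correct composite-field S-box should produce the same table. The function names are illustrative.

```python
AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements (shift-and-add, reducing by AES_POLY)."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
        b >>= 1
    return product


def gf_inv(a: int) -> int:
    """Inverse via a^254 (= a^-1 because a^255 = 1); maps 0 to 0 as AES requires."""
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sub_byte(x: int) -> int:
    """SubBytes: GF(2^8) inverse followed by the AES affine transform."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


SBOX = [sub_byte(x) for x in range(256)]
INV_SBOX = [0] * 256
for x, s in enumerate(SBOX):
    INV_SBOX[s] = x  # InvSubBytes is the inverse permutation of SubBytes

print(hex(SBOX[0x53]))  # 0xed, the worked example in FIPS-197
```

A hardware team could dump `SBOX` and `INV_SBOX` and compare them exhaustively against a simulation of the three-block composite-field design.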
Floating-point division is a very costly operation in FPGA designs. High-frequency implementations of the classic digit-recurrence algorithms for division have long latencies (of the order of the number of fraction bits) and consume large amounts of logic. Additionally, these implementations require significant routing resources, making timing closure difficult in complete designs. In this paper we present two multiplier-based architectures for division that make efficient use of the DSP resources in recent Altera FPGAs. By balancing resource usage among logic, memory and DSP blocks, the presented architectures maintain high frequencies in full designs. Additionally, compared to classical algorithms, the proposed architectures have significantly lower latencies. The architectures target faithfully rounded results, similar to most elementary-function implementations for FPGAs, but can also be transformed into correctly rounded architectures with a small overhead. The presented architectures are built using the Altera DSP Builder Advanced framework and will be part of the default blockset.
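A common multiplier-based alternative to digit recurrence is to approximate the reciprocal of the divisor and refine it with Newton-Raphson iterations, y <- y(2 - d·y). Each iteration squares the relative error, roughly doubling the number of correct bits, and uses only multiplications and a subtraction. The abstract does not name the multiplier-based scheme the paper uses, so the sketch below is a generic illustration in Python floats, not the Altera architecture. The linear seed 48/17 - (32/17)·m and the default of four iterations are assumptions chosen for clarity. Hardware implementations often seed from a small lookup table instead.

```python
import math


def reciprocal(d: float, iterations: int = 4) -> float:
    """Approximate 1/d by Newton-Raphson refinement of a linear seed."""
    if d == 0.0:
        raise ZeroDivisionError("division by zero")
    sign = -1.0 if d < 0 else 1.0
    m, e = math.frexp(abs(d))  # abs(d) = m * 2**e with 0.5 <= m < 1
    # Linear seed on [0.5, 1): relative error |1 - m*y| <= 1/17.
    y = 48.0 / 17.0 - (32.0 / 17.0) * m
    for _ in range(iterations):
        # 1 - m*y_new = (1 - m*y)^2, so the relative error squares each step.
        y = y * (2.0 - m * y)
    # 1/abs(d) = (1/m) * 2**-e; ldexp applies the power of two exactly.
    return sign * math.ldexp(y, -e)


def divide(a: float, b: float, iterations: int = 4) -> float:
    """Multiplier-based division: a / b computed as a * (1 / b)."""
    return a * reciprocal(b, iterations)


print(divide(1.0, 3.0))
```

Because the reciprocal and the final product are each rounded, the result is close to, but not guaranteed to equal, the correctly rounded quotient. Obtaining a faithfully or correctly rounded result needs extra care, for example a final remainder check.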
ISBN (print): 0857298674
The book shows how the various paradigms of computational intelligence, employed either singly or in combination, can produce an effective structure for obtaining often vital information from ECG signals. The text is self-contained, addressing concepts, methodology, algorithms, case studies and applications, and provides the reader with the necessary background, augmented with step-by-step explanations of the more advanced concepts. It is structured in three parts: Part I covers the fundamental ideas of computational intelligence together with the relevant principles of data acquisition, morphology and use in diagnosis; Part II deals with techniques and models of computational intelligence that are suitable for signal processing; and Part III details ECG system-diagnostic interpretation and knowledge acquisition architectures. Illustrative material includes brief numerical experiments, detailed schemes, exercises and more advanced problems.
Advanced error-control coding and signal processing techniques find wide application in various communication systems, such as magnetic recording channels, fiber-optic channels, and wireline and wireless communication systems. Low-density parity-check (LDPC) codes and multiple-input multiple-output (MIMO) technology have received a lot of attention, since they greatly increase the capacity and improve the performance of future communication systems. In this dissertation, we focus on designing algorithms that enable efficient hardware implementations of LDPC codes and MIMO detection systems. Quasi-cyclic (QC) LDPC codes are of great interest since their regular code structure leads to efficient hardware implementations. We propose and implement on FPGA two partly parallel decoder architectures for QC LDPC codes that improve on the decoding throughput and memory requirements of existing decoders. Our overlapped message passing (OMP) decoder achieves the maximum throughput gain and hardware utilization efficiency (HUE) due to overlapping, and hence has higher throughput and HUE than previously proposed OMP decoders while maintaining the same hardware requirements and the same error performance. We also show that the maximum throughput gain and HUE achieved by our OMP decoder are ultimately determined by the given code. Thus, we propose a coset-based construction method, which results in QC LDPC codes that allow our optimal OMP decoder to achieve higher throughput and HUE. To further reduce the memory requirement of our OMP decoder, we propose the parallel turbo-sum-product (PTSP) decoder architecture. Implementation results show that our PTSP decoder achieves better error performance, faster convergence and hence higher throughput than the OMP decoder, with a reduced memory requirement. Hardware implementations of tree-search-based MIMO detection often have limited performance due to the large memory requirement or high computational complexity of sophisticated MIMO detection algorithm
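The regular structure that makes QC LDPC codes hardware-friendly is easy to see in code. The parity-check matrix H is assembled from z x z circulant permutation blocks, each described by a single shift value, with -1 conventionally denoting an all-zero block. A partly parallel decoder can then process the z rows of one block row with z identical units. The sketch below expands a small, hypothetical base matrix and computes a syndrome. It does not implement the coset-based construction or the OMP and PTSP decoders described in the abstract.

```python
from typing import List

Matrix = List[List[int]]


def circulant(z: int, shift: int) -> Matrix:
    """z x z identity cyclically shifted by `shift`; shift -1 gives the zero block."""
    if shift < 0:
        return [[0] * z for _ in range(z)]
    return [[1 if c == (r + shift) % z else 0 for c in range(z)] for r in range(z)]


def expand_base_matrix(base: Matrix, z: int) -> Matrix:
    """Replace every base-matrix entry with its z x z circulant block."""
    H: Matrix = []
    for base_row in base:
        blocks = [circulant(z, s) for s in base_row]
        for r in range(z):
            H.append([bit for block in blocks for bit in block[r]])
    return H


def syndrome(H: Matrix, word: List[int]) -> List[int]:
    """H * word^T over GF(2); an all-zero syndrome means every parity check holds."""
    return [sum(h & w for h, w in zip(row, word)) % 2 for row in H]


# Hypothetical 2 x 4 base matrix of shift values with circulant size z = 3.
BASE = [[0, 1, -1, 2],
        [2, -1, 0, 1]]
H = expand_base_matrix(BASE, 3)  # 6 x 12 parity-check matrix

print(syndrome(H, [0] * 12))       # -> [0, 0, 0, 0, 0, 0]
print(syndrome(H, [1] + [0] * 11)) # -> [1, 0, 0, 0, 1, 0]
```

A single bit error in position 0 violates checks 0 and 4, one in each block row, because column 0 belongs to the first block column, which has a non-zero circulant in both block rows.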
ISBN (print): 9781424481927
In this paper, we present an arithmetic sum-of-products (SOP) based realization of the general Multiple Constant Multiplication (MCM) algorithm. We also propose an enhanced SOP-based algorithm, which uses Partial Max-SAT (PMSAT) to further optimize the SOP. The enhanced algorithm attempts to reduce the number of rows (partial products) of the SOP by (i) shifting coefficients to realize other coefficients when possible, (ii) exploring multiple implementations of each coefficient using a Minimal Signed Digit (MSD) format, and (iii) exploiting the mutual exclusiveness within certain groups of partial products. Hardware implementations of the Fast Fourier Transform (FFT) algorithm require the incoming data to be multiplied by one of several constant coefficients, so we test and validate our approach on the FFT, an important application of MCM. We compare our SOP-based architectures with the best existing implementation of MCM for FFT (which uses a cascade of adders) and show that our approaches achieve a significant improvement in area and delay. Our architecture was synthesized using 65 nm technology libraries.
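In both SOP and adder-based MCM designs, each constant is typically recoded into signed digits so that the product becomes a short sum of shifted copies of the input, with each nonzero digit contributing one row. The sketch below uses the canonical signed digit (CSD, or non-adjacent) form, which is one particular minimal signed digit representation. The paper's enhanced algorithm explores several MSD alternatives per coefficient and selects among them with Partial Max-SAT, which is not modeled here. The coefficient values are hypothetical.

```python
from typing import List, Tuple


def csd_digits(c: int) -> List[Tuple[int, int]]:
    """Canonical signed digit (non-adjacent form) of c as (shift, sign) pairs,
    so that c == sum(sign << shift) and no two nonzero digits are adjacent."""
    digits = []
    shift = 0
    while c != 0:
        if c & 1:
            digit = 2 - (c & 3)  # c % 4 == 1 -> +1, c % 4 == 3 -> -1
            digits.append((shift, digit))
            c -= digit           # now c is divisible by 4, so the next digit is 0
        c >>= 1
        shift += 1
    return digits


def sop_rows(x: int, constant: int) -> List[int]:
    """Partial products (rows of the sum of products) for x * constant."""
    return [sign * (x << shift) for shift, sign in csd_digits(constant)]


def shift_add_multiply(x: int, constant: int) -> int:
    """x * constant using only shifts, additions and subtractions."""
    return sum(sop_rows(x, constant))


# Hypothetical integer constants that share one input, as in an MCM block.
COEFFS = [7, 23, 45, 105]
for c in COEFFS:
    print(c, csd_digits(c), "rows:", len(csd_digits(c)))
```

For example, 7 (binary 111, three rows) becomes 8 - 1 (two rows), which is the kind of row reduction an SOP realization benefits from.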
ISBN (print): 9780819472946
The proceedings contain 29 papers. The topics discussed include: GPU implementations for fast factorizations of STAP covariance matrices; accelerating nonuniform fast Fourier transform via reduction in memory access latency; fast computation of local correlation coefficients; object tracking in omni-directional mosaic; 3D object matching on the GPU using spin-image surface matching; a sharpness metric implementation for image processing applications with feedback; superresolution imaging: a survey of current techniques; analytical approximations of translational subpixel shifts in signal and image registrations; simultaneous position and number of source estimates using random set theory; decision fusion in sensor networks for spectrum sensing based on likelihood ratio tests; and energy optimization for upstream data transfer in 802.15.4 beacon-enabled star formulation.
ISBN (print): 9780819472946
Analytical approximations of translational subpixel shifts in both signal and image registration are derived by setting the derivatives of a normalized cross-correlation function to zero and solving the resulting equations. Because no iterative search is needed, the method achieves a complexity of only O(mn) for an image of size m × n. Because no upsampling is needed, computational memory is also saved. Tests using simulated signals and images show good results.
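A common closed-form refinement in the same spirit fits a parabola through the correlation peak c[0] and its two integer neighbours c[-1] and c[+1], then sets the derivative of the parabola to zero. This gives the subpixel offset delta = (c[-1] - c[+1]) / (2 (c[-1] - 2 c[0] + c[+1])). The abstract does not say whether this matches the paper's exact approximation, so treat the sketch below as a generic 1-D illustration. The integer search over `max_lag` lags, the energy normalization, and the Gaussian test signals are all assumptions. For images, the same refinement could be applied along each axis.

```python
import math
from typing import List, Sequence


def subpixel_shift(a: Sequence[float], b: Sequence[float], max_lag: int) -> float:
    """Estimate the shift s such that b[i + s] ~ a[i], to subpixel precision."""
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("signals must not be all zero")

    def ncc(lag: int) -> float:
        # Cross-correlation at an integer lag, normalized by the total energies.
        total = sum(a[i] * b[i + lag] for i in range(len(a)) if 0 <= i + lag < len(b))
        return total / norm

    scores = {lag: ncc(lag) for lag in range(-max_lag, max_lag + 1)}
    k0 = max(scores, key=scores.get)  # best integer lag
    if abs(k0) == max_lag:
        return float(k0)  # peak on the search border: no neighbour on one side
    c_m, c_0, c_p = scores[k0 - 1], scores[k0], scores[k0 + 1]
    curvature = c_m - 2.0 * c_0 + c_p
    if curvature == 0.0:
        return float(k0)
    # Vertex of the parabola through (-1, c_m), (0, c_0), (1, c_p).
    return k0 + 0.5 * (c_m - c_p) / curvature


def gaussian(n: int, centre: float, sigma: float) -> List[float]:
    """Sampled Gaussian pulse used as a simulated test signal."""
    return [math.exp(-((i - centre) ** 2) / (2 * sigma ** 2)) for i in range(n)]


a = gaussian(64, 20.0, 3.0)
b = gaussian(64, 23.3, 3.0)  # a shifted right by 3.3 samples
print(subpixel_shift(a, b, max_lag=10))  # close to 3.3
```

This sketch still scans all integer lags, so it is not the O(mn) closed form the abstract describes. It only shows how setting a derivative to zero replaces an iterative or upsampled subpixel search once the integer peak is known.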