ISBN: 9780769551173 (print)
Next-generation radar systems place high performance demands on the signal processing chain. Examples include advanced image-forming sensor systems, in which complex calculations must be performed on huge sets of data in real time. Manycore architectures are gaining attention as a means to meet the computational requirements of complex radar signal processing by exploiting, in an energy-efficient manner, the massive parallelism inherent in the algorithms. In this paper, we evaluate a manycore architecture, namely a 16-core Epiphany processor, by implementing two significantly large case studies: an autofocus criterion calculation and the fast factorized back-projection algorithm, both key components in modern synthetic aperture radar systems. The implementation results from the two case studies are compared on the basis of achieved performance and programmability. The first Epiphany implementation demonstrates the usefulness of the architecture for a streaming-based algorithm (the autofocus criterion calculation), achieving a speedup of 8.9x over a sequential implementation on a state-of-the-art general-purpose processor of a later silicon technology generation and operating at a 2.7x higher clock speed. In the other case study, a highly memory-intensive algorithm (fast factorized back-projection), the Epiphany architecture shows a speedup of 4.25x. For embedded signal processing, low power dissipation is as important as computational performance. In our case studies, the Epiphany implementations of the two algorithms are, respectively, 78x and 38x more energy efficient.
Floating-point division is a very costly operation in FPGA designs. High-frequency implementations of the classic digit-recurrence algorithms for division have long latencies (of the order of the number of fraction bits) and consume large amounts of logic. Additionally, these implementations require significant routing resources, making timing closure difficult in complete designs. In this paper we present two multiplier-based architectures for division which make efficient use of the DSP resources in recent Altera FPGAs. By balancing resource usage between logic, memory and DSP blocks, the presented architectures maintain high frequencies in full designs. Additionally, compared to classical algorithms, the proposed architectures have significantly lower latencies. The architectures target faithfully rounded results, similar to most elementary function implementations for FPGAs, but can also be transformed into correctly rounded architectures with a small overhead. The presented architectures are built using the Altera DSP Builder advanced framework and will be part of the default blockset.
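To illustrate what a multiplier-based division can look like, the sketch below uses Newton-Raphson reciprocal refinement, a common scheme that replaces digit recurrence with multiplications. This is an assumption chosen for illustration and not necessarily the architecture presented in the paper; the function name `nr_divide`, the linear seed and the iteration count are all hypothetical.

```python
import math

def nr_divide(a, b, iterations=4):
    """Approximate a / b for finite, nonzero b using only multiplies and adds.

    Illustrative sketch: Newton-Raphson refinement of the reciprocal 1/b.
    """
    if b == 0:
        raise ZeroDivisionError("division by zero")
    # Normalize the divisor: abs(b) = m * 2**e with m in [0.5, 1).
    m, e = math.frexp(abs(b))
    # Linear seed for 1/m on [0.5, 1); its relative error is at most 1/17.
    x = 48.0 / 17.0 - (32.0 / 17.0) * m
    # Each step x <- x * (2 - m * x) roughly squares the relative error.
    for _ in range(iterations):
        x = x * (2.0 - m * x)
    # Undo the normalization: 1/abs(b) = (1/m) * 2**(-e).
    recip = math.ldexp(x, -e)
    result = a * recip
    return -result if b < 0 else result

print(nr_divide(1.0, 3.0))
```

Because the error is squared at every step, a handful of multiplications replaces a recurrence whose length grows with the number of fraction bits, which is the latency argument made in the abstract. The result here is not guaranteed to be correctly rounded.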
ISBN: 9780819472946 (print)
Analytical approximations of translational subpixel shifts in both signal and image registration are derived by setting the derivatives of a normalized cross-correlation function to zero and solving the resulting equations. Because no iterative search is needed, the method achieves a complexity of only O(mn) for an image of size m x n. Because no upsampling is needed, memory is also saved. Tests using simulated signals and images show good results.
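The idea of a closed-form shift estimate can be sketched in one dimension. The example below assumes a first-order Taylor model, g(x) ≈ f(x) - s f'(x), and sets the derivative of the normalized cross-correlation between the model and g to zero; the quadratic terms cancel, leaving a linear equation in the shift. This is one possible derivation under that stated assumption, not necessarily the approximation derived in the paper, and the name `subpixel_shift_1d` is hypothetical.

```python
import numpy as np

def subpixel_shift_1d(f, g):
    """Estimate s such that g(x) ~= f(x - s), for |s| well below one sample."""
    a = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    b = np.gradient(a)  # finite-difference estimate of f'(x)
    # Normalized cross-correlation works on zero-mean signals.
    a = a - a.mean()
    b = b - b.mean()
    g = g - g.mean()
    aa, ab, bb = a @ a, a @ b, b @ b
    ag, bg = a @ g, b @ g
    # Model g ~ a + d*b. Setting d/dd of <a + d*b, g> / ||a + d*b|| to zero
    # gives a linear equation in d (the d**2 terms cancel).
    d = (bg * aa - ag * ab) / (ag * bb - bg * ab)
    return -d  # the Taylor model uses d = -s

# Hypothetical test signal: two sinusoids, shifted by 0.3 samples.
x = np.arange(256.0)
signal = lambda t: np.sin(2 * np.pi * t / 64) + 0.5 * np.cos(2 * np.pi * t / 40)
estimate = subpixel_shift_1d(signal(x), signal(x - 0.3))
print(estimate)
```

The estimate needs only a fixed number of inner products over the samples, so the cost is linear in the signal length, with no search and no upsampled copy of the data.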
ISBN: 9780819472946 (print)
Some image processing applications require that an image meet a quality metric before it is processed. If an image is so degraded that it is difficult or impossible to reconstruct, the input image may be discarded. In this paper, we present a metric that measures the sharpness of an image relative to a reference image frame. The reference frame may be a previous input image or an output frame from the system. The sharpness metric is based on analyzing edges. The approach assumes that input images are similar to each other in terms of observation angle and time.
ISBN: 9780819472946 (print)
The quantification of synchrony is important for the study of large-scale interactions in the brain. Current synchrony measures depend on the energy of the signals rather than the phase, and cannot be reliably used as measures of neural synchrony. Moreover, the current methods are insufficient since they are limited to pairs of signals. These approaches cannot quantify the synchrony across a group of electrodes and over time-varying frequency regions. In this paper, we propose two new measures for quantifying the synchrony between both pairs and groups of electrodes using time-frequency analysis. The proposed measures are applied to electroencephalogram (EEG) data to quantify neural synchrony.
ISBN: 9780819472946 (print)
One of the main goals of the STAP-BOY program has been the implementation of a space-time adaptive processing (STAP) algorithm on graphics processing units (GPUs) with the goal of reducing the processing time. Within the context of GPU implementation, we have further developed algorithms that exploit data redundancy inherent in particular STAP applications. Integration of these algorithms with the GPU architecture is of primary importance for fast algorithmic processing times. STAP algorithms involve solving a linear system in which the transformation matrix is a covariance matrix. A standard method involves estimating a covariance matrix from a data matrix, computing its Cholesky factors by one of several methods, and then solving the system by substitution. Some STAP applications have redundancy in successive data matrices from which the covariance matrices are formed. For STAP applications in which a data matrix is updated by adding a new data row at the bottom and eliminating the oldest data row at the top of the matrix, successive data matrices have multiple rows in common. Two methods have been developed for exploiting this type of data redundancy when computing Cholesky factors. These two methods are referred to as 1) fast QR factorizations of successive data matrices and 2) fast Cholesky factorizations of successive covariance matrices. We have developed GPU implementations of these two methods. We show that these two algorithms exhibit reduced computational complexity when compared to benchmark algorithms that do not exploit data redundancy. More importantly, we show that when these algorithmic improvements are optimized for the GPU architecture, the processing times of a GPU implementation of these matrix factorization algorithms may be greatly improved.
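The standard pipeline described above (covariance estimate, Cholesky factorization, substitution) and the row-level redundancy between successive data matrices can be sketched with NumPy. This is a CPU-only illustration under stated assumptions, not the GPU implementation or the fast factorization algorithms from the paper: the function names are hypothetical, the covariance is left unnormalized, and a general solver stands in for dedicated triangular substitution.

```python
import numpy as np

def solve_via_cholesky(X, s):
    """Solve (X^H X) w = s by Cholesky factorization and two substitutions."""
    R = X.conj().T @ X              # unnormalized sample covariance
    L = np.linalg.cholesky(R)       # R = L L^H, with L lower triangular
    # A dedicated triangular solver would be used in practice; the general
    # solver is used here only to keep the sketch NumPy-only.
    y = np.linalg.solve(L, s)               # forward substitution: L y = s
    w = np.linalg.solve(L.conj().T, y)      # back substitution: L^H w = y
    return w, R

def cholesky_factor_via_qr(X):
    """Triangular factor T of X = Q T, which satisfies T^H T = X^H X."""
    return np.linalg.qr(X, mode="r")

def slide_covariance(R, oldest_row, new_row):
    """Update X^H X when the top row is dropped and a new bottom row is added."""
    return (R + np.outer(new_row.conj(), new_row)
              - np.outer(oldest_row.conj(), oldest_row))

# Hypothetical data: 64 snapshots of 8 channels, and an all-ones steering vector.
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 8)) + 1j * rng.standard_normal((64, 8))
s = np.ones(8, dtype=complex)
w, R = solve_via_cholesky(X, s)
T = cholesky_factor_via_qr(X)

new_row = rng.standard_normal(8) + 1j * rng.standard_normal(8)
X_next = np.vstack([X[1:], new_row])
R_next = slide_covariance(R, X[0], new_row)
```

The last step shows why the redundancy matters: the next covariance matrix differs from the current one by only two rank-one terms, so recomputing it (or its factors) from scratch repeats most of the previous work.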
ISBN: 9780819472946 (print)
Modulation filtering is a technique for filtering slowly varying envelopes of frequency subbands of a non-stationary signal, ideally without affecting the signal's phase and fine structure. Coherent modulation filtering is a potentially more effective subtype of such techniques in which subband envelopes are determined through demodulation of the subband signal with a coherently detected subband carrier. In this paper we propose a coherent modulation filtering technique based on detecting the instantaneous frequency of a subband from its time-frequency representation. We show that coherent modulation filtering imposes a new bandlimiting constraint on the modulation product, and requires the ability to recover arbitrarily chosen envelopes and carriers from their modulation product. We show that a carrier estimate based on the time-varying spectral center-of-gravity satisfies the bandlimiting condition as well as Loughlin's previously derived bandlimiting constraint on the instantaneous frequency of the carrier. These bandwidth constraints lead to effective and distortion-free modulation filters, offering new approaches for potential signal modification. The spectral center-of-gravity does not satisfy the condition on arbitrary recovery, however, which somewhat limits the flexibility of coherent modulation filtering. Demonstrations are provided with speech signals.
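A time-varying spectral center-of-gravity can be sketched as the power-weighted mean frequency of each short-time spectrum. The example below is a minimal illustration assuming a Hann-windowed short-time Fourier transform applied to the whole signal rather than to individual subbands; the function name, frame length and hop size are hypothetical and are not taken from the paper.

```python
import numpy as np

def spectral_cog_track(x, fs, frame_len=256, hop=128):
    """Return the spectral center-of-gravity (Hz) of each short-time frame."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    n_frames = 1 + (len(x) - frame_len) // hop
    cog = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * hop : i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        # Power-weighted mean frequency of this frame.
        cog[i] = np.sum(freqs * power) / np.sum(power)
    return cog

# Hypothetical check: a steady 1 kHz tone sampled at 8 kHz.
fs = 8000
t = np.arange(2048) / fs
tone = np.sin(2 * np.pi * 1000 * t)
track = spectral_cog_track(tone, fs)
```

For a steady tone the track stays at the tone frequency; for a subband of speech it would follow the slowly moving spectral balance, which is the role the abstract assigns to the carrier estimate.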
ISBN: 9780819472946 (print)
Spin-image surface matching is a technique for locating objects in a scene by processing three-dimensional surface information from sources such as light detection and ranging (LIDAR), structured light photography, and tomography. It is attractive for parallel processing on graphics processing units (GPUs) because the two main computational steps, matching pairs of spin-images by correlation and matching pairs of points between model and scene, are explicitly parallel. By implementing these parallel computations on the GPU, as well as recasting serial portions of the algorithm into a parallel form and structuring the algorithm to limit data exchanges between host and GPU, this project achieved an overall speedup of 20 times or more compared to conventional serial processing. A demonstration application has been developed that allows users to select among a set of models and scenes and then applies the spin-image surface matching algorithm to match the selected models to the scene. It also has several user interface controls for changing parameters. One new parameter is a geometric consistency ratio (GCR) that quantifies the matching performance and provides a measure for discarding low-quality matches. By toggling between GPU- and host-based processing, the application demonstrates the speedup achieved with parallelization on the GPU.
ISBN: 9780819472946 (print)
Measurement of EEG event-related potential (ERP) data has most commonly been undertaken in the time domain, which can be complicated to interpret when separable activity overlaps in time. When the overlapping activity has distinct frequency characteristics, however, time-frequency (TF) signal processing techniques can be useful. The current report used ERP data from a cognitive task producing typical feedback-related negativity (FRN) and P300 ERP components, which overlap in time. TF transforms were computed using the binomial reduced interference distribution (RID), and the resulting TF activity was then characterized using principal components analysis (PCA). Consistent with previous work, results indicate that the FRN was more related to theta activity (3-7 Hz) and P300 more to delta activity (below 3 Hz). At the same time, both time-domain measures were shown to be mixtures of TF theta and delta activity, highlighting the difficulties with overlapping activity. The TF theta and delta measures, on the other hand, were largely independent of each other, but also independently indexed the feedback stimulus parameters investigated. Results support the view that TF decomposition can greatly improve separation of overlapping EEG/ERP activity relevant to cognitive models of performance monitoring.