ISBN (print): 0819406945
In this paper we consider using 'digit-serial' processing to build high-performance parallel structures, in particular, parallel signal processors. Digit-serial arithmetic processors combine digit-serial data transmission with digit-serial computation. Three digit-serial arithmetic processors are presented and compared with their digit-parallel counterparts. We show that a digit-serial approach can achieve higher throughput than a digit-parallel processor, even though the two processors are structurally similar and have components of similar complexity.
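To make the digit-serial idea concrete, here is a minimal Python sketch of a digit-serial adder. It assumes operands split into radix-2^k digits and presented least-significant digit first, one digit per simulated clock cycle. The function names are hypothetical, and this is not one of the three processors presented in the paper.

```python
def to_digits(value, digit_bits, num_digits):
    """Split a non-negative integer into radix-2**digit_bits digits, least significant first."""
    mask = (1 << digit_bits) - 1
    return [(value >> (digit_bits * i)) & mask for i in range(num_digits)]


def from_digits(digits, digit_bits):
    """Reassemble an integer from least-significant-first digits."""
    return sum(d << (digit_bits * i) for i, d in enumerate(digits))


def digit_serial_add(a_digits, b_digits, digit_bits):
    """Add two operands that arrive one digit per cycle, LSD first.

    The only state carried between cycles is the carry, so the hardware
    analogue is a narrow k-bit adder plus a carry register.
    """
    base = 1 << digit_bits
    carry = 0
    out = []
    for a, b in zip(a_digits, b_digits):  # one loop iteration = one clock cycle
        s = a + b + carry
        out.append(s % base)  # result digit emitted this cycle
        carry = s // base     # carry fed back into the next cycle
    out.append(carry)         # final carry-out digit
    return out
```

With `digit_bits=1` this degenerates to bit-serial addition. With `digit_bits` equal to the word length it is an ordinary parallel add. The digit width therefore trades adder width against cycles per word. For example, `from_digits(digit_serial_add(to_digits(0xBEEF, 4, 4), to_digits(0x1234, 4, 4), 4), 4)` equals `0xBEEF + 0x1234`.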
We develop a new implementation of coupled-cluster singles and doubles (CCSD) optimized for the most recent graphics processing unit (GPU) hardware. We find that a single node with 8 NVIDIA V100 GPUs can perform CCSD computations on systems of roughly 100 atoms and 1300 basis functions in less than 1 day. Comparisons against massively parallel implementations of CCSD suggest that more than 64 CPU-based nodes (each with 16 cores) are required to match this performance.
ISBN (print): 0818679204
Digital signal processors (DSPs) have become key components for the implementation of digital signal processing systems. With DSPs moving into new application domains and the increasing complexity of modern DSP architectures, efficient programming support is receiving major interest. An optimizing compiler therefore becomes a must for future DSP architectures. Today's DSP compilers incur significant overheads in both memory consumption and program execution time compared to hand-written assembly code. This is mainly due to inefficient compiler support for DSP-specific architectural features, such as the modulo-addressing capability, which is an enabling feature for a large class of DSP algorithms. In this paper we analyze why existing compilers fall short in supporting the modulo-addressing mode and present a compiler concept that allows efficient utilization of this feature. We describe how an advanced compiler optimization strategy achieves near-optimum support of the modulo-addressing mode, and point out why this concept is preferable to DSP-specific language extensions.
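The modulo-addressing mode is used chiefly to implement circular buffers, such as FIR delay lines, without moving data. The Python sketch below uses a hypothetical class, `CircularFIR`, and makes the wraparound explicit with `%`. On a DSP the address-generation unit performs that wraparound in hardware, and this kind of index update is what a compiler must recognize before it can use the mode. It is an illustration only, not the compiler concept from the paper.

```python
class CircularFIR:
    """Direct-form FIR filter whose delay line is a circular buffer addressed modulo N."""

    def __init__(self, coeffs):
        self.h = list(coeffs)
        self.buf = [0.0] * len(self.h)
        self.pos = 0  # index of the newest sample in buf

    def step(self, x):
        """Consume one input sample and return one output sample."""
        taps = len(self.h)
        # Modulo address update replaces physically shifting the delay line.
        self.pos = (self.pos - 1) % taps
        self.buf[self.pos] = x
        # Sample x[t - k] sits k slots after the newest one, wrapping around the buffer.
        return sum(self.h[k] * self.buf[(self.pos + k) % taps] for k in range(taps))
```

Feeding the samples 1, 0, 0, 0, 2, 1 through `CircularFIR([1, 2, 3])` yields 1, 2, 3, 0, 2, 5, which matches direct convolution truncated to the input length.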
ISBN (print): 0819416207
Existing approaches to blind channel estimation and deconvolution (equalization) focus exclusively on estimating the channel or inverse-channel impulse response. It is well known that the quality of the deconvolved output also depends crucially on the noise statistics. Typically, it is assumed that the noise is white and that the signal-to-noise ratio is known. In this paper we remove these restrictions. Both the channel impulse response and the noise model are estimated from the higher-order (e.g., fourth-order) cumulant function and the (second-order) correlation function of the received data via a least-squares cumulant/correlation matching criterion. It is assumed that the higher-order cumulant function of the noise vanishes (e.g., Gaussian noise, as is the case in digital communications). Consistency of the proposed approach is established under certain mild sufficient conditions. The approach is illustrated via simulation examples involving blind equalization of digital communications signals.
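The method relies on one property: higher-order cumulants of Gaussian noise vanish. This can be checked numerically. The Python sketch below estimates only the zero-lag fourth-order cumulant, c4(0,0,0) = E[x^4] - 3(E[x^2])^2, of a real sequence after removing the sample mean. It does not cover the full cumulant function over lags or the paper's least-squares matching criterion, and the function name is hypothetical.

```python
import random


def fourth_cumulant(x):
    """Sample zero-lag fourth-order cumulant of a real sequence.

    The sample mean is removed first; for zero-mean data
    c4(0,0,0) = E[x^4] - 3 * (E[x^2])**2.
    """
    n = len(x)
    mu = sum(x) / n
    y = [v - mu for v in x]
    m2 = sum(v * v for v in y) / n
    m4 = sum(v ** 4 for v in y) / n
    return m4 - 3.0 * m2 * m2


if __name__ == "__main__":
    rng = random.Random(0)
    gauss = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
    bpsk = [rng.choice((-1.0, 1.0)) for _ in range(100_000)]
    noisy = [s + 0.5 * g for s, g in zip(bpsk, gauss)]
    print(fourth_cumulant(gauss))  # close to 0 for Gaussian noise
    print(fourth_cumulant(bpsk))   # close to -2 for equiprobable +/-1 symbols
    print(fourth_cumulant(noisy))  # still close to -2: the Gaussian term drops out
```

Cumulants of independent signals add, and the Gaussian term contributes zero. Adding independent Gaussian noise therefore leaves the fourth-order cumulant unchanged in expectation, while the second-order correlation is still affected.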
ISBN (print): 0819406945
The need to construct architectures in VLSI has focused attention on unnormalized floating-point arithmetic. Certain unnormalized arithmetics allow one to 'pipe on digits,' thus producing a significant speedup in computation and making the input problems of special-purpose devices such as systolic arrays easier to solve. We consider the error-analysis implications of using unnormalized arithmetic in numerical algorithms. We also give specifications for its implementation. Our discussion centers on the example of Gaussian elimination. We show that the use of unnormalized arithmetic requires changes to the analysis of this algorithm. We also show that Gaussian elimination is as stable in unnormalized arithmetic as in normalized arithmetic only for certain classes of matrices, which include (row or column) diagonally dominant matrices. However, if the diagonal elements of the upper triangular matrix are post-normalized, then Gaussian elimination is as stable in unnormalized arithmetic as in normalized arithmetic for all matrices.
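A toy model shows what skipping post-normalization does. The Python sketch below uses a hypothetical 4-digit decimal floating-point format: the value is 0.d1d2d3d4 x 10^e, stored as an integer mantissa and an exponent. It handles nonnegative values only and truncates. This is not the arithmetic specified in the paper. After a cancelling subtraction the unnormalized result keeps its leading zeros, so a following multiplication truncates away every significant digit. The normalized result keeps them.

```python
from fractions import Fraction

T = 4  # toy precision: number of decimal mantissa digits (an assumption)


def value(m, e):
    """Exact value of the toy number 0.d1...dT x 10**e with integer mantissa m."""
    return Fraction(m, 10 ** T) * Fraction(10) ** e


def normalize(m, e):
    """Shift out leading zero digits so that d1 != 0 (zero is left as is)."""
    if m == 0:
        return m, e
    while m < 10 ** (T - 1):
        m, e = m * 10, e - 1
    return m, e


def sub(a, b, normalized):
    """a - b for nonnegative a >= b sharing one exponent (kept simple for the sketch)."""
    (ma, ea), (mb, eb) = a, b
    assert ea == eb and ma >= mb
    r = (ma - mb, ea)
    return normalize(*r) if normalized else r


def mul(a, b, normalized):
    """Product of two nonnegative toy numbers, truncated to T digits."""
    (ma, ea), (mb, eb) = a, b
    p, e = ma * mb, ea + eb  # exact product is p * 10**(e - 2*T)
    if normalized and p:
        while p < 10 ** (2 * T - 1):  # move the leading digit to the top position
            p, e = p * 10, e - 1
    return p // 10 ** T, e  # keep only the top T digits (truncation)
```

Take a = (1234, 1), b = (1233, 1) and c = (1234, 0), i.e. 1.234, 1.233 and 0.1234. Then `mul(sub(a, b, False), c, False)` evaluates to 0, while the normalized path gives exactly 0.0001234, the true product. Loss of significance from retained leading zeros is the kind of effect that changes the error analysis. The stability results for Gaussian elimination themselves are in the paper.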
ISBN (print): 0819406945
An efficient algorithm is presented for computing the continuous wavelet transform and the wideband ambiguity function on a sample grid with uniform time spacing but arbitrary sampling in scale. The method is based on the chirp z-transform and requires the same order of computation as constant-bandwidth analysis techniques, such as the short-time Fourier transform and the narrowband ambiguity function. An alternative spline-approximation method, which is more efficient when the number of scale samples is large, is also described.
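The method builds on the chirp z-transform, which evaluates X_k = sum_n x_n A^(-n) W^(nk) for k = 0..M-1 on an arbitrary spiral contour. Bluestein's identity nk = (n^2 + k^2 - (k - n)^2)/2 turns this into a single convolution that can be computed with FFTs. The Python sketch below shows only that generic building block, using a textbook radix-2 FFT and a direct-sum reference. It does not reproduce how the paper maps wavelet or wideband-ambiguity scales onto chirp z-transforms, and all function names are hypothetical.

```python
import cmath


def fft(a, inverse=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. The inverse is unscaled."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2], inverse), fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def czt_direct(x, m, w, a):
    """Reference O(N*M) evaluation of X_k = sum_n x[n] * a**(-n) * w**(n*k)."""
    return [sum(xn * a ** (-n) * w ** (n * k) for n, xn in enumerate(x)) for k in range(m)]


def czt(x, m, w, a):
    """Chirp z-transform via Bluestein's algorithm (one FFT-based convolution)."""
    n = len(x)
    wh = cmath.sqrt(w)  # only integer powers of wh are used, so either square root works
    size = 1
    while size < n + m - 1:  # room for a linear (non-aliased) convolution
        size *= 2
    y = [x[i] * a ** (-i) * wh ** (i * i) for i in range(n)] + [0j] * (size - n)
    v = [0j] * size
    for i in range(m):  # chirp for non-negative lags 0..m-1
        v[i] = wh ** (-(i * i))
    for i in range(1, n):  # chirp for negative lags -(n-1)..-1, wrapped to the end
        v[size - i] = wh ** (-(i * i))
    conv = fft([p * q for p, q in zip(fft(y), fft(v))], inverse=True)
    return [wh ** (k * k) * conv[k] / size for k in range(m)]
```

With A = 1, W = exp(-2*pi*j/N) and M = N this reduces to the ordinary DFT. Choosing W and A instead places the M output points on an arbitrary arc with arbitrary spacing, at O((N + M) log(N + M)) cost.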
ISBN (print): 9780769551173
Next-generation radar systems place high performance demands on the signal processing chain. Examples include advanced image-creating sensor systems, in which complex calculations must be performed on huge data sets in real time. Manycore architectures are gaining attention as a means of meeting the computational requirements of complex radar signal processing by exploiting, in an energy-efficient manner, the massive parallelism inherent in the algorithms. In this paper, we evaluate a manycore architecture, namely a 16-core Epiphany processor, by implementing two sizable case studies, viz. an autofocus criterion calculation and the fast factorized back-projection algorithm, both key components of modern synthetic aperture radar systems. The implementation results from the two case studies are compared on the basis of achieved performance and programmability. One of the Epiphany implementations demonstrates the usefulness of the architecture for the streaming-based algorithm (the autofocus criterion calculation), achieving a speedup of 8.9x over a sequential implementation on a state-of-the-art general-purpose processor of a later silicon technology generation operating at a 2.7x higher clock speed. On the other case study, a highly memory-intensive algorithm (fast factorized back-projection), the Epiphany architecture shows a speedup of 4.25x. For embedded signal processing, low power dissipation is as important as computational performance. In our case studies, the Epiphany implementations of the two algorithms are, respectively, 78x and 38x more energy efficient.
Scale as a physical quantity is a recently developed concept. The scale transform can be viewed as a special case of the more general Mellin transform, and its mathematical properties make it well suited to the analysis and interpretation of signals subject to scale changes. A number of one-dimensional applications of the scale concept have been made in speech analysis, processing of biological signals, machine vibration analysis, and other areas. Recently, the scale transform has also been applied in multidimensional signal processing, for image filtering and denoising. A discrete implementation of the scale transform can be obtained using logarithmic sampling and the well-known fast Fourier transform. For uniformly sampled signals, however, this implementation requires resampling. An algorithm that avoids resampling uniformly sampled signals has also been derived. In this paper, a modification of the latter algorithm for the discrete implementation of the direct scale transform is presented. In addition, a similar concept is used to improve a recently introduced discrete implementation of the inverse scale transform. Estimation of the absolute discretisation errors showed that the modified algorithms have the desirable property of yielding a smaller region of possible error magnitudes. Experimental results are obtained using artificial signals as well as signals evoked from the temporomandibular joint. In addition, discrete implementations of the separable two-dimensional direct and inverse scale transforms are derived. Experiments with image restoration and scaling through the two-dimensional scale domain, using the novel implementation of the separable two-dimensional scale transform pair, are presented.
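For reference, the direct scale transform in Cohen's convention is D(c) = (1/sqrt(2*pi)) * integral_0^inf f(t) t^(-jc - 1/2) dt. Substituting t = e^tau turns it into a Fourier transform of f(e^tau) e^(tau/2), which is why logarithmic sampling plus an FFT gives a discrete implementation. The Python sketch below illustrates that baseline route for a function that can be evaluated at arbitrary t, so no resampling is needed. It uses a plain rectangle-rule sum at a few chosen values of c rather than an FFT. The window, the grid size and the function name are assumptions, and this is not the modified algorithm of the paper.

```python
import cmath
import math


def scale_transform(f, cs, tau0=-20.0, tau1=5.0, n=2500):
    """Approximate D(c) = (1/sqrt(2*pi)) * int_0^inf f(t) t**(-1j*c - 0.5) dt.

    With t = exp(tau) the integrand becomes f(e^tau) * e^(tau/2) * e^(-1j*c*tau),
    which is sampled on a uniform tau grid (i.e. exponentially spaced t) over
    [tau0, tau1) and summed with a rectangle rule.
    """
    dtau = (tau1 - tau0) / n
    taus = [tau0 + k * dtau for k in range(n)]
    g = [f(math.exp(t)) * math.exp(t / 2) for t in taus]  # log-warped, energy-weighted samples
    norm = dtau / math.sqrt(2 * math.pi)
    return [norm * sum(gk * cmath.exp(-1j * c * t) for gk, t in zip(g, taus)) for c in cs]
```

Scaling f(t) to sqrt(a) f(a t) becomes a pure shift in tau, so |D(c)| is unchanged up to window truncation. For f(t) = t e^(-t), for example, the magnitudes for a = e^0.5 agree closely, and D(0) approximates Gamma(3/2)/sqrt(2*pi).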
This paper describes two techniques for automatic recognition of surface targets from an airborne platform using an imaging laser radar sensor. The first technique rotates a three-dimensional model of the target in re...
ISBN (print): 0819406945
This paper describes a cascade decomposition of the generalized sidelobe canceller (GSC) implementation for linearly constrained minimum variance beamformers. The GSC is initially separated into an adaptive interference cancellation module followed by a non-adaptive beamformer. We prove that the adaptive interference cancellation module can be decomposed into a cascade of first- (or higher-) order adaptive interference cancellation modules, where the order corresponds to the number of adaptive degrees of freedom represented in the module. This distributes the computational burden associated with determining the adaptive weights over several lower-order problems and facilitates simultaneous implementation of beamformers with differing numbers of adaptive degrees of freedom.
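The smallest instance of this setting is a two-sensor beamformer with a single adaptive degree of freedom. The Python sketch below uses the classic GSC form and rests on several assumptions:

- real-valued snapshots;
- a unit-gain constraint toward broadside (quiescent weights [1/2, 1/2]);
- blocking vector [1, -1];
- a block least-squares adaptive weight.

It does not implement the paper's cascade of modules, and the names are hypothetical.

```python
import math


def gsc_two_sensor(x1, x2):
    """Generalized sidelobe canceller for two sensors and a broadside unit-gain constraint.

    Returns the beamformer output and the single adaptive weight.
    """
    d = [(a + b) / 2 for a, b in zip(x1, x2)]  # quiescent beamformer: w_q = [1/2, 1/2]
    z = [a - b for a, b in zip(x1, x2)]        # blocking branch B = [1, -1] removes broadside signals
    # Least-squares weight for the one adaptive degree of freedom: minimizes sum (d - w*z)^2.
    w = sum(zi * di for zi, di in zip(z, d)) / sum(zi * zi for zi in z)
    return [di - w * zi for di, zi in zip(d, z)], w


if __name__ == "__main__":
    n_samples, gain = 64, 0.3  # interferer reaches sensor 2 with an assumed different gain
    s = [math.sin(2 * math.pi * 3 * n / n_samples) for n in range(n_samples)]
    i = [math.sin(2 * math.pi * 7 * n / n_samples) for n in range(n_samples)]
    y, w = gsc_two_sensor([p + q for p, q in zip(s, i)], [p + gain * q for p, q in zip(s, i)])
    print(w, max(abs(p - q) for p, q in zip(y, s)))  # residual interference is near zero
```

The broadside signal is identical at both sensors, so z carries only interference and minimizing the output power cannot cancel the desired signal. With the orthogonal test tones in the demo, the weight comes out as 0.65/0.7 and the output equals s up to rounding.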