In this paper we explore a new number system which uses a double base. The representation of the numbers has a very simple geometric interpretation, allowing potentially fast implementation of the basic arithmetic ope...
ISBN: 0892525304 (print)
These conference proceedings contain 29 papers. The areas covered are: parallel processing algorithms and architecture; parallel processing computations and methodologies; matrix processing implementations; and optical signal processing technology. The topics discussed include: signal processing computations; signal flow graph computing networks; algorithms and architectures in tomography; programmable systolic chips; systolic array processors; and optical signal processing systems.
ISBN: 0819429163 (print)
This study develops and evaluates a new VHDL-based performance modeling capability for multiprocessor systems. The framework for this methodology involves modeling the following system aspects: processor characterization, task modeling, network characterization, and data set size. Initially, all aspects are specified at an abstract level; they are then specified at a detailed level through the process of verification and refinement of design assumptions. Processor characterization involves modeling the processor's speed, instruction set, and memory hierarchy. Task modeling is concerned with the execution time and instruction mix of software tasks within the system. Network characterization models bus protocols, topology, and bandwidths. Data set size refers to how much data is represented by the tokens used in the models. In this study, we applied and evaluated this methodology using both two-dimensional (2D) and three-dimensional (3D) infrared search and track (IRST) algorithms. Two different candidate processors were investigated: the IBM PowerPC 604 and the Texas Instruments TMS320C80. For the 2D IRST algorithm, abstract and detailed performance modeling results were obtained for both processors using partitioned data and pipelined algorithmic approaches. For the 3D IRST algorithm, abstract performance models for pipelined and parallelized implementations on the PowerPC were developed. These models examined the feasibility of the implementations and the potential risk areas, and laid the groundwork for detailed performance modeling.
ISBN: 081940943X (print)
The NRL FLEX processor architecture is designed for real-time radar signal processing and was described at last year's conference in the context of its original application, the Point Defense Demonstration Radar project at NRL. This paper describes the current status of the processor and the application of the same architecture to a new problem that requires a different board configuration to satisfy a different set of latency and data flow requirements.
ISBN: 0819422347 (print)
A hypothesis H is parametric if every distribution from the process defined by H belongs to a family of distributions characterized by a finite number of parameters; if, on the other hand, the distribution cannot be defined by a finite number of parameters, the hypothesis is nonparametric. In this paper, we analyze a detector based on the optimum permutation test, applied to nonparametric radar detection, which provides good performance without a large computational workload, and we compare it with the parametric test and the rank test in the Neyman-Pearson sense. The computational complexity of the detector is high and its implementation in real time is difficult, because the number of operations increases with the factorial of the number of samples. We also present an algorithm that reduces the computational work required. In addition, we present the detectability characteristics of the optimum permutation test against the rank test and the parametric test under Gaussian noise environments and different types of target models (nonfluctuating, Swerling I, and Swerling II). The detection probability versus signal-to-noise ratio is estimated by Monte Carlo simulations for different parameter values (N pulses, M reference samples, and false alarm probability Pfa).
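To illustrate why the operation count of a permutation detector grows so quickly, the following is a minimal Python sketch of a generic two-sample permutation test. It is not the optimum permutation test analyzed in the paper: the function name `permutation_p_value`, the sum statistic, and the sample values are assumptions chosen only for illustration.

```python
from itertools import combinations

def permutation_p_value(test, reference):
    """Fraction of relabelings whose statistic is at least the observed one.

    test      -- samples from the cell under test (N values)
    reference -- samples assumed to contain noise only (M values)
    The statistic (sum of the N samples labeled "test") is an assumption.
    """
    pooled = list(test) + list(reference)
    n = len(test)
    observed = sum(test)
    count = 0
    total = 0
    # Enumerate every way of choosing which N pooled samples are "test";
    # the number of choices, C(N+M, N), grows combinatorially.
    for idx in combinations(range(len(pooled)), n):
        total += 1
        if sum(pooled[i] for i in idx) >= observed:
            count += 1
    return count / total

# Hypothetical data: 2 test samples, 4 reference samples -> 15 relabelings.
p = permutation_p_value([5.0, 6.0], [1.0, 2.0, 3.0, 4.0])
print(p)
```

In a sketch like this, a target would be declared when the returned value does not exceed the chosen false alarm probability Pfa.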
ISBN: 0819422347 (print)
Most signal processing application programs involve computationally intensive iterative steps. In such programs, various failures in the underlying hardware manifest as control-flow errors that affect the reliability of the computed results. Various techniques have been proposed in the past to detect and recover from such control-flow errors. Unfortunately, all these techniques need either additional hardware or modification of the hardware and are not portable across platforms. To circumvent these limitations, we have recently developed a high-level control-flow checking approach using assertions (CCA). In CCA, branch-free intervals in a given high-level language program are identified, and the entry and exit points of the intervals are fortified through pre-inserted assertions. In this paper we describe an implementation of CCA through a pre-processor that automatically inserts the necessary assertions into a high-level language program. Based on this implementation, we study the fault detection capabilities of CCA with the help of fault injection experiments using FERRARI.
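The idea of guarding branch-free intervals with entry and exit assertions can be sketched roughly as follows. This is an illustrative Python sketch only, not the CCA pre-processor described in the paper: the helpers `enter` and `leave`, the interval numbering, and the example function `scaled_mean` are all hypothetical, and the assertions are written by hand here rather than inserted automatically.

```python
class ControlFlowError(Exception):
    """Raised when an entry or exit assertion fails."""

_current = None  # identifier of the branch-free interval being executed

def enter(interval_id):
    """Entry assertion: no other interval may still be open."""
    global _current
    if _current is not None:
        raise ControlFlowError(f"entered {interval_id} inside {_current}")
    _current = interval_id

def leave(interval_id):
    """Exit assertion: we must be leaving the interval we entered."""
    global _current
    if _current != interval_id:
        raise ControlFlowError(f"left {interval_id} but was in {_current}")
    _current = None

def scaled_mean(samples, gain):
    enter(1)                      # interval 1: initialization
    total = 0.0
    leave(1)
    for x in samples:
        enter(2)                  # interval 2: loop body
        total += gain * x
        leave(2)
    enter(3)                      # interval 3: final computation
    result = total / len(samples)
    leave(3)
    return result

print(scaled_mean([1.0, 2.0, 3.0], 2.0))
```

An erroneous jump into the middle of an interval would skip its `enter` call, so the following `leave` would raise `ControlFlowError` in this sketch.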
ISBN: 0819422347 (print)
An adaptive algorithm and a two-stage filter structure were developed for adaptive filtering of certain classes of signals that exhibit cyclostationary characteristics. The new modified P-vector algorithm (mPa) eliminates the need for a separate desired signal, which is typically required by conventional adaptive algorithms. It is then implemented in a time-sequenced manner to counteract the nonstationary characteristics typically found in certain radar and bioelectromagnetic signals. Initial algorithm testing is performed on evoked responses generated by the visual cortex of the human brain, with the ultimate objective of transitioning the results to radar signals. Each sample of the evoked response is modeled as the sum of three uncorrelated signal components: a time-varying mean (M), a noise component (N), and a random jitter component (Q). A two-stage, single-channel, time-sequenced adaptive filter structure was developed which improves convergence characteristics by decoupling the time-varying mean component from the `Q' and noise components in the first stage. The EEG statistics must be known a priori and are adaptively estimated from the prestimulus data. The performance of the two-stage mPa time-sequenced adaptive filter approaches the performance for the ideal case of an adaptive filter having a noiseless desired response.
ISBN: 9780819472946 (print)
One of the main goals of the STAP-BOY program has been the implementation of a space-time adaptive processing (STAP) algorithm on graphics processing units (GPUs) with the goal of reducing the processing time. Within the context of GPU implementation, we have further developed algorithms that exploit data redundancy inherent in particular STAP applications. Integration of these algorithms with GPU architecture is of primary importance for fast algorithmic processing times. STAP algorithms involve solving a linear system in which the transformation matrix is a covariance matrix. A standard method involves estimating a covariance matrix from a data matrix, computing its Cholesky factors by one of several methods, and then solving the system by substitution. Some STAP applications have redundancy in successive data matrices from which the covariance matrices are formed. For STAP applications in which a data matrix is updated by adding a new data row at the bottom and eliminating the oldest data at the top of the matrix, successive data matrices have multiple rows in common. Two methods have been developed for exploiting this type of data redundancy when computing Cholesky factors. These two methods are referred to as 1) fast QR factorizations of successive data matrices and 2) fast Cholesky factorizations of successive covariance matrices. We have developed GPU implementations of these two methods. We show that these two algorithms exhibit reduced computational complexity when compared to benchmark algorithms that do not exploit data redundancy. More importantly, we show that when these algorithmic improvements are optimized for the GPU architecture, the processing times of a GPU implementation of these matrix factorization algorithms may be greatly improved.
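The standard method mentioned in the abstract (estimate a covariance matrix, compute its Cholesky factor, solve by substitution) can be sketched in plain Python as below. This is a simplified illustration under stated assumptions: the data are real-valued (STAP data are normally complex), the matrix sizes and values are made up, the function names are hypothetical, and none of the redundancy-exploiting or GPU techniques from the paper are shown.

```python
import math

def covariance(X):
    """Sample covariance estimate R = X^T X / n for a real n-by-m data matrix."""
    n, m = len(X), len(X[0])
    return [[sum(X[k][i] * X[k][j] for k in range(n)) / n for j in range(m)]
            for i in range(m)]

def cholesky(R):
    """Lower-triangular L with L L^T = R (R symmetric positive definite)."""
    m = len(R)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = R[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_cholesky(L, b):
    """Solve (L L^T) w = b by forward, then backward, substitution."""
    m = len(L)
    y = [0.0] * m
    for i in range(m):                      # forward: L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    w = [0.0] * m
    for i in reversed(range(m)):            # backward: L^T w = y
        w[i] = (y[i] - sum(L[k][i] * w[k] for k in range(i + 1, m))) / L[i][i]
    return w

# Hypothetical 4-by-3 data matrix and steering-like vector s.
X = [[1.0, 2.0, 0.5], [0.0, 1.0, 1.5], [2.0, 0.5, 1.0], [1.0, 1.0, 3.0]]
s = [1.0, 0.0, 0.0]
R = covariance(X)
w = solve_cholesky(cholesky(R), s)
print(w)
```

When a new row is appended to `X` and the oldest row is dropped, most of the sums inside `covariance` are unchanged; that overlap is the kind of redundancy the two fast factorization methods are described as exploiting.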
ISBN: 0819445584 (print)
In this paper, a new wave front sensor design that utilizes the benefits of image projections is described and analyzed. The projection-based wave front sensor is similar to a Shack-Hartmann type wave front sensor, but uses a correlation algorithm as opposed to a centroiding algorithm to estimate optical tilt. This allows the projection-based wave front sensor to estimate optical tilt parameters while guiding off of point sources and extended objects at very low signal-to-noise ratios. The implementation of the projection-based wave front sensor is described in detail, showing the important signal processing steps on and off of the focal plane array of the sensor. The design is tested in simulation for speed and accuracy by processing simulated astronomical data. These simulations demonstrate the accuracy of the projection-based wave front sensor and its superior performance to that of the traditional Shack-Hartmann wave front sensor. Timing analysis is presented which shows how the collection and processing of image projections is computationally efficient and lends itself to a wave front sensor design that can produce adaptive optical control signals at rates of up to 500 Hz.
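A rough sketch of estimating an image shift by correlating row and column projections, rather than by centroiding, might look like the following. It is an assumption-laden illustration, not the sensor's actual algorithm: the helper names, the synthetic 7-by-7 spot images, and the integer-lag search are all hypothetical, and a real sensor would convert the pixel shift into an optical tilt.

```python
def projections(image):
    """Sum the image along rows and along columns."""
    rows = [sum(r) for r in image]
    cols = [sum(c) for c in zip(*image)]
    return rows, cols

def shift_by_correlation(p, ref, max_lag):
    """Integer lag that maximizes the cross-correlation of p with ref."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(p[i] * ref[i - lag]
                    for i in range(len(p)) if 0 <= i - lag < len(ref))
        if score > best_score:
            best_score, best_lag = score, lag
    return best_lag

def spot(size, r, c):
    """Synthetic spot: bright pixel at (r, c) with dimmer 4-neighbors."""
    img = [[0.0] * size for _ in range(size)]
    img[r][c] = 4.0
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        img[r + dr][c + dc] = 1.0
    return img

ref_rows, ref_cols = projections(spot(7, 3, 3))   # reference spot, centered
obs_rows, obs_cols = projections(spot(7, 4, 2))   # observed spot, displaced
dy = shift_by_correlation(obs_rows, ref_rows, 3)
dx = shift_by_correlation(obs_cols, ref_cols, 3)
print(dy, dx)
```

Working on two one-dimensional projections instead of the full two-dimensional image is what keeps the per-frame work small in a scheme of this kind.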
ISBN: 0819445584 (print)
This paper presents a general FIR filter architecture utilizing truncated tree multipliers for computation. The average error, maximum error, and variance of error due to truncation are derived for the proposed architecture. A novel technique that reduces the average error of the filter is presented, along with equations for computing the signal-to-noise ratio of the truncation error. A software tool written in Java is described that automatically generates structural VHDL models for specific filters based on this architecture, given parameters such as the number of taps, operand lengths, number of multipliers, and number of truncated columns. We show that a 22.5% reduction in area can be achieved for a 24-tap filter with 16-bit operands, 4 parallel multipliers, and 12 truncated columns. For this implementation, the average reduction error is only 9.18 x 10^-5 ulps, and the reduction error SNR is only 2.4 dB less than the roundoff SNR of an equivalent filter without truncation.
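The effect of truncating partial-product columns can be illustrated with a small bit-level model. The sketch below assumes an unsigned n-by-n multiplier in which the k least significant partial-product columns are simply omitted, with no correction constant; the function names are hypothetical, and neither the paper's error-reduction technique nor its tree structure is modeled.

```python
def truncated_product(a, b, n, k):
    """Product of n-bit unsigned a and b, omitting the partial-product
    bits a_i * b_j that fall in the k least significant columns (i + j < k)."""
    total = 0
    for i in range(n):
        for j in range(n):
            if i + j >= k and (a >> i) & 1 and (b >> j) & 1:
                total += 1 << (i + j)
    return total

def average_truncation_error(n, k):
    """Mean of (exact - truncated) over all pairs of n-bit operands."""
    errors = [a * b - truncated_product(a, b, n, k)
              for a in range(1 << n) for b in range(1 << n)]
    return sum(errors) / len(errors)

print(truncated_product(13, 11, 4, 0))   # no truncation: exact product
print(truncated_product(13, 11, 4, 2))   # two columns omitted
print(average_truncation_error(4, 2))
```

Because the omitted bits are never negative, the error in this model is always one-sided, which is why practical truncated multipliers usually add some form of compensation.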