The Simplex Method, the most popular method for solving Linear Programs (LPs), has two major variants. They are the revised method and the standard, or full tableau method. Today, virtually all serious implementations...
The Simplex Method, the most popular method for solving Linear Programs (LPs), has two major variants. They are the revised method and the standard, or full tableau method. Today, virtually all serious implementations are of the revised method because it is more efficient for sparse LPs which are the most common. However, the full tableau method has advantages as well. First, the full tableau can be very effective for dense problems. Second, a full tableau method can easily and effectively be extended to a coarse grained distributed algorithm. While dense problems are uncommon in general, they occur frequently in some important applications such as digital filter design, text categorization, imageprocessing and relaxations of scheduling problems. We implement two full tableau algorithms. The first, a serial implementation, is effective for small to moderately sized dense problems. The second, a simple extension of the first, is a distributed algorithm, which is effective for large problems of all densities. We developed performance models that predict running times per iteration for the serial version of our method, the parallel version of our method and the revised method for problems of different sizes, aspect ratios and densities. We also developed methods for choosing the number of processors to optimize the tradeoff between computation and communication in distributed computations. We tested our algorithms on practical (Netlib) and synthetic problems.
Product requirements often dictate the use of off-the-shelf processors for very fast signal processing applications. Additionally, restrictions on cost, power, or size/weight may preclude the use of specialized vector...
详细信息
ISBN:
(纸本)0780370414
Product requirements often dictate the use of off-the-shelf processors for very fast signal processing applications. Additionally, restrictions on cost, power, or size/weight may preclude the use of specialized vector processors for implementation of the algorithms. We discuss a new method for performing signed parallelprocessing in scalar, off-the-shelf processors for integerized signal processing algorithms. Uniform data precision may be used, but is not required for the method. It is shown that the reduction in execution cycles resulting from this implementation is approximately linear in the size of the registers, divided by the precision required.
A Toeplitz-block-Toeplitz (TBT) matrix is block Toeplitz with Toeplitz blocks. TBT systems of equations arise in 2-D interpolation, 2-D linear prediction and 2-D least-squares deconvolution problems. Although the doub...
详细信息
ISBN:
(纸本)0780370414
A Toeplitz-block-Toeplitz (TBT) matrix is block Toeplitz with Toeplitz blocks. TBT systems of equations arise in 2-D interpolation, 2-D linear prediction and 2-D least-squares deconvolution problems. Although the doubly Toeplitz structure should be exploitable in a fast algorithm, existing fast algorithms only exploit the block Toeplitz structure, not the Toeplitz structure of the blocks. Iterative algorithms can employ the 2-D FFT, but usually take thousands of iterations to converge. We develop a new fast algorithm that assumes a smoothness constraint (described in the text) on the matrix entries. For an M-2 X M-2 TBT matrix with M M x M Toeplitz blocks along each edge, the algorithm requires only O(6M(3)) operations to solve an M-2 X M-2 linear system of equations;parallel computing on 2M processors can be performed on the algorithm as given. Two examples show the operation and performance of the algorithm.
In real speaker verification applications, additive or convolutive noise creates a mismatch between training and recognition environments, degrading performance. parallel Model Combination (PMC) is used successfully t...
详细信息
ISBN:
(纸本)0780370414
In real speaker verification applications, additive or convolutive noise creates a mismatch between training and recognition environments, degrading performance. parallel Model Combination (PMC) is used successfully to improve the noise robustness of Hidden Markov Model (HMM) based speech recognisers [5]. This paper presents the results of applying PMC to compensate for additive noise in HMM-based text-dependent speaker verification. Speech and noise data were obtained from the YOHO [6] and NOISEX-92 databases [131 respectively. Speaker recognition Equal Error Rates (EER) are presented for noise-contaminated speech at different signal-to-noise ratios (SNRs) and different noise sources. For example, average EER for speech in operations room noise at 6dB SNR dropped from approximately 20% un-compensated to less than 5% using PMC. Finally, it is shown that speaker recognition performance is relatively insensitive to the exact value of the parameter that determines the relative amplitudes of the speech and noise components of the PMC model.
In this paper a scheme for efficient system partitioning of computation in wireless sensor networks is presented. Local computation of the sensor data in wireless networks can be highly energy-efficient, because redun...
详细信息
ISBN:
(纸本)0780370414
In this paper a scheme for efficient system partitioning of computation in wireless sensor networks is presented. Local computation of the sensor data in wireless networks can be highly energy-efficient, because redundant communication costs can be reduced. It is important to develop energy-efficient signal processing algorithms to be run at the sensor nodes. This paper presents a technique to optimize system energy by parallelizing computation through the network and by exploiting underlying hooks for power management. By parallelizing computation, the voltage supply level and clock frequency of the nodes can be lowered, which reduces energy dissipation. A 60% energy reduction for a sensor application of source localization is demonstrated. The results are generalized for finding optimal voltage and frequency operating points that lead to minimum system energy dissipation.
Neutron radiography (NR) provides a very efficient tool in the field of non-destructive testing as well as for many applications in fundamental research. A neutron beam penetrating a specimen is attenuated by the samp...
详细信息
Neutron radiography (NR) provides a very efficient tool in the field of non-destructive testing as well as for many applications in fundamental research. A neutron beam penetrating a specimen is attenuated by the sample material and detected by a two-dimensional (2D) imaging device. The image contains information about materials and structure inside the sample because neutrons are attenuated according to the basic law of radiation attenuation. Contrary to Xrays, neutrons can be attenuated by some light materials, as for example, hydrogen and boron, but penetrate many heavy materials. Therefore, NR can yield important information not obtainable by more traditional methods. Nevertheless, there are many aspects of structure, both quantitative and qualitative, that are not accessible from 2D transmission images. Hence, there is an interest in three-dimensional neutron imaging. At the 250 kW TRIGA Mark ii reactor of the Atominstitut in Austria a neutron tomography facility has been installed. The neutron flux at this beam position is 1.3 x 10(5) neutrons/cm(2)s and the beam diameter is 8 cm. For a 3D tomographic reconstruction of the sample interior, transmission images of the object taken from different view angles are required. Therefore, a rotary table driven by a step motor connected to a computerized motion control system has been installed at the sample position. In parallel a suitable electronic imaging device based on a neutron sensitive scintillator screen and a CCD-camera has been designed. It can be controlled by a computer in order to synchronize the software of the detector and of the rotary table with the aim of an automation of measurements. Reasonable exposure times can get as low as 20 s per image. This means that a complete tomography of a sample can be performed within one working day. Calculation of the 3D voxel array is made by using the filtered backprojection algorithm. (C) 2001 Published by Elsevier Science B.V.
A combined Kalman filter (KF) and natural gradient algorithm (NGA) approach is proposed to address the problem of blind source separation (BSS) in time-varying environments, in particular for binary distributed signal...
详细信息
ISBN:
(纸本)0780370414
A combined Kalman filter (KF) and natural gradient algorithm (NGA) approach is proposed to address the problem of blind source separation (BSS) in time-varying environments, in particular for binary distributed signals. In situations where the mixing channel is non-stationary, the performance of NGA is often poor. Typically, in such cases, an adaptive learning rate is used to help NGA track the changes in the environment. The Kalman filter, on the other hand, is the optimal minimum mean square error method for tracking certain non-stationarity. Experimental results are presented, and suggest that the combined approach performs significantly better than NGA in the presence of both continuous and abrupt non-stationarities.
Evolving networks of ad-hoc, wireless sensing nodes rely heavily on the ability to establish position information. The algorithms presented herein rely on range measurements between pairs of nodes and the a priori coo...
详细信息
ISBN:
(纸本)0780370414
Evolving networks of ad-hoc, wireless sensing nodes rely heavily on the ability to establish position information. The algorithms presented herein rely on range measurements between pairs of nodes and the a priori coordinates of sparsely located anchor nodes. Clusters of nodes surrounding anchor nodes cooperatively establish confident position estimates through assumptions, checks, and iterative refinements. Once established, these positions are propagated to more distant nodes, allowing the entire network to create an accurate map of itself. Major obstacles include overcoming inaccuracies in range measurements as great as +/-50%, as well as the development of initial guesses for node locations in clusters with few or no anchor nodes. Solutions to these problems are presented and discussed, using position error as the primary metric. Algorithms are compared according to position error, scalability, and communication and computational requirements. Early simulations yield average position errors of 5% in the presence of both range and initial position inaccuracies.
The binary-swap and the parallel-pipelined methods are two popular image composition methods for volume rendering on distributed memory multicomputers. However, these methods either restrict the number of processors t...
详细信息
ISBN:
(纸本)0769509908
The binary-swap and the parallel-pipelined methods are two popular image composition methods for volume rendering on distributed memory multicomputers. However, these methods either restrict the number of processors to a power of two or require many steps to transform image data that results in high communication overheads. In this paper, we present an efficient image composition method, the rotate-tiling (RT), for parallel volume rendering on distributed memory multicomputers. The RT method can fully utilize all available processors and minimize the communication overheads. In addition, we provide data compression method, the template run-length encoding (TRLE), to further reduce the communication data size. To evaluate the performance of the RT method, we compare the proposed method with the binary-swap method and the parallel-pipelined method. Both theoretical analysis and experimental test are conducted. In the theoretical analysis, we analyze the best performance bound of the RT method in terms of the startup time, the data transmission time, the number of processors, and the number of initial block of a sub-image. In the experimental test, we have implemented these three methods on an SP2 parallel machine. Three volume datasets are used as test samples. The experimental results show that our method outperforms the binary-swap and the parallel-pipelined methods for all test samples and match the results analyzed in the theoretical analysis. For the TRLE method, the experimental results show that the TRLE method can further reduce the composition time for these three methods.
This paper presents techniques for parallel multimedia retrieval by considering an image database as an example. The discussed cluster architecture depicts one possible solution for the performance problem. The distri...
详细信息
This paper presents techniques for parallel multimedia retrieval by considering an image database as an example. The discussed cluster architecture depicts one possible solution for the performance problem. The distribution of the image data over a large number of nodes enables a parallelprocessing of the compute intensive operations for dynamic image retrieval. Thus, the partitioning of the data and the applied strategies for workload balancing have a decisive impact on the performance, efficiency, and the usability of such image databases.
暂无评论