Building on a parallel computational model based on the principal and subordinate (master-slave) mode and on the basic theory of the GMRES algorithm in Krylov subspaces, this paper proposes a new parallel PCGMRES algorithm with a PC pattern and presents computational examples for systems of linear equations. A comparison with the results of the parallel GMRES(m) algorithm shows that the proposed parallel algorithm reduces the number of iterations, shortens the computing time, and achieves a better speedup ratio and computing efficiency while preserving computational precision.
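For orientation, the serial GMRES(m) baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of the standard restarted GMRES method, not the paper's parallel PCGMRES algorithm. The function names, the dense list-of-lists matrix format, and the default parameters are assumptions made for this example.

```python
import math

def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gmres_restarted(A, b, m=5, tol=1e-10, max_restarts=100):
    """Solve A x = b with GMRES(m): rebuild the Krylov basis every m steps."""
    x = [0.0] * len(b)
    for _ in range(max_restarts):
        r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
        beta = math.sqrt(dot(r, r))
        if beta < tol:
            break
        V = [[ri / beta for ri in r]]   # orthonormal Krylov basis vectors
        H = []                          # Hessenberg columns, rotated to triangular form
        cs, sn = [], []                 # Givens rotation coefficients
        g = [beta]                      # rotated right-hand side of the least-squares problem
        for j in range(m):
            w = matvec(A, V[j])
            h = []
            for i in range(j + 1):      # Arnoldi step with modified Gram-Schmidt
                hij = dot(w, V[i])
                h.append(hij)
                w = [wk - hij * vk for wk, vk in zip(w, V[i])]
            hnext = math.sqrt(dot(w, w))
            h.append(hnext)
            for i in range(j):          # apply the earlier rotations to the new column
                t = cs[i] * h[i] + sn[i] * h[i + 1]
                h[i + 1] = -sn[i] * h[i] + cs[i] * h[i + 1]
                h[i] = t
            d = math.hypot(h[j], h[j + 1])
            cj, sj = h[j] / d, h[j + 1] / d   # new rotation zeroes the subdiagonal entry
            cs.append(cj)
            sn.append(sj)
            h[j], h[j + 1] = d, 0.0
            g.append(-sj * g[j])        # |g[j + 1]| is the current residual norm
            g[j] = cj * g[j]
            H.append(h)
            if abs(g[j + 1]) < tol or hnext < 1e-14:
                break
            V.append([wk / hnext for wk in w])
        k = len(H)
        y = [0.0] * k
        for i in range(k - 1, -1, -1):  # back substitution on the triangular system
            s = g[i] - sum(H[col][i] * y[col] for col in range(i + 1, k))
            y[i] = s / H[i][i]
        x = [xi + sum(y[col] * V[col][i] for col in range(k))
             for i, xi in enumerate(x)]
    return x

if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    print(gmres_restarted(A, [1.0, 2.0, 3.0], m=2))
```

The restart length m bounds the memory needed for the basis V. In a parallel variant, the matrix-vector product and the inner products are the natural operations to distribute, although how the paper distributes them is not stated in the abstract.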
ISBN (print): 9783642281440; 9783642281457
A new cache-efficient algorithm for reduction from block Hessenberg form to Hessenberg form is presented and evaluated. The algorithm targets parallel computers with shared memory. One level of look-ahead in combination with a dynamic load-balancing scheme significantly reduces the idle time and allows the use of coarse-grained tasks. The coarse tasks lead to high-performance computations on each processor/core. Speedups close to 13 over the sequential unblocked algorithm have been observed on a dual quad-core machine using one thread per core.
ISBN (print): 9783642310201
A generalized eigensystem problem is usually transformed, using Cholesky decomposition, into a standard eigenproblem. The latter is then solved efficiently by a matrix reduction approach based on the Householder tridiagonalization method. We present a parallel implementation of an integrated transformation-reduction algorithm on a GPU accelerator using CUBLAS. Experimental results clearly demonstrate the potential of data-parallel coprocessors for scientific computations. Compared with the CPU implementation, the GPU implementations achieve speedups above 16-fold and 26-fold in double precision for the reduction and the transformation, respectively.
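The transformation step can be illustrated on the CPU. For a symmetric A and a symmetric positive definite B = L Lᵀ, the problem A x = λ B x becomes C y = λ y with C = L⁻¹ A L⁻ᵀ and y = Lᵀ x. The sketch below is plain Python under those assumptions. The function names are hypothetical, and it is not the paper's CUBLAS implementation.

```python
import math

def cholesky(B):
    """Return lower-triangular L with B = L L^T (B symmetric positive definite)."""
    n = len(B)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = B[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_solve(L, M):
    """Return L^{-1} M by forward substitution, one column at a time."""
    n, cols = len(L), len(M[0])
    X = [[0.0] * cols for _ in range(n)]
    for c in range(cols):
        for i in range(n):
            s = M[i][c] - sum(L[i][k] * X[k][c] for k in range(i))
            X[i][c] = s / L[i][i]
    return X

def transpose(M):
    return [list(row) for row in zip(*M)]

def to_standard_form(A, B):
    """Map A x = lambda B x to C y = lambda y with C = L^{-1} A L^{-T}."""
    L = cholesky(B)
    Y = forward_solve(L, A)               # Y = L^{-1} A
    Ct = forward_solve(L, transpose(Y))   # L^{-1} Y^T = C^T
    return transpose(Ct), L

if __name__ == "__main__":
    A = [[2.0, 1.0], [1.0, 2.0]]
    B = [[4.0, 2.0], [2.0, 2.0]]
    C, L = to_standard_form(A, B)
    print(C)
```

Because C is symmetric, it can then be passed to a Householder tridiagonalization routine. Eigenvectors of the original problem are recovered by solving Lᵀ x = y.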
ISBN (print): 9781467356800
This paper proposes a screen partition update method (SPUM) for embedded multi-window systems to improve their display performance. The whole screen is first partitioned into multiple independent sub-regions according to the position and size of the application windows, and the overlap degree of each sub-region is then calculated. Each window has an associated bitmap that marks which sub-regions of the screen the window contains. When an application window updates, its sub-regions are updated step by step. To reduce the probability of conflict, the free sub-region with the larger overlap degree is updated first, which increases the probability of parallel updates. When the SPUM algorithm is applied to an actual DirectFB graphics system, the total window update time is reduced by 35% and the number of conflicts by 72% in our experiment. A further experiment shows that the performance improvement introduced by the algorithm becomes more notable as the refresh rate increases.
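The bookkeeping described above can be sketched roughly as follows. This is one interpretation of the abstract, not the paper's DirectFB implementation. It assumes axis-aligned windows given as (x, y, width, height) rectangles inside the screen, sub-regions cut along every window edge, and an overlap degree equal to the number of windows covering a sub-region. All names are hypothetical.

```python
def partition(screen_w, screen_h, windows):
    """Cut the screen into rectangular sub-regions along every window edge."""
    xs = sorted({0, screen_w}
                | {x for x, _, _, _ in windows}
                | {x + w for x, _, w, _ in windows})
    ys = sorted({0, screen_h}
                | {y for _, y, _, _ in windows}
                | {y + h for _, y, _, h in windows})
    # Sub-regions are stored row by row as (x0, y0, x1, y1).
    return [(xs[i], ys[j], xs[i + 1], ys[j + 1])
            for j in range(len(ys) - 1) for i in range(len(xs) - 1)]

def covers(window, cell):
    """True if the window rectangle fully contains the sub-region."""
    x, y, w, h = window
    x0, y0, x1, y1 = cell
    return x <= x0 and y <= y0 and x1 <= x + w and y1 <= y + h

def build_tables(cells, windows):
    """Return the overlap degree per sub-region and one bitmap (int) per window."""
    degree = [sum(covers(win, c) for win in windows) for c in cells]
    bitmaps = [sum(1 << k for k, c in enumerate(cells) if covers(win, c))
               for win in windows]
    return degree, bitmaps

def update_order(bitmap, degree, busy=frozenset()):
    """Free sub-regions of one window, highest overlap degree first."""
    mine = [k for k in range(len(degree))
            if ((bitmap >> k) & 1) and k not in busy]
    return sorted(mine, key=lambda k: (-degree[k], k))

if __name__ == "__main__":
    windows = [(0, 0, 2, 2), (1, 1, 2, 2)]
    cells = partition(4, 4, windows)
    degree, bitmaps = build_tables(cells, windows)
    print(update_order(bitmaps[0], degree))
```

Updating the most contested sub-regions first releases them early, so another window's updater is less likely to find them locked. The `busy` set stands in for whatever locking mechanism a real system would use.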
ISBN (print): 9783642314995; 9783642315008
This article presents the parallelization of a seismic ray tracing algorithm. The chosen algorithm, due to Urdaneta, is briefly described. It provides wavelength-dependent smoothing and frequency-dependent scattering through an implementation of Lomax's method for approximating broadband wave propagation. It also includes the wavefront propagation technique of Vinje et al., which provides a fairly constant density of rays. The parallelized algorithm is then preliminarily tested on synthetic data, and the results are presented and discussed.
Splines and wavelets are finding increasing use in information theory. Wavelet decompositions are used in designing efficient algorithms for processing (compression) of large information flows. If one succeeds in establishing the embeddability of spaces of splines on a sequence of sparsing/refining grids, in representing the chain of embedded spaces as a direct sum of wavelet spaces, and in realizing the basis functions with the minimum length of their support, then this yields a wavelet decomposition of the information flow, leading, in turn, to substantial savings in computational cost. It then becomes possible to resolve the initial information flow into components and to single out the principal and refining information flows, depending on the needs. For uniform grids on the real line, wavelet decompositions are well known. In this case, the powerful technique of harmonic analysis applies, as do the lifting scheme and the wavelet scheme. However, many applications require considering bounded intervals and nonuniform grids. For example, for efficient compression of nonuniform flows of information (featuring singularities or rapidly fluctuating characteristics), it is expedient to employ an adaptive nonuniform grid, which takes account of the singularities of the flow being processed. This makes it possible to improve the approximation of functions without complicating the computations. The previously obtained results pertained to splines on infinite grids. Making both the grid and the corresponding numerical flow infinite renders theoretical studies simpler; however, in practice, one has to deal with finite flows. This paper continues the studies initiated for finite-dimensional spaces. The purpose of this work is to build a wavelet decomposition (compression) on a nonuniform grid and develop the corresponding decomposition and reconstruction algorithms for infinite flows (with a grid on an open interval) and finite flows (with a grid o
ISBN (print): 9783642340406
The transmission power of an analog TV transmitter is always measured as visual peak power, that is, the power level reached while the synchronizing pulses are being transmitted, so an ordinary power meter cannot measure the power of an analog TV transmitter. The paper proposes a new measurement method in which a parallel algorithm running on an FPGA controls a high-speed AD and can measure the power of three analog TV RF signals simultaneously. The paper also provides signal type recognition and a corresponding filtering method to make the measurement compatible with both digital and analog signals.
ISBN (print): 9781467322379
This paper designs and implements a client/server authentication system based on voiceprint recognition. The client uses the MFCC method to extract feature vectors from the speaker's voice, and the server uses the VQ method for recognition. We propose a new endpoint detection method, named short-term variance, which can effectively pick out the voice signal from the original signal in a real environment. With the improved endpoint detection algorithm, the system resists noise effectively and achieves a higher recognition rate. To boost the server's performance, we implemented a new parallel algorithm for VQ codeword search on an SMP system. This method improved the server's processor load rate and operation speed and reduced the system response time. In the experiment, we evaluated the recognition accuracy of the system and the efficiency of the parallel VQ algorithms.
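As a rough illustration of the server-side step, VQ recognition assigns each feature vector to its nearest codeword and picks the speaker whose codebook gives the lowest average distortion. The sketch below parallelizes the search over frames with a thread pool. That split is an assumption made for illustration, since the abstract does not describe how the paper divides the work across processors. All names and the toy data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def nearest_codeword(vec, codebook):
    """Return (index, squared distance) of the codeword closest to vec."""
    best_i, best_d = 0, float("inf")
    for i, cw in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(vec, cw))
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d

def average_distortion(frames, codebook, workers=4):
    """Search the codebook for every frame in parallel and average the distances."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda f: nearest_codeword(f, codebook), frames))
    return sum(d for _, d in results) / len(results)

def identify(frames, codebooks):
    """Pick the speaker whose codebook matches the frames with least distortion."""
    return min(codebooks, key=lambda name: average_distortion(frames, codebooks[name]))

if __name__ == "__main__":
    # Toy 2-D "feature vectors"; real MFCC vectors have more dimensions.
    codebooks = {
        "alice": [[0.0, 0.0], [1.0, 1.0]],
        "bob": [[5.0, 5.0], [6.0, 6.0]],
    }
    frames = [[0.1, 0.0], [0.9, 1.1]]
    print(identify(frames, codebooks))
```

In CPython, threads do not speed up pure-Python arithmetic because of the global interpreter lock, so this only shows the structure of the parallel search. A process pool or a native implementation would be needed to see real speedup on an SMP machine.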
Methods of tolerance ellipsoidal estimation for synthesizing tolerances on the parameters of radio-electronic circuits, and the possibility of parallelizing them, are considered. These methods result from the problem of estimating the solutions of an interval system of linear algebraic equations (ISLAE) built according to given optimality criteria. A numerical algorithm that admits parallelization is proposed for solving tolerance ellipsoidal estimation problems.
In this paper, a new block interface domain decomposition method (BI-DDM) with non-overlapping subdomains for the numerical solution of a two-dimensional convection-diffusion equation is presented. The block interface formulation is derived from the idea of using small groups of a certain number of mesh points, where each group is treated explicitly in much the same way a single point is treated in the point method. The BI-DDM incorporates a correction phase that further reduces the computing cost. A performance analysis of this method on several recently developed group iterative schemes implemented on a message-passing architecture is presented and discussed.