Based on two-grid discretizations, three kinds of local and parallel finite element algorithms for the stationary Navier-Stokes equations are introduced and discussed. The main technique is first to use a standard finite element discretization on a coarse grid to approximate the low frequencies of the solution, and then to apply a linearized discretization on a fine grid to correct the resulting residual (which contains mostly high frequencies) by local and parallel procedures. Three approaches to linearization are discussed. Under the uniqueness condition, error estimates for the finite element solution are derived. Numerical results show that, among the three kinds of parallel algorithms, the Oseen-linearized algorithm is preferable when both the computational time and the accuracy of the approximate solution are taken into account. (C) 2010 Elsevier Ltd. All rights reserved.
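The abstract does not spell out the three linearizations. For orientation, here is a hedged sketch of the three fine-grid correction problems commonly used in two-grid Navier-Stokes schemes (simple/Stokes, Oseen, and Newton), written in the usual weak-form notation with a(·,·) the viscous form, b(·,·) the pressure coupling, (u_H, p_H) the coarse solution and (u^h, p^h) the fine-grid approximation; the paper's local and parallel variants restrict such problems to subdomains, and its exact formulation may differ.

```latex
% Sketch of the three standard linearized fine-grid problems (not taken
% verbatim from the paper): find (u^h, p^h) such that for all test
% functions (v, q)
\begin{align*}
\text{(Stokes/simple): } & a(u^h,v) + b(v,p^h) + b(u^h,q)
    = (f,v) - \big((u_H\!\cdot\!\nabla)u_H,\,v\big),\\
\text{(Oseen): } & a(u^h,v) + \big((u_H\!\cdot\!\nabla)u^h,\,v\big)
    + b(v,p^h) + b(u^h,q) = (f,v),\\
\text{(Newton): } & a(u^h,v) + \big((u_H\!\cdot\!\nabla)u^h,\,v\big)
    + \big((u^h\!\cdot\!\nabla)u_H,\,v\big) + b(v,p^h) + b(u^h,q)
    = (f,v) + \big((u_H\!\cdot\!\nabla)u_H,\,v\big).
\end{align*}
```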
We consider numeration systems where the digits are integers and the base is an algebraic number β such that |β| > 1 and β satisfies a polynomial in which one coefficient is dominant in a certain sense. For this class of bases β, we can find an alphabet of signed digits on which addition is realizable by a parallel algorithm in constant time. This algorithm is a generalization of that of Avizienis. We also discuss the cardinality of the alphabet used, and we are able to modify our algorithm to work with a smaller alphabet. We then prove that β satisfies this dominance condition if and only if it has no conjugate of modulus 1. When the base β is the golden mean, we further refine the construction to obtain a parallel algorithm on the alphabet {-1, 0, 1}. This alphabet cannot be reduced any further. (C) 2011 Elsevier B.V. All rights reserved.
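As a concrete point of reference, below is a minimal Python sketch of Avizienis-style carry-free addition in an ordinary integer base; the choice of base 10 with signed digits {-6, ..., 6} is purely illustrative, whereas the paper generalizes this kind of algorithm to algebraic bases β with a dominant polynomial coefficient.

```python
# Hedged sketch of Avizienis-style carry-free (parallel) addition.
# Base b = 10 and signed-digit alphabet {-6, ..., 6} are illustrative choices;
# the paper works with algebraic bases beta and alphabets derived from a
# dominant polynomial coefficient.

B = 10   # base
A = 6    # digits range over {-A, ..., A}; needs 2*A >= B + 1 (here 12 >= 11)

def parallel_add(x, y):
    """Add two signed-digit strings (least significant digit first).

    Every output digit depends only on positions i and i-1, so all positions
    can be processed independently, i.e. in constant parallel time.
    """
    n = max(len(x), len(y))
    x = x + [0] * (n - len(x))
    y = y + [0] * (n - len(y))
    w = [x[i] + y[i] for i in range(n)]                      # position-wise sums
    # transfer digit decided locally from w[i] alone (no carry propagation)
    t = [1 if wi >= A else -1 if wi <= -A else 0 for wi in w]
    z = [w[i] - B * t[i] + (t[i - 1] if i > 0 else 0) for i in range(n)]
    z.append(t[n - 1])                                       # leading transfer
    return z

def value(digits):
    """Numerical value of a signed-digit string, least significant digit first."""
    return sum(d * B**i for i, d in enumerate(digits))

if __name__ == "__main__":
    x = [4, -6, 5, 3]
    y = [6, 2, -5, 1]
    z = parallel_add(x, y)
    assert value(z) == value(x) + value(y)
    assert all(-A <= d <= A for d in z)
    print(z, value(z))
```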
A Monte Carlo simulation model of thin film growth based on a parallel algorithm is presented. A non-smooth substrate with a special defect mode is introduced into the model. A regionalization method is used to divide the substrate into sub-regions, and the partitioning is adapted to the defect mode. The effects of the surface defect mode and the substrate temperature, for instance on the nucleation ratio and the average island size, are studied with the parallel Monte Carlo method. The kinetic process of thin film growth in the defect mode is also discussed. Results show that the surface defect mode promotes crystal nucleation. Analysis of the parallel simulation results shows that the density of defect points, the substrate temperature, and the number of processors decisively affect the parallel efficiency and speedup. With the defect mode, large grain sizes can be obtained more readily, and the parallel algorithm of this model can guide simulations of non-smooth substrates. (C) 2011 Elsevier B.V. All rights reserved.
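As an illustration of the region-decomposition idea only (not the paper's model), here is a toy Python sketch in which a 1+1-dimensional solid-on-solid substrate is split into strips handled by separate processes, and defect sites are given a higher sticking probability; the lattice size, probabilities and defect layout are assumptions made for the example.

```python
# Hedged toy sketch of region-decomposed Monte Carlo deposition.
# The lattice size, sticking probabilities and defect layout are illustrative
# assumptions; the paper's model (defect modes, temperature dependence,
# inter-region communication) is richer than this.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

L, N_REGIONS, N_DEPOSIT = 256, 4, 20000
P_STICK_NORMAL, P_STICK_DEFECT = 0.3, 0.9   # defects act as nucleation centres

def grow_region(args):
    seed, width, defect_cols = args
    rng = np.random.default_rng(seed)
    height = np.zeros(width, dtype=int)       # 1+1D solid-on-solid column heights
    is_defect = np.zeros(width, dtype=bool)
    is_defect[list(defect_cols)] = True
    for _ in range(N_DEPOSIT // N_REGIONS):
        col = rng.integers(width)
        p = P_STICK_DEFECT if is_defect[col] else P_STICK_NORMAL
        if rng.random() < p:
            height[col] += 1                   # particle sticks at this column
    return height

if __name__ == "__main__":
    width = L // N_REGIONS
    tasks = [(seed, width, {width // 3, 2 * width // 3}) for seed in range(N_REGIONS)]
    with ProcessPoolExecutor(max_workers=N_REGIONS) as pool:
        strips = list(pool.map(grow_region, tasks))
    surface = np.concatenate(strips)           # stitch sub-regions back together
    print("mean height:", surface.mean(), "roughness:", surface.std())
```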
This paper describes in detail a numerical scheme designed for direct numerical simulation (DNS) of turbulent drag reduction. The hybrid spatial scheme combines Fourier spectral accuracy in two directions with sixth-order compact finite differences for the first- and second-order wall-normal derivatives, while time marching can be up to fourth-order accurate. High-resolution, high-drag-reduction viscoelastic DNS is made possible through domain decomposition with a two-dimensional MPI Cartesian grid that alternately splits two directions of space ('pencil' decomposition). The resulting algorithm has been shown to scale properly up to 16384 cores on the Blue Gene/P at IDRIS-CNRS, France. Drag reduction is modeled for the three-dimensional wall-bounded channel flow of a FENE-P dilute polymer solution, which mimics the injection of heavy-weight flexible polymers into a Newtonian solvent. We present results for four high-drag-reduction viscoelastic flows with friction Reynolds numbers Re_τ0 = 180, 395, 590 and 1000, all of them sharing the same friction Weissenberg number We_τ0 = 115 and the same rheological parameters. A primary analysis of the DNS database indicates that turbulence modification by the presence of polymers is Reynolds-number dependent. This translates into a smaller percent drag reduction with increasing Reynolds number, from 64% at Re_τ0 = 180 down to 59% at Re_τ0 = 1000, and a steeper mean current at small Reynolds number. The Reynolds-number dependence is also visible in second-order statistics and in the vortex structures visualized with iso-surfaces of the Q-criterion. (C) 2010 Elsevier Ltd. All rights reserved.
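To make the 'pencil' decomposition concrete, the following single-process Python sketch shows how a 3D field is viewed as x-pencils and then y-pencils so that an FFT can be applied along the axis each pencil owns completely; in the actual solver the repartitioning is an MPI all-to-all over a 2D Cartesian communicator, for which a numpy copy stands in here. The grid sizes and the 2x2 process grid are illustrative assumptions.

```python
# Hedged single-process sketch of 'pencil' decomposition: the global 3D field
# is viewed as pencils aligned with one axis at a time, and FFTs are applied
# along the axis each pencil owns completely.  In the real solver the
# repartitioning between x-pencils and y-pencils is an MPI all-to-all over a
# 2D Cartesian communicator; here a numpy transpose/copy stands in for it.
import numpy as np

nx, ny, nz = 8, 6, 4                       # illustrative grid sizes
py, pz = 2, 2                              # 2D process grid (py * pz "ranks")
u = np.random.rand(nx, ny, nz)

# x-pencils: each "rank" owns the full x-extent and an (ny/py, nz/pz) patch.
x_pencils = [u[:, j*ny//py:(j+1)*ny//py, k*nz//pz:(k+1)*nz//pz]
             for j in range(py) for k in range(pz)]
x_hat = [np.fft.fft(p, axis=0) for p in x_pencils]   # spectral in x

# reassemble, then re-split into y-pencils (stand-in for the MPI exchange)
u_hat_x = np.empty((nx, ny, nz), dtype=complex)
for idx, (j, k) in enumerate((j, k) for j in range(py) for k in range(pz)):
    u_hat_x[:, j*ny//py:(j+1)*ny//py, k*nz//pz:(k+1)*nz//pz] = x_hat[idx]
y_pencils = [u_hat_x[i*nx//py:(i+1)*nx//py, :, k*nz//pz:(k+1)*nz//pz]
             for i in range(py) for k in range(pz)]
y_hat = [np.fft.fft(p, axis=1) for p in y_pencils]   # spectral in y

# the wall-normal (z) direction would be treated with compact finite
# differences rather than an FFT, as described in the abstract.
print(len(x_pencils), "x-pencils,", len(y_pencils), "y-pencils")
```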
As the size of FPGA devices grows, the long run-time of placement is becoming a great challenge for the FPGA design flow. Simulated annealing is the best-known method applied to this problem due to its good quality of result (QoR), but its computation time is hardly acceptable. In this paper, we propose a parallel placement algorithm named MPP-SA (Multi-core Parallel Placement algorithm based on Simulated Annealing). Our goal is to provide a fast placement algorithm with high quality. MPP-SA has the same annealing schedule as traditional simulated annealing, but it uses a parallel approach to move blocks concurrently with multiple threads running on different cores of the same processor. To ensure the correctness of the results, MPP-SA also uses synchronization and locking mechanisms, which bring some overhead. Nevertheless, experimental results show that these overheads do not seriously affect the performance of our algorithm, especially for large circuits. Compared with the placement algorithm TPlace in VPR5.0, MPP-SA is able to decrease the run-time on 5 benchmark circuits of different sizes by an average of 32%-42% without losing QoR.
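The following toy Python sketch illustrates only the general idea of threads moving blocks concurrently under per-block locks; the netlist, cost function, annealing schedule and thread count are invented for the example, and this is not MPP-SA itself.

```python
# Hedged toy sketch of lock-based parallel simulated-annealing placement.
# The netlist, cost function (Manhattan length per 2-pin net), schedule and
# thread count are illustrative; only the "threads move blocks concurrently
# under per-block locks" idea is shown.
import math, random, threading

N_BLOCKS, GRID, N_THREADS, MOVES_PER_T = 64, 16, 4, 400
nets = [(random.randrange(N_BLOCKS), random.randrange(N_BLOCKS)) for _ in range(96)]
pos = [(random.randrange(GRID), random.randrange(GRID)) for _ in range(N_BLOCKS)]
locks = [threading.Lock() for _ in range(N_BLOCKS)]        # one lock per block

def net_cost(b):
    return sum(abs(pos[a][0]-pos[c][0]) + abs(pos[a][1]-pos[c][1])
               for a, c in nets if b in (a, c))

def worker(temp):
    for _ in range(MOVES_PER_T // N_THREADS):
        b = random.randrange(N_BLOCKS)
        with locks[b]:                                     # serialize moves of this block
            old, new = pos[b], (random.randrange(GRID), random.randrange(GRID))
            before = net_cost(b)
            pos[b] = new
            delta = net_cost(b) - before
            if delta > 0 and random.random() >= math.exp(-delta / temp):
                pos[b] = old                               # reject uphill move

temp = 10.0
while temp > 0.1:                                          # simple geometric schedule
    threads = [threading.Thread(target=worker, args=(temp,)) for _ in range(N_THREADS)]
    for t in threads: t.start()
    for t in threads: t.join()
    temp *= 0.9

print("final wirelength:",
      sum(abs(pos[a][0]-pos[c][0]) + abs(pos[a][1]-pos[c][1]) for a, c in nets))
```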
ISBN (print): 9783037851579
Image thinning is one of the important steps in fingerprint preprocessing, and most fingerprint recognition algorithms detect characteristic points on the thinned image. In this paper, we identify some shortcomings of the OPTA and mathematical-morphology thinning algorithms and analyze the causes of defects such as glitches, snags, and incomplete thinning. An improved algorithm is proposed that is faster, produces fewer glitches, and thins the image completely.
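The improved algorithm itself is not specified in this abstract. As a generic reference for what a thinning pass does, here is a compact Python implementation of the classical Zhang-Suen thinning iteration; it is neither OPTA nor the paper's algorithm.

```python
# Hedged reference sketch: classical Zhang-Suen thinning on a binary image.
# This is NOT the paper's improved algorithm (nor OPTA); it only shows what a
# thinning pass over a fingerprint bitmap looks like.
import numpy as np

def zhang_suen(img):
    """Thin a binary image (1 = ridge pixel, 0 = background) to unit width."""
    img = img.astype(np.uint8).copy()
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for y in range(1, img.shape[0] - 1):
                for x in range(1, img.shape[1] - 1):
                    if img[y, x] == 0:
                        continue
                    p = [img[y-1, x], img[y-1, x+1], img[y, x+1], img[y+1, x+1],
                         img[y+1, x], img[y+1, x-1], img[y, x-1], img[y-1, x-1]]
                    b = sum(p)                            # number of ridge neighbours
                    a = sum(p[i] == 0 and p[(i+1) % 8] == 1 for i in range(8))
                    cond = (p[0]*p[2]*p[4] == 0 and p[2]*p[4]*p[6] == 0) if step == 0 \
                      else (p[0]*p[2]*p[6] == 0 and p[0]*p[4]*p[6] == 0)
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((y, x))
            for y, x in to_delete:
                img[y, x] = 0
            changed = changed or bool(to_delete)
    return img

if __name__ == "__main__":
    test = np.zeros((12, 12), dtype=np.uint8)
    test[3:9, 3:9] = 1                 # a solid 6x6 blob thins to a thin skeleton
    print(zhang_suen(test))
```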
Parallel algorithms for direct methods of analysis and solution of linear algebra problems with sparse, symmetric, irregularly structured matrices are considered, and their performance is investigated. Upper estimates of the speedup and efficiency factors are obtained for a parallel algorithm for the triangular decomposition of sparse matrices. Some results of numerical experiments carried out on a MIMD computer are given.
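For readers less familiar with the reported quantities, the standard definitions of speedup and efficiency are recalled below, together with the generic Amdahl-type upper bound; the paper's sharper bounds specific to sparse triangular decomposition are not reproduced here.

```latex
% Standard definitions (T_1 = sequential time, T_p = time on p processors)
% and the generic Amdahl-type upper bound with sequential work fraction f.
S_p = \frac{T_1}{T_p}, \qquad
E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}, \qquad
S_p \le \frac{1}{f + (1-f)/p}.
```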
ISBN (print): 9783540874768
The new parallel incremental Support Vector Machine (SVM) algorithm aims at classifying very large datasets on graphics processing units (GPUs). SVM and kernel-related methods have been shown to build accurate models, but the learning task usually requires solving a quadratic programming problem, so that for large datasets it demands large memory capacity and long run times. We extend the recent finite Newton classifier to build a parallel incremental algorithm. The new algorithm uses graphics processors to obtain high performance at low cost. Numerical test results on the UCI and Delve dataset repositories show that our parallel incremental algorithm using GPUs is about 45 times faster than a CPU implementation and often significantly more than 100 times faster than the state-of-the-art algorithms LibSVM, SVM-perf and CB-SVM.
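As a rough CPU-side sketch of the "finite Newton classifier trained incrementally" idea (the paper's GPU implementation and exact formulation are not reproduced), the following numpy code runs a few generalized-Newton steps of a linear L2-SVM on successive data chunks, warm-starting each chunk from the previous weights; the dataset, chunking and hyperparameters are assumptions made for the example.

```python
# Hedged numpy sketch of an incremental linear L2-SVM trained with
# finite-Newton steps on successive data chunks.  This mirrors the idea of
# "finite Newton classifier + incremental processing"; the paper's GPU
# implementation and exact update rules are not reproduced here.
import numpy as np

def newton_l2svm(X, y, w, C=1.0, iters=5):
    """A few generalized-Newton steps on f(w) = 0.5||w||^2 + 0.5*C*sum(max(0, 1 - y*Xw)^2)."""
    d = X.shape[1]
    for _ in range(iters):
        margin = 1.0 - y * (X @ w)
        sv = margin > 0                                   # active (violating) points
        grad = w - C * (X[sv].T @ (y[sv] * margin[sv]))
        H = np.eye(d) + C * (X[sv].T @ X[sv])             # generalized Hessian
        w = w - np.linalg.solve(H, grad)                  # full Newton step
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 20))
    true_w = rng.normal(size=20)
    y = np.sign(X @ true_w + 0.1 * rng.normal(size=5000))
    w = np.zeros(20)
    for chunk in np.array_split(np.arange(5000), 10):     # incremental: one chunk at a time
        w = newton_l2svm(X[chunk], y[chunk], w)           # warm-start from previous chunk
    acc = np.mean(np.sign(X @ w) == y)
    print(f"training accuracy after incremental passes: {acc:.3f}")
```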
ISBN (print): 9781612843490
To accelerate the deblocking filter, which accounts for a significant percentage of H.264/AVC decoding time, some researchers use multi-core platforms to achieve the required performance. We study the problem in the context of many-core systems. Parallelizing the deblocking filter on a many-core platform is challenging, not only because its complicated data dependencies provide insufficient parallelism for so many cores, but also because parallelization may incur significant synchronization overhead. We present a new method to exploit the implicit parallelism and reduce the synchronization overhead. We apply our implementation to the deblocking filter of the H.264/AVC reference software JM15.1 on the Tile64 platform. Using 62 cores, the proposed method achieves up to 817%, 604% and 532% speedup for CIF, SD and HD videos, respectively, compared to the well-known wavefront method.
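For contrast with the proposed method, here is a Python sketch of the baseline wavefront schedule the paper compares against: macroblock (r, c) may be filtered once its left and top neighbours are done, so every anti-diagonal can be processed in parallel. The frame size and thread count are illustrative, and the per-macroblock work is a stub.

```python
# Hedged sketch of the baseline wavefront schedule for macroblock-level
# deblocking: MB (r, c) can be filtered once its left (r, c-1) and top
# (r-1, c) neighbours are done, so all MBs on one anti-diagonal are
# independent and can run in parallel.  Only the baseline is sketched; the
# paper's method extracts more parallelism than this.
from concurrent.futures import ThreadPoolExecutor

ROWS, COLS = 9, 11            # e.g. a 176x144 (QCIF) frame in 16x16 macroblocks

def filter_macroblock(rc):
    r, c = rc                 # stand-in for the real edge-filtering work
    return (r, c)

def wavefront_deblock(rows, cols, workers=4):
    done = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for d in range(rows + cols - 1):                       # anti-diagonal index
            diagonal = [(r, d - r) for r in range(rows) if 0 <= d - r < cols]
            done.extend(pool.map(filter_macroblock, diagonal)) # parallel within a diagonal
    return done

if __name__ == "__main__":
    order = wavefront_deblock(ROWS, COLS)
    assert len(order) == ROWS * COLS
    # every macroblock appears after its top and left neighbours in the schedule
    pos = {mb: i for i, mb in enumerate(order)}
    assert all(pos[(r, c)] > pos.get((r - 1, c), -1) and pos[(r, c)] > pos.get((r, c - 1), -1)
               for r, c in pos)
    print("wavefront schedule covers", len(order), "macroblocks")
```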
ISBN (print): 9783642178313
To accelerate the deblocking filter, which accounts for a significant percentage of H.264/AVC decoding time, some studies use the wavefront method to achieve the required performance on multi-core platforms. We study the problem in the context of many-core systems and present a new method to exploit the implicit parallelism. We apply our implementation to the deblocking filter of the H.264/AVC reference software JM15.1 on a 64-core TILERA processor and achieve more than an eleven-fold speedup for 1280*720 (HD) videos. Meanwhile, the proposed method achieves an overall decoding speedup of 140% for HD videos. Compared to the wavefront method, we also obtain a significant speedup of 200% for 720*576 (SD) videos.