The lattice Boltzmann method (LBM) is employed to simulate the binary flow of an oxygen/nitrogen mixture passing through a densely packed bed of spherical particles. Simulations are based on the most recently proposed entropic lattice Boltzmann model for multi-component flows, using the D3Q27 lattice stencil. The curved solid boundary of the particles is treated accurately via linear interpolation. To lower the total computational cost and time of the simulations, an implementation on graphics processing units (GPUs) is also presented. Since the workload associated with each iteration is higher than that of conventional 3D LBM simulations, special attention is paid to obtaining the best computational performance on GPUs. Performance gains of one order of magnitude over optimised multi-core CPUs are achieved for the complex flow of interest on Fermi-generation GPUs. Moreover, the numerical results for a three-dimensional benchmark flow show excellent agreement with the available analytical data.
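For readers unfamiliar with the D3Q27 stencil named above, the following is a minimal sketch of its 27 discrete velocities and the standard lattice weights (8/27, 2/27, 1/54 and 1/216 for rest, face, edge and corner directions). The function name `d3q27_stencil` and the velocity ordering are assumptions made for illustration only; the abstract does not describe how the paper's code stores the stencil.

```python
from fractions import Fraction
from itertools import product


def d3q27_stencil():
    """Return (velocities, weights) for the D3Q27 lattice.

    Hypothetical helper for illustration; the ordering of the
    velocities is arbitrary and not taken from the paper.
    """
    # The weight depends only on how many velocity components are non-zero.
    weight_by_nonzeros = {
        0: Fraction(8, 27),   # rest velocity (1 direction)
        1: Fraction(2, 27),   # face neighbours (6 directions)
        2: Fraction(1, 54),   # edge neighbours (12 directions)
        3: Fraction(1, 216),  # corner neighbours (8 directions)
    }
    # All combinations of -1, 0, +1 along x, y and z: 3**3 = 27 velocities.
    velocities = list(product((-1, 0, 1), repeat=3))
    weights = [weight_by_nonzeros[sum(c != 0 for c in v)] for v in velocities]
    return velocities, weights


velocities, weights = d3q27_stencil()

# Second moment along x: sum_i w_i * c_ix^2, which equals the squared
# lattice speed of sound (1/3 in lattice units) for this stencil.
cs2 = sum(w * v[0] * v[0] for v, w in zip(velocities, weights))
```

Exact fractions are used here so that the normalisation and isotropy conditions can be checked without floating-point error; a production GPU code would store these values as floating-point constants.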
In this paper, the author presents an optimised parallel implementation of a flexible maximum a-posteriori decoder for synchronisation error correcting codes, supporting a very wide range of code sizes and channel conditions. On mid-range GPUs the author demonstrates decoding speedups of more than two orders of magnitude over a central processing unit (CPU) implementation of the same optimised algorithm, and more than an order of magnitude over the author's earlier GPU implementation. The main challenge is to maintain high parallelisation efficiency over a wide range of code sizes and channel conditions, and across different execution hardware. The author ensures this with a dynamic strategy for choosing parallel execution parameters at run-time. The author also presents a variant that trades some decoding speed for a significantly reduced memory requirement, with no loss in the decoder's error correction performance. The increased throughput of this implementation and its ability to work with less memory allow larger codes and poorer channel conditions to be analysed, and make practical use of such codes more feasible.
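The abstract does not say how the parallel execution parameters are chosen. Purely as an illustration of what a run-time choice can look like, the sketch below picks a one-dimensional grid and block size from the amount of work. The function `choose_launch_config`, the warp size of 32 and the 1024-thread block limit are assumptions for this example and are not the author's strategy.

```python
WARP_SIZE = 32         # threads per warp on NVIDIA GPUs
MAX_BLOCK_SIZE = 1024  # assumed per-block thread limit; device dependent


def choose_launch_config(work_items, max_block_size=MAX_BLOCK_SIZE):
    """Pick (grid_size, block_size) for a 1D kernel over `work_items` elements.

    Hypothetical heuristic: use just enough whole warps to cover the work,
    capped at the block-size limit, then as many blocks as are needed.
    Assumes `max_block_size` is a multiple of WARP_SIZE.
    """
    if work_items <= 0:
        raise ValueError("work_items must be positive")
    # Ceiling division: number of warps needed to cover all work items.
    warps_needed = -(-work_items // WARP_SIZE)
    block_size = min(warps_needed * WARP_SIZE, max_block_size)
    # Ceiling division again: number of blocks needed at that block size.
    grid_size = -(-work_items // block_size)
    return grid_size, block_size


small = choose_launch_config(100)   # small problem: one partly filled block
large = choose_launch_config(5000)  # large problem: several full-size blocks
```

Computing the configuration from the problem size, rather than fixing it at compile time, is one simple way to avoid launching mostly idle blocks for small inputs while still filling the device for large ones.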