Search Results - Inner Mongolia University Library

34th IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Vaughn, Nathan; Wilson, Leighton; Krasny, Robert (Univ Michigan, Dept Math, Ann Arbor, MI 48109, USA)

ISBN (digital): 9781728174457

ISBN (print): 9781728174457

We present an MPI + OpenACC implementation of the kernel-independent barycentric Lagrange treecode (BLTC) for fast summation of particle interactions on GPUs. The distributed memory parallelization uses recursive coordinate bisection for domain decomposition and MPI remote memory access to build locally essential trees on each rank. The particle interactions are organized into target batch/source cluster interactions, which map efficiently onto the GPU; target batching provides an outer level of parallelism, while the direct sum form of the barycentric particle-cluster approximation provides an inner level of parallelism. The performance of the GPU-accelerated BLTC is demonstrated on several test cases with up to 1 billion particles interacting via the Coulomb potential and the Yukawa potential.

Keywords: Heterogeneous (hybrid) systems; Graphics processors; Load balancing and task assignment; Interpolation; Numerical algorithms; Parallel algorithms; Chebyshev approximation and theory; Integral equations
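
The direct sum form of the particle-cluster approximation mentioned in the abstract can be sketched in serial Python, assuming the standard tensor-product barycentric Lagrange interpolation at Chebyshev points of the second kind. The sources y_j of a cluster are replaced by modified charges q_hat_k = sum_j L_k(y_j) q_j located at the cluster's Chebyshev grid points s_k, and the potential at a well-separated target x is approximated by phi(x) ≈ sum_k G(x, s_k) q_hat_k. This is a minimal illustration, not the authors' MPI + OpenACC code: there is no tree, no multipole acceptance criterion, no target batching, and no GPU offload. All function names, the degree n = 8, and the Yukawa parameter kappa = 0.5 are hypothetical choices.

```python
import math
import random


def chebyshev_points(a, b, n):
    """n + 1 Chebyshev points of the second kind, mapped to [a, b]."""
    return [0.5 * (a + b) + 0.5 * (b - a) * math.cos(math.pi * k / n) for k in range(n + 1)]


def barycentric_weights(n):
    """Barycentric weights for second-kind Chebyshev points: (-1)^k, halved at both ends."""
    w = [(-1.0) ** k for k in range(n + 1)]
    w[0] *= 0.5
    w[n] *= 0.5
    return w


def lagrange_basis(x, pts, w):
    """Values of all Lagrange basis polynomials L_k(x) via the barycentric formula."""
    for k, p in enumerate(pts):
        if x == p:  # x coincides with a node, so L_k(x) is a unit vector
            return [1.0 if j == k else 0.0 for j in range(len(pts))]
    terms = [wk / (x - p) for wk, p in zip(w, pts)]
    denom = sum(terms)
    return [t / denom for t in terms]


def modified_charges(sources, charges, lo, hi, n):
    """Proxy charges q_hat[i,j,k] = sum_j L_i(y_1) L_j(y_2) L_k(y_3) q for a box [lo, hi]."""
    grids = [chebyshev_points(lo[d], hi[d], n) for d in range(3)]
    w = barycentric_weights(n)
    m = n + 1
    q_hat = [0.0] * (m ** 3)
    for y, q in zip(sources, charges):
        L = [lagrange_basis(y[d], grids[d], w) for d in range(3)]
        for i in range(m):
            for j in range(m):
                for k in range(m):
                    q_hat[(i * m + j) * m + k] += L[0][i] * L[1][j] * L[2][k] * q
    return grids, q_hat


def cluster_potential(x, grids, q_hat, kernel):
    """Direct sum over the cluster's proxy points: sum_k G(x, s_k) * q_hat_k."""
    m = len(grids[0])
    total = 0.0
    for i in range(m):
        for j in range(m):
            for k in range(m):
                s = (grids[0][i], grids[1][j], grids[2][k])
                total += kernel(x, s) * q_hat[(i * m + j) * m + k]
    return total


def coulomb(x, y):
    return 1.0 / math.dist(x, y)


def yukawa(x, y, kappa=0.5):  # kappa is a hypothetical screening parameter
    r = math.dist(x, y)
    return math.exp(-kappa * r) / r


def direct_sum(x, sources, charges, kernel):
    """Exact O(N) reference sum over all sources."""
    return sum(kernel(x, y) * q for y, q in zip(sources, charges))


random.seed(0)
sources = [(random.random(), random.random(), random.random()) for _ in range(200)]
charges = [random.uniform(0.0, 1.0) for _ in range(200)]
grids, q_hat = modified_charges(sources, charges, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), n=8)
target = (3.0, 3.0, 3.0)  # well separated from the unit-cube cluster
for kernel in (coulomb, yukawa):
    exact = direct_sum(target, sources, charges, kernel)
    approx = cluster_potential(target, grids, q_hat, kernel)
    print(kernel.__name__, exact, approx, abs(approx - exact) / abs(exact))
```

Once the modified charges are computed, each target evaluates the cluster with an independent plain sum over proxy points, which is consistent with the abstract's description of this sum as the inner level of parallelism.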

Hardware generation of arbitrary random number distributions from uniform distributions via the inversion method

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2007, Vol. 15, No. 8, pp. 952-962

Authors: Cheung, Ray C. C.; Lee, Dong-U; Luk, Wayne; Villasenor, John D. (Univ London Imperial Coll Sci Technol & Med, Dept Comp, London SW7 2AZ, England; Univ Calif Los Angeles, Dept Elect Engn, Los Angeles, CA 90095, USA)

We present an automated methodology for producing hardware-based random number generator (RNG) designs for arbitrary distributions using the inverse cumulative distribution function (ICDF). The ICDF is evaluated via piecewise polynomial approximation with a hierarchical segmentation scheme that involves uniform segments and segments with size varying by powers of two which can adapt to local function nonlinearities. Analytical error analysis is used to guarantee accuracy to one unit in the last place (ulp). Compact and efficient RNGs that can reach arbitrary multiples of the standard deviation sigma can be generated. For instance, a Gaussian RNG based on our approach for a Xilinx Virtex-4 XC4VLX100-12 field-programmable gate array produces 16-bit random samples up to 8.2 sigma. It occupies 487 slices, 2 block-RAMs, and 2 DSP-blocks. The design is capable of running at 371 MHz and generates one sample every clock cycle.

Keywords: Algorithms implemented in hardware; Automatic synthesis; Chebyshev approximation and theory; Computer arithmetic; Elementary function approximation; Error analysis; Gate arrays; Piecewise polynomial approximation
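
The inversion method described in the abstract can be illustrated with a floating-point Python sketch. It is not the authors' fixed-point hardware design, and their analytical error analysis guaranteeing 1 ulp accuracy is not reproduced. The sketch tabulates piecewise quadratic approximations of the Gaussian ICDF (the reference is statistics.NormalDist().inv_cdf) over one possible two-level segmentation: power-of-two segments toward u = 0, each split into uniform sub-segments. It selects the segment from the leading-one position of the uniform input, as a leading-zero counter would in hardware, and evaluates the segment polynomial with Horner's rule. BITS, SUBSEGS, DEGREE, the Chebyshev-node fitting, and all function names are hypothetical choices.

```python
import math
import random
from statistics import NormalDist

ICDF = NormalDist().inv_cdf  # target distribution: standard Gaussian (illustrative choice)
BITS = 20    # hypothetical width of the uniform input, excluding the sign bit
SUBSEGS = 8  # hypothetical number of uniform sub-segments per power-of-two segment
DEGREE = 2   # hypothetical polynomial degree per sub-segment


def divided_differences(xs, ys):
    """Newton divided-difference coefficients of the interpolant through (xs, ys)."""
    coef = list(ys)
    for j in range(1, len(xs)):
        for i in range(len(xs) - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    return coef


def newton_to_monomial(coef, nodes):
    """Expand the Newton form into monomial coefficients c[0] + c[1]*t + c[2]*t^2 + ..."""
    poly = [coef[-1]]
    for c, tk in zip(reversed(coef[:-1]), reversed(nodes[:-1])):
        # poly <- poly * (t - tk) + c
        poly = [a - tk * b for a, b in zip([0.0] + poly, poly + [0.0])]
        poly[0] += c
    return poly


def fit_segment(lo, hi):
    """Interpolate the ICDF at Chebyshev nodes of [lo, hi]; local coordinate t in [0, 1]."""
    ts = [0.5 - 0.5 * math.cos(math.pi * (2 * i + 1) / (2 * DEGREE + 2)) for i in range(DEGREE + 1)]
    ys = [ICDF(lo + t * (hi - lo)) for t in ts]
    return newton_to_monomial(divided_differences(ts, ys), ts)


# Octave k covers [2^-(k+1), 2^-k); each octave is split into SUBSEGS equal pieces.
TABLE = {}
for k in range(1, BITS + 1):
    lo = 2.0 ** -(k + 1)
    width = lo / SUBSEGS  # the octave's width equals its lower end
    TABLE[k] = [fit_segment(lo + s * width, lo + (s + 1) * width) for s in range(SUBSEGS)]


def sample_from_bits(sign, m):
    """Map a sign bit and an integer 1 <= m < 2^BITS to one approximate Gaussian sample."""
    u = m / 2.0 ** (BITS + 1)                    # u in (0, 0.5)
    k = BITS + 1 - m.bit_length()                # leading-one position selects the octave
    lo = 2.0 ** -(k + 1)
    width = lo / SUBSEGS
    s = min(int((u - lo) / width), SUBSEGS - 1)  # uniform sub-segment within the octave
    t = (u - lo - s * width) / width             # local coordinate in [0, 1)
    z = 0.0
    for c in reversed(TABLE[k][s]):              # Horner evaluation of the segment polynomial
        z = z * t + c
    return -z if sign else z                     # symmetry: ICDF(1 - u) = -ICDF(u)


def gaussian_sample(rng):
    return sample_from_bits(rng.getrandbits(1), rng.randrange(1, 2 ** BITS))


rng = random.Random(1)
samples = [gaussian_sample(rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum(x * x for x in samples) / len(samples) - mean ** 2
print(f"mean={mean:.4f} var={var:.4f} max|z|={max(abs(x) for x in samples):.2f}")
```

With BITS = 20 the smallest input is u = 2^-21, so this sketch reaches only about 4.9 sigma. Going further into the tail, as in the abstract's 8.2 sigma example, would require more input bits and more power-of-two segments in this scheme.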
