Search Results - Inner Mongolia University Library

A parallel implementation of the fast multipole method for Maxwell's equations

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, 2003, Vol. 43, No. 8, pp. 839-864

Author: Havé, P. Affiliation: Univ Paris 06, Lab Jacques Louis Lions, F-75252 Paris 05, France

It is well known that solving Maxwell's equations may produce large dense matrices, which makes it a computationally intensive problem. Even small problems require a huge amount of memory to manipulate matrices during the O(N^3) operations involved. The fast multipole method makes it possible to compress and approximate these matrices. Coupled with an iterative solution of the linear system, the complexity reduces to O(N_iter N log N) operations. In order to use multiprocessor machines and to reduce computation times, we propose here a parallel implementation of the fast multipole method. This article reports our first results, as well as the difficulties encountered. Copyright (C) 2003 John Wiley & Sons, Ltd.

Keywords: fast multipole method; Maxwell; integral formulation; parallel implementation

Experimental parallel implementation of a wavelet-based still image encoder

MICROPROCESSORS AND MICROSYSTEMS, 2005, Vol. 29, No. 4, pp. 155-167

Authors: Haapala, K; Lappalainen, V; Hämäläinen, TD. Affiliations: Tampere Univ Technol, Inst Digital & Comp Syst, FIN-33720 Tampere, Finland; CSC Sci Comp Ltd, FIN-02101 Espoo, Finland; Nokia Res Ctr, FIN-33101 Tampere, Finland

A still image encoder implementation is presented for a multi-DSP system called PARNEU, which was previously developed for neural network and signal processing applications. The core of the implementation is based on experimental mappings of the discrete wavelet transform (DWT) onto the parallel processor architecture. PARNEU has a flexible interconnection network architecture with message passing, which allows more processing units (PUs) to be added to the system whenever more computational power is needed. Program code can be written to adapt to the number of PUs. The presented encoder exploits this, with emphasis on load balancing among processors as well as on the balance between communication and computation. The performance of the implementation is measured with a scalable number of processors and compared with a sequential reference implementation. Results show that the DWT phase can be parallelized efficiently on PARNEU, with 95.6% of its time spent on true parallel computation. The overall speedup with four processors is 2.25, which could be improved by further optimizing the adaptive scanning phase of the encoder. (C) 2004 Elsevier B.V. All rights reserved.

Keywords: wavelet transform; image coding; parallel implementation
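
The abstract does not say how PARNEU splits the DWT, so the following is only a minimal Python sketch under stated assumptions: one level of a 2-D Haar transform (a stand-in for whatever wavelet filters the encoder actually uses), with rows divided into contiguous, near-equal blocks for hypothetical processing units, then columns divided the same way. Because each row, and then each column, is transformed independently, the result does not depend on the number of units; the transpose between passes stands in for the data exchange that a message-passing system would need. All function names are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def haar_1d(x: List[float]) -> List[float]:
    """One level of an averaging Haar transform (assumes even length):
    pairwise averages first, then pairwise half-differences."""
    half = len(x) // 2
    avg = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(half)]
    det = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(half)]
    return avg + det


def split_rows(n_rows: int, n_pus: int) -> List[range]:
    """Contiguous, near-equal blocks of row indices, one per (hypothetical) PU."""
    base, extra = divmod(n_rows, n_pus)
    blocks, start = [], 0
    for p in range(n_pus):
        size = base + (1 if p < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def transform_rows(m: Matrix, n_pus: int) -> Matrix:
    """Apply haar_1d to every row; no PU's block depends on another's.
    Here the blocks run one after another; on a parallel machine they could run concurrently."""
    out: Matrix = [[] for _ in m]
    for block in split_rows(len(m), n_pus):
        for r in block:
            out[r] = haar_1d(m[r])
    return out


def dwt2_level(img: Matrix, n_pus: int) -> Matrix:
    """One 2-D level: row pass, transpose (data redistribution), column pass."""
    rows_done = transform_rows(img, n_pus)
    cols_done = transform_rows(transpose(rows_done), n_pus)
    return transpose(cols_done)
```

For any 4 x 4 input, `dwt2_level(img, 4)` and `dwt2_level(img, 1)` return the same coefficients. Partitioning changes only which unit does the work, which is why row and column passes of a separable DWT are natural candidates for load-balanced parallel mapping.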

An area-efficient design of reconfigurable S-box for parallel implementation of block ciphers

IEICE ELECTRONICS EXPRESS, 2016, Vol. 13, No. 11, p. 20160138

Authors: Yang Jinjiang; Ge Wei; Cao Peng; Yang Jun. Affiliation: Southeast Univ, Natl ASIC Syst Engn Res Ctr, Nanjing 210096, Jiangsu, Peoples R China

A LUT with a Hierarchical Structure (HS-LUT) is proposed in this paper to realize the Substitution Box (S-box), the unique nonlinear component of block ciphers. Different types of S-boxes are analyzed, and four of their important features are summarized. Then, a custom 4R/1W memory is proposed as the storage unit of the reconfigurable S-box, and an example set of block ciphers is used to show how to obtain a satisfactory reconfigurable S-box structure. The proposed HS-LUT is applicable to different sets of ciphers, and it is implemented in TSMC 40 nm CMOS technology for comparison with similar work. The comparison shows that the proposed HS-LUT improves area efficiency by 6.88% to 51.76%.

Keywords: reconfigurable S-box; block cipher; parallel implementation; area cost

Doppler Keystone Transform: An Approach Suitable for parallel implementation of SAR Moving Target Imaging

IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2008, Vol. 5, No. 4, pp. 573-577

Authors: Li, Gang; Xia, Xiang-Gen; Peng, Ying-Ning. Affiliations: Tsinghua Univ, Dept Elect Engn, Beijing 100084, Peoples R China; Univ Delaware, Dept Elect & Comp Engn, Newark, DE 19716, USA

In this letter, a synthetic aperture radar (SAR) data reformatting approach named the Doppler Keystone transform (DKT) is proposed to correct the range migration of a moving target. By using the DKT, the SAR imaging procedure, i.e., the 2-D matched filtering, can be transformed into separate 1-D operations along the range or azimuth direction; therefore, the DKT is suitable for the parallel implementation of SAR imaging of moving targets. Our simulations show that by combining the DKT and the Doppler phase compensation methods, the moving target can be well imaged in the high signal-to-clutter ratio case.

Keywords: Moving target imaging; parallel implementation; synthetic aperture radar (SAR)

A structure-time parallel implementation of spike-based deep learning

NEURAL NETWORKS, 2019, Vol. 113, pp. 72-78

Authors: Wu, Xi; Wang, Yixuan; Tang, Huajin; Yan, Rui. Affiliation: Sichuan Univ, Coll Comp Sci, Neuromorph Comp Res Ctr, Chengdu 610065, Sichuan, Peoples R China

Motivated by the recent progress of deep spiking neural networks (SNNs), we propose a structure-time parallel strategy, based on the layered structure and on one-time computation over a time window, to speed up the prominent spike-based deep learning algorithm named broadcast alignment. Furthermore, a well-designed deep hierarchical model based on parallel broadcast alignment is proposed for object recognition. Parallel broadcast alignment achieves a significant 137× speedup compared with its original implementation on the MNIST dataset. The object recognition model achieves higher accuracy than the latest spiking deep convolutional neural networks on the ETH-80 dataset. The proposed parallel strategy and object recognition model will facilitate both the simulation of deep SNNs for studying spiking neural dynamics and the application of spike-based deep learning to real-world problems. (C) 2019 Elsevier Ltd. All rights reserved.

Keywords: Neuromorphic computing; parallel implementation; Spike-based deep learning; Deep spiking neural networks

A parallel implementation of the p-version of the finite element method

SIAM JOURNAL ON SCIENTIFIC COMPUTING, 1996, Vol. 17, No. 5, pp. 1040-1067

Authors: Zhu, YM; Katz, IN. Affiliation: Washington University, St. Louis, MO 63130, United States

An iterative method based on the textured decomposition (TD) is developed to solve the systems of linear equations arising in the p-version of the finite element method. The iteration is used to implement the p-version in parallel on an MIMD computer, the NCUBE/six. The objectives are twofold: to achieve high computational efficiency (that is, the computational load should be balanced among the processors) and, simultaneously, rapid convergence. A superelement, consisting of four adjacent rectangular finite elements, is constructed for two-dimensional problems. Based on the structural property of the shape functions, each superelement is partitioned into three blocks in two different ways, and a two-leaf TD is used. Computations for a superelement associated with each leaf are assigned to two processors and performed in parallel. A new preconditioner is introduced to accelerate convergence in a preconditioned textured decomposition (PTD). A special local communication strategy is used to avoid global assembly and global communication. Two model problems, a Laplace equation on a rectangular domain with a nearly singular solution and a Poisson equation on an L-shaped domain, are solved. The conjugate gradient (CG) method, the TD method, and the recursive textured decomposition (RTD) method, each with and without preconditioning, as well as the classical iterative methods (Jacobi, Gauss-Seidel (GS), and successive overrelaxation (SOR)), are used to solve both model problems. Load balance, speedup ratio, and the spectral radii of the various iterations are studied. The test results indicate that recursive PTD with a local communication strategy gives at least a 30% improvement in computational time over the other methods.

Keywords: p-version of the finite element method; parallel implementation
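
The textured decomposition itself is not described here in enough detail to reproduce. For comparison, the following minimal Python sketch shows one of the classical baselines the abstract names, Jacobi iteration, with the unknowns split into contiguous blocks for simulated processors. Each sweep reads only the previous iterate, so the blocks can be updated independently; Gauss-Seidel, by contrast, uses values already updated earlier in the same sweep. The block layout, tolerance, function names, and test system are illustrative assumptions, not the authors' code.

```python
from typing import List, Tuple


def block_bounds(n: int, n_procs: int) -> List[Tuple[int, int]]:
    """Contiguous [lo, hi) index ranges, one per (simulated) processor."""
    return [(p * n // n_procs, (p + 1) * n // n_procs) for p in range(n_procs)]


def jacobi(A: List[List[float]], b: List[float], n_procs: int = 2,
           tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Solve A x = b by Jacobi iteration (converges e.g. for diagonally dominant A)."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        x_new = [0.0] * n
        # Each block reads only the old iterate x, so in a parallel code
        # every processor could update its own block concurrently.
        for lo, hi in block_bounds(n, n_procs):
            for i in range(lo, hi):
                off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
                x_new[i] = (b[i] - off_diag) / A[i][i]
        # Stop when the largest change between sweeps falls below tol.
        if max(abs(u - v) for u, v in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x
```

For the diagonally dominant system A = [[4, -1, 0], [-1, 4, -1], [0, -1, 4]], b = [3, 2, 3], whose exact solution is [1, 1, 1], the iterates converge to that solution, and the result is identical for any number of blocks.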

A parallel implementation of NEC for the analysis of large structures

IEEE TRANSACTIONS ON ELECTROMAGNETIC COMPATIBILITY, 2003, Vol. 45, No. 2, pp. 177-188

Authors: Rubinstein, A; Rachidi, F; Rubinstein, M; Reusser, B. Affiliations: Swiss Fed Inst Technol, Power Syst Lab, CH-1015 Lausanne, Switzerland; Univ Appl Sci Western Switzerland, CH-1401 Yverdon, Switzerland; Def Procurement Agcy, NEMP Lab, CH-3700 Spiez, Switzerland

We present a new parallel version of the numerical electromagnetics code (NEC). The parallelization is based on a two-dimensional block-cyclic distribution of matrices on a rectangular processor grid, which ensures a theoretically optimal load balance among the processors. The code is portable to any platform supporting message-passing parallel environments such as the Message Passing Interface (MPI) and Parallel Virtual Machine (PVM), and it can even be executed on heterogeneous clusters of computers running different operating systems. The parallel NEC was successfully implemented on two parallel supercomputers with different architectures to test portability. Large structures containing up to 24,000 segments, which exceed currently available computer resources, were successfully simulated, and timing and memory results are presented. The code is applied to analyze the penetration of electromagnetic fields inside a vehicle. The computed results are validated against other numerical methods and against experimental data obtained using a simplified model of a vehicle (consisting essentially of the body shell) illuminated by an electromagnetic pulse (EMP) simulator.

Keywords: EMC; electromagnetic pulse (EMP) simulator; large structures; numerical electromagnetics code (NEC); parallel implementation; vehicle
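
The abstract relies on a two-dimensional block-cyclic distribution on a rectangular processor grid, the layout used by ScaLAPACK-style dense solvers. The following minimal Python sketch shows that mapping under the usual convention (0-based indices, first block owned by process (0, 0)). The block sizes `mb`, `nb` and the grid shape `prow` x `pcol` are hypothetical parameters, since the paper's actual values are not given here. Dealing blocks out cyclically is what keeps the load balanced: in dense factorizations such as LU, the shrinking active submatrix still spans all processes rather than a shrinking subset of them.

```python
from collections import Counter
from typing import Tuple


def owner(i: int, j: int, mb: int, nb: int, prow: int, pcol: int) -> Tuple[int, int]:
    """Grid coordinates (p, q) of the process that owns global entry (i, j)."""
    return (i // mb) % prow, (j // nb) % pcol


def local_index(g: int, blk: int, nprocs: int) -> int:
    """Position of global index g in its owner's local array, along one dimension."""
    return (g // (blk * nprocs)) * blk + g % blk


def load(m: int, n: int, mb: int, nb: int, prow: int, pcol: int) -> Counter:
    """Number of matrix entries stored by each process for an m x n matrix."""
    return Counter(owner(i, j, mb, nb, prow, pcol)
                   for i in range(m) for j in range(n))
```

For example, an 8 x 12 matrix with 2 x 2 blocks on a 2 x 3 grid gives each of the six processes exactly 16 entries, and global row 5 lands in local row 3 of process row 0.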

Total FETI domain decomposition method and its massively parallel implementation

ADVANCES IN ENGINEERING SOFTWARE, 2013, Vols. 60-61, pp. 14-22

Authors: Kozubek, T.; Vondrak, V.; Mensik, M.; Horak, D.; Dostal, Z.; Hapla, V.; Kabelikova, P.; Cermak, M. Affiliation: VSB Tech Univ Ostrava, Ostrava 70833, Czech Republic

We describe an efficient massively parallel implementation of our variant of the FETI-type domain decomposition method, called Total FETI (TFETI), with a lumped preconditioner. Special attention is paid to several variants of parallelizing the action of the projections onto the natural coarse grid and to the effective regularization of the subdomain stiffness matrices. Both the numerical and parallel scalability of the proposed TFETI method are demonstrated on a 2D elastostatic benchmark with up to 314,505,600 unknowns and 4800 cores. The results are also important for implementing scalable algorithms for the solution of nonlinear contact problems of elasticity by TFETI-based domain decomposition methods. (C) 2013 Civil-Comp Ltd and Elsevier Ltd. All rights reserved.

Keywords: Domain decomposition method; FETI; parallel implementation; Matrix regularization; Coarse problem; Scalability

A parallel implementation of ALFISH: simulating hydrological compartmentalization effects on fish dynamics in the Florida Everglades

SIMULATION MODELLING PRACTICE AND THEORY, 2005, Vol. 13, No. 1, pp. 55-76

Authors: Immanuel, A; Berry, MW; Gross, LJ; Palmer, M; Wang, DL. Affiliations: Univ Tennessee, Dept Comp Sci, Knoxville, TN 37996, USA; Univ Tennessee, Dept Ecol & Evolutionary Biol, Inst Environm Modeling, Knoxville, TN 37996, USA

A landscape modeling system called the Across Trophic-Level System Simulation (or ATLSS) has been developed in an effort to project the consequences of proposed water regulation plans for restoration of the South Florida Everglades. The ATLSS Landscape Fish Model (ALFISH) is a component of the ATLSS package (written in C++), which is used to provide dynamic measures of the spatially-explicit food resources available to wading birds, namely fish. The original (serial) ALFISH model requires as much as 30 h for 31-year simulations of specified scenarios. The model's execution time has been successfully improved (by a factor of 4.5) by partitioning its data input and executing the model simultaneously (in parallel) on those partitions. This paper demonstrates how the model's communications between partitioned data can be blocked to simulate compartmentalization effects on the input data. Minimal effects (below 1%) on the output of the original (serial) version are demonstrated. Regarding portability, both models (serial and parallel) have been successfully executed on two different computing environments: an SMP (Symmetric Multi-Processor) with 14 processors and a 14-processor network cluster. (C) 2004 Elsevier B.V. All rights reserved.

Keywords: computational ecology; parallel implementation; spatially-explicit simulation; symmetric multi-processor; cluster; network of workstations; data parallelism; load balancing
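
The abstract describes partitioning the model's input data and blocking communication between partitions to imitate compartmentalization. ALFISH's fish dynamics are far richer than this, so the following is only a toy Python sketch of the mechanism, with 1-D explicit diffusion standing in for the model's movement rules (an assumption). Each partition reads one ghost value from each neighboring partition per step. The hypothetical `blocked` flag replaces those ghost values with the partition's own edge value, so nothing crosses a partition boundary.

```python
from typing import List


def diffuse_partitioned(field: List[float], n_parts: int, steps: int,
                        rate: float = 0.25, blocked: bool = False) -> List[float]:
    """Explicit 1-D diffusion on a field split into contiguous partitions.

    Domain ends are reflective (zero flux). With blocked=True, partition
    edges are reflective too, acting like compartment walls.
    """
    n = len(field)
    bounds = [(p * n // n_parts, (p + 1) * n // n_parts) for p in range(n_parts)]
    u = list(field)
    for _ in range(steps):
        new = list(u)
        for lo, hi in bounds:
            for i in range(lo, hi):
                # Neighbor values; at lo and hi - 1 these are "ghost" reads
                # from the adjacent partition (the communication step).
                left = u[i - 1] if i > 0 else u[i]
                right = u[i + 1] if i < n - 1 else u[i]
                if blocked and i == lo:
                    left = u[i]   # block the exchange with the left partition
                if blocked and i == hi - 1:
                    right = u[i]  # block the exchange with the right partition
                new[i] = u[i] + rate * (left - 2 * u[i] + right)
        u = new
    return u
```

Starting from [1, 1, 1, 1, 0, 0, 0, 0] with two partitions, `blocked=True` leaves the field unchanged, whereas the unblocked run lets values spread across the partition boundary. In both cases the total is conserved, and with communication open the result does not depend on how many partitions are used.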

An and/or-parallel implementation of AKL

NEW GENERATION COMPUTING, 1996, Vol. 14, No. 1, pp. 31-52

Authors: Montelius, J; Ali, KAM. Affiliation: Swedish Institute of Computer Science (SICS), Kista, Sweden

The Agents Kernel Language (AKL) is a general-purpose concurrent constraint language. It combines the programming paradigms of search-oriented languages such as Prolog and process-oriented languages such as GHC. The paper focuses on three essential issues in the parallel implementation of AKL for shared-memory multiprocessors: how to maintain multiple binding environments, how to represent the execution state, and how to distribute work among workers. A simple scheme is used for maintaining multiple binding environments: a worker will immediately see conditional bindings placed on variables, and all workers will have a coherent view of the constraint stores. A locking scheme is used that entails little overhead for operations on local variables. The goals in a guard are represented in a way that allows them to be inserted and removed without any locking. Continuations are used to represent sequences of untried goals. This representation keeps the granularity of work coarser. Available work is distributed among workers in such a way that hot spots are avoided. And-tasks and or-tasks are distributed and scheduled in a uniform way.

Keywords: logic programming; concurrent language; parallel implementation; binding scheme
