3-D magnetotelluric (MT) forward modeling has always been faced with the problems of high memory requirements and long computing times. In this article, we design a scalable parallel algorithm for 3-D MT finite element modeling in anisotropic media. The parallel algorithm is based on distributed mesh storage, supports multiple levels of parallel granularity, and is implemented with multiple tools. The Message Passing Interface (MPI) is used to exploit process-level parallelism across subdomains, frequencies, and equation solving. Thread-level parallelism for merge sorting, element analysis, matrix assembly, and imposing Dirichlet boundary conditions is developed with Open Multi-Processing (OpenMP). We validate the algorithm through several model simulations and study the effects of topography and conductivity anisotropy on apparent resistivities and phase responses. Scalability tests are performed on the Tianhe-2 supercomputer to analyze the parallel performance of the different parallel granularities. Three parallel direct solvers, Supernodal LU (SUPERLU), the MUltifrontal Massively parallel sparse direct Solver (MUMPS), and the parallel Sparse matriX package (PASTIX), are compared for solving the sparse systems of equations. As a result, reasonable parallel parameters are suggested for practical applications. The developed parallel algorithm is shown to be efficient and scalable.
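The abstract describes the thread-level parallelism only at a high level; as a minimal sketch of the OpenMP-based element analysis and matrix assembly step it mentions (not the authors' 3-D MT code), the following toy example assembles a 1-D finite element stiffness matrix in parallel, with atomic updates guarding shared global entries.

```cpp
// Minimal sketch: OpenMP-parallel element analysis and assembly for a
// toy 1-D Laplace problem (NOT the authors' 3-D MT formulation).
#include <cstdio>
#include <vector>

int main() {
    const int num_elements = 1000;          // toy mesh: a 1-D line of elements
    const int num_nodes = num_elements + 1;
    const double h = 1.0 / num_elements;    // uniform element length

    // Dense global matrix only for illustration; real codes use sparse storage.
    std::vector<double> K(static_cast<size_t>(num_nodes) * num_nodes, 0.0);

    // Element analysis + assembly: each thread processes a block of elements.
    #pragma omp parallel for
    for (int e = 0; e < num_elements; ++e) {
        // Local 2x2 stiffness matrix of a linear 1-D element.
        const double ke[2][2] = { {  1.0 / h, -1.0 / h },
                                  { -1.0 / h,  1.0 / h } };
        const int nodes[2] = { e, e + 1 };

        // Scatter into the global matrix; atomics guard shared entries.
        for (int a = 0; a < 2; ++a)
            for (int b = 0; b < 2; ++b) {
                #pragma omp atomic
                K[static_cast<size_t>(nodes[a]) * num_nodes + nodes[b]] += ke[a][b];
            }
    }

    std::printf("K[0][0] = %g (expected %g)\n", K[0], 1.0 / h);
    return 0;
}
```

In a production code the global matrix would use a distributed sparse format, and the per-frequency and per-subdomain loops would be mapped to MPI processes, as the abstract describes.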
A series of Vlasov-type high power microwave launchers were investigated with several slant-cut angles. Finite element analysis using parallel computation was performed on a cluster of workstations and compared with low-power measurements made on a variety of such antennas. Good agreement between the main features of the radiation patterns was observed. However, not all details were reproduced.
Microprocessor clock rates, which for three decades doubled about every 18 months, have essentially stopped increasing. Instead, the number of processor cores (identical processing units capable of all usual microprocessor functions) in a microprocessor is increasing exponentially with time. In order to increase performance as the number of cores increases, measurement analysis software will have to take advantage of this parallelism. The objectives of this paper are to study one example of a measurement analysis with serial dependencies among the input data and to show that there is a practical parallel algorithm despite the data dependencies within the measured time series. The measurement analysis studied is transition localization in digital signals. A parallel scan-type algorithm is presented. The results of applying the parallel algorithm to both synthetic data and actual measured data are presented, and the speedup obtained on a twenty-four-core computer is analyzed. The parallel method produces exactly the same measurement results, bit for bit, as the original serial method. It is argued that what is desired for this and many other measurement processing algorithms is scalability in throughput with the number of cores. Such scalability is achieved by the proposed algorithm, with throughput scaling up to about a dozen cores.
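The abstract does not spell out the scan-type algorithm; the sketch below only illustrates the general pattern such algorithms follow for a stateful detector (a hysteresis comparator is assumed here): each chunk is processed in parallel for every possible incoming state, and a cheap sequential pass then selects the correct variant per chunk, reproducing the serial result bit for bit.

```cpp
// Sketch of a scan-style parallelization of transition localization in a
// digital signal with hysteresis (a serial dependency on the logic state).
// Each chunk is processed for BOTH possible incoming states; a short
// sequential pass over chunks then selects the correct variant.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

struct ChunkResult {
    int final_state;                 // state at end of chunk for the assumed incoming state
    std::vector<int> transitions;    // sample indices where the state flips
};

// Process samples [begin, end) assuming the signal enters in `state` (0 or 1).
static ChunkResult process_chunk(const std::vector<double>& x, int begin, int end,
                                 int state, double lo, double hi) {
    ChunkResult r;
    r.final_state = state;
    for (int i = begin; i < end; ++i) {
        if (r.final_state == 0 && x[i] > hi) { r.final_state = 1; r.transitions.push_back(i); }
        else if (r.final_state == 1 && x[i] < lo) { r.final_state = 0; r.transitions.push_back(i); }
    }
    return r;
}

int main() {
    // Synthetic noisy square wave.
    const int n = 1 << 20;
    std::vector<double> x(n);
    for (int i = 0; i < n; ++i)
        x[i] = ((i / 1000) % 2 ? 1.0 : 0.0) + 0.05 * std::sin(0.37 * i);

    const double lo = 0.3, hi = 0.7;   // hysteresis thresholds
    const int chunks = 64;
    const int chunk_len = (n + chunks - 1) / chunks;

    // Phase 1 (parallel): each chunk computed for both possible incoming states.
    std::vector<ChunkResult> variant[2];
    variant[0].resize(chunks);
    variant[1].resize(chunks);
    #pragma omp parallel for
    for (int c = 0; c < chunks; ++c) {
        const int b = c * chunk_len, e = std::min(n, b + chunk_len);
        variant[0][c] = process_chunk(x, b, e, 0, lo, hi);
        variant[1][c] = process_chunk(x, b, e, 1, lo, hi);
    }

    // Phase 2 (sequential, O(chunks)): resolve the true incoming state per chunk.
    std::vector<int> transitions;
    int state = 0;                     // assume the signal starts LOW
    for (int c = 0; c < chunks; ++c) {
        const ChunkResult& r = variant[state][c];
        transitions.insert(transitions.end(), r.transitions.begin(), r.transitions.end());
        state = r.final_state;
    }

    std::printf("found %zu transitions\n", transitions.size());
    return 0;
}
```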
This study proposes a novel factorization method for the DCT IV algorithm that allows it to be broken into four or eight sections that can be run in parallel. Moreover, the arithmetic complexity has been significantly reduced. Based on the proposed new algorithm for DCT IV, the speed performance has been improved substantially. The performance of this algorithm was verified on two different GPU systems produced by NVIDIA. The experimental results show that the novel proposed DCT algorithm achieves an impressive reduction in the total processing time. The proposed method is very efficient, improving the algorithm speed by more than the four-fold gain expected from splitting the DCT algorithm into four sections running in parallel: compared with the classical, sequential implementation of DCT IV, the speedups are at least 5.41-times on Jetson AGX Xavier and 10.11-times on Jetson Orin Nano. Using a parallel formulation with eight sections running in parallel, the improvement in speed performance is even higher, at least 8.08-times on Jetson AGX Xavier and 11.81-times on Jetson Orin Nano.
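The abstract does not reproduce the new factorization; purely as a baseline illustration of splitting DCT IV into independent sections, the sketch below evaluates the textbook O(N^2) DCT IV definition with its output range divided into parallel sections (the paper's fast factorized algorithm is not shown).

```cpp
// Baseline illustration only: the textbook O(N^2) DCT-IV with its output
// range split into independent sections that can run in parallel.
#include <cmath>
#include <cstdio>
#include <vector>

// X[k] = sum_{n=0}^{N-1} x[n] * cos( (pi/N) * (n + 0.5) * (k + 0.5) )
std::vector<double> dct_iv(const std::vector<double>& x, int sections) {
    const int N = static_cast<int>(x.size());
    std::vector<double> X(N, 0.0);
    const double pi = 3.14159265358979323846;

    // Each section owns a contiguous block of output indices, so the
    // sections are fully independent and can run on separate cores or GPUs.
    #pragma omp parallel for num_threads(sections)
    for (int s = 0; s < sections; ++s) {
        const int begin = s * N / sections;
        const int end = (s + 1) * N / sections;
        for (int k = begin; k < end; ++k) {
            double acc = 0.0;
            for (int n = 0; n < N; ++n)
                acc += x[n] * std::cos(pi / N * (n + 0.5) * (k + 0.5));
            X[k] = acc;
        }
    }
    return X;
}

int main() {
    std::vector<double> x(256);
    for (size_t n = 0; n < x.size(); ++n) x[n] = std::sin(0.1 * n);
    std::vector<double> X = dct_iv(x, 4);   // four parallel sections
    std::printf("X[0] = %f\n", X[0]);
    return 0;
}
```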
A nodal-based finite element formulation coupled with absorbing boundary conditions has been developed to solve open boundary microwave problems. Only parallel computation makes it possible to model large devices. We show in this paper how the code has been implemented on a parallel shared memory computer. Each step of the code is analyzed. Two types of storage for the matrix and two preconditioning methods for the conjugate gradient algorithm are compared in particular.
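The abstract names neither the two storage schemes nor the two preconditioners; as one common pairing of the kind being compared, the sketch below couples compressed sparse row (CSR) storage with a Jacobi-preconditioned conjugate gradient solver, with the matrix-vector product parallelized across rows.

```cpp
// Illustrative sketch (not the paper's code): CSR storage paired with a
// Jacobi-preconditioned conjugate gradient solver for a symmetric positive
// definite system, with the sparse matrix-vector product parallelized.
#include <cmath>
#include <cstdio>
#include <vector>

struct Csr {                       // compressed sparse row storage
    int n;
    std::vector<int> row_ptr, col;
    std::vector<double> val;
};

// y = A * x, rows processed in parallel.
static void spmv(const Csr& A, const std::vector<double>& x, std::vector<double>& y) {
    #pragma omp parallel for
    for (int i = 0; i < A.n; ++i) {
        double s = 0.0;
        for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            s += A.val[k] * x[A.col[k]];
        y[i] = s;
    }
}

// Jacobi-preconditioned CG.
static void pcg(const Csr& A, const std::vector<double>& b, std::vector<double>& x,
                int max_it = 1000, double tol = 1e-10) {
    const int n = A.n;
    std::vector<double> r(n), z(n), p(n), Ap(n), diag(n, 1.0);
    spmv(A, x, Ap);
    for (int i = 0; i < n; ++i) r[i] = b[i] - Ap[i];          // initial residual
    for (int i = 0; i < n; ++i)
        for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            if (A.col[k] == i) diag[i] = A.val[k];            // Jacobi preconditioner

    for (int i = 0; i < n; ++i) { z[i] = r[i] / diag[i]; p[i] = z[i]; }
    double rz = 0.0;
    for (int i = 0; i < n; ++i) rz += r[i] * z[i];

    for (int it = 0; it < max_it; ++it) {
        spmv(A, p, Ap);
        double pAp = 0.0;
        for (int i = 0; i < n; ++i) pAp += p[i] * Ap[i];
        const double alpha = rz / pAp;
        double rnorm = 0.0;
        for (int i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
            rnorm += r[i] * r[i];
        }
        if (std::sqrt(rnorm) < tol) { std::printf("converged in %d iterations\n", it + 1); return; }
        double rz_new = 0.0;
        for (int i = 0; i < n; ++i) { z[i] = r[i] / diag[i]; rz_new += r[i] * z[i]; }
        const double beta = rz_new / rz;
        rz = rz_new;
        for (int i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
    }
}

int main() {
    // Tiny 1-D Laplacian test matrix in CSR form.
    const int n = 100;
    Csr A; A.n = n; A.row_ptr.push_back(0);
    for (int i = 0; i < n; ++i) {
        if (i > 0)     { A.col.push_back(i - 1); A.val.push_back(-1.0); }
        A.col.push_back(i); A.val.push_back(2.0);
        if (i < n - 1) { A.col.push_back(i + 1); A.val.push_back(-1.0); }
        A.row_ptr.push_back(static_cast<int>(A.col.size()));
    }
    std::vector<double> b(n, 1.0), x(n, 0.0);
    pcg(A, b, x);
    return 0;
}
```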
Plasma particle simulations are used extensively for the study of nonlinear phenomena in both space and laboratory plasmas. Here, a well-benchmarked plasma simulation code has been implemented on the 32-node JPL Mark III hypercube to study the applicability of parallel architecture to particle simulation models. In the sequential version of the code, about 90% of the computation time is spent updating the particle positions and velocities. When implemented in parallel on the Mark III Hypercube, this part of the code was sped up by a factor of about 27 (83% efficiency). Computation times on the Mark III have also been compared with times on a variety of other computers.
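As a shared-memory illustration of the particle-push phase that dominates the sequential runtime (the original work instead distributed particles across hypercube nodes via message passing), the following sketch updates particle velocities and positions in a single parallel loop; the field values and physical constants are placeholders.

```cpp
// Minimal illustration of the particle-push phase: positions and velocities
// are updated independently per particle, so the loop parallelizes directly.
#include <cstdio>
#include <vector>

int main() {
    const int num_particles = 1000000;
    const double dt = 0.01, qm = -1.0;           // time step, charge/mass (placeholders)
    std::vector<double> x(num_particles, 0.0), v(num_particles, 1.0);
    std::vector<double> E(num_particles, 0.5);   // field interpolated to particles

    #pragma omp parallel for
    for (int i = 0; i < num_particles; ++i) {
        v[i] += qm * E[i] * dt;                  // velocity update
        x[i] += v[i] * dt;                       // position update
    }
    std::printf("x[0] = %f, v[0] = %f\n", x[0], v[0]);
    return 0;
}
```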
In tree-based adaptive mesh refinement, elements are partitioned between processes using a space-filling curve. The curve establishes an ordering between all elements that derive from the same root element, the tree. When representing complex geometries by connecting several trees, the roots of these trees form an unstructured coarse mesh. We present an algorithm to partition the elements of the coarse mesh such that (a) the fine mesh can be load-balanced to equal element counts per process regardless of the element-to-tree map, and (b) each process that holds fine mesh elements has access to the meta data of all relevant trees. As an additional feature, the algorithm partitions the meta data of relevant ghost (halo) trees as well. We develop in detail how each process computes the communication pattern for the partition routine without handshaking and with minimal data movement. We demonstrate the scalability of this approach on up to 917e3 MPI ranks and 371e9 coarse mesh elements, measuring run times of one second or less.
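The coarse-mesh partitioning algorithm itself is beyond a short sketch; the example below only illustrates the fine-mesh building block it relies on, assuming a 2-D Morton (space-filling) curve: elements are ordered along the curve and then cut into contiguous ranges of equal size per process.

```cpp
// Illustration of the basic building block the paper extends: elements are
// ordered along a space-filling curve (a 2-D Morton curve, for brevity) and
// cut into equal contiguous ranges per process. The paper's contribution
// (partitioning the unstructured coarse mesh and its ghost-tree meta data
// consistently with this fine-mesh partition) is not reproduced here.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

// Interleave the low 16 bits of x and y into a 32-bit Morton key.
static std::uint32_t morton2d(std::uint16_t x, std::uint16_t y) {
    auto spread = [](std::uint32_t v) {
        v = (v | (v << 8)) & 0x00FF00FFu;
        v = (v | (v << 4)) & 0x0F0F0F0Fu;
        v = (v | (v << 2)) & 0x33333333u;
        v = (v | (v << 1)) & 0x55555555u;
        return v;
    };
    return (spread(y) << 1) | spread(x);
}

int main() {
    const int nx = 64, ny = 64, num_procs = 7;
    std::vector<std::uint32_t> keys;
    for (int y = 0; y < ny; ++y)
        for (int x = 0; x < nx; ++x)
            keys.push_back(morton2d(static_cast<std::uint16_t>(x), static_cast<std::uint16_t>(y)));

    // Order elements along the curve, then assign equal contiguous ranges.
    std::sort(keys.begin(), keys.end());
    const std::size_t n = keys.size();
    for (int p = 0; p < num_procs; ++p) {
        const std::size_t begin = n * p / num_procs, end = n * (p + 1) / num_procs;
        std::printf("rank %d owns elements [%zu, %zu)\n", p, begin, end);
    }
    return 0;
}
```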
Determining the inner organizational structure of sets of networked elements is of paramount importance for analyzing real-world systems such as social, biological, or economic networks. To this end, it is necessary to identify communities of interrelated nodes within the networks. Recently, a fuzzy community detection approach based on the minimization of a topological error functional has been proposed in the form of a gradient-based algorithm design pattern. However, the intrinsic quadratic algorithmic complexity of the procedure limits the problem size that can be treated efficiently. Here, we extend the ability of this approach to analyze larger networks by resorting to parallelism. Thus, we identify the sources of concurrency in the gradient-based algorithm design pattern. To determine the limits of parallelization, we develop a two-dimensional performance model as a function of the number of processors and the network size. The model permits computing the maximum possible speedup. Another model is presented to find the maximum problem size tractable in a given amount of time. Application of these models to a set of benchmark networks shows that parallelization speeds up the proposed fuzzy community detection approach by more than an order of magnitude. This allows treatment of networks with several hundred thousand nodes in a time frame of hours.
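The abstract gives neither the functional form of the performance model nor its coefficients; purely as an illustration of how a two-dimensional runtime model T(p, n) yields a maximum speedup and a largest tractable problem size, the sketch below uses an assumed model with a parallelizable quadratic term, a serial term, and a communication term.

```cpp
// Generic illustration of a two-dimensional performance model T(p, n) used to
// derive a maximum speedup and the largest problem size solvable within a time
// budget. The functional form and coefficients are assumptions for the sketch;
// the paper fits its own model to the gradient-based algorithm.
#include <cstdio>

// Hypothetical runtime model: parallel O(n^2) work + serial O(n) part + communication.
static double runtime(int p, double n, double a = 1e-8, double b = 1e-6, double c = 1e-4) {
    return a * n * n / p + b * n + c * p;
}

int main() {
    const double n = 200000.0;            // network size (nodes)
    const double t_serial = runtime(1, n);

    // Scan processor counts to find the maximum speedup predicted by the model.
    double best_speedup = 0.0;
    int best_p = 1;
    for (int p = 1; p <= 4096; ++p) {
        const double s = t_serial / runtime(p, n);
        if (s > best_speedup) { best_speedup = s; best_p = p; }
    }
    std::printf("max predicted speedup %.1f at p = %d\n", best_speedup, best_p);

    // Largest n tractable within a time budget on best_p processors (bisection).
    const double budget = 3600.0;         // one hour
    double lo = 1.0, hi = 1e9;
    for (int it = 0; it < 60; ++it) {
        const double mid = 0.5 * (lo + hi);
        (runtime(best_p, mid) <= budget ? lo : hi) = mid;
    }
    std::printf("largest n within %.0f s: ~%.0f\n", budget, lo);
    return 0;
}
```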
This paper formulates an incomplete projection algorithm that is applied to the image recovery problem. The algorithm allows an easy implementation of dynamic load balancing for parallel architectures. Furthermore, the local computation-to-communication load ratio can be adjusted, since each processor performs a finite number of iterations of any projection-type technique, and this number can be provided as a parameter of the algorithm. Numerical results compare favorably with those obtained by the extrapolated method of parallel subgradient projections.
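As a simplified stand-in for the image recovery setting, the sketch below applies the block-parallel projection idea to a small linear feasibility problem: each block runs a fixed number of Kaczmarz-style projection sweeps over its own rows (the inner iteration count plays the role of the adjustable computation-to-communication knob described in the abstract), and the block iterates are then averaged. The paper's incomplete projection algorithm itself is not reproduced.

```cpp
// Block-parallel projections on a tiny consistent system a_i . x = b_i with
// solution x = (1, 2). Each block refines its own copy of x (parallel), then
// the copies are averaged (the only communication step).
#include <cstdio>
#include <vector>

int main() {
    const std::vector<std::vector<double>> A = { {1, 0}, {0, 1}, {1, 1}, {1, -1} };
    const std::vector<double> b = { 1.0, 2.0, 3.0, -1.0 };
    const std::vector<std::vector<int>> blocks = { {0, 1}, {2, 3} };  // rows per processor
    const int inner_iters = 5, outer_iters = 50;                      // tunable parameters

    std::vector<double> x = { 0.0, 0.0 };
    for (int outer = 0; outer < outer_iters; ++outer) {
        std::vector<std::vector<double>> block_x(blocks.size(), x);

        // Each block runs `inner_iters` projection sweeps over its own rows.
        #pragma omp parallel for
        for (int blk = 0; blk < static_cast<int>(blocks.size()); ++blk) {
            std::vector<double>& y = block_x[blk];
            for (int it = 0; it < inner_iters; ++it)
                for (int i : blocks[blk]) {
                    double dot = 0.0, norm2 = 0.0;
                    for (size_t j = 0; j < y.size(); ++j) { dot += A[i][j] * y[j]; norm2 += A[i][j] * A[i][j]; }
                    const double step = (dot - b[i]) / norm2;      // project onto a_i . y = b_i
                    for (size_t j = 0; j < y.size(); ++j) y[j] -= step * A[i][j];
                }
        }

        // Combine the partial results by averaging.
        for (size_t j = 0; j < x.size(); ++j) {
            double s = 0.0;
            for (const auto& y : block_x) s += y[j];
            x[j] = s / blocks.size();
        }
    }
    std::printf("x = (%f, %f)\n", x[0], x[1]);   // approaches (1, 2)
    return 0;
}
```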
We consider the problem of sampling n numbers from the range {1,..., N} without replacement on modern architectures. The main result is a simple divide-and-conquer scheme that makes sequential algorithms more cache efficient and leads to a parallel algorithm running in expected time O(n/p + log p) on p processors, i.e., scales to massively parallel machines even for moderate values of n. The amount of communication between the processors is very small (at most O(log p)) and independent of the sample size. We also discuss modifications needed for load balancing, online sampling, sampling with replacement, Bernoulli sampling, and vectorization on SIMD units or GPUs.
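The sketch below illustrates the divide-and-conquer splitting step: the range is halved, the number of samples falling into the left half is drawn from a hypergeometric distribution, and the two now-independent subproblems can be handled by different processors. The naive O(n) hypergeometric sampler and the sequential recursion are simplifications of the scheme described in the abstract.

```cpp
// Divide-and-conquer sampling without replacement from {lo, ..., hi}:
// split the range, draw the left-half count hypergeometrically, recurse.
#include <cstdint>
#include <cstdio>
#include <random>
#include <vector>

static std::mt19937_64 rng(42);

// Number of draws landing in the left half when taking n items without
// replacement from `left` + `right` items. Deliberately simple O(n) stand-in;
// real implementations use a constant-expected-time hypergeometric sampler.
static std::int64_t hypergeometric(std::int64_t left, std::int64_t right, std::int64_t n) {
    std::int64_t hits = 0;
    for (std::int64_t i = 0; i < n; ++i) {
        std::uniform_int_distribution<std::int64_t> pick(1, left + right);
        if (pick(rng) <= left) { ++hits; --left; } else { --right; }
    }
    return hits;
}

// Sample n distinct numbers from {lo, ..., hi} into out.
static void sample(std::int64_t lo, std::int64_t hi, std::int64_t n, std::vector<std::int64_t>& out) {
    if (n == 0) return;
    const std::int64_t size = hi - lo + 1;
    if (size <= 64) {                                        // base case: selection sampling
        for (std::int64_t v = lo; v <= hi && n > 0; ++v) {
            std::uniform_int_distribution<std::int64_t> pick(0, hi - v);
            if (pick(rng) < n) { out.push_back(v); --n; }
        }
        return;
    }
    const std::int64_t mid = lo + size / 2;                  // split the range in half
    const std::int64_t n_left = hypergeometric(mid - lo, hi - mid + 1, n);
    sample(lo, mid - 1, n_left, out);                        // independent subproblems:
    sample(mid, hi, n - n_left, out);                        // these could run on separate processors
}

int main() {
    std::vector<std::int64_t> out;
    sample(1, 1000000000LL, 20, out);
    for (std::int64_t v : out) std::printf("%lld ", static_cast<long long>(v));
    std::printf("\n");
    return 0;
}
```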