The paper proposes a new approach to the solution of standard and block tridiagonal systems that appear in various areas of technical, scientific, and financial practice. Its goal is to develop an efficient two-phase tridiagonal solver, of which k-step cyclic reduction is a particular case. The main idea of the proposed approach lies in using new model equations for dyadic system reduction. The resulting solver also differs from the known two-phase partitioning solvers in its second phase, where it uses a series of simple explicit formulas to compute the remaining unknowns. Computational experiments measuring speedup confirmed the efficiency of the proposed solver. (C) 2020 Elsevier B.V. All rights reserved.
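Although the paper's new model equations are not given in the abstract, the cyclic reduction that arises as its particular case can be sketched directly. The following Python sketch is our own illustration, not the paper's solver: the first phase recursively eliminates the odd-indexed unknowns, and the second phase recovers them with simple explicit formulas, mirroring the two-phase structure described above. Nonzero pivots (e.g., diagonal dominance) are assumed, and all names are ours.

    import numpy as np

    def cyclic_reduction(a, b, c, d):
        # Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i], 0 <= i < n,
        # with a[0] = c[n-1] = 0, by recursive cyclic reduction.
        n = len(b)
        if n == 1:
            return d / b
        if n == 2:                                  # direct 2x2 solve
            det = b[0] * b[1] - c[0] * a[1]
            return np.array([(b[1] * d[0] - c[0] * d[1]) / det,
                             (b[0] * d[1] - a[1] * d[0]) / det])
        if n % 2 == 0:                              # pad with a decoupled row x[n] = 0
            return cyclic_reduction(np.append(a, 0.0), np.append(b, 1.0),
                                    np.append(c, 0.0), np.append(d, 0.0))[:n]
        # phase 1: eliminate odd-indexed unknowns from the even-indexed equations
        e = np.arange(0, n, 2)
        alpha, gamma = np.zeros(len(e)), np.zeros(len(e))
        alpha[1:] = a[e[1:]] / b[e[1:] - 1]         # no left neighbour at e = 0
        gamma[:-1] = c[e[:-1]] / b[e[:-1] + 1]      # no right neighbour at e = n-1
        a2, c2 = np.zeros(len(e)), np.zeros(len(e))
        a2[1:] = -alpha[1:] * a[e[1:] - 1]
        c2[:-1] = -gamma[:-1] * c[e[:-1] + 1]
        b2 = b[e] - alpha * c[e - 1] - gamma * a[(e + 1) % n]   # alpha[0] = gamma[-1] = 0
        d2 = d[e] - alpha * d[e - 1] - gamma * d[(e + 1) % n]
        x = np.zeros(n)
        x[e] = cyclic_reduction(a2, b2, c2, d2)     # solve the half-size system
        # phase 2: explicit back-substitution for the odd-indexed unknowns
        o = np.arange(1, n, 2)
        x[o] = (d[o] - a[o] * x[o - 1] - c[o] * x[o + 1]) / b[o]
        return x

    # sanity check against a dense solve (diagonally dominant system)
    rng = np.random.default_rng(0)
    n = 9
    a, c, d = rng.random(n), rng.random(n), rng.random(n)
    b = 4.0 + rng.random(n)
    a[0] = c[-1] = 0.0
    T = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
    assert np.allclose(cyclic_reduction(a, b, c, d), np.linalg.solve(T, d))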
In order to realize integrally analysis and optimization of the large airborne radome-enclosed antenna system, a novel optimization strategy is proposed based on an overlapping domain decomposition method by using higher-order MoM and out-of-core solver (HO-OC-DDM), and combining with adaptive mutation particle swarm optimization (AMPSO). The introduction of parallel out-of-core solver and DDM can effectively break the random access memory (RAM) limit. This strategy can decompose difficult-to-solve global optimization problems into multi-domain optimization problems by using domain decomposition method. Finally, take airborne Yagi antenna system as an example, the numerical results show that the design of large airborne radome-enclosed antenna system based on the proposed strategy is convenient and effective.
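The paper's AMPSO variant is not detailed in the abstract; as rough orientation, a generic particle swarm with an adaptive mutation step (perturbing the global best when the swarm stagnates) can be sketched as follows. Everything here, names and constants included, is our assumption, not the authors' implementation.

    import numpy as np

    def ampso(f, lo, hi, n_particles=30, iters=200, seed=0):
        # Minimal particle swarm with an adaptive mutation step: when the
        # global best stagnates, it is perturbed to escape local optima.
        rng = np.random.default_rng(seed)
        x = rng.uniform(lo, hi, (n_particles, len(lo)))
        v = np.zeros_like(x)
        pbest = x.copy()
        pval = np.apply_along_axis(f, 1, x)
        g, gval, stall = pbest[pval.argmin()].copy(), pval.min(), 0
        for t in range(iters):
            w = 0.9 - 0.5 * t / iters                     # decaying inertia
            r1, r2 = rng.random((2, *x.shape))
            v = w * v + 2.0 * r1 * (pbest - x) + 2.0 * r2 * (g - x)
            x = np.clip(x + v, lo, hi)
            val = np.apply_along_axis(f, 1, x)
            better = val < pval
            pbest[better], pval[better] = x[better], val[better]
            if pval.min() < gval:
                g, gval, stall = pbest[pval.argmin()].copy(), pval.min(), 0
            else:
                stall += 1
            if stall > 10:                                # adaptive mutation
                cand = np.clip(g + rng.normal(0.0, 0.1 * (hi - lo)), lo, hi)
                fc = f(cand)
                if fc < gval:
                    g, gval = cand, fc
                stall = 0
        return g, gval

    # e.g. minimise a sphere function over [-5, 5]^3
    best, val = ampso(lambda p: float(np.sum(p * p)),
                      lo=np.full(3, -5.0), hi=np.full(3, 5.0))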
To generate a mesh in a physical domain, an initial mesh of a polygonal domain that approximates the physical domain is introduced. The initial mesh is formed by using a Body Centered Cubic (BCC) lattice, which gives a more efficient node ordering for matrix-vector multiplication. An optimization problem is then considered for the displacement of the initial mesh points, which maintains a good quality of triangles while aiming to fit the initial mesh to the boundary of the physical domain. In the optimization problem, a mesh quality function is employed. The Fréchet derivative of the objective function vanishes at the optimal solution, which yields a nonlinear algebraic system for the optimal solution. The nonlinear algebraic system can be solved by the Picard or Newton method. To resolve the complexity of the physical domain, a very fine initial mesh is often required, but the solution time for the nonlinear algebraic system then becomes problematic. To overcome this limitation, adaptively refined grid cells can be used for the initial BCC mesh, and iterative solvers combined with a domain decomposition preconditioner can be applied to the algebraic system in the Picard or Newton method. The use of iterative solvers with a domain decomposition preconditioner gives a parallel meshing algorithm that makes the proposed scheme more efficient for large-scale problems. Numerical results for various test models are included. (C) 2021 Elsevier Inc. All rights reserved.
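A minimal sketch of the Newton iteration mentioned above may help; newton_solve, its dense linear solve, and the 1-D test problem are our illustrations, not the paper's method, which replaces the inner solve with a preconditioned iterative solver for large meshes.

    import numpy as np

    def newton_solve(F, J, x0, tol=1e-10, max_iter=50):
        # Newton's method for the nonlinear algebraic system F(x) = 0.
        # Each step solves the linearised system J(x) dx = -F(x); for a
        # large mesh this dense solve would be replaced by a Krylov method
        # with a domain decomposition preconditioner, as in the paper.
        x = x0.astype(float).copy()
        for _ in range(max_iter):
            r = F(x)
            if np.linalg.norm(r) < tol:
                break
            x += np.linalg.solve(J(x), -r)
        return x

    # hypothetical 1-D test problem: solve x = cos(x)
    F = lambda x: np.array([x[0] - np.cos(x[0])])
    J = lambda x: np.array([[1.0 + np.sin(x[0])]])
    print(newton_solve(F, J, np.array([0.5])))   # ~0.739085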
Convolutional Neural Networks (CNNs) have been widely adopted in many kinds of artificial intelligence applications. Most of the computational overhead of CNNs is spent on convolutions. An effective approach to reducing this overhead is transforming convolutions in the time domain into multiplications in the frequency domain by means of Fast Fourier Transform (FFT) algorithms, known as FFT-based fast algorithms for convolutions. However, current FFT-based fast implementations only work for unit-strided convolutions (stride 1) and cannot be directly applied to strided convolutions with stride greater than 1, which are usually used in the first layer of CNNs and as an effective alternative to pooling layers for downsampling. In this paper, we first introduce rearrangement- and sampling-based methods for applying FFT-based fast algorithms to strided convolutions, and compare in detail the arithmetic complexities of these two methods and the direct method. Then, highly optimized parallel implementations of the two methods on an ARMv8-based many-core CPU are presented. Lastly, we benchmark these implementations against two GEMM-based implementations on the same ARMv8 CPU. Our experimental results with convolutions of different kernels, feature maps, and batch sizes show that the rearrangement-based method generally outperforms the sampling-based one under the same optimizations, and that both methods achieve much better performance than the GEMM-based ones when the kernels, feature maps, and batch sizes are large. Experimental results on the convolutional layers of popular CNNs further support these conclusions. (C) 2021 Elsevier B.V. All rights reserved.
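The two methods can be illustrated in 1-D. In the sketch below (our own, with hypothetical function names), the sampling-based method computes the full unit-stride FFT correlation and keeps every s-th output, while the rearrangement-based method splits input and kernel into s polyphase components whose unit-stride FFT correlations sum to the strided result.

    import numpy as np

    def corr_fft(x, w):
        # unit-stride 'valid' cross-correlation via a zero-padded FFT
        n = len(x) + len(w) - 1
        y = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(w[::-1], n), n)
        return y[len(w) - 1:len(x)]

    def strided_corr_sampling(x, w, s):
        # sampling-based: full stride-1 FFT correlation, keep every s-th output
        return corr_fft(x, w)[::s]

    def strided_corr_rearrange(x, w, s):
        # rearrangement-based: split input and kernel into s polyphase
        # components; their unit-stride correlations sum to the strided result
        n_out = (len(x) - len(w)) // s + 1
        y = np.zeros(n_out)
        for p in range(s):
            xp, wp = x[p::s], w[p::s]
            if len(wp):
                y += corr_fft(xp, wp)[:n_out]
        return y

    # check both methods against the direct strided correlation
    rng = np.random.default_rng(0)
    x, w, s = rng.standard_normal(32), rng.standard_normal(5), 2
    direct = np.array([x[i * s:i * s + len(w)] @ w
                       for i in range((len(x) - len(w)) // s + 1)])
    assert np.allclose(strided_corr_sampling(x, w, s), direct)
    assert np.allclose(strided_corr_rearrange(x, w, s), direct)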
The use of parallelism may overcome some of the constraints imposed by single-processor computing systems. Besides offering faster solutions, parallelized applications can solve bigger or more complex problems. For instance, simulations can be run at finer resolutions, and physical phenomena can potentially be modeled more realistically. We describe in this paper the development of a bio-inspired parallel algorithm used in the three-dimensional simulation of multicellular tissue growth. We report on the different components of the model, in which cellular automata are used to model different types of cell populations that execute persistent random walks on the computational grid, collide, and proliferate until they reach confluence. We also discuss the main issues encountered in parallelizing the model and implementing it on a parallel machine.
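A serial toy version of the cellular-automaton core conveys the idea; the grid size, probabilities, and update rules below are our simplifications, not the paper's calibrated model or its parallel decomposition.

    import numpy as np

    rng = np.random.default_rng(1)
    N, steps, p_div, persist = 32, 100, 0.02, 0.7
    grid = np.zeros((N, N, N), dtype=bool)          # site occupancy
    cells = [(N // 2, N // 2, N // 2, rng.integers(6))]  # (x, y, z, direction)
    grid[N // 2, N // 2, N // 2] = True
    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

    for _ in range(steps):
        nxt = []
        for (x, y, z, d) in cells:
            # persistent random walk: keep the old direction with prob. `persist`
            if rng.random() > persist:
                d = rng.integers(6)
            dx, dy, dz = moves[d]
            nx, ny, nz = (x + dx) % N, (y + dy) % N, (z + dz) % N
            if not grid[nx, ny, nz]:                # collision: move only if free
                grid[x, y, z] = False
                grid[nx, ny, nz] = True
                x, y, z = nx, ny, nz
            if rng.random() < p_div:                # proliferation into a free site
                for ddx, ddy, ddz in moves:
                    cx, cy, cz = (x + ddx) % N, (y + ddy) % N, (z + ddz) % N
                    if not grid[cx, cy, cz]:
                        grid[cx, cy, cz] = True
                        nxt.append((cx, cy, cz, rng.integers(6)))
                        break
            nxt.append((x, y, z, d))
        cells = nxt

    print(f"population after {steps} steps: {len(cells)}")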
Computationally efficient sensitivity analysis of a large-scale air pollution model is the central issue of this paper. Sensitivity studies play an important role in the reliability analysis of the results of complex nonlinear models such as those used in air pollution modelling. There are a number of uncertainties in the input data sets, as well as in some internal coefficients that determine the speed of the main chemical reactions in the chemical part of the model. These uncertainties are the subject of our quantitative sensitivity study. Monte Carlo and quasi-Monte Carlo algorithms are used in this study. A large number of numerical experiments with special modifications of the model must be carried out to collect the input data needed for a particular sensitivity study. For this purpose we created an efficient high-performance implementation, SA-DEM, based on the MPI version of the UNI-DEM package. A large number of numerical experiments carried out with SA-DEM on the IBM MareNostrum III at BSC in Barcelona helped us to identify a severe performance problem in an earlier version of the code and to resolve it successfully. The improved implementation proves quite efficient for this challenging computational problem, as our experiments show. Numerical results with performance and scalability analysis are presented in the paper.
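As background for the kind of computation SA-DEM distributes, a plain Monte Carlo estimator of first-order (Sobol) sensitivity indices looks roughly like this; the model function, sample sizes, and names are our placeholders, not UNI-DEM.

    import numpy as np

    def sobol_first_order(f, dim, n=100_000, seed=0):
        # Pick-freeze Monte Carlo estimate of first-order Sobol indices
        # S_i = V[E(Y|X_i)] / V(Y) for a scalar model f with inputs
        # uniform on [0, 1]^dim (f takes an (n, dim) array).
        rng = np.random.default_rng(seed)
        A = rng.random((n, dim))
        B = rng.random((n, dim))
        fA, fB = f(A), f(B)
        var = np.var(np.concatenate([fA, fB]))
        S = np.empty(dim)
        for i in range(dim):
            ABi = A.copy()
            ABi[:, i] = B[:, i]          # vary only the i-th input
            S[i] = np.mean(fB * (f(ABi) - fA)) / var
        return S

    # e.g. Y = X0 + 2*X1 has analytic indices [0.2, 0.8]
    model = lambda X: X[:, 0] + 2.0 * X[:, 1]
    print(sobol_first_order(model, dim=2))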
Projecting a vector onto a simplex is a well-studied problem that arises in a wide range of optimization problems. Numerous algorithms have been proposed for determining the projection; however, the primary focus of the literature is on serial algorithms. We present a parallel method that decomposes the input vector and distributes it across multiple processors for local projection. Our method is especially effective when the resulting projection is highly sparse, which is the case, for instance, in large-scale problems with independent and identically distributed (i.i.d.) entries. Moreover, the method can be adapted to parallelize a broad range of serial algorithms from the literature. We fill in theoretical gaps in serial algorithm analysis and develop similar results for our parallel analogues. Numerical experiments conducted on a wide range of large-scale instances, both real world and simulated, demonstrate the practical effectiveness of the method.
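For reference, one widely used serial algorithm of the kind the paper parallelizes is the classical O(n log n) sort-based projection onto the simplex; the sketch below is that standard method, not the paper's parallel scheme.

    import numpy as np

    def project_simplex(v, z=1.0):
        # Euclidean projection of v onto {x : x >= 0, sum(x) = z}:
        # sort v, find the largest prefix with positive shifted entries,
        # then threshold at the resulting shift theta.
        u = np.sort(v)[::-1]                     # sort descending
        css = np.cumsum(u)
        j = np.arange(1, len(v) + 1)
        rho = np.nonzero(u - (css - z) / j > 0)[0][-1]
        theta = (css[rho] - z) / (rho + 1)       # optimal shift
        return np.maximum(v - theta, 0.0)

    x = project_simplex(np.array([0.6, 1.2, -0.4]))
    print(x, x.sum())                            # [0.2, 0.8, 0.0], sums to 1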
In this paper, simple optimal algorithms are presented for solving some problems related to interval graphs. These problems are the connected component problem, the spanning tree problem, the eccentricity problem, and the single-source all-destinations shortest-path problem. All four problems can be solved in linear time if the endpoints of the intervals are sorted. Moreover, our algorithms can be parallelized very easily, so that the above problems can be solved in O(log n) time with O(n/log n) processors in the EREW PRAM model.
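To give the flavour of such endpoint-sweep algorithms, here is a short serial sketch of the connected component problem on intervals (our illustration; the paper's optimal and EREW PRAM versions differ in detail): after sorting by left endpoint, a new component starts exactly where the next interval begins past the furthest right endpoint seen so far.

    def interval_components(intervals):
        # Connected components of an interval graph by a left-to-right sweep;
        # linear time once the intervals are sorted by left endpoint.
        order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
        comp, label = [0] * len(intervals), 0
        reach = intervals[order[0]][1]          # furthest right endpoint so far
        for i in order:
            l, r = intervals[i]
            if l > reach:                       # gap: no interval spans it
                label += 1
            comp[i] = label
            reach = max(reach, r)
        return comp

    print(interval_components([(0, 2), (1, 4), (6, 8), (7, 9), (3, 5)]))
    # -> [0, 0, 1, 1, 0]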
Objective: To study machine learning algorithms in a large-data environment. Methods: A divide-and-conquer strategy combined with parallel algorithms. Process: Feature selection, classification, clustering, and association analysis of large data sets. Results and analysis: The experimental results show that, using the divide-and-conquer strategy and parallel algorithms, we can extract hidden but valuable information from large data and improve analysis and problem-solving capability. This result is obtained because the algorithms can effectively extract, retrieve, store, share, analyze, and process data of complex structure and large volume. Conclusion: The divide-and-conquer strategy and parallel algorithms are effective for machine learning on large data and maximize the value of the data.
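As a concrete, if simplified, instance of the divide-and-conquer strategy described above, the sketch below splits a data matrix into chunks, computes partial per-feature statistics in parallel, and merges them into global results; all function names and the statistic chosen are hypothetical, not the paper's algorithms.

    import numpy as np
    from concurrent.futures import ProcessPoolExecutor

    def chunk_stats(chunk):
        # per-chunk partial result: count, sum, and sum of squares
        return len(chunk), chunk.sum(axis=0), (chunk ** 2).sum(axis=0)

    def parallel_feature_stats(X, workers=4):
        # divide: split rows into chunks; conquer: process chunks in
        # parallel; combine: merge partials into global mean and variance
        chunks = np.array_split(X, workers)
        with ProcessPoolExecutor(workers) as pool:
            parts = list(pool.map(chunk_stats, chunks))
        n = sum(p[0] for p in parts)
        s = sum(p[1] for p in parts)
        ss = sum(p[2] for p in parts)
        mean = s / n
        var = ss / n - mean ** 2
        return mean, var

    if __name__ == "__main__":
        X = np.random.default_rng(0).standard_normal((100_000, 8))
        mean, var = parallel_feature_stats(X)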
Modern microscopic volumetric imaging processes lack capturing flexibility and are inconvenient to operate. Additionally, the quality of acquired data cannot be assessed immediately during imaging due to the lack of a coherent real-time visualization system. Thus, to eliminate the need for close user supervision while providing real-time 3D visualization alongside imaging, we propose and describe an innovative approach that integrates imaging and visualization into a single pipeline, called an online incrementally accumulated rendering system. This system is composed of an electronic controller for progressive acquisition, a memory allocator for memory isolation, an efficient memory organization scheme, a compositing scheme to render accumulated datasets, and accumulative frame buffers for displaying non-conflicting outputs. We implement this design using a laser scanning confocal endomicroscope, interfaced with an FPGA prototyping board through a custom hardware circuit. Empirical results from practical deployments in a cancer research center are presented in this paper. (C) 2013 Elsevier B.V. All rights reserved.
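A greatly simplified software analogue of incremental accumulation with non-conflicting display buffers might look like this (our sketch only; the actual system is an FPGA hardware pipeline, and the maximum-intensity compositing rule here is our assumption).

    import numpy as np

    class AccumulatedRenderer:
        # Each newly acquired slice updates a running maximum-intensity
        # projection; a double-buffered frame is swapped atomically so the
        # display never shows a half-updated image.
        def __init__(self, h, w):
            self.buffers = [np.zeros((h, w)), np.zeros((h, w))]
            self.front = 0                      # index of the displayed buffer

        def add_slice(self, slice2d):
            back = 1 - self.front
            # accumulate into the back buffer, then swap
            np.maximum(self.buffers[self.front], slice2d, out=self.buffers[back])
            self.front = back

        def frame(self):
            return self.buffers[self.front]

    r = AccumulatedRenderer(64, 64)
    for z in range(10):                         # slices streaming in from the scanner
        r.add_slice(np.random.default_rng(z).random((64, 64)))
    print(r.frame().max())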