Search Results - Inner Mongolia University Library

Efficient Parallel Random Sampling: Vectorized, Cache-Efficient, and Online

ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2018, Vol. 44, No. 3, p. 29

Authors: Sanders, Peter; Lamm, Sebastian; Huebschle-Schneider, Lorenz; Schrade, Emanuel; Dachsbacher, Carsten. Affiliation: Karlsruhe Inst Technol, Kaiserstr 12, D-76131 Karlsruhe, Germany

We consider the problem of sampling n numbers from the range {1,..., N} without replacement on modern architectures. The main result is a simple divide-and-conquer scheme that makes sequential algorithms more cache efficient and leads to a parallel algorithm running in expected time O(n/p + log p) on p processors, i.e., scales to massively parallel machines even for moderate values of n. The amount of communication between the processors is very small (at most O(log p)) and independent of the sample size. We also discuss modifications needed for load balancing, online sampling, sampling with replacement, Bernoulli sampling, and vectorization on SIMD units or GPUs.

Keywords: Hypergeometric random deviates; parallel algorithms; communication-efficient algorithms
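The divide-and-conquer idea summarized in the abstract above can be illustrated with a small sequential sketch: split the range in two, draw how many of the n samples land in the left half from a hypergeometric distribution, then solve the two halves independently. This is a minimal sketch under our own assumptions, not the authors' implementation. The helper names `sample_range` and `hypergeometric`, the cutoff `base`, and the split-at-the-midpoint rule are hypothetical choices, and the hypergeometric deviate is drawn by plain urn simulation (exact but O(n)) instead of the fast generator a real implementation would use.

```python
import random


def hypergeometric(good: int, bad: int, draws: int, rng: random.Random) -> int:
    """Count 'good' items among `draws` items taken without replacement
    from an urn holding `good` good items and `bad` bad items.

    Hypothetical helper: exact but O(draws), a stand-in for a fast
    hypergeometric deviate generator.
    """
    count = 0
    for _ in range(draws):
        # The next draw is good with probability good / (good + bad).
        if rng.random() * (good + bad) < good:
            count += 1
            good -= 1
        else:
            bad -= 1
    return count


def sample_range(n: int, lo: int, hi: int, rng: random.Random, base: int = 64) -> list[int]:
    """Return a sorted uniform sample of n distinct integers from [lo, hi].

    Hypothetical illustration of divide-and-conquer sampling without
    replacement; `base` is an assumed cutoff for the direct case.
    """
    size = hi - lo + 1
    if not 0 <= n <= size:
        raise ValueError("need 0 <= n <= hi - lo + 1")
    if n <= base or size <= base:
        # Small subproblem: sample directly from the subrange.
        return sorted(rng.sample(range(lo, hi + 1), n))
    left_size = size // 2
    # The number of sampled elements falling in the left half is
    # hypergeometric; once it is fixed, the two halves are independent
    # subproblems (which is what makes the scheme parallelizable).
    k = hypergeometric(left_size, size - left_size, n, rng)
    left = sample_range(k, lo, lo + left_size - 1, rng, base)
    right = sample_range(n - k, lo + left_size, hi, rng, base)
    return left + right
```

For example, `sample_range(1000, 1, 10**6, random.Random(42))` returns 1000 distinct integers from {1, ..., 10^6} in sorted order. Because the two halves are independent once k is drawn, a parallel variant could hand different subranges to different processors; how the paper assigns subranges and keeps communication at O(log p) is not reproduced here.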

A COMPARISON OF PRECONDITIONED NONSYMMETRIC KRYLOV METHODS ON A LARGE-SCALE MIMD MACHINE

SIAM JOURNAL ON SCIENTIFIC COMPUTING, 1994, Vol. 15, No. 2, pp. 440-459

Authors: SHADID, JN; TUMINARO, RS. Affiliation: SANDIA NATL LABS, DIV APPL & NUMER MATH, ALBUQUERQUE, NM 87185, USA

Many complex physical processes are modeled by coupled systems of partial differential equations (PDEs). Often, the numerical approximation of these PDEs requires the solution of large sparse nonsymmetric systems of equations. In this paper the authors compare the parallel performance of a number of preconditioned Krylov subspace methods on a large-scale multiple instruction multiple data (MIMD) machine. These methods are among the most robust and efficient iterative algorithms for the solution of large sparse linear systems. In this comparison, the focus is on parallel issues associated with preconditioners within the generalized minimum residual (GMRES), conjugate gradient squared (CGS), biconjugate gradient stabilized (Bi-CGSTAB), and quasi-minimal residual CGS (QMRCGS) methods. Conclusions are drawn on the effectiveness of the different schemes based on results obtained from a 1024-processor nCUBE 2 hypercube.

Keywords: LINEAR SYSTEMS; NONSYMMETRIC; PARALLEL ALGORITHMS; KRYLOV METHODS; PRECONDITIONERS; MULTILEVEL METHODS; MIMD

TECHNIQUES FOR SOLVING BLOCK TRIDIAGONAL SYSTEMS ON RECONFIGURABLE ARRAY COMPUTERS

SIAM JOURNAL ON SCIENTIFIC AND STATISTICAL COMPUTING, 1984, Vol. 5, No. 3, pp. 701-719

Authors: KAPUR, RN; BROWNE, JC. Affiliations: UNIV TEXAS, DEPT ELECT ENGN, AUSTIN, TX 78712; UNIV TEXAS, DEPT COMP SCI, AUSTIN, TX 78712

This paper illustrates the concept of multiphase parallel structuring of algorithms on reconfigurable computers. Reconfigurable network architectured computers are described and a paradigm for programming them is defined. The execution behavior of two linear system solving techniques is determined and compared. This paper does not attempt a traditional analysis of linear system solvers: instead it presents a study of the scheduling and data flow requirements of a selected pair of algorithms.

Keywords: parallel algorithms; reconfigurable computers

A Dynamic Era-Based Time-Symmetric Block Time-Step Algorithm with Parallel Implementations

PUBLICATIONS OF THE ASTRONOMICAL SOCIETY OF JAPAN, 2012, Vol. 64, No. 3, p. 45

Authors: Kaplan, Murat; Saygin, Hasan. Affiliations: Akdeniz Univ, TR-07058 Antalya, Turkey; Istanbul Aydin Univ, Istanbul, Turkey

The time-symmetric block time-step (TSBTS) algorithm is a newly developed efficient scheme for N-body integrations. It is constructed on an era-based iteration. In this work, we re-designed the TSBTS integration scheme with a dynamically changing era size. A number of numerical tests were performed to show the importance of choosing the size of the era, especially for long-time integrations. Our second aim was to show that the TSBTS scheme is as suitable as previously known schemes for developing parallel N-body codes. In this work, we relied on a parallel scheme using the copy algorithm for the time-symmetric scheme. We implemented a hybrid of data and task parallelization for force calculation to handle load balancing problems that can appear in practice. Using the Plummer model initial conditions for different numbers of particles, we obtained the expected efficiency and speedup for a small number of particles. Although parallelization of the direct N-body codes is negatively affected by the communication/calculation ratios, we obtained good load-balanced results. Moreover, we were able to conserve the advantages of the algorithm (e.g., energy conservation for long-term simulations).

Keywords: N-body; parallel algorithms; celestial mechanics; stellar dynamics

PIANO: A fast parallel iterative algorithm for multinomial and sparse multinomial logistic regression

SIGNAL PROCESSING, 2022, Vol. 194, p. 108459

Authors: Jyothi, R.; Babu, P. Affiliation: Indian Inst Technol, Ctr Appl Res Elect, Delhi, India

Multinomial Logistic Regression is a well-studied tool for classification and has been widely used in fields like image processing, computer vision, and bioinformatics, to name a few. Under a supervised classification scenario, a Multinomial Logistic Regression model learns a weight vector to differentiate between any two classes by optimizing over the likelihood objective. With the advent of big data, the inundation of data has resulted in high-dimensional weight vectors and has also given rise to a huge number of classes, which makes the classical methods for model estimation computationally infeasible. To handle this issue, we propose a parallel iterative algorithm, Parallel Iterative Algorithm for MultiNomial LOgistic Regression (PIANO), which is based on the Majorization Minimization procedure and can update each element of the weight vectors in parallel. Further, we show that PIANO can be easily extended to solve the Sparse Multinomial Logistic Regression problem, an extensively studied problem because of its attractive feature selection property. In particular, we work out the extension of PIANO to solve the Sparse Multinomial Logistic Regression problem with ℓ1 and ℓ0 regularizations. We also prove that PIANO converges to a stationary point of the Multinomial and the Sparse Multinomial Logistic Regression problems. Simulations were conducted to compare PIANO with existing methods, and the proposed algorithm was found to perform better than the existing methods in terms of speed of convergence. (C) 2022 Elsevier B.V. All rights reserved.

Keywords: Multinomial logistic regression; Majorization minimization; Sparse; Parameter estimation; Regularization; parallel algorithms

PARALLEL BLOCK-FINDING USING DISTANCE MATRICES

Parallel Algorithms and Applications, 1996, Vol. 9, No. 1-2, pp. 1-13

Author: Stavros D. Nikolopoulos. Affiliation: Department of Computer Science, University of Cyprus, Nicosia, Cyprus

We present a fast parallel algorithm for finding the blocks or biconnected components of an undirected graph G = (V, E) having n vertices and m edges. Our techniques are based on partitioning the vertex set V into adjacency-level sets using information contained in the distance matrix D of the graph. Let t_D and p_D be the time and number of processors, respectively, for the computation of the distance matrix of a graph G on a CRCW-PRAM computational model. We show that the location of all cut vertices and bridges of a graph can be done in time O(log δ + t_D) by using O(n m/t_D) processors, where δ is the maximum degree of a vertex in G. Based on these results, we define a digraph G_d and we prove certain properties of its distance matrix, leading to a parallel block-finding algorithm running in time O(log δ + t_D) with O(n m/t_D) processors on the same computational model. We also show that other connectivity-related problems can be efficiently solved using distance matrices.

Keywords: biconnected graphs; blocks; bridges; complexity; CRCW-PRAM; cutpoints; graph partition; parallel algorithms; F.2.2; G.2.2

Electromagnetic transient parallel simulation optimisation based on GPU

JOURNAL OF ENGINEERING-JOE, 2019, Vol. 2019, No. 16, pp. 1737-1742

Authors: Yao, Shujun; Zhang, Shuo; Guo, Wanhua. Affiliation: North China Elect Power Univ, Sch Elect & Elect Engn, Beijing, Peoples R China

The development of the smart grid and the increasing scale of power systems put considerable pressure on the electromagnetic transient simulation of power systems. The graphics processing unit (GPU), which features massive numbers of concurrent threads and excellent floating-point performance, brings new opportunities to the area of power system simulation. This study introduces a parallel lower-upper (LU) triangular decomposition algorithm and a calculation strategy for GPU-based electromagnetic transient simulation. In this scheme, the GPU mainly performs the computationally intensive part of the simulation in parallel on its built-in processing cores, while the CPU updates the history terms and handles flow control of the simulation. Comparison with CPU-only implementations verifies the validity and efficiency of the proposed method.

Keywords: parallel algorithms; smart power grids; power system simulation; optimisation; EMTP; graphics processing units; microprocessor chips; multi-threading; CPU-only implementations; history terms; flow control; built-in multiple processing cores; floating point performance; parallel LU decomposition algorithm; power system simulation; massive concurrent threads; graphic processing unit; smart grid; GPU; electromagnetic transient parallel simulation optimisation

A PARALLEL FAST MULTIPOLE METHOD FOR THE HELMHOLTZ EQUATION

Parallel Processing Letters, 1995, Vol. 5, No. 2, pp. 263-274

Author: MARK A. STALZER. Affiliation: Optical Physics Laboratory, Hughes Research Laboratories, 3011 Malibu Canyon Road, Malibu, CA 90265, USA

Presented is a parallel algorithm based on the fast multipole method (FMM) for the Helmholtz equation. This variant of the FMM is useful for computing radar cross sections and antenna radiation patterns. The FMM decomposes the impedance matrix into sparse components, reducing the operation count of the matrix-vector multiplication in iterative solvers to O(N^(3/2)) (where N is the number of unknowns). The parallel algorithm divides the problem into groups and assigns the computation involved with each group to a processor node. Careful consideration is given to the communication costs. A time complexity analysis of the algorithm is presented and compared with empirical results from a Paragon XP/S running the lightweight Sandia/University of New Mexico operating system (SUNMOS). For a 90,000-unknown problem running on 60 nodes, the sparse representation fits in memory and the algorithm computes the matrix-vector product in 1.26 seconds. It sustains an aggregate rate of 1.4 Gflop/s. The corresponding dense matrix would occupy over 100 Gbytes and, assuming that I/O is free, would require on the order of 50 seconds to form the matrix-vector product.

Keywords: parallel algorithms; fast multipole method (FMM); iterative solvers; Helmholtz equation; Paragon; SUNMOS

AN EVALUATION OF THE MEMORY REFERENCE BEHAVIOR OF ENGINEERING/SCIENTIFIC APPLICATIONS IN PARALLEL SYSTEMS

International Journal of High Speed Computing, 1989, Vol. 1, No. 4, pp. 603-641

Authors: SANDRA J. BAYLOR; BHARAT D. RATHI. Affiliation: IBM T.J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598, USA

This paper presents the results of a study conducted to evaluate the inherent memory reference behavior of several engineering/scientific applications executing on shared-memory, MIN-based parallel systems. In this study, system sizes of two to 64 processors were evaluated. A trace-driven simulation model was used to obtain dynamic reference characteristics of the code, which included explicit declarations of shared variables. Our results indicate that a significant amount of explicitly declared shared data is accessed either read-only by several processors or read-write by a single processor. Furthermore, lines containing synchronization variables tend to see small ownership times at a processor and are accessed by several processors in the system. We also note that, as expected, relatively more references are to data with smaller ownership times as the number of processors increases. Finally, the application data set size can have an impact on ownership time as the number of processors increases.

Keywords: parallel algorithms; parallel processors; private caches; shared-memory multiprocessors; trace-driven simulation

Efficient algorithms for estimating the general linear model

PARALLEL COMPUTING, 2006, Vol. 32, No. 2, pp. 195-204

Authors: Yanev, P; Kontoghiorghes, EJ. Affiliations: INRIA IRISA, F-35042 Rennes, France; Univ Cyprus, Dept Publ & Business Adm, CY-1678 Nicosia, Cyprus; Univ London Birkbeck Coll, Sch Comp Sci & Informat Syst, London WC1E 7HX, England

Computationally efficient serial and parallel algorithms for estimating the general linear model are proposed. The sequential block-recursive algorithm is an adaptation of a known Givens strategy that has as a main component the Generalized QR decomposition. The proposed algorithm is based on orthogonal transformations and exploits the triangular structure of the Cholesky QRD factor of the variance-covariance matrix. Specifically, it computes the estimator of the general linear model by recursively solving a series of smaller and smaller generalized linear least squares problems. The new algorithm is found to significantly outperform the corresponding LAPACK routine. A parallel version of the new sequential algorithm, which utilizes an efficient distribution of the matrices over the processors and has low inter-processor communication, is developed. The theoretical computational complexity of the parallel algorithms is derived and analyzed. Experimental results are presented which confirm the theoretical analysis. The parallel strategy is found to be scalable and highly efficient for large-scale general linear estimation problems. (c) 2005 Elsevier B.V. All rights reserved.

Keywords: general linear model; generalized QR decomposition; parallel algorithms
