ISBN (print): 9783030281632; 9783030281625
We construct memory-optimized and time-efficient parallel algorithms (and the corresponding programs) taking advantage of regularized modified alpha-processes, namely the modified steepest descent method and the modified minimal residual method, for solving the nonlinear equation of the structural inverse gravimetry problem. Memory optimization relies on the block-Toeplitz structure of the Jacobian matrix. The algorithms are implemented on multicore CPUs and GPUs through the use of, respectively, OpenMP and NVIDIA CUDA technologies. We analyze the efficiency and speedup of the algorithms. In addition, we solve a model problem of gravimetry and conduct a comparative study regarding the number of iterations and computation time against algorithms based on conjugate gradient-type methods and the componentwise gradient method. The comparison demonstrates that the algorithms based on alpha-processes perform better, reducing the number of iterations and the computation time by as much as 50%.
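The abstract does not spell out how the block-Toeplitz structure is exploited; the following minimal Python sketch (not the authors' code) illustrates the underlying memory-saving idea for a plain Toeplitz matrix: store only the first row and first column and form matrix-vector products from them, so storage drops from O(n^2) to O(n). In practice such products can also be computed in O(n log n) time with FFTs.

```python
import numpy as np

def toeplitz_matvec(first_col, first_row, x):
    """Multiply a Toeplitz matrix by x without forming the full matrix.

    The matrix is defined by its first column and first row only, so the
    storage cost is O(n) instead of O(n^2). Entry T[i, j] equals
    first_col[i - j] if i >= j, else first_row[j - i].
    """
    n = len(x)
    y = np.zeros(n)
    for i in range(n):
        for j in range(n):
            t_ij = first_col[i - j] if i >= j else first_row[j - i]
            y[i] += t_ij * x[j]
    return y

# Example: a 4x4 Toeplitz matrix stored with 7 numbers instead of 16.
c = np.array([4.0, 1.0, 0.5, 0.25])   # first column
r = np.array([4.0, 2.0, 1.0, 0.5])    # first row (r[0] must equal c[0])
x = np.array([1.0, 2.0, 3.0, 4.0])
print(toeplitz_matvec(c, r, x))
```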
We present a parallel large neighborhood search framework for finding high quality primal solutions for general mixed-integer programs (MIPs). The approach simultaneously solves a large number of sub-MIPs with the dual objective of reducing infeasibility and optimizing with respect to the original objective. Both goals are achieved by solving restricted versions of two auxiliary MIPs, where subsets of the variables are fixed. In contrast to prior approaches, ours does not require a feasible starting solution. We leverage parallelism to perform multiple searches simultaneously, with the objective of increasing the effectiveness of our heuristic. We computationally compare the proposed framework with a state-of-the-art MIP solver in terms of solution quality, scalability, reproducibility, and parallel efficiency. Results show the efficacy of our approach in finding high quality solutions quickly both as a standalone primal heuristic and when used in conjunction with an exact algorithm.
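As a rough illustration of the fix-and-solve pattern described above (not the paper's framework), the toy Python sketch below fixes random subsets of variables of a tiny 0-1 knapsack "MIP", solves the restricted problems in parallel processes, and keeps the best feasible result. Unlike the proposed approach, it starts from a trivial feasible incumbent and brute-forces the free variables; the problem data and the helper names are made up for the example.

```python
import itertools
import random
from concurrent.futures import ProcessPoolExecutor

# Toy 0-1 knapsack used as a stand-in MIP: maximize v.x subject to w.x <= CAP.
VALUES  = [10, 7, 4, 9, 6, 8, 3, 5]
WEIGHTS = [ 5, 4, 2, 6, 3, 5, 1, 4]
CAP = 14

def solve_sub_mip(fixed):
    """Solve the restricted problem where the variables in `fixed` are pinned.

    Only the free variables are enumerated, mimicking the much smaller
    sub-MIP solved inside a large neighborhood search.
    """
    free = [i for i in range(len(VALUES)) if i not in fixed]
    best = None
    for bits in itertools.product([0, 1], repeat=len(free)):
        x = dict(fixed)
        x.update(zip(free, bits))
        weight = sum(WEIGHTS[i] * x[i] for i in x)
        value = sum(VALUES[i] * x[i] for i in x)
        if weight <= CAP and (best is None or value > best[0]):
            best = (value, x)
    return best

def random_fixing(incumbent, n_fixed, seed):
    """Fix a random subset of variables to the incumbent's values."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(VALUES)), n_fixed)
    return {i: incumbent[i] for i in idx}

if __name__ == "__main__":
    incumbent = {i: 0 for i in range(len(VALUES))}   # trivial feasible point
    # Launch several independent neighborhoods in parallel.
    fixings = [random_fixing(incumbent, 4, s) for s in range(8)]
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(solve_sub_mip, fixings))
    value, solution = max((r for r in results if r is not None),
                          key=lambda r: r[0])
    print("best value found:", value, "solution:", solution)
```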
Ionized field calculation for high-voltage direct current (HVDC) transmission lines is a computationally demanding problem, which can benefit from the application of massively parallel high-performance computing architectures. The finite element method (FEM) commonly employed to solve this problem is both memory and execution-time intensive. In this paper, a finite-difference relaxation (FDR) method is proposed to solve a unipolar and a bipolar ionized field problem in an HVDC line. The novel FDR method has several advantages over FEM. First, the scheme is suitable for massively parallel computation and runs much faster: compared with the commercial FEM software Comsol Multiphysics, the speed-up is more than 14 times for the CPU parallelization and 35 times for the graphics processor (GPU) implementation, while providing high accuracy. Moreover, the set of equations in FDR need not be assembled; instead, it is solved by a relaxation scheme and requires much less memory than FEM. Additionally, a differentiated grid size with interpolation techniques is proposed to improve the flexibility of FDR for problem domains containing irregular geometries or disproportionate sizes.
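The paper's FDR scheme couples the potential and space-charge equations; the sketch below shows only a relaxation kernel on a model 2D Poisson problem, illustrating why such updates parallelize well (every node is updated from its neighbours independently). The grid size, iteration count, and relaxation factor are arbitrary choices for the example, not values from the paper.

```python
import numpy as np

def jacobi_relax(phi, rho, h, n_iters=500, omega=1.0):
    """Plain Jacobi relaxation for a 2D Poisson equation  -lap(phi) = rho.

    Each sweep updates every interior node from its four neighbours, so all
    nodes can be updated independently -- the property that makes relaxation
    schemes attractive for massively parallel hardware.
    """
    for _ in range(n_iters):
        new = phi.copy()
        new[1:-1, 1:-1] = (1 - omega) * phi[1:-1, 1:-1] + omega * 0.25 * (
            phi[2:, 1:-1] + phi[:-2, 1:-1] +
            phi[1:-1, 2:] + phi[1:-1, :-2] +
            h * h * rho[1:-1, 1:-1]
        )
        phi = new
    return phi

# Toy example: unit square grid, zero boundary potential, uniform charge.
n = 65
phi = np.zeros((n, n))
rho = np.ones((n, n))
phi = jacobi_relax(phi, rho, h=1.0 / (n - 1))
print("potential at the centre:", phi[n // 2, n // 2])
```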
We consider the problem of nonnegative tensor factorization. Our aim is to derive an efficient algorithm that is also suitable for parallel implementation. We adopt the alternating optimization framework and solve each matrix nonnegative least-squares problem via a Nesterov-type algorithm for strongly convex problems. We describe a parallel implementation of the algorithm and measure the attained speedup in a multicore computing environment. It turns out that the derived algorithm is a competitive candidate for the solution of very large-scale dense nonnegative tensor factorization problems.
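A minimal sketch of the per-factor subproblem, assuming the standard alternating-optimization formulation in which each factor update is a nonnegative least-squares problem: a Nesterov-type accelerated projected-gradient loop with the momentum weight fixed from the extreme eigenvalues of A^T A (the strongly convex case). This is an illustration of the method class, not the authors' algorithm.

```python
import numpy as np

def nesterov_nnls(A, B, n_iters=300):
    """Accelerated projected gradient for min ||A X - B||_F^2 subject to X >= 0.

    For a strongly convex quadratic, the momentum weight can be fixed from
    the extreme eigenvalues of A^T A (assumed positive definite here).
    """
    AtA, AtB = A.T @ A, A.T @ B
    eigs = np.linalg.eigvalsh(AtA)
    L, mu = eigs[-1], eigs[0]
    beta = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))
    X = np.maximum(np.linalg.lstsq(A, B, rcond=None)[0], 0)  # warm start
    Y = X.copy()
    for _ in range(n_iters):
        X_new = np.maximum(Y - (AtA @ Y - AtB) / L, 0)  # gradient step + projection
        Y = X_new + beta * (X_new - X)                   # momentum extrapolation
        X = X_new
    return X

# In an alternating-optimization NTF loop, A would be a Khatri-Rao product of
# the other factor matrices and B the matricized tensor.
A = np.random.rand(40, 5)
B = np.random.rand(40, 8)
X = nesterov_nnls(A, B)
print("residual:", np.linalg.norm(A @ X - B))
```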
ISBN (digital): 9781450362290; ISBN (print): 9781450362290
Suffix arrays and trees are important and fundamental string data structures which lie at the foundation of many string algorithms, with important applications in computational biology, text processing, and information retrieval. Recent work enables the efficient parallel construction of suffix arrays and trees requiring at most O(n/p) memory per process in distributed memory. However, querying these indexes in distributed memory has not been studied extensively. Querying common string indexes such as suffix arrays, enhanced suffix arrays, and the FM-index requires random accesses into O(n) memory, which in distributed-memory settings becomes prohibitively expensive. In this paper, we introduce a novel distributed string index, the Distributed Enhanced Suffix Array (DESA). We present efficient algorithms for the construction and querying of this distributed data structure, all while requiring only O(n/p) memory per process. We further provide a scalable parallel implementation and demonstrate its performance and scalability.
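For context, the sketch below shows the classic single-machine way to query a suffix array: two binary searches whose comparisons each perform a random access into the text. It is exactly this access pattern that becomes expensive in distributed memory; the DESA construction and query algorithms themselves are not reproduced here.

```python
def build_suffix_array(text):
    """Naive construction (O(n^2 log n)); fine for a small illustration."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text, sa, pattern):
    """Report all occurrences of `pattern` using binary search over `sa`.

    Every comparison reads text[sa[mid]: ...], i.e. a random access into the
    text -- the access pattern that becomes costly once the text and the
    index are spread across distributed memory.
    """
    def lower_bound(strict):
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            suffix = text[sa[mid]:sa[mid] + len(pattern)]
            if suffix < pattern or (strict and suffix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    start, end = lower_bound(strict=False), lower_bound(strict=True)
    return sorted(sa[start:end])

text = "mississippi"
sa = build_suffix_array(text)
print(find_occurrences(text, sa, "issi"))   # positions 1 and 4
```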
As data sets become larger and more complicated, an extreme learning machine (ELM) that runs in a traditional serial environment cannot realize its potential to be fast and effective. Although a parallel ELM (PELM) based on MapReduce processes large-scale data with a higher learning speed than the same ELM algorithms in a serial environment, some operations, such as storing intermediate results on disk and keeping multiple copies for each task, are indispensable; these operations create a large amount of extra overhead and degrade the learning speed and efficiency of PELMs. In this paper, an efficient ELM based on the Spark framework (SELM), which includes three parallel subalgorithms, is proposed for big data classification. By partitioning the corresponding data sets reasonably, the hidden layer output matrix calculation algorithm, the matrix U decomposition algorithm, and the matrix V decomposition algorithm perform most of the computations locally. At the same time, they retain the intermediate results in distributed memory and cache the diagonal matrix as a broadcast variable instead of making several copies for each task, which greatly reduces costs and strengthens the learning ability of the SELM. Finally, we implement our SELM algorithm to classify large data sets. Extensive experiments have been conducted to validate the effectiveness of the proposed algorithms. As shown, our SELM achieves an 8.71x speedup on a cluster with ten nodes, and reaches a 13.79x speedup with 15 nodes, an 18.74x speedup with 20 nodes, a 23.79x speedup with 25 nodes, a 28.89x speedup with 30 nodes, and a 33.81x speedup with 35 nodes.
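A single-machine sketch of the computation pattern the abstract describes, with NumPy standing in for Spark: the partition loop plays the role of mapPartitions, and U and V are assumed here to denote the accumulated H^T H and H^T T matrices. Each data block contributes only small local matrices, so the full hidden-layer output matrix never has to be materialized or shipped.

```python
import numpy as np

def train_elm(X, T, n_hidden=64, C=1.0, n_partitions=4, seed=0):
    """Single-machine sketch of ELM training with partition-wise accumulation."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))   # random input weights
    b = rng.standard_normal(n_hidden)                  # random biases
    U = np.zeros((n_hidden, n_hidden))                 # accumulates H^T H
    V = np.zeros((n_hidden, T.shape[1]))               # accumulates H^T T
    for X_part, T_part in zip(np.array_split(X, n_partitions),
                              np.array_split(T, n_partitions)):
        H = 1.0 / (1.0 + np.exp(-(X_part @ W + b)))    # sigmoid hidden layer
        U += H.T @ H
        V += H.T @ T_part
    beta = np.linalg.solve(U + np.eye(n_hidden) / C, V)  # output weights
    return W, b, beta

def predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Tiny usage example with one-hot targets.
X = np.random.rand(200, 10)
T = np.eye(3)[np.random.randint(0, 3, 200)]
W, b, beta = train_elm(X, T)
print("train accuracy:", (predict(X, W, b, beta).argmax(1) == T.argmax(1)).mean())
```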
Deep Neural Networks (DNNs) are becoming an important tool in modern computing applications. Accelerating their training is a major challenge and techniques range from distributed algorithms to low-level circuit design. In this survey, we describe the problem from a theoretical perspective, followed by approaches for its parallelization. We present trends in DNN architectures and the resulting implications on parallelization strategies. We then review and model the different types of concurrency in DNNs: from the single operator, through parallelism in network inference and training, to distributed deep learning. We discuss asynchronous stochastic optimization, distributed system architectures, communication schemes, and neural architecture search. Based on those approaches, we extrapolate potential directions for parallelism in deep learning.
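As a small illustration of one of the concurrency types the survey covers, the sketch below performs a synchronous data-parallel SGD step on a toy linear-regression problem; averaging the per-worker gradients plays the role of an all-reduce. It is an illustrative example, not drawn from the survey.

```python
import numpy as np

def data_parallel_sgd_step(params, grad_fn, data_shards, lr=0.1):
    """One synchronous data-parallel SGD step.

    Each worker computes a gradient on its own shard of the minibatch; the
    gradients are then averaged (the role of an all-reduce in a distributed
    setting) and applied to the shared parameters.
    """
    grads = [grad_fn(params, shard) for shard in data_shards]  # one per worker
    avg_grad = sum(grads) / len(grads)                         # "all-reduce"
    return params - lr * avg_grad

# Toy example: linear regression split across four simulated workers.
rng = np.random.default_rng(0)
X, w_true = rng.standard_normal((400, 5)), np.arange(5.0)
y = X @ w_true
shards = list(zip(np.array_split(X, 4), np.array_split(y, 4)))

def grad_fn(w, shard):
    Xs, ys = shard
    return 2 * Xs.T @ (Xs @ w - ys) / len(ys)

w = np.zeros(5)
for _ in range(200):
    w = data_parallel_sgd_step(w, grad_fn, shards)
print("recovered weights:", np.round(w, 3))
```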
ISBN (print): 9781728112466
Scalable QR factorization algorithms for solving least squares and eigenvalue problems are critical given the increasing parallelism within modern machines. We introduce a more general parallelization of the CholeskyQR2 algorithm and show its effectiveness for a wide range of matrix sizes. Our algorithm executes over a 3D processor grid, the dimensions of which can be tuned to trade off costs in synchronization, interprocessor communication, computational work, and memory footprint. We implement this algorithm, yielding a code that can achieve a factor of Theta(P^(1/6)) less interprocessor communication on P processors than any previous parallel QR implementation. Our performance study on Intel Knights Landing and Cray XE supercomputers demonstrates the effectiveness of this CholeskyQR2 parallelization on a large number of nodes. Specifically, relative to ScaLAPACK's QR, on 1024 nodes of Stampede2, our CholeskyQR2 implementation is faster by 2.6x-3.3x in strong scaling tests and by 1.1x-1.9x in weak scaling tests.
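For reference, a shared-memory NumPy sketch of the CholeskyQR2 kernel that the parallelization builds on (the 3D processor-grid distribution itself is not shown): one CholeskyQR round followed by a second round on the computed Q to restore orthogonality.

```python
import numpy as np

def cholesky_qr2(A):
    """CholeskyQR2: two rounds of CholeskyQR to recover orthogonality.

    One round computes R from the Cholesky factor of A^T A and sets
    Q = A R^{-1}; repeating the process on Q repairs the loss of
    orthogonality that a single round suffers for ill-conditioned A. In a
    distributed setting, A is partitioned by rows and A^T A is formed with a
    reduction.
    """
    def cholesky_qr(M):
        R = np.linalg.cholesky(M.T @ M).T          # upper-triangular factor
        Q = np.linalg.solve(R.T, M.T).T            # Q = M R^{-1}
        return Q, R

    Q1, R1 = cholesky_qr(A)
    Q, R2 = cholesky_qr(Q1)
    return Q, R2 @ R1

A = np.random.rand(1000, 20)
Q, R = cholesky_qr2(A)
print("orthogonality error:", np.linalg.norm(Q.T @ Q - np.eye(20)))
print("factorization error:", np.linalg.norm(Q @ R - A))
```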
ISBN (print): 9781450366038
Asymmetric data patterns and workloads pose a challenge to massively parallel algorithm design, in particular for modern wide-SIMD architectures exhibiting several levels of parallelism. We propose a simple-to-use primitive that enables programmers to design algorithms with arbitrary data expansion or compaction while hiding the architecture details. We evaluate and characterize the performance of the primitive for a range of workloads, both synthetic and real-world. The results demonstrate that the primitive can be an effective tool in the toolbox of designers of parallel algorithms.
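The paper's primitive and its SIMD implementation are not reproduced here; the sketch below only illustrates the standard prefix-sum formulation of data expansion and compaction that such primitives are typically built on: a scan assigns every input element its output offset, after which all writes are independent.

```python
import numpy as np

def expand_compact(values, counts):
    """Expand/compact a data stream with an exclusive prefix sum (scan).

    Each input element i produces counts[i] output copies (0 drops the
    element, i.e. compaction; >1 expands it). The scan gives every element
    its output offset, so the writes in the loop are independent of one
    another and could be issued in parallel.
    """
    offsets = np.concatenate(([0], np.cumsum(counts)[:-1]))   # exclusive scan
    out = np.empty(int(np.sum(counts)), dtype=np.asarray(values).dtype)
    for i, (v, c) in enumerate(zip(values, counts)):          # parallel-for in spirit
        out[offsets[i]:offsets[i] + c] = v
    return out

print(expand_compact([10, 20, 30, 40], [2, 0, 3, 1]))  # -> [10 10 30 30 30 40]
```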
ISBN (print): 9781728155920
In induction machine simulation, it usually takes quite a long time to reach the steady state due to the large time constant. In this paper, two methods are proposed to speed up the transient process of reaching the steady state. In both methods, the initial condition of the simulation is estimated from the solution of an FEA model with a locked rotor and equivalent conductivity/resistance. The effectiveness of the methods is validated by two examples, with a comparison of the performance of the two methods.