Search Results - Inner Mongolia University Library

High-Quality Fault Resiliency in Fat Trees

IEEE MICRO, 2020, Vol. 40, No. 1, pp. 44-49

Authors: Gliksberg, John; Capra, Antoine; Louvet, Alexandre; Javier Garcia, Pedro; Sohier, Devan. Affiliations: Versailles St Quentin En Yvelines Univ Versailles France; Atos Paris France; Castilla La Mancha Univ Ciudad Real Spain; Atos BXI Projects 2 Paris France; Atos Multiple BXI Projects Paris France; Castilla La Mancha Univ Comp Architecture & Technol Ciudad Real Spain

Coupling regular topologies with optimized routing algorithms is key in pushing the performance of interconnection networks of supercomputers. In this article, we present Dmodc, a fast deterministic routing algorithm for parallel generalized fat trees (PGFTs), which minimizes congestion risk even under massive network degradation caused by equipment failure. Dmodc computes forwarding tables with a closed-form arithmetic formula by relying on a fast preprocessing phase. This allows complete rerouting of networks with tens of thousands of nodes in less than a second. In turn, this greatly helps centralized fabric management react to faults with high-quality routing tables and has no impact on running applications in current and future very large scale high-performance computing clusters.

Keywords: Deterministic algorithms Fault Tolerant Computing Mainframes Minimisation Multiprocessor Interconnection Networks Network Routing parallel algorithms parallel Machines Topology Trees Mathematics High Quality Fault Resiliency Coupling Regular Topologies Optimized Routing algorithms Fast Deterministic Routing Algorithm parallel Generalized Fat Trees Massive Network Degradation Equipment Failure Closed Form Arithmetic Formula Fast Preprocessing Phase High Quality Routing Tables Very Large Scale High Performance Computing Clusters Dmodc Algorithm PGFT Interconnection Networks Supercomputers Congestion Risk Minimization Centralized Fabric Management Network Rerouting Routing Protocols Network Topology Degradation Optical Switches Clustering algorithms

Enhanced parallelization of the incremental 4D-Var data assimilation algorithm using the Randomized Incremental Optimal Technique

QUARTERLY JOURNAL OF THE ROYAL METEOROLOGICAL SOCIETY, 2020, Vol. 146, No. 728, pp. 1351-1371

Authors: Bousserez, Nicolas; Guerrette, Jonathan J.; Henze, Daven K. Affiliations: Univ Colorado Mech Engn Boulder CO 80309 USA; NOAA Chem Sci Div Boulder CO USA

Incremental 4D-Var is a data assimilation algorithm used routinely at operational numerical weather prediction (NWP) centres worldwide. The algorithm solves a series of quadratic minimization problems (inner-loops) obtained from linear approximations of the forward model around nonlinear trajectories (outer-loops). Since most of the computational burden is associated with the inner-loops, many studies have focused on developing computationally efficient algorithms to solve the least-square quadratic minimization problem, in particular through time parallelization. This paper presents the first implementation and testing of a recently proposed method for parallelizing incremental 4D-Var, the Randomized Incremental Optimal Technique (RIOT), which replaces the traditional sequential conjugate gradient (CG) iterations in the inner-loop of the minimization with fully parallel randomized singular value decomposition (RSVD) of the preconditioned Hessian of the cost function. RIOT is tested using the standard Lorenz-96 model (L-96) as well as two realistic high-dimensional atmospheric source inversion problems based on aircraft observations of black carbon concentrations. A new outer-loop preconditioning technique tailored to RSVD was introduced to improve convergence stability and performance. Results obtained with the L-96 system show that the performance improvement from RIOT compared to standard CG algorithms increases significantly with nonlinearities. Overall, in the realistic black carbon source inversion experiments, RIOT reduces the wall-clock time of the 4D-Var minimization by a factor of 2 to 3, at the cost of a factor of 4 to 10 increase in energy cost due to the large number of parallel cores used. Furthermore, RIOT enables reduction of the wall-clock time computation of the analysis-error covariance matrix by a factor of 40 compared to a standard iterative Lanczos approach. Finally, as evidenced in this study, implementation of RIOT in an operational NWP syste

Keywords: data assimilation; parallel algorithms; 4D variational assimilation; RIOT

A Parallel Algorithm for Constructing Two Edge-disjoint Hamiltonian Cycles in Locally Twisted Cubes

2020 International Computer Symposium, ICS 2020

Authors: Li, Shun-Yu; Chang, Jou-Ming; Pai, Kung-Jui. Affiliations: National Taipei University of Business Institute of Information and Decision Sciences Taipei Taiwan; Ming Chi University of Technology Department of Industrial Engineering and Management New Taipei City Taiwan

ISBN: (print) 9781728192550

The locally twisted cube LTQn is a variation of the hypercube Qn, and the diameter of LTQn is only about half of the diameter of Qn. For interconnection networks, some efficient communication algorithms can be designed based on the ring structure. In addition, two edge-disjoint Hamiltonian cycles also provide the edge-fault tolerant Hamiltonicity for the interconnection network. Hung [Theoretical Computer Science 412, 4747-4753, 2011] designed an O(n2^n) time algorithm to construct two edge-disjoint Hamiltonian cycles in LTQn. In this paper, we provide parallel algorithms for each vertex in LTQn to determine which two edges were adopted in the first or second Hamiltonian cycles, respectively. By connecting these edges, we can construct two edge-disjoint Hamiltonian cycles in LTQn where n ≥ 4. © 2020 IEEE.

Keywords: parallel algorithms

Parallel Algorithm for Electromagnetic Field Computation Based on Particle Simulation Algorithm

1st International Conference on Computer Applied Science and Information Technology, ICCASIT 2020

Authors: Zhang, Xiuping; Qu, Fengcheng; Qiu, Min; Pi, Yanmei. Affiliations: Heihe University Heilongjiang 164300 China

Building on current serial algorithms for electromagnetic field computation, a parallel algorithm that adopts the "divide and conquer" approach is designed and implemented for electromagnetic field computation in 3D rectangular, cylindrical, and polar coordinates in the CHIPIC software. The speedup ratio of the parallel algorithm is analyzed. Finally, the correctness and efficiency of the algorithm are verified through case studies. © 2020 Published under licence by IOP Publishing Ltd.

Keywords: parallel algorithms

PREDICT-AND-RECOMPUTE CONJUGATE GRADIENT VARIANTS

SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2020, Vol. 42, No. 5, pp. A3084-A3108

Authors: Chen, Tyler; Carson, Erin. Affiliations: Univ Washington Dept Appl Math Seattle WA 98195 USA; Charles Univ Prague Fac Math & Phys Prague 13000 Czech Republic

The standard implementation of the conjugate gradient algorithm suffers from communication bottlenecks on parallel architectures, due primarily to the two global reductions required every iteration. In this paper, we study conjugate gradient variants which decrease the runtime per iteration by overlapping global synchronizations, and in the case of pipelined variants, matrix-vector products. Through the use of a predict-and-recompute scheme, whereby recursively updated quantities are first used as a predictor for their true values and then recomputed exactly at a later point in the iteration, these variants are observed to have convergence behavior nearly as good as the standard conjugate gradient implementation on a variety of test problems. We provide a rounding error analysis which provides insight into this observation. It is also verified experimentally that the variants studied do indeed reduce the runtime per iteration in practice and that they scale similarly to previously studied communication-hiding variants. Finally, because these variants achieve good convergence without the use of any additional input parameters, they have the potential to be used in place of the standard conjugate gradient implementation in a range of applications.

Keywords: Krylov subspace methods; conjugate gradient; parallel algorithms; numerical algorithms
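
For reference, the two global reductions per iteration mentioned in the abstract above are visible in a textbook conjugate gradient loop. The following is a minimal serial sketch in plain Python, not the predict-and-recompute variants studied in the paper; the names `dot`, `matvec`, and `conjugate_gradient` and the small test system are illustrative assumptions. In a distributed-memory run, each `dot` call would correspond to a global reduction.

```python
def dot(u, v):
    """Inner product; in a distributed-memory run this needs a global reduction."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Textbook CG for a symmetric positive definite matrix A.

    Returns the approximate solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the initial guess x = 0
    p = list(r)              # initial search direction
    rr = dot(r, r)
    for k in range(max_iter):
        if rr ** 0.5 < tol:
            return x, k
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)          # reduction 1: p^T A p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)               # reduction 2: r^T r
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter


# Small symmetric positive definite system (values chosen for illustration).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iters = conjugate_gradient(A, b)
```

The second reduction cannot start until the residual update that depends on the first one has finished, which is the synchronization pattern that communication-hiding variants aim to overlap.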

Fast de Bruijn Graph Compaction in Distributed Memory Environments

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2020, Vol. 17, No. 1, pp. 136-148

Authors: Pan, Tony; Nihalani, Rahul; Aluru, Srinivas. Affiliations: Georgia Inst Technol Sch Computat Sci & Engn Atlanta GA 30332 USA

De Bruijn graph based genome assembly has gained popularity as short read sequencers become ubiquitous. A core assembly operation is the generation of unitigs, which are sequences corresponding to chains in the graph. Unitigs are used as building blocks for generating longer sequences in many assemblers, and can facilitate graph compression. Chain compaction, by which unitigs are generated, remains a critical computational task. In this paper, we present a distributed memory parallel algorithm for simultaneous compaction of all chains in bi-directed de Bruijn graphs. The key advantages of our algorithm include bounding the chain compaction run-time to a logarithmic number of iterations in the length of the longest chain, and the ability to differentiate cycles from chains within a logarithmic number of iterations in the length of the longest cycle. Our algorithm scales to thousands of computational cores, and can compact a whole genome de Bruijn graph from a human sequence read set in 7.3 seconds using 7680 distributed memory cores, and in 12.9 minutes using 64 shared memory cores. It is 3.7x and 2.0x faster than equivalent steps in the state-of-the-art tools for distributed and shared memory environments, respectively. An implementation of the algorithm is available at https://***/ParBLiSS/bruno.

Keywords: De Bruijn graph; genome assembly; graph compaction; parallel algorithms
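
A logarithmic iteration bound of the kind described in the abstract above is characteristic of pointer doubling (pointer jumping), a generic parallel technique for processing chains. The sketch below is a serial simulation of that generic technique in plain Python. It is not the authors' implementation, and the names `compact_chains` and `succ` are hypothetical. It also assumes the input contains only chains, so the cycle detection mentioned in the abstract is not covered.

```python
def compact_chains(succ):
    """Find the chain end reachable from every node by pointer doubling.

    succ maps each node to its successor; a chain end maps to itself.
    Returns (end_of, rounds): end_of[v] is the last node of v's chain and
    rounds counts the doubling rounds that changed at least one pointer.
    Assumption: there are no cycles other than the self-loops at chain ends.
    """
    ptr = dict(succ)
    rounds = 0
    while True:
        # In a parallel setting every node would perform this jump at once.
        nxt = {v: ptr[ptr[v]] for v in ptr}
        if nxt == ptr:
            return ptr, rounds
        ptr = nxt
        rounds += 1


# A chain of 8 nodes 0 -> 1 -> ... -> 7, where node 7 is the chain end.
succ = {i: i + 1 for i in range(7)}
succ[7] = 7
end_of, rounds = compact_chains(succ)
```

Because the distance covered by each pointer doubles every round, the 8-node chain in the example is resolved in 3 rounds instead of the 7 steps a sequential walk from node 0 would take.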

A randomized parallel algorithm for efficiently finding near-optimal universal hitting sets

24th Annual Conference on Research in Computational Molecular Biology, RECOMB 2020

Authors: Ekim, Barış; Berger, Bonnie; Orenstein, Yaron. Affiliations: Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Cambridge MA 02139 United States; Department of Mathematics Massachusetts Institute of Technology Cambridge MA 02139 United States; School of Electrical and Computer Engineering Ben-Gurion University of the Negev Beer-Sheva 8410501 Israel

ISBN: (print) 9783030452568

As the volume of next generation sequencing data increases, an urgent need for algorithms to efficiently process the data arises. Universal hitting sets (UHS) were recently introduced as an alternative to the central idea of minimizers in sequence analysis with the hopes that they could more efficiently address common tasks such as computing hash functions for read overlap, sparse suffix arrays, and Bloom filters. A UHS is a set of k-mers that hit every sequence of length L, and can thus serve as indices to L-long sequences. Unfortunately, methods for computing small UHSs are not yet practical for real-world sequencing instances due to their serial and deterministic nature, which leads to long runtimes and high memory demands when handling typical values of k (e.g. k > 13). To address this bottleneck, we present two algorithmic innovations to significantly decrease runtime while keeping memory usage low: (i) we leverage advanced theoretical and architectural techniques to parallelize and decrease memory usage in calculating k-mer hitting numbers; and (ii) we build upon techniques from randomized Set Cover to select universal k-mers much faster. We implemented these innovations in PASHA, the first randomized parallel algorithm for generating near-optimal UHSs, which newly handles k > 13. We demonstrate empirically that PASHA produces sets only slightly larger than those of serial deterministic algorithms; moreover, the set size is provably guaranteed to be within a small constant factor of the optimal size. PASHA’s runtime and memory usage improve on the current best algorithms by orders of magnitude. We expect our newly-practical construction of UHSs to be adopted in many high-throughput sequence analysis pipelines. © Springer Nature Switzerland AG 2020.

Keywords: parallel algorithms
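
The UHS definition given in the abstract above can be checked directly by brute force for very small parameters. The sketch below only illustrates that definition and is not PASHA; the function name `is_universal_hitting_set` and the default DNA alphabet are assumptions, and the exhaustive enumeration is exponential in L, so it is usable only for toy sizes.

```python
from itertools import product


def is_universal_hitting_set(kmers, k, L, alphabet="ACGT"):
    """Return True if every length-L sequence over `alphabet` contains at
    least one k-mer from `kmers` as a contiguous substring.

    Brute force over all len(alphabet) ** L sequences.
    """
    kmers = set(kmers)
    for letters in product(alphabet, repeat=L):
        seq = "".join(letters)
        windows = (seq[i:i + k] for i in range(L - k + 1))
        if not any(w in kmers for w in windows):
            return False
    return True


# Tiny binary example with k = 2 and L = 3.
hits_all = is_universal_hitting_set({"00", "01", "11"}, k=2, L=3, alphabet="01")
misses = is_universal_hitting_set({"00", "11"}, k=2, L=3, alphabet="01")
```

In the binary example, the first set hits all eight sequences of length 3, while the second set misses `010`, whose only 2-mers are `01` and `10`.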

Parallel methods for linear systems solution in extreme learning machines: An overview

7th International Conference Days of Applied Mathematics, ICDAM 2020

Authors: Gelvez-Almeida, E.; Baldera-Moreno, Y.; Huérfano, Y.; Vera, M.; Mora, M.; Barrientos, R. Affiliations: Laboratorio de Investigaciones Tecnologicas en Reconocimiento de Patrones Universidad Católica Del Maule Talca Chile; Facultad de Ciencias Básicas y Biomédicas Universidad Simón Bolívar San José de Cúcuta Colombia; Grupo de Investigación en Procesamiento Computacional de Datos Universidad de Los Andes San Cristóbal Venezuela

This paper presents an updated review of parallel algorithms for solving square and rectangular linear systems in single and double precision using multi-core central processing units and graphics processing units. The methods for solving linear systems based on operations, factorization, and iterations are briefly described. The methodology is documentary and is based on a review of about 17 papers reported in the literature during the last five years (2016-2020). The findings demonstrate the potential of parallelism to significantly decrease extreme learning machine training times for problems with large amounts of data, given the calculation of the Moore-Penrose pseudo-inverse. Implementing parallel algorithms for the pseudo-inverse calculation can contribute significantly to applications in diverse areas, since it accelerates the training of extreme learning machines with optimal results. © 2020 Published under licence by IOP Publishing Ltd.

Keywords: parallel algorithms

Real-Time Reconfigurable Processor to Detect Similarities in Compressed Video Using Generalized Hough Transformation

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2020, Vol. 30, No. 9, pp. 2932-2946

Authors: Geninatti, Sergio R.; Boemo, Eduardo I. Affiliations: Univ Nacl Rosario Dept Elect RA-2000 Rosario Argentina; Univ Autonoma Madrid DSLab Madrid 28049 Spain

In this work, we present an FPGA-based Generalized Hough Transform custom processor to calculate similarities between arbitrary shapes. Raw data are 44 x 36 DC images extracted directly from low-resolution compressed video (352 x 288). The outputs are two numbers per frame that quantify the image similitude in terms of scale and rotation. The proposed architecture efficiently resolves the detection of pixel pairs, and the voting of distances and rotations, without memory access conflicts. These operations are inherent to Hough transformation. The paper condenses some circuit solutions suitable to hardwiring video processing. They take full advantage of using small embedded memories as look-up tables. The complete processor is validated with benchmark video samples that cover different scenarios and problems: sport, drama, and news. The final version internally operates at 100 MHz and fits inside a small FPGA chip. The highly concurrent architecture employs both pipelining and parallelism using hardware replication. The final performance is over 40 Giga fixed-point operations per second.

Keywords: Image analysis; parallel algorithms; pipeline processing; FPGA; look-up table

A parallel algorithm based on openMP for human brain model of transcranial magnetic stimulation

2020 Asia-Pacific Conference on Image Processing, Electronics and Computers, IPEC 2020

Authors: Zhang, Peixian; Wang, Hongbin; Zhang, Shuai. Affiliations: Hebei University of Technology State Key Laboratory of Reliability and Intelligence of Electrical Equipment Tianjin China; Tianjin Key Laboratory of Bioelectromagnetic Technology and Intelligent Health Hebei University of Technology Tianjin China

ISBN: (print) 9781728160665

To address the complexity and variability of bio-electromagnetic computing, its large computational load, and calculation accuracy that is insufficient for actual clinical needs, a parallel algorithm based on OpenMP is introduced. Multi-threaded operation of the electromagnetic computing model is realized using a mixed programming method. The parallel method, running on a single multi-core computer, is applied to the electromagnetic calculation of a human brain model stimulated by a transcranial magnetic stimulation coil. This improves the calculation efficiency of the model, so that electromagnetic calculation of the brain model with higher accuracy becomes feasible. © 2020 IEEE.

Keywords: parallel algorithms
