Search Results - Inner Mongolia University Library

International Conference for High Performance Computing, Networking, Storage and Analysis (HPC)

Authors: Kannan, Ramakrishnan; Sao, Piyush; Lu, Hao; Kurzak, Jakub; Schenk, Gundolf; Shi, Yongmei; Lim, Seung-Hwan; Israni, Sharat; Thakkar, Vijay; Cong, Guojing; Patton, Robert; Baranzini, Sergio E.; Vuduc, Richard; Potok, Thomas
Affiliations: Oak Ridge Natl Lab, Oak Ridge, TN 37830, USA; Adv Micro Devices Inc, Santa Clara, CA, USA; Univ Calif San Francisco, San Francisco, CA, USA; Georgia Inst Technol, Atlanta, GA, USA

ISBN: (print) 9781665454445

We are motivated by newly proposed methods for mining large-scale corpora of scholarly publications (e.g., the full biomedical literature), which consist of tens of millions of papers spanning decades of research. In this setting, analysts seek to discover relationships among concepts. They construct graph representations from annotated text databases, formulate the relationship-mining problem as an all-pairs shortest paths (APSP) problem, and validate connective paths against curated biomedical knowledge graphs (e.g., SPOKE). In this context, we present COAST (Exascale Communication-Optimized All-Pairs Shortest Path) and demonstrate 1.004 EF/s on 9,200 Frontier nodes (73,600 GCDs). We develop hyperbolic performance models (HYPERMOD), which guide optimizations and parametric tuning. The proposed COAST algorithm achieved a memory-constant parallel efficiency of 99% in the single-precision tropical semiring. Looking forward, COAST will enable the integration of scholarly corpora like PubMed into the SPOKE biomedical knowledge graph.

Keywords: Shortest Path Problem; High-Performance Computing; parallel algorithms
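The abstract above formulates APSP in the tropical (min, +) semiring. As a rough illustration of that formulation only, the following sketch computes shortest path lengths by repeated min-plus squaring of a small dense matrix. It is not the COAST algorithm, which targets distributed GPUs; the function names and the example graph are hypothetical, and non-negative edge weights are assumed.

```python
import math

INF = math.inf  # marks a missing edge


def min_plus_product(a, b):
    """Tropical (min, +) matrix product: c[i][j] = min over k of a[i][k] + b[k][j]."""
    n = len(a)
    return [[min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def apsp_by_squaring(weights):
    """All-pairs shortest path lengths via repeated min-plus squaring.

    Assumes a square matrix of non-negative weights (hypothetical input format).
    """
    n = len(weights)
    # A zero diagonal lets a path "stay put", so powers accumulate shorter paths.
    dist = [[0 if i == j else weights[i][j] for j in range(n)] for i in range(n)]
    covered = 1  # dist currently covers paths with at most `covered` edges
    while covered < n - 1:
        dist = min_plus_product(dist, dist)
        covered *= 2
    return dist


# Directed 4-cycle: 0 -> 1 (4), 1 -> 2 (1), 2 -> 3 (2), 3 -> 0 (3)
example = [
    [0, 4, INF, INF],
    [INF, 0, 1, INF],
    [INF, INF, 0, 2],
    [3, INF, INF, 0],
]
shortest = apsp_by_squaring(example)
```

Each squaring doubles the number of edges a path may use, so about log2(n) products suffice; the product itself has the same loop structure as ordinary matrix multiplication, with min replacing addition and addition replacing multiplication.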

Fully-adaptive Model for Broadcasting with Universal Lists

24th International Symposium on Symbolic and Numeric algorithms for Scientific Computing (SYNASC)

Authors: Gholami, Saber; Harutyunyan, Hovhannes A.
Affiliations: Concordia Univ, Dept Comp Sci & Software Engn, Montreal, PQ, Canada

ISBN: (print) 9781665465458

In classical broadcasting, a piece of information must be transmitted to all entities of a network as quickly as possible, starting from a particular member. Since this problem has an enormous number of applications and is proven to be NP-hard, several models have been defined in the literature to simulate real-world situations and relax certain constraints. A well-known branch of broadcasting uses a universal list throughout the process: once a vertex is informed, it must follow its corresponding list, regardless of the originator and of the neighbor from which it received the message. Broadcasting with universal lists can be divided into two sub-models: non-adaptive and adaptive. In the adaptive model, a sender skips the vertices on its list from which it has received the message, whereas in the non-adaptive model those vertices are not skipped. In this study, we present another sub-model, called fully adaptive. Not only does this model benefit from a significantly better space complexity than the classical model but, as will be proved, it is also faster than the two other sub-models. Since the suggested model fits real-world network architectures, we design optimal broadcast algorithms for well-known interconnection networks such as trees, grids, and cube-connected cycles under the fully-adaptive model. We also present a tight upper bound for tori under the same model.

Keywords: parallel algorithms; graph theory; broadcasting; universal lists; fully-adaptive
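The difference between the non-adaptive and adaptive sub-models described in the abstract above can be made concrete with a small simulation. The sketch below assumes the usual telephone-style setting in which every informed vertex makes one call per round; the function name and data layout are hypothetical. The fully-adaptive model is deliberately left out, since the abstract does not define its skipping rule.

```python
def broadcast_time(lists, originator, adaptive):
    """Rounds needed to inform every vertex when each vertex follows its universal list.

    lists: dict mapping each vertex to its ordered list of neighbours.
    adaptive=False: a vertex calls every list entry in order (non-adaptive model).
    adaptive=True: a vertex skips entries from which it has received the message.
    Assumes one call per informed vertex per round (an assumption of this sketch).
    """
    informed = {originator}
    received_from = {v: set() for v in lists}
    position = {v: 0 for v in lists}  # next index to use in each vertex's list
    rounds = 0
    while len(informed) < len(lists):
        calls = []
        for v in informed:
            lst = lists[v]
            if adaptive:
                # Skip neighbours that already sent the message to v.
                while position[v] < len(lst) and lst[position[v]] in received_from[v]:
                    position[v] += 1
            if position[v] < len(lst):
                calls.append((v, lst[position[v]]))
                position[v] += 1
        if not calls:
            raise ValueError("the lists do not reach every vertex")
        rounds += 1
        # Deliver all calls of this round at once.
        for sender, target in calls:
            received_from[target].add(sender)
            informed.add(target)
    return rounds


# Path a - b - c, where b's list happens to start with a.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
non_adaptive_rounds = broadcast_time(path, "a", adaptive=False)
adaptive_rounds = broadcast_time(path, "a", adaptive=True)
```

On this path with originator a, the non-adaptive run wastes one round while b calls a back, and the adaptive run skips that call.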

Per Segment Plane Sweep Line Segment Intersection on the GPU

30th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS)

Authors: Frye, Roger; McKenney, Mark
Affiliations: Southern Illinois Univ, Edwardsville, IL 62901, USA

ISBN: (print) 9781450395298

Polygon overlay operations are used for various purposes such as GIS, VLSI, and geometric operations. Recent articles present algorithms using the GPU to perform the polygon overlay operation. We present two algorithms implemented on the GPU that focus on the active list of the traditional serial plane sweep algorithm. The presented results show an improvement in execution time with respect to recent algorithms.

Keywords: computational geometry; GPU processing; parallel algorithms

An accelerated algorithm for ECG signal denoising

16th International Conference on Signal-Image Technology and Internet-Based Systems (SITIS)

Authors: De Luca, Pasquale; Galletti, Ardelio; Marcellino, Livia
Affiliations: Parthenope Univ Naples, Dept Sci & Technol, Naples, Italy

ISBN: (print) 9781665464956

The electrocardiogram (ECG) signal is an important tool for the analysis of cardiovascular diseases. However, even today, acquisition devices produce noisy signals that degrade the quality of the information by corrupting important features. To improve the quality of the acquired data, a filtering process is mandatory. Moreover, real-time filtering of ECGs, in order to obtain a diagnosis as quickly as possible, is a very interesting challenge. In this paper, we consider the Savitzky-Golay method as the denoising filter and propose a parallel algorithm implementing it. The procedure exploits the computational power of Graphics Processing Units (GPUs). Results in terms of performance and quality are provided.

Keywords: ECG denoising; SG filter; parallel algorithms; GP-GPU
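For readers unfamiliar with the Savitzky-Golay filter named in the abstract above, here is a minimal sequential sketch using the standard 5-point quadratic smoothing coefficients (-3, 12, 17, 12, -3)/35. The window size, the function name, and the choice to leave the two samples at each end unchanged are assumptions of this sketch, not details taken from the paper.

```python
# Standard Savitzky-Golay smoothing weights: 5-point window, quadratic fit.
SG_COEFFS = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG_NORM = 35.0


def savgol5(signal):
    """Smooth a sequence with a 5-point quadratic Savitzky-Golay filter.

    Boundary samples (two at each end) are copied unchanged, which is a
    simplification chosen for this sketch.
    """
    n = len(signal)
    out = list(signal)
    for i in range(2, n - 2):
        window = signal[i - 2:i + 3]
        out[i] = sum(c * s for c, s in zip(SG_COEFFS, window)) / SG_NORM
    return out


# A pure parabola passes through the filter unchanged in the interior,
# because the filter fits a quadratic to each window.
parabola = [float(x * x) for x in range(7)]
smoothed = savgol5(parabola)
```

Every output sample depends only on a fixed window of input samples, so the loop iterations are independent of one another; this is the kind of structure that maps naturally onto one GPU thread per sample.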

GAPS: GPU-Acceleration of PDE Solvers for Wave Simulation

36th ACM International Conference on Supercomputing (ICS)

Authors: Hanindhito, Bagus; Gourounas, Dimitrios; Fathi, Arash; Trenev, Dimitar; Gerstlauer, Andreas; John, Lizy K.
Affiliations: Univ Texas Austin, Austin, TX 78712, USA; ExxonMobil Technol & Engn, Annandale, NJ, USA

ISBN: (print) 9781450392815

Large-scale simulations of wave-type equations have many industrial applications, such as in oil and gas exploration. Realistic simulations, which involve a vast amount of data, are often performed on multiple nodes of an HPC cluster. Using GPUs for these simulations is attractive due to the considerable parallelizability of the algorithms. Many industry-relevant simulations have characteristics in their physics or geometry that can be exploited to improve computational efficiency. Furthermore, the choice of simulation algorithm impacts computational efficiency significantly. In this work, we exploit these features to significantly improve performance for a class of problems. Specifically, we use the discontinuous Galerkin (DG) finite element method, along with the Gauss-Lobatto-Legendre (GLL) integration scheme on hexahedral elements with straight faces, which greatly reduces the number of BLAS operations and simplifies the computations to Level-1 BLAS operations, reducing the turnaround time for wave simulation. However, attaining peak GPU performance is often not possible in these codes, which exacerbate bottlenecks caused by data movement, even when modern GPUs with the latest high-bandwidth memory are used. We have developed GAPS, an efficient and scalable GPU-accelerated PDE solver for wave simulation, by using hardware- and data-movement-aware algorithms. While significant speed-up over CPUs can be achieved, data movement still limits GPU performance. We present several optimization strategies, including kernel fusion, Look-Up-Table-based neighbor search, improved shared memory utilization, and SM-occupancy-aware register allocation. They improve performance up to 84.15x over CPU implementations and 1.84x over base GPU implementations on average. We then extend GAPS to support multiple GPUs on multi-node HPC clusters for large-scale wave simulations, and perform additional optimizations to reduce communication overhead. We also investigate the perfor

Keywords: GPU acceleration; HPC; wave simulation; discontinuous Galerkin; parallel algorithms; optimization strategies

Accelerating domain propagation: An efficient GPU-parallel algorithm over sparse matrices

Parallel Computing, 2022, Vol. 109

Authors: Sofranac, Boro; Gleixner, Ambros; Pokutta, Sebastian
Affiliations: Berlin Institute Technol, Str 17 Juni 135, D-10623 Berlin, Germany; Zuse Inst Berlin, Takustr 7, D-14195 Berlin, Germany; HTW Berlin, Treskowallee 8, D-10318 Berlin, Germany

• Currently, domain propagation in state-of-the-art MIP solvers is single thread only.
• The paper presents a novel, efficient GPU algorithm to perform domain propagation.
• Challenges are dynamic algorithmic behavior, dependency structures, sparsity patterns.
• The algorithm is capable of running entirely on the GPU with no CPU involvement.
• We achieve speed-ups of around 10x to 20x, up to 180x on favorably-large instances.

Keywords: Mixed integer linear programming; MIP; GPU; Domain propagation; Bound tightening; parallel algorithms
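As background for the highlights above, domain propagation (bound tightening) on a single linear constraint can be sketched as follows. This is the textbook activity-based rule in sequential form, assuming finite variable bounds; it is not the GPU algorithm of the paper, and the function name and example constraint are hypothetical.

```python
def propagate_leq(coeffs, rhs, lower, upper):
    """One bound-tightening pass for the constraint sum(coeffs[j] * x[j]) <= rhs.

    Returns tightened (lower, upper) lists. Assumes all bounds are finite.
    """
    # Minimum activity: smallest value the left-hand side can take within the bounds.
    min_act = sum(a * lower[j] if a > 0 else a * upper[j]
                  for j, a in enumerate(coeffs))
    new_lower, new_upper = list(lower), list(upper)
    for j, a in enumerate(coeffs):
        if a == 0:
            continue
        own = a * lower[j] if a > 0 else a * upper[j]
        # Slack left for variable j once all other variables sit at their minimum.
        residual = rhs - (min_act - own)
        if a > 0:
            new_upper[j] = min(upper[j], residual / a)
        else:
            new_lower[j] = max(lower[j], residual / a)
    return new_lower, new_upper


# Example: 2x + 3y <= 12 with x in [0, 10] and y in [1, 10].
lo, up = propagate_leq([2.0, 3.0], 12.0, [0.0, 1.0], [10.0, 10.0])
```

In a solver this pass is repeated over all constraints until no bound changes; since tightening one bound can enable tightening in other constraints, the iteration count depends on the instance.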

Parallel construction of multiple independent spanning trees on highly scalable datacenter networks

Applied Mathematics and Computation, 2022, Vol. 413, p. 126617

Authors: Yang, Jinn-Shyong; Li, Xiao-Yan; Peng, Sheng-Lung; Chang, Jou-Ming
Affiliations: Natl Taipei Univ Business, Dept Informat Management, Taipei 10051, Taiwan; Fuzhou Univ, Coll Math & Comp Sci, Fuzhou 350108, Peoples R China; Natl Taipei Univ Business, Dept Prod Innovat & Entrepreneurship, Taoyuan 32462, Taiwan; Natl Taipei Univ Business, Inst Informat & Decis Sci, Taipei 10051, Taiwan

An emerging datacenter network (DCN) with high scalability called HSDC is a server-centric DCN that can help cloud computing in supporting many inherent cloud services. For example, a server-centric DCN can initiate routing for data transmission. This paper investigates the construction of independent spanning trees (ISTs for short), a set of rooted spanning trees associated with the disjoint-path property, in HSDC. Regarding multiple spanning trees as a routing protocol, ISTs have applications in data transmission, e.g., fault-tolerant broadcasting and secure message distribution. We first establish the vertex-symmetry of HSDC. Then, using the structure that the n-dimensional HSDC is a compound graph of an n-dimensional hypercube Q_n and an n-clique K_n, we amend the algorithm constructing ISTs for Q_n to obtain the algorithm required by HSDC. Unlike most algorithms that construct tree structures recursively, our algorithm can find every node's parent in each spanning tree directly via an easy computation relying only on the node address and tree index. Consequently, we can implement the algorithm for constructing n ISTs in O(nN) time, where N = n·2^n is the number of vertices of the n-dimensional HSDC, or parallelize the algorithm in O(n) time using N processors. Remarkably, the diameter of the constructed ISTs is about twice the diameter of Q_n.

Keywords: Datacenter networks; Independent spanning trees; parallel algorithms; Fault-tolerant broadcasting; Secure message distribution

Nonnegative Tensor Completion: step-sizes for an accelerated variation of the stochastic gradient descent

30th European Signal Processing Conference (EUSIPCO)

Authors: Liavas, Athanasios P.; Papagiannakos, Ioannis Marios; Kolomvakis, Christos
Affiliations: Tech Univ Crete, Sch Elect & Comp Engn, Khania, Greece; Univ Mons, Fac Engn, Dept Math & Operat Res, Mons, Belgium

ISBN: (print) 9789082797091

We consider the problem of nonnegative tensor completion. We adopt the alternating optimization framework and solve each nonnegative matrix least-squares problem via an accelerated variation of the stochastic gradient descent. The step-sizes used by the algorithm determine, to a large extent, its behavior. We propose two new strategies for the computation of step-sizes and experimentally test their effectiveness using both synthetic and real-world data.

Keywords: tensors; nonnegative tensor completion; stochastic gradient descent; accelerated gradient; step-size selection; Armijo line-search; parallel algorithms; OpenMP

Parallel Subgraph Isomorphism on Multi-core Architectures: A Comparison of Four Strategies Based on Tree Search

Joint IAPR International Workshop on Structural and Syntactic Pattern Recognition (SSPR) / International Workshop on Statistical Techniques in Pattern Recognition (SPR)

Authors: Carletti, Vincenzo; Foggia, Pasquale; Greco, Antonio; Vento, Mario
Affiliations: Univ Salerno, Dept Informat Engn Elect Engn & Appl Math, Fisciano, Italy

ISBN: (print) 9783030739720; 9783030739737

Subgraph isomorphism is one of the most challenging problems on graph-based representations. Although many efficient sequential algorithms have been proposed over the last decades, solving this problem on large graphs is still a time-demanding task. For this reason, there is growing interest in effective parallel algorithms that make the best use of the modern multi-core architectures commonly available on servers and workstations. We propose a comparison of four parallel algorithms derived from the state-of-the-art sequential algorithm VF3-Light; two of them were presented in previous works, while the other two are introduced in this paper. In order to evaluate the strengths and weaknesses of each algorithm, we performed a benchmark over six datasets of large, dense random graphs, both labelled and unlabelled, measuring memory usage, speed-up and efficiency. We also add a comparison with a different parallel algorithm, named Glasgow, that is not derived from VF3-Light.

Keywords: Exact graph matching; Subgraph isomorphism; parallel algorithms; VF3

Interval Constraint Satisfaction: Towards Edge Acceleration

11th Mediterranean Conference on Embedded Computing (MECO) / 3rd Summer School on Cyber-Physical + Systems and Internet of Things (CPS and IoT)

Authors: Khun, Jiri; Schmidt, Jan
Affiliations: Czech Tech Univ, Fac Informat Technol, Dept Digital Design, Thakurova 9, Prague 16000, Czech Republic

ISBN: (print) 9781665468282

Interval constraint satisfaction problems (CSPs) are typically hard to solve and are therefore desirable candidates for acceleration. Although there have been successful attempts in this area, several paths remain unexplored. We describe, discuss, and generalize our findings on the partial algorithms and approaches used for interval CSP solving. We have divided the interval CSP solving process into several levels of abstraction and analyzed them individually to find common traits and patterns among them. These can indicate possible areas for future acceleration attempts, especially on edge systems, where effectiveness plays an important role.

Keywords: Constraint satisfaction problems; interval CSP; numerical CSP; interval arithmetic; parallelization; acceleration; effectiveness; parallel algorithms; numerical algorithms; consistency techniques; edge devices; edge systems
