ISBN: (print) 9781665454445
We are motivated by newly proposed methods for mining large-scale corpora of scholarly publications (e.g., the full biomedical literature), which consist of tens of millions of papers spanning decades of research. In this setting, analysts seek to discover relationships among concepts. They construct graph representations from annotated text databases, formulate the relationship-mining problem as an all-pairs shortest paths (APSP) problem, and validate connective paths against curated biomedical knowledge graphs (e.g., SPOKE). In this context, we present COAST (Exascale Communication-Optimized All-Pairs Shortest Path) and demonstrate 1.004 EF/s on 9,200 Frontier nodes (73,600 GCDs). We develop hyperbolic performance models (HYPERMOD), which guide optimizations and parametric tuning. The proposed COAST algorithm achieves a memory-constant parallel efficiency of 99% in the single-precision tropical semiring. Looking forward, COAST will enable the integration of scholarly corpora like PubMed into the SPOKE biomedical knowledge graph.
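To make the tropical-semiring formulation concrete, here is a minimal single-process sketch of APSP by repeated (min, +) matrix squaring. It is not COAST's distributed algorithm. The function names `min_plus` and `apsp` and the small example graph are illustrative assumptions, and the sketch assumes nonnegative edge weights.

```python
import math

INF = math.inf  # "no edge" in the (min, +) semiring


def min_plus(a, b):
    """Tropical (min, +) product of two square matrices (lists of lists)."""
    n = len(a)
    c = [[INF] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            aik = a[i][k]
            if aik == INF:
                continue  # no path i -> k, nothing to relax
            for j in range(n):
                cand = aik + b[k][j]
                if cand < c[i][j]:
                    c[i][j] = cand
    return c


def apsp(weights):
    """Shortest path lengths between all pairs via repeated squaring.

    Assumes nonnegative weights; weights[i][j] is INF when there is no edge.
    """
    n = len(weights)
    d = [row[:] for row in weights]
    for i in range(n):
        d[i][i] = min(d[i][i], 0)  # empty path from a vertex to itself
    steps = 1  # d currently covers paths with at most `steps` edges
    while steps < n - 1:
        d = min_plus(d, d)
        steps *= 2
    return d


# Hypothetical 4-vertex directed graph: 0->1 (3), 1->2 (1), 0->2 (10), 2->3 (2)
example = [
    [0, 3, 10, INF],
    [INF, 0, 1, INF],
    [INF, INF, 0, 2],
    [INF, INF, INF, 0],
]
dist = apsp(example)
```

Each squaring doubles the maximum number of edges covered by the current distance matrix, so about log2(n) products suffice. The inner kernel has the same loop structure as an ordinary matrix multiply with (+, ×) replaced by (min, +).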
ISBN: (print) 9781665465458
In classical broadcasting, a piece of information must be transmitted to all entities of a network as quickly as possible, starting from a particular member. Since this problem has an enormous number of applications and is proven to be NP-hard, several models have been defined in the literature that try to simulate real-world situations and relax several constraints. A well-known branch of broadcasting uses a universal list throughout the process. That is, once a vertex is informed, it must follow its corresponding list, regardless of the originator and of the neighbor from which it received the message. The problem of broadcasting with universal lists can be categorized into two sub-models: non-adaptive and adaptive. In the adaptive model, a sender skips the vertices on its list from which it has received the message, while in the non-adaptive model those vertices are not skipped. In this study, we present another sub-model called fully adaptive. Not only does this model benefit from a significantly better space complexity than the classical model, but, as will be proved, it is also faster than the two other sub-models. Since the suggested model fits real-world network architectures, we design optimal broadcast algorithms for well-known interconnection networks such as trees, grids, and cube-connected cycles under the fully adaptive model. We also present a tight upper bound for tori under the same model.
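The difference between the non-adaptive and adaptive sub-models can be illustrated with a small round-based simulation. This is a sketch under one reading of the definitions above: each informed vertex makes at most one call per round, following its list in order. The function name `broadcast_time`, the dictionary layout, and the example path are assumptions, and the fully adaptive model is not simulated because its rule is not spelled out here.

```python
def broadcast_time(lists, origin, adaptive):
    """Rounds needed to inform every vertex when each vertex follows its list.

    lists maps each vertex to its ordered universal list of neighbors.
    With adaptive=True a sender skips neighbors it has already received
    the message from; with adaptive=False it calls them anyway.
    """
    informed = {origin}
    received_from = {v: set() for v in lists}
    pos = {v: 0 for v in lists}  # next list position for each vertex
    rounds = 0
    while len(informed) < len(lists):
        rounds += 1
        calls = []
        for v in lists:
            if v not in informed:
                continue
            lst = lists[v]
            if adaptive:
                # Skip list entries that already sent us the message.
                while pos[v] < len(lst) and lst[pos[v]] in received_from[v]:
                    pos[v] += 1
            if pos[v] < len(lst):
                calls.append((v, lst[pos[v]]))
                pos[v] += 1
        if not calls:
            raise ValueError("lists exhausted before all vertices were informed")
        # Deliver all calls of this round at once.
        for sender, receiver in calls:
            received_from[receiver].add(sender)
            informed.add(receiver)
    return rounds


# Hypothetical path a - b - c, where b's list starts with a.
path_lists = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
t_non_adaptive = broadcast_time(path_lists, "a", adaptive=False)
t_adaptive = broadcast_time(path_lists, "a", adaptive=True)
```

On this path, the non-adaptive run wastes one round when b calls a back, whereas the adaptive run lets b skip a and call c immediately.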
ISBN: (print) 9781450395298
Polygon overlay operations are used in various fields such as GIS, VLSI, and computational geometry. Recent articles present algorithms that use the GPU to perform the polygon overlay operation. We present two algorithms implemented on the GPU that focus on the active list of the traditional serial plane sweep algorithm. The presented results show an improvement in execution time with respect to recent algorithms.
ISBN: (print) 9781665464956
The electrocardiogram (ECG) signal is an important tool for the analysis of cardiovascular diseases. However, even today, acquisition devices produce noisy signals that degrade the quality of information by corrupting important features. To improve the quality of the acquired data, a filtering process is mandatory. Moreover, filtering ECGs in real time, in order to obtain a diagnosis as quickly as possible, is a very interesting challenge. In this paper, we consider the Savitzky-Golay method as the denoising filter and propose a parallel algorithm implementing it. The procedure exploits the computational power of Graphics Processing Units (GPUs). Results in terms of performance and quality are provided.
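For orientation, a Savitzky-Golay smoother reduces to a fixed-coefficient convolution. The sequential sketch below uses the classic 5-point quadratic coefficients (-3, 12, 17, 12, -3)/35. The window size, the choice to leave the two boundary samples on each side untouched, and the name `savgol5` are assumptions made for illustration, not the configuration used in the paper.

```python
# Classic 5-point, quadratic Savitzky-Golay smoothing coefficients.
SG5 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def savgol5(signal):
    """Smooth a sequence with a 5-point quadratic Savitzky-Golay filter.

    The first and last two samples are copied unchanged (an assumed,
    simple boundary rule).
    """
    n = len(signal)
    out = list(signal)
    for i in range(2, n - 2):
        # Weighted sum over the window centered on sample i.
        out[i] = sum(c * signal[i + k - 2] for k, c in enumerate(SG5))
    return out


# A quadratic signal is reproduced exactly (up to rounding) by this filter.
quadratic = [float(x * x) for x in range(8)]
smoothed_quadratic = savgol5(quadratic)

# An isolated spike is attenuated and spread over its neighbors.
spike = [0.0, 0.0, 0.0, 35.0, 0.0, 0.0, 0.0]
smoothed_spike = savgol5(spike)
```

Every output sample depends only on a small window of the input, so the samples can be computed independently of one another, which is what makes the filter a natural fit for one-thread-per-sample GPU execution.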
ISBN: (print) 9781450392815
Large-scale simulations of wave-type equations have many industrial applications, such as in oil and gas exploration. Realistic simulations, which involve a vast amount of data, are often performed on multiple nodes of an HPC cluster. Using GPUs for these simulations is attractive due to the considerable parallelizability of the algorithms. Many industry-relevant simulations have characteristics in their physics or geometry that can be exploited to improve computational efficiency. Furthermore, the choice of simulation algorithm impacts computational efficiency significantly. In this work, we exploit these features to significantly improve performance for a class of problems. Specifically, we use the discontinuous Galerkin (DG) finite element method, along with the Gauss-Lobatto-Legendre (GLL) integration scheme on hexahedral elements with straight faces, which greatly reduces the number of BLAS operations and simplifies the computations to Level-1 BLAS operations, reducing the turnaround time for wave simulation. However, attaining peak GPU performance is often not possible in these codes, which exacerbate bottlenecks caused by data movement, even when modern GPUs with the latest high-bandwidth memory are used. We have developed GAPS, an efficient and scalable GPU-accelerated PDE solver for wave simulation, by using hardware- and data-movement-aware algorithms. While significant speed-up over CPUs can be achieved, data movement still limits GPU performance. We present several optimization strategies, including kernel fusion, look-up-table-based neighbor search, improved shared memory utilization, and SM-occupancy-aware register allocation. They improve performance by up to 84.15x over CPU implementations and by 1.84x over base GPU implementations on average. We then extend GAPS to support multiple GPUs on multi-node HPC clusters for large-scale wave simulations, and perform additional optimizations to reduce communication overhead. We also investigate the perfor
· Currently, domain propagation in state-of-the-art MIP solvers is single thread only.
· The paper presents a novel, efficient GPU algorithm to perform domain propagation.
· Challenges are dynamic algorithmic behavior, dependency structures, sparsity patterns.
· The algorithm is capable of running entirely on the GPU with no CPU involvement.
· We achieve speed-ups of around 10x to 20x, up to 180x on favorably-large instances.
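As background, domain propagation tightens variable bounds using the minimum activity of each linear constraint. The sketch below is a plain sequential version of that textbook rule for constraints of the form sum_j a_j x_j <= b over continuous variables with finite bounds. It is not the paper's GPU algorithm, and the name `propagate`, the data layout, and the tolerance are assumptions.

```python
def propagate(constraints, lower, upper, max_rounds=100, eps=1e-9):
    """Tighten bounds until a fixed point (or max_rounds) is reached.

    constraints: list of (coeffs, rhs), where coeffs maps a variable index
    to its coefficient and the constraint reads sum(a_j * x_j) <= rhs.
    Assumes finite bounds and continuous variables.
    """
    lower, upper = list(lower), list(upper)
    for _ in range(max_rounds):
        changed = False
        for coeffs, rhs in constraints:
            # Smallest value the left-hand side can take within the bounds.
            min_act = sum(
                a * (lower[j] if a > 0 else upper[j]) for j, a in coeffs.items()
            )
            for j, a in coeffs.items():
                if a == 0:
                    continue
                own = a * (lower[j] if a > 0 else upper[j])
                # Bound implied for x_j when all other terms are minimal.
                bound = (rhs - (min_act - own)) / a
                if a > 0 and bound < upper[j] - eps:
                    upper[j] = bound
                    changed = True
                elif a < 0 and bound > lower[j] + eps:
                    lower[j] = bound
                    changed = True
        if not changed:
            break
    return lower, upper


# Hypothetical instance: x0 + x1 <= 4 and -x0 <= -3, with x0, x1 in [0, 10].
cons = [({0: 1.0, 1: 1.0}, 4.0), ({0: -1.0}, -3.0)]
new_lower, new_upper = propagate(cons, [0.0, 0.0], [10.0, 10.0])
```

In the example, the second constraint raises the lower bound of x0 to 3, which in the next round lets the first constraint cut the upper bound of x1 to 1. This chaining of tightenings across rounds is an instance of the dependency structure that the highlights list as a challenge.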
HSDC is an emerging, highly scalable, server-centric datacenter network (DCN) that can help cloud computing support many inherent cloud services. For example, a server-centric DCN can initiate routing for data transmission. This paper investigates the construction of independent spanning trees (ISTs for short), a set of rooted spanning trees with the disjoint-path property, in HSDC. When multiple spanning trees are used as a routing protocol, ISTs have applications in data transmission, e.g., fault-tolerant broadcasting and secure message distribution. We first establish the vertex-symmetry of HSDC. Then, using the fact that the n-dimensional HSDC is a compound graph of an n-dimensional hypercube Q_n and an n-clique K_n, we amend the algorithm for constructing ISTs in Q_n to obtain the algorithm required for HSDC. Unlike most algorithms that construct tree structures recursively, our algorithm can find every node's parent in each spanning tree directly via an easy computation relying only on the node address and the tree index. Consequently, we can implement the algorithm for constructing n ISTs in O(nN) time, where N = n·2^n is the number of vertices of the n-dimensional HSDC, or parallelize the algorithm to run in O(n) time using N processors. Remarkably, the diameter of the constructed ISTs is about twice the diameter of Q_n. (C) 2021 Elsevier Inc. All rights reserved.
ISBN: (print) 9789082797091
We consider the problem of nonnegative tensor completion. We adopt the alternating optimization framework and solve each nonnegative matrix least-squares problem via an accelerated variation of stochastic gradient descent. The step sizes used by the algorithm determine its behavior to a large extent. We propose two new strategies for computing the step sizes and experimentally test their effectiveness using both synthetic and real-world data.
ISBN: (print) 9783030739720; 9783030739737
Subgraph isomorphism is one of the most challenging problems on graph-based representations. Although many efficient sequential algorithms have been proposed over the last decades, solving this problem on large graphs is still a time-consuming task. For this reason, there is growing interest in effective parallel algorithms that make the best use of the modern multi-core architectures commonly available on servers and workstations. We compare four parallel algorithms derived from the state-of-the-art sequential algorithm VF3-Light; two of them were presented in previous works, while the other two are introduced in this paper. In order to evaluate the strengths and weaknesses of each algorithm, we performed a benchmark over six datasets of large and dense random graphs, both labelled and unlabelled, measuring memory usage, speed-up, and efficiency. We also add a comparison with a different parallel algorithm, named Glasgow, that is not derived from VF3-Light.
ISBN: (print) 9781665468282
Interval constraint satisfaction problems (CSPs) are typically hard to solve and are therefore desirable candidates for acceleration. Although there have been successful attempts in this area, several paths remain unexplored. We describe, discuss, and generalize our findings on the partial algorithms and approaches used for interval CSP solving. We divided the interval CSP solving process into several levels of abstraction and analyzed them individually to find common traits and patterns among them. These can indicate possible areas for future acceleration attempts, especially on edge systems, where efficiency plays an important role.