The present work is devoted to the development and use of supercomputer technology for modeling gas-dynamic processes using molecular dynamics methods. The relevance of the work is associated with the development of n...
详细信息
The present work is devoted to the development and use of supercomputer technology for modeling gas-dynamic processes using molecular dynamics methods. The relevance of the work is associated with the development of nanotechnology for the manufacture of promising nanomaterials and nanocoatings by the method of supersonic cold gasdynamic spraying. The paper proposes to consider gasdynamic processes at the atomic-molecular level. It is proposed to use Newton's equations for particle dynamics as a mathematical model. The numerical implementation of the model is based on the well-known Verlet scheme. The software implementation of the numerical scheme is adapted for parallel computing on modern supercomputers, including those with a hybrid architecture. To illustrate the developed supercomputer technology, the problem of calculating the parameters of the flow of nitrogen in a microchannel with nickel walls is considered. The conducted computational experiments demonstrated the effectiveness of the developed technology.
K-Nearest Neighbor (k-NN) search is one of the most commonly used approaches for similarity search. It finds extensive applications in machine learning and data mining. This era of big data warrants efficiently scalin...
详细信息
ISBN:
(纸本)9781728166773
K-Nearest Neighbor (k-NN) search is one of the most commonly used approaches for similarity search. It finds extensive applications in machine learning and data mining. This era of big data warrants efficiently scaling k-NN search algorithms for billion-scale datasets with high dimensionality. In this paper, we propose a solution towards this end where we use vantage point trees for partitioning the dataset across multiple processes and exploit an existing graph-based sequential approximate k-NN search algorithm called HNSW (Hierarchical Navigable Small World) for searching locally within a process. Our hybrid MPI-OpenMP solution employs techniques including exploiting MPI one-sided communication for reducing communication times and partition replication for better load balancing across processes. We demonstrate computation of k-NN for 10,000 queries in the order of seconds using our approach on similar to 8000 cores on a dataset with billion points in an 128-dimensional space. We also show 10X speedup over a completely k-d tree-based solution for the same dataset, thus demonstrating better suitability of our solution for high dimensional datasets. Our solution shows almost linear strong scaling.
k-Core decomposition is a well-studied community detection problem in graph analytics in which each k-core of vertices induces a subgraph where all vertices have degree at least k. The decomposition is expensive to co...
详细信息
ISBN:
(纸本)9781450388146
k-Core decomposition is a well-studied community detection problem in graph analytics in which each k-core of vertices induces a subgraph where all vertices have degree at least k. The decomposition is expensive to compute on large graphs and efforts to apply massive parallelism have had limited success. This paper presents a vectorisation of the problem that reframes it as a composition of vector primitives on flat, 1d arrays. With such a formulation, we can deploy highly optimised Deep Learning GPU and SIMD frameworks. On a moderate GPU, using PyTorch, we obtain up to 8x improvement over the best parallel state-of-the-art implemented in C++ and running on an expensive 32-core machine. More importantly, our approach represents a novel abstraction showing that redesigning graph operations as a series of vectorised primitives makes highly-parallel analytics both easier and more accessible for developers. We posit that such an approach can vastly accelerate the use of cheap GPU hardware in complex graph analytics.
Automated whisker tracking is important for researching active touch in rodents. Earlier efforts to detect whiskers and represent them in a small set of parameters were either not accurate enough to enable tracking ov...
详细信息
ISBN:
(纸本)9781728134277
Automated whisker tracking is important for researching active touch in rodents. Earlier efforts to detect whiskers and represent them in a small set of parameters were either not accurate enough to enable tracking over time, or computationally expensive. In this article we propose an algorithm to cluster whisker centerline points, detected through a curvilinear structure algorithm, using the shape of smaller clusters to form bigger clusters of centerline points. After that, a least-squares approach is used to define each whisker by a set of four parameters. We implemented the algorithm in MATLAB in a parallelized fashion, and found that the processing time per frame is reasonable in MATLAB, and is likely to be short when ported to a lower-level language. When tested on a 33,634-frame segment, 89.2% of the whiskers could be represented in an abstract fashion by four parameters with a mean-squares fitting error of lower than 10 pixels, and visual inspection shows that crossing whiskers are detected and parameterized in an accurate way.
Railway alignment optimization is a large-scale and time-consuming civil engineering problem. To solve it, a three-dimensional distance transform (3D-DT) algorithm, which is a variant of the three-dimensional Euclidea...
详细信息
Railway alignment optimization is a large-scale and time-consuming civil engineering problem. To solve it, a three-dimensional distance transform (3D-DT) algorithm, which is a variant of the three-dimensional Euclidean distance transform (3D-EDT), was previously designed. However, that algorithm is quite computationally intensive. In addition, the 3D-DT is inherently sequential, and it is thus challenging to parallelize. Thus, this study focuses on improving the sequential 3D-DT by transforming it into a parallel one. First, existing representative parallel EDT methods are reviewed and assessed. Then the railway alignment optimization model and the sequential 3D-DT are described. After that, critical execution properties of the 3D-DT that significantly influence its parallelization are explored in depth. Lastly, a novel so-called parallel linkage method is presented. This parallel implementation, which is developed using the OpenMP library, is highly effective and scalable by fully exploiting the parallelism of the algorithm. Using this parallel 3D-DT method, a large-scale, real-world railway case is tested and analyzed in detail. The outcomes verify that the proposed parallel method can accelerate the optimization process significantly without reducing the quality of computation results.
Abstract—: This paper considers modeling the processes of cleaning air from finely dispersed solid contaminants clustered in the form of nanoparticles. The purification technology chosen for the study involves the us...
详细信息
In today's time, Molecular Science is increasingly dependent on Software Engineering Calculations in the department of Research and Development. Aligning sequences of DNA, RNA is becoming a major part of present-d...
详细信息
ISBN:
(纸本)9789811510847;9789811510830
In today's time, Molecular Science is increasingly dependent on Software Engineering Calculations in the department of Research and Development. Aligning sequences of DNA, RNA is becoming a major part of present-day natural sciences. A hereditary database holds a large amount of unprocessed data that resides very crucial information. A single pair of chromosomes contains approximately 3 billion DNA base pairs. To look through this information, retrieve the connections, and subconnections in it is a very slow procedure. Therefore, scientists are looking forward toward computer science algorithms for faster retrieval of information. This paper is focused upon using a parallel programming algorithm than the previous alignment algorithms.
algorithms for dynamically maintaining minimum spanning trees (MSTs) have received much attention in both the parallel and sequential settings. While previous work has given optimal algorithms for dense graphs, all ex...
详细信息
ISBN:
(纸本)9781450369350
algorithms for dynamically maintaining minimum spanning trees (MSTs) have received much attention in both the parallel and sequential settings. While previous work has given optimal algorithms for dense graphs, all existing parallel batch-dynamic algorithms perform polynomial work per update in the worst case for sparse graphs. In this paper, we present the first work-efficient parallel batch-dynamic algorithm for incremental MST, which can insert l edges in O(l lg(1 + n/l)) work in expectation and O(polylog(n)) span w.h.p. The key ingredient of our algorithm is an algorithm for constructing a compressed path tree of an edge-weighted tree, which is a smaller tree that contains all pairwise heaviest edges between a given set of marked vertices. Using our batch-incremental MST algorithm, we demonstrate a range of applications that become efficiently solvable in parallel in the sliding-window model, such as graph connectivity, approximate MSTs, testing bipartiteness, k-certificates, cycle-freeness, and maintaining sparsifiers.
Graph Analytics is important in different domains: social networks, computer networks, and computational biology to name a few. This paper describes the challenges involved in programming the underlying graph algorith...
详细信息
ISBN:
(纸本)9783030369873;9783030369866
Graph Analytics is important in different domains: social networks, computer networks, and computational biology to name a few. This paper describes the challenges involved in programming the underlying graph algorithms for graph analytics for distributed systems with CPU, GPU, and multi-GPU machines and how to deal with them. It emphasizes how language abstractions and good compilation can ease programming graph analytics on such platforms without sacrificing implementation efficiency.
Electronic structure calculations based on density-functional theory (DFT) represent a significant part of today's HPC workloads and pose high demands on high-performance computing resources. To perform these quan...
详细信息
ISBN:
(纸本)9781728199986
Electronic structure calculations based on density-functional theory (DFT) represent a significant part of today's HPC workloads and pose high demands on high-performance computing resources. To perform these quantum-mechanical DFT calculations on complex large-scale systems, so-called linear scaling methods instead of conventional cubic scaling methods are required. In this work, we take up the idea of the submatrix method and apply it to the DFT computations in the software package CP2K. For that purpose, we transform the underlying numeric operations on distributed, large, sparse matrices into computations on local, much smaller and nearly dense matrices. This allows us to exploit the full floating-point performance of modern CPUs and to make use of dedicated accelerator hardware, where performance has been limited by memory bandwidth before. We demonstrate both functionality and performance of our implementation and show how it can he accelerated with GPM and FPGAs.
暂无评论