ISBN: 9798350364613; 9798350364606 (print)
Researchers conduct post-processing on the simulation results by running an interactive data analysis tool on a High-Performance Computing (HPC) system installed at an HPC center and retrieving the post-processed results. Certain data analysis scenarios require transferring the simulation results directly from the center. In such scenarios, a portion of the data would usually be streamed over the network to achieve interactivity. However, two challenges remain in maintaining interactivity: (1) limited network bandwidth and (2) long network latency. To tackle these challenges, we propose a system to enable interactive array analysis over the network. We employ error-bounded lossy compression to increase the effective network bandwidth. Furthermore, we employ multi-level caching to hide the network latency and combine it with prefetching to improve the cache hit ratio. The cache replacement and prefetching policies are designed considering the data access pattern of interactive analysis. We compared our proposed system with TileDB, one of the state-of-the-art array databases, by measuring the average latency for various access patterns. Compared to TileDB, the proposed system reduces the average latency by up to 91.6% when a 10% error is allowed: the cache hit ratio improved by more than 40% due to the proper cache replacement and prefetching policies, and network transfer time was reduced by more than 75% by using lossy compression.
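The abstract does not say which compressor the system uses. As a minimal sketch of what "error-bounded" means, the following assumes the simplest possible scheme, uniform scalar quantization with an absolute error bound; the function names `quantize` and `dequantize` are hypothetical and not taken from the paper.

```python
def quantize(values, error_bound):
    """Map each float to an integer code (illustrative scheme, not the paper's).

    With a bin width of 2 * error_bound, rounding to the nearest bin
    keeps every reconstructed value within error_bound of the original.
    """
    if error_bound <= 0:
        raise ValueError("error_bound must be positive")
    step = 2.0 * error_bound
    return [round(v / step) for v in values]


def dequantize(codes, error_bound):
    """Reconstruct approximate floats from the integer codes."""
    step = 2.0 * error_bound
    return [c * step for c in codes]


if __name__ == "__main__":
    data = [0.0, 0.12, 3.14159, -2.5, 100.001]
    bound = 0.05
    restored = dequantize(quantize(data, bound), bound)
    worst = max(abs(a - b) for a, b in zip(data, restored))
    print("max absolute error:", worst)
```

The integer codes are typically much more compressible than the raw floats (for example with an entropy coder), which is where the bandwidth saving would come from; that stage is omitted here.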
Stealth addresses protect recipient identity privacy in blockchain systems by allowing a sender to derive a stealth address using the recipient's public key, with the receiver deriving a corresponding one-time pri...
The impedance of power source of the microgrid cannot be regarded as infinite, thus it may cause stability problems, especially when the load is a pulse power load. The black box method can not be well applied to puls...
ISBN: 9781665497473 (print)
We identify the graph data structure, frontiers, operators, an iterative loop structure, and convergence conditions as essential components of graph analytics systems based on the native-graph approach. Using these essential components, we propose an abstraction that captures all the significant programming models within graph analytics, such as bulk-synchronous, asynchronous, shared-memory, message-passing, and push vs. pull traversals. Finally, we demonstrate the power of our abstraction with an elegant modern C++ implementation of single-source shortest path and its required components.
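To make the listed components concrete, here is a minimal sequential sketch of a frontier-based single-source shortest path. It is written in Python rather than the authors' C++ and does not reproduce their abstraction; the adjacency-dictionary layout and the name `sssp` are assumptions made for illustration, and edge weights are assumed non-negative.

```python
import math


def sssp(graph, source):
    """Frontier-based shortest paths (illustrative sketch only).

    graph: dict mapping every vertex to a list of (neighbor, weight) pairs.
    Returns a dict of shortest distances from source.
    """
    dist = {v: math.inf for v in graph}        # graph data + per-vertex state
    dist[source] = 0.0
    frontier = {source}                        # active vertices

    while frontier:                            # loop; converged when frontier is empty
        next_frontier = set()
        for u in frontier:
            for v, w in graph[u]:              # push-style operator over out-edges
                candidate = dist[u] + w
                if candidate < dist[v]:        # relax the edge
                    dist[v] = candidate
                    next_frontier.add(v)       # v must propagate its new distance
        frontier = next_frontier
    return dist


if __name__ == "__main__":
    g = {
        "a": [("b", 1), ("c", 4)],
        "b": [("c", 2), ("d", 6)],
        "c": [("d", 3)],
        "d": [],
    }
    print(sssp(g, "a"))
```

A parallel version would process the vertices of each frontier concurrently and would need an atomic minimum on `dist[v]`; that detail is left out of this sketch.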
ISBN: 9798350364613; 9798350364606 (print)
Roulette wheel selection is a critical process in heuristic algorithms, enabling the probabilistic choice of items based on assigned fitness values. It selects an item with a probability proportional to its fitness value. This technique is commonly employed in ant-colony algorithms to randomly determine the next city to visit when solving the traveling salesman problem. Our study focuses on parallel algorithms designed to select one of multiple processors, each associated with fitness values, using random wheel selection. We propose a novel approach called logarithmic random bidding, which achieves an expected runtime logarithmic in the number of processors with non-zero fitness values, using the CRCW-PRAM model with a shared memory of constant size. Notably, the logarithmic random bidding technique demonstrates efficient performance, particularly in scenarios where only a few processors are assigned non-zero fitness values.
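For reference, the selection rule itself (not the parallel logarithmic random bidding algorithm, whose details the abstract does not give) can be sketched sequentially with prefix sums. The function name `roulette_select` and the optional `r` parameter, which lets a caller supply the random draw, are assumptions made for illustration.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def roulette_select(fitness, r=None):
    """Return index i with probability fitness[i] / sum(fitness).

    r is an optional draw in [0, 1); when omitted, random.random() is used.
    Items with zero fitness are never selected.
    """
    if any(f < 0 for f in fitness):
        raise ValueError("fitness values must be non-negative")
    total = sum(fitness)
    if total <= 0:
        raise ValueError("at least one fitness value must be positive")
    if r is None:
        r = random.random()

    prefix = list(accumulate(fitness))          # running totals
    index = bisect_right(prefix, r * total)     # first slot whose total exceeds the draw
    if index >= len(fitness):                   # guard against floating-point round-off
        index = max(i for i, f in enumerate(fitness) if f > 0)
    return index


if __name__ == "__main__":
    counts = [0, 0, 0]
    for _ in range(10000):
        counts[roulette_select([1, 0, 3])] += 1
    print(counts)  # index 1 is never chosen; index 2 is chosen about 3x as often as index 0
```

This sequential version takes time linear in the number of items because of the prefix sum, which is the cost a parallel scheme aims to avoid.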
ISBN: 9781665481069 (print)
Powerful abstractions such as dataframes are only as efficient as their underlying runtime system. The de facto distributed data processing framework, Apache Spark, is poorly suited for modern cloud-based data-science workloads due to its outdated assumptions: static datasets analyzed using coarse-grained transformations. In this paper, we introduce the Indexed DataFrame, an in-memory cache that supports a dataframe abstraction which incorporates indexing capabilities to support fast lookup and join operations. Moreover, it supports appends with multi-version concurrency control. We implement the Indexed DataFrame as a lightweight, standalone library which can be integrated with minimum effort in existing Spark programs. We analyze the performance of the Indexed DataFrame in cluster and cloud deployments with real-world datasets and benchmarks using both Apache Spark and Databricks Runtime. In our evaluation, we show that the Indexed DataFrame significantly speeds up query execution when compared to a non-indexed dataframe, while incurring modest memory overhead.
ISBN: 9798350364613; 9798350364606 (print)
Community detection refers to the identification of coherent partitions in networks. In this poster, we present a parallel dynamic Louvain algorithm that finds communities in rapidly evolving graphs. Given a batch update of edge deletions or insertions, our algorithm identifies an approximate set of affected vertices in the graph with minimal overhead and updates the community membership of each vertex. This process repeats until convergence. Our approach achieves a mean speedup of 7.3x compared to our parallel and optimized implementation of Delta-screening combined with Louvain, a recently proposed state-of-the-art approach.
ISBN: 9798350364613; 9798350364606 (print)
Preconditioned iterative methods based on the Krylov subspace technique are widely employed in scientific and technical computing. When utilizing large-scale parallel computing systems, the communication overhead tends to increase with the growth in the number of nodes, making its reduction a crucial challenge. In parallel finite element methods (FEM) and finite volume methods (FVM), halo communication and computation overlapping (CC-Overlapping) are commonly employed, often in conjunction with the dynamic loop scheduling feature of OpenMP. This approach has been primarily applied to sparse matrix-vector products (SpMV) and explicit solvers. Previous studies by the author have proposed reordering techniques for applying CC-Overlapping to processes involving global data dependencies, such as the Conjugate Gradient method preconditioned by Incomplete Cholesky Factorization (ICCG). Successful implementations on massively parallel supercomputers demonstrated high parallel performance, but the application of CC-Overlapping was limited to SpMV. In the present work, the author proposes a method to apply CC-Overlapping to the forward and backward substitutions of the IC(0) smoother of the parallel Conjugate Gradient method preconditioned by Multigrid (MGCG). Using up to 4,096 nodes on Wisteria/BDEC-01 (Odyssey) with A64FX, a performance improvement of more than 40% was achieved compared to the original implementation, while the improvement was more than 20% on 1,024 nodes of the Oakbridge-CX system with Intel Xeon CPUs.
ISBN: 9781665497473 (print)
Tridiagonal systems are among the most fundamental computations in science, engineering, and mathematics, and one solver used in such systems is Tree Partitioning Reduction (TPR), which is a divide-and-conquer method that solves large-scale linear equations by dividing them and then computing the parts in parallel within different local memory threads. Herein, we propose an improved TPR algorithm that has a parallel cyclic reduction flavor, with which we reduced the number of algorithm steps by approximately half while simultaneously increasing arithmetic intensity and cache reusability. A performance evaluation conducted on an Intel Skylake-SP microprocessor showed a high hit ratio for the L1 cache and that our solver was as much as 31 times faster on 32 threads for 262,144 equations. In the case of an NVIDIA Tesla P100 GPU, our method processed 10 MRow/s more than TPR and cuSPARSE.
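For readers unfamiliar with the problem, the sketch below solves a tridiagonal system with the classic sequential Thomas algorithm. This is not TPR or the improved variant described above; it is only the standard serial baseline that parallel methods of this kind are designed to replace. The function name `solve_tridiagonal` and the array layout are assumptions, and no pivoting is done, so the matrix is assumed to be diagonally dominant or otherwise safe to eliminate in order.

```python
def solve_tridiagonal(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm (serial baseline).

    a: sub-diagonal   (a[0] is unused)
    b: main diagonal
    c: super-diagonal (c[-1] is unused)
    d: right-hand side
    All four lists have the same length n. Returns the solution as a list.
    """
    n = len(b)
    c_prime = [0.0] * n
    d_prime = [0.0] * n

    # Forward elimination: remove the sub-diagonal row by row.
    c_prime[0] = c[0] / b[0]
    d_prime[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * c_prime[i - 1]
        c_prime[i] = c[i] / denom if i < n - 1 else 0.0
        d_prime[i] = (d[i] - a[i] * d_prime[i - 1]) / denom

    # Back substitution: each unknown depends on the one after it.
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


if __name__ == "__main__":
    # 3x3 system with diagonal 2 and off-diagonals -1.
    print(solve_tridiagonal([0, -1, -1], [2, 2, 2], [-1, -1, 0], [0, 0, 4]))
```

Both loops carry a dependency from one row to the next, which is why the Thomas algorithm does not parallelize directly and why reduction-based schemes restructure the elimination.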
ISBN: 9798350364613; 9798350364606 (print)
While parallel programming, particularly on graphics processing units (GPUs), and numerical optimization hold immense potential to tackle real-world computational challenges across disciplines, their inherent complexity and technical demands often act as daunting barriers to entry. This, unfortunately, limits accessibility and diversity within these crucial areas of computer science. To combat this challenge and ignite excitement among undergraduate learners, we developed an application-driven course, harnessing robotics as a lens to demystify the intricacies of these topics, making them tangible and engaging. Our course's prerequisites are limited to the required undergraduate introductory core curriculum, opening doors for a wider range of students. Our course also features a large final-project component to connect theoretical learning to applied practice. In our first offering of the course, we attracted 27 students without prior experience in these topics and found that an overwhelming majority of the students felt that they learned both technical and soft skills such that they felt prepared for future study in these fields.