ISBN: 9798350337662 (print)
High-fidelity flow simulations are indispensable for analyzing systems that exhibit multiphase flow phenomena. The accuracy of such simulations depends strongly on the finest mesh resolution used to represent the fluid-fluid interfaces, but increased resolution comes at a higher computational cost. In this work, we propose algorithmic advances that aim to reduce the computational cost without compromising the physics by selectively detecting key regions of interest (droplets and filaments) that require significantly higher resolution. The framework uses adaptive octree-based meshing integrated with PETSc's linear algebra solvers. We demonstrate scaling of the framework up to 114,688 processes on TACC's Frontera. Finally, we deploy the framework to perform one of the most highly resolved simulations of primary jet atomization. This simulation, equivalent to 35 trillion grid points on a uniform grid, is 64x larger than current state-of-the-art simulations and provides unprecedented insights into an important flow physics problem with a diverse array of engineering applications.
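The "equivalent uniform grid" figure can be sanity-checked with a few lines of arithmetic. The sketch below assumes a cubic domain whose finest octree level is 15; that level is an assumption chosen because it reproduces the quoted order of magnitude, not a value stated in the abstract. The helper name `uniform_equivalent_points` is likewise hypothetical.

```python
def uniform_equivalent_points(level: int, dim: int = 3) -> int:
    """Number of cells in a uniform grid as fine as the given octree level."""
    cells_per_side = 2 ** level  # each refinement level halves the cell width
    return cells_per_side ** dim


# Assumption: finest level 15 in 3D (not stated in the abstract).
finest = uniform_equivalent_points(15)
print(f"{finest:,}")  # 35,184,372,088,832, i.e. about 35 trillion

# In 3D, a 64x larger grid corresponds to 4x finer spacing per direction,
# which is two additional octree levels.
print(finest // uniform_equivalent_points(13))  # 64
```

The adaptive mesh itself stores far fewer cells than this count, since only the regions around droplets and filaments are refined to the finest level.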
The increasing volume and complexity of IoT systems demand a transition from the cloud-centric model to a decentralized IoT architecture in the so-called Computing Continuum, with no or minimal reliance on central ser...
ISBN: 9798350364613; 9798350364606 (print)
We introduce a distributed-memory parallel algorithm for force-directed node embedding that places the vertices of a graph into a low-dimensional vector space based on the interplay of attraction among neighboring vertices and repulsion among distant vertices. We build our algorithms on two sparse matrix operations, SDDMM and SpMM. We propose a configurable pull-push-based communication strategy that optimizes memory usage and data transfers based on the available computing resources, and we use asynchronous MPI communication to overlap communication and computation. Our algorithm scales up to 256 nodes on distributed supercomputers and surpasses the performance of state-of-the-art algorithms.
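To make the roles of the two kernels concrete, here is a minimal serial sketch of one attraction update written in terms of SDDMM and SpMM. It is an illustration under stated assumptions, not the authors' distributed algorithm: the sigmoid-based force, the learning rate, and the function names are all hypothetical, and the repulsion term and all MPI communication are omitted.

```python
import math


def sddmm(edges, X, Y):
    """Sampled dense-dense product: one dot product <X[i], Y[j]> per stored edge (i, j)."""
    return [sum(a * b for a, b in zip(X[i], Y[j])) for i, j in edges]


def spmm(edges, vals, Y, n_rows):
    """Sparse-times-dense product: out[i] += vals[e] * Y[j] for each edge e = (i, j)."""
    dim = len(Y[0])
    out = [[0.0] * dim for _ in range(n_rows)]
    for (i, j), v in zip(edges, vals):
        for k in range(dim):
            out[i][k] += v * Y[j][k]
    return out


def attraction_step(edges, X, lr=0.1):
    """Pull each vertex toward its neighbors (assumed sigmoid force model)."""
    scores = sddmm(edges, X, X)  # similarity of each connected pair
    coeff = [1.0 - 1.0 / (1.0 + math.exp(-s)) for s in scores]  # 1 - sigmoid(score)
    pull = spmm(edges, coeff, X, len(X))  # weighted sum of neighbor vectors
    return [[x + lr * p for x, p in zip(xr, pr)] for xr, pr in zip(X, pull)]


# Two vertices joined by one undirected edge, stored in both directions.
edges = [(0, 1), (1, 0)]
X = [[1.0, 0.0], [0.0, 1.0]]
print(attraction_step(edges, X))
```

SDDMM touches only the vertex pairs that are actually connected, and SpMM then aggregates neighbor vectors with the resulting coefficients, which is why the pair covers the attraction side of a force-directed update.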
Non-Uniform Memory Access (NUMA) systems are prevalent in HPC, where optimal thread and page placement are crucial for enhancing performance and minimizing energy usage [1]-[3]. Moreover, considering that NUMA syste...
ISBN: 9798350337662 (print)
Concurrent queue algorithms have been the subject of extensive research. However, the target hardware and evaluation methodology behind the published results for any two given concurrent queue algorithms often share only minimal overlap, which makes a meaningful comparison exceedingly difficult. With the continuing trend towards increasingly heterogeneous systems, it is becoming more important not only to evaluate and compare novel and existing queue algorithms across a wider range of target architectures, but also to be able to continuously re-evaluate queue algorithms in light of novel architectures and capabilities. To address this need, we present AnyQ, an evaluation framework for concurrent queue algorithms. We design a set of programming abstractions that enable the mapping of concurrent queue algorithms and benchmarks to a wide variety of target architectures. We demonstrate the effectiveness of these abstractions by showing that a queue algorithm expressed in a portable, high-level manner can achieve performance comparable to handcrafted implementations. We design a system for testing and benchmarking queue algorithms. Using the developed framework, we investigate concurrent queue algorithm performance across a range of both CPU and GPU architectures. In the hope that it may serve the community as a starting point for building a common repository of concurrent queue algorithms and as a base for future research, all code and data are made available as open source software at https://***/anyq.
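The idea of running one benchmark against interchangeable queue implementations can be illustrated with a small harness. This is a sketch using Python's standard `queue` module and threads, not AnyQ's API; the function name `run_benchmark`, the sentinel-based shutdown, and the workload sizes are assumptions made for illustration.

```python
import queue
import threading
import time


def run_benchmark(q, n_producers, n_consumers, items_per_producer):
    """Push a fixed workload through any queue exposing put() and get().

    Returns (items consumed, elapsed seconds).
    """
    counts = [0] * n_consumers  # one slot per consumer, so no lock is needed

    def produce(pid):
        for i in range(items_per_producer):
            q.put((pid, i))

    def consume(cid):
        while q.get() is not None:  # None is the shutdown sentinel
            counts[cid] += 1

    producers = [threading.Thread(target=produce, args=(p,)) for p in range(n_producers)]
    consumers = [threading.Thread(target=consume, args=(c,)) for c in range(n_consumers)]
    start = time.perf_counter()
    for t in producers + consumers:
        t.start()
    for t in producers:
        t.join()
    for _ in consumers:
        q.put(None)  # one sentinel per consumer, sent after all real items
    for t in consumers:
        t.join()
    return sum(counts), time.perf_counter() - start


# The same benchmark body runs unchanged against two queue implementations.
for impl in (queue.Queue, queue.SimpleQueue):
    consumed, seconds = run_benchmark(impl(), 2, 2, 1000)
    print(impl.__name__, consumed, f"{seconds:.4f}s")
```

Because the harness depends only on `put` and `get`, swapping in another queue requires no change to the benchmark itself, which is the property that makes results comparable across implementations.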
ISBN: 9783031744976; 9783031744983 (print)
Exploration of different network topologies is one of the fundamental problems of distributed systems. The problem has been studied on networks such as lines, rings, tori, and rectangular grids. In this work, we consider a rectangle-enclosed triangular grid (RETG). An RETG is the part of an infinite triangular grid enclosed by a rectangle, one pair of whose parallel sides aligns with a family of parallel straight lines of the infinite triangular grid. We study the problem of perpetual exploration of an RETG using oblivious robots. The robots have limited visibility, i.e., they are myopic; infinite visibility becomes impractical for a very large network, so limited visibility is the more practical assumption. The robots have neither chirality nor any axis agreement. We provide an algorithm that explores the RETG perpetually without any collision. The algorithm works under a synchronous scheduler and requires three robots with two-hop visibility.
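To give a feel for what two-hop visibility means on a triangular grid, the sketch below enumerates the nodes a robot can see on the infinite grid. The axial coordinate system and the function names are assumptions made for illustration, clipping to the enclosing rectangle is omitted, and this is not the exploration algorithm itself.

```python
# Six unit steps between adjacent nodes of a triangular grid,
# expressed in axial coordinates (q, r). The coordinate choice is an assumption.
NEIGHBOR_STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def hop_distance(a, b):
    """Minimum number of grid edges between nodes a and b."""
    dq, dr = a[0] - b[0], a[1] - b[1]
    return max(abs(dq), abs(dr), abs(dq + dr))


def visible_nodes(pos, hops=2):
    """All nodes within `hops` edges of `pos`, including `pos` itself."""
    q0, r0 = pos
    return {
        (q0 + dq, r0 + dr)
        for dq in range(-hops, hops + 1)
        for dr in range(-hops, hops + 1)
        if hop_distance((0, 0), (dq, dr)) <= hops
    }


print(len(visible_nodes((0, 0), hops=1)))  # 7: the node and its 6 neighbors
print(len(visible_nodes((0, 0), hops=2)))  # 19: 1 + 6 + 12
```

A myopic robot therefore decides its move from at most 19 node states, and near the boundary of the RETG from fewer.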
ISBN: 9781665497473 (print)
The rapid growth of Internet of Things (IoT) and autonomous systems has led to the deployment of edge devices close to the sensing data source for low-latency computation.
ISBN: 9798400706479 (print)
The proceedings contain 3 papers. The topics discussed include: analysis and evaluation of load management strategies in a decentralized FaaS environment: a simulation-based framework; live migration of multi-container Kubernetes pods in multi-cluster serverless edge systems; and comparing actor-critic and neuroevolution approaches for traffic offloading in FaaS-powered edge systems.
ISBN: 9798350395679; 9798350395662 (print)
To address the growing complexity of modern internet networks, we introduce a distributed, asynchronous, and scalable algorithm for determining the centrality and importance of various levels of the global internet network topology at a massive scale. By utilizing triangle formations and neighborhood densities in these complex networks, our algorithm provides valuable insights into the structural importance of individual nodes and the intricate relationships among interconnected entities. Focusing on the global internet's prime components (routers, IP addresses, and autonomous systems), the algorithm provides a scalable solution extending to the global internet infrastructure, datacenters, cloud networks, and private networks. By taking advantage of parallelism, distributed processing, efficient communication, and fine-grained asynchronous execution through an Actor-based programming system, our algorithm can efficiently process and perform rapid computations on large-scale internet networks. We perform scalability studies on the NERSC Perlmutter supercomputer and the Georgia Tech HPC PACE cluster using three large-scale real-world network datasets and one large-scale synthetic network dataset, achieving up to 91.7% parallel efficiency when scaling out to 2K cores and reducing execution time from 5.7 hours to 18.3 seconds, while performing 105.4x better on average than related approaches. Our research contributes to enhanced network management, fault tolerance, resilience, and security, as well as to improved overall network stability, reduced latency, and optimized energy consumption across the vast population of internet network devices.
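As a rough illustration of how triangle formations and neighborhood density can rank nodes, the following serial sketch counts the triangles through each node and normalizes by the number of possible neighbor pairs. The abstract does not give the exact centrality measure, so the density formula here (the standard local clustering coefficient) and the function names are assumptions; the Actor-based distributed execution is not modeled.

```python
from itertools import combinations


def triangle_counts(adj):
    """Number of triangles through each node; adj maps node -> set of neighbors."""
    counts = {v: 0 for v in adj}
    for v, nbrs in adj.items():
        # A triangle through v exists for every pair of v's neighbors that are adjacent.
        for a, b in combinations(sorted(nbrs), 2):
            if b in adj[a]:
                counts[v] += 1
    return counts


def local_density(adj):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    tri = triangle_counts(adj)
    density = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        density[v] = 2 * tri[v] / (k * (k - 1)) if k > 1 else 0.0
    return density


# A triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(triangle_counts(adj))  # {0: 1, 1: 1, 2: 1, 3: 0}
print(local_density(adj))
```

Each node's count depends only on its own neighborhood, so the per-node work is independent, which is the kind of structure that fine-grained asynchronous execution can exploit.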
ISBN: 9798400704130 (print)
The proceedings contain 47 papers. The topics discussed include: ElasticRoom: multi-tenant DNN inference engine via co-design with resource-constrained compilation and strong priority scheduling; efficient all-to-all collective communication schedules for direct-connect topologies; ESG: pipeline-conscious efficient scheduling of DNN workflows on serverless platforms with shareable GPUs; ETS: deep learning training iteration time prediction based on execution trace sliding window; IDT: intelligent data placement for multi-tiered main memory with reinforcement learning; FaaSKeeper: learning from building serverless services with ZooKeeper as an example; accelerating function-centric applications by discovering, distributing, and retaining reusable context in workflow systems; and Faast: an efficient serverless framework made snapshot-based function response fast.