ISBN: (print) 9783031407437; 9783031407444
Modeling and simulation are crucial in high-performance computing (HPC), and numerous frameworks have been developed for distributed computing infrastructures and their applications. Although node-level simulation of shared-memory systems and task-based parallel applications exists, existing works overlook non-uniform memory access (NUMA) effects, a critical characteristic of current HPC platforms. In this work, we introduce a model for complex NUMA architectures and enhance a simulator for dependency-based task-parallel applications. This facilitates experiments with varied data locality models: we refine a communication-oriented model that leverages topology information for data transfers, and devise a more intricate model that incorporates a cache mechanism for data stored in the last-level cache. Dense linear algebra test cases are used to validate both models, demonstrating that our simulator reliably predicts execution time with minimal relative error.
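As a rough illustration of what a communication-oriented, topology-aware transfer model might look like, the sketch below estimates a data transfer time from the hop distance between NUMA nodes. It is not the model from the abstract above: the graph representation `links`, the functions `hop_count` and `transfer_time`, and the cost formula (per-hop latency plus size over bandwidth) are all assumptions made for this example.

```python
from collections import deque


def hop_count(links, src, dst):
    """Shortest hop distance between two NUMA nodes (breadth-first search).

    `links` is a hypothetical adjacency map: node id -> list of neighbours.
    """
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in links.get(node, ()):
            if neighbour == dst:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    raise ValueError(f"no path from node {src} to node {dst}")


def transfer_time(links, src, dst, size_bytes, hop_latency_s, bandwidth_bytes_per_s):
    """Assumed cost model: zero if the data is already local, otherwise
    one latency term per hop plus a size / bandwidth term."""
    hops = hop_count(links, src, dst)
    if hops == 0:
        return 0.0
    return hops * hop_latency_s + size_bytes / bandwidth_bytes_per_s


# Four NUMA nodes connected in a ring (illustrative topology only).
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
remote_cost = transfer_time(ring, 0, 2, size_bytes=1000,
                            hop_latency_s=1.0, bandwidth_bytes_per_s=1000.0)
local_cost = transfer_time(ring, 0, 0, size_bytes=1000,
                           hop_latency_s=1.0, bandwidth_bytes_per_s=1000.0)
```

A simulator could call such a function whenever a task reads data that lives on another node, and a cache-aware variant would first check whether the data is already held in the last-level cache.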
In this work, we introduce and study a set of tree-based algorithms for resource allocation considering group dependencies between their parameters. Real-world distributed and high-performance computing systems often...
Nodes in the Lightning Network synchronise routing information through a gossip protocol that makes use of a staggered broadcast mechanism. In this work, we show that the convergence delay in the network is larger tha...
ISBN: (print) 9781939133342
Serverless computing has become a popular cloud computing paradigm. By default, when a serverless function fails, the serverless platform re-executes the function to tolerate the failure. However, such a retry-based approach requires functions to be idempotent, which means that functions should expose the same behavior regardless of retries. This requirement is challenging for developers, especially when functions are stateful. Failures may cause functions to repeatedly read and update shared states, potentially corrupting data consistency. This paper presents Flux, the first toolkit that automatically verifies the idempotence of serverless applications. It proposes a new correctness definition, idempotence consistency, which stipulates that a serverless function's retry is transparent to users. To verify idempotence consistency, Flux defines a novel property, idempotence simulation, which decomposes the proof for a concurrent serverless application into reasoning about individual functions. Furthermore, Flux extends existing verification techniques to realize automated reasoning, enabling Flux to identify idempotence-violating operations and fix them with existing log-based methods. We demonstrate the efficacy of Flux with 27 representative serverless applications. Flux has successfully identified previously unknown issues in 12 applications. Developers have confirmed 8 issues. Compared to state-of-the-art systems (namely Beldi and Boki) that log every operation, Flux achieves up to 6x lower latency and 10x higher peak throughput, as it logs only the identified idempotence-violating operations.
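To make the retry problem in the abstract above concrete, here is a minimal sketch of a non-idempotent stateful update and a log-based repair. It is not Flux, Beldi, or Boki code: the in-memory dictionaries standing in for shared state and the operation log, the functions `charge_naive`, `charge_logged`, and `run_with_retry`, and the `request_id` scheme are all hypothetical.

```python
def charge_naive(store, account, amount):
    """Non-idempotent: every re-execution applies the update again."""
    store[account] = store.get(account, 0) - amount
    return store[account]


def charge_logged(store, log, request_id, account, amount):
    """Log-based fix: the first execution records its result under the
    request ID; a retry replays the logged result instead of updating."""
    if request_id in log:
        return log[request_id]
    store[account] = store.get(account, 0) - amount
    log[request_id] = store[account]
    return store[account]


def run_with_retry(fn, retries):
    """Simulate a platform that re-executes a function `retries` extra times."""
    result = None
    for _ in range(retries + 1):
        result = fn()
    return result


# One retry of the naive function charges the account twice.
naive_store = {"alice": 100}
naive_result = run_with_retry(
    lambda: charge_naive(naive_store, "alice", 30), retries=1
)

# The logged variant leaves the same state as a single execution.
logged_store, op_log = {"alice": 100}, {}
logged_result = run_with_retry(
    lambda: charge_logged(logged_store, op_log, "req-1", "alice", 30), retries=1
)
```

Logging costs an extra write per protected operation, which is why logging only the operations that actually violate idempotence, as the abstract describes, can reduce overhead compared with logging every operation.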
ISBN: (print) 9783030954055; 9783030954048
With the development of deep reinforcement learning and multi-agent modeling, Multi-Agent Reinforcement Learning (MARL) has recently become a very active research topic. Q-Mix is a popular algorithm for solving MARL tasks where the individual agents are allowed to be trained in a centralized manner. As the scale and complexity of MARL tasks grow, there is an urgent need for a more efficient training strategy. Consequently, it is desirable to develop a Q-Mix training algorithm that can benefit from parallel computation. However, how classic distributed machine learning frameworks work with Q-Mix is a less studied problem. In this paper, we propose the PS-Qmix algorithm, which applies the Parameter Server framework to training Q-Mix agents in parallel. Our algorithm employs multiple distributed worker threads for data generation and model learning, where these two processes are decoupled and executed in alternation. To cater for different simulation speeds of the environment, the proposed algorithm allows the user to tune the relative proportion of computation allocated to data generation and model learning. We evaluate the PS-Qmix algorithm on a StarCraft II micro-combat task. As we increase the number of worker threads, we observe significant speed-up in both data generation and model learning. The evaluation results indicate that our method is effective in utilizing distributed computation resources to train Q-Mix agents.
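The worker pattern described in the abstract above (threads that alternate between data generation and model learning against a shared parameter server) can be sketched on a toy problem. This is not PS-Qmix: a one-parameter linear regression stands in for the Q-Mix networks, random samples stand in for environment rollouts, and the names `ParameterServer`, `worker`, `train`, `gen_steps`, and `learn_steps` are assumptions for illustration.

```python
import random
import threading


class ParameterServer:
    """Holds the shared parameter; workers pull it and push gradients."""

    def __init__(self, weight=0.0, learning_rate=0.05):
        self.weight = weight
        self.learning_rate = learning_rate
        self._lock = threading.Lock()

    def pull(self):
        with self._lock:
            return self.weight

    def push(self, gradient):
        with self._lock:
            self.weight -= self.learning_rate * gradient


def worker(server, seed, rounds, gen_steps, learn_steps):
    rng = random.Random(seed)
    replay = []
    for _ in range(rounds):
        # Phase 1: data generation (stand-in for environment rollouts).
        for _ in range(gen_steps):
            x = rng.uniform(-1.0, 1.0)
            replay.append((x, 2.0 * x))  # toy target: y = 2x
        # Phase 2: model learning on samples from the local buffer.
        for _ in range(learn_steps):
            x, y = rng.choice(replay)
            w = server.pull()
            server.push(2.0 * (w * x - y) * x)  # gradient of (w*x - y)^2


def train(num_workers=4, rounds=50, gen_steps=8, learn_steps=8):
    server = ParameterServer()
    threads = [
        threading.Thread(
            target=worker, args=(server, seed, rounds, gen_steps, learn_steps)
        )
        for seed in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return server.pull()


final_weight = train()
```

The ratio `gen_steps : learn_steps` plays the role of the tunable proportion between data generation and learning mentioned in the abstract; a slow environment would call for a larger share of generation steps per round.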
The increased number of distributed generation sources in distribution networks in recent years has emphasized the importance of their accurate and computationally efficient modelling. The development of efficient models becomes more difficult and complex with the introduction of novel controllers that aim to support the grid, often emulating the operation of passive circuit elements. One such controller, recently proposed, emulates the operation of an admittance to support the voltage at the converter's point of grid connection. In this paper, an aggregate model of distributed energy resources controlled using the virtual admittance approach and connected in parallel is derived. The resemblance of the controller to classical circuit elements is used to obtain an equivalent circuit for each converter, including its controller part and the distribution line connecting it to the main grid. Then, circuit analysis techniques are applied to synthesize the aggregate model of all the parallel converters in the network, simplifying the assessment of their combined effect on the rest of the network and reducing the computational burden. Simulation results from Matlab and its SimPowerSystems toolbox are used to validate the accuracy of the proposed model.
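The aggregation step in the abstract above can be illustrated with basic circuit algebra. The sketch assumes, purely for illustration, that each converter branch reduces to its virtual admittance in series with its line impedance and that all branches share the same two terminals; the paper's actual equivalent circuit may differ. The function names and numeric values are hypothetical.

```python
def branch_admittance(y_virtual, z_line):
    """Admittance of one branch: virtual admittance (as impedance 1/Y)
    in series with the distribution line impedance."""
    return 1.0 / (z_line + 1.0 / y_virtual)


def aggregate_admittance(branches):
    """Parallel branches: admittances add."""
    return sum(branch_admittance(y, z) for y, z in branches)


# Two identical branches: 0.5 S virtual admittance behind a 2 ohm line.
# Each branch is 2 + 2 = 4 ohm, i.e. 0.25 S, so the pair gives 0.5 S.
y_pair = aggregate_admittance([(0.5, 2.0), (0.5, 2.0)])

# Complex values work the same way (resistive-inductive line, assumed numbers).
y_mixed = aggregate_admittance([(2.0, 0.1 + 0.2j), (1.0, 0.3 + 0.1j)])
```

Replacing many converter branches with one equivalent admittance is what reduces the computational burden: the rest of the network then sees a single element instead of one model per converter.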
ISBN: (digital) 9798350353686
ISBN: (print) 9798350353693
This paper investigates discretization chattering effects in a distributed-parameter process governed by the diffusion PDE and equipped with a sliding-mode-based boundary controller. The collocated measurement is sampled in time and the boundary control signal is applied through a zero-order-hold element. The limit cycle modes constituting the chattering effects are determined exactly by exploiting the difference equation method, yielding their period and spatially dependent amplitude. Simulation results show that the theoretically determined limit cycle modes occur in the closed-loop system.
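Discretization chattering can be seen in a much simpler setting than the diffusion PDE of the abstract above. The sketch below is a scalar stand-in, not the paper's system: an integrator dx/dt = u with the sliding-mode law u = -gain * sign(x), where x is sampled every `period` seconds and u is held constant between samples (zero-order hold). The function names and the numeric values are assumptions chosen so that the arithmetic is exact in floating point.

```python
def sign(value):
    """Sign function with sign(0) taken as +1 (an arbitrary convention)."""
    return 1.0 if value >= 0.0 else -1.0


def simulate_sampled_sliding_mode(x0, gain, period, steps):
    """Exact sample-to-sample map of dx/dt = u under zero-order hold:
    x[k+1] = x[k] - period * gain * sign(x[k])."""
    samples = [x0]
    for _ in range(steps):
        x = samples[-1]
        samples.append(x - period * gain * sign(x))
    return samples


# The state cannot settle at zero; it ends up in a period-2 limit cycle.
trajectory = simulate_sampled_sliding_mode(x0=0.75, gain=1.0, period=0.5, steps=6)
```

With these values the samples are 0.75, 0.25, -0.25, 0.25, -0.25, and so on: the sampled controller overshoots the sliding surface at every step, and the resulting oscillation amplitude is bounded by `period * gain`.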
This paper describes: 1) the use of feed rate scheduling software to predict the radial depth of cut variation for three-axis milling toolpaths; and 2) the use of this radial depth profile in a time-domain simulation to predict dynamic cutting forces. The time-domain simulation, which also takes the tool tip frequency response functions and the force model (which relates the cutting force components to the chip geometry) as inputs, enables dynamic force profiles to be predicted and parameter combinations that cause chatter to be identified. A ramp geometry is selected that provides constantly varying radial depth, and force predictions are completed at multiple axial depths for comparison to measured *** (c) 2022 The Authors. This is an open access article under the CC BY-NC-ND license (https://***/licenses/by-nc-nd/4.0/)
Scientific parallel applications often use MPI for inter-node communications and OpenMP for intra-node orchestration. Parallel applications such as particle transport, seismic wave propagation simulator, or Finite-El...