ISBN (print): 9783031777370; 9783031777387
The sixth goal of the United Nations' (UN) 2030 Agenda for Sustainable Development is of special interest here: "Ensure availability and sustainable management of water and sanitation for all". To manage water consumption, it is fundamental to be able to measure, diagnose and control the whole path from its origin in natural reservoirs to the end customers, which requires a network of sensors along the Water Distribution System (WDS). Such a network of sensors will generate data over time, constituting a dataset on which computer intelligence algorithms can work to detect abnormal behaviour, especially leakages. Unfortunately, in Brazil the initiatives for monitoring WDS through sensors are very incipient. Since we do not yet have a real dataset with sensor information over time, in this work we aim to provide a solution to this part of the problem through simulation. In 2018, LeakDB [1] was presented as a "benchmark dataset for leakage diagnosis in water distribution systems", a very important and interesting work. However, it has some drawbacks, especially inconsistent data, which prevent its use as a solid basis. In this paper we improve on LeakDB by implementing LeakG3PD, opening the path to analyzing the viability of different solutions in water management, with a special focus on leakages.
ISBN (print): 9781665481069
Calculation of many-body correlation functions is one of the critical kernels utilized in many scientific computing areas, especially in Lattice Quantum Chromodynamics (Lattice QCD). It is formalized as a sum of a large number of contraction terms, each of which can be represented by a graph consisting of vertices describing quarks inside a hadron node and edges designating quark propagations at specific time intervals. Due to its computation- and memory-intensive nature, real-world physics systems (e.g., multi-meson or multi-baryon systems) explored by Lattice QCD prefer to leverage multiple GPUs. Unlike general graph processing, many-body correlation function calculations show two specific features: a large number of computation- and data-intensive kernels, and frequently repeated appearances of original and intermediate data. The former results in expensive memory operations such as tensor movements and evictions. The latter offers data reuse opportunities to mitigate the data-intensive nature of many-body correlation function calculations. However, existing graph-based multi-GPU schedulers cannot capture these data-centric features, resulting in sub-optimal performance for many-body correlation function calculations. To address this issue, this paper presents a multi-GPU scheduling framework, MICCO, to accelerate contractions for correlation functions, particularly by taking the data dimension (e.g., data reuse and data eviction) into account. This work first performs a comprehensive study on the interplay of data reuse and load balance, and designs two new concepts, local reuse pattern and reuse bound, to study the opportunity of achieving the optimal trade-off between them. Based on this study, MICCO proposes a heuristic scheduling algorithm and a machine-learning-based regression model to generate the optimal setting of reuse bounds. Specifically, MICCO is integrated into a real-world Lattice QCD system, Redstar, which thereby runs on multiple GPUs for the first time.
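The tension between data reuse and load balance described above can be illustrated with a small Python sketch. It is not MICCO's heuristic, whose details the abstract does not give. The function `assign_tasks`, the unit cost per task, and the reading of `reuse_bound` as "how far a GPU's load may exceed the lightest load and still be chosen for reuse" are all assumptions made for illustration.

```python
def assign_tasks(tasks, num_gpus, reuse_bound):
    """Greedy placement sketch (hypothetical, not the MICCO algorithm).

    tasks: list of tuples naming the input tensors of each contraction.
    reuse_bound: assumed meaning: a GPU is eligible only while its load
    exceeds the lightest GPU's load by at most this many tasks.
    """
    loads = [0] * num_gpus                       # tasks assigned so far
    resident = [set() for _ in range(num_gpus)]  # tensors already on each GPU
    placement = []
    for inputs in tasks:
        needed = set(inputs)
        lightest = min(loads)
        eligible = [g for g in range(num_gpus)
                    if loads[g] - lightest <= reuse_bound]
        # Prefer the most reusable data, then the lighter load, then the lower id.
        best = max(eligible,
                   key=lambda g: (len(resident[g] & needed), -loads[g], -g))
        placement.append(best)
        loads[best] += 1
        resident[best].update(needed)
    return placement, loads


tasks = [("a", "b"), ("a", "c"), ("a", "d"), ("e", "f")]
balanced, balanced_loads = assign_tasks(tasks, num_gpus=2, reuse_bound=0)
reusing, reusing_loads = assign_tasks(tasks, num_gpus=2, reuse_bound=10)
```

With a bound of 0 the sketch degenerates to pure load balancing with reuse as a tie-breaker; with a large bound it follows the data and lets the load skew, which is the trade-off a tuned bound would have to settle.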
ISBN (digital): 9798350355000
ISBN (print): 9798350355017
The increasing volume of data in digital forensic investigations has outpaced the capabilities of existing forensic systems. Traditional systems such as The Sleuth Kit Hadoop, OCFA, and Hansken, though capable of handling large-scale data, fail to offer real-time analysis, which is crucial for prioritizing critical evidence in investigations. This paper addresses this gap by proposing a modern, real-time analysis approach and introducing Forework, a proof-of-concept implementation. Forework aims to deliver real-time, prioritized analysis of forensic artifacts, providing an interactive and scalable solution. It leverages parallel and distributed computing to manage large datasets efficiently, ensuring investigators can quickly focus on the most relevant evidence. This system represents a significant step forward in digital forensics, aligning with contemporary needs for swift and effective data analysis.
ISBN (print): 9798350350920
The particle Markov-chain Monte Carlo (PMCMC) method is a stochastic algorithm that combines Particle Filters (PFs) and Markov-chain Monte Carlo (MCMC) techniques. This approach is widely used in Bayesian inference for high-dimensional state spaces and nonlinear, non-Gaussian dynamic systems. However, current PMCMC accelerators face significant challenges due to their intensive computational complexity and the intricate particle routing, limiting their application in real-time scenarios. To address these challenges, we propose a novel distributed PMCMC method that leverages parallel computing to enhance hardware execution speed. Additionally, our method introduces a particle exchange scheme that not only resolves the accuracy issues caused by particle routing in distributed PMCMC but also achieves faster computing speed. Our design is implemented on a Xilinx Kintex-7 xc7k480t FPGA device. Experimental results demonstrate that our accelerator is nearly 65x faster than CPU performance, and provides speedups up to 5x compared to existing FPGA-based accelerators.
Authors: Zhang, Qi; Liu, Yi; Liu, Tao
Beihang Univ, Sch Comp Sci, Sino German Joint Software Inst, Beijing, Peoples R China
Qilu Univ Technol, Shandong Acad Sci, Shandong Prov Key Lab Comp Networks, Jinan 250014, Shandong, Peoples R China
Achieving microsecond-scale tail latency poses an extreme challenge to the conventional "NIC-OS-Application" architecture in the face of highly concurrent requests. Existing kernel-bypass network systems improve this situation significantly. Still, they cannot achieve load-aware in-server request distribution, which not only harms resource efficiency but, more importantly, defeats the goal of squeezing tail latency. This paper proposes iBalancer, an in-server proactive load balancer for kernel-bypass systems, which aggressively handles NIC-side flow scheduling according to the load of threads on the processor side. Furthermore, we propose a novel metric, "polling time interval (PTI)", to quantify the load of worker threads, which not only indicates the utilization of the core bound to the worker thread but also reflects the differences in the processing time of different flows. By scheduling flows according to the PTI metric, iBalancer tends to even out the queueing latencies of different flows, such as Set and Get operations for an in-memory key-value store. In addition, by decoupling flow scheduling from packet steering, iBalancer achieves a tail-latency-aware flow-to-core binding and preserves hardware-based request distribution among cores. The proposed system is evaluated and compared to mTCP and Shenango using two representative microsecond-scale network applications: Memcached KVS and a real-time deep-learning-based financial fraud identification application. Experimental results show that iBalancer can process up to 4.75x and 1.55x higher load than mTCP and Shenango, respectively, under a 500 μs 99th-percentile tail latency limit on Memcached. For the financial fraud identification application, iBalancer is able to process 4.56x and 1.16x higher load than mTCP and Shenango, respectively, under a 900 μs tail latency limit.
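To make the idea of a polling-interval load metric concrete, here is a minimal Python sketch. It assumes that a worker's PTI is the elapsed time between its two most recent polls of its receive queue, and that a longer interval signals a heavier load; the abstract does not spell out either detail. The class `PtiTracker` and its methods are hypothetical names, not part of iBalancer.

```python
class PtiTracker:
    """Per-worker polling-interval bookkeeping (illustrative only)."""

    def __init__(self, num_workers):
        self.last_poll = [None] * num_workers  # timestamp of the latest poll
        self.pti = [0] * num_workers           # latest interval, in microseconds

    def record_poll(self, worker, now_us):
        """Called each time a worker polls its queue."""
        previous = self.last_poll[worker]
        if previous is not None:
            self.pti[worker] = now_us - previous
        self.last_poll[worker] = now_us

    def pick_worker(self):
        """Choose the worker with the shortest interval (assumed least loaded)."""
        return min(range(len(self.pti)), key=lambda w: (self.pti[w], w))


tracker = PtiTracker(num_workers=2)
tracker.record_poll(0, 0)
tracker.record_poll(1, 0)
tracker.record_poll(1, 2)   # worker 1 returned to its queue after 2 us
tracker.record_poll(0, 5)   # worker 0 needed 5 us before polling again
target = tracker.pick_worker()
```

A new flow would be bound to `target`; a real system would also need to smooth the metric over several polls and handle idle workers, which this sketch leaves out.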
ISBN (digital): 9798331509712
ISBN (print): 9798331509729
Stateful serverless systems commonly adopt an architectural paradigm characterized by compute and storage separation within cloud data centers. Nevertheless, guaranteeing prompt response for real-time tasks at the edge becomes challenging due to network overheads. This paper introduces LoLa, a low-latency state management framework for real-time tasks in edge serverless systems, which adaptively places states close to functions, thereby mitigating delays in accessing states. Our approach aims to reduce network latency and optimize resource utilization by co-locating functions and states, thereby enhancing the system's overall efficiency. We introduce an adaptive strategy to coordinate the migration of states, which dynamically adjusts the positions of states based on historical data and real-time feedback. Additionally, we design an in-memory state storage mechanism to facilitate low-latency access and implement lightweight, fine-grained state management to ensure the consistency of stored states. Evaluation results showcase the efficacy of LoLa in reducing state read and write latency within edge serverless systems. Specifically, the average response latency decreases by 65.2% and 38.1% in the best-case and worst-case scenarios, respectively.
ISBN (digital): 9798350362848
ISBN (print): 9798350362855
This paper presents a DC microgrid testbed setup that consists of various Distributed Energy Resources (DERs), including solar Photovoltaics (PV), supercapacitors for voltage regulation, and Battery Energy Storage Systems (BESS). The DC microgrid accommodates both non-flexible and flexible loads, which can be dynamically adjusted based on PV power availability. The integration of the setup with the Hyphae Autonomous Power Interchange System (APIS) framework automates energy transfer within the BESS, ensuring efficient power management and optimizing the overall efficiency of the DC microgrid. Furthermore, the efficacy of the proposed model is validated via real-time simulation on the Speedgoat baseline real-time target Hardware-in-the-Loop (HIL) machine. The results demonstrate the model's ability to manage power sharing efficiently, emphasizing the performance and reliability of the DC microgrid setup in dynamic energy scenarios as well as its contribution to grid resilience amidst PV uncertainties.
We propose an improved real-space parallel strategy for the density matrix renormalization group (DMRG) method, where boundaries of separate regions are adaptively distributed during DMRG *** scheme greatly improves the parallel efficiency with shorter waiting time between two adjacent tasks, compared with the original real-space parallel DMRG with fixed *** implement our new strategy based on the message passing interface (MPI), and dynamically control the number of kept states according to the truncation error in each DMRG *** study the performance of the new parallel strategy by calculating the ground state of a spin-cluster chain and a quantum chemical Hamiltonian of the water *** maximum parallel efficiencies for these two models are 91% and 76% on 4 nodes, which are much higher than those of the real-space parallel DMRG with fixed boundaries.
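Controlling the number of kept states by truncation error is commonly done by keeping the fewest states whose discarded weight stays under a tolerance. The Python sketch below shows that generic rule under stated assumptions: `weights` stands for the eigenvalues of a reduced density matrix, and the function name, `min_states` and `max_states` are hypothetical. The abstract does not say which exact criterion the authors use.

```python
def choose_kept_states(weights, max_truncation_error,
                       min_states=1, max_states=None):
    """Return the smallest number of states whose discarded weight
    (sum of the dropped eigenvalues) is at most max_truncation_error.

    Generic illustration only, not the authors' implementation.
    """
    ordered = sorted(weights, reverse=True)   # largest weights are kept first
    upper = len(ordered) if max_states is None else min(max_states, len(ordered))
    for kept in range(min_states, upper + 1):
        discarded = sum(ordered[kept:])
        if discarded <= max_truncation_error:
            return kept
    return upper                              # tolerance unreachable within the cap


spectrum = [0.5, 0.3, 0.15, 0.04, 0.01]
loose = choose_kept_states(spectrum, max_truncation_error=0.06)
tight = choose_kept_states(spectrum, max_truncation_error=0.001)
capped = choose_kept_states(spectrum, max_truncation_error=0.001, max_states=4)
```

A looser tolerance keeps fewer states and lowers the cost of a step, while the cap bounds memory when the spectrum decays slowly.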
In this work, we introduce and study a set of tree-based algorithms for resource allocation considering group dependencies between their parameters. Real-world distributed and high-performance computing systems often...