ISBN (Print): 9781665445139
In cloud ecosystems, distributed block storage systems are used to provide a persistent block storage service, which is the fundamental building block for operating cloud-native services. However, existing distributed storage systems perform poorly for random write workloads in an all-NVMe storage configuration, becoming CPU-bottlenecked. Our roofline-based performance analysis of a conventional distributed block storage system with NVMe SSDs reveals that the bottleneck does not lie in one specific software module but spans the entire software stack: (1) tightly coupled I/O processing, (2) an inefficient threading architecture, and (3) the local backend data store all cause excessive CPU usage. To this end, we re-architect a modern distributed block storage system to improve random write performance. The key ingredients of our system are (1) decoupled operation processing using non-volatile memory, (2) prioritized thread control, and (3) a CPU-efficient backend data store. Our system emphasizes low CPU overhead and high CPU efficiency to utilize NVMe SSDs effectively in a distributed storage environment. We implement our system in Ceph. Compared to native Ceph, our prototype delivers more than a 3x improvement for small random write I/Os in terms of both IOPS and latency by efficiently utilizing CPU cores.
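As a rough illustration of the roofline-style reasoning behind this analysis, the following Python sketch estimates attainable IOPS as the minimum of a CPU-bound ceiling and a device-bound ceiling. All figures (per-I/O cycle cost, core count, device limits) are illustrative assumptions, not measurements from the paper.

def attainable_iops(cpu_cores, cpu_ghz, cycles_per_io, device_iops_limit):
    # Attainable IOPS is capped by either the CPU budget or the NVMe devices.
    cpu_bound_iops = cpu_cores * cpu_ghz * 1e9 / cycles_per_io
    return min(cpu_bound_iops, device_iops_limit)

# Example (hypothetical numbers): 16 cores at 2.5 GHz, ~200k CPU cycles spent
# per 4 KiB random write across the whole storage stack, and NVMe devices
# able to absorb 2M IOPS in aggregate.
print(attainable_iops(cpu_cores=16, cpu_ghz=2.5,
                      cycles_per_io=200_000, device_iops_limit=2_000_000))
# -> 200000.0, i.e., the node is CPU-bound long before the SSDs saturate.

Under these assumed numbers, trimming the CPU cycles spent per I/O raises the CPU-bound ceiling directly, which is why the abstract's three ingredients all target CPU efficiency rather than device bandwidth.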
In modern power systems, the measurement infrastructure represents the backbone of any monitoring and control application. Indeed, the ever-increasing penetration of renewable energy sources and distributed generation...
Wireless Sensor Networks (WSNs) have been widely recognized as one of the most important technologies for low-power wireless communication and are used in a variety of applications such as medical, military, industrial, agricu...
ISBN (Print): 9781665439299
Social airborne sensing (SAS) is emerging as a new sensing paradigm that leverages the complementary aspects of social sensing and airborne sensing (i.e., UAVs) for reliable information collection. In this paper, we present HeteroSAS, a heterogeneous resource management framework for "all-in-the-air" SAS in disaster response applications. Current SAS approaches use UAVs only to capture data and carry out computation on ground-based processing nodes that may be unavailable in disaster scenarios; they therefore consider a single UAV model and only one type of task (i.e., data capture). In this paper, we explore the opportunity to exploit the complementary strengths of different UAV models to accomplish all stages of the sensing tasks (i.e., data capture, maneuvering, and computation) exclusively "in the air". However, several challenges exist in developing such a resource management framework: i) handling uncertain social signals in the presence of heterogeneous UAVs and tasks; and ii) adapting to constantly changing cyber-physical-social environments. HeteroSAS addresses these challenges with a novel resource management framework that observes the environment and learns the optimal strategy for each UAV using techniques from multi-agent reinforcement learning, game theory, and ensemble learning. An evaluation with a real-world case study shows that HeteroSAS outperforms the state of the art in terms of detection effectiveness, deadline hit rate, and robustness to heterogeneity.
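To make the multi-agent reinforcement learning ingredient concrete, below is a minimal independent Q-learning sketch in Python for per-UAV task selection. The state encoding, action set, reward signal, and hyperparameters are hypothetical placeholders for illustration, not the formulation used in HeteroSAS.

import random
from collections import defaultdict

ACTIONS = ["capture", "maneuver", "compute"]   # sensing stages a UAV can serve

class UAVAgent:
    def __init__(self, alpha=0.1, gamma=0.9, eps=0.2):
        self.q = defaultdict(float)            # Q[(state, action)] table
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, state):
        if random.random() < self.eps:         # epsilon-greedy exploration
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error

# Example: one learning step for a single (hypothetical) UAV agent.
agent = UAVAgent()
a = agent.act("near_event_low_battery")
agent.update("near_event_low_battery", a, reward=1.0, next_state="returning")

In a full multi-agent setting, each UAV would run such an agent while the reward reflects detection quality and deadlines; HeteroSAS additionally layers game theory and ensemble learning on top, which this sketch does not attempt to capture.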
The proliferation of distributed generation (DG) in emerging microgrid systems is undermined by substantial challenges associated with protection. Due to the penetration of DGs, the fault level in the microgrid...
Graph mining has been among the most demanding research areas of the last few decades across different fields, such as biological networks, the world wide web, mobile applications, sensors, online social networks, etc. Frequen...
The rapid proliferation of AI into extensive global data systems has brought new challenges in cybersecurity, mainly because such environments have grown inherently complex, large, and distributed. Existing cybersecur...
Due to employee carelessness or negligence, accidents frequently happen to workers on feeders. A person working on the feeder can be affected by the maloperation of the isolators (double-pole switches). To av...
In recent years, home automation has seen rapid change due to the introduction of many wireless technologies. The explosion in wireless technology has brought about the arrival of countless ethi...
ISBN (Print): 9781665445139
Distributed training with synchronous stochastic gradient descent (SGD) on GPU clusters has been widely used to accelerate the training of deep models. However, SGD utilizes only first-order gradient information in model parameter updates, so training may take days or weeks. Recent studies have successfully exploited approximate second-order information to speed up the training process, among which Kronecker-Factored Approximate Curvature (KFAC) has emerged as one of the most efficient approximation algorithms for training deep models. Yet, when leveraging GPU clusters to train models with distributed KFAC (D-KFAC), the method incurs extensive computation and introduces extra communication during each iteration. In this work, we propose SPD-KFAC, a D-KFAC scheme with smart parallelism of computing and communication tasks, to reduce the iteration time. Specifically, 1) we first characterize the performance bottlenecks of D-KFAC, 2) we design and implement a pipelining mechanism for Kronecker factor computation and communication with dynamic tensor fusion, and 3) we develop a load-balanced placement for inverting multiple matrices on GPU clusters. We conduct real-world experiments on a 64-GPU cluster with a 100 Gb/s InfiniBand interconnect. Experimental results show that our proposed SPD-KFAC training scheme achieves a 10%-35% improvement over state-of-the-art algorithms.
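As a sketch of what a load-balanced inversion placement can look like, the Python snippet below greedily assigns each n x n Kronecker factor to the least-loaded GPU under an O(n^3) inversion-cost model. Both the longest-processing-time heuristic and the cost model are assumptions made for illustration, not the exact placement algorithm of SPD-KFAC.

import heapq

def place_inversions(factor_dims, num_gpus):
    # Min-heap of (accumulated_cost, gpu_id); largest factors are placed first.
    heap = [(0.0, g) for g in range(num_gpus)]
    heapq.heapify(heap)
    assignment = {g: [] for g in range(num_gpus)}
    for n in sorted(factor_dims, reverse=True):
        cost, gpu = heapq.heappop(heap)
        assignment[gpu].append(n)
        heapq.heappush(heap, (cost + float(n) ** 3, gpu))
    return assignment

# Example: hypothetical factor sizes from a few layers, spread over 4 GPUs.
print(place_inversions([4096, 1024, 1024, 512, 256, 256, 128], num_gpus=4))

The point of such a placement is to keep the per-iteration critical path short: the slowest GPU's inversion workload bounds the iteration time, so balancing estimated costs matters more than balancing the number of matrices.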