ISBN: 9798350307924 (print)
In distributed training, deep neural networks (DNNs) are launched over multiple workers concurrently and, in bulk-synchronous parallel (BSP) training, aggregate their local updates on each step. However, BSP does not scale out linearly due to the high communication cost of aggregation. To mitigate this overhead, alternatives like Federated Averaging (FedAvg) and Stale-Synchronous Parallel (SSP) either reduce synchronization frequency or eliminate it altogether, usually at the cost of lower final accuracy. In this paper, we present SelSync, a practical, low-overhead method for DNN training that dynamically chooses to incur or avoid communication at each step, either calling the aggregation op or applying local updates based on the significance of those updates. We propose various optimizations as part of SelSync to improve convergence in the context of semi-synchronous training. Our system converges to the same accuracy as BSP or better while reducing training time by up to 14x.
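The per-step choice between aggregation and local updates can be illustrated with a minimal sketch. The abstract does not say how significance is measured, so the rule below (relative change in gradient norm compared against a threshold `delta`) is purely an assumption, as are the names `should_synchronize`, `average`, and `step`. Workers are simulated as plain lists, and agreeing on the decision across real workers is outside the scope of the sketch.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def should_synchronize(grad, prev_norm, delta):
    """Hypothetical significance test (an assumption, not SelSync's rule):
    synchronize when the gradient norm changed by more than `delta`
    relative to the previous step's norm."""
    norm = l2_norm(grad)
    if prev_norm == 0.0:
        return True, norm  # no history yet: be conservative and synchronize
    relative_change = abs(norm - prev_norm) / prev_norm
    return relative_change > delta, norm


def average(worker_grads):
    """Stand-in for the aggregation op: element-wise mean across workers."""
    n = len(worker_grads)
    return [sum(col) / n for col in zip(*worker_grads)]


def step(params, worker_grads, prev_norms, lr, delta):
    """One semi-synchronous step over simulated workers.

    Returns (new_params, new_norms, synced).
    """
    decisions = [should_synchronize(g, p, delta)
                 for g, p in zip(worker_grads, prev_norms)]
    new_norms = [norm for _, norm in decisions]
    # If any worker sees a significant update, every worker aggregates.
    synced = any(flag for flag, _ in decisions)
    if synced:
        shared = average(worker_grads)
        grads = [shared for _ in worker_grads]
    else:
        grads = worker_grads  # local updates only, no communication
    new_params = [[w - lr * g for w, g in zip(p, gr)]
                  for p, gr in zip(params, grads)]
    return new_params, new_norms, synced
```

In this sketch the worker replicas stay identical after a synchronized step and drift apart after a local one, which is the trade-off a semi-synchronous scheme has to manage.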
ISBN: 9798350399462 (print)
Connected and Automated Vehicles (CAVs) have the potential to reduce traffic congestion, travel delay, and fuel consumption on urban roads. In this paper, the Model Predictive Control (MPC) approach is employed for CAV platooning in the vicinity of signalized intersections in both a centralized and a distributed manner. A delay-compensating strategy is designed to explicitly incorporate sensor and actuator delays. The performance of the proposed centralized and distributed control approaches is verified in simulation experiments that take into account multiple signal plans, traffic demands, and human-driven vehicles. The results demonstrate that the sensor and actuator delays can be compensated in the proposed MPC approaches and show that the distributed MPC outperforms the centralized MPC. Finally, the superiority of the proposed MPC approaches in travel delay, fuel consumption, and throughput is verified in comparison with realistic human-driven vehicle trajectory data.
ISBN: 9798350385922 (digital)
ISBN: 9798350385939; 9798350385922 (print)
Growing concerns about air pollution highlight the need for efficient solutions in air quality monitoring. This study presents a novel air quality assessment system integrating a Wireless Sensor Network (WSN) with Narrowband Internet of Things (NB-IoT), Long Range (LoRa) communication, and Message Queuing Telemetry Transport (MQTT) technologies. The system adopts a star topology with primary and secondary nodes, utilizing LoRa for primary-secondary communication and NB-IoT for cloud connectivity. Data are processed on the IoT analytics cloud platform ThingsBoard. Deployed on a campus, the system accurately measured varying concentrations and Air Quality Index (AQI) readings. Switching from a frequency of 125 kHz to 500 kHz improves latency across various scenarios. The observed improvements include 5.13% with Spreading Factor 7 (SF7) and 4.44% with SF12 at 0 meters, and 5.77% with SF7 and 8.33% with SF12 at 80 meters. These findings guide the optimization of air quality monitoring systems for efficient remote monitoring.
ISBN: 9798350339864 (print)
Communication overhead is one of the major bottlenecks for large-scale distributed model training. Sparse gradients have been proposed to reduce the communication volume dramatically without affecting model accuracy. However, high-performance implementations of sparse gradients are still hindered by the overhead of gradient sparsification and by inefficient implementations of sparse allreduce. For a sparse allreduce operation, the density of intermediate results may dynamically increase as summations progress. Meanwhile, high-performance sparse allreduce is further complicated by heterogeneous bandwidths. To tackle these challenges, we propose bbTopk for efficient sparse gradient training, which includes a new blocked top-k sparsification technique and a novel bandwidth-aware sparse allreduce algorithm. In particular, to alleviate the sparsification overhead, we design a blocked top-k method that reduces the overhead without sacrificing model accuracy. We then build a heterogeneity-aware communication model combined with the dynamic workload feature of the sparse allreduce. Guided by that model, we propose a new sparse allreduce algorithm that takes advantage of the network resources by improving bandwidth utilization, reducing cross-node hops, and adjusting the round order. Experiments are conducted on a variety of typical neural networks and distributed environments. Results show that bbTopk substantially outperforms the previous state-of-the-art work in most test cases, with up to 2.57x speedup, while empirically achieving accuracy similar to that of the dense model.
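One plausible reading of blocked top-k sparsification is sketched below: the flat gradient is split into fixed-size contiguous blocks and the largest-magnitude entries are kept within each block independently, so no selection ever runs over the whole gradient. The abstract does not give the block layout or how k is chosen, so `block_size`, `k_per_block`, and the function names are assumptions for illustration, not bbTopk's actual design.

```python
def blocked_topk(grad, block_size, k_per_block):
    """Keep the k largest-magnitude entries within each contiguous block.

    Returns (indices, values) of the retained entries, indices ascending.
    """
    indices, values = [], []
    for start in range(0, len(grad), block_size):
        block = grad[start:start + block_size]
        # Rank positions inside the block by absolute value, largest first.
        order = sorted(range(len(block)),
                       key=lambda i: abs(block[i]), reverse=True)
        for i in sorted(order[:k_per_block]):
            indices.append(start + i)
            values.append(block[i])
    return indices, values


def densify(indices, values, length):
    """Rebuild a dense vector with zeros at the dropped positions."""
    out = [0.0] * length
    for i, v in zip(indices, values):
        out[i] = v
    return out


grad = [0.1, -0.9, 0.3, 0.05, 0.7, -0.2, 0.0, 0.4]
idx, vals = blocked_topk(grad, block_size=4, k_per_block=1)
sparse = densify(idx, vals, len(grad))
```

Only the `(indices, values)` pairs would need to be communicated; the receiver can rebuild the dense form with `densify`.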
ISBN: 9798350339864 (print)
To accelerate distributed training, many gradient compression methods have been proposed to alleviate the communication bottleneck in synchronous stochastic gradient descent (S-SGD), but their efficacy in real-world applications remains unclear. In this work, we first evaluate the efficiency of three representative compression methods (quantization with Sign-SGD, sparsification with Top-k SGD, and low-rank with Power-SGD) on a 32-GPU cluster. The results show that they do not always outperform well-optimized S-SGD, and can even perform worse, due to their incompatibility with three key system optimization techniques in S-SGD (all-reduce, pipelining, and tensor fusion). To this end, we propose a novel gradient compression method, called alternate compressed Power-SGD (ACP-SGD), which alternately compresses and communicates low-rank matrices. ACP-SGD not only significantly reduces the communication volume, but also benefits from the same three system optimizations as S-SGD. Compared with Power-SGD, the optimized ACP-SGD largely reduces the compression and communication overheads while achieving similar model accuracy. In our experiments, ACP-SGD achieves average speedups of 4.06x and 1.43x over S-SGD and Power-SGD, respectively, and it consistently outperforms other baselines across different setups (from 8 GPUs to 64 GPUs and from 1Gb/s Ethernet to 100Gb/s InfiniBand).
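To make the low-rank idea concrete, the sketch below compresses a gradient matrix into two vectors with a single rank-1 power-iteration step, so an m x n matrix is represented by m + n numbers. This illustrates only the general Power-SGD-style factorization; it does not implement ACP-SGD's alternating scheme, and the helper names and the rank-1 restriction are assumptions made to keep the example short.

```python
import math


def matvec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def rank1_compress(m, q):
    """One power-iteration step: returns (p, q_new) with m ~= outer(p, q_new).

    q is the starting right vector (for example, reused from the last step).
    """
    p = matvec(m, q)
    norm = math.sqrt(sum(x * x for x in p))
    if norm == 0.0:
        return p, [0.0] * len(q)
    p = [x / norm for x in p]          # normalize the left factor
    q_new = matvec(transpose(m), p)    # project m onto the left factor
    return p, q_new


def decompress(p, q):
    """Rebuild the rank-1 approximation as an outer product."""
    return [[pi * qj for qj in q] for pi in p]


# A matrix that is exactly rank 1 is recovered up to rounding error.
m = [[2.0, 4.0], [1.0, 2.0]]
p, q = rank1_compress(m, [1.0, 0.0])
approx = decompress(p, q)
```

For matrices of higher rank the reconstruction is only an approximation, which is why such methods are usually paired with some form of error correction.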
ISBN: 9798350322811 (print)
The integration of computational capabilities with the electrical infrastructure of the grid can be envisioned as a societal-scale Cyber-Physical System (CPS). Middleware frameworks can act as a layer of abstraction that manages the interaction between disparate applications to facilitate intelligent control and management of energy production and consumption. This demonstration showcases the Resilient Information Architecture Platform for Smart Grid (RIAPS), a distributed software platform that combines a domain-specific modeling language with framework-level services, such as communication, remote deployment of applications, distributed coordination, time synchronization, and fault tolerance, to develop and run distributed applications. An example of an energy demand curtailment scheme called load shedding is presented to highlight how the RIAPS framework can be used to implement distributed algorithms that control elements of a power system, which runs as a simulation using OpenDSS.
ISBN: 9781665495127 (print)
Modern smart sensor-based energy management systems leverage non-intrusive load monitoring (NILM) to predict and optimize appliance load distribution in real time. NILM, or energy disaggregation, refers to the decomposition of electricity usage conditioned on the aggregated power signals (i.e., a smart sensor on the main channel). Based on real-time appliance power prediction using sensory technology, energy disaggregation has great potential to increase electricity efficiency and reduce energy expenditure. With the introduction of transformer models, NILM has achieved significant improvements in predicting device power readings. Nevertheless, transformers are less efficient due to their $O(l^2)$ complexity with respect to the sequence length $l$. Moreover, transformers can fail to capture local signal patterns in sequence-to-point settings due to the lack of inductive bias in local context. In this work, we propose an efficient localness transformer for non-intrusive load monitoring (ELTransformer). Specifically, we leverage normalization functions and switch the order of matrix multiplication to approximate self-attention and reduce computational complexity. Additionally, we introduce localness modeling with sparse local attention heads and relative position encodings to enhance the model's capacity to extract short-term local patterns. To the best of our knowledge, ELTransformer is the first NILM model that addresses both computational complexity and localness modeling. With extensive experiments and quantitative analyses, we demonstrate the efficiency and effectiveness of the proposed ELTransformer, with considerable improvements compared to state-of-the-art baselines.
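The effect of switching the order of matrix multiplication can be shown with a small sketch. Once the softmax is replaced by a feature map applied to the queries and keys separately, associativity gives (phi(Q) phi(K)^T) V = phi(Q) (phi(K)^T V), and the right-hand side never builds the l x l score matrix. The abstract does not name the normalization functions used, so the map elu(x) + 1 and all function names below are assumptions chosen for illustration, not ELTransformer's actual formulation.

```python
import math


def feature_map(x):
    """Positive feature map elu(x) + 1 (an assumed choice)."""
    return x + 1.0 if x > 0 else math.exp(x)


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def quadratic_attention(q, k, v):
    """(phi(Q) phi(K)^T) V with row normalization: builds an l x l matrix."""
    fq = [[feature_map(x) for x in row] for row in q]
    fk = [[feature_map(x) for x in row] for row in k]
    scores = matmul(fq, transpose(fk))            # l x l
    weights = [[s / sum(row) for s in row] for row in scores]
    return matmul(weights, v)


def linear_attention(q, k, v):
    """phi(Q) (phi(K)^T V): same result without the l x l matrix."""
    fq = [[feature_map(x) for x in row] for row in q]
    fk = [[feature_map(x) for x in row] for row in k]
    kv = matmul(transpose(fk), v)                 # d x d_v, independent of l
    k_sum = [sum(col) for col in zip(*fk)]        # per-feature sum over keys
    out = []
    for row in fq:
        denom = sum(x * s for x, s in zip(row, k_sum))
        numer = matmul([row], kv)[0]
        out.append([n / denom for n in numer])
    return out


q = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
k = [[0.1, 0.4], [-0.7, 1.2], [0.9, -0.5]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
slow = quadratic_attention(q, k, v)
fast = linear_attention(q, k, v)
```

Both functions return the same values up to rounding, but the cost of `linear_attention` grows linearly with the sequence length for a fixed feature dimension.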
ISBN: 9798331539580 (print)
The modern computing scenario of the computing Continuum exhibits large and complex applications with heterogeneous requirements running on distributed infrastructure. Still, when it comes to coordinating and controlling such applications and infrastructures, it is common to rely on centralized or ad-hoc solutions. While these approaches are robust, scaling management solutions, managing local changes, and maintaining a holistic perspective can be challenging. Additionally, they are not well suited to addressing new problems in dynamic environments. Therefore, new approaches are needed. In this paper, we present DICT, a novel method for managing the computing Continuum, i.e., the infrastructure and the applications. The proposed approach encompasses a series of modules for automatic management. The core idea is to develop a method for applying the intents coming from the infrastructure and application managers in an autonomic and dynamic way. The modules communicate through coordinators that take observable inputs and send back predictions of the next actions to take. These coordinators summarize the sensed observations and extract high-level information, motivated by AI advances showing that discrete-space representations of inputs improve generalization. Thus, they can have models that build their own semantics and "language." We envision that, through DICT, both application and infrastructure managers will only have to specify high-level intents rather than define hard-coded, difficult-to-change strategies.
ISBN: 9798350333398 (print)
Virtualization in cellular networks is one of the key areas of research where technologies, infrastructure, and challenges are rapidly changing as the 5G system architecture demands a paradigm shift. This paper aims to study the viability and the performance of cloud-native infrastructures for hosting network functions. The selected frameworks implement both the 4G and the 5G stacks and their network functions. This work considers a variety of scenarios for enabling the deployment of a distributed and open-source cellular network: a bare-metal setup, an all-Docker-based setup, and the proposed Kubernetes setup. Moreover, an analysis of the impact that the Radio Access Network (RAN) and the Core Network (CN) have on computational resource utilization is presented as the network conditions vary. This paper proposes a design to increase resource-usage flexibility and performance and to reduce deployment time. The proposed design has been validated and analyzed using a prototype and testbed. The analysis of the gathered data reveals that deployments of containerized cellular networks perform better in terms of flexibility, startup time, and ease of deployment while consuming the same resources as non-containerized deployments.
ISBN: 9781665495127 (print)
In this paper, we consider the problem of using a drone to collect information within orchards in order to detect bugs. An orchard can be modeled as an aisle-graph, which is a regular data structure formed by consecutive aisles where trees are arranged in a straight line. To monitor the presence of bugs, a drone flies close to the trees and takes videos and/or pictures that will be analyzed offline. As the drone's energy is limited, only a subset of locations in the orchard can be visited with a fully charged battery. The places that are most likely to be infested should be selected so that the parasite is detected promptly. We study the budget-constrained position selection problem in the orchard from an algorithmic point of view. We present the Single-drone Orienteering Aisle-graph Problem (SOAP), a variant of the well-known orienteering problem where the finite resource is the drone's battery. We first show that SOAP can be optimally solved for aisle-graphs in polynomial time. However, the optimal solution is not efficient for large orchards. We therefore propose two efficient heuristics that work even for large orchard instances. After a thorough analysis of the proposed solutions, we evaluate their performance by simulation experiments on both synthetic and real data sets.