ISBN:
(Print) 9798400705977
The distributed embedded systems paradigm is a promising platform for high-performance embedded applications. We present a distributed algorithm and system based on cost-effective devices. The proof of concept shows how a parallelized approach leveraging a distributed embedded platform can address the computational demands of the Machine Learning K-Nearest Neighbors (K-NN) algorithm with large and heterogeneous datasets.
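The parallelization strategy the abstract describes can be sketched as a scatter/merge: each device finds the k nearest neighbors within its own data shard, and a coordinator merges the per-device candidate lists. A minimal Python sketch, with illustrative function names and toy data not taken from the paper:

```python
import heapq

def local_knn(partition, query, k):
    """Each worker finds the k nearest neighbors within its own shard."""
    dists = [(sum((a - b) ** 2 for a, b in zip(x, query)), x) for x in partition]
    return heapq.nsmallest(k, dists)

def distributed_knn(partitions, query, k):
    """Coordinator merges per-worker candidates: the global k nearest
    neighbors must appear among the k nearest of some partition."""
    candidates = []
    for part in partitions:
        candidates.extend(local_knn(part, query, k))
    return [x for _, x in heapq.nsmallest(k, candidates)]

# Toy dataset split across three "devices"
shards = [[(0.0, 0.0), (5.0, 5.0)], [(1.0, 1.0), (9.0, 9.0)], [(0.5, 0.5)]]
print(distributed_knn(shards, (0.0, 0.0), k=2))  # → [(0.0, 0.0), (0.5, 0.5)]
```

The merge is lossless because a point outside some shard's local top-k cannot be in the global top-k, so only k candidates per device need to cross the network.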
ISBN:
(Print) 9798350302936
For distributed training (DT) based on the parameter server (PS) architecture, the communication overhead of synchronizing parameters across the network is huge. In the PS architecture, workers send gradients over the network to the PS for aggregation. With the development of programmable switches, in-network aggregation (INA) has been proposed to accelerate distributed training by performing gradient aggregation on programmable switches in the network, not only at the PS. However, existing routing methods cannot fully utilize the capability of INA, resulting in load imbalance and long communication times. This paper analyzes and models the routing problem in INA under network-resource constraints, and we propose a routing algorithm named AggTree that solves this problem by searching for high-rate routing paths. Simulation results show that AggTree can reduce communication time by 4.1%-37.9% for a single DT job and 12.7%-74.0% for multiple DT jobs compared with state-of-the-art solutions.
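The INA principle the abstract builds on can be illustrated independently of AggTree itself: switches along the aggregation tree sum the gradients of their subtree, so each upstream link carries one aggregated vector instead of one message per worker. A hedged sketch, where the nested-list tree encoding is an assumption for illustration only:

```python
def aggregate(node, gradients):
    """Recursively sum gradients up an aggregation tree.

    node: an int (leaf = worker id) or a list of child subtrees (a switch).
    A switch forwards a single summed vector upward, regardless of how
    many workers sit below it."""
    if isinstance(node, int):
        return gradients[node]
    total = None
    for child in node:
        g = aggregate(child, gradients)
        total = g if total is None else [a + b for a, b in zip(total, g)]
    return total

# Four workers under two top-of-rack switches; the PS receives one vector.
grads = {0: [1.0, 2.0], 1: [3.0, 4.0], 2: [5.0, 6.0], 3: [7.0, 8.0]}
tree = [[0, 1], [2, 3]]
print(aggregate(tree, grads))  # → [16.0, 20.0]
```

The routing problem AggTree addresses is choosing which switches aggregate which flows so switch resources and link rates are not the bottleneck; that optimization is not modeled here.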
Today, many important internet services are provided on cloud computing platforms. The ever-increasing expansion of services and user requirements has necessitated the optimal use of resources. Therefore, several algo...
This paper addresses the challenges of optimizing task scheduling for a distributed, task-based execution model in OpenMP for cluster computing environments. Traditional OpenMP implementations are primarily designed f...
ISBN:
(Print) 9798350304367; 9798350304374
With the rising popularity of Unmanned Aircraft Systems (UASs), large numbers of UASs are soon projected to inhabit the low-level airspace. As the skies grow more and more crowded, it is essential and mandatory to provide up-to-date information on vehicles' identities, positions and intentions. One of the main use cases of communication among UASs is the coordination and guidance of vehicles, referred to as Unmanned Aircraft System Traffic Management (UTM). Despite numerous proposed data link technologies for inter-UAS communication, there is often a lack of clarity regarding the underlying performance requirements. The need therefore arises to quantify the required data rate, delay budget and communication range so that suitable data link technologies can be selected. To gain insight into these requirements, we developed a stochastic communication model and applied it to future UAS traffic scenarios for major German cities. These scenarios are based on predictions of UAS traffic demand generated by specific applications, such as parcel delivery. The proposed model estimates that a communication range of less than 500 m is sufficient. The delay budget depends strongly on the diameter and spatial density of the network but remains below 110 ms. For large cities like Berlin, a data rate of 19 Mbps is predicted for the year 2035, which is challenging for many current communication technologies.
ISBN:
(Print) 9798350358810; 9798350358803
Data sharing and computing are integral aspects of modern power grid networks. They involve the transmission of information about electricity generation, consumption, transmission, and distribution, and should operate in an efficient and secure way. Traditional approaches conduct mutual authentication and authorization among networks, but interoperability becomes an issue as network interconnectivity and the complexity of grid data increase. This paper addresses the issue by introducing a secure data sharing and computing approach that leverages enclaves and blockchain technology, referred to as Encblock. Encblock has a unique design to achieve an enclave-based trusted and confidential SGX computing environment. It is built on Data Distribution Service (DDS) Pub/Sub (Publisher/Subscriber) middleware and a power grid common data model for data compatibility and flexible data sharing. A remote attestation protocol is developed to maintain enclave integrity and authenticity with the external Intel attestation server. Meanwhile, the attestation and data computing results can be securely managed in a blockchain to support various power grid businesses. Substantial experiments with simulated data verify Encblock's performance, such as remote attestation latency and blockchain data transaction capability.
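The Pub/Sub data-sharing pattern the abstract relies on can be sketched with a minimal topic-based broker. This illustrates only the publish/subscribe flow, not the DDS middleware, SGX enclaves, or blockchain components; the class, topic, and field names are hypothetical:

```python
class PubSub:
    """Minimal topic-based Pub/Sub broker: publishers and subscribers are
    decoupled, interacting only through named topics."""

    def __init__(self):
        self.subs = {}  # topic -> list of subscriber callbacks

    def subscribe(self, topic, callback):
        self.subs.setdefault(topic, []).append(callback)

    def publish(self, topic, msg):
        # Deliver the message to every subscriber of this topic.
        for cb in self.subs.get(topic, []):
            cb(msg)

bus = PubSub()
readings = []
bus.subscribe("grid/consumption", readings.append)
bus.publish("grid/consumption", {"meter": 7, "kwh": 3.2})
print(readings)  # → [{'meter': 7, 'kwh': 3.2}]
```

The decoupling is what gives the architecture its interoperability: a new grid component only needs to agree on the topic and the common data model, not on who the other endpoints are.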
This work considers the optimal design of MapReduce-based coded distributed computing (CDC) with nonuniform input file sizes. We propose an efficient heterogeneous CDC (HetCDC) scheme capable of handling an arbitrary ...
ISBN:
(Print) 9798350304817
Distributed data-parallel (DDP) training improves overall application throughput as multiple devices train on subsets of the data and aggregate updates to produce a globally shared model. The periodic synchronization at each iteration incurs considerable overhead, exacerbated by the increasing size and complexity of state-of-the-art neural networks. Although many gradient compression techniques propose to reduce communication cost, the ideal compression factor that leads to maximum speedup or minimum data exchange remains an open problem, since it varies with the quality of compression, model size and structure, hardware, network topology and bandwidth. We propose GraVAC, a framework that dynamically adjusts the compression factor throughout training by evaluating model progress and assessing the gradient information loss associated with compression. GraVAC works in an online, black-box manner without any prior assumptions about a model or its hyperparameters, while achieving the same or better accuracy than dense SGD (i.e., no compression) in the same number of iterations/epochs. As opposed to using a static compression factor, GraVAC reduces end-to-end training time for ResNet101, VGG16 and LSTM by 4.32x, 1.95x and 6.67x, respectively. Compared to other adaptive schemes, our framework provides 1.94x to 5.63x overall speedup.
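The core loop the abstract describes, compressing gradients and adapting the compression factor to the measured information loss, can be sketched as follows. The top-k compressor, the squared-norm loss metric, and the doubling/halving policy are simplifications for illustration, not GraVAC's actual design:

```python
import heapq
import math

def topk_compress(grad, cf):
    """Keep the ceil(len/cf) largest-magnitude components
    (cf = compression factor); return them as a sparse index -> value map."""
    k = max(1, math.ceil(len(grad) / cf))
    idx = heapq.nlargest(k, range(len(grad)), key=lambda i: abs(grad[i]))
    return {i: grad[i] for i in idx}

def info_loss(grad, sparse):
    """Fraction of the gradient's squared norm discarded by compression."""
    total = sum(g * g for g in grad)
    kept = sum(v * v for v in sparse.values())
    return 1.0 - kept / total

def adapt_cf(cf, loss, threshold=0.1):
    """Adaptive idea (simplified): compress harder while the measured loss
    stays low, back off when too much gradient information is lost."""
    return cf * 2 if loss < threshold else max(1, cf // 2)

grad = [0.9, -0.05, 0.02, 0.01]
sparse = topk_compress(grad, cf=4)  # keeps 1 of 4 components
loss = info_loss(grad, sparse)      # small: most of the norm is in grad[0]
print(adapt_cf(4, loss))            # → 8 (loss below threshold, compress more)
```

The appeal of making this adaptive, per the abstract, is that the right compression factor depends on model, hardware, and network, so no static choice is optimal throughout training.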
Operations on large file trees in a DFS (Distributed File System) server are a bottleneck in large-scale cloud computing, such as distributed build systems for large software projects. Such operations take much longer...
In 2018, Yang et al. introduced a novel and effective approach, using maximum distance separable (MDS) codes, to mitigate the impact of elasticity in cloud computingsystems. This approach is referred to as coded elas...
ISBN:
(Print) 9781728190549
In 2018, Yang et al. introduced a novel and effective approach, using maximum distance separable (MDS) codes, to mitigate the impact of elasticity in cloud computing systems. This approach is referred to as coded elastic computing. Its limitations include that it assumes all virtual machines have the same computing speeds and storage capacities, and that it cannot tolerate stragglers for matrix-matrix multiplications. To resolve these limitations, in this paper we introduce a new combinatorial optimization framework, named uncoded storage coded transmission elastic computing (USCTEC), for heterogeneous speeds and storage constraints, aiming to minimize the expected computation time for matrix-matrix multiplications under straggler tolerance. Within this framework, we propose optimal solutions with straggler tolerance under relaxed storage constraints. Moreover, we propose a heuristic algorithm that considers heterogeneous storage constraints. Our results demonstrate that the proposed algorithm outperforms baseline solutions using cyclic storage placements, in terms of both expected computation time and storage size.
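The cyclic storage placement that serves as the paper's baseline, together with a speed-proportional work split over heterogeneous machines, can be sketched as follows. The assignment rule here is a toy heuristic for illustration, not the optimized USCTEC solution:

```python
from fractions import Fraction

def cyclic_placement(num_blocks, machines, storage):
    """Baseline cyclic placement: machine m stores `storage` consecutive
    matrix blocks starting at block m (mod num_blocks)."""
    return {m: [(m + j) % num_blocks for j in range(storage)]
            for m in machines}

def assign_work(placement, speeds, num_blocks):
    """Toy heterogeneous split: each machine that stores a block computes a
    share of it proportional to its speed, so faster machines do more."""
    holders = {b: [] for b in range(num_blocks)}
    for m, blocks in placement.items():
        for b in blocks:
            holders[b].append(m)
    shares = {}
    for b, ms in holders.items():
        total = sum(speeds[m] for m in ms)
        shares[b] = {m: Fraction(speeds[m], total) for m in ms}
    return shares

# 4 blocks, 3 machines each storing 2 blocks, machine 1 twice as fast.
p = cyclic_placement(4, [0, 1, 2], 2)   # {0: [0, 1], 1: [1, 2], 2: [2, 3]}
print(assign_work(p, {0: 1, 1: 2, 2: 1}, 4)[1])  # → {0: Fraction(1, 3), 1: Fraction(2, 3)}
```

When a machine leaves or joins (elasticity), only the share computation is rerun over the remaining holders; the stored blocks themselves never move, which is the point of keeping storage uncoded.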