In this digital era, people access huge amounts of data to fulfill their needs. The so-called big data has been generated across the globe from various sources such as e-commerce and social networking site...
ISBN: (Print) 9781450397339
Federated learning (FL) has emerged as a new paradigm that enables distributed mobile devices to learn a global model collaboratively. Since mobile devices (a.k.a. clients) exhibit diversity in model training quality, client selection (CS) becomes critical for efficient FL. CS faces the following challenges. First, each client's availability, training data volume, and network connection status are time-varying and cannot be easily predicted. Second, which clients are selected for training and how many local iterations they perform seriously affect the model accuracy, so selecting a subset of available clients and controlling the local iterations must guarantee model quality. Third, renting clients for model training incurs cost, so the long-term budget must be administered dynamically without knowledge of future inputs. To this end, we propose a federated edge learning (FedL) framework that selects appropriate clients and controls the number of training iterations in real time. FedL aims to reduce the completion time while reaching the desired model convergence and satisfying the long-term budget for renting clients. FedL consists of two algorithms: i) an online learning algorithm that makes CS and iteration decisions according to historic learning results; and ii) an online rounding algorithm that translates the fractional decisions derived by the online learning algorithm into integers to satisfy feasibility constraints. Rigorous mathematical proof reveals that the dynamic regret and dynamic fit have sub-linear upper bounds in time for a given budget. Extensive experiments based on realistic datasets suggest that FedL outperforms multiple state-of-the-art algorithms; in particular, FedL reduces completion time by at least 38% compared with the others.
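The two-stage structure described in this abstract, fractional online decisions followed by rounding, can be pictured with a small sketch. The scoring rule, the budget scaling, and the independent randomized rounding below are assumptions for illustration only, not the paper's actual FedL algorithms.

```python
# Illustrative sketch only: the utility-per-cost scoring and the simple randomized
# rounding are assumed stand-ins for FedL's online learning and rounding algorithms.
import random

def fractional_selection(utility, cost, budget):
    """Turn per-client historical utility into fractional selection decisions
    whose expected rental spend stays within the per-round budget."""
    score = {c: utility[c] / cost[c] for c in utility}      # assumed scoring rule
    total = sum(score.values())
    frac = {c: score[c] / total for c in score}
    expected_spend = sum(frac[c] * cost[c] for c in frac)
    if expected_spend > budget:                              # scale down to the budget
        scale = budget / expected_spend
        frac = {c: f * scale for c, f in frac.items()}
    return frac

def randomized_rounding(frac):
    """Round fractional decisions to {0, 1}, preserving each marginal probability."""
    return {c: 1 if random.random() < p else 0 for c, p in frac.items()}

if __name__ == "__main__":
    utility = {"c1": 0.9, "c2": 0.4, "c3": 0.7}   # historic learning results (assumed)
    cost = {"c1": 2.0, "c2": 1.0, "c3": 1.5}      # per-round rental cost (assumed)
    frac = fractional_selection(utility, cost, budget=2.5)
    print("fractional:", frac)
    print("selected:  ", randomized_rounding(frac))
```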
ISBN: (Print) 9781728117393
This article discusses the results of practical research on migrating the architecture of an integration middleware layer to a distributed stream processing platform. The stream processing framework is selected based on a comparative analysis of the available open-source solutions.
In recent work, decentralized algorithms have received increasing attention. In a centralized network, the worker nodes need to communicate with the central node, so communication traffic grows as the network expands. To reduce the communication costs in distributed systems, we propose a decentralized algorithm based on ADMM, Grouping Ring All-Reduce ADMM (GR-ADMM), in this paper. First, GR-ADMM adopts a decentralized architecture to avoid the communication bottleneck of a central node. Second, to ensure the scalability of the distributed system, GR-ADMM introduces Ring All-Reduce into ADMM; the Ring All-Reduce architecture has the advantage of constant communication overhead, but its performance is bounded by stragglers (i.e., slow nodes). Third, GR-ADMM adopts a grouping strategy to alleviate the straggler problem. Experiments show that our algorithm has better convergence performance than QSGD and GADMM, especially in massive clusters. Compared with GADMM, the overall communication cost of GR-ADMM is reduced by 72%.
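As a rough illustration of the communication pattern this abstract describes, the sketch below simulates a Ring All-Reduce sum inside each group within a single process. The group assignment and the textbook reduce-scatter/all-gather schedule are assumptions; this is not the GR-ADMM implementation itself.

```python
# Single-process simulation of group-wise Ring All-Reduce (sum), for illustration only.
import numpy as np

def ring_allreduce(vectors):
    """Sum vectors across n simulated nodes via Ring All-Reduce:
    a reduce-scatter phase, then an all-gather phase, each taking n-1 steps."""
    n = len(vectors)
    segs = [np.array_split(v.astype(float), n) for v in vectors]  # node i holds segs[i]

    # reduce-scatter: each step, node i adds the segment arriving from node i-1
    for t in range(n - 1):
        sends = [[s.copy() for s in node] for node in segs]       # snapshot before exchange
        for i in range(n):
            k = (i - t - 1) % n
            segs[i][k] = segs[i][k] + sends[(i - 1) % n][k]

    # after reduce-scatter, node i owns the fully reduced segment (i + 1) % n;
    # all-gather circulates those reduced segments around the ring
    for t in range(n - 1):
        sends = [[s.copy() for s in node] for node in segs]
        for i in range(n):
            k = (i - t) % n
            segs[i][k] = sends[(i - 1) % n][k]

    return [np.concatenate(node) for node in segs]

if __name__ == "__main__":
    np.random.seed(0)
    local = [np.random.randn(8) for _ in range(4)]   # one local ADMM update per node
    groups = [local[:2], local[2:]]                   # grouping of nodes (assumed split)
    for g in groups:
        reduced = ring_allreduce(g)
        assert np.allclose(reduced[0], sum(g))        # every node ends with the group sum
    print("group-wise ring all-reduce verified")
```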
As the basis of many knowledge graph completion tasks, the embedding representation of entities and relations in a knowledge graph (KG) is an important task in the fields of Natural Language Processing (NLP) and Artific...
Converters play an important part in connecting distributed generators and regulating the power flow of microgrids and active distribution networks (ADNs). The hybrid active power filter (HAPF) is proposed to improve reactive power control with a low operating voltage amplitude; its control range and characteristics differ from those of a conventional voltage source converter (VSC). The coordinated regulation of a parallel-connected HAPF and VSC in active distribution networks is examined in this work. To fully utilize the control capability of each type of converter, the power output allocation between these two devices is designed to reduce the total converter rating. A power reference determination method is therefore proposed and embedded in the hierarchical control system of the active distribution network. Results demonstrate that the converter capacity is reduced by combining the HAPF and the VSC. A decoupling regulation method is used as the basic regulation strategy for each converter. Simulation results are given to show the validity of the proposed method.
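The kind of power output allocation described above can be pictured with a tiny sketch: the HAPF absorbs as much of the reactive power demand as its rating allows, and the VSC covers the remainder so the total converter rating stays small. The allocation rule and all numbers below are illustrative assumptions, not the paper's reference determination method.

```python
# Illustrative only: a simple split of a reactive-power reference between an HAPF
# and a VSC; not the paper's hierarchical control scheme.

def allocate_reactive_power(q_demand_kvar, hapf_rating_kvar):
    """Split a reactive power reference between an HAPF and a VSC."""
    q_hapf = min(q_demand_kvar, hapf_rating_kvar)   # HAPF takes what its rating allows
    q_vsc = q_demand_kvar - q_hapf                  # VSC covers the remainder
    return q_hapf, q_vsc

if __name__ == "__main__":
    q_hapf, q_vsc = allocate_reactive_power(q_demand_kvar=800.0, hapf_rating_kvar=600.0)
    print(f"HAPF reference: {q_hapf} kvar, VSC reference: {q_vsc} kvar")
```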
Knowledge representation learning (KRL) is one of the important research topics in artificial intelligence and natural language processing. It can efficiently calculate the semantics of entities and relations in a low...
Binary neural networks (BNNs) are widely used in speech recognition, image processing, and other fields to save memory and speed up computing. However, the accuracy of existing binarization schemes on realistic datasets is clearly low, and the input layer uses 32-bit floating point to avoid excessive precision loss, which requires additional computing units and increases the computational burden. It is therefore important to improve the input layer, save computing resources, and reduce precision loss. In this paper, we propose a parallel convolution binary neural network accelerator architecture (PC-BNA). Based on our proposed BNN model, we design an efficient field-programmable gate array (FPGA) based accelerator: the input of the first layer is binarized, the traditional binary convolution layer is replaced by a parallel binary convolution, and the network building blocks are improved. The experimental results show that the proposed PC-BNA has higher accuracy and better performance on CIFAR-10. The image recognition accuracy reaches 91.4%, which is superior to the state-of-the-art BNN accelerators. Compared with the state-of-the-art BNN model of the same size, look-up table (LUT) usage is reduced by 9.08% and digital signal processor (DSP) usage by 27.7%. These results suggest that PC-BNA is promising for future high-performance mobile artificial intelligence applications.
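The core operation such an accelerator maps to hardware can be sketched in software: a convolution whose activations and weights are binarized to {-1, +1}, so each multiply-accumulate reduces to an XNOR plus a popcount. The sign-based binarization and the shapes below are assumptions for illustration, not the PC-BNA design.

```python
# Illustrative XNOR/popcount binary convolution; not the PC-BNA FPGA architecture.
import numpy as np

def binarize(x):
    """Map real values to {-1, +1} (zero treated as +1)."""
    return np.where(x >= 0, 1, -1).astype(np.int8)

def xnor_popcount_dot(a_bits, w_bits):
    """Dot product of two {-1, +1} vectors via the XNOR/popcount identity:
    dot = 2 * (number of matching positions) - length."""
    matches = np.count_nonzero(a_bits == w_bits)   # XNOR = 1 where bits agree
    return 2 * matches - a_bits.size

def binary_conv2d(x, w):
    """Valid 2-D convolution with binarized activations and weights."""
    xb, wb = binarize(x), binarize(w)
    kh, kw = wb.shape
    out_h, out_w = xb.shape[0] - kh + 1, xb.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.int32)
    for i in range(out_h):
        for j in range(out_w):
            patch = xb[i:i + kh, j:j + kw]
            out[i, j] = xnor_popcount_dot(patch.ravel(), wb.ravel())
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((6, 6))
    w = rng.standard_normal((3, 3))
    y = binary_conv2d(x, w)
    # cross-check against the plain dot product of the binarized tensors
    ref = np.array([[np.sum(binarize(x)[i:i + 3, j:j + 3] * binarize(w))
                     for j in range(4)] for i in range(4)])
    assert np.array_equal(y, ref)
    print(y)
```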
ISBN: (Print) 9783030602482; 9783030602475
Blockchain is an emerging decentralized infrastructure and distributed computing paradigm. However, the low TPS of blockchain technology cannot meet the large-scale, high-concurrency performance requirements of real applications. A polling discrete event simulation platform is designed to investigate the performance of the PoW-based block generation algorithm. The operation of the block generation algorithm is simulated from three aspects: the network topology, the communication message queues, and the PoW protocol. The results show that when the block size is 1 MB, the average relative error between the experimental results and the fixed TPS is 13.00%, and when the block size is 4 MB, the average relative error is 15.25%. The experiments show that the simulation platform can be used to investigate transaction performance effectively.
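A toy version of such a discrete event simulation is sketched below: PoW block intervals are drawn from an exponential distribution, each block carries as many transactions as its size allows, and TPS is measured over the simulated time. The block interval, transaction size, and propagation delay are assumed parameters, not the platform's settings.

```python
# Minimal illustrative discrete-event simulation of PoW block generation and TPS.
import random

def simulate_tps(block_size_mb, tx_size_bytes=500, mean_block_interval_s=600,
                 propagation_delay_s=2.0, n_blocks=1000, seed=1):
    random.seed(seed)
    txs_per_block = int(block_size_mb * 1024 * 1024 / tx_size_bytes)
    now, confirmed = 0.0, 0
    for _ in range(n_blocks):
        now += random.expovariate(1.0 / mean_block_interval_s)  # PoW mining time
        now += propagation_delay_s                               # broadcast to peers
        confirmed += txs_per_block
    return confirmed / now

if __name__ == "__main__":
    for size in (1, 4):
        print(f"block size {size} MB -> ~{simulate_tps(size):.2f} TPS")
```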
ISBN: (Print) 9781728166773
The proceedings contain 71 papers. The topics discussed include: implementing a comprehensive networks-on-chip generator with optimal configurations; ECS2: a fast erasure coding library for GPU-accelerated storage systems with parallel & direct IO; autoscaling high-throughput workloads on container orchestrators; Grade10: a framework for performance characterization of distributed graph processing; data life aware model updating strategy for stream-based online deep learning; energy optimization and analysis with EAR; parallel particle advection bake-off for scientific visualization workloads; HAN: a hierarchical AutotuNed collective communication framework; and evaluating worksharing tasks on distributed environments.