In this research, we propose a distributed Search Engine Query Optimization (DSEQO)-based sensor network concept for instantaneous forest fire exposure. The sensor network can identify and predict forest fires more sha...
Cell-free massive multiple-input multiple-output (MIMO) can resolve the inter-cell interference issue in cellular networks through cooperative beamforming of the distributed access points (APs). This paper focuses on an uplink cell-free massive MIMO network and investigates novel methods to train the central processing unit (CPU), the APs, and the users in the network. To reduce the communication burden on the fronthaul, each AP applies receive beamforming to compress its vector signal into a scalar one before passing it to the CPU for centralized processing. By drawing analogies between an uplink cell-free network and a quasi-neural network and borrowing the idea of the backpropagation algorithm, we propose a novel scheme named distributed learning for uplink cell-free massive MIMO beamforming (DLCB), which achieves multi-AP cooperation without explicit estimation of the channel state information (CSI). DLCB has low computational complexity and is applicable to various objective functions, such as the minimum mean squared error criterion and the maximum sum rate criterion. Extensive simulations verify that the proposed scheme achieves superior performance over state-of-the-art methods.
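As a rough illustration of the quasi-neural-network analogy above, the real-valued sketch below treats the per-AP receive beamformers and the CPU combining weights as trainable parameters and updates them by backpropagating an MSE loss against known pilot symbols, with no explicit CSI estimation. The network sizes, noise level, and the Adam optimizer are assumptions for this toy example, not the DLCB algorithm itself.

```python
# Real-valued toy: APs compress their received vectors to scalars with beamformers w,
# the CPU combines the scalars with weights v, and both are trained end-to-end by
# backpropagating an MSE loss against known pilot symbols (no explicit CSI estimate).
import torch

M, K, N = 16, 4, 8                                   # assumed: APs, users, antennas per AP
H = torch.randn(M, N, K)                             # true channels, used only to simulate signals
w = torch.randn(M, N, requires_grad=True)            # per-AP receive beamformers ("layer 1 weights")
v = torch.randn(M, K, requires_grad=True)            # CPU combining weights ("layer 2 weights")

opt = torch.optim.Adam([w, v], lr=1e-2)
for step in range(500):
    s = torch.randn(K)                               # pilot symbols transmitted by the K users
    y = torch.einsum('mnk,k->mn', H, s) + 0.05 * torch.randn(M, N)   # per-AP received vectors + noise
    z = torch.einsum('mn,mn->m', w, y)               # each AP forwards a single scalar to the CPU
    s_hat = torch.einsum('mk,m->k', v, z)            # CPU recovers the K user symbols
    loss = (s_hat - s).pow(2).mean()                 # MMSE-style training criterion
    opt.zero_grad(); loss.backward(); opt.step()     # "backpropagation" through the quasi-network
```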
This paper proposes a distributed learning-based framework to tackle the sum ergodic rate maximization problem in cell-free massive multiple-input multiple-output (MIMO) systems by utilizing the graph neural network (...
Solar photovoltaic (PV) power prediction is easily affected by weather factors. To reduce the PV power prediction deviation and improve the prediction accuracy, a distributed solar photov...
In this article, we propose a complementary deep-neural-network (C-DNN) processor that combines a convolutional neural network (CNN) and a spiking neural network (SNN) to take advantage of both. The C-DNN processor can support both complementary inference and training with its heterogeneous CNN and SNN core architecture. In addition, the C-DNN processor is the first DNN accelerator application-specific integrated circuit (ASIC) that can support CNN-SNN workload division by exploiting their magnitude-energy tradeoff. The C-DNN processor integrates a CNN-SNN workload allocator and an attention module to find the more energy-efficient network domain for each workload in the DNN, enabling the processor to operate at the energy-optimal point. Moreover, the SNN processing element (PE) array with a distributed L1 cache reduces redundant memory accesses for SNN processing by 42.2%-49.1%. For highly energy-efficient DNN training, the C-DNN processor integrates a global counter and a local delta-weight (LDW) unit to eliminate power-consuming counters for forward delta-weight generation. Furthermore, forward delta-weight-based sparsity generation (FDWSG) is proposed to reduce the number of training operations by 31%-79%. The C-DNN processor achieves an energy efficiency of 85.8 and 79.9 TOPS/W for inference with CIFAR-10 and CIFAR-100, respectively (VGG-16). Moreover, it achieves ImageNet classification with a state-of-the-art energy efficiency of 24.5 TOPS/W (ResNet-50). For training, the C-DNN processor achieves state-of-the-art energy efficiencies of 84.5 and 17.2 TOPS/W for CIFAR-10 and ImageNet, respectively, and reaches 77.1% accuracy for ImageNet training with ResNet-50.
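The CNN-SNN workload division rests on a magnitude-energy tradeoff. The toy allocator below illustrates one way such a split could look in software: activation-map tiles with low average magnitude (spike-friendly) are labeled for an "SNN" domain, the rest for a "CNN" domain. The tile size and threshold are invented for illustration and do not reflect the chip's actual allocator or attention module.

```python
# Toy magnitude-based allocator: low-magnitude tiles of an activation map are routed
# to an "snn" domain, high-magnitude tiles to a "cnn" domain. Tiling and threshold
# are illustrative only.
import numpy as np

def allocate_tiles(feature_map, tile=8, thr=0.3):
    """Label each tile of a HxW activation map as 'cnn' or 'snn'."""
    H, W = feature_map.shape
    plan = []
    for i in range(0, H, tile):
        for j in range(0, W, tile):
            block = feature_map[i:i + tile, j:j + tile]
            domain = "cnn" if np.abs(block).mean() > thr else "snn"
            plan.append(((i, j), domain))
    return plan

fmap = np.maximum(np.random.randn(32, 32), 0)        # ReLU-like activations (about half are zero)
plan = allocate_tiles(fmap)
print(sum(d == "snn" for _, d in plan), "of", len(plan), "tiles routed to the SNN domain")
```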
Hyperspectral (HS) pansharpening refers to fusing low-spatial-resolution HS (LRHS) images with the corresponding panchromatic (PAN) images to create high-spatial-resolution HS (HRHS) images. Most existing HS pansharpening methods overlook the spatial and spectral imbalance of ground objects of different types in the observed scenes. To address this issue, in this article we develop a novel tree-structured neural network (Tree-SNet) that forms an adaptive spatial-spectral processing pipeline for HS pansharpening. The Tree-SNet method maps a convolutional neural network (CNN) onto a hierarchical tree structure, where routing nodes automatically tune the data distributed to the tree paths, adapting to the local characteristics of the data, while spatial enhancement (SpatE) and spectral enhancement (SpecE) modules are dynamically applied along the tree paths to further strengthen the adaptive processing. The proposed Tree-SNet is evaluated on several datasets, and the experimental results verify its superiority.
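A minimal PyTorch sketch of the routing idea: a routing node produces a per-pixel gate that splits features between a spatial-enhancement-like branch and a spectral-enhancement-like branch. Layer shapes and module names are assumptions for illustration, not the Tree-SNet implementation.

```python
# Soft routing node: a learned per-pixel gate splits features between a spatial
# (3x3 conv) branch and a spectral (1x1 conv) branch, a simplified stand-in for
# the SpatE/SpecE paths in a tree-structured network.
import torch
import torch.nn as nn

class RoutingNode(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(ch, 1, 1), nn.Sigmoid())  # routing weight per pixel
        self.spat = nn.Conv2d(ch, ch, 3, padding=1)                   # spatial-enhancement-like path
        self.spec = nn.Conv2d(ch, ch, 1)                              # spectral-enhancement-like path

    def forward(self, x):
        g = self.gate(x)                                              # in [0, 1]: how data is split
        return g * self.spat(x) + (1 - g) * self.spec(x)

x = torch.randn(1, 31, 64, 64)                                        # e.g. a 31-band HS feature map
print(RoutingNode(31)(x).shape)                                       # torch.Size([1, 31, 64, 64])
```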
Modern deep learning has significantly improved performance and is used in a wide variety of applications. Since the inference process of a neural network requires a large amount of computation, it is typically processed not at the data acquisition location, such as a surveillance camera, but on a server with abundant computing power installed in a data center. Edge computing is receiving considerable attention as a solution to this problem; however, edge devices can provide only limited computation resources. Therefore, we assume a divided/distributed neural network model that uses both the edge device and the server. By processing part of the convolutional layers on the edge, the amount of communication becomes smaller than that of the raw sensor data. In this paper, we evaluate AlexNet and eight other models in the distributed environment and estimate FPS values with Wi-Fi, 3G, and 5G communication. To reduce communication costs, we also introduce a compression step before communication, which may degrade object recognition accuracy. As necessary conditions, we set FPS to 30 or faster and object recognition accuracy to 69.7% or higher; this value is determined based on that of an approximation model that binarizes the activations of the neural network. We construct performance and energy models to find the optimal configuration that consumes the minimum energy while satisfying the necessary conditions. Through a comprehensive evaluation, we find the optimal configurations for all nine models. For small models, such as AlexNet, processing the entire model on the edge was best; for huge models, such as VGG16, processing the entire model on the server was best; for medium-size models, the distributed configurations were good candidates. We confirm that our model finds the most energy-efficient configuration while satisfying the FPS and accuracy requirements, and the distributed models successfully reduce the energy consumption by up to 48.6%, and
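The configuration search described above can be pictured as enumerating edge/server split points and keeping the lowest-energy one that still meets the FPS target. The sketch below uses made-up per-layer times, output sizes, energies, and bandwidth; it only illustrates the shape of such a performance/energy model, not the paper's measured values.

```python
# Enumerate edge/server split points and keep the minimum-energy configuration that
# still meets the FPS target. All numbers are invented for the sketch.
layers = [  # (name, edge_time_s, server_time_s, output_bytes, edge_energy_J)
    ("conv1", 0.004, 0.0005, 290_000, 0.08),
    ("conv2", 0.006, 0.0007, 190_000, 0.12),
    ("conv3", 0.005, 0.0006,  65_000, 0.10),
    ("fc",    0.002, 0.0003,   4_000, 0.05),
]
RAW_FRAME_BYTES = 500_000     # assumed size of an uncompressed input frame
BANDWIDTH_BPS = 20e6          # assumed uplink bandwidth (e.g. Wi-Fi)
TX_POWER_W = 1.2              # assumed radio power while transmitting
FPS_TARGET = 30

best = None
for split in range(len(layers) + 1):                  # split = number of layers run on the edge
    edge_t = sum(l[1] for l in layers[:split])
    serv_t = sum(l[2] for l in layers[split:])
    tx_bytes = layers[split - 1][3] if split > 0 else RAW_FRAME_BYTES
    tx_t = tx_bytes * 8 / BANDWIDTH_BPS
    latency = edge_t + tx_t + serv_t
    energy = sum(l[4] for l in layers[:split]) + TX_POWER_W * tx_t
    if 1 / latency >= FPS_TARGET and (best is None or energy < best[1]):
        best = (split, energy, latency)
print("best split (layers on edge), energy [J], latency [s]:", best)
```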
ISBN (digital): 9798350351255
ISBN (print): 9798350351262
Distributed DNN inference is becoming increasingly important as the demand for intelligent services at the network edge grows. By leveraging the power of distributed computing, edge devices can perform complicated and resource-hungry inference tasks that were previously only possible on powerful servers, enabling new applications in areas such as autonomous vehicles, industrial automation, and smart homes. However, it is challenging to achieve accurate and efficient distributed edge inference because of the fluctuating nature of the devices' actual resources and the varying processing difficulty of the input data. In this work, we propose DistrEE, a distributed DNN inference framework that can exit model inference early to meet specific quality-of-service requirements. In particular, the framework first integrates model early exit and distributed inference for multi-node collaborative inference scenarios. It further designs an early-exit policy to control when model inference terminates. Extensive simulation results demonstrate that DistrEE realizes efficient collaborative inference, achieving an effective trade-off between inference latency and accuracy.
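A hedged sketch of a confidence-threshold early-exit policy over a chain of model partitions, in the spirit of the multi-node collaborative inference described above. The blocks, exit heads, and the 0.9 threshold are placeholders, not DistrEE's actual policy.

```python
# Confidence-threshold early exit over a chain of model partitions: each partition
# has its own exit head; inference stops as soon as the softmax confidence clears
# the threshold (or at the final partition).
import torch
import torch.nn as nn
import torch.nn.functional as F

blocks = nn.ModuleList([nn.Sequential(nn.Linear(64, 64), nn.ReLU()) for _ in range(3)])
exits = nn.ModuleList([nn.Linear(64, 10) for _ in range(3)])          # one exit head per partition

def infer_with_early_exit(x, threshold=0.9):
    for depth, (block, head) in enumerate(zip(blocks, exits)):
        x = block(x)                                  # would run on node `depth` in a distributed setup
        conf, pred = F.softmax(head(x), dim=-1).max(dim=-1)
        if conf.item() >= threshold or depth == len(blocks) - 1:
            return pred.item(), depth                 # exit early when confident enough

print(infer_with_early_exit(torch.randn(1, 64)))      # (predicted class, exit depth)
```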
Mobile augmented reality (MAR) has gained increased attention thanks to its potential to transform applications in different domains. One of the challenges in realizing MAR systems is processing video frames efficiently. MAR user devices are often resource-constrained and unsuitable for real-time object detection and recognition from video streams. Edge computing has tremendous potential to enable MAR systems, where processing instances (e.g., serverless functions, containers, or virtual machines) can implement and manage the execution of convolutional neural networks (CNNs) for processing offloaded MAR video frames. One of the challenges is how to balance the video frames across the edge servers and processing instances. In this article, we propose the LAOS orchestrator for resource management and load balancing of distributed edge servers for MAR systems. The LAOS orchestrator balances incoming video frames among the processing instances at the edge servers and determines when to spawn new instances of the CNN functions so as to ensure a predefined latency threshold for the processing of video frames. In addition, we devise a novel queuing-based framework for modeling the resource management problem of distributed edge servers for MAR systems. The obtained numerical results show that the proposed LAOS orchestrator reduces latency and efficiently manages the edge computing resources when dynamic workload peaks are considered.
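The orchestration logic can be illustrated with a toy dispatcher: frames go to the least-loaded processing instance, and a new instance is spawned whenever the estimated queueing delay would exceed the latency threshold. Service times and the threshold below are illustrative values, not LAOS parameters.

```python
# Toy dispatcher: send each frame to the least-loaded instance; spawn a new instance
# if the expected queueing delay would break the latency SLO.
import random

SERVICE_TIME_S = 0.040        # assumed per-frame CNN processing time
LATENCY_SLO_S = 0.200         # assumed predefined latency threshold

queues = [0]                  # outstanding frames per processing instance

def dispatch(frame_id):
    i = min(range(len(queues)), key=lambda k: queues[k])   # least-loaded instance
    if (queues[i] + 1) * SERVICE_TIME_S > LATENCY_SLO_S:
        queues.append(0)                                   # spawn a new CNN instance
        i = len(queues) - 1
    queues[i] += 1
    return i

for frame in range(50):
    dispatch(frame)
    if random.random() < 0.6:                              # some frames finish in the meantime
        busiest = max(range(len(queues)), key=lambda k: queues[k])
        queues[busiest] = max(0, queues[busiest] - 1)
print("instances:", len(queues), "| queue lengths:", queues)
```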
Processing-in-memory (PIM) is the most promising paradigm for addressing the bandwidth bottleneck in deep neural network (DNN) accelerators. However, the algorithmic and dataflow structure of DNNs still necessitates moving a large amount of data across banks inside the memory device to bring input data and their corresponding model parameters together, shifting part of the bandwidth bottleneck to the in-memory data communication infrastructure. To alleviate this bottleneck, we present Smart Memory, a highly parallel in-memory DNN accelerator for 3D memories that benefits from a scalable high-bandwidth in-memory network. Whereas existing PIM designs implement the compute units and network-on-chip on the logic die of the underlying 3D memory, in Smart Memory the computation and data transmission tasks are distributed across the memory banks. To this end, each memory bank is equipped with (1) a very simple processing unit to run neural networks and (2) a circuit-switched router that interconnects the memory banks through a 3D network-on-memory. Our evaluation shows a 44% average performance improvement over state-of-the-art in-memory DNN accelerators.
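A simplified software model of the bank-level distribution: each bank holds a slice of a layer's weights, its processing unit computes the partial product locally, and a hop count stands in for transfers over the 3D network-on-memory. The bank count and the choice of bank 0 as the reduction point are assumptions for the sketch.

```python
# Bank-distributed matrix-vector product: each bank's PE computes the rows it stores
# locally; a hop count approximates traffic on the 3D network-on-memory when partials
# are collected at bank 0.
import numpy as np

BANKS = 16
W = np.random.randn(256, 256)                     # layer weights, striped across banks by rows
x = np.random.randn(256)                          # input activations, broadcast to all banks

row_slices = np.array_split(np.arange(256), BANKS)
partials, hops = [], 0
for bank, rows in enumerate(row_slices):
    partials.append(W[rows] @ x)                  # computed by the bank's simple processing unit
    hops += bank                                  # hops to ship the partial result to bank 0

y = np.concatenate(partials)
print("matches centralized compute:", np.allclose(y, W @ x), "| total hops:", hops)
```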