This paper presents the development and evaluation of a distributed system employing low-latency embedded field-programmable gate arrays (FPGAs) to optimize scheduling for deep learning (DL) workloads and to configure multiple deep learning accelerator (DLA) architectures. Aimed at advancing FPGA applications in real-time edge computing, this study focuses on achieving optimal latency for a distributed computing system. A novel methodology was adopted, using configurable hardware to examine clusters of DLAs varying in architecture and scheduling technique. The system demonstrated its capability to parallel-process diverse neural network (NN) models, manage compute graphs in a pipelined sequence, and allocate computational resources efficiently to intensive NN layers. We examined five configurable DLAs across two FPGA cluster types built on Zynq-7000 and Zynq UltraScale+ System-on-Chip (SoC) processors, respectively: the Versatile Tensor Accelerator (VTA), Nvidia DLA (NVDLA), Xilinx Deep Learning Processing Unit (DPU), Tensil Compute Unit (CU), and Pipelined Convolutional Neural Network (PipeCNN). Four deep neural network (DNN) workloads were tested: Scatter-Gather, AI Core Assignment, Pipeline Scheduling, and Fused Scheduling. These methods revealed an exponential decay in processing time, with speedups of up to 90%, although deviations were noted depending on the workload and cluster configuration. This research substantiates FPGAs' utility in adaptable, efficient DL deployment, setting a precedent for future experimental configurations and performance benchmarks.
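The pipeline-scheduling idea in this abstract can be illustrated with a minimal latency model: stage i of a compute graph processes sample k while stage i+1 processes sample k-1, so total time is bounded by the slowest stage rather than the sum of all stages. The latency figures below are invented for illustration, not taken from the paper.

```python
# Hypothetical sketch of pipelined scheduling across a chain of DLAs.
# Stage latencies are made-up placeholders, not measurements from the paper.

def sequential_latency(stage_ms, n_samples):
    """Total time when one accelerator runs every stage for every sample."""
    return n_samples * sum(stage_ms)

def pipelined_latency(stage_ms, n_samples):
    """Ideal pipeline: fill time for the first sample, then one result per
    slowest-stage interval (the classic pipeline throughput bound)."""
    return sum(stage_ms) + (n_samples - 1) * max(stage_ms)

stages = [4.0, 7.0, 5.0]   # per-DLA stage latencies in ms (illustrative)
n = 100
seq = sequential_latency(stages, n)    # 100 * 16 = 1600 ms
pipe = pipelined_latency(stages, n)    # 16 + 99 * 7 = 709 ms
print(f"speedup ~ {seq / pipe:.2f}x")
```

The bound also shows why balancing stage latencies across DLAs matters: the slowest stage alone sets the steady-state throughput.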
Processing of power system data containing outliers and noise is important for state estimation. This article aims to improve the quality of the data supplied to the state estimator. It addresses two kinds of noise in the data: normally distributed noise and bias. It also handles outliers, missing data, and time-stamping errors. In the first stage, outliers, missing data, and time-stamping errors are handled. In the second stage, data from the first stage pass through the proposed weighted deep neural network, which uses measurement variance information to reduce the noise and bias present in the data. The denoised data are then used by the state estimation program to find the system states. The proposed method is tested on the IEEE 13-node test feeder.
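The core of the second stage, weighting measurements by their variance so that noisier sensors contribute less, can be sketched without the full network. The loss below is a generic inverse-variance-weighted MSE of the kind such a network could minimize; for a scalar state observed by several sensors its closed-form minimizer is the classic inverse-variance (BLUE) estimate. All numbers are synthetic, and this is not the authors' architecture.

```python
import numpy as np

def weighted_mse(pred, meas, variance):
    """Inverse-variance-weighted squared error: noisy sensors weigh less."""
    w = 1.0 / np.asarray(variance)
    return float(np.mean(w * (pred - meas) ** 2))

def inverse_variance_estimate(meas, variance):
    """Closed-form minimizer of the weighted loss for a scalar state."""
    w = 1.0 / np.asarray(variance)
    return float(np.sum(w * meas) / np.sum(w))

meas = np.array([1.02, 0.98, 1.50])   # third sensor is off...
var  = np.array([0.01, 0.01, 1.00])   # ...and reports high variance
print(inverse_variance_estimate(meas, var))  # stays close to 1.0
```

An unweighted average of these readings would be pulled toward 1.17 by the noisy sensor; the variance-aware estimate effectively discounts it.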
This research paper explores fault detection in distributed motors through the vision of the Internet of electrical drives. It aims to employ artificial neural networks supported by data collected from the Internet of distributed devices; cross-verification of results offers reliable diagnosis of industrial motor faults. The proposed methodology involves the development of a cyber-physical system architecture and a mathematical modeling framework for efficient fault detection. The mathematical model is designed to capture the intricate relationships within the cyber-physical system, incorporating the dynamic interactions between distributed motors and their edge controllers. The fast Fourier transform is employed for signal processing, enabling the extraction of meaningful frequency features that serve as indicators of potential faults. An artificial neural network-based fault detection system is integrated with the solution, exploiting its ability to learn complex patterns and adapt to varying motor conditions. The effectiveness of the proposed framework and model is demonstrated through experimental results. The experimental setup involves diverse fault scenarios, and the system's performance is evaluated in terms of accuracy, sensitivity, and false-positive rate.
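The FFT feature-extraction step described here can be sketched generically: take the magnitude spectrum of a motor signal and read off the energy at candidate fault frequencies, which then become inputs to a classifier. The signal, sampling rate, and frequency list below are synthetic placeholders, not the paper's setup.

```python
import numpy as np

def fft_features(signal, fs, freqs_hz):
    """Return the spectral magnitude at each frequency of interest."""
    spectrum = np.abs(np.fft.rfft(signal)) / len(signal)
    bins = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return [float(spectrum[np.argmin(np.abs(bins - f))]) for f in freqs_hz]

fs = 1000.0                               # sampling rate in Hz (illustrative)
t = np.arange(0, 1.0, 1.0 / fs)
# Synthetic "motor" signal: 50 Hz line component plus a weaker 120 Hz tone
sig = np.sin(2 * np.pi * 50 * t) + 0.3 * np.sin(2 * np.pi * 120 * t)
feats = fft_features(sig, fs, [50.0, 120.0, 200.0])
# feats ~ [0.5, 0.15, ~0]: half-amplitudes at present tones, ~0 elsewhere
```

In a fault-detection pipeline, an elevated magnitude at a known fault sideband (relative to a healthy baseline) is what the downstream neural network learns to flag.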
Split learning (SL) is a popular distributed machine learning (ML) method used to enable ML. It divides a neural network-based model into subnetworks. Then, it separately trains the subnetworks on distributed parties ...
Distributed machine learning has delivered considerable advances in training neural networks by leveraging parallel processing, scalability, and fault tolerance to accelerate training and improve model performance. However, training large models has exhibited numerous challenges due to the gradient dependence that conventional approaches incorporate. To improve the training efficiency of such models, gradient-free distributed methodologies have emerged, fostering gradient-independent parallel processing and efficient utilization of resources across multiple devices or nodes. However, such approaches are usually restricted to specific applications due to their conceptual limitations: computational and communication requirements between partitions, partitioning limited solely to layers, limited sequential learning between the different layers, and training only in synchronous mode. In this paper, we propose and evaluate the Neuro-Distributed Cognitive Adaptive Optimization (ND-CAO) methodology, a novel gradient-free algorithm that enables the efficient distributed training of arbitrary types of neural networks, in both a synchronous and an asynchronous manner. Contrary to the majority of existing methodologies, ND-CAO is applicable to any possible splitting of a neural network into blocks (partitions), with each block allowed to update its parameters fully asynchronously and independently of the rest. Most importantly, no data exchange is required between the different blocks during training; the only information each block requires is the global performance of the model. Convergence of ND-CAO is mathematically established for generic neural network architectures, independently of the particular choices made, while four comprehensive experimental cases, considering different model architectures and image classification tasks, validate the algorithm's robustness and effectiveness in both synchronous and asynchronous operation.
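The key property claimed above, that each block updates its own parameters using only the model's global performance, can be illustrated with a much simpler gradient-free scheme. The sketch below is a greedy random-perturbation search, not the ND-CAO algorithm itself: each block perturbs its partition, keeps the perturbation only if the global loss improves, and never exchanges data with other blocks. The toy quadratic objective stands in for a real training loss.

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.5, 3.0])   # toy optimum (illustrative)
blocks = [np.zeros(2), np.zeros(2)]        # "model" split into two partitions

def global_loss(blocks):
    """The only signal any block ever sees: overall model performance."""
    params = np.concatenate(blocks)
    return float(np.sum((params - target) ** 2))

for step in range(2000):
    i = step % len(blocks)                 # blocks take turns, independently
    saved = blocks[i].copy()
    blocks[i] = saved + rng.normal(scale=0.1, size=saved.shape)
    old = float(np.sum((np.concatenate([saved if j == i else b
                                        for j, b in enumerate(blocks)]) - target) ** 2))
    if global_loss(blocks) >= old:         # consult global loss only
        blocks[i] = saved                  # reject a worsening perturbation
print(round(global_loss(blocks), 3))       # driven close to zero
```

No block ever reads another block's parameters or intermediate activations; the shared scalar loss is the entire communication budget, which is the conceptual point of gradient-free block-wise training.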
Finding the optimal hyperparameters of a neural network is a challenging task, usually done through a trial-and-error approach. Given the complexity of training even one neural network, particularly those with complex architectures and large input sizes, many implementations accelerated with GPUs (graphics processing units) and distributed and parallel technologies have come to light over the past decade. However, when the neural network used is simple and the number of features per sample is small, these implementations become lackluster and provide almost no benefit over just using the CPU (central processing unit). As such, in this paper we propose a novel parallelized approach that leverages GPU resources to simultaneously train multiple neural networks with different hyperparameters, maximizing resource utilization for smaller networks. The proposed method is evaluated on energy demand datasets from Spain and Uruguay, demonstrating consistent speedups of up to 1164x over TensorFlow and 410x over PyTorch.
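The batching trick this abstract alludes to, running many small networks as one large tensor operation, can be sketched as follows. Instead of N separate small matrix multiplies (which underutilize a GPU), the N weight matrices are stacked along a leading axis and one batched contraction computes all forward passes at once. NumPy stands in for the GPU here, and all shapes are illustrative, not the paper's configuration.

```python
import numpy as np

n_nets, n_samples, n_in, n_hidden = 8, 32, 4, 16
rng = np.random.default_rng(1)
X = rng.normal(size=(n_samples, n_in))          # shared input batch
W = rng.normal(size=(n_nets, n_in, n_hidden))   # one weight set per network

# One batched contraction evaluates the hidden layer of all 8 networks at once.
H_batched = np.einsum("si,nih->nsh", X, W)      # shape (n_nets, n_samples, n_hidden)

# Equivalent (slower) per-network loop, kept as a correctness check.
H_loop = np.stack([X @ W[k] for k in range(n_nets)])
print(np.allclose(H_batched, H_loop))  # True
```

On a GPU the batched form keeps the device saturated even when each individual network is far too small to do so on its own, which is where the reported speedups for small models come from.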
With space electromagnetic environments becoming increasingly complex, direction-of-arrival (DOA) estimation based on the point source model can no longer meet the requirements of spatial target location. Based on the characteristics of the distributed source, a new DOA estimation algorithm based on deep learning is proposed. The algorithm first maps the distributed source model to the point source model via a generative adversarial network (GAN) and then combines it with a subspace-based method to achieve central DOA estimation. Second, by constructing a deep neural network (DNN), the covariance matrix of the received signals is used as the input to estimate the angular spread of the distributed source. The experimental results show that the proposed algorithm achieves better performance than existing methods for a distributed source.
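The DNN input described in the second stage, the covariance matrix of the received signals, is typically formed from array snapshots and flattened into a real-valued feature vector. The sketch below shows one common way to do this (sample covariance, then real and imaginary parts of the upper triangle, since the matrix is Hermitian); the array size and snapshots are synthetic stand-ins, not the paper's exact preprocessing.

```python
import numpy as np

def covariance_features(snapshots):
    """snapshots: (n_antennas, n_snapshots) complex array output."""
    R = snapshots @ snapshots.conj().T / snapshots.shape[1]  # sample covariance
    iu = np.triu_indices(R.shape[0])      # R is Hermitian: upper triangle suffices
    return np.concatenate([R[iu].real, R[iu].imag])

m, n = 4, 200                             # 4 antennas, 200 snapshots (illustrative)
rng = np.random.default_rng(2)
snaps = rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))
feats = covariance_features(snaps)
print(feats.shape)                        # 2 * m*(m+1)/2 = 20 real features
```

For a distributed source, the off-diagonal structure of this covariance carries the angular-spread information, which is what the DNN regresses.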
In energy conversion for next-generation smart cities, the intelligent substation plays an important role in power conversion. As an important guarantee for the stable operation of an intelligent substation, research on fault diagnosis technology is particularly important. In this paper, the acoustic-signature diagnosis of substation equipment (taking transformers as an example) is investigated, and the application of "voice recognition + artificial neural network (ANN)" technology in substation fault diagnosis is analyzed. At the same time, continuous online monitoring of intelligent substation equipment produces a large amount of monitoring data, which must be analyzed timely and effectively to understand the operating status of the equipment accurately. For this reason, this paper adopts distributed computing: a real-time distributed computing platform is established, using open-source technology to store the online-monitored sound data in the platform for processing, achieving automatic fault detection and analysis. The results show that distributed computing can realize the intelligent analysis, storage, and visualization of equipment data in the substation, which provides data support for fault diagnosis. The fitting accuracy of the ANN model is 95.123% for the training process and 99.353% for the testing process, the overall fitting accuracy is 95.478%, and the error between the predicted and actual values of the 5 sound signals is within 5% in the fault diagnosis process. Consequently, the ANN model can accurately identify each fault sound of the substation and achieve the purpose of fault diagnosis.
Deep learning has emerged as a cornerstone technology across various domains, from image classification to natural language processing. However, the computational and data demands of training large-scale neural networks pose significant challenges. Distributed learning approaches, particularly those leveraging data parallelism, have become critical to addressing these challenges. Among these, the parameter server architecture stands out as a widely adopted and scalable solution, enabling efficient training of large models across distributed systems. This survey provides a comprehensive exploration of the parameter server architecture, detailing its design principles and operation. It categorizes and critically analyzes research advancements across five key aspects: consistency control, network optimization, parameter management, straggler handling, and fault tolerance. By synthesizing insights from a wide range of studies, this work highlights the trade-offs and practical effectiveness of various approaches while identifying open challenges and future research directions. The survey aims to serve as a foundational resource for researchers and practitioners striving to enhance the performance and scalability of distributed deep learning systems.
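The parameter server pattern the survey covers can be reduced to a minimal sketch: a server holds the shared parameters, and workers repeatedly pull the current values, compute gradients locally, and push them back. The toy below uses threads and a lock in place of real network communication and a toy quadratic loss in place of a model; production systems shard parameters across many servers, batch communication, and relax consistency, which is exactly the design space the survey categorizes.

```python
import threading
import numpy as np

class ParameterServer:
    """Single-shard toy parameter server with synchronous, locked updates."""

    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.lr = lr
        self.lock = threading.Lock()

    def pull(self):
        with self.lock:
            return self.w.copy()

    def push(self, grad):
        with self.lock:
            self.w -= self.lr * grad        # SGD-style update on arrival

def worker(ps, target, steps):
    for _ in range(steps):
        w = ps.pull()                       # fetch current parameters
        ps.push(2 * (w - target))           # gradient of ||w - target||^2

ps = ParameterServer(dim=3)
target = np.array([1.0, 2.0, 3.0])
threads = [threading.Thread(target=worker, args=(ps, target, 200))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(np.round(ps.pull(), 2))              # converges toward target
```

Even this toy exhibits the staleness issue the survey's consistency-control section addresses: a worker's gradient is computed against parameters that other workers may have already updated.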
Distributed DNN inference is becoming increasingly important as the demand for intelligent services at the network edge grows. By leveraging the power of distributed computing, edge devices can perform complicated and...