ISBN:
(Print) 9781665497473
Graph neural networks (GNNs) operate on data represented as graphs and are useful for a wide variety of tasks, from chemical reaction and protein structure prediction to content recommendation systems. However, training on large graphs and improving training performance remain significant challenges. Existing distributed training systems partition a graph among all compute nodes to train on large graphs; however, this introduces communication overhead that degrades training performance. In this study, to solve these two problems, we propose a scalable data-parallel distributed GNN training system designed to partition a graph redundantly. It is implemented using remote direct memory access (RDMA) and nonblocking active messages to utilize network performance efficiently and to hide communication overhead by overlapping it with the training computation. Experimental results demonstrate the strong scalability of the proposed approach, which achieved parallel efficiencies of 0.93 using eight compute nodes for the ogbn-products dataset in the Open Graph Benchmark (OGB) and 0.95 (relative to two compute nodes) using 32 compute nodes for the ogbn-papers100M dataset. The proposed system exhibited training performance 18.9% better than the state-of-the-art DistDGL, even with only a single compute node. These results demonstrate that the proposed approach is a promising method for achieving scalable training performance on large graphs.
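The redundant-partitioning idea behind this abstract can be sketched as follows. This is a toy illustration under assumed data structures (a plain adjacency dict and disjoint node sets), not the authors' system, which additionally relies on RDMA and nonblocking active messages:

```python
# Hypothetical sketch: redundant 1-hop "halo" partitioning. Each compute
# node stores its own vertices plus read-only copies of their neighbors,
# so a single GNN aggregation step needs no remote feature fetches.

def redundant_partition(adj, parts):
    """adj: {node: [neighbors]}; parts: disjoint node lists, one per node."""
    out = []
    for own in parts:
        own = set(own)
        # neighbors owned elsewhere, replicated locally (the redundancy)
        halo = {nbr for v in own for nbr in adj[v]} - own
        out.append({"own": own, "halo": halo})
    return out

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
p = redundant_partition(adj, [[0, 1], [2, 3]])
# partition 0 owns {0, 1} and redundantly stores node 2 as its halo
```

The replicated halo trades memory for communication: aggregation over 1-hop neighborhoods becomes purely local, at the cost of keeping boundary vertices in several partitions.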
In this article, we tackle the dual challenges of efficiency and quality in music generation. We aim to create a model that produces high-quality music efficiently while keeping the model lightweight and the music aut...
In this paper, a competitive energy scheduling strategy game among N microgrids (MGs) inside a distributed network is considered. Each microgrid (MG) aims to maximize its profit under a noncooperative game framework. The strategy-making of each MG depends on its equipment constraints, the aggregate energy supplies of all MGs, and the balance of energy supplies and demands. To solve the above problem, a noncooperative game with linearly coupled constraints and a distributed neurodynamic algorithm are proposed to seek the generalized Nash equilibrium (GNE). In addition, the correctness and convergence of the proposed algorithm are analyzed in detail. The effectiveness and feasibility of the proposed method are illustrated via a simulation example.
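The equilibrium-seeking idea can be illustrated on a toy game. The profit function below (a Cournot-style aggregate-price model with box constraints) is an assumption for illustration, not the paper's MG model, and the linearly coupled supply-demand constraint is omitted; the discretized projected-gradient dynamic stands in for the continuous-time neurodynamic algorithm:

```python
# Toy Nash-equilibrium seeking: player i maximizes
# (a - b*sum(x))*x_i - c_i*x_i  subject to 0 <= x_i <= cap_i,
# via a discretized projected-gradient dynamic run by all players in parallel.

def seek_ne(costs, caps, a=10.0, b=1.0, step=0.05, iters=2000):
    n = len(costs)
    x = [0.0] * n
    for _ in range(iters):
        s = sum(x)
        # each player's gradient of its own profit w.r.t. its own strategy
        grads = [a - b * s - b * x[i] - costs[i] for i in range(n)]
        # gradient ascent step projected onto the capacity box
        x = [min(caps[i], max(0.0, x[i] + step * grads[i])) for i in range(n)]
    return x

x = seek_ne(costs=[2.0, 2.0], caps=[10.0, 10.0])
# converges to the symmetric equilibrium x_i = (a - c)/(3b) = 8/3
```

Each player only needs the aggregate sum and its own state, which is what makes a distributed implementation natural.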
Facial expression recognition (FER) plays an important role in human-computer interaction and has been applied to fatigue detection, human-computer interactive games, social robots, teaching-effect analysis, and so on. However, not all features extracted from facial images are suitable for FER. Moreover, spatial features or temporal features alone have certain limitations in characterizing facial expressions. Therefore, it is necessary to extract effective features suitable for facial expression recognition and to leverage an effective fusion method to improve the accuracy of FER. In this paper, we propose a facial expression recognition method based on spatial-temporal fusion with an attention mechanism (STAFER), which is composed of a spatial feature extractor (SFE), a temporal feature extractor (TFE), and spatial-temporal fusion (STF). First, a 10-layer neural network from pretrained VGG16 is used as the backbone of the SFE to extract the spatial features of facial expressions. To filter out features irrelevant to facial expression, an attention mechanism is applied to the first three convolutional blocks. Second, the TFE is constructed from convolutional blocks and an LSTM: facial expression sequences are fed to the convolutional modules to extract low-level features, and these features are then fed into the LSTM to obtain temporal features. Finally, a decision-level fusion strategy is used to fuse the spatial and temporal features. Experimental results demonstrate that our proposed method achieves an accuracy of 98.05% on CK+ and 88.34% on Oulu-CASIA, which is competitive with state-of-the-art methods.
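Decision-level fusion of the kind described here can be sketched as a weighted combination of the two streams' class probabilities. The weight and the probability vectors below are illustrative assumptions; the paper's exact fusion rule is not given in the abstract:

```python
# Decision-level fusion sketch: each stream outputs per-class probabilities,
# which are combined by a weighted sum before taking the argmax.

def fuse(p_spatial, p_temporal, w=0.5):
    """Weighted combination of two streams' class-probability vectors."""
    return [w * s + (1 - w) * t for s, t in zip(p_spatial, p_temporal)]

spatial = [0.7, 0.2, 0.1]   # illustrative scores, e.g. happy / sad / neutral
temporal = [0.5, 0.4, 0.1]
fused = fuse(spatial, temporal)
pred = max(range(len(fused)), key=fused.__getitem__)  # predicted class index
```

Fusing at the decision level (rather than concatenating features) lets each stream keep its own classifier head, so the streams can be trained and tuned independently.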
To achieve low probability of intercept (LPI) in radar networks for multiple target detection, it is necessary to find the optimal assignment of distributed radars to targets. The multi-radar to multi-target assignment (MRMTA) problem aims to find the best radar combination, but its brute-force (BF)-based approach over all possible sensor combinations has exponential complexity, making it challenging to implement in networks with a large number of radars or targets. This limits the implementation of the BF approach in networks that prioritize low latency and complexity. To address this challenge, we propose a supervised machine-learning (ML)-based solution for the MRMTA problem. Our proposed implementation scheme performs the training procedure offline, leading to a significant reduction in assignment complexity and processing latency. We conducted extensive numerical simulations to design an ML structure with high accuracy, convergence speed, and scalability. Simulation results demonstrate the efficiency and effectiveness of our proposed ML-based MRMTA solution, which achieves near-optimal LPI performance with considerably lower computation time than benchmark schemes. Our proposed solution has the potential to optimize the assignment of distributed radars to targets in LPI radar networks and improve the performance of complex networks with low latency and complexity requirements.
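The offline-training idea can be sketched as follows: label small assignment instances with the brute-force optimum, then answer new instances with a cheap learned predictor. The 1-nearest-neighbour predictor and the cost-matrix features are illustrative stand-ins for the paper's neural model, and the toy cost objective is an assumption:

```python
# Hypothetical sketch of offline BF labelling + cheap online prediction
# for a radar-to-target assignment problem.
from itertools import permutations

def brute_force(cost):
    """Exponential-time baseline: best one-to-one assignment (min total cost)."""
    n = len(cost)
    return min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))

def train(instances):
    # offline phase: expensive BF labelling happens once, before deployment
    return [(c, brute_force(c)) for c in instances]

def predict(model, cost):
    # online phase: cheap lookup by feature distance, no BF search
    flat = lambda c: [x for row in c for x in row]
    _, label = min(model, key=lambda m: sum((a - b) ** 2
                                            for a, b in zip(flat(m[0]), flat(cost))))
    return label

model = train([[[1, 9], [9, 1]], [[9, 1], [1, 9]]])
assignment = predict(model, [[2, 8], [8, 2]])
```

The point of the scheme is exactly this split: the exponential search cost is paid offline, while the deployed predictor runs in time independent of the number of permutations.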
ISBN:
(Print) 9798400708435
Communication efficiency is crucial for accelerating distributed deep neural network (DNN) training. All-reduce, a vital communication primitive, is responsible for reducing model parameters in distributed DNN training. However, most existing All-reduce algorithms, designed for traditional electrical interconnect systems, fall short due to bandwidth limitations. Optical interconnects, with superior bandwidth, low transmission delay, and lower power consumption, emerge as viable alternatives. We propose Wrht (Wavelength Reused Hierarchical Tree), an efficient scheme for implementing the All-reduce operation in optical interconnect systems. Wrht leverages wavelength-division multiplexing (WDM) to minimize the communication time of distributed data-parallel DNN training. We derive the required number of wavelengths, the minimum number of communication steps, and the optimal communication time under optical communication constraints. Simulations with real-world DNN models indicate that Wrht notably reduces communication time: on average, it achieves reductions of 65.23%, 43.81%, and 82.22% compared with three conventional All-reduce algorithms in optical interconnect systems, and of 61.23% and 55.51% compared with two algorithms in electrical systems. This highlights Wrht's potential to enhance communication efficiency in DNN training using optical interconnects.
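For readers unfamiliar with the primitive being optimized, here is a minimal simulation of a conventional ring All-reduce (reduce-scatter followed by all-gather). This illustrates the baseline operation only; Wrht's wavelength-reused hierarchical tree is not reproduced here:

```python
# Simulated ring All-reduce over n nodes, each holding n gradient chunks.
# After 2*(n-1) steps every node holds the element-wise sum of all inputs.

def ring_allreduce(values):
    """values[i][j]: node i's j-th chunk. Returns each node's reduced copy."""
    n = len(values)
    data = [list(v) for v in values]
    # reduce-scatter: in step s, node i sends chunk (i - s) mod n to node i+1
    for s in range(n - 1):
        sends = [(i, (i - s) % n, data[i][(i - s) % n]) for i in range(n)]
        for i, j, val in sends:
            data[(i + 1) % n][j] += val
    # all-gather: node i now owns the full sum of chunk (i + 1) mod n,
    # which circulates around the ring in n - 1 more steps
    for s in range(n - 1):
        sends = [(i, (i + 1 - s) % n, data[i][(i + 1 - s) % n]) for i in range(n)]
        for i, j, val in sends:
            data[(i + 1) % n][j] = val
    return data

result = ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# every node ends up with the full sums [12, 15, 18]
```

Each of the 2(n-1) steps moves one chunk per node, which is why algorithms that cut the step count or widen each step's bandwidth (as WDM does) reduce total communication time.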
This article presents the development and implementation of an optical-fiber-integrated smart environment with heterogeneous opto-electronic approaches. The so-called opto-electronic smart home is composed of three different optical fiber sensor systems, based on different optical fibers and comprising more than 50 integrated sensors. The proposed smart environment is capable of detecting the patient's location inside the home environment, recognizing the patient's activities, and providing gait analysis through kinematic and spatio-temporal gait parameters. The heterogeneity of the system is realized through transmission-reflection analysis (TRA) using nanoparticle (NP)-doped optical fibers for patient localization in Layer 1. In Layer 2, the patient wears polymer optical fiber (POF)-integrated pants, in which a multiplexed intensity-variation-based sensor (with 30 sensors in each leg) detects activities, especially walking, sitting, and lying down. Layer 3 comprises a fiber Bragg grating (FBG)-embedded smart carpet, in which ten FBGs are inscribed in a single-mode silica optical fiber. In addition, a graphical interface is developed for sensor integration and cloud connectivity, where signal processing is performed using a feedforward neural network (FFNN) for locating mechanical perturbations along the optical fiber (patient localization), activity classification, and footstep localization along the FBG-embedded smart carpet. The implementation results show the feasibility of the proposed system for obtaining the patients' locations, activities, and gait analysis.
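The FFNN used for the signal processing is, generically, a forward pass of the following shape. The topology, weights, and inputs below are toy assumptions (the paper's network is not specified in the abstract):

```python
# Minimal feedforward pass: one ReLU hidden layer, one linear output layer,
# mapping sensor readings to class/location scores.

def ffnn(x, w1, b1, w2, b2):
    # hidden layer: affine transform followed by ReLU
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    # output layer: affine transform only
    return [sum(w * hi for w, hi in zip(row, h)) + b
            for row, b in zip(w2, b2)]

# toy weights chosen by hand for illustration
scores = ffnn([3.0, 1.0],
              w1=[[1.0, 0.0], [0.0, 1.0]], b1=[0.0, 0.0],
              w2=[[1.0, -1.0]], b2=[0.0])
```

In practice one such network would be trained per task (perturbation localization, activity classification, footstep localization), each consuming the corresponding layer's sensor channels.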
In this contribution, the Helmholtz decomposition of a compressible flow velocity field into vortical and compressible structures is implemented using a finite element framework and physics-informed neural networks. T...
ISBN:
(Print) 9798350369458; 9798350369441
Load forecasting has a significant impact on energy management and planning, facilitating efficient allocation of resources and grid operations. In this study, a comparative analysis of traditional statistical methods and deep learning techniques is conducted using a real-world dataset from the Ikaria islanded grid. This paper focuses on four forecasting approaches: Autoregressive Integrated Moving Average (ARIMA), Seasonal Autoregressive Integrated Moving Average with Exogenous Variables (SARIMAX), Long Short-Term Memory (LSTM) networks, and Deep Neural Networks (DNN). Through appropriate processing of the data, extensive experimentation took place, aiming to capture the complex and nonlinear patterns of the dataset. The results indicated that LSTM and DNN outperformed both ARIMA and SARIMAX in all three evaluation metrics, achieving an RMSE of 0.13, an MAE of 0.09, and a MAPE of 2.11%. As a result, this study validates the superiority of deep learning techniques in real-world islanded grid environments, as they are capable of accurately predicting future load values based on historical data.
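The three evaluation metrics cited above have standard definitions, sketched below on a toy series (the numbers are illustrative, not the Ikaria results):

```python
# RMSE, MAE, and MAPE as commonly defined for load-forecast evaluation.
import math

def metrics(actual, pred):
    n = len(actual)
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / n)
    mae = sum(abs(a - p) for a, p in zip(actual, pred)) / n
    # MAPE is reported as a percentage and assumes actual values are nonzero
    mape = 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / n
    return rmse, mae, mape

rmse, mae, mape = metrics([100.0, 200.0], [90.0, 210.0])
# rmse = 10.0, mae = 10.0, mape = 7.5
```

Note that RMSE and MAE carry the units of the load (so the 0.13 and 0.09 above depend on scaling), while MAPE is scale-free, which is why all three are usually reported together.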
ISBN:
(Print) 9798400700439
Convolution has been widely employed in image processing and computer vision applications such as picture augmentation, smoothing, and structure extraction. In addition, convolution operations are among the most prevalent computing patterns in machine learning: a substantial portion of state-of-the-art convolutional neural network operations are convolutions. Therefore, effectively mapping convolution operations onto hardware architectures is crucial for achieving superior performance when accelerating convolutional neural networks. In this paper, we propose several algorithms to efficiently map the 2-D convolution operation onto a dynamically reconfigurable resource array (DRRA) and distributed memory architecture. Furthermore, we discuss the mapping of 2-D convolution on the target architecture for an input matrix of arbitrary size, as well as the generalization of the proposed approaches to multi-column DRRA architectures.
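For reference, the operation being mapped is the standard 2-D convolution (here in its "valid" form, with no padding); the DRRA mapping algorithms themselves are architecture-specific and not reproduced here:

```python
# Reference 2-D convolution with "valid" padding: the kernel slides over the
# image, and each output element is the sum of elementwise products.

def conv2d(img, ker):
    H, W = len(img), len(img[0])
    kh, kw = len(ker), len(ker[0])
    return [[sum(img[i + a][j + b] * ker[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(W - kw + 1)]
            for i in range(H - kh + 1)]

out = conv2d([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]],
             [[1, 0],
              [0, 1]])
# out == [[6, 8], [12, 14]]
```

The independence of the output elements is what hardware mappings exploit: each (i, j) sum can be assigned to a different processing element, with only the overlapping input windows shared between them.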