ISBN: (Print) 9781665405409
A distributed array consisting of multiple subarrays is attractive for high-resolution direction-of-arrival (DOA) estimation when a large-scale array is infeasible. To achieve effective distributed DOA estimation, the information observed at the subarrays must be transmitted to the fusion center, where DOA estimation is performed. For noncoherent data fusion, covariance matrices are used for subarray fusion. To address the complexity associated with the large array size, we propose a compression framework consisting of multiple parallel encoders and a classifier. The parallel encoders at the distributed subarrays are trained to compress their respective covariance matrices. The compressed results are sent to the fusion center, where the signal DOAs are estimated by a classifier operating on the compressed covariance matrices.
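To make the fusion pipeline concrete, below is a minimal sketch of such a framework: per-subarray dense encoders compress vectorized sample covariance matrices, and a fusion-center classifier maps the concatenated codes to a DOA grid. The layer sizes, class names (SubarrayEncoder, FusionClassifier), and the grid-classification formulation are illustrative assumptions, not the paper's architecture.

```python
import numpy as np
import torch
import torch.nn as nn

class SubarrayEncoder(nn.Module):
    """Compresses one subarray's vectorized sample covariance matrix."""
    def __init__(self, n_sensors, code_dim):
        super().__init__()
        in_dim = 2 * n_sensors * n_sensors  # real and imaginary parts
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, code_dim))

    def forward(self, cov_vec):
        return self.net(cov_vec)

class FusionClassifier(nn.Module):
    """Maps the concatenated compressed codes to logits over a DOA grid."""
    def __init__(self, n_subarrays, code_dim, n_grid):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_subarrays * code_dim, 256),
                                 nn.ReLU(), nn.Linear(256, n_grid))

    def forward(self, codes):
        return self.net(torch.cat(codes, dim=-1))

def vectorize_cov(snapshots):
    """Sample covariance of (n_sensors, n_snapshots) complex data, as reals."""
    R = snapshots @ snapshots.conj().T / snapshots.shape[1]
    return torch.from_numpy(
        np.concatenate([R.real.ravel(), R.imag.ravel()])).float()

# Toy forward pass: 3 subarrays of 8 sensors, 64 snapshots, 181 grid cells.
encoders = [SubarrayEncoder(8, 16) for _ in range(3)]
classifier = FusionClassifier(3, 16, 181)
covs = [vectorize_cov(np.random.randn(8, 64) + 1j * np.random.randn(8, 64))
        for _ in range(3)]
logits = classifier([enc(c) for enc, c in zip(encoders, covs)])
```

In a deployment of this kind, only the low-dimensional codes would cross the network to the fusion center, which is what makes the scheme attractive when the full covariance matrices are too large to transmit.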
Drowsiness Detection (DD) is the process of identifying signs of drowsiness in individuals, especially in critical situations such as driving, operating heavy machinery, or piloting aircraft. Hybrid Bi-directional Long...
Long-term memory (LTM) is generally regarded as the unlimited and permanent storage of information in the human brain. This concept has spurred numerous studies focusing on associative memory models, such as ...
ISBN: (Print) 9781728112466
In this paper, we present the design, implementation, and evaluation of a novel predictive control framework for reliable distributed stream data processing, which features a Deep Recurrent Neural Network (DRNN) model for performance prediction and dynamic grouping for flexible control. Specifically, we present a novel DRNN model that makes accurate performance predictions based on multi-level runtime statistics, with careful consideration of interference from co-located worker processes. Moreover, we design a new grouping method, dynamic grouping, which can distribute or re-distribute data tuples to downstream tasks according to any given split ratio on the fly; it can therefore be used to redirect data tuples to bypass misbehaving workers. We implemented the proposed framework on Storm, a widely used Distributed Stream Data Processing System (DSDPS). For validation and performance evaluation, we developed two representative stream data processing applications: Windowed URL Count and Continuous Queries. Extensive experimental results show that: 1) the proposed DRNN model outperforms widely used baseline solutions, ARIMA and SVR, in terms of prediction accuracy; 2) dynamic grouping works as expected; and 3) the proposed framework enhances reliability while incurring only minor performance degradation in the presence of misbehaving workers.
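As a rough illustration of the dynamic grouping idea, the sketch below routes tuples to downstream tasks in proportion to a split ratio that can be updated at runtime, for example to drain a worker the predictor has flagged. The class and task names are invented for this example; the paper's implementation lives inside Storm's grouping mechanism.

```python
import random

class DynamicGrouping:
    """Weighted tuple router with a split ratio that can change on the fly."""
    def __init__(self, tasks, ratios):
        self.update(tasks, ratios)

    def update(self, tasks, ratios):
        """Install a new split ratio (e.g., zero out a misbehaving worker)."""
        total = sum(ratios)
        self.tasks = tasks
        self.weights = [r / total for r in ratios]

    def route(self, tuple_):
        """Pick a downstream task for one tuple, proportionally to the ratio."""
        return random.choices(self.tasks, weights=self.weights, k=1)[0]

grouping = DynamicGrouping(tasks=["task-0", "task-1", "task-2"],
                           ratios=[1, 1, 1])
# Suppose the performance predictor flags task-1: redirect its share.
grouping.update(tasks=["task-0", "task-1", "task-2"], ratios=[1, 0, 1])
counts = {t: 0 for t in grouping.tasks}
for i in range(1000):
    counts[grouping.route(("url", i))] += 1
print(counts)  # task-1 receives no tuples after the update
```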
Spiking neural network (SNN) simulation is very important for studying brain function and validating hypotheses in neuroscience, and it can also be used in artificial intelligence. Recently, GPU-based simulators have been developed to support the real-time simulation of SNNs. However, these simulators' simulation performance and scale are severely limited, due to the random memory access pattern and the global communication between devices. Here, we propose an efficient distributed heterogeneous SNN simulator based on the Sunway accelerators (including SW26010 and SW26010pro), named SWsnn, which supports accurate simulation with a small time step (1/16 ms), random delay sizes for synapses, and larger-scale network simulation. Compared with existing GPUs, the Local Dynamic Memory (LDM) (similar to a cache) in Sunway is much bigger (4 MB or 16 MB in each core group). To improve simulation performance, we redesign the network data storage structure and the synaptic plasticity flow so that most random accesses occur in LDM. SWsnn hides Message Passing Interface (MPI)-related operations to reduce communication costs by separating them from the general SNN workflow. Moreover, SWsnn relies on parallel Compute Processing Elements (CPEs) rather than the serial Management Processing Element (MPE) to control the communication buffers, using Register-Level Communication (RLC) and Direct Memory Access (DMA). In addition, SWsnn is further optimized using vectorization and DMA hiding techniques. Experimental results show that SWsnn runs 1.4-2.2 times faster than the state-of-the-art GPU-based SNN simulator GPU-enhanced Neuronal Networks (GeNN), and supports much larger-scale real-time simulation.
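As a generic illustration of the kind of data-structure redesign that keeps random accesses in fast local memory, the sketch below implements a spike delay ring buffer that turns per-synapse delays into sequential slot reads. It is a toy model under assumed semantics, not SWsnn's actual storage layout, and the class name is invented.

```python
import numpy as np

class DelayRingBuffer:
    """Accumulates delayed synaptic input; one slot per future time step."""
    def __init__(self, n_neurons, max_delay_steps):
        self.buf = np.zeros((max_delay_steps, n_neurons))
        self.max_delay = max_delay_steps
        self.t = 0

    def deliver(self, targets, weights, delays):
        """Schedule spikes: add weight to each target at (t + delay) % D."""
        slots = (self.t + delays) % self.max_delay
        np.add.at(self.buf, (slots, targets), weights)

    def step(self):
        """Read and clear the current slot, then advance simulated time."""
        slot = self.t % self.max_delay
        current = self.buf[slot].copy()
        self.buf[slot] = 0.0
        self.t += 1
        return current

buf = DelayRingBuffer(n_neurons=4, max_delay_steps=8)
buf.deliver(targets=np.array([1, 3]), weights=np.array([0.5, -0.2]),
            delays=np.array([2, 5]))
for _ in range(3):
    current = buf.step()  # the third step (t=2) returns 0.5 for neuron 1
```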
Scalable data management is essential for processing large scientific datasets on HPC platforms for distributed deep learning. In-memory distributed storage is preferred for its speed, enabling rapid, random, and frequ...
The continued development of neural network architectures keeps driving demand for computing power. While data center scaling continues, inference away from the cloud will increasingly rely on distributed inferen...
ISBN: (Print) 9798400717932
Graph neural networks (GNNs) are a computationally efficient method to learn embeddings and classifications on graph data. However, GNN training has low computational intensity, making communication costs the bottleneck for scalability. Sparse-matrix dense-matrix multiplication (SpMM) is the core computational operation in full-graph training of GNNs. Previous work parallelizing this operation focused on sparsity-oblivious algorithms, in which matrix elements are communicated regardless of the sparsity pattern. This leads to a predictable communication pattern that can be overlapped with computation and enables the use of collective communication operations, at the expense of wasting significant bandwidth on unnecessary data. We develop sparsity-aware algorithms that tackle the communication bottlenecks in GNN training with three novel approaches. First, we communicate only the necessary matrix elements. Second, we utilize a graph partitioning model to reorder the matrix and drastically reduce the amount of communicated elements. Finally, we address the high load imbalance in communication with a tailored partitioning model that minimizes both the total communication volume and the maximum sending volume. We further couple these sparsity-exploiting approaches with a communication-avoiding approach (1.5D parallel SpMM) in which submatrices are replicated to reduce communication. We explore the tradeoffs of these combined optimizations and show up to a 14x improvement on 256 GPUs relative to a popular GNN framework based on communication-oblivious SpMM; on some instances, communication is reduced to almost zero, yielding effectively communication-free parallel training.
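To illustrate the first idea, communicating only the necessary elements, here is a hedged single-process sketch: a rank inspects the nonzero column indices of its local sparse block of A and fetches only the corresponding rows of the dense feature matrix H before multiplying. The fetch_row callback stands in for the actual point-to-point communication; none of the names come from the paper.

```python
import numpy as np
import scipy.sparse as sp

def sparsity_aware_spmm(A_local, fetch_row):
    # Rows of H actually touched = columns of A_local with a nonzero.
    cols = np.unique(A_local.indices)
    # Gather only those rows (fetch_row stands in for communication).
    H_sub = np.stack([fetch_row(j) for j in cols])
    # Remap A_local's column indices onto the compacted H_sub, then multiply.
    remap = np.zeros(A_local.shape[1], dtype=A_local.indices.dtype)
    remap[cols] = np.arange(len(cols))
    A_compact = sp.csr_matrix(
        (A_local.data, remap[A_local.indices], A_local.indptr),
        shape=(A_local.shape[0], len(cols)))
    return A_compact @ H_sub

# Toy check against the dense-gather baseline.
H = np.random.rand(8, 4)  # "global" dense feature matrix
A_local = sp.random(5, 8, density=0.3, format="csr", random_state=0)
assert np.allclose(sparsity_aware_spmm(A_local, lambda j: H[j]), A_local @ H)
```

The paper's second and third approaches then reduce how many distinct rows each rank needs in the first place, by reordering the matrix with a partitioner that balances total and per-rank communication volume.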
ISBN: (Print) 9798331523923
Internet of Things (IoT)-enabled Smart Energy Management (SEM) in Distributed Energy Resources (DERs), while crucial for optimizing energy distribution and resource management, faces challenges such as data inconsistencies and the variability of renewable energy generation. These issues result in inaccurate demand forecasts, leading to suboptimal allocation of resources and inefficient Energy Management (EM). Additionally, because energy demand and generation patterns are influenced by factors such as weather conditions, time of day, and energy consumption behaviors, prediction models may struggle to capture these dynamic changes, affecting the reliability of forecasts. To address these limitations, this manuscript proposes a novel approach for energy demand prediction. Data is collected from IoT-enabled sensors monitoring DERs. The data undergoes pre-processing, where the Fast Resampled Iterative Filtering (FRIF) method is used to eliminate missing values and normalize the inputs. The Self-Adaptive Physics-Informed Neural Network (SAPINN) model then uses the processed data to forecast energy demand, renewable energy generation, and storage levels. Green Anaconda Optimization (GAO) is applied to optimize the weight parameters of the SAPINN model. The proposed SAPINN-GAO method is implemented on the MATLAB platform and compared with existing models such as the Stacked Convoluted Bi-Directional Gated Attention Network-Hybrid Darts Seagull Optimizer (SConBGAN-HDSO), Recurrent Neural Network (RNN), and Support Vector Machine-Particle Swarm Optimization (SVM-PSO). The SAPINN-GAO method achieves an accuracy of 99.2%, a precision of 99.2%, and a Root Mean Square Error (RMSE) of 2.2%, demonstrating its superior performance in energy demand prediction. Its higher accuracy and precision, coupled with its robust performance, make it a reliable and efficient solution for forecasting energy demand, renewable energy generation, and storage levels in IoT-enabled SEM systems.
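The paper works in MATLAB, but as a general illustration of what "physics-informed" means here, the PyTorch sketch below combines a data-fit loss with a physics-residual loss. The toy ODE residual (d(demand)/dt = generation - consumption_rate * demand) is invented for illustration and is not SAPINN's formulation, nor does this sketch include the self-adaptive weighting or GAO tuning.

```python
import torch
import torch.nn as nn

# Placeholder forecaster: maps time to predicted demand.
net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))

def pinn_loss(t_data, y_data, t_phys, generation, consumption_rate):
    # Data term: fit the observed demand samples.
    data_loss = nn.functional.mse_loss(net(t_data), y_data)
    # Physics term: penalize violation of the assumed toy ODE at
    # collocation points, using autograd for the time derivative.
    t = t_phys.clone().requires_grad_(True)
    y = net(t)
    dy_dt, = torch.autograd.grad(y.sum(), t, create_graph=True)
    residual = dy_dt - (generation - consumption_rate * y)
    return data_loss + residual.pow(2).mean()

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
t_data = torch.rand(32, 1); y_data = torch.sin(t_data)  # toy observations
t_phys = torch.rand(64, 1)                              # collocation points
for _ in range(100):
    opt.zero_grad()
    loss = pinn_loss(t_data, y_data, t_phys,
                     generation=1.0, consumption_rate=0.5)
    loss.backward()
    opt.step()
```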
ISBN: (Print) 9781665484534
The scale of neural language models has increased significantly over recent years. As a result, the time complexity of training larger language models, along with their resource utilization, has been growing at an even higher rate. In this research, we propose a distributed implementation of a Graph Attention Neural Network model with 120 million parameters and train it on a cluster of eight GPUs. We demonstrate a threefold speedup in model training while maintaining stable accuracy and loss during training and testing, compared to training on a single GPU instance.
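For context, a minimal sketch of the standard way to spread training across such an eight-GPU cluster is PyTorch DistributedDataParallel, shown below. The placeholder model and hyperparameters are assumptions for illustration, not the paper's 120-million-parameter Graph Attention network.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")  # one process per GPU, set up by torchrun
    rank = dist.get_rank()
    device = rank % torch.cuda.device_count()
    model = torch.nn.Linear(512, 512).to(device)  # placeholder model
    model = DDP(model, device_ids=[device])
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for step in range(100):
        x = torch.randn(32, 512, device=device)  # each rank gets its own batch
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()  # gradients are all-reduced across the 8 processes
        opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with, for example, torchrun --nproc_per_node=8 train.py, DDP averages gradients after each backward pass, which is what yields the near-linear speedups reported for data-parallel training when communication overlaps computation.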