Wireless-based positioning with large antenna arrays is a promising enabler of the high accuracy positioning services envisioned for 6G. These systems provide high spatial resolution due to the large number of antenna...
Accurate forecasting of buildings' energy demand is essential for building operators to manage loads and resources efficiently, and for grid operators to balance local production with demand. However, current models still struggle to capture nonlinear relationships influenced by external factors such as weather and consumer behavior, assume constant variance in energy data over time, and often fail to model sequential data. To address these limitations, we propose a hybrid Transformer-based model with Liquid neural networks and learnable encodings for building energy forecasting. The model leverages dense layers to learn non-linear mappings that produce embeddings capturing the underlying patterns in energy time series. Additionally, a convolutional neural network encoder is integrated to enhance the model's ability to understand temporal dynamics through spatial mappings. To address the limitations of classic attention mechanisms, we implement a reservoir processing module using Liquid neural networks, which introduces a controlled non-linearity through dynamic reservoir computing and enables the model to capture complex patterns in the data. For model evaluation, we used both pilot data and state-of-the-art datasets to determine the model's performance across various building contexts, including large apartment and commercial buildings and small households, with and without on-site energy production. The proposed Transformer model demonstrates good predictive accuracy and training-time efficiency across building types and testing configurations. Specifically, SMAPE scores indicate a reduction in prediction error, with improvements ranging from 1.5% to 50% over basic Transformer, LSTM, and ANN models, while higher R2 values further confirm the model's reliability in capturing the variance of energy time series. The 8% improvement in training time over the basic Transformer model highlights the hybrid model's computational efficiency without compromising accuracy.
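A minimal sketch of the kind of hybrid architecture this abstract describes, assuming PyTorch; the layer sizes, the leaky-reservoir stand-in for the Liquid network, and all names are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class ReservoirModule(nn.Module):
    """Simplified reservoir layer: fixed random recurrent weights and a
    leaky-integrator state update (a stand-in for Liquid neural networks)."""
    def __init__(self, d_model, leak=0.3):
        super().__init__()
        self.leak = leak
        self.w_in = nn.Parameter(torch.randn(d_model, d_model) * 0.1, requires_grad=False)
        self.w_res = nn.Parameter(torch.randn(d_model, d_model) * 0.1, requires_grad=False)

    def forward(self, x):                      # x: (batch, seq, d_model)
        h = torch.zeros(x.size(0), x.size(2), device=x.device)
        states = []
        for t in range(x.size(1)):
            pre = x[:, t] @ self.w_in + h @ self.w_res
            h = (1 - self.leak) * h + self.leak * torch.tanh(pre)  # controlled non-linearity
            states.append(h)
        return torch.stack(states, dim=1)

class HybridForecaster(nn.Module):
    def __init__(self, n_features=1, d_model=64, horizon=24):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(n_features, d_model), nn.GELU())  # learnable encoding
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)      # CNN encoder
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.reservoir = ReservoirModule(d_model)
        self.head = nn.Linear(d_model, horizon)

    def forward(self, x):                      # x: (batch, seq, n_features)
        z = self.embed(x)
        z = self.conv(z.transpose(1, 2)).transpose(1, 2)
        z = self.transformer(z)
        z = self.reservoir(z)
        return self.head(z[:, -1])             # forecast the next `horizon` steps

y_hat = HybridForecaster()(torch.randn(8, 96, 1))  # 8 series, 96 past steps -> (8, 24)
```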
The past decade has witnessed great progress in automatic speech recognition (ASR) due to advances in deep learning. The improvements in performance can be attributed to both improved models and large-scale training data. The key to training such models is the employment of efficient distributed learning techniques. In this article, we provide an overview of distributed training techniques for deep neural network (DNN) acoustic models used for ASR. Starting with the fundamentals of data-parallel stochastic gradient descent (SGD) and ASR acoustic modeling, we investigate various distributed training strategies and their realizations in high-performance computing (HPC) environments, with an emphasis on striking a balance between communication and computation. Experiments are carried out on a popular public benchmark to study the convergence, speedup, and recognition performance of the investigated strategies.
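A bare-bones sketch of the data-parallel SGD pattern the article starts from, assuming PyTorch; it is not tied to the article's HPC experiments, and `train_step` is a hypothetical helper:

```python
import torch
import torch.distributed as dist
import torch.nn as nn

def train_step(model, batch, targets, optimizer, world_size):
    # Assumes dist.init_process_group() has already been called on every rank.
    loss = nn.functional.cross_entropy(model(batch), targets)
    optimizer.zero_grad()
    loss.backward()
    for p in model.parameters():               # average gradients across ranks
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
        p.grad /= world_size
    optimizer.step()                           # identical update on every rank
    return loss.item()
```

The communication/computation balance the article emphasizes shows up here directly: every all-reduce is network traffic, which is why production systems overlap it with the backward pass rather than looping over parameters afterwards.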
Deep learning, with increasingly large datasets and complex neural networks, is widely used in computer vision and natural language processing. A resulting trend is to split and train large-scale neural network models across multiple devices in parallel, known as parallel model training. Existing parallel methods are mainly based on expert design, which is inefficient and requires specialized knowledge. Although automatically implemented parallel methods have been proposed to solve these problems, they consider only a single optimization aspect: run time. In this paper, we present Trinity, an adaptive distributed parallel training method based on reinforcement learning, to automate the search and tuning of parallel strategies. We build a multidimensional performance evaluation model and use proximal policy optimization to co-optimize multiple optimization aspects. Our experiments use the CIFAR-10 and PTB datasets with the InceptionV3, NMT, NASNet, and PNASNet models. Compared with Google's Hierarchical method, Trinity achieves up to 5% reductions in runtime, communication, and memory overhead, and up to a 40% increase in parallel-strategy search speed.
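One way to picture the "multidimensional performance evaluation" feeding a PPO agent is a scalar reward over the three aspects Trinity co-optimizes; the weights and normalization below are illustrative assumptions, not the paper's formula:

```python
def strategy_reward(runtime_s, comm_bytes, mem_bytes,
                    w_time=0.5, w_comm=0.3, w_mem=0.2):
    """Fold runtime, communication, and memory overhead into one scalar
    that a PPO agent can maximize; lower cost in any dimension raises it."""
    cost = (w_time * runtime_s +
            w_comm * comm_bytes / 1e9 +        # normalize to GB
            w_mem * mem_bytes / 1e9)
    return -cost

print(strategy_reward(runtime_s=12.4, comm_bytes=3.2e9, mem_bytes=8.0e9))
```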
The CPU is a powerful, pervasive, and indispensable platform for running deep learning (DL) workloads in systems ranging from mobile devices to extreme-end servers. In this article, we present a survey of techniques for optimizing DL applications on CPUs. We include methods proposed for both inference and training, and those offered in the context of mobile, desktop/server, and distributed systems. We identify the areas of strength and weakness of CPUs in the field of DL. This article will interest practitioners and researchers in the areas of artificial intelligence, computer architecture, mobile systems, and parallel computing.
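As a concrete taste of the class of CPU-side knobs such surveys cover, here are two standard PyTorch calls, controlling intra-op thread parallelism and skipping autograd bookkeeping at inference time; the thread count is an assumption you would tune to your cores:

```python
import torch

torch.set_num_threads(8)          # match physical cores to avoid oversubscription
model = torch.nn.Linear(512, 512).eval()
with torch.inference_mode():      # disables gradient tracking for faster inference
    out = model(torch.randn(32, 512))
```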
Distributed acoustic sensing (DAS) is an optoelectronic technology that utilizes fibre optic cables to detect disturbances caused by seismic waves. Using DAS, seismologists can monitor geophysical phenomena at high spatial and temporal resolutions over long distances in inhospitable environments. Field experiments using DAS are typically associated with large volumes of observations, requiring algorithms for efficient processing and monitoring. In this study, we present a supervised classifier trained to distinguish seismic activity from other sources of hydroacoustic energy. Our classifier is based on a 2-D convolutional neural network architecture. The 55-km-long ocean-bottom fibre optic cable, located off Cape Muroto in southwestern Japan, was interrogated using DAS, and data were collected during two different monitoring periods. Optimization of the model's hyperparameters using Gaussian process regression was necessary to prevent issues associated with the small size of the training data. Using a test set of 100 labeled images, the highest-performing model achieved an average classification accuracy of 92 per cent, correctly classifying 100 per cent of instances in the geophysical class, 80 per cent in the non-geophysical class, and 96 per cent in the ambient-noise class. This performance demonstrates the model's effectiveness in distinguishing between geophysical data, various sources of hydroacoustic energy, and ambient noise.
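A minimal 2-D CNN of the kind the abstract describes, assuming PyTorch; the layer sizes are assumptions, and the study's actual architecture and Gaussian-process tuning are not reproduced here:

```python
import torch
import torch.nn as nn

class DASClassifier(nn.Module):
    """Classifies DAS time-channel images into three classes:
    geophysical, non-geophysical, and ambient noise."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes))

    def forward(self, x):          # x: (batch, 1, H, W) single-channel images
        return self.head(self.features(x))

logits = DASClassifier()(torch.randn(4, 1, 128, 128))   # -> (4, 3)
```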
Graph neural networks (GNNs) are well suited to GPUs with high computing capability because of their massive arithmetic operations. Compared with mini-batch training, full-graph training does not require sampling of the input graph and halo region, avoiding potential accuracy losses. Current deep learning frameworks evenly partition large graphs to scale GNN training to distributed multi-GPU platforms. On the other hand, the rapid evolution of hardware requires technology companies and research institutions to frequently update their equipment to cope with the latest tasks. This results in large-scale clusters with a mixture of GPUs of various computational capabilities and hardware specifications. Existing works, however, fail to adapt sub-graphs to different GPU generations, leading to inefficient resource utilization and degraded training efficiency. Therefore, we propose nu GNN, a Non-Uniformly partitioned full-graph GNN training framework for heterogeneous distributed platforms. nu GNN first models the GNN processing ability of the hardware based on various theoretical parameters. It then automatically derives a reasonable task-partitioning scheme by combining hardware, model, and graph-dataset information. Finally, nu GNN implements an irregular graph-partitioning mechanism that allows GNN training tasks to execute efficiently on distributed heterogeneous systems. Experimental results show that in real-world scenarios with a mixture of GPU generations, nu GNN outperforms other static partitioning schemes based on hardware specifications.
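The core idea of non-uniform partitioning can be illustrated by sizing each GPU's sub-graph in proportion to a throughput estimate; this is a hypothetical simplification, whereas the real nu GNN combines hardware, model, and dataset information:

```python
def partition_sizes(n_nodes, gpu_tflops):
    """Assign each GPU a node count proportional to its estimated throughput."""
    total = sum(gpu_tflops)
    sizes = [int(n_nodes * t / total) for t in gpu_tflops]
    sizes[-1] += n_nodes - sum(sizes)          # absorb the rounding remainder
    return sizes

# Mixed cluster: one A100-class, two V100-class, one T4-class card (assumed figures).
print(partition_sizes(1_000_000, [312, 125, 125, 65]))
```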
As the scale of distributed training for deep neural networks (DNNs) increases, communication has become a critical performance bottleneck in data center networks. In-network Aggregation (INA) can accelerate aggregating...
Proprioception tells the brain the state of the body based on distributed sensory neurons. Yet, the principles that govern proprioceptive processing are poorly understood. Here, we employ a task-driven modeling approach to investigate the neural code of proprioceptive neurons in the cuneate nucleus (CN) and somatosensory cortex area 2 (S1). We simulated muscle spindle signals through musculoskeletal modeling and generated a large-scale movement repertoire to train neural networks based on 16 hypotheses, each representing a different computational goal. We found that the emerging, task-optimized internal representations generalize from synthetic data to predict neural dynamics in the CN and S1 of primates. Computational tasks that aim to predict limb position and velocity were the best at predicting the neural activity in both areas. Since task optimization develops representations that better predict neural activity during active than passive movements, we postulate that neural activity in the CN and S1 is top-down modulated during goal-directed movements.
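A sketch of the task-driven idea under stated assumptions: a network trained on synthetic muscle-spindle inputs to regress limb position and velocity (the paper's 16 hypothesis tasks and its musculoskeletal model are not reproduced, and the dimensions below are placeholders):

```python
import torch
import torch.nn as nn

n_spindles = 39                                 # assumed number of modeled muscles
net = nn.Sequential(nn.Linear(n_spindles, 128), nn.ReLU(),
                    nn.Linear(128, 4))          # outputs: 2-D position + 2-D velocity
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

spindle = torch.randn(256, n_spindles)          # placeholder synthetic spindle rates
target = torch.randn(256, 4)                    # placeholder kinematic labels
loss = nn.functional.mse_loss(net(spindle), target)
opt.zero_grad()
loss.backward()
opt.step()
# The learned hidden representations would then be compared against CN/S1 activity.
```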
ISBN (print): 9798350387117; 9798350387124
Recently, Temporal Graph neural networks (TGNNs) have demonstrated state-of-the-art performance in various high-impact applications, including fraud detection and content recommendation. Despite the success of TGNNs, they are prone to the noise prevalent in real-world dynamic graphs, such as time-deprecated links and skewed interaction distributions. This noise causes two critical issues that significantly compromise the accuracy of TGNNs: (1) models are supervised by inferior interactions, and (2) noisy input induces high variance in the aggregated messages. However, current TGNN denoising techniques do not consider the diverse and dynamic noise pattern of each node. In addition, they suffer from excessive mini-batch generation overheads caused by traversing more neighbors. We believe the remedy for fast and accurate TGNNs lies in temporal adaptive sampling. In this work, we propose TASER, the first adaptive sampling method for TGNNs optimized for accuracy, efficiency, and scalability. TASER adapts its mini-batch selection based on training dynamics, and its temporal neighbor selection based on the contextual, structural, and temporal properties of past interactions. To alleviate the bottleneck in mini-batch generation, TASER implements a pure GPU-based temporal neighbor finder and a dedicated GPU feature cache. We evaluate the performance of TASER using two state-of-the-art backbone TGNNs. On five popular datasets, TASER outperforms the corresponding baselines by an average of 2.3% in Mean Reciprocal Rank (MRR) while achieving an average 5.1x speedup in training time.
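An illustrative fragment of adaptive temporal-neighbor sampling, assuming PyTorch; this is a simplified stand-in for TASER's sampler, and the scoring inputs are assumptions:

```python
import torch

def sample_neighbors(scores, k=10):
    """scores: (n_neighbors,) unnormalized relevance derived from contextual,
    structural, and temporal features; returns k sampled neighbor indices."""
    probs = torch.softmax(scores, dim=0)
    return torch.multinomial(probs, num_samples=min(k, scores.numel()),
                             replacement=False)

idx = sample_neighbors(torch.randn(50), k=10)   # 10 of 50 candidate neighbors
```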