Wireless-based positioning with large antenna arrays is a promising enabler of the high accuracy positioning services envisioned for 6G. These systems provide high spatial resolution due to the large number of antenna...
Accurate forecasting of buildings' energy demand is essential for building operators to manage loads and resources efficiently, and for grid operators to balance local production with demand. However, current models still struggle to capture nonlinear relationships influenced by external factors such as weather and consumer behavior, assume constant variance in energy data over time, and often fail to model sequential data. To address these limitations, we propose a hybrid Transformer-based model with Liquid neural networks and learnable encodings for building energy forecasting. The model leverages dense layers to learn non-linear mappings that produce embeddings capturing the underlying patterns in energy time series. Additionally, a convolutional neural network encoder is integrated to enhance the model's ability to understand temporal dynamics through spatial mappings. To address the limitations of classic attention mechanisms, we implement a reservoir processing module using Liquid neural networks, which introduces a controlled non-linearity through dynamic reservoir computing and enables the model to capture complex patterns in the data. For model evaluation, we used both pilot data and state-of-the-art datasets to determine the model's performance across various building contexts, including large apartment and commercial buildings and small households, with and without on-site energy production. The proposed Transformer model demonstrates good predictive accuracy and training-time efficiency across building types and testing configurations. Specifically, SMAPE scores indicate a reduction in prediction error, with improvements ranging from 1.5% to 50% over basic Transformer, LSTM, and ANN models, while higher R2 values further confirm the model's reliability in capturing the variance of energy time series. The 8% improvement in training time over the basic Transformer model highlights the hybrid model's computational efficiency without compromising accuracy.
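A minimal sketch of the kind of hybrid architecture this abstract describes, assuming PyTorch; the layer sizes, the leaky-reservoir stand-in for the Liquid network, and all names are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class ReservoirModule(nn.Module):
    """Simplified reservoir layer: fixed random recurrent weights and a
    leaky-integrator state update (a stand-in for Liquid neural networks)."""
    def __init__(self, d_model, leak=0.3):
        super().__init__()
        self.leak = leak
        self.w_in = nn.Parameter(torch.randn(d_model, d_model) * 0.1, requires_grad=False)
        self.w_res = nn.Parameter(torch.randn(d_model, d_model) * 0.1, requires_grad=False)

    def forward(self, x):                      # x: (batch, seq, d_model)
        h = torch.zeros(x.size(0), x.size(2), device=x.device)
        states = []
        for t in range(x.size(1)):
            pre = x[:, t] @ self.w_in + h @ self.w_res
            h = (1 - self.leak) * h + self.leak * torch.tanh(pre)  # controlled non-linearity
            states.append(h)
        return torch.stack(states, dim=1)

class HybridForecaster(nn.Module):
    def __init__(self, n_features=1, d_model=64, horizon=24):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(n_features, d_model), nn.GELU())  # learnable encoding
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)      # CNN encoder
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.reservoir = ReservoirModule(d_model)
        self.head = nn.Linear(d_model, horizon)

    def forward(self, x):                      # x: (batch, seq, n_features)
        z = self.embed(x)
        z = self.conv(z.transpose(1, 2)).transpose(1, 2)
        z = self.transformer(z)
        z = self.reservoir(z)
        return self.head(z[:, -1])             # forecast the next `horizon` steps

y_hat = HybridForecaster()(torch.randn(8, 96, 1))  # 8 series, 96 past steps -> (8, 24)
```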
The past decade has witnessed great progress in automatic speech recognition (ASR) due to advances in deep learning. The improvements in performance can be attributed to both improved models and large-scale training data. The key to training such models is the employment of efficient distributed learning techniques. In this article, we provide an overview of distributed training techniques for deep neural network (DNN) acoustic models used for ASR. Starting with the fundamentals of data-parallel stochastic gradient descent (SGD) and ASR acoustic modeling, we investigate various distributed training strategies and their realizations in high-performance computing (HPC) environments, with an emphasis on striking a balance between communication and computation. Experiments are carried out on a popular public benchmark to study the convergence, speedup, and recognition performance of the investigated strategies.
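A bare-bones sketch of the data-parallel SGD pattern the article starts from, assuming PyTorch; it is not tied to the article's HPC experiments, and `train_step` is a hypothetical helper:

```python
import torch
import torch.distributed as dist
import torch.nn as nn

def train_step(model, batch, targets, optimizer, world_size):
    # Assumes dist.init_process_group() has already been called on every rank.
    loss = nn.functional.cross_entropy(model(batch), targets)
    optimizer.zero_grad()
    loss.backward()
    for p in model.parameters():               # average gradients across ranks
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
        p.grad /= world_size
    optimizer.step()                           # identical update on every rank
    return loss.item()
```

The communication/computation balance the article emphasizes shows up here directly: every all-reduce is network traffic, which is why production systems overlap it with the backward pass rather than looping over parameters afterwards.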
Deep learning, with increasingly large datasets and complex neural networks, is widely used in computer vision and natural language processing. A resulting trend is to split and train large-scale neural network models across multiple devices in parallel, known as parallel model training. Existing parallel methods are mainly based on expert design, which is inefficient and requires specialized knowledge. Although automatically implemented parallel methods have been proposed to solve these problems, they consider only a single optimization aspect: run time. In this paper, we present Trinity, an adaptive distributed parallel training method based on reinforcement learning, to automate the search and tuning of parallel strategies. We build a multidimensional performance evaluation model and use proximal policy optimization to co-optimize multiple optimization aspects. Our experiments use the CIFAR-10 and PTB datasets with the InceptionV3, NMT, NASNet, and PNASNet models. Compared with Google's Hierarchical method, Trinity achieves up to 5% reductions in runtime, communication, and memory overhead, and up to a 40% increase in parallel-strategy search speed.
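One way to picture the "multidimensional performance evaluation" feeding a PPO agent is a scalar reward over the three aspects Trinity co-optimizes; the weights and normalization below are illustrative assumptions, not the paper's formula:

```python
def strategy_reward(runtime_s, comm_bytes, mem_bytes,
                    w_time=0.5, w_comm=0.3, w_mem=0.2):
    """Fold runtime, communication, and memory overhead into one scalar
    that a PPO agent can maximize; lower cost in any dimension raises it."""
    cost = (w_time * runtime_s +
            w_comm * comm_bytes / 1e9 +        # normalize to GB
            w_mem * mem_bytes / 1e9)
    return -cost

print(strategy_reward(runtime_s=12.4, comm_bytes=3.2e9, mem_bytes=8.0e9))
```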
The CPU is a powerful, pervasive, and indispensable platform for running deep learning (DL) workloads in systems ranging from mobile devices to extreme-end servers. In this article, we present a survey of techniques for optimizing DL applications on CPUs. We include methods proposed for both inference and training, and those offered in the context of mobile, desktop/server, and distributed systems. We identify the areas of strength and weakness of CPUs in the field of DL. This article will interest practitioners and researchers in the areas of artificial intelligence, computer architecture, mobile systems, and parallel computing.
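As a concrete taste of the class of CPU-side knobs such surveys cover, here are two standard PyTorch calls, controlling intra-op thread parallelism and skipping autograd bookkeeping at inference time; the thread count is an assumption you would tune to your cores:

```python
import torch

torch.set_num_threads(8)          # match physical cores to avoid oversubscription
model = torch.nn.Linear(512, 512).eval()
with torch.inference_mode():      # disables gradient tracking for faster inference
    out = model(torch.randn(32, 512))
```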
Distributed acoustic sensing (DAS) is an optoelectronic technology that utilizes fibre optic cables to detect disturbances caused by seismic waves. Using DAS, seismologists can monitor geophysical phenomena at high spatial and temporal resolutions over long distances in inhospitable environments. Field experiments using DAS are typically associated with large volumes of observations, requiring algorithms for efficient processing and monitoring. In this study, we present a supervised classifier trained to distinguish seismic activity from other sources of hydroacoustic energy. Our classifier is based on a 2-D convolutional neural network architecture. The 55-km-long ocean-bottom fibre optic cable, located off Cape Muroto in southwestern Japan, was interrogated using DAS, and data were collected during two different monitoring periods. Optimization of the model's hyperparameters using Gaussian process regression was necessary to prevent issues associated with the small size of the training data. Using a test set of 100 labeled images, the highest-performing model achieved an average classification accuracy of 92 per cent, correctly classifying 100 per cent of instances in the geophysical class, 80 per cent in the non-geophysical class, and 96 per cent in the ambient-noise class. This performance demonstrates the model's effectiveness in distinguishing between geophysical data, various sources of hydroacoustic energy, and ambient noise.
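A minimal 2-D CNN of the kind the abstract describes, assuming PyTorch; the layer sizes are assumptions, and the study's actual architecture and Gaussian-process tuning are not reproduced here:

```python
import torch
import torch.nn as nn

class DASClassifier(nn.Module):
    """Classifies DAS time-channel images into three classes:
    geophysical, non-geophysical, and ambient noise."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes))

    def forward(self, x):          # x: (batch, 1, H, W) single-channel images
        return self.head(self.features(x))

logits = DASClassifier()(torch.randn(4, 1, 128, 128))   # -> (4, 3)
```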
Graph neural networks (GNNs) are well suited to GPUs with high computing capability because of their massive arithmetic operations. Compared with mini-batch training, full-graph training does not require sampling of the input graph and halo region, avoiding potential accuracy losses. Current deep learning frameworks evenly partition large graphs to scale GNN training to distributed multi-GPU platforms. On the other hand, the rapid evolution of hardware requires technology companies and research institutions to frequently update their equipment to cope with the latest tasks. This results in large-scale clusters with a mixture of GPUs of various computational capabilities and hardware specifications. Existing works, however, fail to adapt sub-graphs to different GPU generations, leading to inefficient resource utilization and degraded training efficiency. Therefore, we propose nu GNN, a Non-Uniformly partitioned full-graph GNN training framework for heterogeneous distributed platforms. nu GNN first models the GNN processing ability of the hardware based on various theoretical parameters. It then automatically derives a reasonable task-partitioning scheme by combining hardware, model, and graph-dataset information. Finally, nu GNN implements an irregular graph-partitioning mechanism that allows GNN training tasks to execute efficiently on distributed heterogeneous systems. Experimental results show that in real-world scenarios with a mixture of GPU generations, nu GNN outperforms other static partitioning schemes based on hardware specifications.
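The core idea of non-uniform partitioning can be illustrated by sizing each GPU's sub-graph in proportion to a throughput estimate; this is a hypothetical simplification, whereas the real nu GNN combines hardware, model, and dataset information:

```python
def partition_sizes(n_nodes, gpu_tflops):
    """Assign each GPU a node count proportional to its estimated throughput."""
    total = sum(gpu_tflops)
    sizes = [int(n_nodes * t / total) for t in gpu_tflops]
    sizes[-1] += n_nodes - sum(sizes)          # absorb the rounding remainder
    return sizes

# Mixed cluster: one A100-class, two V100-class, one T4-class card (assumed figures).
print(partition_sizes(1_000_000, [312, 125, 125, 65]))
```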
As the scale of distributed training for deep neural networks (DNNs) increases, communication has become a critical performance bottleneck in data center networks. In-network Aggregation (INA) can accelerate aggregating...
Proprioception tells the brain the state of the body based on distributed sensory neurons. Yet, the principles that govern proprioceptive processing are poorly understood. Here, we employ a task-driven modeling approach to investigate the neural code of proprioceptive neurons in the cuneate nucleus (CN) and somatosensory cortex area 2 (S1). We simulated muscle spindle signals through musculoskeletal modeling and generated a large-scale movement repertoire to train neural networks based on 16 hypotheses, each representing a different computational goal. We found that the emerging, task-optimized internal representations generalize from synthetic data to predict neural dynamics in the CN and S1 of primates. Computational tasks that aim to predict limb position and velocity were the best at predicting the neural activity in both areas. Since task optimization develops representations that better predict neural activity during active than passive movements, we postulate that neural activity in the CN and S1 is top-down modulated during goal-directed movements.
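A sketch of the task-driven idea under stated assumptions: a network trained on synthetic muscle-spindle inputs to regress limb position and velocity (the paper's 16 hypothesis tasks and its musculoskeletal model are not reproduced, and the dimensions below are placeholders):

```python
import torch
import torch.nn as nn

n_spindles = 39                                 # assumed number of modeled muscles
net = nn.Sequential(nn.Linear(n_spindles, 128), nn.ReLU(),
                    nn.Linear(128, 4))          # outputs: 2-D position + 2-D velocity
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

spindle = torch.randn(256, n_spindles)          # placeholder synthetic spindle rates
target = torch.randn(256, 4)                    # placeholder kinematic labels
loss = nn.functional.mse_loss(net(spindle), target)
opt.zero_grad()
loss.backward()
opt.step()
# The learned hidden representations would then be compared against CN/S1 activity.
```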
ISBN (print): 9798350387117; 9798350387124
Recently, Temporal Graph neural networks (TGNNs) have demonstrated state-of-the-art performance in various high-impact applications, including fraud detection and content recommendation. Despite the success of TGNNs, they are prone to the noise prevalent in real-world dynamic graphs, such as time-deprecated links and skewed interaction distributions. This noise causes two critical issues that significantly compromise the accuracy of TGNNs: (1) models are supervised by inferior interactions, and (2) noisy input induces high variance in the aggregated messages. However, current TGNN denoising techniques do not consider the diverse and dynamic noise pattern of each node. In addition, they suffer from excessive mini-batch generation overheads caused by traversing more neighbors. We believe the remedy for fast and accurate TGNNs lies in temporal adaptive sampling. In this work, we propose TASER, the first adaptive sampling method for TGNNs optimized for accuracy, efficiency, and scalability. TASER adapts its mini-batch selection based on training dynamics, and its temporal neighbor selection based on the contextual, structural, and temporal properties of past interactions. To alleviate the bottleneck in mini-batch generation, TASER implements a pure GPU-based temporal neighbor finder and a dedicated GPU feature cache. We evaluate the performance of TASER using two state-of-the-art backbone TGNNs. On five popular datasets, TASER outperforms the corresponding baselines by an average of 2.3% in Mean Reciprocal Rank (MRR) while achieving an average 5.1x speedup in training time.
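An illustrative fragment of adaptive temporal-neighbor sampling, assuming PyTorch; this is a simplified stand-in for TASER's sampler, and the scoring inputs are assumptions:

```python
import torch

def sample_neighbors(scores, k=10):
    """scores: (n_neighbors,) unnormalized relevance derived from contextual,
    structural, and temporal features; returns k sampled neighbor indices."""
    probs = torch.softmax(scores, dim=0)
    return torch.multinomial(probs, num_samples=min(k, scores.numel()),
                             replacement=False)

idx = sample_neighbors(torch.randn(50), k=10)   # 10 of 50 candidate neighbors
```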