Search Results - Inner Mongolia University Library

A Survey on Graph Neural Network Acceleration: A Hardware Perspective

Chinese Journal of Electronics, 2024, Vol. 33, No. 3, pp. 601-622

Authors: Shi CHEN, Jingyu LIU, Li SHEN. Affiliations: School of Computer, National University of Defense Technology; Key Laboratory of Advanced Microprocessor Chips and Systems

Graph neural networks (GNNs) have emerged as powerful approaches to learn knowledge about graphs and *** rapid employment of GNNs poses requirements for processing *** to incompatibility of general platforms, dedicated hardware devices and platforms are developed to efficiently accelerate training and inference of *** conduct a survey on hardware acceleration for *** first include and introduce recent advances of the domain, and then provide a methodology of categorization to classify existing works into three ***, we discuss optimization techniques adopted at different *** finally we propose suggestions on future directions to facilitate further works.

Keywords: Graph neural networks; Deep learning acceleration; Domain-specific architecture; Hardware accelerator

A comprehensive survey on graph neural network accelerators

Frontiers of Computer Science, 2025, Vol. 19, No. 2, pp. 11-29

Authors: Jingyu LIU, Shi CHEN, Li SHEN. Affiliations: School of Computer, National University of Defense Technology, Changsha 410073, China; Key Laboratory of Advanced Microprocessor Chips and Systems, Changsha 410073, China

Deep learning has gained superior accuracy on Euclidean structure data in neural *** a result, non-Euclidean structure data, such as graph data, has more sophisticated structural information, which can be applied in neural networks as well to address more complex and practical ***, actual graph data obeys a power-law distribution, so the adjacency matrix of a graph is random and *** processing accelerator (GPA) is designed to handle the problems ***, graph computing only processes 1-dimensional *** graph neural networks (GNNs), graph data is ***, GNNs include the execution processes of both traditional graph processing and neural network, which have irregular memory access and regular computation, *** obtain more information in graph data and require better model generalization ability, the layers of GNN are deeper, so the overhead of memory access and computation is *** present, GNN accelerators are designed to deal with this *** this paper, we conduct a systematic survey regarding the design and implementation of GNN ***, we review the challenges faced by GNN accelerators, and existing related works in detail to process ***, we evaluate previous works and propose future directions in this booming field.

Keywords: graph neural network accelerators; graph convolutional networks; design space exploration; deep learning; domain-specific architecture

Decomposition-based learning in drone-assisted wireless-powered mobile edge computing networks

Digital Communications and Networks, 2024, Vol. 10, No. 6, pp. 1769-1781

Authors: Xiaoyi Zhou, Liang Huang, Tong Ye, Weiqiang Sun. Affiliations: State Key Laboratory of Advanced Optical Communication Systems and Networks, Shanghai Jiao Tong University, Shanghai 200240, China; College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310058, China

This paper investigates the multi-Unmanned Aerial Vehicle (UAV)-assisted wireless-powered Mobile Edge Computing (MEC) system, where UAVs provide computation and powering services to mobile *** aim to maximize the number of completed computation tasks by jointly optimizing the offloading decisions of all terminals and the trajectory planning of all *** action space of the system is extremely large and grows exponentially with the number of *** this case, single-agent learning will require an overlarge neural network, resulting in insufficient ***, the offloading decisions and trajectory planning are two subproblems performed by different executants, providing an opportunity for *** thus adopt the idea of decomposition and propose a 2-Tiered Multi-agent Soft Actor-Critic (2T-MSAC) algorithm, decomposing a single neural network into multiple small-scale *** the first tier, a single agent is used for offloading decisions, and an online pretrained model based on imitation learning is specially designed to accelerate the training process of this *** the second tier, UAVs utilize multiple agents to plan their *** agent exerts its influence on the parameter update of other agents through actions and rewards, thereby achieving joint *** results demonstrate that the proposed algorithm can be applied to scenarios with various location distributions of terminals, outperforming existing benchmarks that perform well only in specific *** particular, 2T-MSAC increases the number of completed tasks by 45.5% in the scenario with uneven terminal ***, the pretrained model based on imitation learning reduces the convergence time of 2T-MSAC by 58.2%.

Keywords: Mobile-edge computing; Multi-agent reinforcement learning; Offloading decision; Trajectory planning; Unmanned aerial vehicle; Wireless power transfer

Multi-Scale Time Series Segmentation Network Based on Eddy Current Testing for Detecting Surface Metal Defects

IEEE/CAA Journal of Automatica Sinica, 2025, Vol. 12, No. 3, pp. 528-538

Authors: Xiaorui Li, Xiaojuan Ban, Haoran Qiao, Zhaolin Yuan, Hong-Ning Dai, Chao Yao, Yu Guo, Mohammad S. Obaidat, George Q. Huang. Affiliations: the School of Intelligence Science and Technology University of Science and Technology Beijing the Beijing Advanced Innovation Center for Materials Genome Engineering the Key Laboratory of Intelligent Bionic Unmanned Systems and the Institute of Materials Intelligent Technology Liaoning Academy of Materials IEEE the Department of Computer Science Hong Kong Baptist University the School of Computer and Communication Engineering Key Laboratory of Advanced Materials and Devices for Post-Moore Chips Ministry of Education University of Science and Technology Beijing the Beijing Advanced Innovation Center for Materials Genome Engineering University of Science and Technology Beijing the School of Computer and Communication Engineering University of Science and Technology Beijing the King Abdullah Ⅱ School of Information Technology The University of Jordan the Department of Computational Intelligence the School of Computing SRM University the School of Engineering The Amity University The Hong Kong Polytechnic University

In high-risk industrial environments like nuclear power plants, precise defect identification and localization are essential for maintaining production stability and safety. However, the complexity of such a harsh environment leads to significant variations in the shape and size of the defects. To address this challenge, we propose the multivariate time series segmentation network (MSSN), which adopts a multiscale convolutional network with multi-stage and depth-separable convolutions for efficient feature extraction through variable-length templates. To tackle the classification difficulty caused by structural signal variance, MSSN employs logarithmic normalization to adjust instance distributions. Furthermore, it integrates classification with smoothing loss functions to accurately identify defect segments amid similar structural and defect signal subsequences. Our algorithm, evaluated on both the Mackey-Glass dataset and an industrial dataset, achieves over 95% localization and demonstrates the capture capability on the synthetic dataset. In a nuclear plant's heat transfer tube dataset, it captures 90% of defect instances with a 75% middle localization F1 score.

Keywords: Eddy current testing; nondestructive testing; semantic segmentation; time series analysis

Effect of Graphite Addition on Microstructure, Mechanical Properties and Thermal Properties of Injection Molded AZ91D Alloy

Materials Transactions, 2024, Vol. 65, No. 4, pp. 374-380

Authors: Hideshima, Yasutoshi; Maeda, Fumiya; Fukuta, Tadao; Ozaki, Koichi. Affiliations: Course of Advanced Systems Engineering, Graduate School of Computer Science and Systems Engineering, Okayama Prefectural University, Soja 719-1197, Japan; Process Technology Development Department, Technology Development Division, Seiko Epson Corp., Suwa 399-0211, Japan; Department of Systems Engineering, Faculty of Computer Science and Systems Engineering, Okayama Prefectural University, Soja 719-1197, Japan

Magnesium chips were coated with a high concentration of graphite using a binder and were used as the raw material for injection molding. The microstructure of the magnesium injection-molded product with added graphite exhibited a dispersion of needle-like graphite particles. No significant voids were observed at the interfaces between the graphite and the matrix. The addition of more than 0.5 mass% graphite decreased the proof stress and tensile strength of the injection-molded products. The Young’s modulus of the graphite-added products tended to decrease with an increase in the graphite content, which is consistent with the lower limit of the rule of mixtures. The thermal conductivity of the 6.9 mass% graphite-added product increased compared with that of the AZ91D magnesium alloy and the coefficient of linear thermal expansion decreased. Both values are within a range that satisfied the rules of mixtures. © 2024 The Japan Institute of Metals and Materials.

Keywords: Thermal expansion

Dalea:A Persistent Multi-Level Extendible Hashing with Improved Tail Performance

Journal of Computer Science & Technology, 2023, Vol. 38, No. 5, pp. 1051-1073

Authors: 熊子威, 蒋德钧, 熊劲, Ren Ren. Affiliations: Center for Advanced Computer Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 101408, China; Huawei Technology Co. Ltd., Shanghai 201206, China

Persistent memory (PM) promises byte-addressability, large capacity, and *** memory systems, such as key-value stores and in-memory databases, benefit from such features of *** to the great popularity of hashing index in main memory systems, a number of research efforts are made to provide high average performance persistent ***, suboptimal tail performance in terms of tail throughput and tail latency is still observed for existing persistent *** this paper, we analyze major sources of suboptimal tail performance from key design issues of persistent *** identify the global hash structure and concurrency control as remaining explorable design spaces for improving tail *** propose Directory-sharing Multi-level Extendible Hashing (Dalea) for *** designs ancestor link-based extendible hashing as well as fine-grained transient lock to address the two main sources (rehashing and locking) affecting tail *** evaluation results show that, compared with state-of-the-art persistent hashing Dash, Dalea achieves increased tail throughput by 4.1x and reduced tail latency by ***, in order to provide design guidelines for improving tail performance, we adopt Dalea as a testbed to identify different impacts of four factors on tail performance, including fine-grained rehashing, transient locking, memory pre-allocation, and fingerprinting.

Keywords: persistent memory; persistent hashing; indexing structure

SSC: An SRAM-Based Silence Computing Design for On-chip Memory

24th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2024

Authors: Chen, Ziming; Deng, Quan; Hu, Yiyue; He, Xiaowei; Huang, Libo; Wang, Yongwen. Affiliations: College of Computer, National University of Defense Technology, Changsha, China; Key Laboratory of Advanced Microprocessor Chips and Systems, Changsha, China

ISBN: (print) 9789819615445

The rapid development of emerging intelligent applications leads to a surge in computational demands and memory capacity requirements. Compute-in-memory (CIM) is a promising paradigm to alleviate the data movement bottleneck of emerging intelligent applications. SRAM-based CIM technology is employed to enhance the performance of general-purpose processors (e.g., CPU) by reusing on-chip memory (e.g., cache) for computational tasks. In recent SRAM-based CIM works, peripheral compute circuits, e.g., adder trees, are introduced to improve system performance, as SRAM cells are hard to support arithmetic operations efficiently. However, the excessive peripheral circuit with a large area overhead degrades the memory density, which breaks the balance of computation and memory. To improve the computing capability and memory capability simultaneously of digital-based CIM (DCIM), we propose SSC, an SRAM-based silence computing CIM design, which leverages logic-in-memory operations and peripheral circuits to achieve parallel computing within the SRAM array. We propose 8+T (8T, 9T, and 11T) SRAM bitcells to support majority-of-three (MAJ3), COPY, and NOT logic, which is used to achieve a 3:2 compressor. Furthermore, a silence computing design, facilitated by a local connection between SRAM cells, is proposed to improve system parallelism, which supports parallel logic-in-memory operations and memory access. Our experiment results show that SSC achieves 3.7× memory density improvement compared to prior DCIMs normalized to 12 nm. Additionally, the SWaP (TOPS/W×Kb/mm²) figure-of-merit emphasizes the importance of memory density and energy efficiency, showing that this work achieves a higher SWaP of 119.89, which is 1.84× more than prior DCIMs normalized to 12 nm. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.

Keywords: computer circuits
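
The abstract above mentions MAJ3 and NOT logic being used to achieve a 3:2 compressor. The following is a minimal software sketch of that Boolean identity only; it does not model the paper's bitcells, peripheral circuits, or timing, none of which the abstract specifies. The names `maj3` and `compress_3_2` and the `width` parameter are hypothetical, chosen for illustration.

```python
def maj3(a: int, b: int, c: int) -> int:
    """Bitwise majority-of-three: each bit position votes independently."""
    return (a & b) | (a & c) | (b & c)


def compress_3_2(a: int, b: int, c: int, width: int = 8) -> tuple[int, int]:
    """Reduce three width-bit operands to a (sum, carry) pair using only MAJ3 and NOT.

    Assumes a, b, and c each fit in `width` bits.
    The pair satisfies: a + b + c == sum_bits + 2 * carry_bits.
    """
    mask = (1 << width) - 1          # keeps NOT results inside the word width
    carry_bits = maj3(a, b, c)       # full-adder carry is the majority of the inputs
    not_carry = ~carry_bits & mask
    not_c = ~c & mask
    # Full-adder sum expressed with majority gates:
    # sum = MAJ3(NOT carry, MAJ3(a, b, NOT c), c), which equals a XOR b XOR c.
    sum_bits = maj3(not_carry, maj3(a, b, not_c), c)
    return sum_bits, carry_bits
```

Because each bit position is handled independently, one call compresses every column of the three operands at once, which is the sense in which a 3:2 compressor defers carry propagation to a later addition.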

Reconfiguration of vertex-disjoint shortest paths on graphs

Journal of Graph Algorithms and Applications, 2024, Vol. 28, No. 3, pp. 87-101

Authors: Saito, Rin; Eto, Hiroshi; Ito, Takehiro; Uehara, Ryuhei. Affiliations: Graduate School of Information Sciences, Tohoku University, Sendai, Japan; School of Computer Science and Systems Engineering, Kyushu Institute of Technology, Iizuka, Japan; School of Information Science, Japan Advanced Institute of Science and Technology, Nomi, Japan

We introduce and study reconfiguration problems for (internally) vertex-disjoint shortest paths: Given two tuples of internally vertex-disjoint shortest paths for fixed terminal pairs in an unweighted graph, we are asked to determine whether one tuple can be transformed into the other by exchanging a single vertex of one shortest path in the tuple at a time, so that all intermediate results remain tuples of internally vertex-disjoint shortest paths. We also study the shortest variant of the problem, that is, we wish to minimize the number of vertex-exchange steps required for such a transformation, if one exists. These problems generalize the well-studied Shortest Path Reconfiguration problem. In this paper, we analyze the complexity of these problems from the viewpoint of graph classes, and give some interesting contrast. © 2024, Brown University. All rights reserved.

Keywords: Graph theory
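
The reconfiguration rule in the abstract above (exchange a single vertex of one shortest path at a time, keeping every intermediate tuple internally vertex-disjoint) can be illustrated with a small validity checker. This is a sketch under stated assumptions, not the paper's algorithm: it only checks one step, and it reads "internally vertex-disjoint" as "no internal vertex of one path appears anywhere on another path". The graph is an adjacency dict, and all function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source in an unweighted graph given as an adjacency dict."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_shortest_path(adj, path):
    """True if path follows edges and its hop count equals the BFS distance between its ends."""
    if any(v not in adj[u] for u, v in zip(path, path[1:])):
        return False
    return bfs_distances(adj, path[0]).get(path[-1]) == len(path) - 1


def is_valid_tuple(adj, paths):
    """True if every path is shortest and no internal vertex lies on another path."""
    if not all(is_shortest_path(adj, p) for p in paths):
        return False
    for i, p in enumerate(paths):
        internal = set(p[1:-1])
        for j, q in enumerate(paths):
            if i != j and internal & set(q):
                return False
    return True


def is_valid_step(adj, before, after):
    """True if `after` differs from `before` in exactly one vertex of one path,
    terminals are unchanged, and both tuples are valid."""
    if len(before) != len(after):
        return False
    changed = 0
    for p, q in zip(before, after):
        if len(p) != len(q) or p[0] != q[0] or p[-1] != q[-1]:
            return False
        changed += sum(1 for x, y in zip(p, q) if x != y)
    return changed == 1 and is_valid_tuple(adj, before) and is_valid_tuple(adj, after)
```

Deciding reachability between two tuples would require searching over sequences of such steps; the checker above covers only the per-step condition.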

Critical Path Optimization for Logic Netlists

9th International Conference on Integrated Circuits and Microsystems, ICICM 2024

Authors: Yu, Xuewen; Huang, Pengcheng; Chen, Haiyan; Chen, Wei. Affiliations: School of Computer, National University of Defense Technology, Key Laboratory of Advanced Microprocessors Chips and Systems, Changsha, China; School of Computer, National University of Defense Technology, Department of Computer Science, Changsha, China

ISBN: (print) 9798331509453

It is of great concern for high performance microprocessors to optimize logic delay and improve performance in the semi-custom design flow based on a commercial standard cell library. To solve this issue, the paper proposes an automated method to optimize the delay of the critical path in any block. Firstly, we construct the logic cone for critical paths based on the logic netlist obtained from the semi-custom design flow. Secondly, the algorithm conducts large-scale graph partitioning on the logic cone with appropriate constraints. Thirdly, using two-level optimization, we perform logic restructuring on the subgraphs resulting from partitioning to get a set of new logic cells. Finally, employing a match algorithm, we pick out highly reusable restructured logic cells for full-custom circuit implementation and replace corresponding circuit structures with them in the logic netlist. The proposed algorithm picks out 5 logic cells for restructuring. Experimental results demonstrate that the approach achieves a reduction in the logic levels and that logic delay can be reduced by about 10% after replacing the corresponding structures in the logic netlist. The work effectively reduces critical path delay, which has the potential to enhance the performance of microprocessors. © 2024 IEEE.

Keywords: computer circuits

Hybrid Deadlock Recovery Algorithm for Irregular NoC in Multi-Chiplet Systems

22nd IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2024

Authors: Chen, Zhiqiang; Wang, Yongwen; Zhou, Hongwei. Affiliations: National University of Defense Technology, College of Computer Science and Technology, Changsha, China; Key Laboratory of Advanced Microprocessor Chips and Systems, Changsha, China

ISBN: (print) 9798331509712

Dividing a single System-on-Chip (SoC) into multiple chiplets and connecting them using 2.5D packaging technology is becoming a widely adopted approach to enhance chip scale and performance. However, when multiple chiplets are integrated to form a chiplet-based system, the Network-on-Chip (NoC) that interconnects these chiplets can be susceptible to deadlock. Additionally, modularity is a specific concern, as it involves integrating chiplets of different functions, sizes, manufacturing processes, and so on. However, physical layout constraints and potential vertical link failures may result in irregular topologies, which further complicates the design of both fault-tolerant and load-balanced routing algorithms. To tackle these challenges, a Hybrid Deadlock Recovery algorithm for Irregular NoC (HDRI) in multi-chiplet systems is proposed. This algorithm shows adaptability in irregular topologies, additionally exhibiting fault-tolerance capabilities. HDRI features two modes that can dynamically adapt to the network status. Upon detecting a deadlock, it recovers from the deadlock through the coordination of inter-chiplet packets. To be specific, HDRI seeks to break the deadlock by either forwarding or recycling the blocked packets, which respectively correspond to low and high load modes. Experimental results demonstrate that HDRI offers a 7.5% reduction in latency and an area overhead of less than 1.1%. © 2024 IEEE.

Keywords: Network-on-chip
