Search Results - Inner Mongolia University Library

BGNN: Behavior-aware graph neural network for heterogeneous session-based recommendation

Frontiers of Computer Science, 2023, Vol. 17, No. 5, pp. 103-118

Authors: Jinwei LUO, Mingkai HE, Weike PAN, Zhong MING. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China; National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen 518060, China

Session-based recommendation (SBR) and multibehavior recommendation (MBR) are both important problems and have attracted the attention of many researchers and *** from SBR that solely uses one single type of behavior sequences and MBR that neglects sequential dynamics, heterogeneous SBR (HSBR) that exploits different types of behavioral information (e.g., examinations like clicks or browses, purchases, adds-to-carts and adds-to-favorites) in sequences is more consistent with real-world recommendation scenarios, but it is rarely *** efforts towards HSBR focus on distinguishing different types of behaviors or exploiting homogeneous behavior transitions in a sequence with the same type of ***, all the existing solutions for HSBR do not exploit the rich heterogeneous behavior transitions in an explicit way and thus may fail to capture the semantic relations between different types of ***, all the existing solutions for HSBR do not model the rich heterogeneous behavior transitions in the form of graphs and thus may fail to capture the semantic relations between different types of *** limitation hinders the development of HSBR and results in unsatisfactory *** a response, we propose a novel behavior-aware graph neural network (BGNN) for *** BGNN adopts a dual-channel learning strategy for differentiated modeling of two different types of behavior sequences in a ***, our BGNN integrates the information of both homogeneous behavior transitions and heterogeneous behavior transitions in a unified *** then conduct extensive empirical studies on three real-world datasets, and find that our BGNN outperforms the best baseline by 21.87%, 18.49%, and 37.16% on average correspondingly. A series of further experiments and visualization studies demonstrate the rationality and effectiveness of our *** exploratory study on extending our BGNN to handle more than two types of behaviors show that our BGNN can e

Keywords: session-based recommendation; graph neural network; heterogeneous behaviors

TIGA: Towards Efficient Near Data Processing in SmartNICs-based Disaggregated Memory Systems

61st ACM/IEEE Design Automation Conference, DAC 2024

Authors: Duan, Zhuohui; Yu, Zelin; Liu, Haikun; Liao, Xiaofei; Jin, Hai; Zheng, Shijie; Wu, Sihan. National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, Huazhong University of Science and Technology, Wuhan 430074, China

ISBN: (print) 9798400706011

Memory disaggregation, facilitated by Smart Network Interface Cards (SmartNICs), has emerged as a cost-effective approach for sharing memory resources in data centers. However, current SoC-based SmartNICs face several challenges for supporting near-data processing (NDP) in disaggregated memory (DM) systems effectively, such as inefficient resource allocation for SmartNICs employed in NDP, and the lack of collaboration between SmartNICs on data nodes and CPUs on compute nodes. To address these issues, we propose TIGA, an efficient NDP framework for SmartNICs-based disaggregated memory systems. We propose an adaptive resource allocator to fully utilize the SoC cores among NDP engines automatically, and a SmartNIC-CPU cooperative computing mechanism to schedule NDP tasks among CPUs and SmartNICs. We prototype TIGA with FPGAs and evaluate it with several typical workloads. Experimental results show that TIGA significantly improves the efficiency of NDP tasks in DM systems compared with state-of-the-art SmartNIC-based co-processing schemes. © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.

Keywords: Storage allocation (computer)

P3DC: Reducing DRAM Cache Hit Latency by Hybrid Mappings

Journal of Computer Science & Technology, 2024, Vol. 39, No. 6, pp. 1341-1360

Authors: Ye Chi, Ren-Tong Guo, Xiao-Fei Liao, Hai-Kun Liu, Jianhui Yue. National Engineering Research Center for Big Data Technology and System, Wuhan 430074, China; Services Computing Technology and System Laboratory, Wuhan 430074, China; Cluster and Grid Computing Laboratory, Wuhan 430074, China; School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; School of Big Data and Internet, Shenzhen Technology University, Shenzhen 518118, China; Department of Computer Science, Michigan Technological University, Houghton 49931-1295, U.S.A.

Die-stacked dynamic random access memory (DRAM) caches are increasingly advocated to bridge the performance gap between the on-chip cache and the main *** fully realize their potential, it is essential to improve DRAM cache hit rate and lower its cache hit *** order to take advantage of the high hit-rate of set-association and the low hit latency of direct-mapping at the same time, we propose a partial direct-mapped die-stacked DRAM cache called *** design is motivated by a key observation, i.e., applying a unified mapping policy to different types of blocks cannot achieve a high cache hit rate and low hit latency *** address this problem, P3DC classifies data blocks into leading blocks and following blocks, and places them at static positions and dynamic positions, respectively, in a unified set-associative *** also propose a replacement policy to balance the miss penalty and the temporal locality of different *** addition, P3DC provides a policy to mitigate cache thrashing due to block type *** results demonstrate that P3DC can reduce the cache hit latency by 20.5% while achieving a similar cache hit rate compared with typical set-associative caches. P3DC improves the instructions per cycle (IPC) by up to 66% (12% on average) compared with the state-of-the-art direct-mapped cache, BEAR, and by up to 19% (6% on average) compared with the tag-data decoupled set-associative cache, DEC-A8.

Keywords: die-stacked dynamic random access memory (DRAM) cache; set-associative; direct-mapped; hit latency

RTGA: A Redundancy-free Accelerator for High-Performance Temporal Graph Neural Network Inference

61st ACM/IEEE Design Automation Conference, DAC 2024

Authors: Yu, Hui; Zhang, Yu; Tan, Andong; Lu, Chenze; Zhao, Jin; Liao, Xiaofei; Jin, Hai; Liu, Haikun. National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, China

ISBN: (print) 9798400706011

Temporal Graph Neural Network (TGNN) has attracted much research attention because it can capture the dynamic nature of complex networks. However, existing solutions suffer from redundant computation overhead and excessive off-chip communications for TGNN inference because they often rely on redundant graph sampling and repeatedly fetching the features and vertex memory. This paper proposes a redundancy-free accelerator, RTGA, for high-performance TGNN inference. Specifically, RTGA proposes a redundancy-aware execution approach with temporal tree into a novel accelerator design to effectively eliminate unnecessary data processing for fewer redundant computations and off-chip communications and also designs a temporal-aware data caching method to improve data locality for TGNN. We have implemented and evaluated RTGA on a Xilinx Alveo U280 FPGA card. Compared with cutting-edge software solutions (i.e., TGN and TGL) and hardware solutions (i.e., BlockGNN and FlowGNN), RTGA improves the performance of TGNN inference by an average of 473.2x, 87.4x, 8.2x, and 6.9x and saves energy by 542.8x, 102.2x, 9.4x, and 8.3x, respectively. © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.

Keywords: Spatio-temporal data

High-Performance and Resource-Efficient Dynamic Memory Management in High-Level Synthesis

61st ACM/IEEE Design Automation Conference, DAC 2024

Authors: Wang, Qinggang; Zheng, Long; An, Zhaozeng; Huang, Haoqin; Zhu, Haoran; Huang, Yu; Yao, Pengcheng; Liao, Xiaofei; Jin, Hai. National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab, Cluster and Grid Computing Lab, Huazhong University of Science and Technology, Wuhan, China; Zhejiang Lab, Hangzhou, China

ISBN: (print) 9798400706011

With the merits of high productivity and ease of use, high-level synthesis (HLS) tools bring hope to fast FPGA-based architecture development. However, their usability and popularity are still limited due to lack of support for dynamic memory management (DMM). Though HLS-compatible DMM solutions have been proposed recently, based on our investigation, none of them can achieve both high performance (i.e., minimal memory (de-)allocation latency) and resource efficiency (i.e., managing arbitrarily sized memory with minimal FPGA resource consumption), seriously limiting their practicality. In response, we propose HeroDMM, a high-performance and resource-efficient dynamic memory manager for HLS. Specifically, HeroDMM organizes the managed memory area with a novel cartesian-like tree (CT) structure, a key to resolving the dilemma between (de-)allocation latency and resource efficiency standing in front of prior efforts. With the CT structure, HeroDMM further devises a delicate memory management algorithm and specializes the hardware implementation for achieving ever-higher performance while ensuring resource efficiency. Results show that HeroDMM outperforms state-of-the-art HLS-compatible DMM solutions by 61.69%∼99.99% in performance improvement and 23.79%∼97.22% in resource consumption savings. © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.

Keywords: Memory management

Towards High-Performance Graph Processing: From a Hardware/Software Co-Design Perspective

Journal of Computer Science & Technology, 2024, Vol. 39, No. 2, pp. 245-266

Authors: 廖小飞, 赵文举, 金海, 姚鹏程, 黄禹, 王庆刚, 赵进, 郑龙, 张宇, 邵志远. National Engineering Research Center for Big Data Technology and System, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Services Computing Technology and System Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Cluster and Grid Computing Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Zhejiang Lab, Hangzhou 311121, China

Graph processing has been widely used in many scenarios, from scientific computing to artificial *** processing exhibits irregular computational parallelism and random memory accesses, unlike traditional ***, running graph processing workloads on conventional architectures (e.g., CPUs and GPUs) often shows a significantly low compute-memory ratio with few performance benefits, which can be, in many cases, even slower than a specialized single-thread graph *** domain-specific hardware designs are essential for graph processing, it is still challenging to transform the hardware capability to performance boost without coupled software *** article presents a graph processing ecosystem from hardware to *** start by introducing a series of hardware accelerators as the foundation of this ***, the codesigned parallel graph systems and their distributed techniques are presented to support graph ***, we introduce our efforts on novel graph applications and hardware *** results show that various graph applications can be efficiently accelerated in this graph processing ecosystem.

Keywords: graph processing; hardware accelerator; software system; high performance; ecosystem

AccelES: Accelerating Top-K SpMV for Embedding Similarity via Low-bit Pruning

31st IEEE International Symposium on High Performance Computer Architecture, HPCA 2025

Authors: Zhai, Jiaqi; Shi, Xuanhua; Huang, Kaiyi; Ye, Chencheng; Hu, Weifang; He, Bingsheng; Jin, Hai. Huazhong University of Science and Technology, National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Wuhan 430074, China; National University of Singapore, School of Computing, 119077, Singapore

ISBN: (print) 9798331506476

In the realm of recommendation systems, achieving real-time performance in embedding similarity tasks is often hindered by the limitations of traditional Top-K sparse matrix-vector multiplication (SpMV) methods, which suffer from high latency due to inefficient memory access patterns. This paper identifies these critical gaps and introduces AccelES, a novel approach that significantly enhances the efficiency of Top-K SpMV. Our method employs a two-stage calculation scheme: the first stage utilizes a compact, low-bit dataset to quickly identify the most relevant entries, while the second stage performs full-precision calculations solely on this pruned subset, thereby minimizing computational overhead. Furthermore, AccelES incorporates innovative matrix representations, Ultra-CSR and Random-CSR, which optimize memory bandwidth utilization. Experimental results demonstrate that AccelES accelerates performance, surpassing state-of-the-art FPGA, GPU, and CPU solutions by factors of 3.4×, 2.5×, and 153.3×, respectively, under controlled conditions. These advancements not only enhance processing speed but also significantly improve real-time performance in recommendation systems, establishing AccelES as a pivotal contribution to the field of Top-K sparse matrix-vector multiplication. © 2025 IEEE.

Keywords: Embeddings

Minimal Context-Switching Data Race Detection with Dataflow Tracking

Journal of Computer Science & Technology, 2024, Vol. 39, No. 1, pp. 211-226

Authors: 郑龙, 李洋, 辛杰, 刘海峰, 郑然, 廖小飞, 金海. National Engineering Research Center for Big Data Technology and System, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Services Computing Technology and System Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Cluster and Grid Computing Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China

Data race is one of the most important concurrent anomalies in multi-threaded *** constraint-based techniques are leveraged into race detection, which is able to find all the races that can be found by any other sound race ***, this constraint-based approach has serious limitations on helping programmers analyze and understand data ***, it may report a large number of false positives due to the unrecognized dataflow propagation of the ***, it recommends a wide range of thread context switches to schedule the reported race (including the false one) whenever this race is exposed during the constraint-solving *** ad hoc recommendation imposes too many context switches, which complicates the data race *** address these two limitations in the state-of-the-art constraint-based race detection, this paper proposes DFTracker, an improved constraint-based race detector to recommend each data race with minimal thread context ***, we reduce the false positives by analyzing and tracking the dataflow in the *** this means, DFTracker thus reduces the unnecessary analysis of false race *** further propose a novel algorithm to recommend an effective race schedule with minimal thread context switches for each data *** experimental results on the real applications demonstrate that 1) without removing any true data race, DFTracker effectively prunes false positives by 68% in comparison with the state-of-the-art constraint-based race detector; 2) DFTracker recommends as low as 2.6-8.3 (4.7 on average) thread context switches per data race in the real world, which is 81.6% fewer context switches per data race than the state-of-the-art constraint-based race ***, DFTracker can be used as an effective tool to understand the data race for programmers.

Keywords: data race; satisfiability modulo theory; multi-threaded program; dynamic detection

SpaHet: A Software/Hardware Co-design for Accelerating Heterogeneous-Sparsity based Sparse Matrix Multiplication

61st ACM/IEEE Design Automation Conference, DAC 2024

Authors: Huang, Haoqin; Yao, Pengcheng; An, Zhaozeng; Sun, Yufei; Hu, Ao; Xu, Peng; Zheng, Long; Liao, Xiaofei; Jin, Hai. National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, Huazhong University of Science and Technology, Wuhan 430074, China; Zhejiang Lab, Hangzhou 311121, China

ISBN: (print) 9798400706011

Sparse general matrix-matrix multiplication is widely used in data mining applications. Its irregular memory access patterns limit the performance of general-purpose processors, thus motivating many FPGA-based hardware innovations in recent years. Nevertheless, existing accelerators fail to efficiently support heterogeneous input matrix sparsity, which is universal in various real-world applications. With in-depth experimental analysis, we observe that their performance is bottlenecked by their fixed tiling mechanisms, which only alleviate the irregularity of one input matrix. Based on the observation, we propose SpaHet, a software/hardware co-design to accelerate heterogeneous-sparsity based sparse matrix multiplication. SpaHet adopts a dual-adaptive sliding window mechanism to cover the reuse characteristics of both input matrices simultaneously. With a specialized exploration algorithm, the window-based mechanism can automatically find the optimal tiling strategy instead of applying a fixed one based on empirical experience. A sparsity-aware merge tree is also proposed to maximize the output matrix reuse via accumulating intermediate results thoroughly. Our results on a Xilinx Alveo U280 accelerator card show that SpaHet outperforms state-of-the-art CPU-, GPU- and FPGA-based solutions by 7.71×, 1.1×, and 2.74× in performance, respectively. © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.

Keywords: Program processors

Towards Redundancy-Free Recommendation Model Training via Reusable-aware Near-Memory Processing

61st ACM/IEEE Design Automation Conference, DAC 2024

Authors: Liu, Haifeng; Zheng, Long; Huang, Yu; Huang, Haoyan; Liao, Xiaofei; Hai, Jin. National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, Huazhong University of Science and Technology, Wuhan 430074, China; Zhejiang Lab, Hangzhou 311121, China

ISBN: (print) 9798400706011

The memory-intensive embedding layer in recommendation models continues to be the performance bottleneck. Prior works have attempted to improve the embedding layer performance by exploiting the data locality to cache the frequently accessed embedding vectors and their partial sums. However, these solutions rely on a static cache, which is inapplicable in the embedding training scenario where the embedding vectors are updated frequently. To this end, this paper proposes ReFree, a redundancy-free near-memory processing (NMP) solution for recommendation model training. Specifically, ReFree identifies the reusable data in real time for both embedding layer forward and backward stages and leverages a lightweight NMP architecture to enable redundancy-free near-memory acceleration of the entire embedding training process. Evaluation results on real-world datasets show that ReFree outperforms the state-of-the-art solutions by 10.9× and reduces energy consumption by 5.3× on average. © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.

Keywords: Embeddings
