Session-based recommendation(SBR)and multibehavior recommendation(MBR)are both important problems and have attracted the attention of many researchers and *** from SBR that solely uses one single type of behavior sequ...
详细信息
Session-based recommendation(SBR)and multibehavior recommendation(MBR)are both important problems and have attracted the attention of many researchers and *** from SBR that solely uses one single type of behavior sequences and MBR that neglects sequential dynamics,heterogeneous SBR(HSBR)that exploits different types of behavioral information(e.g.,examinations like clicks or browses,purchases,adds-to-carts and adds-to-favorites)in sequences is more consistent with real-world recommendation scenarios,but it is rarely *** efforts towards HSBR focus on distinguishing different types of behaviors or exploiting homogeneous behavior transitions in a sequence with the same type of ***,all the existing solutions for HSBR do not exploit the rich heterogeneous behavior transitions in an explicit way and thus may fail to capture the semantic relations between different types of ***,all the existing solutions for HSBR do not model the rich heterogeneous behavior transitions in the form of graphs and thus may fail to capture the semantic relations between different types of *** limitation hinders the development of HSBR and results in unsatisfactory *** a response,we propose a novel behavior-aware graph neural network(BGNN)for *** BGNN adopts a dual-channel learning strategy for differentiated modeling of two different types of behavior sequences in a ***,our BGNN integrates the information of both homogeneous behavior transitions and heterogeneous behavior transitions in a unified *** then conduct extensive empirical studies on three real-world datasets,and find that our BGNN outperforms the best baseline by 21.87%,18.49%,and 37.16%on average correspondingly.A series of further experiments and visualization studies demonstrate the rationality and effectiveness of our *** exploratory study on extending our BGNN to handle more than two types of behaviors show that our BGNN can e
Memory disaggregation, facilitated by Smart Network Interface Cards (SmartNICs), has emerged as a cost-effective approach for sharing memory resources in data centers. However, current SoC-based SmartNICs face several...
详细信息
Die-stacked dynamic random access memory(DRAM)caches are increasingly advocated to bridge the performance gap between the on-chip cache and the main *** fully realize their potential,it is essential to improve DRAM ca...
详细信息
Die-stacked dynamic random access memory(DRAM)caches are increasingly advocated to bridge the performance gap between the on-chip cache and the main *** fully realize their potential,it is essential to improve DRAM cache hit rate and lower its cache hit *** order to take advantage of the high hit-rate of set-association and the low hit latency of direct-mapping at the same time,we propose a partial direct-mapped die-stacked DRAM cache called *** design is motivated by a key observation,i.e.,applying a unified mapping policy to different types of blocks cannot achieve a high cache hit rate and low hit latency *** address this problem,P3DC classifies data blocks into leading blocks and following blocks,and places them at static positions and dynamic positions,respectively,in a unified set-associative *** also propose a replacement policy to balance the miss penalty and the temporal locality of different *** addition,P3DC provides a policy to mitigate cache thrashing due to block type *** results demonstrate that P3DC can reduce the cache hit latency by 20.5%while achieving a similar cache hit rate compared with typical set-associative caches.P3DC improves the instructions per cycle(IPC)by up to 66%(12%on average)compared with the state-of-the-art direct-mapped cache—BEAR,and by up to 19%(6%on average)compared with the tag-data decoupled set-associative cache—DEC-A8.
Temporal Graph Neural Network (TGNN) has attracted much research attention because it can capture the dynamic nature of complex networks. However, existing solutions suffer from redundant computation overhead and exce...
详细信息
With the merits of high productivity and ease of use, highlevel synthesis (HLS) tools bring hope to fast FPGA-based architecture development. However, their usability and popularity are still limited due to lack of su...
详细信息
Graph processing has been widely used in many scenarios,from scientific computing to artificial *** processing exhibits irregular computational parallelism and random memory accesses,unlike traditional ***,running gra...
详细信息
Graph processing has been widely used in many scenarios,from scientific computing to artificial *** processing exhibits irregular computational parallelism and random memory accesses,unlike traditional ***,running graph processing workloads on conventional architectures(e.g.,CPUs and GPUs)often shows a significantly low compute-memory ratio with few performance benefits,which can be,in many cases,even slower than a specialized single-thread graph *** domain-specific hardware designs are essential for graph processing,it is still challenging to transform the hardware capability to performance boost without coupled software *** article presents a graph processing ecosystem from hardware to *** start by introducing a series of hardware accelerators as the foundation of this ***,the codesigned parallel graph systems and their distributed techniques are presented to support graph ***,we introduce our efforts on novel graph applications and hardware *** results show that various graph applications can be efficiently accelerated in this graph processing ecosystem.
In the realm of recommendation systems, achieving real-time performance in embedding similarity tasks is often hindered by the limitations of traditional Top-K sparse matrix-vector multiplication (SpMV) methods, which...
详细信息
data race is one of the most important concurrent anomalies in multi-threaded *** con-straint-based techniques are leveraged into race detection,which is able to find all the races that can be found by any oth-er soun...
详细信息
data race is one of the most important concurrent anomalies in multi-threaded *** con-straint-based techniques are leveraged into race detection,which is able to find all the races that can be found by any oth-er sound race ***,this constraint-based approach has serious limitations on helping programmers analyze and understand data ***,it may report a large number of false positives due to the unrecognized dataflow propa-gation of the ***,it recommends a wide range of thread context switches to schedule the reported race(in-cluding the false one)whenever this race is exposed during the constraint-solving *** ad hoc recommendation imposes too many context switches,which complicates the data race *** address these two limitations in the state-of-the-art constraint-based race detection,this paper proposes DFTracker,an improved constraint-based race detec-tor to recommend each data race with minimal thread context ***,we reduce the false positives by ana-lyzing and tracking the dataflow in the *** this means,DFTracker thus reduces the unnecessary analysis of false race *** further propose a novel algorithm to recommend an effective race schedule with minimal thread con-text switches for each data *** experimental results on the real applications demonstrate that 1)without removing any true data race,DFTracker effectively prunes false positives by 68%in comparison with the state-of-the-art constraint-based race detector;2)DFTracker recommends as low as 2.6-8.3(4.7 on average)thread context switches per data race in the real world,which is 81.6%fewer context switches per data race than the state-of-the-art constraint based race ***,DFTracker can be used as an effective tool to understand the data race for programmers.
Sparse general matrix-matrix multiplication is widely used in data mining applications. Its irregular memory access patterns limit the performance of general-purpose processors, thus motivating many FPGA-based hardwar...
详细信息
The memory-intensive embedding layer in recommendation model continues to be the performance bottleneck. While prior works have attempted to improve the embedding layer performance by exploiting the data locality to c...
详细信息
暂无评论