Search Results - Inner Mongolia University Library

23rd USENIX Conference on File and Storage Technologies, FAST 2025

Authors: Duan, Zhuohui; Feng, Hao; Liu, Haikun; Liao, Xiaofei; Jin, Hai; Li, Bangyu. National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab/Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, China

ISBN: (print) 9781939133458

The key-value separation is renowned for its significant mitigation of the write amplification inherent in traditional LSM trees. However, KV separation potentially increases performance overhead in the management of Value region, especially for garbage collection (GC) operation that is used to reduce the redundant space occupation. In response, many efforts have been made to optimize the GC mechanism for KV separation. However, our analysis indicates that such solution based on trade-offs between CPU and I/O overheads cannot simultaneously satisfy the three requirements of KV separated systems in terms of throughput, tail latency, and space usage. This limitation hinders their real-world application. In this paper, we introduce AegonKV, a "three-birds-one-stone" solution that comprehensively enhances the throughput, tail latency, and space usage of KV separated systems. AegonKV first proposes a SmartSSD-based GC offloading mechanism to enable asynchronous GC operations without competing with LSM read/write for bandwidth or CPU. AegonKV leverages offload-friendly data structures and hardware/software execution logic to address the challenges of GC offloading. Experiments demonstrate that AegonKV achieves the largest throughput improvement of 1.28-3.3 times, a significant reduction of 37%-66% in tail latency, and 15%-85% in space overhead compared to existing KV separated systems. © 2025 FAST. All Rights Reserved.

Keywords: Digital storage


Soft-GNN: towards robust graph neural networks via self-adaptive data utilization


Frontiers of Computer Science, 2025, Vol. 19, No. 4, pp. 1-12

Authors: Yao WU; Hong HUANG; Yu SONG; Hai JIN. National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; College of Information and Communication, National University of Defense Technology, Wuhan 430019, China; Department of Computer Science and Operations Research, Université de Montréal, Montreal H3C 3J7, Canada

Graph neural networks (GNNs) have gained traction and have been applied to various graph-based data analysis tasks due to their high ***, a major concern is their robustness, particularly when faced with graph data that has been deliberately or accidentally polluted with *** presents a challenge in learning robust GNNs under noisy *** address this issue, we propose a novel framework called Soft-GNN, which mitigates the influence of label noise by adapting the data utilized in *** approach employs a dynamic data utilization strategy that estimates adaptive weights based on prediction deviation, local deviation, and global *** better utilizing significant training samples and reducing the impact of label noise through dynamic data selection, GNNs are trained to be more *** evaluate the performance, robustness, generality, and complexity of our model on five real-world datasets, and our experimental results demonstrate the superiority of our approach over existing methods.

Keywords: graph neural networks node classification label noise robustness


Towards High-throughput and Low-latency Billion-scale Vector Search via CPU/GPU Collaborative Filtering and Re-ranking


23rd USENIX Conference on File and Storage Technologies, FAST 2025

Authors: Tian, Bing; Liu, Haikun; Tang, Yuhang; Xiao, Shihai; Duan, Zhuohui; Liao, Xiaofei; Jin, Hai; Zhang, Xuecang; Zhu, Junhua; Zhang, Yu. National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab/Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, China; Huawei Technologies Co. Ltd., China

ISBN: (print) 9781939133458

Approximate nearest neighbor search (ANNS) has emerged as a crucial component of database and AI infrastructure. Ever-increasing vector datasets pose significant challenges in terms of performance, cost, and accuracy for ANNS services. None of modern ANNS systems can address these issues simultaneously. In this paper, we present FusionANNS, a high-throughput, low-latency, cost-efficient, and high-accuracy ANNS system for billion-scale datasets using SSDs and only one entry-level GPU. The key idea of FusionANNS lies in CPU/GPU collaborative filtering and re-ranking mechanisms, which significantly reduce I/O operations across CPUs, GPU, and SSDs to break through the I/O performance bottleneck. Specifically, we propose three novel designs: (1) multi-tiered indexing to avoid data swapping between CPUs and GPU, (2) heuristic re-ranking to eliminate unnecessary I/Os and computations while guaranteeing high accuracy, and (3) redundant-aware I/O deduplication to further improve I/O efficiency. We implement FusionANNS and compare it with the state-of-the-art SSD-based ANNS system-SPANN and GPU-accelerated in-memory ANNS system-RUMMY. Experimental results show that FusionANNS achieves 1) 9.4-13.1× higher query per second (QPS) and 5.7-8.8× higher cost efficiency compared with SPANN; and 2) 2-4.9× higher QPS and 2.3-6.8× higher cost efficiency compared with RUMMY, while guaranteeing low latency and high accuracy. © 2025 FAST. All Rights Reserved.

Keywords: Collaborative filtering


AegonKV: a high bandwidth, low tail latency, and low storage cost KV-separated LSM store with SmartSSD-based GC offloading


Proceedings of the 23rd USENIX Conference on File and Storage Technologies

Authors: Zhuohui Duan; Hao Feng; Haikun Liu; Xiaofei Liao; Hai Jin; Bangyu Li. National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab/Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, China

ISBN: (print) 9781939133458

The key-value separation is renowned for its significant mitigation of the write amplification inherent in traditional LSM trees. However, KV separation potentially increases performance overhead in the management of Value region, especially for garbage collection (GC) operation that is used to reduce the redundant space occupation. In response, many efforts have been made to optimize the GC mechanism for KV separation. However, our analysis indicates that such solution based on trade-offs between CPU and I/O overheads cannot simultaneously satisfy the three requirements of KV separated systems in terms of throughput, tail latency, and space usage. This limitation hinders their real-world *** this paper, we introduce AegonKV, a "three-birds-one-stone" solution that comprehensively enhances the throughput, tail latency, and space usage of KV separated systems. AegonKV first proposes a SmartSSD-based GC offloading mechanism to enable asynchronous GC operations without competing with LSM read/write for bandwidth or CPU. AegonKV leverages offload-friendly data structures and hardware/software execution logic to address the challenges of GC offloading. Experiments demonstrate that AegonKV achieves the largest throughput improvement of 1.28-3.3 times, a significant reduction of 37%-66% in tail latency, and 15%-85% in space overhead compared to existing KV separated systems.


SumPA: Efficient Pattern-Centric Graph Mining with Pattern Abstraction


Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques

Authors: Chuangyi Gui; Xiaofei Liao; Long Zheng; Pengcheng Yao; Qinggang Wang; Hai Jin. National Engineering Research Center for Big Data Technology and System/Service Computing Technology and System Lab/Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China

ISBN: (print) 9781665442787

Graph mining aims to explore interesting structural information of a graph. Pattern-centric systems typically transform a generic-purpose graph mining problem into a series of subgraph matching problems for high performance. Existing pattern-centric mining systems reduce the substantial search space towards a single pattern by exploring a highly-optimized matching order, but inherent computational redundancies of such a matching order itself still suffer severely, leading to significant performance degradation. The key innovation of this work lies in a general redundancy criterion that characterizes computational redundancies arising in not only handling a single pattern but also matching multiple patterns simultaneously. In this paper, we present SumPA, a high-performance pattern-centric graph mining system that can sufficiently remove redundant computations for any complex graph mining problems. SumPA features three key designs: (1) a pattern abstraction technique that can simplify numerous complex patterns into a few simple abstract patterns based on pattern similarity, (2) abstraction-guided pattern matching that completely eliminates (totally and partially) redundant computations during subgraph enumeration, and (3) a suite of system optimizations to maximize storage and computation efficiency. Our evaluation on a wide variety of real-world graphs shows that SumPA outperforms the two state-of-the-art systems Peregrine and GraphPi by up to 61.89× and 8.94×, respectively. For many mining problems on large graphs, Peregrine takes hours or even days while SumPA finishes in only a few minutes.

Keywords: data reuse


FusionANNS: An Efficient CPU/GPU Cooperative Processing Architecture for Billion-scale Approximate Nearest Neighbor Search


arXiv, 2024

Authors: Tian, Bing; Liu, Haikun; Tang, Yuhang; Xiao, Shihai; Duan, Zhuohui; Liao, Xiaofei; Zhang, Xuecang; Zhu, Junhua; Zhang, Yu. National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab/Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, China; Huawei Technologies Co. Ltd, China

Approximate nearest neighbor search (ANNS) has emerged as a crucial component of database and AI infrastructure. Ever-increasing vector datasets pose significant challenges in terms of performance, cost, and accuracy for ANNS services. None of modern ANNS systems can address these issues simultaneously. We present FusionANNS, a high-throughput, low-latency, cost-efficient, and high-accuracy ANNS system for billion-scale datasets using SSDs and only one entry-level GPU. The key idea of FusionANNS lies in CPU/GPU collaborative filtering and re-ranking mechanisms, which significantly reduce I/O operations across CPUs, GPU, and SSDs to break through the I/O performance bottleneck. Specifically, we propose three novel designs: (1) multi-tiered indexing to avoid data swapping between CPUs and GPU, (2) heuristic re-ranking to eliminate unnecessary I/Os and computations while guaranteeing high accuracy, and (3) redundant-aware I/O deduplication to further improve I/O efficiency. We implement FusionANNS and compare it with the state-of-the-art SSD-based ANNS system–SPANN and GPU-accelerated in-memory ANNS system–RUMMY. Experimental results show that FusionANNS achieves 1) 9.4-13.1× higher query per second (QPS) and 5.7-8.8× higher cost efficiency compared with SPANN; and 2) 2-4.9× higher QPS and 2.3-6.8× higher cost efficiency compared with RUMMY, while guaranteeing low latency and high accuracy. Copyright © 2024, The Authors. All rights reserved.

Keywords: Nearest neighbor search


GraSU: A fast graph update library for FPGA-based dynamic graph processing


2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA 2021

Authors: Wang, Qinggang; Zheng, Long; Huang, Yu; Yao, Pengcheng; Gui, Chuangyi; Liao, Xiaofei; Jin, Hai; Jiang, Wenbin; Mao, Fubing. National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab, Cluster and Grid Computing Lab, Huazhong University of Science and Technology, China

ISBN: (print) 9781450382182

Existing FPGA-based graph accelerators, typically designed for static graphs, rarely handle dynamic graphs that often involve substantial graph updates (e.g., edge/node insertion and deletion) over time. In this paper, we aim to fill this gap. The key innovation of this work is to build an FPGA-based dynamic graph accelerator easily from any off-the-shelf static graph accelerator with minimal hardware engineering efforts (rather than from scratch). We observe spatial similarity of dynamic graph updates in the sense that most of graph updates get involved with only a small fraction of vertices. We therefore propose an FPGA library, called GraSU, to exploit spatial similarity for fast graph updates. GraSU uses a differential data management, which retains the high-value data (that will be frequently accessed) in the specialized on-chip UltraRAM while the overwhelming majority of low-value ones reside in the off-chip memory. Thus, GraSU can transform most of off-chip communications arising in dynamic graph updates into fast on-chip memory accesses. Our experiences show that GraSU can be easily integrated into existing state-of-the-art static graph accelerators with only 11 lines of code modifications. Our implementation atop AccuGraph using a Xilinx Alveo™ U250 board outperforms two state-of-the-art CPU-based dynamic graph systems, Stinger and Aspen, by an average of 34.24× and 4.42× in terms of update throughput, improving further overall efficiency by 9.80× and 3.07× on average. © 2021 ACM.

Keywords: Field programmable gate arrays (FPGA)


OUTLIER SYNTHESIS VIA HAMILTONIAN MONTE CARLO FOR OUT-OF-DISTRIBUTION DETECTION


arXiv, 2025

Authors: Li, Hengzhuang; Zhang, Teng. National Engineering Research Center for Big Data Technology and System, Service Computing Technology and Systems Laboratory, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China

Out-of-distribution (OOD) detection is crucial for developing trustworthy and reliable machine learning systems. Recent advances in training with auxiliary OOD data demonstrate efficacy in enhancing detection capabilities. Nonetheless, these methods heavily rely on acquiring a large pool of high-quality natural outliers. Some prior methods try to alleviate this problem by synthesizing virtual outliers but suffer from either poor quality or high cost due to the monotonous sampling strategy and the heavy-parameterized generative models. In this paper, we overcome all these problems by proposing the Hamiltonian Monte Carlo Outlier Synthesis (HamOS) framework, which views the synthesis process as sampling from Markov chains. Based solely on the in-distribution data, the Markov chains can extensively traverse the feature space and generate diverse and representative outliers, hence exposing the model to miscellaneous potential OOD scenarios. The Hamiltonian Monte Carlo with sampling acceptance rate almost close to 1 also makes our framework enjoy great efficiency. By empirically competing with SOTA baselines on both standard and large-scale benchmarks, we verify the efficacy and efficiency of our proposed HamOS. Our code is available at: https://***/Fir-lat/HamOS_OOD. © 2025, CC BY.

Keywords: Markov chains


Efficient FPGA-based graph processing with hybrid pull-push computational model


Frontiers of Computer Science, 2020, Vol. 14, No. 4, pp. 13-28

Authors: Chengbo YANG; Long ZHENG; Chuangyi GUI; Hai JIN. National Engineering Research Center for Big Data Technology and System/Service Computing Technology and System Lab/Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China

Hybrid pull-push computational model can provide compelling results over either of single one for processing real-world *** and pipeline parallelism of FPGAs make it potential to process different stages of graph ***, considering the limited on-chip resources and streamline pipeline computation, the efficiency of hybrid model on FPGAs often suffers due to well-known random access feature of graph *** this paper, we present a hybrid graph processing system on FPGAs, which can achieve the best of both *** approach on FPGAs is unique and novel as ***, we propose to use edge block (consisting of edges with the same destination vertex set), which allows to sequentially access edges at block granularity for locality while still preserving the *** to the independence of blocks in the sense that all edges in an inactive block are associated with inactive vertices, this also enables to skip invalid blocks for reducing redundant ***, we consider a large number of vertices and their associated edge-blocks to maintain a predictable execution *** also present to switch models in advance with few stalls using their state *** evaluation on a wide variety of graph algorithms for many real-world graphs shows that our approach achieves up to 3.69x speedup over state-of-the-art FPGA-based graph processing systems.

Keywords: graph processing efficiency computational model FPGAs


Reveal training performance mystery between TensorFlow and PyTorch in the single GPU environment


Science China (Information Sciences), 2022, Vol. 65, No. 1, pp. 147-163

Authors: Hulin DAI; Xuan PENG; Xuanhua SHI; Ligang HE; Qian XIONG; Hai JIN. National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab, School of Computer Science and Technology, Huazhong University of Science and Technology; Department of Computer Science, University of Warwick

Deep learning has gained tremendous success in various fields while training deep neural networks (DNNs) is very compute-intensive, which results in numerous deep learning frameworks that aim to offer better usability and higher performance to deep learning practitioners. TensorFlow and PyTorch are the two most popular frameworks. TensorFlow is more promising within the industry context, while PyTorch is more appealing in academia. However, these two frameworks differ much owing to the opposite design philosophy: static vs dynamic computation graph. TensorFlow is regarded as being more performance-friendly as it has more opportunities to perform optimizations with the full view of the computation graph. However, there are also claims that PyTorch is faster than TensorFlow sometimes, which confuses the end-users on the choice between them. In this paper, we carry out the analytical and experimental analysis to unravel the mystery of comparison in training speed on single-GPU between TensorFlow and PyTorch. To ensure that our investigation is as comprehensive as possible, we carefully select seven popular neural networks, which cover computer vision, speech recognition, and natural language processing (NLP). The contributions of this work are two-fold. First, we conduct the detailed benchmarking experiments on TensorFlow and PyTorch and analyze the reasons for their performance difference. This work provides the guidance for the end-users to choose between these two frameworks. Second, we identify some key factors that affect the performance, which can direct the end-users to write their models more efficiently.

Keywords: deep learning performance comparison TensorFlow PyTorch

