Out-of-distribution (OOD) detection is crucial for developing trustworthy and reliable machine learning systems. Recent advances in training with auxiliary OOD data demonstrate efficacy in enhancing detection capabili...
Existing FPGA-based graph accelerators, typically designed for static graphs, rarely handle dynamic graphs that often involve substantial graph updates (e.g., edge/node insertion and deletion) over time. In this paper...
Federated continual learning (FCL) has received increasing attention due to its potential in handling real-world streaming data, characterized by evolving data distributions and varying client classes over time. The c...
The hybrid pull-push computational model can provide compelling results over either single model for processing real-world graphs. The data and pipeline parallelism of FPGAs make them a promising platform for processing different stages of graph computation. However, considering the limited on-chip resources and the streamlined pipeline computation, the efficiency of the hybrid model on FPGAs often suffers from the well-known random-access behavior of graph processing. In this paper, we present a hybrid graph processing system on FPGAs that can achieve the best of both worlds. Our approach on FPGAs is unique and novel as follows. First, we propose to use edge blocks (consisting of edges with the same destination vertex set), which allow edges to be accessed sequentially at block granularity for locality while still preserving parallelism. Due to the independence of blocks, in the sense that all edges in an inactive block are associated with inactive vertices, this also enables invalid blocks to be skipped to reduce redundant computation. Second, we consider a large number of vertices and their associated edge blocks to maintain a predictable execution order. We also present a technique to switch models in advance with few stalls using their state information. The evaluation on a wide variety of graph algorithms for many real-world graphs shows that our approach achieves up to a 3.69x speedup over state-of-the-art FPGA-based graph processing systems.
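The edge-block idea above can be illustrated with a small software sketch (plain Python, not the authors' FPGA design): edges are grouped into blocks by destination vertex, the pull phase streams blocks sequentially for locality and skips blocks whose destinations are already settled, and the engine switches between push and pull based on frontier size. The BFS workload, block size, and switching threshold are illustrative assumptions rather than details taken from the paper.

```python
from collections import defaultdict

BLOCK_SIZE = 4  # destination vertices covered by one edge block (illustrative granularity)

def build_edge_blocks(edges):
    """Group edges into blocks keyed by the block of their destination vertex."""
    blocks = defaultdict(list)
    for src, dst in edges:
        blocks[dst // BLOCK_SIZE].append((src, dst))
    return blocks

def bfs_hybrid(edges, num_vertices, root, switch_ratio=0.05):
    """BFS with a frontier-size-based switch between push and pull phases."""
    out_adj = defaultdict(list)
    for src, dst in edges:
        out_adj[src].append(dst)
    blocks = build_edge_blocks(edges)

    dist = [None] * num_vertices
    dist[root] = 0
    frontier, level = {root}, 0
    while frontier:
        next_frontier = set()
        # Stand-in for switching models in advance using state information:
        # push while the frontier is small, pull once it grows large.
        if len(frontier) < switch_ratio * num_vertices:
            # Push: scatter updates from the few active vertices.
            for src in frontier:
                for dst in out_adj[src]:
                    if dist[dst] is None:
                        dist[dst] = level + 1
                        next_frontier.add(dst)
        else:
            # Pull: stream edge blocks sequentially for locality, skipping
            # blocks whose destination vertices are all already settled.
            for block_edges in blocks.values():
                if all(dist[dst] is not None for _, dst in block_edges):
                    continue  # inactive block: no useful work inside
                for src, dst in block_edges:
                    if dist[dst] is None and src in frontier:
                        dist[dst] = level + 1
                        next_frontier.add(dst)
        frontier, level = next_frontier, level + 1
    return dist
```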
Authors:
Xiaofan Bai, Chaoxiang He, Xiaojing Ma, Bin Benjamin Zhu, Hai Jin
School of Cyber Science and Engineering, Huazhong University of Science and Technology; National Engineering Research Center for Big Data Technology and System; Services Computing Technology and System Lab; Hubei Engineering Research Center on Big Data Security; Hubei Key Laboratory of Distributed System Security; Microsoft; School of Computer Science and Technology, Huazhong University of Science and Technology; Cluster and Grid Computing Lab
Cloud-based AI services offer numerous benefits but also introduce vulnerabilities, allowing for tampering with deployed DNN models, ranging from injecting malicious behaviors to reducing computing resources. Fingerprint samples are generated to query models to detect such tampering. In this paper, we present Intersecting-Boundary-Sensitive Fingerprinting (IBSF), a novel method for black-box integrity verification of DNN models using only top-1 labels. Recognizing that tampering with a model alters its decision boundary, IBSF crafts fingerprint samples from normal samples by maximizing the partial Shannon entropy of a selected subset of categories to position the fingerprint samples near decision boundaries where the categories in the subset intersect. These fingerprint samples are almost indistinguishable from their source samples. We theoretically establish and confirm experimentally that these fingerprint samples' expected sensitivity to tampering increases with the cardinality of the subset. Extensive evaluation demonstrates that IBSF surpasses existing state-of-the-art fingerprinting methods, particularly with larger subset cardinality, establishing its state-of-the-art performance in black-box tampering detection using only top-1 labels. The IBSF code is available at: https://***/CGCL-codes/IBSF.
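As a rough illustration of the crafting step described above, the sketch below performs gradient ascent on one reasonable reading of the partial Shannon entropy: the entropy of the softmax probabilities restricted to the chosen category subset and renormalized, with the perturbation kept small so the fingerprint stays close to its source sample. It is not the released IBSF code; `model`, `subset`, and all hyperparameters are assumptions.

```python
import torch
import torch.nn.functional as F

def craft_fingerprint(model, x, subset, steps=200, lr=0.01, eps=8 / 255):
    """Craft a fingerprint sample from a normal sample x (shape 1xCxHxW)."""
    model.eval()
    delta = torch.zeros_like(x, requires_grad=True)    # small additive perturbation
    opt = torch.optim.Adam([delta], lr=lr)
    idx = torch.tensor(subset)
    for _ in range(steps):
        logits = model(torch.clamp(x + delta, 0.0, 1.0))
        probs = F.softmax(logits, dim=1)[0, idx]
        probs = probs / probs.sum()                    # renormalize over the subset (assumption)
        partial_entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
        loss = -partial_entropy                        # ascend the partial entropy to push the
        opt.zero_grad()                                # sample toward the intersecting boundaries
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                    # keep the fingerprint near its source
    return torch.clamp(x + delta.detach(), 0.0, 1.0)
```

A tampered model that shifts these intersecting boundaries is then likely to change the top-1 label of such a sample, which is how the query-only check detects tampering.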
Existing privacy-preserving approaches are generally designed to provide privacy guarantee for individual data in a database, which reduces the utility of the database for data analysis. In this paper, we propose a no...
ISBN (digital): 9798350350579
ISBN (print): 9798350350586
Hypergraph Neural Network (HyperGNN) has emerged as a potent methodology for dissecting intricate multilateral connections among various entities. Current software/hardware solutions leverage a sequential execution model that relies on hyperedge and vertex indices for conducting standard matrix operations for HyperGNN inference. Yet, they are impeded by the dual challenges of redundant computation and irregular memory access overheads, primarily due to the frequent and repetitive access and updating of feature vectors corresponding to the same hyperedges and vertices. To address these challenges, we propose the first redundancy-aware accelerator, RAHP, which enables high-performance execution of HyperGNN inference. Specifically, we integrate a redundancy-aware asynchronous execution approach into the accelerator design to reduce redundant computations and off-chip memory accesses. To unveil data-reuse opportunities and unlock parallelism that existing HyperGNN solutions fail to capture, it prioritizes vertices with the highest degree as roots, prefetches other vertices along the hypergraph structure to capture the common vertices shared among multiple hyperedges, and synchronizes the computations of hyperedges and vertices in real time. In this way, relevant hyperedge and vertex computations of the common vertices can be processed concurrently along the hypergraph topology, reducing redundant-computation overhead. Furthermore, by efficiently caching intermediate results of the common vertices, it curtails memory traffic and off-chip communication. To fully harness the performance potential of our proposed approach in the accelerator, RAHP incorporates a topology-driven data loading mechanism to minimize off-chip memory accesses on the fly. It is also endowed with an adaptive data synchronization scheme to mitigate the effects of conflicting updates to both hyperedges and vertices. Moreover, RAHP emplo...
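A minimal software sketch of the data-reuse idea described above (an assumption-laden illustration, not the RAHP hardware): vertices shared by many hyperedges have their projected features computed once and cached, highest-degree vertices first, so each hyperedge aggregation reuses them instead of repeating the same work. Function names, shapes, and the mean aggregation are illustrative.

```python
import numpy as np

def hypergnn_edge_aggregate(features, weight, hyperedges):
    """One HyperGNN step: aggregate projected vertex features into hyperedge features.

    features:   (num_vertices, in_dim) vertex feature matrix
    weight:     (in_dim, out_dim) projection matrix
    hyperedges: list of vertex-id lists, one per hyperedge
    """
    # Vertex degrees tell us which vertices are shared by many hyperedges.
    degree = np.zeros(features.shape[0], dtype=int)
    for he in hyperedges:
        degree[list(he)] += 1

    cache = {}

    def projected(v):
        # Compute each vertex's projection at most once; reuse it afterwards.
        if v not in cache:
            cache[v] = features[v] @ weight
        return cache[v]

    # Warm the cache for shared (high-degree) vertices first, mimicking the
    # root-prioritized traversal described in the abstract.
    for v in np.argsort(-degree):
        if degree[v] > 1:
            projected(int(v))

    # Hyperedge aggregation now reuses cached results for common vertices.
    return np.stack([
        np.mean([projected(int(v)) for v in he], axis=0) for he in hyperedges
    ])
```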
ISBN (print): 9781939133458
Approximate nearest neighbor search (ANNS) has emerged as a crucial component of database and AI infrastructure. Ever-increasing vector datasets pose significant challenges in terms of performance, cost, and accuracy for ANNS services. No modern ANNS system can address these issues simultaneously. In this paper, we present FusionANNS, a high-throughput, low-latency, cost-efficient, and high-accuracy ANNS system for billion-scale datasets using SSDs and only one entry-level GPU. The key idea of FusionANNS lies in CPU/GPU collaborative filtering and re-ranking mechanisms, which significantly reduce I/O operations across CPUs, GPU, and SSDs to break through the I/O performance bottleneck. Specifically, we propose three novel designs: (1) multi-tiered indexing to avoid data swapping between CPUs and GPU, (2) heuristic re-ranking to eliminate unnecessary I/Os and computations while guaranteeing high accuracy, and (3) redundant-aware I/O deduplication to further improve I/O efficiency. We implement FusionANNS and compare it with the state-of-the-art SSD-based ANNS system SPANN and the GPU-accelerated in-memory ANNS system RUMMY. Experimental results show that FusionANNS achieves (1) 9.4-13.1× higher queries per second (QPS) and 5.7-8.8× higher cost efficiency than SPANN, and (2) 2-4.9× higher QPS and 2.3-6.8× higher cost efficiency than RUMMY, while guaranteeing low latency and high accuracy.
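To make the re-ranking and I/O-deduplication ideas above concrete, here is a hedged, CPU-only sketch (not the FusionANNS implementation): candidate vector ids returned by a coarse filter are grouped by storage page so each page is read once, then re-ranked with exact distances in batches, stopping early once the top-k result stops changing. `VECTORS_PER_PAGE`, `read_page`, and the batch/termination parameters are illustrative assumptions.

```python
import heapq
import numpy as np

VECTORS_PER_PAGE = 64  # illustrative: vectors stored per SSD page

def rerank(query, candidate_ids, read_page, k=10, batch=256, stable_rounds=2):
    """Re-rank candidate ids by exact L2 distance with deduplicated page reads."""
    # Redundant-aware deduplication: group candidates by page so each page is fetched once.
    pages = {}
    for cid in candidate_ids:
        pages.setdefault(cid // VECTORS_PER_PAGE, []).append(cid)

    heap, prev_topk, unchanged = [], None, 0
    page_items = list(pages.items())
    for start in range(0, len(page_items), batch):
        for page_id, cids in page_items[start:start + batch]:
            vectors = read_page(page_id)   # one I/O per page, however many candidates it holds
            for cid in cids:
                vec = vectors[cid % VECTORS_PER_PAGE]
                dist = float(np.sum((vec - query) ** 2))
                heapq.heappush(heap, (dist, cid))
        topk = [cid for _, cid in heapq.nsmallest(k, heap)]
        # Heuristic early termination: stop once the top-k is stable across batches.
        unchanged = unchanged + 1 if topk == prev_topk else 0
        prev_topk = topk
        if unchanged >= stable_rounds:
            break
    return prev_topk or []
```

In a full system the coarse candidate list would come from the GPU-side filtering stage, and `read_page` would wrap a single SSD read.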
Evaluating and enhancing the general capabilities of large language models (LLMs) has been an important research topic. Graph is a common data structure in the real world, and understanding graph data is a crucial par...