Graph neural networks (GNNs) have gained traction and have been applied to various graph-based data analysis tasks due to their high ***, a major concern is their robustness, particularly when faced with graph data that has been deliberately or accidentally polluted with *** presents a challenge in learning robust GNNs under noisy *** address this issue, we propose a novel framework called Soft-GNN, which mitigates the influence of label noise by adapting the data utilized in *** approach employs a dynamic data utilization strategy that estimates adaptive weights based on prediction deviation, local deviation, and global *** better utilizing significant training samples and reducing the impact of label noise through dynamic data selection, GNNs are trained to be more *** evaluate the performance, robustness, generality, and complexity of our model on five real-world datasets, and our experimental results demonstrate the superiority of our approach over existing methods.
Approximate nearest neighbor search (ANNS) has emerged as a crucial component of database and AI infrastructure. Ever-increasing vector datasets pose significant challenges in terms of performance, cost, and accuracy ...
ISBN: (print) 9781939133458
Key-value (KV) separation is renowned for significantly mitigating the write amplification inherent in traditional LSM trees. However, KV separation potentially increases the performance overhead of managing the value region, especially for the garbage collection (GC) operations used to reclaim redundant space. In response, many efforts have been made to optimize the GC mechanism for KV separation. However, our analysis indicates that such solutions, which are based on trade-offs between CPU and I/O overheads, cannot simultaneously satisfy the three requirements of KV separated systems in terms of throughput, tail latency, and space usage. This limitation hinders their real-world *** this paper, we introduce AegonKV, a "three-birds-one-stone" solution that comprehensively enhances the throughput, tail latency, and space usage of KV separated systems. AegonKV first proposes a SmartSSD-based GC offloading mechanism to enable asynchronous GC operations without competing with LSM reads/writes for bandwidth or CPU. AegonKV leverages offload-friendly data structures and hardware/software execution logic to address the challenges of GC offloading. Experiments demonstrate that AegonKV achieves the largest throughput improvement, 1.28-3.3 times, along with a 37%-66% reduction in tail latency and a 15%-85% reduction in space overhead, compared to existing KV separated systems.
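To make the value-region GC problem concrete, here is a minimal in-memory sketch of generic KV separation. It is not AegonKV's design and does not model SmartSSD offloading: the class `KVSeparatedStore`, its methods, and the use of a Python dict in place of the LSM tree are all assumptions made for illustration. It shows only why overwrites and deletes leave stale records in the value log, and what a synchronous GC pass has to do to reclaim them.

```python
class KVSeparatedStore:
    """Toy KV-separated store (hypothetical, illustration only).

    A dict stands in for the LSM tree and holds key -> log position.
    Values live in a separate append-only value log.
    """

    def __init__(self):
        self.index = {}      # key -> position of the live record in the value log
        self.value_log = []  # append-only list of (key, value) records

    def put(self, key, value):
        # Append the value to the log; the index stores only the small pointer.
        self.value_log.append((key, value))
        self.index[key] = len(self.value_log) - 1

    def get(self, key):
        pos = self.index.get(key)
        return None if pos is None else self.value_log[pos][1]

    def delete(self, key):
        # Only the pointer is dropped; the old record stays in the log as garbage.
        self.index.pop(key, None)

    def garbage_ratio(self):
        # Fraction of log records no longer referenced by the index.
        if not self.value_log:
            return 0.0
        return 1 - len(self.index) / len(self.value_log)

    def collect_garbage(self):
        """Rewrite the log with live records only and return the number reclaimed."""
        live, new_index = [], {}
        for pos, (key, value) in enumerate(self.value_log):
            # A record is live only if the index still points at this position.
            if self.index.get(key) == pos:
                new_index[key] = len(live)
                live.append((key, value))
        reclaimed = len(self.value_log) - len(live)
        self.value_log, self.index = live, new_index
        return reclaimed


if __name__ == "__main__":
    store = KVSeparatedStore()
    store.put("a", 1)
    store.put("a", 2)   # the overwrite leaves the first record stale
    store.put("b", 3)
    store.delete("b")   # the delete leaves the record for "b" stale
    print(store.garbage_ratio(), store.collect_garbage(), store.get("a"))
```

In this sketch the GC pass scans the whole log and checks every record against the index on the same thread that serves reads and writes, which is the kind of CPU and I/O contention the abstract says GC offloading is meant to avoid.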
ISBN: (print) 9781665442787
Graph mining aims to explore interesting structural information of a graph. Pattern-centric systems typically transform a general-purpose graph mining problem into a series of subgraph matching problems for high performance. Existing pattern-centric mining systems reduce the substantial search space for a single pattern by exploring a highly optimized matching order, but the matching order itself still carries severe inherent computational redundancies, leading to significant performance degradation. The key innovation of this work lies in a general redundancy criterion that characterizes computational redundancies arising not only in handling a single pattern but also in matching multiple patterns simultaneously. In this paper, we present SumPA, a high-performance pattern-centric graph mining system that can sufficiently remove redundant computations for any complex graph mining problem. SumPA features three key designs: (1) a pattern abstraction technique that can simplify numerous complex patterns into a few simple abstract patterns based on pattern similarity, (2) abstraction-guided pattern matching that completely eliminates (totally and partially) redundant computations during subgraph enumeration, and (3) a suite of system optimizations to maximize storage and computation efficiency. Our evaluation on a wide variety of real-world graphs shows that SumPA outperforms the two state-of-the-art systems Peregrine and GraphPi by up to 61.89× and 8.94×, respectively. For many mining problems on large graphs, Peregrine takes hours or even days while SumPA finishes in only a few minutes.
Existing FPGA-based graph accelerators, typically designed for static graphs, rarely handle dynamic graphs that often involve substantial graph updates (e.g., edge/node insertion and deletion) over time. In this paper...
Out-of-distribution (OOD) detection is crucial for developing trustworthy and reliable machine learning systems. Recent advances in training with auxiliary OOD data demonstrate efficacy in enhancing detection capabili...
A hybrid pull-push computational model can provide compelling results over either single model for processing real-world *** and pipeline parallelism of FPGAs make it potential to process different stages of graph ***, considering the limited on-chip resources and streamlined pipeline computation, the efficiency of the hybrid model on FPGAs often suffers due to the well-known random access feature of graph *** this paper, we present a hybrid graph processing system on FPGAs, which can achieve the best of both *** approach on FPGAs is unique and novel as ***, we propose to use edge blocks (each consisting of edges with the same destination vertex set), which allow edges to be accessed sequentially at block granularity for locality while still preserving the *** to the independence of blocks, in the sense that all edges in an inactive block are associated with inactive vertices, this also makes it possible to skip invalid blocks to reduce redundant ***, we consider a large number of vertices and their associated edge blocks to maintain a predictable execution *** also present to switch models in advance with few stalls using their state *** evaluation on a wide variety of graph algorithms for many real-world graphs shows that our approach achieves up to 3.69x speedup over state-of-the-art FPGA-based graph processing systems.
Deep learning has achieved tremendous success in various fields, but training deep neural networks (DNNs) is very compute-intensive, which has led to numerous deep learning frameworks that aim to offer better usability and higher performance to deep learning practitioners. TensorFlow and PyTorch are the two most popular frameworks. TensorFlow is more promising within the industry context, while PyTorch is more appealing in academia. However, these two frameworks differ considerably owing to their opposite design philosophies: static vs. dynamic computation graphs. TensorFlow is regarded as being more performance-friendly, as it has more opportunities to perform optimizations with the full view of the computation graph. However, there are also claims that PyTorch is sometimes faster than TensorFlow, which confuses end users choosing between them. In this paper, we carry out analytical and experimental analysis to unravel the mystery of how TensorFlow and PyTorch compare in training speed on a single GPU. To ensure that our investigation is as comprehensive as possible, we carefully select seven popular neural networks, which cover computer vision, speech recognition, and natural language processing (NLP). The contributions of this work are two-fold. First, we conduct detailed benchmarking experiments on TensorFlow and PyTorch and analyze the reasons for their performance difference. This work provides guidance for end users choosing between these two frameworks. Second, we identify some key factors that affect performance, which can direct end users to write their models more efficiently.