Pre-training a language model and then fine-tuning it has been shown to be an efficient and effective technique for a wide range of code intelligence tasks, such as code generation, code summarization, and vulnerability de...
As software engineering advances and demand for code rises, code clones have become more prevalent. This phenomenon poses risks such as vulnerability propagation, underscoring the growing importance of code clone de...
Community websites bring many conveniences to people, and the classification of community content is playing an important role in website management and information searching. As the carrier of community content, post...
Real academic networks are heterogeneous; therefore, in link prediction tasks, some network information may be lost if only homogeneous network methods are used. In order to make good use ...
Early diagnosis of osteonecrosis of the femoral head (ONFH) can slow disease progression and improve femoral head preservation. The radiographic difference between early ONFH and healthy femoral heads is not apparent to the naked...
Knowledge hypergraphs generalize knowledge graphs using hyperedges to connect multiple entities and depict complicated relations. Existing methods either transform hyperedges into an easier-to-handle set of binary rel...
ISBN: 9798350323481 (print)
Streaming Graph Pattern Mining (GPM) has been widely used in many application fields. However, existing streaming GPM solutions suffer from many unnecessary explorations and isomorphism tests, while existing static GPM solutions require many repetitive operations because they recompute over the full graph. In this paper, we propose a pattern-aware incremental execution approach and design the first streaming GPM accelerator, called PSMiner, which integrates multiple optimizations to reduce redundant computation and improve computing efficiency. Extensive experiments show that, compared with state-of-the-art software and hardware solutions, PSMiner achieves average speedups of 770.9× and 60.4×, respectively.
Mobile and Web-of-Things (WoT) devices at the network edge account for more than half of the world's web traffic, making a great data source for various machine learning (ML) applications, particularly federated l...
Modern storage systems typically replicate data on multiple servers to provide high reliability and availability. However, most commercially-deployed datastores often fail to offer low latency, high throughput, and strong consistency at the same time. This paper presents Whale, a Remote Direct Memory Access (RDMA) based primary-backup replication system for in-memory datastores. Whale achieves both low latency and strong consistency by decoupling metadata multicasting from data replication for all backup nodes, and using an optimistic commitment mechanism to respond to client write requests earlier. Whale achieves high throughput by propagating writes from the primary node to backup nodes asynchronously via RDMA-optimized chain replication. To further reduce the cost of data replication, we design a log-structured datastore to fully exploit the advantages of one-sided RDMA and Persistent Memory (PM). We implement Whale on a cluster equipped with PM and InfiniBand RDMA networks. Experimental results show that Whale achieves much higher throughput and lower latency than state-of-the-art replication protocols.
ISBN: 9781939133458 (print)
Key-value (KV) separation is known for significantly mitigating the write amplification inherent in traditional LSM trees. However, KV separation can increase the overhead of managing the value region, especially for the garbage collection (GC) operation used to reclaim redundant space. In response, many efforts have been made to optimize the GC mechanism for KV separation. However, our analysis indicates that such solutions, which trade off CPU overhead against I/O overhead, cannot simultaneously satisfy the three requirements of KV-separated systems: throughput, tail latency, and space usage. This limitation hinders their real-world ***. In this paper, we introduce AegonKV, a "three-birds-one-stone" solution that comprehensively improves the throughput, tail latency, and space usage of KV-separated systems. AegonKV proposes a SmartSSD-based GC offloading mechanism that enables asynchronous GC operations without competing with LSM reads and writes for bandwidth or CPU. AegonKV leverages offload-friendly data structures and hardware/software execution logic to address the challenges of GC offloading. Experiments demonstrate that, compared to existing KV-separated systems, AegonKV improves throughput by 1.28-3.3 times, reduces tail latency by 37%-66%, and reduces space overhead by 15%-85%.