Recent years have witnessed significant progress in developing deep learning-based models for automated code completion. Examples of such models include CodeGPT and StarCoder. These models are typically trained on large amounts of source code collected from open-source communities such as GitHub. Although using source code from GitHub has been a common practice for training deep-learning-based code completion models, it may induce legal and ethical issues such as copyright infringement. In this paper, we investigate the legal and ethical issues of current neural code completion models by answering the following question: is my code used to train your neural code completion model? To this end, we tailor a membership inference approach (termed CodeMI), originally crafted for classification tasks, to the more challenging task of code completion. In particular, since the target code completion models perform as opaque black boxes, preventing access to their training data and parameters, we opt to train multiple shadow models to mimic their behavior. The posteriors acquired from these shadow models are then employed to train a membership classifier, which can subsequently deduce the membership status of a given code sample from the output of a target code completion model. We comprehensively evaluate the effectiveness of this adapted approach across a diverse array of neural code completion models (i.e., LSTM-based, CodeGPT, CodeGen, and StarCoder). Experimental results reveal that the LSTM-based and CodeGPT models suffer from the membership leakage issue, which can be easily detected by our proposed membership inference approach with an accuracy of 0.842 and 0.730, respectively. Interestingly, our experiments also show that the data membership of current large language models of code, e.g., CodeGen and StarCoder, is difficult to detect, leaving ample space for further improvement. Finally, we also t...
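As a concrete illustration of the shadow-model pipeline described above, the sketch below builds an attack dataset from shadow-model posteriors and fits a membership classifier that can then be applied to a target model's outputs. This is a minimal sketch, not the paper's CodeMI implementation; the feature choice (top-k probabilities plus entropy) and all function names are assumptions.

```python
# Minimal sketch of shadow-model membership inference (CodeMI-style).
# The featurization and all callables here are illustrative assumptions,
# not the paper's actual implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression

def posterior_features(probs, k=10):
    """Reduce a model's output posterior to a fixed-size feature vector:
    top-k probabilities plus entropy (one plausible featurization)."""
    p = np.sort(probs)[::-1][:k]
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    return np.concatenate([p, [entropy]])

def build_attack_dataset(shadow_models, member_sets, nonmember_sets):
    """Query each shadow model on samples whose membership we control,
    labeling the resulting features 1 (member) or 0 (non-member)."""
    X, y = [], []
    for model, members, nonmembers in zip(shadow_models, member_sets, nonmember_sets):
        for code in members:
            X.append(posterior_features(model(code))); y.append(1)
        for code in nonmembers:
            X.append(posterior_features(model(code))); y.append(0)
    return np.array(X), np.array(y)

# The attack classifier is then applied to the *target* model's posteriors:
# attack = LogisticRegression().fit(*build_attack_dataset(...))
# is_member = attack.predict([posterior_features(target_model(sample))])
```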
ISBN (digital): 9798350383508
ISBN (print): 9798350383515
Graph convolutional network (GCN) has achieved enormous success in learning structural information from unstructured data. As graphs become increasingly large, distributed training for GCNs is severely prolonged by frequent cross-worker communications. Existing efforts to improve training efficiency often come at the expense of GCN performance, while the communication overhead persists. In this paper, we propose PSC-GCN, a holistic pipelined framework for distributed GCN training with communication-efficient sampling and inclusion-aware caching, to address the communication bottleneck while ensuring satisfactory model performance. Specifically, we devise an asynchronous pre-fetching scheme to retrieve stale statistics (features, embeddings, gradients) of boundary nodes in advance, such that embedding aggregation and model update are pipelined with statistics transmission. To reduce communication volume and alleviate the staleness effect, we introduce a variance-reduction-based sampling policy, which prioritizes inner nodes over boundary ones to lower the access frequency to remote neighbors, thus mitigating cross-worker statistics exchange. Complementing graph sampling, a feature caching module is co-designed to buffer hot nodes with high inclusion probability, ensuring that frequently sampled nodes are available in local memory. Extensive evaluations on real-world datasets show the superiority of PSC-GCN over state-of-the-art methods, reducing training time by 72%-80% without sacrificing model accuracy.
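The pre-fetching idea can be pictured with a small sketch: fetch (possibly stale) boundary-node statistics for the next mini-batch on a background thread while the current batch trains, so transmission overlaps computation. The worker/queue structure and `fetch_fn` below are hypothetical, not PSC-GCN's actual code.

```python
# Illustrative sketch of pipelining communication with computation:
# pre-fetch boundary statistics one step ahead of the consumer.
import threading, queue

def prefetch_boundary_features(fetch_fn, batches):
    """Run fetch_fn(batch) one step ahead of the consumer on a daemon thread."""
    q = queue.Queue(maxsize=1)   # one batch in flight keeps staleness bounded
    def worker():
        for b in batches:
            q.put(fetch_fn(b))   # remote fetch of boundary-node statistics
        q.put(None)              # sentinel: no more batches
    threading.Thread(target=worker, daemon=True).start()
    while True:
        feats = q.get()
        if feats is None:
            break
        yield feats              # the training step consumes pre-fetched features

# for feats in prefetch_boundary_features(fetch_remote, batches):
#     train_step(feats)          # aggregation/update overlaps the next fetch
```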
ISBN (print): 9781665442787
Graph mining aims to explore interesting structural information of a graph. Pattern-centric systems typically transform a general-purpose graph mining problem into a series of subgraph matching problems for high performance. Existing pattern-centric mining systems reduce the substantial search space of a single pattern by exploring a highly optimized matching order, but such a matching order still suffers from severe inherent computational redundancies, leading to significant performance degradation. The key innovation of this work lies in a general redundancy criterion that characterizes the computational redundancies arising not only in handling a single pattern but also in matching multiple patterns simultaneously. In this paper, we present SumPA, a high-performance pattern-centric graph mining system that can sufficiently remove redundant computations for complex graph mining problems. SumPA features three key designs: (1) a pattern abstraction technique that simplifies numerous complex patterns into a few simple abstract patterns based on pattern similarity, (2) abstraction-guided pattern matching that eliminates both totally and partially redundant computations during subgraph enumeration, and (3) a suite of system optimizations to maximize storage and computation efficiency. Our evaluation on a wide variety of real-world graphs shows that SumPA outperforms the two state-of-the-art systems Peregrine and GraphPi by up to 61.89× and 8.94×, respectively. For many mining problems on large graphs, Peregrine takes hours or even days while SumPA finishes in only a few minutes.
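To make the pattern-abstraction idea concrete, here is a toy sketch that groups concrete patterns under a shared abstract key (degree sequence, a deliberately crude stand-in for SumPA's similarity-based abstraction), so enumeration work for one key could be shared across its patterns. Everything here is illustrative, not SumPA's implementation.

```python
# Toy illustration of pattern abstraction: many concrete patterns map to a
# shared abstract key, so expensive matching for that key runs once.
from collections import defaultdict

def degree_sequence(pattern_edges):
    """A crude abstract key: the sorted vertex-degree multiset of a pattern."""
    deg = defaultdict(int)
    for u, v in pattern_edges:
        deg[u] += 1; deg[v] += 1
    return tuple(sorted(deg.values()))

def group_by_abstraction(patterns):
    """patterns: dict name -> edge list. Returns abstract key -> pattern names,
    so matching results for one key can be reused across its patterns."""
    groups = defaultdict(list)
    for name, edges in patterns.items():
        groups[degree_sequence(edges)].append(name)
    return groups

triangle = [(0, 1), (1, 2), (0, 2)]
wedge    = [(0, 1), (1, 2)]
print(group_by_abstraction({"triangle": triangle, "wedge": wedge}))
# {(2, 2, 2): ['triangle'], (1, 1, 2): ['wedge']}
```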
With the rapid growth of real-world graphs, the size of which can easily exceed the on-chip (board) storage capacity of an accelerator, processing large-scale graphs on a single Field Programmable Gate Array (FPGA) becomes difficult. Multi-FPGA acceleration is therefore of great necessity and importance. Many cloud providers (e.g., Amazon, Microsoft, and Baidu) now expose FPGAs to users in their data centers, providing opportunities to accelerate large-scale graph processing. In this paper, we present a communication library, called FDGLib, which can easily scale out any existing single FPGA-based graph accelerator to a distributed version in a data center, with minimal hardware engineering efforts. FDGLib provides six APIs that can be easily used and integrated into any FPGA-based graph accelerator with only a few lines of code modifications. Considering the torus-based FPGA interconnection in data centers, FDGLib also improves communication efficiency using simple yet effective torus-friendly graph partition and placement schemes. We interface FDGLib with AccuGraph, a state-of-the-art graph accelerator. Our results on a 32-node Microsoft Catapult-like data center show that the distributed AccuGraph can be 2.32x and 4.77x faster than ForeGraph, a state-of-the-art distributed FPGA-based graph accelerator, and Gemini, a distributed CPU-based graph system, with better scalability.
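The torus-friendly placement idea can be sketched as a greedy heuristic: place the partitions that exchange the most boundary traffic on nearby nodes of a wraparound grid. The cost model and greedy strategy below are assumptions for illustration, not FDGLib's actual scheme.

```python
# Hedged sketch of torus-aware partition placement: minimize weighted hop
# distance between heavily-communicating partitions on a torus topology.
import itertools

def torus_distance(a, b, dims):
    """Hop count between coordinates a, b on a wraparound (torus) grid."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

def greedy_placement(traffic, dims):
    """traffic[i][j]: boundary-edge volume between partitions i and j.
    Places the heaviest-communicating unplaced partition next, at the free
    torus slot minimizing weighted hop distance to already-placed peers."""
    slots = list(itertools.product(*[range(d) for d in dims]))
    order = sorted(range(len(traffic)), key=lambda i: -sum(traffic[i]))
    place = {}
    for i in order:
        best = min((s for s in slots if s not in place.values()),
                   key=lambda s: sum(traffic[i][j] * torus_distance(s, p, dims)
                                     for j, p in place.items()))
        place[i] = best
    return place
```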
Individuals and businesses have been significantly benefited by Large Language Models (LLMs) including PaLM, Gemini and ChatGPT in various ways. For example, LLMs enhance productivity, reduce costs, and enable us to f...
Federated Learning (FL) is increasingly adopted in edge computing scenarios, where a large number of heterogeneous clients operate under constrained or sufficient resources. The iterative training process in conventio...
Large Language Models (LLMs) have achieved significant performance in various natural language processing tasks but also pose safety and ethical threats, thus requiring red teaming and alignment processes to bolster t...
ISBN (digital): 9798350317152
ISBN (print): 9798350317169
Transactional stream processing engines (TSPEs) have gained increasing attention due to their capability of processing real-time stream applications with transactional semantics. However, TSPEs remain susceptible to system failures and power outages. Existing TSPEs mainly focus on performance improvement but still face a significant challenge in guaranteeing fault tolerance while offering high-performance services. We revisit commonly used fault tolerance approaches in stream processing and database systems, and find that these approaches do not work well on TSPEs due to complex data dependencies. In this paper, we propose a novel TSPE called MorphStreamR that achieves fast failure recovery while guaranteeing low performance overhead at runtime. The key idea of MorphStreamR is to record the intermediate results of resolved dependencies at runtime, thus eliminating data dependencies and improving task parallelism during failure recovery. MorphStreamR further mitigates the runtime overhead by selectively tracking data dependencies and incorporating workload-aware log commitment. Experimental results show that MorphStreamR can reduce recovery time by up to 3.1x while experiencing much less performance slowdown at runtime, compared with other applicable fault tolerance approaches.
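A minimal sketch of the recovery idea, under assumed data structures: log the intermediate result of each resolved dependency at runtime (selectively, for contended keys only), so recovery can look results up instead of re-walking dependency chains. The selection heuristic and class layout are hypothetical, not MorphStreamR's implementation.

```python
# Hedged sketch of selective dependency-result logging for fast recovery.
class DependencyLog:
    def __init__(self, hot_threshold=3):
        self.results = {}        # (txn_id, key) -> logged intermediate result
        self.access_count = {}   # key -> observed access frequency
        self.hot_threshold = hot_threshold

    def should_track(self, key):
        """Workload-aware selection (hypothetical heuristic): only log keys
        contended enough that skipping them would serialize recovery."""
        self.access_count[key] = self.access_count.get(key, 0) + 1
        return self.access_count[key] >= self.hot_threshold

    def record(self, txn_id, key, value):
        """Called at runtime when a transaction resolves a dependency."""
        if self.should_track(key):
            self.results[(txn_id, key)] = value

    def recover(self, txn_id, key, recompute):
        """On failure: use the logged value if present (enabling parallel
        replay); otherwise fall back to recomputing the dependency chain."""
        if (txn_id, key) in self.results:
            return self.results[(txn_id, key)]
        return recompute(txn_id, key)
```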
ISBN (digital): 9798350317152
ISBN (print): 9798350317169
Directed Acyclic Graph (DAG)-based blockchain (a.k.a. distributed ledger) has become prevalent for supporting highly concurrent applications. Its inherent parallel data structure accelerates block generation significantly, shifting the bottleneck from performance to storage scalability. An intuitive solution is to apply state sharding, which divides the entire ledger (i.e., transactions and states) into multiple shards. While each node then stores only a proportional share of transactions, it faces the challenges of storing cross-shard transactions and ensuring their processing consistency. In this paper, we propose SharDAG, a new mechanism that leverages adaptive sharding for DAG-based blockchains to achieve high performance and strong consistency. The key idea of SharDAG is to exploit a unique workload characteristic, silent assets, and design a lightweight cross-shard processing mechanism based on avatar account caching. Furthermore, we design a Byzantine-resilient cross-shard verification mechanism with a theoretically optimal number of participating nodes, which guarantees the consistency and security of avatar account aggregation. Our comprehensive evaluations on real-world workloads demonstrate that SharDAG delivers up to 3.8x throughput improvement compared to the state of the art and reduces the storage overhead of cross-shard transactions.
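A toy sketch of avatar account caching as the abstract describes it: a shard applies a cross-shard transfer against locally cached avatar deltas for remote-homed accounts, and the deltas are later aggregated back to their home shards (the step SharDAG protects with Byzantine-resilient verification). Field names and the aggregation rule are illustrative assumptions.

```python
# Hedged sketch of avatar account caching for cross-shard transfers.
class Shard:
    def __init__(self, shard_id):
        self.shard_id = shard_id
        self.accounts = {}   # home accounts: addr -> balance
        self.avatars = {}    # cached deltas for accounts homed elsewhere

    def apply_transfer(self, src, dst, amount, home_of):
        """Debit/credit locally; remote-homed parties accumulate in avatars,
        so the transfer commits without a synchronous cross-shard round."""
        for addr, delta in ((src, -amount), (dst, +amount)):
            if home_of(addr) == self.shard_id:
                self.accounts[addr] = self.accounts.get(addr, 0) + delta
            else:
                self.avatars[addr] = self.avatars.get(addr, 0) + delta

    def aggregate_avatars(self, shards, home_of):
        """Periodically push avatar deltas back to their home shards (the
        aggregation that SharDAG verifies Byzantine-resiliently)."""
        for addr, delta in self.avatars.items():
            home = shards[home_of(addr)]
            home.accounts[addr] = home.accounts.get(addr, 0) + delta
        self.avatars.clear()
```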
In this paper, we propose efficient distributed algorithms for three holistic aggregation functions on random regular graphs, which are good candidates for network topology in next-generation data centers. The three holistic aggregation functions include SELECTION (select the k-th largest or smallest element), DISTINCT (query the count of distinct elements), and MODE (query the most frequent element). We design three basic techniques, Pre-order Network Partition, Pairwise-independent Random Walk, and Random Permutation Delivery, and devise the algorithms based on them. The round complexity of the distributed SELECTION is Θ(log N), which meets the lower bound, where N is the number of nodes and each node holds one numeric element. The round complexities of the distributed DISTINCT and MODE algorithms are O(log^3 N / log log N) and O(log^2 N log log N), respectively. All of our results break the lower bounds obtained on general graphs, and all of our distributed algorithms are based on the CONGEST model, which restricts each node to sending only O(log N) bits on each edge in one round under synchronous communication.
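For flavor, the sketch below solves k-th smallest SELECTION by binary search over the value domain, where each simulated round aggregates per-node counts below a pivot. This simple scheme takes O(log(value range)) rounds and only illustrates the problem setting, not the paper's Θ(log N) CONGEST algorithm on random regular graphs.

```python
# Simulated distributed SELECTION via value-domain binary search.
def distributed_select(node_values, k, lo=0, hi=1 << 32):
    """node_values: per-node lists of integers; returns the k-th smallest
    element (1-based) across all nodes."""
    while lo < hi:
        mid = (lo + hi) // 2
        # Each node counts locally; the counts are aggregated network-wide
        # (one communication "round" in the simulation).
        below = sum(sum(1 for v in vals if v <= mid) for vals in node_values)
        if below >= k:
            hi = mid          # the k-th smallest is at most mid
        else:
            lo = mid + 1      # the k-th smallest is larger than mid
    return lo

print(distributed_select([[5, 9], [1, 7], [3]], k=3))  # -> 5
```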