Within the realm of softwareengineering, specialized tasks on code, such as program repair, present unique challenges, necessitating fine-tuning Large language models (LLMs) to unlock state-of-the-art performance. Fi...
详细信息
Within the realm of softwareengineering, specialized tasks on code, such as program repair, present unique challenges, necessitating fine-tuning Large language models (LLMs) to unlock state-of-the-art performance. Fine-tuning approaches proposed in the literature for LLMs on program repair tasks generally overlook the need to reason about the logic behind code changes, beyond syntactic patterns in the data. High-performing fine-tuning experiments also usually come at very high computational costs. With MORepair, we propose a novel perspective on the learning focus of LLM fine-tuning for program repair: we not only adapt the LLM parameters to the syntactic nuances of the task of code transformation (objective ➊), but we also specifically fine-tune the LLM with respect to the logical reason behind the code change in the training data (objective ➋). Such a multi-objective fine-tuning will instruct LLMs to generate high-quality *** apply MORepair to fine-tune four open-source LLMs with different sizes and architectures. Experimental results on function-level and repository-level repair benchmarks show that the implemented fine-tuning effectively boosts LLM repair performance by 11.4% to 56.0%. We further show that our fine-tuning strategy yields superior performance compared to the state-of-the-art approaches, including standard fine-tuning, Fine-tune-CoT, and RepairLLaMA.
Solid State Drives (SSDs) based on the NVMe Zoned Namespaces (ZNS) interface can notably reduce the costs of address mapping, garbage collection, and over-provisioning by dividing the storage space into multiple zones...
详细信息
Solid State Drives (SSDs) based on the NVMe Zoned Namespaces (ZNS) interface can notably reduce the costs of address mapping, garbage collection, and over-provisioning by dividing the storage space into multiple zones for sequential writes and random reads. The Log-Structured Merge (LSM) tree, which is extensively used in key-value storage systems, converts random writes to sequential writes, hence a suitable scenario to utilize ZNS SSDs. However, LSM tree associated data significantly varies in lifetime due to the levels and merging mechanisms of the LSM tree. Therefore, without an accurate method to estimate data lifetime, data with disparate lifetimes may be placed in the same zone, thus causing low space utilization and high write amplification within the *** address these issues, the paper proposes two data overlapping aware optimizations to realize intelligent data placement: a zone allocation scheme and a garbage collection scheme. The key technique of these optimizations is an accurate data-lifetime estimation by considering both the associated tree level of the data and the data overlapping ratio between the data and those in the neighboring level. Using the estimation technique, the zone allocation optimization can place data with similar lifetimes in the same zone. Besides, the garbage collection optimization can reclaim zones in an adaptive manner based on overlapping ratios to reduce the amount of data migration. Experimental results demonstrate that the optimization schemes effectively reduce garbage collection-incurred data copy by average factors of 2.11 × and 1.50 × in comparison to a conventional work and a state-of-the-art work, respectively. Consequently, the proposed work successfully alleviates the write amplification effect by 18% and 6%, compared to the conventional work and the state-of-the-art work, respectively.
Distributed Collaborative Machine Learning (DCML) has emerged in artificial intelligence-empowered edge computing environments, such as the Industrial Internet of Things (IIoT), to process tremendous data generated by...
详细信息
Distributed Collaborative Machine Learning (DCML) has emerged in artificial intelligence-empowered edge computing environments, such as the Industrial Internet of Things (IIoT), to process tremendous data generated by smart devices. However, parallel DCML frameworks require resource-constrained devices to update the entire Deep Neural Network (DNN) models and are vulnerable to reconstruction attacks. Concurrently, the serial DCML frameworks suffer from training efficiency problems due to their serial training nature. In this paper, we propose a Model Pruning-enabled Federated Split Learning framework (MP-FSL) to reduce resource consumption with a secure and efficient training scheme. Specifically, MP-FSL compresses DNN models by adaptive channel pruning and splits each compressed model into two parts that are assigned to the client and the server. Meanwhile, MP-FSL adopts a novel aggregation algorithm to aggregate the pruned heterogeneous models. We implement MP-FSL with a real FL platform to evaluate its performance. The experimental results show that MP-FSL outperforms the state-of-the-art frameworks in model accuracy by up to 1.35%, while concurrently reducing storage and computational resource consumption by up to 32.2% and 26.73%, respectively. These results demonstrate that MP-FSL is a comprehensive solution to the challenges faced by DCML, with superior performance in both reduced resource consumption and enhanced model performance.
Graph pattern mining is essential for deciphering complex networks. In the real world, graphs are dynamic and evolve over time, necessitating updates in mining patterns to reflect these changes. Traditional methods us...
详细信息
Graph pattern mining is essential for deciphering complex networks. In the real world, graphs are dynamic and evolve over time, necessitating updates in mining patterns to reflect these changes. Traditional methods use fine-grained incremental computation to avoid full re-mining after each update, which improves speed but often overlooks potential gains from examining inter-update interactions holistically, thus missing out on overall efficiency *** this paper, we introduce Cheetah, a dynamic graph mining system that processes updates in a coarse-grained manner by leveraging exploration domains. These domains exploit the community structure of real-world graphs to uncover data reuse opportunities typically missed by existing approaches. Exploration domains, which encapsulate extensive portions of the graph relevant to updates, allow multiple updates to explore the same regions efficiently. Cheetah dynamically constructs these domains using a management module that identifies and maintains areas of redundancy as the graph changes. By grouping updates within these domains and employing a neighbor-centric expansion strategy, Cheetah minimizes redundant data accesses. Our evaluation of Cheetah across five real-world datasets shows it outperforms current leading systems by an average factor of 2.63 ×.
暂无评论