ISBN: (print) 9798400703270
Software is constantly changing, requiring developers to perform several derived tasks in a timely manner, such as describing the intention of a code change or identifying defect-prone code changes. Considering that the cost of dealing with these tasks can account for a large proportion (typically around 70 percent) of the total development expenditure, automating such processes will significantly lighten the burden on developers. To achieve this, existing approaches mainly rely on training deep learning models from scratch or fine-tuning existing pre-trained models on such tasks, both of which have weaknesses. Specifically, the former uses comparatively small-scale labelled data for training, making it difficult to learn and exploit the programming-language domain knowledge hidden in the large amounts of unlabelled code in the wild; the latter struggles to fully leverage the learned knowledge of the pre-trained model, as existing pre-trained models are designed to encode a single code snippet rather than a code change (i.e., the difference between two code snippets). We propose to pre-train a model specially designed for code changes to better support developers in software maintenance. To this end, we first collect a large-scale dataset containing 1.5M+ pairs of code changes and commit messages. Based on these data, we curate five different tasks for pre-training, which equip the model with diverse domain knowledge about code changes. We fine-tune the pre-trained model, CCT5, on three widely studied tasks incurred by code changes and two tasks specific to the code review process. Results show that CCT5 outperforms both conventional deep learning approaches and existing pre-trained models on these tasks.
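To make the notion of a code change as "the difference between two code snippets" concrete, the following minimal sketch computes a unified diff with Python's standard `difflib` module. It only illustrates what a code change looks like as data. It is not the input format or preprocessing used by CCT5, and the helper name `code_change` and the sample snippets are hypothetical.

```python
import difflib
from typing import List


def code_change(old_src: str, new_src: str) -> List[str]:
    """Return the unified diff between two versions of a snippet.

    Hypothetical helper for illustration; not part of CCT5.
    """
    return list(
        difflib.unified_diff(
            old_src.splitlines(),
            new_src.splitlines(),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )


# Two versions of the same (made-up) function: the old one has a bug.
old = "def add(a, b):\n    return a - b\n"
new = "def add(a, b):\n    return a + b\n"

diff = code_change(old, new)
print("\n".join(diff))
```

A model that encodes only `old` or only `new` sees a complete function in either case; the removed and added lines in `diff` are what distinguish the change itself.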
Knowledge graph question answering is an important research direction for question answering tasks. In recent years, there has been an increasing amount of research on knowledge graph question answering, but temporal ...
Graph summarization aims to extract critical information from large graphs by creating summaries that represent the original data. Especially, real-world daily applications generate massive dynamic streaming graphs, r...
Broken access control is still one of the most common vulnerabilities. This vulnerability can cause unauthorized information disclosure, modification, or destruction of all data or performing business function outside...
With the widespread application of blockchain technology, various range proof protocols based on zero-knowledge proofs have been proposed. However, existing range proof protocols suffer from issues such as high commun...
The recruitment process usually requires a lot of resources and time. In the IT field, recruiters must also have domain knowledge in the IT field to be able to recruit well. Technological developments have given rise ...
The detection of oil well dynamic liquid level using acoustic methods requires digital signal processing techniques to reduce environmental noises. Most denoising techniques consist of filter-based approaches and spec...
This study mainly focuses on Sybil attacks with the Identity-Augmented Proof-of-Stake (IdAPoS) protocol under different network topologies, including random, scale-free, and hierarchical networks. The study finds that s...
ISBN: (print) 9783031611537; 9783031611544
This paper examines the role of the enterprise social networking (ESN) tool Slack in the daily work of software practitioners within NAV, a large-scale agile public sector organization. Based on 13 interviews with NAV developers, our case study explores how Slack is employed for knowledge sharing and daily communication across the organization. We used a newly developed framework for communication in agile teams as a theoretical lens. Through our analysis, we found that Slack use had become deeply integrated into the organizational culture and fostered alignment in three main ways: promoting communication transparency through open discussions visible to developers organization-wide, enhancing communication quality with prompt responses and constant communication, and encouraging communication discipline through structured channels and threads. This study also revealed some challenges, such as information overload and hindered focus. However, our findings suggest that if common hurdles are overcome, modern ESN tools can reshape how cross-organizational communication plays out in large-scale agile, reinforcing the agile principles of collaboration and motivated individuals.
ISBN: (print) 9789819754946; 9789819754953
Data parallelism is a powerful design paradigm for clustering tasks involving large datasets. However, existing solutions suffer from three problems: (i) partitioning methods that produce non-identically distributed partitions risk data skew; (ii) frequent communication among the data partitions may degrade performance; and (iii) unnecessary full computation results in significant computational overhead. To address these issues, we propose a density peak-based statistical parallel clustering algorithm for big data (DPSPC). Our sampling-based approach creates equal-sized data blocks with the same statistical measures of clusters, reducing data skew and eliminating inter-block communication. By sampling a subset of blocks for computation, we avoid full computation. The experimental results suggest that the NMI index of the DPSPC algorithm is generally no more than 10% lower than that of other distributed density peak clustering algorithms, with about one-tenth of the runtime and the lowest communication volume.
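The idea of equal-sized blocks that share the statistics of the full dataset can be illustrated with a plain random partition, as in the sketch below. This is an assumption-laden illustration only: shuffling followed by slicing is a generic technique swapped in here, not the DPSPC partitioning procedure, and the function name `random_equal_blocks`, the synthetic one-dimensional data, and all parameter values are hypothetical.

```python
import random
import statistics


def random_equal_blocks(points, num_blocks, seed=0):
    """Shuffle the data and cut it into equal-sized blocks.

    Hypothetical helper: after shuffling, each block is a random sample
    of the whole dataset, so blocks tend to have similar statistics.
    Leftover points (when the size is not divisible) are dropped.
    """
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    block_size = len(shuffled) // num_blocks
    return [
        shuffled[i * block_size:(i + 1) * block_size]
        for i in range(num_blocks)
    ]


# Synthetic 1-D data with two clusters (assumed values, for illustration).
gen = random.Random(42)
data = [gen.gauss(0.0, 1.0) for _ in range(5000)]
data += [gen.gauss(10.0, 1.0) for _ in range(5000)]

blocks = random_equal_blocks(data, num_blocks=10)

# Compute on a sampled subset of blocks instead of the full dataset.
sampled_blocks = random.Random(0).sample(blocks, 3)
block_means = [statistics.mean(block) for block in sampled_blocks]
print(block_means)
```

Because every block is drawn from the same shuffled pool, each sampled block contains points from both clusters in roughly equal proportion, which is why a subset of blocks can stand in for the full data without any exchange between blocks.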