A pull request (PR) is an event in Git where a contributor asks project maintainers to review code they want to merge into a project. The PR mechanism greatly improves the efficiency of distributed software development in the open-source community. Nevertheless, the massive number of PRs in an open-source software (OSS) project increases the workload of developers. To reduce this burden, many previous studies have investigated factors that affect the chance of a PR being accepted and built prediction models based on these factors. However, most prediction models rely on data that only becomes available some time after a PR is submitted (e.g., comments on the PR), which limits their usefulness in practice, because integrators must still spend considerable effort inspecting PRs in the meantime. In this study, we propose an approach named E-PRedictor (earlier PR predictor) to predict whether a PR will be merged at the moment it is created. E-PRedictor combines three dimensions of manual statistic features (i.e., contributor profile, specific pull request, and project profile) with deep semantic features generated by BERT models from the description and code changes of PRs. To evaluate the performance of E-PRedictor, we collect 475,192 PRs from 49 popular open-source projects on GitHub. The experimental results show that our proposed approach can effectively predict whether a PR will be merged or not. E-PRedictor significantly outperforms baseline models (e.g., Random Forest and VDCNN) built on manual features. In terms of F1@Merge, F1@Reject, and AUC (area under the receiver operating characteristic curve), E-PRedictor achieves 90.1%, 60.5%, and 85.4%, respectively.
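The feature-combination idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature names (`prev_merged_ratio`, `changed_files`, etc.) are hypothetical examples of the three manual dimensions, and the BERT sentence embedding is replaced by a stub function so the sketch stays self-contained.

```python
# Illustrative sketch of an E-PRedictor-style feature pipeline (hypothetical
# names; not the paper's code): hand-crafted features from three dimensions
# are concatenated with a semantic embedding of the PR text.

def manual_features(contributor, pr, project):
    """Flatten the three hand-crafted feature dimensions into one vector."""
    return [
        contributor["prev_merged_ratio"],        # contributor profile
        pr["changed_files"], pr["added_lines"],  # specific pull request
        project["open_pr_count"],                # project profile
    ]

def semantic_features(text, dim=4):
    """Stand-in for a BERT embedding of the PR description/diff.
    A real system would call a fine-tuned BERT model here."""
    score = (sum(ord(c) for c in text) % 97) / 97.0
    return [score] * dim

def pr_feature_vector(contributor, pr, project):
    """Concatenate manual and semantic features for a downstream classifier."""
    return manual_features(contributor, pr, project) + \
        semantic_features(pr["description"])
```

The resulting vector would then be fed to a binary classifier that predicts merge vs. reject at PR creation time.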
LiDAR sensors measure the environment by emitting lasers and, when combined with deep neural networks (DNNs), can effectively identify surrounding obstacles such as vehicles and pedestrians. Given its crucial role in ...
The scarcity of training data restricts a neural network from capturing schema diversity and intricacies, hindering schema-matching models' generalization capabilities. In this paper, we propose ISResMat, a framew...
With the growing popularity of LLMs among the general public users, privacy-preserving and adversarial robustness have become two pressing demands for LLM-based services, which have largely been pursued separately but...
Modern advanced large language model (LLM) applications often prepend long contexts before user queries to improve model output quality. These contexts frequently repeat, either partially or fully, across multiple que...
Distributed training of graph neural networks (GNNs) has become a crucial technique for processing large graphs. Prevalent GNN frameworks are model-centric, necessitating the transfer of massive graph vertex features ...
Smart contracts are widely used on the blockchain to implement complex transactions, such as decentralized applications. The vulnerability detection of large-scale smart contracts is critical, as attacks on smart contracts often cause huge economic losses. Since it is difficult to repair and update smart contracts, it is necessary to find vulnerabilities before they are exploited. However, code analysis, which requires traversing paths, and learning methods, which require many features to be trained, are too time-consuming to detect large-scale on-chain contracts. Learning-based methods obtain detection models from a feature space, in contrast to code analysis methods such as symbolic execution. However, the existing features lack interpretability of the detection results and the training model; even worse, the large-scale feature space also affects the efficiency of detection. This paper focuses on improving detection efficiency by reducing the dimension of the features, combined with expert knowledge. In this paper, a feature extraction model named Block-gram is proposed to form low-dimensional knowledge-based features from bytecode. First, the metadata is separated and the runtime code is converted into a sequence of opcodes, which are divided into segments based on certain instructions (jumps, etc.). Then, scalable Block-gram features, including 4-dimensional block features and 8-dimensional attribute features, are mined for the learning-based model. Finally, feature contributions are calculated from SHAP values to measure the relationship between our features and the results of the detection model. In addition, six types of vulnerability labels are made on a dataset containing 33,885 contracts, and these knowledge-based features are evaluated using seven state-of-the-art learning algorithms, which show that the average detection latency speeds up by 25× to 650× compared with features extracted by N-gram, while also enhancing the interpretability of the detection model.
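The segmentation step described in this abstract (splitting the runtime opcode sequence into blocks at jump-like instructions) can be sketched as follows. This is an illustrative sketch, not the paper's implementation; the exact set of terminator opcodes used by Block-gram is an assumption here.

```python
# Illustrative sketch of opcode segmentation at control-flow instructions
# (assumed terminator set; the paper's exact rules may differ): the runtime
# opcode sequence is cut into segments ("blocks"), from which block-level
# and attribute-level features could then be mined.

BLOCK_TERMINATORS = {"JUMP", "JUMPI", "STOP", "RETURN", "REVERT"}

def segment_opcodes(opcodes):
    """Split an opcode sequence into segments ending at terminator opcodes."""
    blocks, current = [], []
    for op in opcodes:
        current.append(op)
        if op in BLOCK_TERMINATORS:
            blocks.append(current)
            current = []
    if current:  # trailing opcodes with no explicit terminator
        blocks.append(current)
    return blocks
```

Each resulting segment would then be summarized into the low-dimensional block and attribute features that the learning-based detectors consume.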
Deep learning-based methods significantly advance the exploration of associations among triple-wise biological entities (e.g., drug-target protein-adverse reaction), thereby facilitating drug discovery and safeguardin...
Diffusion-based generative models have demonstrated their powerful performance across various tasks, but this comes at a cost of the slow sampling speed. To achieve both efficient and high-quality synthesis, various d...
Offline reinforcement learning endeavors to leverage offline datasets to craft effective agent policy without online interaction, which imposes proper conservative constraints with the support of behavior policies to ...