Code review is a critical process in software development, contributing to the overall quality of the product by identifying errors early. A key aspect of this process is the selection of appropriate reviewers to scrutinize changes made to source code. However, in large-scale open-source projects, selecting the most suitable reviewers for a specific change can be challenging. To address this, we introduce Code Context Based Reviewer Recommendation (CCB-RR), a model that leverages information from changesets to recommend the most suitable reviewers. The model takes into consideration the paths of modified files and the context derived from the changesets, including their titles and descriptions. Additionally, CCB-RR employs KeyBERT to extract the most relevant keywords and compare the semantic similarity across changesets. The model integrates the paths of modified files, keyword information, and the context of code changes to form a comprehensive picture of the changeset. We conducted extensive experiments on four open-source projects, demonstrating the effectiveness of CCB-RR. The model achieved a Top-1 accuracy of 60%, 55%, 51%, and 45% on the Android, OpenStack, QT, and LibreOffice projects, respectively. For Mean Reciprocal Rank (MRR), CCB-RR achieved 71%, 62%, 52%, and 68% on the same projects, respectively, highlighting its potential for practical application in code reviewer recommendation.
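The two metrics reported above can be illustrated with a minimal sketch. It assumes that each changeset comes with a ranked list of recommended reviewers and a set of reviewers who actually reviewed it; the function names and the toy data are hypothetical and are not taken from the CCB-RR paper.

```python
from typing import Sequence, Set


def top_k_accuracy(rankings: Sequence[Sequence[str]],
                   actual: Sequence[Set[str]], k: int = 1) -> float:
    """Fraction of changesets with at least one true reviewer in the top k."""
    hits = sum(
        1 for ranked, truth in zip(rankings, actual)
        if any(reviewer in truth for reviewer in ranked[:k])
    )
    return hits / len(rankings)


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]],
                         actual: Sequence[Set[str]]) -> float:
    """Average of 1/rank of the first true reviewer (0 if none is ranked)."""
    total = 0.0
    for ranked, truth in zip(rankings, actual):
        for position, reviewer in enumerate(ranked, start=1):
            if reviewer in truth:
                total += 1.0 / position
                break
    return total / len(rankings)


# Toy example: three changesets, each actually reviewed by "a".
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
actual = [{"a"}, {"a"}, {"a"}]
print(top_k_accuracy(rankings, actual, k=1))
print(mean_reciprocal_rank(rankings, actual))
```

In the toy data the true reviewer appears at ranks 1, 2, and 3, so only one of three changesets counts as a Top-1 hit, while MRR still gives partial credit to the other two.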
Partial-label learning (PLL) is a typical weakly supervised learning problem in which each training instance is annotated with a set of candidate labels. Self-training PLL models achieve state-of-the-art performance but suffer from error accumulation caused by mistakenly disambiguated instances. Although co-training can alleviate this issue by training two networks simultaneously and allowing them to interact with each other, most existing co-training methods are symmetric, i.e., they train two structurally identical networks on the same task, so the networks share similar limitations and cannot sufficiently correct each other. Therefore, in this paper, we propose an asymmetric dual-task co-training PLL model called AsyCo, which forces its two networks, i.e., a disambiguation network and an auxiliary network, to learn from different views explicitly by optimizing distinct tasks. Specifically, the disambiguation network is trained with a self-training PLL task to learn label confidence, while the auxiliary network is trained in a supervised learning paradigm to learn from the noisy pairwise similarity labels that are constructed according to the learned label confidence. Finally, the error accumulation problem is mitigated via information distillation and confidence refinement. Extensive experiments on both uniform and instance-dependent partially labeled datasets demonstrate the effectiveness of AsyCo.
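One plausible way to construct pairwise similarity labels from label confidence is sketched below. The abstract does not specify the construction, so the rule used here (two instances are "similar" when their most confident labels agree) is an assumption, and the function name is hypothetical.

```python
from typing import List, Sequence


def pairwise_similarity_labels(
        confidence: Sequence[Sequence[float]]) -> List[List[int]]:
    """Build an n x n 0/1 matrix from per-instance label confidence.

    Assumption: entry (i, j) is 1 when instances i and j share the same
    most-confident label, and 0 otherwise. The labels are noisy because
    the confidence itself may be wrong.
    """
    predicted = [max(range(len(row)), key=row.__getitem__)
                 for row in confidence]
    n = len(predicted)
    return [[1 if predicted[i] == predicted[j] else 0 for j in range(n)]
            for i in range(n)]


# Toy confidence over three classes for three instances.
confidence = [
    [0.7, 0.3, 0.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.2, 0.8],
]
similarity = pairwise_similarity_labels(confidence)
print(similarity)
```

The first two instances both favor class 0 and are marked similar; the third favors class 2 and is similar only to itself.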
End-to-end training has emerged as a prominent trend in speech recognition, with Conformer models effectively integrating Transformer and CNN architectures. However, their complexity and high computational cost pose d...
To predict the lithium-ion (Li-ion) battery degradation trajectory in the early phase, arranging the maintenance of battery energy storage systems is of great ***, under different operation conditions, Li-ion batteries present distinct degradation patterns, and it is challenging to capture negligible capacity fade in early *** the data-driven method showing promising performance, insufficient data is still a big issue since the ageing experiments on the batteries are too slow and *** this study, we proposed twin autoencoders integrated into a two-stage method to predict the early cycles' degradation *** two-stage method can properly predict the degradation from coarse to *** twin autoencoders serve as a feature extractor and a synthetic data generator, ***, a learning procedure based on the long short-term memory (LSTM) network is designed to hybridize the learning process between the real and synthetic *** performance of the proposed method is verified on three datasets, and the experimental results show that the proposed method can achieve accurate predictions compared to its competitors.
Traffic encryption techniques help cyberattackers hide their presence and activities. Traffic classification is an important method for preventing network threats. However, due to the tremendous traffic volume and limited computing resources, most existing traffic classification techniques are inapplicable to high-speed network environments. In this paper, we propose a High-speed Encrypted Traffic Classification (HETC) method containing two stages. First, to efficiently detect whether traffic is encrypted, HETC focuses on randomly sampled short flows and extracts aggregation entropies with chi-square test features to measure the different patterns of byte composition and distribution between encrypted and unencrypted flows. Second, HETC adds binary features to the previous features and performs fine-grained traffic classification by combining these payload features with a Random Forest model. The experimental results show that HETC can achieve a 94% F-measure in detecting encrypted flows and an 85%–93% F-measure in classifying fine-grained flows for a 1-KB flow-length dataset, outperforming the state-of-the-art comparison methods. Meanwhile, HETC does not need to wait for the end of the flow and can extract mass computing features. The average time for HETC to process each flow is only 2 or 16 ms, which is lower than the flow duration in most cases, making it a good candidate for high-speed traffic classification.
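The intuition behind entropy and chi-square payload features can be shown with a small sketch. It computes the plain Shannon entropy of a payload's byte values and a chi-square statistic against a uniform byte distribution; these are generic stand-ins and not the exact "aggregation entropies" defined by HETC, and the function names are hypothetical.

```python
import math
from collections import Counter


def byte_entropy(payload: bytes) -> float:
    """Shannon entropy in bits per byte (0 for empty input, at most 8)."""
    if not payload:
        return 0.0
    total = len(payload)
    counts = Counter(payload)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def chi_square_uniform(payload: bytes) -> float:
    """Chi-square statistic of byte counts against a uniform distribution."""
    total = len(payload)
    if total == 0:
        return 0.0
    expected = total / 256
    counts = Counter(payload)
    return sum((counts.get(value, 0) - expected) ** 2 / expected
               for value in range(256))


# Encrypted payloads tend to look uniform; plaintext is usually skewed.
uniform_like = bytes(range(256)) * 4
skewed = b"A" * 1024
print(byte_entropy(uniform_like), chi_square_uniform(uniform_like))
print(byte_entropy(skewed), chi_square_uniform(skewed))
```

A payload whose bytes are evenly spread yields high entropy and a low chi-square statistic, while a repetitive payload yields the opposite, which is the kind of contrast a classifier such as a Random Forest can exploit.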
The current urban intelligent transportation is in a rapid development stage, and coherence control of vehicle formations has important implications in urban intelligent transportation research. This article focuses o...
As urban populations grow, smart home technology has become a key enabler for enhancing energy efficiency, comfort, and convenience in residential environments. However, existing smart home implementations often strug...
Temporal knowledge graph (TKG) reasoning has seen widespread use for modeling real-world events, particularly in extrapolation settings. Nevertheless, most previous studies use embedding models, which require both entity and relation embeddings to make predictions and ignore the semantic correlations among different entities and relations within the same timestamp. This can lead to random and nonsensical predictions when unseen entities or relations occur. Furthermore, many existing models exhibit limitations in handling highly correlated historical facts with extensive temporal depth. They often either overlook such facts or overly accentuate the relationships between recurring past occurrences and their current counterparts. Due to the dynamic nature of TKG, effectively capturing the evolving semantics between different timestamps can be *** To address these shortcomings, we propose the recurrent semantic evidence-aware graph neural network (RE-SEGNN), a novel graph neural network that can learn the semantics of entities and relations simultaneously. For the former challenge, our model can predict a possible answer to missing quadruples based on semantics when facing unseen entities or relations. For the latter problem, based on an obvious established force, both the recency and frequency of semantic history tend to confer a higher reference value for the current. We use the Hawkes process to compute the semantic trend, which allows the semantics of recent facts to gain more attention than those of distant facts. Experimental results show that RE-SEGNN outperforms all SOTA models in entity prediction on 6 widely used datasets and in relation prediction on 5 datasets. Furthermore, the case study shows how our model can deal with unseen entities and relations.
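How a Hawkes process rewards both recency and frequency can be seen in a minimal sketch of its intensity with an exponential kernel. The kernel choice, the parameter values, and the function name are assumptions for illustration; the abstract does not state how RE-SEGNN parameterizes the process.

```python
import math
from typing import Iterable


def hawkes_intensity(event_times: Iterable[float], now: float,
                     base: float = 0.0, alpha: float = 1.0,
                     decay: float = 0.5) -> float:
    """Intensity base + sum(alpha * exp(-decay * (now - t))) over past events.

    Each past event adds a contribution that fades exponentially with its
    age, so recent events and frequent events both raise the intensity.
    """
    return base + sum(alpha * math.exp(-decay * (now - t))
                      for t in event_times if t < now)


# A fact seen one step ago outweighs the same fact seen nine steps ago,
# and a fact seen twice recently outweighs one seen once.
recent = hawkes_intensity([9.0], now=10.0)
distant = hawkes_intensity([1.0], now=10.0)
repeated = hawkes_intensity([8.0, 9.0], now=10.0)
print(recent, distant, repeated)
```

The `decay` parameter controls how quickly old facts lose influence: a larger value focuses attention on the most recent timestamps.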
The manual process of evaluating answer scripts is strenuous. Evaluators use the answer key to assess the answers in the answer scripts. Advancements in technology and the introduction of new learning paradigms need a...
There is a growing interest in sustainable ecosystem development, which includes methods such as scientific modeling, environmental assessment, and development forecasting and planning. However, due to insufficient su...