Source code representation with deep learning techniques is an important research field. There have been many studies on learning sequential or structural information for code representation. However, existing sequence-based models and non-sequence models both have limitations. Although researchers have attempted to incorporate structural information into sequence-based models, they mine only part of the token-level hierarchical structure information. In this paper, we analyze how the complete hierarchical structure influences the tokens in code sequences and abstract this influence as a property of code tokens called hierarchical embedding. This hierarchical structure includes frequent combinations, which represent strong semantics and can help identify unique code structures. We further analyze these hierarchy combinations and propose a novel compression algorithm, Hierarchy BPE. Our algorithm can extract frequent hierarchy combinations and reduce the total length of hierarchical embeddings. Based on this compression algorithm, we propose the Byte-Pair Encoded Hierarchy Transformer (BPE-HiT), a simple but effective sequence model that incorporates the compressed hierarchical embeddings of source code into a Transformer model. Given that BPE-HiT significantly reduces computational overhead, we scale up the model training phase and implement a hierarchy-aware pre-training framework. We conduct extensive experiments on 10 datasets for evaluation, covering code classification, clone detection, method name prediction, and code completion tasks. Results show that our non-pre-trained BPE-HiT outperforms the state-of-the-art baselines by at least 0.94% in average accuracy on code classification tasks across three different programming languages. On the method name prediction task, BPE-HiT outperforms baselines by at least 2.04 and 1.34 in F1-score on two real-world datasets. Besides, our pre-trained BPE-HiT outperforms other pre-trained baseline models with the same number of parameters.
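The abstract does not spell out how Hierarchy BPE operates, so the following is a minimal sketch of the BPE-style merging it alludes to, assuming each token's hierarchical embedding is the path of AST node types from the root down to that token. The path representation, the merge count, and the function names (`count_pairs`, `merge_pair`, `hierarchy_bpe`) are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

# A hierarchical embedding is modeled here as the path of AST node types
# from the root to a token, e.g. ("Module", "FunctionDef", "If", "Call").
# Hierarchy BPE-style compression repeatedly merges the most frequent
# adjacent pair of hierarchy symbols, shortening every path that contains it.

def count_pairs(paths):
    """Count adjacent symbol pairs across all hierarchy paths."""
    pairs = Counter()
    for path in paths:
        for a, b in zip(path, path[1:]):
            pairs[(a, b)] += 1
    return pairs

def merge_pair(path, pair):
    """Replace every occurrence of `pair` in one path with a merged symbol."""
    merged, i = [], 0
    while i < len(path):
        if i + 1 < len(path) and (path[i], path[i + 1]) == pair:
            merged.append(path[i] + "+" + path[i + 1])  # combined hierarchy symbol
            i += 2
        else:
            merged.append(path[i])
            i += 1
    return tuple(merged)

def hierarchy_bpe(paths, num_merges=10):
    """Learn `num_merges` frequent hierarchy combinations and compress the paths."""
    vocab = list(paths)
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(vocab)
        if not pairs:
            break
        best = pairs.most_common(1)[0][0]
        merges.append(best)
        vocab = [merge_pair(p, best) for p in vocab]
    return merges, vocab

if __name__ == "__main__":
    paths = [
        ("Module", "FunctionDef", "If", "Call"),
        ("Module", "FunctionDef", "If", "Return"),
        ("Module", "FunctionDef", "For", "Call"),
    ]
    merges, compressed = hierarchy_bpe(paths, num_merges=2)
    print(merges)      # most frequent hierarchy combinations
    print(compressed)  # shorter hierarchical embeddings
```

As in standard byte-pair encoding, each step collapses the most frequent adjacent pair into a single symbol, so recurring hierarchy combinations become single vocabulary entries and the embedded paths shrink.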
In this paper, a new lightweight and efficient time-series backbone, the temporal attention splitting network (TAS), is built, and good estimation of cuff-less blood pressure and non-invasive blood glucose is achi...
Currently, more and more video data and terminal devices accessing video resources are available to users. Video platforms such as TikTok and YouTube are gradually rising, and the user scale and video resources are in...
Asymmetric group key agreement allows a group of users to negotiate a public encryption key that corresponds to several decryption keys, and each decryption key can only be computed by one group member. This novel not...
This paper proposes to use K-means and Apriori to predict device actions based on time in a smart home system. In existing methods, the system provides services to humans when conditions are met, such as high tempe...
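As a rough illustration of the pipeline this abstract describes, the sketch below clusters event times with scikit-learn's KMeans and then mines frequent (time-slot, action) itemsets. The log format, the choice of three clusters, the support threshold, and the simplified itemset counter standing in for a full Apriori implementation are all assumptions for illustration.

```python
import numpy as np
from collections import Counter
from itertools import combinations
from sklearn.cluster import KMeans

# Hypothetical usage log: (hour of day, device action) pairs.
log = [
    (6.5, "light_on"), (6.6, "coffee_on"), (7.0, "light_on"),
    (18.2, "ac_on"), (18.5, "tv_on"), (19.0, "ac_on"),
    (22.8, "light_off"), (23.0, "tv_off"), (23.1, "light_off"),
]

# Step 1: cluster timestamps so each event gets a coarse "time slot" label.
hours = np.array([[h] for h, _ in log])
slots = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(hours)

# Step 2: build {time slot, action} transactions and count frequent itemsets
# (a minimal stand-in for Apriori; min support = 2 transactions).
transactions = [frozenset({f"slot_{s}", a}) for s, (_, a) in zip(slots, log)]

def frequent_itemsets(transactions, min_support=2, max_size=2):
    freq = {}
    for size in range(1, max_size + 1):
        counts = Counter()
        for t in transactions:
            for combo in combinations(sorted(t), size):
                counts[combo] += 1
        kept = {c: n for c, n in counts.items() if n >= min_support}
        if not kept:
            break
        freq.update(kept)
    return freq

# Frequent (slot, action) pairs suggest which action to trigger in each slot.
for itemset, support in frequent_itemsets(transactions).items():
    print(itemset, support)
```

A deployed system would turn these frequent pairs into rules of the form "in this time slot, trigger this device action", which matches the time-based prediction the abstract describes.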
The height of the infiltration line of a tailings dam is a very important parameter for the safety of the tailings reservoir. The infiltration line is the intersection line between the free water surface and the cross section of th...
The rise of Internet of Things (IoT) technology has enhanced several aspects of our lives. However, the dynamic connection between IoT devices, the resource restrictions, and their heterogeneity elevate the risk of ne...
Gait Emotion Recognition (GER) is an emerging task within Human Emotion Recognition. Skeleton-based GER requires discriminative spatial and temporal features. However, current methods primarily focus on capturing spat...
ISBN: (Print) 9798331314385
Graph Neural Networks (GNNs) have become essential in interpreting relational data across various domains, yet they often struggle to generalize to unseen graph data that differs markedly from training instances. In this paper, we introduce a novel framework called General Retrieval-Augmented Graph Learning (RAGRAPH), which brings external graph data into the general graph foundation model to improve model generalization on unseen scenarios. At the top of our framework is a toy graph vector library that we establish, which captures key attributes such as features and task-specific label information. During inference, RAGRAPH adeptly retrieves similar toy graphs based on key similarities in downstream tasks, integrating the retrieved data to enrich the learning context via the message-passing prompting mechanism. Our extensive experimental evaluations demonstrate that RAGRAPH significantly outperforms state-of-the-art graph learning methods in multiple tasks such as node classification, link prediction, and graph classification across both dynamic and static datasets. Furthermore, extensive testing confirms that RAGRAPH consistently maintains high performance without the need for task-specific fine-tuning, highlighting its adaptability, robustness, and broad applicability.
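To make the retrieval step concrete, here is a minimal sketch of a toy-graph vector library with similarity-based retrieval. The `ToyGraphLibrary` class, mean-pooled keys, and cosine-similarity ranking are assumptions for illustration only and do not reproduce RAGRAPH's actual key construction or its message-passing prompting mechanism.

```python
import numpy as np

# Hypothetical toy-graph library: each entry stores a pooled feature vector
# as its retrieval key plus task-specific label information as its payload,
# mirroring the "key attributes" mentioned in the abstract.

class ToyGraphLibrary:
    def __init__(self):
        self.keys, self.payloads = [], []

    def add(self, node_features, labels):
        # Key = mean-pooled node features (an assumed choice, not RAGRAPH's).
        self.keys.append(node_features.mean(axis=0))
        self.payloads.append({"features": node_features, "labels": labels})

    def retrieve(self, query_features, top_k=2):
        """Return the top-k stored toy graphs most similar to the query graph."""
        q = query_features.mean(axis=0)
        keys = np.stack(self.keys)
        sims = keys @ q / (np.linalg.norm(keys, axis=1) * np.linalg.norm(q) + 1e-8)
        order = np.argsort(-sims)[:top_k]
        return [self.payloads[i] for i in order], sims[order]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lib = ToyGraphLibrary()
    for _ in range(5):
        lib.add(rng.normal(size=(8, 16)), labels=rng.integers(0, 3, size=8))
    query = rng.normal(size=(6, 16))
    retrieved, scores = lib.retrieve(query, top_k=2)
    # The retrieved features/labels would then be injected into the GNN's
    # learning context, e.g. as prompts combined with message passing.
    print(scores, [p["labels"].shape for p in retrieved])
```

In the full framework, the retrieved toy graphs would be fused into the downstream task through prompting during message passing rather than simply returned as raw arrays.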
A major concern of the existing transportation system is air pollution caused by the burning of fossil fuels. This problem can be minimized using battery electric vehicles (EVs) in which a battery is used as the main ...