Search Results - Inner Mongolia University Library

32nd IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2025

Authors: Bi, Zhangqian Wan, Yao Chu, Zhaoyang Hu, Yufei Zhang, Junyi Zhang, Hongyu Xu, Guandong Jin, Hai Services Computing Technology and System Lab Cluster and Grid Computing Lab National Engineering Research Center for Big Data Technology and System Wuhan China School of Computer Science and Technology Huazhong University of Science and Technology Wuhan China School of Big Data and Software Engineering Chongqing University Chongqing China School of Computer Science University of Technology Sydney Sydney Australia

ISBN: (print) 9798331535100

Pre-training a language model and then fine-tuning it has been shown to be an efficient and effective technique for a wide range of code intelligence tasks, such as code generation, code summarization, and vulnerability detection. However, pre-training language models on a large-scale code corpus is computationally expensive. Fortunately, many off-the-shelf Pre-trained Code Models (PCMs), such as CodeBERT, CodeT5, CodeGen, and Code Llama, have been released publicly. These models acquire general code understanding and generation capability during pre-training, which enhances their performance on downstream code intelligence tasks. With an increasing number of these public pre-trained models, selecting the most suitable one to reuse for a specific task is essential. In this paper, we systematically investigate the reusability of PCMs. We first explore three intuitive model selection methods that select by size, training data, or brute-force fine-tuning. Experimental results show that these straightforward techniques either perform poorly or incur high costs. Motivated by these findings, we explore learning-based model selection strategies that utilize pre-trained models without altering their parameters. Specifically, we train proxy models to gauge the performance of pre-trained models, and measure the distribution deviation between a model's latent features and the task's labels, using their closeness as an indicator of model transferability. We conduct experiments on 100 widely-used open-source PCMs for code intelligence tasks, with sizes ranging from 42.5 million to 3 billion parameters. The results demonstrate that learning-based selection methods reduce selection time to 100 seconds, compared to 2,700 hours with brute-force fine-tuning, with less than 6% performance degradation across related tasks. © 2025 IEEE.

Keywords: machine learning model reuse Model selection pre-trained code models

Machine Learning is All You Need: A Simple Token-Based Approach for Effective Code Clone Detection

44th ACM/IEEE International Conference on Software Engineering, ICSE 2024

Authors: Feng, Siyue Suo, Wenqi Wu, Yueming Zou, Deqing Liu, Yang Jin, Hai Huazhong University of Science and Technology China Nanyang Technological University Singapore Hubei Engineering Research Center on Big Data Security Cluster and Grid Computing Lab National Engineering Research Center for Big Data Technology and System Services Computing Technology and System Lab Hubei Key Laboratory of Distributed System Security China Jinyinhu Laboratory Wuhan 430074 China School of Cyber Science and Engineering HUST Wuhan 430074 China School of Computer Science and Technology HUST Wuhan 430074 China

ISBN: (print) 9798400702174

As software engineering advances and the demand for code rises, the prevalence of code clones has increased. This phenomenon poses risks like vulnerability propagation, underscoring the growing importance of code clone detection techniques. While numerous code clone detection methods have been proposed, they often fall short in real-world code environments. They either struggle to identify code clones effectively or demand substantial time and computational resources to handle complex clones. This paper introduces a code clone detection method, namely Toma, that uses tokens and machine learning. Specifically, we extract token type sequences and employ six similarity calculation methods to generate feature vectors. These vectors are then input into a trained machine learning model for classification. To evaluate the effectiveness and scalability of Toma, we conduct experiments on the widely used BigCloneBench dataset. Results show that our tool outperforms token-based code clone detectors and most tree-based clone detectors, demonstrating high effectiveness and significant time savings. © 2024 ACM.

Keywords: Machine learning
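
The pipeline described in the abstract above (token type sequences, then similarity scores, then a feature vector) can be sketched roughly as follows. This is only an illustration under stated assumptions: the tokenizer, the keyword set, and the three similarity measures (Jaccard, Dice, and a `difflib` sequence ratio) are hypothetical stand-ins chosen for brevity, not the six measures or the lexer used by Toma, and the trained classifier is left out.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, simplified token typing; a real lexer would be language-aware.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "float", "void"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S")

def token_types(code):
    """Map source text to a sequence of coarse token types."""
    types = []
    for tok in TOKEN_RE.findall(code):
        if tok in KEYWORDS:
            types.append("KW:" + tok)
        elif re.match(r"[A-Za-z_]", tok):
            types.append("ID")       # identifiers collapse to one type
        elif tok[0].isdigit():
            types.append("NUM")      # numeric literals collapse to one type
        else:
            types.append("OP:" + tok)
    return types

def jaccard(a, b):
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def dice(a, b):
    sa, sb = set(a), set(b)
    return 2 * len(sa & sb) / (len(sa) + len(sb)) if sa or sb else 1.0

def sequence_ratio(a, b):
    # Order-sensitive similarity over the two type sequences.
    return SequenceMatcher(None, a, b).ratio()

def feature_vector(code_a, code_b):
    """One similarity score per measure; this vector would feed a classifier."""
    ta, tb = token_types(code_a), token_types(code_b)
    return [jaccard(ta, tb), dice(ta, tb), sequence_ratio(ta, tb)]

snippet_a = "int add(int a, int b) { return a + b; }"
snippet_b = "int sum(int x, int y) { return x + y; }"   # renamed copy of snippet_a
snippet_c = "while (n > 0) { n = n - 1; }"

clone_features = feature_vector(snippet_a, snippet_b)
other_features = feature_vector(snippet_a, snippet_c)
```

Because identifiers are collapsed into a single type, the renamed copy yields the same type sequence as the original, while the unrelated loop scores lower on every measure.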

A Hybrid Model Combined with SVM and CNN for Community Content Classification

23rd IEEE International Conference on High Performance Computing and Communications, 7th IEEE International Conference on Data Science and Systems, 19th IEEE International Conference on Smart City and 7th IEEE International Conference on Dependability in Sensor, Cloud and Big Data Systems and Applications, HPCC-DSS-SmartCity-DependSys 2021

Authors: Ye, Yukui Xie, Xia Jin, Hai Wang, Duoqiang National Engineering Research Center for Big Data Technology and System School of Computer Science and Technology Huazhong University of Science and Technology Services Computing Technology and System Lab Cluster and Grid Computing Lab Wuhan China School of Computer Science and Technology Hainan University Haikou China

ISBN: (print) 9781665494571

Community websites bring many conveniences to people, and the classification of community content plays an important role in website management and information searching. As the carrier of community content, posts are difficult to classify manually. Based on the characteristics of community content, a hybrid machine learning classification model is proposed. The model is built in three steps. First, because posts have few features, a weighted word vector is proposed to enrich them. Second, since a single SVM kernel function cannot match all data distributions, a mixed kernel function is employed to improve the model. Finally, to fully utilize both the powerful feature extraction ability of the Convolutional Neural Network and the classification ability of the SVM, a hybrid model is designed and implemented by replacing the softmax layer with an SVM classifier. The experimental results indicate that, compared with a traditional Convolutional Neural Network, the proposed hybrid model has better performance and stability, with classification accuracy generally improved by 0.9% to 1.4%. © 2021 IEEE.

Keywords: Support vector machines; Smart cities; Support vector machine classification; Machine learning; Feature extraction; Stability analysis; Data models
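
The mixed kernel idea in the abstract above can be illustrated with a small sketch. The abstract does not say which kernels are combined or how they are weighted, so the choice below (a convex combination of an RBF kernel and a polynomial kernel, with hypothetical `gamma`, `degree`, `coef0`, and `weight` values) is an assumption made purely for illustration. A convex combination of valid kernels is itself a valid kernel, which is what makes this kind of mixing safe for an SVM.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Local kernel: decays with squared Euclidean distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def poly_kernel(x, y, degree=2, coef0=1.0):
    """Global kernel: grows with the dot product."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree

def mixed_kernel(x, y, weight=0.7):
    """Convex combination of the two kernels (weight is an assumed value)."""
    return weight * rbf_kernel(x, y) + (1.0 - weight) * poly_kernel(x, y)

def gram_matrix(points, kernel):
    """Pairwise kernel values, the form a precomputed-kernel SVM consumes."""
    return [[kernel(p, q) for q in points] for p in points]

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
gram = gram_matrix(points, mixed_kernel)
```

The resulting Gram matrix is symmetric and could be handed to any SVM implementation that accepts a precomputed kernel.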

Cooperative Relationship Prediction between Scholars in Heterogeneous Academic Network

Authors: Shi, Jia Jin, Hai Xie, Xia National Engineering Research Center for Big Data Technology and System School of Computer Science and Technology Huazhong University of Science and Technology Services Computing Technology and System Lab Cluster and Grid Computing Lab Wuhan China School of Computer Science and Technology Hainan University Haikou China

ISBN: (print) 9781665494571

Real academic networks are heterogeneous, so link prediction that relies only on homogeneous network methods may lose some of the information in the network. To make good use of the rich semantic information in a heterogeneous network and better predict the cooperative relationship between scholars, this paper proposes a cooperative relationship prediction (CoRP) model. The CoRP model consists of three steps. First, a structural feature extraction module based on meta paths is designed, using a similarity metric function to measure the meta path similarity between scholars. Second, because existing methods usually ignore the attribute information on nodes, three attribute feature representation functions are constructed to fully utilize the attribute information of scholars in the academic network. Finally, a machine learning classifier that combines structural similarity and attribute similarity is used to learn each attribute feature and complete the task of cooperative relationship prediction. The experimental results show that, compared with the benchmark method, the proposed method has better performance and interpretability, with an AUC improvement of 3% to 10% on the AMiner academic data set. © 2021 IEEE.

Keywords: Measurement; Smart cities; Semantics; Machine learning; Benchmark testing; Predictive models; Feature extraction
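
To make the notion of meta path similarity in the abstract above concrete, here is a minimal sketch using PathSim, a well-known meta-path similarity measure. The abstract does not name the similarity function CoRP uses, so PathSim is an assumed stand-in, and the author-venue counts are toy data invented for the example. The meta path considered is Author-Paper-Venue-Paper-Author.

```python
def commuting_matrix(author_venue):
    """Count meta-path instances between every pair of authors.

    author_venue[a][v] is the number of papers author a published at venue v,
    so the product with its transpose counts Author-Paper-Venue-Paper-Author paths.
    """
    n = len(author_venue)
    return [[sum(p * q for p, q in zip(author_venue[i], author_venue[j]))
             for j in range(n)] for i in range(n)]

def pathsim(m, x, y):
    """PathSim: path count between x and y, normalized by their self-path counts."""
    denom = m[x][x] + m[y][y]
    return 2 * m[x][y] / denom if denom else 0.0

# Toy data (hypothetical): rows are authors, columns are venues.
author_venue = [
    [2, 1, 0],  # author 0
    [1, 1, 0],  # author 1 publishes at the same venues as author 0
    [0, 0, 3],  # author 2 publishes elsewhere
]
m = commuting_matrix(author_venue)
sim_01 = pathsim(m, 0, 1)
sim_02 = pathsim(m, 0, 2)
```

Scores like these could serve as structural features for a classifier alongside attribute-based features, which is the overall shape the abstract describes.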

AsT: An Asymmetric-Sensitive Transformer for Osteonecrosis of the Femoral Head Detection

37th AAAI Conference on Artificial Intelligence, AAAI 2023

Authors: Chen, Haoyang Liu, Shuai Lu, Feng Li, Wei Sheng, Bin Li, Mi Jin, Hai Zomaya, Albert Y. National Engineering Research Center for Big Data Technology and System Services Computing Technology and System Lab Cluster and Grid Computing Lab School of Computer Science and Technology Huazhong University of Science and Technology China Centre for Distributed and High Performance Computing School of Computer Science The University of Sydney Australia Department of Computer Science and Engineering Shanghai Jiao Tong University China Tongji Hospital Tongji Medical College Huazhong University of Science and Technology China

ISBN: (print) 9781577358800

Early diagnosis of osteonecrosis of the femoral head (ONFH) can inhibit its progression and improve femoral head preservation. The radiographic difference between early ONFH and healthy femoral heads is not apparent to the naked eye. It is also hard to produce a large dataset to train a classification model. In this paper, we propose the Asymmetric-Sensitive Transformer (AsT) to capture the uneven development of the bilateral femoral heads and enable robust ONFH detection. Our ONFH detection applies the self-attention mechanism to femoral head regions, while an attention-shared transformer confers sensitivity to the uneven development. Real-world experimental studies show that AsT achieves the best performance, with an AUC of 0.9313 in the early diagnosis of ONFH, and can reliably identify misdiagnosed cases. Copyright © 2023, Association for the Advancement of Artificial Intelligence (***). All rights reserved.

Keywords: Large dataset

HYPERBOLIC HYPERGRAPH NEURAL NETWORKS FOR MULTI-RELATIONAL KNOWLEDGE HYPERGRAPH REPRESENTATION

arXiv, 2024

Authors: Li, Mengfan Shi, Xuanhua Qiao, Chenqi Zhang, Teng Jin, Hai National Engineering Research Center for Big Data Technology and System Services Computing Technology and System Lab Cluster and Grid Computing Lab School of Computer Science and Technology Huazhong University of Science and Technology Wuhan 430074 China

Knowledge hypergraphs generalize knowledge graphs using hyperedges to connect multiple entities and depict complicated relations. Existing methods either transform hyperedges into an easier-to-handle set of binary relations or view hyperedges as isolated and ignore their adjacencies. Both approaches lose information and may lead to sub-optimal models. To fix these issues, we propose the Hyperbolic Hypergraph Neural Network (H2GNN), whose essential component is the hyper-star message passing, a novel scheme motivated by a lossless expansion of hyperedges into hierarchies. It implements a direct embedding that consciously incorporates adjacent entities, hyper-relations, and entity position-aware information. As the name suggests, H2GNN operates in the hyperbolic space, which is more adept at capturing the tree-like hierarchy. We compare H2GNN with 15 baselines on knowledge hypergraphs, and it outperforms state-of-the-art approaches in both node classification and link prediction tasks. Copyright © 2024, The Authors. All rights reserved.

Keywords: Knowledge graph
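
The abstract's remark that hyperbolic space suits tree-like hierarchies can be illustrated with a distance computation. The sketch below assumes the Poincaré ball model of hyperbolic space; the abstract does not state which model H2GNN uses, so this is an illustration of the geometry only, not of the paper's method. The sample points are arbitrary.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))

origin = (0.0, 0.0)
near_center = (0.1, 0.0)
near_edge_a = (0.9, 0.0)
near_edge_b = (0.0, 0.9)

# Close to the origin, distances are roughly twice the Euclidean ones.
d_center = poincare_distance(origin, near_center)
# Near the boundary, distances grow rapidly, leaving room for many "leaves".
d_edge = poincare_distance(near_edge_a, near_edge_b)
```

Distances blow up toward the boundary of the ball, so the available room grows exponentially with radius, much as the number of nodes in a tree grows with depth.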

PSMiner: A Pattern-Aware Accelerator for High-Performance Streaming Graph Pattern Mining

Proceedings of the 60th Annual ACM/IEEE Design Automation Conference

Authors: Hao Qi Yu Zhang Ligang He Kang Luo Jun Huang Haoyu Lu Jin Zhao Hai Jin National Engineering Research Center for Big Data Technology and System Services Computing Technology and System Lab Cluster and Grid Computing Lab School of Computer Science and Technology Huazhong University of Science and Technology China National Engineering Research Center for Big Data Technology and System Services Computing Technology and System Lab Cluster and Grid Computing Lab School of Computer Science and Technology Huazhong University of Science and Technology China and Zhejiang-HUST Joint Research Center for Graph Processing Zhejiang Lab China Department of Computer Science University of Warwick United Kingdom

ISBN: (print) 9798350323481

Streaming Graph Pattern Mining (GPM) has been widely used in many application fields. However, existing streaming GPM solutions suffer from many unnecessary explorations and isomorphism tests, while existing static GPM solutions require many repetitive operations to compute the full graph. In this paper, we propose a pattern-aware incremental execution approach and design the first streaming GPM accelerator, called PSMiner, which integrates multiple optimizations to reduce redundant computation and improve computing efficiency. We have conducted extensive experiments. The results show that, compared with state-of-the-art software and hardware solutions, PSMiner achieves average speedups of 770.9× and 60.4×, respectively.

FlexiFed: Personalized Federated Learning for Edge Clients with Heterogeneous Model Architectures

32nd ACM World Wide Web Conference, WWW 2023

Authors: Wang, Kaibin He, Qiang Chen, Feifei Chen, Chunyang Huang, Faliang Jin, Hai Yang, Yun School of Computer Science and Technology Huazhong University of Science and Technology China Department of Computing Technologies Swinburne University of Technology Australia School of Information Technology Deakin University Australia Faculty of Information Technology Monash University Australia Guangxi Key Lab of Human-machine Interaction and Intelligent Decision Nanning Normal University China National Engineering Research Center for Big Data Technology and System Services Computing Technology and System Lab Cluster and Grid Computing Lab Huazhong University of Science and Technology Wuhan 430074 China

ISBN: (print) 9781450394161

Mobile and Web-of-Things (WoT) devices at the network edge account for more than half of the world's web traffic, making a great data source for various machine learning (ML) applications, particularly federated learning (FL), which offers a promising solution to privacy-preserving ML feeding on these data. FL allows edge mobile and WoT devices to train a shared global ML model under the orchestration of a central parameter server. In the real world, due to resource heterogeneity, these edge devices often train different versions of models (e.g., VGG-16 and VGG-19) or different ML models (e.g., VGG and ResNet) for the same ML task (e.g., computer vision or speech recognition). Existing FL schemes have assumed that participating edge devices share a common model architecture, and thus cannot facilitate FL across edge devices with heterogeneous ML model architectures. We explored this architecture heterogeneity challenge and found that FL can and should accommodate these edge devices to improve model accuracy and accelerate model training. This paper presents our findings and FlexiFed, a novel scheme for FL across edge devices with heterogeneous model architectures, and three model aggregation strategies for accommodating architecture heterogeneity under FlexiFed. Experiments with four widely-used ML models on four public datasets demonstrate 1) the usefulness of FlexiFed; and 2) that compared with the state-of-the-art FL scheme, FlexiFed improves model accuracy by 2.6%-9.7% and accelerates model convergence by 1.24×-4.04×. © 2023 ACM.

Keywords: Speech recognition
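
One way to picture aggregation across heterogeneous architectures is to average only the layers that all clients have in common. The sketch below is a hypothetical illustration of that general idea and should not be read as any of the three FlexiFed strategies, which the abstract above does not spell out. It assumes each client model is a list of `(layer_name, weights)` pairs and that shared layers form a common prefix; the layer names and values are invented.

```python
def common_prefix_length(models):
    """Number of leading layers whose (name, size) match across all client models."""
    length = 0
    for layers in zip(*models):
        signatures = {(name, len(weights)) for name, weights in layers}
        if len(signatures) != 1:
            break
        length += 1
    return length

def aggregate_common_layers(models):
    """Average the shared leading layers; leave each client's other layers untouched."""
    k = common_prefix_length(models)
    averaged = []
    for layers in zip(*(m[:k] for m in models)):
        name = layers[0][0]
        columns = zip(*(weights for _, weights in layers))
        averaged.append((name, [sum(col) / len(models) for col in columns]))
    return [averaged + m[k:] for m in models]

# Two hypothetical clients: a shallower and a deeper variant of the same family.
client_a = [("conv1", [1.0, 3.0]), ("conv2", [2.0]), ("fc", [0.5])]
client_b = [("conv1", [3.0, 5.0]), ("conv2", [4.0]), ("conv3", [9.0]), ("fc", [1.5])]

new_a, new_b = aggregate_common_layers([client_a, client_b])
```

Here the two clients end up sharing averaged `conv1` and `conv2` weights while keeping their own deeper layers, so each retains its original architecture.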

Software-Defined, Fast and Strongly-Consistent Data Replication for RDMA-Based PM Datastores

International Symposium on Parallel and Distributed Processing (IPDPS)

Authors: Haodi Lu Haikun Liu Chencheng Ye Xiaofei Liao Fubing Mao Yu Zhang Hai Jin National Engineering Research Center for Big Data Technology and System Services Computing Technology and System Lab Cluster and Grid Computing Lab School of Computing Science and Technology Huazhong University of Science and Technology China

Modern storage systems typically replicate data on multiple servers to provide high reliability and availability. However, most commercially-deployed datastores often fail to offer low latency, high throughput, and strong consistency at the same time. This paper presents Whale, a Remote Direct Memory Access (RDMA) based primary-backup replication system for in-memory datastores. Whale achieves both low latency and strong consistency by decoupling metadata multicasting from data replication for all backup nodes, and using an optimistic commitment mechanism to respond to client write requests earlier. Whale achieves high throughput by propagating writes from the primary node to backup nodes asynchronously via RDMA-optimized chain replication. To further reduce the cost of data replication, we design a log-structured datastore to fully exploit the advantages of one-sided RDMA and Persistent Memory (PM). We implement Whale on a cluster equipped with PM and InfiniBand RDMA networks. Experimental results show that Whale achieves much higher throughput and lower latency than state-of-the-art replication protocols.

AegonKV: a high bandwidth, low tail latency, and low storage cost KV-separated LSM store with SmartSSD-based GC offloading

Proceedings of the 23rd USENIX Conference on File and Storage Technologies

Authors: Zhuohui Duan Hao Feng Haikun Liu Xiaofei Liao Hai Jin Bangyu Li National Engineering Research Center for Big Data Technology and System Service Computing Technology and System Lab/Cluster and Grid Computing Lab School of Computer Science and Technology Huazhong University of Science and Technology China

ISBN: (print) 9781939133458

Key-value (KV) separation is renowned for significantly mitigating the write amplification inherent in traditional LSM trees. However, KV separation potentially increases the performance overhead of managing the value region, especially for the garbage collection (GC) operation that is used to reduce redundant space occupation. In response, many efforts have been made to optimize the GC mechanism for KV separation. However, our analysis indicates that such solutions, based on trade-offs between CPU and I/O overheads, cannot simultaneously satisfy the three requirements of KV separated systems in terms of throughput, tail latency, and space usage. This limitation hinders their real-world *** this paper, we introduce AegonKV, a "three-birds-one-stone" solution that comprehensively enhances the throughput, tail latency, and space usage of KV separated systems. AegonKV first proposes a SmartSSD-based GC offloading mechanism to enable asynchronous GC operations without competing with LSM read/write for bandwidth or CPU. AegonKV leverages offload-friendly data structures and hardware/software execution logic to address the challenges of GC offloading. Experiments demonstrate that AegonKV achieves the largest throughput improvement of 1.28-3.3 times, a significant reduction of 37%-66% in tail latency, and 15%-85% in space overhead compared to existing KV separated systems.
