Search Results - Inner Mongolia University Library

Competition for gradient-free tuning of large language models: approaches, results, current challenges and future directions

National Science Review, 2023, Vol. 10, No. 6, pp. 16-19

Authors: Tingfeng Cao, Liang Chen, Dixiang Zhang, Tianxiang Sun, Zhengfu He, Xipeng Qiu, Xing Xu, Hai Zhang

Affiliations: School of Software Engineering, South China University of Technology; School of Computer Science, Fudan University; School of Computer Science and Engineering, University of Electronic Science and Technology of China; Pazhou Laboratory (Huangpu); School of Mathematics, Northwest University

PROBLEM. Recent years have witnessed the rapid progress of self-supervised language models (LMs) [1], especially large language models (LLMs) [2]. LLMs have not only achieved state-of-the-art performance on many natural language processing tasks but also captured widespread public attention because of their great potential in a variety of real-world applications (e.g., search engines, writing assistants, etc.) through providing general-purpose intelligent services. A few LLMs are becoming foundation models, an analogy to infrastructure, that empower hundreds of downstream applications.

Revolutionizing Image Captioning: Integrating Attention Mechanisms with Adaptive Fusion Gates

IAENG International Journal of Computer Science

IAENG International Journal of Computer Science, 2024, Vol. 51, No. 3, pp. 212-221

Authors: Sheng, Shou-Jun; Zhou, Zi-Wei

Affiliations: School of Computer and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China

To dynamically generate a sequence of textual descriptions for images, image description models often use an attention mechanism that automatically focuses on different regions within an image. However, a prevalent issue with current attention mechanisms is their tendency to overlook essential elements within the image, prioritizing contextual aspects of the object when generating descriptive text. This constraint reduces the precision of the textual descriptions produced. To address this issue and improve the accuracy of image interpretation, the proposed image description model uses an attention-based approach and includes a multi-layer decoder and a fusion gate. The model is based on an encoder-decoder architecture and uses a Residual Network (ResNet) framework for feature extraction during the encoding phase, thereby extending the encoder-decoder structure into the decoding phase. Within this framework, an adaptive fusion gate mechanism is introduced and combined with multi-layer cascade decoders to facilitate the generation of utterances. This allows decoders from lower layers to actively contribute to the final text prediction phase, thereby incrementally improving the accuracy of the predicted text and generating more precise descriptions. The MS COCO 2014 dataset is used to train and validate the model. Compared with the top-performing models, it shows significant improvements: a 0.096 rise in the BLEU_1 metric, a 0.153 improvement in the ROUGE_L metric, and a 0.32 increase in the CIDEr metric on the MS COCO dataset. The overall improvement in performance across all evaluation criteria highlights the model’s alignment with the requirements of image understanding applications. © (2024), (International Associ

Keywords: computer vision

A Review of Intelligent Configuration and Its Security for Complex Networks

Chinese Journal of Electronics, 2024, Vol. 33, No. 4, pp. 920-947

Authors: Yue ZHAO, Bin YANG, Fei TENG, Xianhua NIU, Ning HU, Bo TIAN

Affiliations: Science and Technology on Communication Security Laboratory; School of Computer and Information Engineering, Chuzhou University; School of Computing and Artificial Intelligence, Southwest Jiaotong University; School of Computer and Software Engineering, Xihua University; Cyberspace Institute of Advanced Technology, Guangzhou University

Complex networks are becoming more complex because they use many components built on diverse technologies. Manual configuration to make each component interoperable has bred latent dangers to system security. There is still no comprehensive review of these studies or of the prospects for further research. Addressing the complexity of component configuration and the difficulty of security assurance in typical complex networks, this paper systematically reviews the abstract models and formal analysis methods required for intelligent configuration of complex networks, and specifically analyzes and compares the current key technologies, such as configuration semantic awareness, automatic generation of security configuration, dynamic deployment, and verification evaluation. These technologies can effectively improve the security of intelligent configuration for complex networks and reduce the complexity of operation and maintenance. This paper also summarizes the mainstream construction methods for complex network configuration and its security test environment and detection index system, which lays a theoretical foundation for building a comprehensive capability to verify the effectiveness of configuration security. The whole-lifecycle management system for the configuration security process proposed in this paper provides an important technical reference for reducing the complexity of network operation and maintenance and improving network security.

Keywords: Complex networks; Intelligent configuration; Configuration security; Software defined network; Lifecycle management

MotionUVRNN: A Motion Capture Recurrent Neural Network for Wind Field Forecast Correction via U and V Components Modeling

IAENG International Journal of Computer Science

IAENG International Journal of Computer Science, 2024, Vol. 51, No. 7, pp. 791-800

Authors: Gu, Zhan-Xin; Wu, Jie; Jin, Wei; Han, Guo-Jing

Affiliations: School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China; Anshan Meteorological Service, Anshan 114004, China

Wind field forecasting is crucial for human activities, but the accuracy of numerical weather prediction still has room to improve. In this paper, we formalize wind field forecast correction as a spatiotemporal sequence prediction task and propose MotionUVRNN, a recurrent neural network model suited to wind field correction. The model consists of three elements: a feature decoupling module, an encoder, and a corrector. The feature decoupling module decouples the wind field into a U-Wind (wind component in the x/longitude direction) feature, a V-Wind (wind component in the y/latitude direction) feature, and a spatiotemporal feature. The encoder jointly models the motion trend and transient variation of the wind field. A novel long short-term memory dynamic capture unit (RS-LSTM) is proposed as the core part of the encoder to capture the different dynamic patterns of U-Wind and V-Wind. The corrector extracts and integrates multi-scale wind field information to capture regional features. By correcting the 3-12 hour wind field forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF; EC for short), the proposed model reduced the wind speed prediction error by 53.37% to 46.52%, and lowered the wind direction error by 21.99% to 13.26%. Moreover, this model surpasses the existing leading spatiotemporal sequence prediction models in wind field correction. © (2024) International Association of Engineers. All rights reserved.
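
The abstract describes decoupling the wind field into U (x/longitude) and V (y/latitude) components while reporting errors in wind speed and wind direction. As a minimal sketch of how the two representations relate, the Python below converts between them. It assumes the standard meteorological convention (direction is the bearing the wind blows from, in degrees clockwise from north); the record does not state which convention the paper uses, and the function names are hypothetical.

```python
import math


def uv_to_speed_direction(u, v):
    """Convert U/V wind components to speed and meteorological direction.

    Assumption (not from the source record): direction is the bearing the
    wind blows FROM, in degrees clockwise from north (0 = from the north,
    90 = from the east).
    """
    speed = math.hypot(u, v)
    # atan2(-u, -v) gives the bearing of the vector pointing back upwind.
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction


def speed_direction_to_uv(speed, direction_deg):
    """Inverse conversion under the same assumed convention."""
    rad = math.radians(direction_deg)
    return -speed * math.sin(rad), -speed * math.cos(rad)


# Round trip: a 5 m/s wind from the south-west (225 degrees) blows toward
# the north-east, so both components come out positive.
u, v = speed_direction_to_uv(5.0, 225.0)
print(u, v, uv_to_speed_direction(u, v))
```

Working on U and V separately avoids the 0/360 degree wrap-around of the direction angle, which is one common reason to model components rather than direction itself.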

Keywords: Signal encoding

Underwater Biological Target Detection Algorithm and Research Based on YOLOv7 Algorithm

IAENG International Journal of Computer Science

IAENG International Journal of Computer Science, 2024, Vol. 51, No. 6, pp. 594-601

Authors: Zhuang, Hongwei; Liu, Weisheng

Affiliations: School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, China; College of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China

Underwater target detection is an important method for detecting marine organisms. However, occlusion of underwater targets, blurring caused by water quality, poor lighting conditions, small targets, and complex backgrounds make the detection of underwater biological targets challenging. In the intricate underwater environment, conventional feature extraction methods have several drawbacks, including imprecise feature extraction, slow detection speed, and inadequate robustness. Consequently, an underwater target detection method based on an enhanced You Only Look Once 7 (YOLOv7) is proposed in this study. The network architecture is reconstructed, and Deformable Convolutional Network (DCN) modules replace some 3×3 convolutional blocks in the ELAN structure to offset sampling points and reduce background interference. Skip connections and a 1×1 convolutional architecture are added to the DCN module to improve the model’s perception of image details. In addition, Contextual Transformer 3 (COT3) is incorporated to improve visual performance. Finally, to improve the detection efficiency of small objects, the CIoU loss function is replaced by the Normalized Wasserstein Distance (NWD) algorithm. According to the experimental results, the mAP of DCCN-YOLOv7 on the URPC dataset is 80.4%, 2.8% higher than that of the baseline YOLOv7 network model. Furthermore, compared with the original YOLOv7 algorithm, both detection speed and accuracy are higher, making it more appropriate for underwater target recognition. © (2024), (International Association of Engineers). All rights reserved.
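
The abstract names the Normalized Wasserstein Distance (NWD) as the replacement for the CIoU loss but does not give its formula. The sketch below follows the commonly cited definition, in which each box (cx, cy, w, h) is modeled as a 2D Gaussian and the squared 2-Wasserstein distance between two boxes reduces to a squared Euclidean distance between the vectors (cx, cy, w/2, h/2). The normalizing constant `c` is dataset-dependent; the default used here is only a placeholder, the function names are hypothetical, and how the paper wires NWD into YOLOv7 is not stated in this record.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between two (cx, cy, w, h) boxes.

    Assumption: `c` is a dataset-dependent normalizing constant; 12.8 is a
    placeholder, not a value taken from the source record.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two box Gaussians.
    w2_sq = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2.0) ** 2
        + ((ha - hb) / 2.0) ** 2
    )
    return math.exp(-math.sqrt(w2_sq) / c)


def nwd_loss(box_a, box_b, c=12.8):
    """Loss form: 0 for identical boxes, approaching 1 as boxes diverge."""
    return 1.0 - nwd(box_a, box_b, c)


# Two small boxes that do not overlap still get a smooth, non-zero similarity,
# whereas their IoU would be exactly 0.
print(nwd((10.0, 10.0, 4.0, 4.0), (16.0, 10.0, 4.0, 4.0)))
```

Because the similarity decays smoothly with center and size differences instead of dropping to zero when boxes stop overlapping, it tends to give more useful gradients for small objects than IoU-based losses.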

Keywords: Feature extraction

Troy: Efficient Service Deployment for Windows Systems

Chinese Journal of Electronics, 2024, Vol. 33, No. 1, pp. 313-322

Authors: Deyu ZHANG, Yu XIE, Mucong XU, En CHENG, Xiaoyan KUI, Bangwen HE, Yunhao LI

Affiliations: School of Computer Science and Engineering, Central South University

Modern university computer labs and kindergarten through 12th grade classrooms require a centralized solution to efficiently manage a large number of desktops. Existing solutions either incur virtualization overhead at runtime or require loading a large image of over 30 GB, leading to unacceptable network latency. In this work, we propose Troy, which takes advantage of the differencing virtual hard disk techniques in Windows *** such, Troy only loads the modifications made on one machine to all other machines. Troy consists of two modules that are responsible for generating an initial image and merging a differencing image with its parent image, respectively. Specifically, we identify the key fields in the virtual hard disk image that link the differencing image and the parent image, and find the modified blocks in the differencing images that should replace the blocks in the parent image. We further design a lazy copy solution to reduce the I/O burden in image merging. We have implemented Troy on bare metal machines. The evaluation results show that the performance of Troy is comparable to the native implementation in Windows, without requiring the Windows environment.

Keywords: Service deployment; Virtual hard disk; File system merging; Windows system

Aligning enhanced feature representation for generalized zero-shot learning

Science China (Information Sciences), 2025, Vol. 68, No. 2, pp. 74-88

Authors: Zhiyu FANG, Xiaobin ZHU, Chun YANG, Hongyang ZHOU, Jingyan QIN, Xu-Cheng YIN

Affiliations: School of Computer & Communication Engineering, University of Science and Technology Beijing

Constructing an effective common latent embedding by aligning the latent spaces of cross-modal variational autoencoders (VAEs) is a popular strategy for generalized zero-shot learning (GZSL). However, due to the lack of fine-grained instance-wise annotations, existing VAE methods can easily suffer from the posterior collapse problem. In this paper, we propose an innovative asymmetric VAE network by aligning enhanced feature representation (AEFR) for GZSL. In contrast to general VAE structures, we design two asymmetric encoders for visual and semantic observations and one decoder for visual reconstruction. Specifically, we propose a simple yet effective gated attention mechanism (GAM) in the visual encoder to enhance the information interaction between observations and latent variables, effectively alleviating possible posterior collapse. In addition, we propose a novel distributional decoupling-based contrastive learning (D2-CL) to guide the learning of classification-relevant information while aligning the representations at the taxonomy level in the latent representation space. Extensive experiments on publicly available datasets demonstrate the state-of-the-art performance of our method. The source code is available at https://***/seeyourmind/AEFR.

Keywords: generalized zero-shot learning; gated attention mechanism; contrastive learning; multi-modal alignment

EKBSA: A Chinese Sentiment Analysis Model by Enhancing K-BERT

Journal of Computer Science and Technology (English edition), 2025, Vol. 40, No. 1, pp. 60-72

Authors: Bai Huan (白欢), Wang Daling (王大玲), Feng Shi (冯时), Zhang Yifei (张一飞)

Affiliations: School of Computer Science and Engineering, Northeastern University, Shenyang, China

Pre-trained language models (PLMs), such as BERT, have achieved good results on many natural language processing (NLP)***, some studies have attempted to integrate factual knowledge into PLMs to adapt to various downstream *** sentiment analysis tasks, sentiment knowledge, such as sentiment words, plays a significant role in determining the sentiment tendencies of *** Chinese sentiment analysis, historical stories and fables imbue words with richer connotations and more complex sentiments than those typically found in English, which makes sentiment knowledge injection *** clearly, this knowledge has not been fully *** this paper, we propose EKBSA, a Chinese sentiment analysis model, which is based on the K-BERT model and utilizes a sentiment knowledge graph to achieve better results on sentiment analysis *** construct a high-quality sentiment knowledge graph, we collect a large number of sentiment words by combining several existing sentiment ***, in order to understand texts better, we enhance local attention through syntactic analysis and direct EKBSA to focus more on syntactically relevant *** is compatible with BERT and existing structural *** results show that EKBSA achieves better performance on Chinese sentiment analysis *** upon EKBSA, we further change the general attention to the context attention and propose Context EKBSA, so that the model can adapt to sentiment analysis tasks in Chinese conversations and achieve good performance.

Keywords: Chinese sentiment analysis; local attention; sentiment knowledge graph; pre-trained language model; context attention

Multi-scale persistent spatiotemporal transformer for long-term urban traffic flow prediction

Journal of Electronic Science and Technology, 2024, Vol. 22, No. 1, pp. 53-69

Authors: Jia-Jun Zhong, Yong Ma, Xin-Zheng Niu, Philippe Fournier-Viger, Bing Wang, Zu-kuan Wei

Affiliations: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; College of Computer Science & Software Engineering, Shenzhen University, Shenzhen 518060, China; School of Computer Science, Southwest Petroleum University, Chengdu 610500, China

Long-term urban traffic flow prediction is an important task in the field of intelligent transportation, as it can help optimize traffic management and improve travel *** improve prediction accuracy, a crucial issue is how to model spatiotemporal dependency in urban traffic *** recent years, many studies have adopted spatiotemporal neural networks to extract key information from traffic ***, most models ignore the semantic spatial similarity between long-distance areas when mining spatial *** also ignore the impact of predicted time steps on the next unpredicted time step for making long-term ***, these models lack a comprehensive data embedding process to represent complex spatiotemporal *** paper proposes a multi-scale persistent spatiotemporal transformer (MSPSTT) model to perform accurate long-term traffic flow prediction in *** adopts an encoder-decoder structure and incorporates temporal, periodic, and spatial features to fully embed urban traffic data to address these *** model consists of a spatiotemporal encoder and a spatiotemporal decoder, which rely on temporal, geospatial, and semantic space multi-head attention modules to dynamically extract temporal, geospatial, and semantic *** spatiotemporal decoder combines the context information provided by the encoder, integrates the predicted time step information, and is iteratively updated to learn the correlation between different time steps in the broader time range to improve the model’s accuracy for long-term *** on four public transportation datasets demonstrate that MSPSTT outperforms the existing models by up to 9.5% on three common metrics.

Keywords: Graph neural network; Multi-head attention mechanism; Spatio-temporal dependency; Traffic flow prediction

OCRBench: on the hidden mystery of OCR in large multimodal models

Science China (Information Sciences), 2024, Vol. 67, No. 12, pp. 23-35

Authors: Yuliang LIU, Zhang LI, Mingxin HUANG, Biao YANG, Wenwen YU, Chunyuan LI, Xu-Cheng YIN, Cheng-Lin LIU, Lianwen JIN, Xiang BAI

Affiliations: School of Artificial Intelligence and Automation, Huazhong University of Science and Technology; School of Electronic and Information Engineering, South China University of Technology; Microsoft Research; School of Computer & Communication Engineering, University of Science and Technology Beijing; Institute of Automation, Chinese Academy of Sciences; School of Software Engineering, Huazhong University of Science and Technology

Large models have recently played a dominant role in natural language processing and multimodal vision-language learning. However, their effectiveness in text-related visual tasks remains relatively unexplored. In this paper, we conduct a comprehensive evaluation of large multimodal models, such as GPT4V and Gemini, on various text-related visual tasks, including text recognition, scene text-centric visual question answering (VQA), document-oriented VQA, key information extraction (KIE), and handwritten mathematical expression recognition (HMER). To facilitate the assessment of optical character recognition (OCR) capabilities in large multimodal models, we propose OCRBench, a comprehensive evaluation benchmark. OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available. Furthermore, our study reveals both the strengths and weaknesses of these models, particularly in handling multilingual text, handwritten text, non-semantic text, and mathematical expression *** importantly, the baseline results presented in this study could provide a foundational framework for the conception and assessment of innovative strategies targeted at enhancing zero-shot multimodal *** evaluation pipeline and benchmark are available at https://***/Yuliang-Liu/Multimodal OCR.

Keywords: large multimodal model; OCR; text recognition; scene text-centric VQA; document-oriented VQA; key information extraction; handwritten mathematical expression recognition
