Search results - Inner Mongolia University Library

IAENG International Journal of Computer Science, 2025, Vol. 52, No. 2, pp. 515-523

Authors: Li, Mei-Qi; Zhou, Zi-Wei. School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China

Image captioning is an interdisciplinary research hotspot at the intersection of computer vision and natural language processing, representing a multimodal task that integrates core technologies from both fields. This task requires the use of computer vision techniques to analyze and extract key visual features from images, followed by the application of natural language processing techniques to generate descriptive text that is syntactically and semantically aligned with human cognition. This process poses a significant challenge for computers. Existing models mostly ignore the relative positional information of visual objects and struggle to efficiently capture the complex relationships between visual and textual data. To address these challenges, we propose a vision-to-text bidirectional collaborative image captioning method. This approach extracts both visual features and positional information of objects, allowing the model to better understand the spatial relationships between objects. The CEW word embedding approach encodes textual information more profoundly, enhancing semantic expression and contextual understanding. In the decoding phase, a bidirectional cross-attention mechanism strengthens the interaction between vision and text, leading to improved accuracy in image understanding. The model is trained and tested on the MSCOCO 2014 dataset and compared with several popular models. Experimental results demonstrate that the proposed method achieves significant improvements on the CIDEr and BLEU-1 evaluation metrics with an increase of 1.5 and 1.1, respectively. In addition, we conduct ablation experiments, quantitative analysis, and qualitative analysis to comprehensively validate the effectiveness and stability of the proposed algorithm. © (2025), (International Association of Engineers). All rights reserved.

Keywords: Embeddings
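The abstract above mentions a bidirectional cross-attention mechanism between vision and text but gives no formulation. As a rough illustration only, the sketch below applies plain scaled dot-product attention in both directions. It assumes no learned projections and a shared feature dimension, and the function names are hypothetical; it is not the paper's architecture.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention where each query attends over keys_values.

    Assumption: keys and values are the same vectors and no learned
    projection matrices are applied (a real model would learn them).
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * kv[c] for w, kv in zip(weights, keys_values)) for c in range(dim)]
        )
    return attended


def bidirectional_cross_attention(visual, text):
    """Run attention in both directions: vision over text and text over vision."""
    text_aware_visual = cross_attention(visual, text)
    vision_aware_text = cross_attention(text, visual)
    return text_aware_visual, vision_aware_text


# Toy features (hypothetical): three image regions and two word embeddings.
visual_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_feats = [[0.5, 0.5], [1.0, -1.0]]
v_out, t_out = bidirectional_cross_attention(visual_feats, text_feats)
print(len(v_out), len(t_out))
```

Each output row is a convex combination of the other modality's vectors, so the visual output has one row per image region and the text output has one row per word.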

The Improved Unet Semantic Segmentation Network for Remote Sensing Images

IAENG International Journal of Computer Science

IAENG International Journal of Computer Science, 2025, Vol. 52, No. 4, pp. 1187-1195

Authors: Zhu, Hang; Zhao, Ji. School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China

With the development of artificial intelligence, deep learning has been increasingly used to achieve automatic detection of geographic information, replacing manual interpretation and improving efficiency. However, remote sensing images themselves have the issue of slight inter-class variance and significant intra-class variance, making it challenging to extract valuable information. Additionally, the increasing resolution and size of remote sensing images in recent years have introduced more complexity in the types of information, further increasing the difficulty of extracting valuable data. This paper proposes an improved Unet semantic segmentation network (referred to as RAUnet). First, in the encoder, continuous convolutional blocks are enhanced to extract features. At the same time, the EMAM multi-scale attention module is employed for cross-channel learning, capturing information from different feature channels of the target and using the surrounding feature information to assist in distinguishing target information. To capture multi-directional long-range dependencies, the Lo2 module is used for long-range modeling, which captures not only local contextual information but also long-range dependencies. In the decoder, a Dysample upsampling module is used to restore feature details, and in the skip connection layer, features are added for feature fusion. Experimental results show that compared to mainstream models, the proposed method achieves superior segmentation results on the Potsdam and Vaihingen datasets. © (2025), (International Association of Engineers). All rights reserved.

Keywords: Semantic Segmentation

Real-time all-frequency global illumination with radiance caching

Computational Visual Media, 2024, Vol. 10, No. 5, pp. 923-936

Authors: Youxin Xing; Gaole Pan; Xiang Chen; Ji Wu; Lu Wang; Beibei Wang. School of Software, Shandong University, Jinan 250101, China; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China

Global illumination (GI) plays a crucial role in rendering realistic results for virtual exhibitions, such as virtual car *** scenarios usually include all-frequency bidirectional reflectance distribution functions (BRDFs), although their geometries and light configurations may be *** all-frequency BRDFs in real time remains challenging due to the complex light *** approaches, including precomputed radiance transfer, light probes, and the most recent path-tracing-based approaches (ReSTIR PT), cannot satisfy both quality and performance requirements ***, we propose a practical hybrid global illumination approach that combines ray tracing and cached GI by caching the incoming radiance with *** approach can produce results close to those of offline renderers at the cost of only approximately 17 ms at runtime and is robust over all-frequency *** approach is designed for applications involving static lighting and geometries, such as virtual exhibitions.

Keywords: real-time global illumination; all-frequency BRDFs; Haar wavelets; radiance caching
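The keywords list Haar wavelets, but the abstract's description of how the radiance is cached is truncated in this record. The sketch below is therefore only a generic illustration of a 1D orthonormal Haar transform with coefficient truncation, a common way to compress a sampled signal. The function names and sample values are hypothetical, and nothing here is taken from the paper.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal):
    """Full 1D orthonormal Haar decomposition (length must be a power of two)."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    coeffs = [float(x) for x in signal]
    length = n
    while length > 1:
        half = length // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        coeffs[:length] = averages + details  # averages first, then details
        length = half
    return coeffs


def haar_inverse(coeffs):
    """Invert haar_forward, rebuilding the signal from coarse to fine."""
    out = list(coeffs)
    length = 1
    while length < len(out):
        merged = []
        for a, d in zip(out[:length], out[length:2 * length]):
            merged.append((a + d) / SQRT2)
            merged.append((a - d) / SQRT2)
        out[:2 * length] = merged
        length *= 2
    return out


def keep_largest(coeffs, k):
    """Zero all but the k largest-magnitude coefficients (lossy compression)."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:k])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


# Hypothetical radiance samples along one direction axis.
samples = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
compressed = keep_largest(haar_forward(samples), 4)
print(haar_inverse(compressed))
```

Keeping every coefficient reproduces the input exactly (up to floating-point error); keeping fewer trades accuracy for storage.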

Multi-label, Classification-based Prediction of Breast Cancer Metastasis Directions

IAENG International Journal of Computer Science

IAENG International Journal of Computer Science, 2025, Vol. 52, No. 1, pp. 1-10

Authors: Wang, Tingting; Fan, Qi; Tan, Liang; Zhang, Beier. School of Computer and Software Engineering, Anhui Institute of Information Technology, China; School of Computer Science and Technology, Huaibei Normal University, China; School of Computer and Software Engineering, Anhui Institute of Information Technology, China; School of Computer Science and Technology, Huaibei Normal University, China

This study aims to predict the metastatic direction of primary breast cancer (BC), thereby assisting physicians in precise treatment and strict follow-up and effectively improving the prognosis. The clinical data of 293,946 patients with primary BC diagnosed between 2010 and 2015 were collected from the Surveillance, Epidemiology, and End Results database. Multiple interpolations and Multi-label Synthetic Minority Over-sampling Technique methods were used for data analysis, and a machine learning model was established for multi-label classification. Surgical information, lymph node status, distant metastasis, tumor size, chemotherapy, histological type, and radiotherapy had significant influence as inputs. Compared with the k-nearest neighbor model, average accuracies of the decision tree and random forest (RF) models increased from 88.84% to 93.59% and 94.14%, respectively. Their average precision, recall rate, F1 score, area under the receiver operating characteristic curve and weighted-F1 increased from 87.24% to 95.85% and 94.74%, 87.73% to 90.40% and 91.76%, 87.07% to 92.16% and 93.45%, 97.11% to 99.53% and 99.95%, 82.13% to 89.44% and 90.48%, respectively. In conclusion, the RF model, which showed the best performance, can be used in multi-label prediction of BC metastasis directions, and can assist physicians in diagnosing and treating patients with primary BC. © (2025), (International Association of Engineers). All rights reserved.

Keywords: Lung cancer
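The abstract above reports average precision, recall, F1 and weighted-F1 for a multi-label task without saying how the averages were formed. The sketch below shows one common convention: per-label scores, then an unweighted (macro) mean and a support-weighted mean. The toy labels and predictions are hypothetical, and the function names are illustrative only.

```python
def per_label_scores(y_true, y_pred):
    """Return (precision, recall, f1, support) for each label column.

    y_true and y_pred are lists of equal-length 0/1 lists, one row per sample.
    """
    scores = []
    for j in range(len(y_true[0])):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1, tp + fn))
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-label F1 values."""
    return sum(s[2] for s in scores) / len(scores)


def weighted_f1(scores):
    """Mean of per-label F1 values weighted by each label's support."""
    total = sum(s[3] for s in scores)
    return sum(s[2] * s[3] for s in scores) / total if total else 0.0


# Hypothetical data: four patients, three candidate metastasis sites (A, B, C).
truth = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]
preds = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
table = per_label_scores(truth, preds)
print(table, macro_f1(table), weighted_f1(table))
```

Weighting by support makes frequent labels dominate the average, which matters when some labels are rare.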

An infrastructure software perspective toward computation offloading between executable specifications and foundation models

Science China (Information Sciences), 2025, Vol. 68, No. 4, pp. 380-382

Authors: Dezhi RAN; Mengzhou WU; Yuan CAO; Assaf MARRON; David HAREL; Tao XIE. Key Laboratory of High Confidence Software Technologies (PKU), Ministry of Education; School of Computer Science, Peking University; School of Electronics Engineering and Computer Science, Peking University; Department of Computer Science and Applied Mathematics, Weizmann Institute of Science

Foundation models (FMs) [1] have revolutionized software development and become the core components of large software systems. This paradigm shift, however, demands a fundamental re-imagining of software engineering theories and methodologies [2]. Instead of replacing existing software modules implemented by symbolic logic, incorporating FMs' capabilities to build software systems requires entirely new modules that leverage the unique capabilities of ***, while FMs excel at handling uncertainty, recognizing patterns, and processing unstructured data, we need new engineering theories that support the paradigm shift from explicitly programming and maintaining user-defined symbolic logic to creating rich, expressive requirements that FMs can accurately perceive and implement.

Object Detection Model for Remote Sensing Images Based on YOLOv9

IAENG International Journal of Computer Science

IAENG International Journal of Computer Science, 2025, Vol. 52, No. 3, pp. 840-847

Authors: Hou, Donghao; Zhang, Yujun. School of Computer and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China

In the field of object detection for remote sensing images, especially in applications such as environmental monitoring and urban planning, significant progress has been made. This paper addresses the common challenges faced by traditional object detection methods in remote sensing images, such as the large number of targets and complex backgrounds, by proposing a novel network based on YOLOv9. The network innovatively introduces the C3_CD_CGA module, an enhanced module based on Cascaded Group Attention, designed to reduce computational redundancy and increase attention diversity, and enhances the processing capability of multi-scale information through the CD module. The C3 module employs deep asymmetric convolution to mitigate information loss and increase the receptive field. Additionally, the network integrates DSConv with the RepNCSPELAN4 module to adaptively focus on and precisely capture the features of elongated and curved local structures, such as vehicles. The introduction of the CARAFE module further improves the spatial resolution of the feature maps, significantly enhancing performance across various visual tasks. Experimental results show that the improved YOLOv9 achieves a mean average precision (mAP) of 88% on the SIMD dataset, which is an improvement of 1.6% compared to the baseline YOLOv9 model and 1.5% higher than the state-of-the-art YOLO-SE model. This model not only achieves more effective multi-target recognition in complex backgrounds but also strikes a good balance between accuracy and efficiency. © (2025), (International Association of Engineers). All rights reserved.

Keywords: Urban planning

A Transfer Learning Framework for Deep Multi-Agent Reinforcement Learning

IEEE/CAA Journal of Automatica Sinica, 2024, Vol. 11, No. 11, pp. 2346-2348

Authors: Yi Liu; Xiang Wu; Yuming Bo; Jiacun Wang; Lifeng Ma. School of Automation, Nanjing University of Science and Technology; Department of Computer Science and Software Engineering, Monmouth University

Dear Editor, This letter presents a new transfer learning framework for deep multi-agent reinforcement learning (DMARL) to reduce the convergence difficulty and training time when applying DMARL to a new scenario [1...

Keywords: Deep agent; Framework

A Generative Model-Based Network Framework for Ecological Data Reconstruction

Computers, Materials & Continua, 2025, Vol. 82, No. 1, pp. 929-948

Authors: Shuqiao Liu; Zhao Zhang; Hongyan Zhou; Xuebo Chen. School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan 114051, China; School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China

This study examines the effectiveness of artificial intelligence techniques in generating high-quality environmental data for species introductory site selection *** Strengths, Weaknesses, Opportunities, Threats (SWOT) analysis data with Variational Autoencoder (VAE) and Generative Adversarial Network (GAN), the network framework model (SAE-GAN) is proposed for environmental data *** model combines two popular generative models, GAN and VAE, to generate features conditional on categorical data embedding after SWOT *** model is capable of generating features that resemble real feature distributions and adding sample factors to more accurately track individual sample *** data is used to retain more semantic information to generate *** model was applied to species in Southern California, USA, citing SWOT analysis data to train the *** show that the model is capable of integrating data from more comprehensive analyses than traditional methods and generating high-quality reconstructed data from them, effectively solving the problem of insufficient data collection in development *** model is further validated by the Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS) classification assessment commonly used in the environmental data *** study provides a reliable and rich source of training data for species introduction site selection systems and makes a significant contribution to ecological and sustainable development.

Keywords: Convolutional Neural Network (CNN); VAE; GAN; TOPSIS; data reconstruction
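The abstract above names TOPSIS as the validation technique but the record does not describe how it was configured. The sketch below shows the textbook TOPSIS ranking procedure, assuming vector normalization and user-supplied weights. The site scores, weights, and criteria are hypothetical and are not data from the paper.

```python
import math


def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] for each alternative (row).

    matrix  : rows are alternatives, columns are criteria
    weights : one weight per criterion
    benefit : True if larger is better for that criterion, False for a cost
    Assumes no criterion column is all zeros and the rows are not all identical.
    """
    n_cols = len(weights)
    # Vector-normalize each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_cols)] for row in matrix
    ]
    columns = list(zip(*weighted))
    # Ideal and anti-ideal points depend on whether a criterion is a benefit.
    best = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    worst = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical candidate sites scored on habitat quality (benefit) and cost.
sites = [[9.0, 1.0], [1.0, 9.0], [5.0, 5.0]]
print(topsis(sites, weights=[0.5, 0.5], benefit=[True, False]))
```

A score near 1 means the alternative is close to the ideal point and far from the anti-ideal one, so alternatives can be ranked by sorting on the returned scores.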

On learning the right attention point for feature enhancement

Science China (Information Sciences), 2023, Vol. 66, No. 1, pp. 131-143

Authors: Liqiang LIN; Pengdi HUANG; Chi-Wing FU; Kai XU; Hao ZHANG; Hui HUANG. College of Computer Science and Software Engineering, Shenzhen University; Department of Computer Science and Engineering, The Chinese University of Hong Kong; School of Computer Science, National University of Defense Technology; School of Computing Science, Simon Fraser University

We present a novel attention-based mechanism to learn enhanced point features for point cloud processing tasks, e.g., classification and segmentation. Unlike prior studies, which were trained to optimize the weights of a pre-selected set of attention points, our approach learns to locate the best attention points to maximize the performance of a specific task, e.g., point cloud classification. Importantly, we advocate the use of a single attention point to facilitate semantic understanding in point feature learning. Specifically, we formulate a new and simple convolution, which combines convolutional features from an input point and its corresponding learned attention point (LAP). Our attention mechanism can be easily incorporated into state-of-the-art point cloud classification and segmentation networks. Extensive experiments on common benchmarks, such as ModelNet40, ShapeNet Part, and S3DIS, all demonstrate that our LAP-enabled networks consistently outperform the respective original networks, as well as other competitive alternatives, which employ multiple attention points, either pre-selected or learned under our LAP framework.

Keywords: point convolution; feature enhancement; attention point; deep neural network

Multi-scale persistent spatiotemporal transformer for long-term urban traffic flow prediction

Journal of Electronic Science and Technology, 2024, Vol. 22, No. 1, pp. 53-69

Authors: Jia-Jun Zhong; Yong Ma; Xin-Zheng Niu; Philippe Fournier-Viger; Bing Wang; Zu-kuan Wei. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; College of Computer Science & Software Engineering, Shenzhen University, Shenzhen 518060, China; School of Computer Science, Southwest Petroleum University, Chengdu 610500, China

Long-term urban traffic flow prediction is an important task in the field of intelligent transportation, as it can help optimize traffic management and improve travel *** improve prediction accuracy, a crucial issue is how to model spatiotemporal dependency in urban traffic *** recent years, many studies have adopted spatiotemporal neural networks to extract key information from traffic ***, most models ignore the semantic spatial similarity between long-distance areas when mining spatial *** also ignore the impact of predicted time steps on the next unpredicted time step for making long-term ***, these models lack a comprehensive data embedding process to represent complex spatiotemporal *** paper proposes a multi-scale persistent spatiotemporal transformer (MSPSTT) model to perform accurate long-term traffic flow prediction in *** adopts an encoder-decoder structure and incorporates temporal, periodic, and spatial features to fully embed urban traffic data to address these *** model consists of a spatiotemporal encoder and a spatiotemporal decoder, which rely on temporal, geospatial, and semantic space multi-head attention modules to dynamically extract temporal, geospatial, and semantic *** spatiotemporal decoder combines the context information provided by the encoder, integrates the predicted time step information, and is iteratively updated to learn the correlation between different time steps in the broader time range to improve the model’s accuracy for long-term *** on four public transportation datasets demonstrate that MSPSTT outperforms the existing models by up to 9.5% on three common metrics.

Keywords: Graph neural network; Multi-head attention mechanism; Spatio-temporal dependency; Traffic flow prediction
