Search Results - Inner Mongolia University Library

Multi-Task Visual Semantic Embedding Network for Image-Text Retrieval

Journal of Computer Science & Technology, 2024, Vol. 39, No. 4, pp. 811-826

Authors: Xue-Yang Qin; Li-Shuang Li; Jing-Yao Tang; Fei Hao; Mei-Ling Ge; Guang-Yao Pang. School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China; School of Computer Science, Shaanxi Normal University, Xi’an 710119, China; School of Computer Engineering, Weifang University, Weifang 261061, China; Guangxi Colleges and Universities Key Laboratory of Intelligent Industry Software, Wuzhou University, Wuzhou 543002, China

Image-text retrieval aims to capture the semantic correspondence between images and texts, which serves as a foundation and crucial component in multi-modal recommendations, search systems, and online *** mainstream methods primarily focus on modeling the association of image-text pairs while neglecting the advantageous impact of multi-task learning on image-text *** this end, a multi-task visual semantic embedding network (MVSEN) is proposed for image-text ***, we design two auxiliary tasks, including text-text matching and multi-label classification, for semantic constraints to improve the generalization and robustness of visual semantic embedding from a training ***, we present an intra- and inter-modality interaction scheme to learn discriminative visual and textual feature representations by facilitating information flow within and between ***, we utilize multi-layer graph convolutional networks in a cascading manner to infer the correlation of image-text *** results show that MVSEN outperforms state-of-the-art methods on two publicly available datasets, Flickr30K and MSCOCO, with rSum improvements of 8.2% and 3.0%, respectively.

Keywords: image-text retrieval; cross-modal retrieval; multi-task learning; graph convolutional network

Multimodal Medical Image Fusion based on the VGG19 Model in the NSCT Domain

Recent Advances in Computer Science and Communications, 2024, Vol. 17, No. 5, pp. 59-70

Authors: Liu, Chunxiang; Wang, Yuwei; Cheng, Tianqi; Guo, Xinping; Wang, Lei. School of Resources and Environmental Engineering, Shandong University of Technology, Zibo 255000, China; School of Computer Science and Technology, Shandong University of Technology, Zibo 255000, China

Aim: To address the drawbacks of traditional medical image fusion methods, such as poor preservation of details, loss of edge information, and image distortion, as well as the large amount of training data required by deep learning, a new multi-modal medical image fusion method based on the VGG19 model and the non-subsampled contourlet transform (NSCT) is proposed, whose overall objective is to make full use of the advantages of both the NSCT and the VGG19 model. Methodology: First, the source images are each decomposed into high-pass and low-pass sub-bands by the NSCT. Then, a weighted average fusion rule is applied to produce the fused low-pass sub-band coefficients, while an extractor based on the pre-trained VGG19 model is constructed to obtain the fused high-pass sub-band coefficients. Result and Discussion: Finally, the fusion results are reconstructed by applying the inverse NSCT to the fused coefficients. To demonstrate effectiveness and accuracy, experiments are conducted on three types of medical datasets. Conclusion: In comparison with seven well-known fusion methods, both subjective and objective evaluations demonstrate that the proposed method can effectively avoid the loss of detailed feature information, capture more medical information from the source images, and integrate it into the fused images. © 2024 Bentham Science Publishers.
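The weighted average rule for the low-pass sub-bands can be illustrated with a minimal sketch. The abstract does not give the weights or the data layout, so the function name `fuse_low_pass`, the nested-list representation of a sub-band, and the default weight of 0.5 are all assumptions made for illustration, not the authors' implementation.

```python
def fuse_low_pass(band_a, band_b, weight_a=0.5):
    """Fuse two equally sized low-pass sub-bands coefficient by coefficient.

    band_a, band_b: 2D lists of coefficients (hypothetical layout).
    weight_a: weight for band_a; band_b receives 1 - weight_a (assumed rule).
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    same_rows = len(band_a) == len(band_b)
    if not same_rows or any(len(ra) != len(rb) for ra, rb in zip(band_a, band_b)):
        raise ValueError("sub-bands must have the same shape")
    weight_b = 1.0 - weight_a
    return [
        [weight_a * a + weight_b * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(band_a, band_b)
    ]


# Toy low-pass coefficients standing in for two source modalities.
low_a = [[10.0, 20.0], [30.0, 40.0]]
low_b = [[30.0, 40.0], [50.0, 60.0]]
fused = fuse_low_pass(low_a, low_b)
print(fused)
```

With equal weights each fused coefficient is the plain mean of the two inputs; an unequal weight would favor one modality's low-frequency content.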

Keywords: Image fusion

Object Detection Model for Remote Sensing Images Based on YOLOv9

IAENG International Journal of Computer Science, 2025, Vol. 52, No. 3, pp. 840-847

Authors: Hou, Donghao; Zhang, Yujun. School of Computer and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China

In the field of object detection for remote sensing images, especially in applications such as environmental monitoring and urban planning, significant progress has been made. This paper addresses the common challenges faced by traditional object detection methods in remote sensing images, such as the large number of targets and complex backgrounds, by proposing a novel network based on YOLOv9. The network innovatively introduces the C3_CD_CGA module, an enhanced module based on Cascaded Group Attention, designed to reduce computational redundancy and increase attention diversity, and enhances the processing capability of multi-scale information through the CD module. The C3 module employs deep asymmetric convolution to mitigate information loss and increase the receptive field. Additionally, the network integrates DSConv with the RepNCSPELAN4 module to adaptively focus on and precisely capture the features of elongated and curved local structures, such as vehicles. The introduction of the CARAFE module further improves the spatial resolution of the feature maps, significantly enhancing performance across various visual tasks. Experimental results show that the improved YOLOv9 achieves a mean average precision (mAP) of 88% on the SIMD dataset, which is an improvement of 1.6% compared to the baseline YOLOv9 model and 1.5% higher than the state-of-the-art YOLO-SE model. This model not only achieves more effective multi-target recognition in complex backgrounds but also strikes a good balance between accuracy and efficiency. © (2025), (International Association of Engineers). All rights reserved.

Keywords: Urban planning

Context-aware Proactive Edge Caching for Vehicular Edge Computing Based on Asynchronous Federated Learning

IEEE Internet of Things Journal, 2025, Vol. 12, No. 13, pp. 23195-23206

Authors: Liao, Zhuofan; Liu, Pang; Zheng, Bin; Tang, XiaoYong. Changsha University of Science and Technology, School of Computer and Communication Engineering, Changsha 410114, China

Edge caching is a promising technique for effectively reducing backhaul pressure and content access latency in the Internet of Vehicles (IoV). Existing content caching solutions still face the following challenges: 1) contents cached on edge servers become outdated quickly as time and user preferences change; 2) the large amount of vehicle data causes huge communication overheads; 3) edge servers have limited storage resources. Simultaneously considering these issues to reduce transmission latency is a large-scale 0-1 constraint problem, which is NP-hard, and boosting cache hit rates is a key entry point. In this work, we propose a Context-aware Proactive Caching Strategy (CPCS) based on asynchronous federated learning, which works as follows. To improve the accuracy of content popularity prediction, and thus the cache hit rate, we combine contextual information between different contents and use long short-term memory (LSTM) networks to analyze the dynamic preferences of vehicle users. After that, vehicles complete the model training and upload their models via asynchronous federated learning to complete the popularity prediction. To address the problem of local models being outdated in asynchronous federated learning, CPCS integrates model compression algorithms, enhancing system efficiency and prediction accuracy. Based on the prediction results, CPCS provides a content placement algorithm that approximates the optimal caching scheme. Simulation results show that CPCS can improve the cache hit rate by up to 17% compared to existing state-of-the-art caching strategies. © 2014 IEEE.
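To make the placement step and the cache hit rate concrete, here is a minimal sketch of a greedy popularity-density heuristic for the 0-1 storage-constrained problem. It is a generic approximation, not the CPCS placement algorithm, whose details the abstract does not give; the names `place_contents` and `cache_hit_rate` and the toy popularity scores are hypothetical.

```python
def place_contents(predicted_popularity, sizes, capacity):
    """Greedily cache contents with the highest predicted popularity per unit size.

    predicted_popularity: dict content_id -> predicted request probability.
    sizes: dict content_id -> storage size (same keys as predicted_popularity).
    capacity: total storage available on the edge server.
    """
    ranked = sorted(
        predicted_popularity,
        key=lambda c: predicted_popularity[c] / sizes[c],
        reverse=True,
    )
    cached, used = set(), 0
    for content in ranked:
        # 0-1 decision: a content is either cached whole or skipped.
        if used + sizes[content] <= capacity:
            cached.add(content)
            used += sizes[content]
    return cached


def cache_hit_rate(requests, cached):
    """Fraction of requests served from the cache."""
    if not requests:
        return 0.0
    return sum(1 for r in requests if r in cached) / len(requests)


popularity = {"a": 0.5, "b": 0.3, "c": 0.2}  # assumed predictor output
sizes = {"a": 2, "b": 1, "c": 1}
cached = place_contents(popularity, sizes, capacity=3)
rate = cache_hit_rate(["a", "b", "c", "a"], cached)
print(sorted(cached), rate)
```

The greedy rule runs in O(n log n) and is only a heuristic; an exact solution of the underlying knapsack-style problem would need dynamic programming or integer programming.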

Keywords: Prediction models

Study on Electromagnetic Performance of Permanent Magnet Rotor and Dual Stator Starter Generator for Electric Vehicle Range Extender

Progress In Electromagnetics Research B, 2024, Vol. 106, pp. 39-55

Authors: Gao, Mingling; Yu, Zhenhai; Jiao, Wenjie; Hu, Wenjing; Geng, Huihui; Liu, Yixin; Liu, Shiqiang; Liu, Yishuo. School of Computer Science and Technology, Shandong University of Technology, Zibo 255000, China; School of Transportation and Vehicle Engineering, Shandong University of Technology, Zibo 255000, China

The flywheel-type dual-stator permanent magnet starter generator combines the engine flywheel and the starter generator rotor into a single unit, which has the advantages of high efficiency, high power density, and compact structure. This paper proposes a new type of dual-stator permanent magnet starter generator topology in which the two stators are concentric and share the same permanent magnet rotor. The inner stator’s magnetic field, the outer stator’s magnetic field, and the synthetic magnetic field are modeled using the equivalent magnetic circuit method; the system of flux equations is set up and solved for the main magnetic flux, leakage flux, and leakage coefficient, and the results show that the equivalent magnetic circuit method has smaller error and higher accuracy than the finite element method. The harmonic electric potential of the starter generator is modeled and analyzed. The permanent magnet rotor and the inner and outer stator structures are optimized to obtain the optimal parameters, and a prototype is manufactured and tested. The fundamental amplitude of the no-load induced electromotive force of the optimized starter generator is improved, the harmonic distortion rate of the induced electromotive force is reduced, and the output performance of the whole generator is significantly improved. © (2024), Electromagnetics Academy. All rights reserved.

Keywords: Stators

Nonparametric Statistical Feature Scaling Based Quadratic Regressive Convolution Deep Neural Network for Software Fault Prediction

Computers, Materials & Continua, 2024, Vol. 78, No. 3, pp. 3469-3487

Authors: Sureka Sivavelu; Venkatesh Palanisamy. School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore 632014, India

The development of defect prediction plays a significant role in improving software quality. Such predictions are used to identify defective modules before testing and to minimize time and cost. Software with defects negatively impacts operational costs and ultimately affects customer satisfaction. Numerous approaches exist to predict software defects. However, predicting software bugs in a timely and accurate manner remains a major challenge. To improve the timeliness and accuracy of software defect prediction, a novel technique called Nonparametric Statistical feature scaled QuAdratic regressive convolution Deep nEural Network (SQADEN) is introduced. The proposed SQADEN technique mainly includes two major processes, namely metric or feature selection and classification. First, SQADEN uses the nonparametric statistical Torgerson–Gower scaling technique to identify the relevant software metrics by measuring similarity using the Dice coefficient. The feature selection process is used to minimize the time complexity of software fault prediction. With the selected metrics, software fault prediction is performed with the help of Quadratic Censored regressive convolution deep neural network-based classification. The deep learning classifier analyzes the training and testing samples using the contingency correlation coefficient. The softstep activation function is used to provide the final fault prediction results. To minimize the error, the Nelder–Mead method is applied to solve non-linear least-squares problems. Finally, accurate classification results with a minimum error are obtained at the output layer. Experimental evaluation is carried out with different quantitative metrics such as accuracy, precision, recall, F-measure, and time complexity. The analyzed results demonstrate the superior performance of our proposed SQADEN technique with maximum accuracy, sensitivity and specificity by 3%, 3%, 2% and 3% and minimum time and space by 13% and 15% when compared with the two sta

Keywords: Software defect prediction; feature selection; nonparametric statistical Torgerson-Gower scaling technique; quadratic censored regressive convolution deep neural network; softstep activation function; Nelder-Mead method

HA-FGOVD: Highlighting Fine-Grained Attributes via Explicit Linear Composition for Open-Vocabulary Object Detection

IEEE Transactions on Multimedia, 2025, Vol. 27, pp. 3171-3183

Authors: Ma, Yuqi; Liu, Mengyin; Zhu, Chao; Yin, Xu-Cheng. University of Science and Technology Beijing, School of Computer and Communication Engineering, Beijing 100083, China

Open-vocabulary object detection (OVD) models are considered to be Large Multi-modal Models (LMM), due to their extensive training data and large number of parameters. Mainstream OVD models prioritize the coarse-grained category of objects rather than their fine-grained attributes, e.g., colors or materials, and thus fail to identify objects specified with certain attributes. Despite being pretrained on large-scale image-text pairs with rich attribute information, their latent feature space does not highlight these fine-grained attributes. In this paper, we introduce HA-FGOVD, a universal and explicit method that enhances the attribute-level detection capabilities of frozen OVD models by highlighting fine-grained attributes in explicit linear space. Our approach uses an LLM to extract attribute words in the input text as a zero-shot task. Then, token attention masks are adjusted to guide text encoders in extracting both global and attribute-specific features, which are explicitly composited as two vectors in linear space to form a new attribute-highlighted feature for detection tasks. The composition weight scalars can be learned or transferred across different OVD models, showcasing the universality of our method. Experimental results show that HA-FGOVD achieves state-of-the-art performance on the FG-OVD benchmark and demonstrates promising generalization on the OVDEval benchmark, suggesting that our method addresses significant limitations in fine-grained attribute detection and has potential for broader fine-grained detection applications. © 1999-2012 IEEE.

Keywords: Coarse-grained modeling

Inductive Lottery Ticket Learning for Graph Neural Networks

Journal of Computer Science & Technology, 2024, Vol. 39, No. 6, pp. 1223-1237

Authors: Yong-Duo Sui; Xiang Wang; Tianlong Chen; Meng Wang; Xiang-Nan He; Tat-Seng Chua. School of Data Science, University of Science and Technology of China, Hefei 230027, China; Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin 78712, U.S.A.; School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China; School of Computing, National University of Singapore, Singapore

Graph neural networks (GNNs) have gained increasing popularity, while usually suffering from unaffordable computations for real-world large-scale applications. Hence, pruning GNNs is of great need but largely unexplored. The recent work Unified GNN Sparsification (UGS) studies lottery ticket learning for GNNs, aiming to find a subset of model parameters and graph structures that can best maintain the GNN performance. However, it is tailored for the transductive setting, failing to generalize to unseen graphs, which are common in inductive tasks like graph classification. In this work, we propose a simple and effective learning paradigm, Inductive Co-Pruning of GNNs (ICPG), to endow graph lottery tickets with inductive pruning capacity. To prune the input graphs, we design a predictive model to generate importance scores for each edge based on the input. To prune the model parameters, we view the weights’ magnitudes as their importance scores. Then we design an iterative co-pruning strategy to trim the graph edges and GNN weights based on their importance scores. Although it might be strikingly simple, ICPG surpasses the existing pruning method and can be universally applicable in both inductive and transductive learning settings. On 10 graph-classification and two node-classification benchmarks, ICPG achieves the same performance level with 14.26%–43.12% sparsity for graphs and 48.80%–91.41% sparsity for the GNN model.
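The idea of treating weight magnitude as an importance score can be sketched in a few lines. This shows one generic magnitude-pruning step on a flat list of weights; it is not the ICPG code, and the function name `magnitude_prune` and the toy values are assumptions for illustration.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitudes.

    weights: flat list of floats (a stand-in for a model's parameters).
    sparsity: fraction in [0, 1] of entries to remove.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from least to most important (smallest |w| first).
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


weights = [0.9, -0.1, 0.4, -0.05]
sparse_weights = magnitude_prune(weights, sparsity=0.5)
print(sparse_weights)
```

The same selection rule could be applied to per-edge importance scores to drop low-scoring edges, and repeating the step over several rounds with retraining in between would give an iterative schedule of the kind the abstract describes.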

Keywords: lottery ticket hypothesis; graph neural networks; neural network pruning

Denoising 1SPP Monte Carlo renderings based on human visual perception

3rd International Conference on Electronic Information Engineering and Data Processing, EIEDP 2024

Authors: Qi, Peili; Chen, Chunyi. School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, China

ISBN: (print) 9781510680531

In traditional Monte Carlo (MC) path-tracing denoising approaches, uniform processing across all pixels often overlooks the variable importance of different image regions as perceived by human observers. This study introduces a novel denoising method tailored to 1spp (one sample per pixel) MC renderings, leveraging human visual perception to prioritize computation in visually salient areas. By classifying pixels based on visual saliency, our method efficiently allocates computational resources, enhancing quality in high-saliency regions while reducing unnecessary processing in less noticeable areas. Experimental results validate the effectiveness of our approach, demonstrating improved denoising performance with reduced computational overhead. This saliency-based strategy not only achieves high-quality denoising but also paves the way for more perception-driven approaches in real-time rendering applications. © 2024 SPIE.

Keywords: Pixels

DCL-depth: monocular depth estimation network based on IAM and depth consistency loss

Multimedia Tools and Applications, 2025, Vol. 84, No. 8, pp. 4773-4787

Authors: Han, Chenggong; Lv, Chen; Kou, Qiqi; Jiang, He; Cheng, Deqiang. The School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China; The School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China

The self-supervised monocular depth estimation algorithm obtains excellent results in outdoor environments. However, traditional self-supervised depth estimation methods often suffer from edge blurring in complex textured regions and the loss of depth information in pixels within weakly-textured areas. To enhance the perception ability of the deep network for complex textured areas and the accuracy of depth estimation in weakly-textured regions, the following methods are proposed in this paper. First, the image activity measure (IAM) is used to segment the image features. Based on the multi-directional distribution of image contours, the network's perception ability is improved, resulting in effective enhancement of depth estimation in complex regions. Furthermore, a new loss function called depth consistency loss (DCL) is proposed, which is based on recursive recurrent networks. The DCL measures the similarity between the output images of the first-order network and the second-order network, strengthening the network's constraint on weakly-textured regions. With this approach, the accuracy of estimating depth information in weakly-textured regions can be enhanced. Extensive experiments on public indoor datasets show that our network outperforms the compared algorithms in terms of accuracy and visualization of predicted depth. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.

Keywords: Complex networks
