Search Results - Inner Mongolia University Library

Adaptive ROI Optimization Pyramid Network: Lane Detection for FSD under Data Uncertainty

Engineering Letters, 2025, Vol. 33, No. 2, pp. 282-291

Authors: Cao, Xu; Liu, Weisheng; Wang, Zhijian. Affiliation: College of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China

To mitigate the challenges posed by data uncertainty in Full Self-Driving (FSD) systems, this paper proposes a novel feature extraction learning model called Adaptive Region of Interest Optimized Pyramid Network (ARO). Specifically, ARO introduces a novel cross-layer fusion attention mechanism that dynamically assigns weighted attention to feature maps across different levels, facilitating deep feature fusion and enabling the effective extraction and utilization of salient feature information. Furthermore, ARO incorporates a feature replication layer to duplicate and refine feature maps at multiple levels, thereby enhancing its capability to capture fine-grained details and generate richer feature representations. Additionally, a multi-path upsampling strategy preserves fine-grained features during upsampling. Extensive experimental evaluations on the benchmark CULane and TuSimple datasets show that ARO achieves an F1 score of 80.60% on CULane, outperforming state-of-the-art methods and demonstrating the effectiveness of the proposed approach in handling data uncertainty for robust lane detection in autonomous driving. Our code is available at https://***/caoxu0109/lane detect ldfr. © 2025, International Association of Engineers. All rights reserved.

Keywords: Attention mechanisms; CULane; Lane detection; Multi-scale fusion network; TuSimple

Improving UAV Image Target Detection: A Novel Approach Using OptiDETR with Swin Transformer

IAENG International Journal of Computer Science, 2025, Vol. 52, No. 3, pp. 771-780

Authors: Ma, Wenlong; Liu, Weisheng. Affiliation: School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China

Object detection in drone aerial images is particularly challenging: complex terrain structures, extreme differences in target sizes, suboptimal shooting angles, and varying lighting conditions all make recognition harder. In recent years, the Transformer-based DETR model has eliminated traditional post-processing steps such as NMS (Non-Maximum Suppression), thereby simplifying the object detection process and improving detection accuracy, which has garnered widespread attention in the academic community. However, DETR has limitations such as slow training convergence, difficulty in query optimization, and high computational costs, which hinder its practical application. To address these issues, this paper proposes a new object detection model called OptiDETR. First, the model replaces the traditional Transformer encoder with a more efficient hybrid encoder, which significantly enhances feature processing through intra-scale and cross-scale feature interaction and fusion. Second, an IoU (Intersection over Union) aware query selection mechanism is introduced. This mechanism adds IoU constraints during the training phase to provide higher-quality initial object queries for the decoder, significantly improving decoding performance. Additionally, OptiDETR integrates SW-Block into the DETR decoder, leveraging the advantages of Swin Transformer in global context modeling and feature representation to further enhance the performance and efficiency of object detection. To tackle the problem of small object detection, this study employs the SAHI algorithm for data augmentation. In a series of experiments, OptiDETR achieved a performance improvement of more than two percentage points in the mAP (mean Average Precision) metric compared to current mainstream object detection models. Furthermore, ther

Keywords: Decoding
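
The abstract above refers to IoU constraints without defining the quantity. As a reminder only, here is a minimal sketch of Intersection over Union for two axis-aligned boxes. It is not OptiDETR's code: the function name `box_iou` and the `(x1, y1, x2, y2)` corner format are assumptions made for this illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Return the Intersection over Union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b

    # Width and height of the overlap; clamp to zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Degenerate (zero-area) inputs have no meaningful overlap ratio.
    return inter / union if union > 0 else 0.0
```

For example, two 2×2 boxes offset by one unit in each direction overlap in a 1×1 square, so their IoU is 1 / (4 + 4 - 1) = 1/7.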

A recover-then-discriminate framework for robust anomaly detection

Science China (Information Sciences), 2025, Vol. 68, No. 4, pp. 300-318

Authors: Peng XING; Dong ZHANG; Jinhui TANG; Zechao LI. Affiliations: School of Computer Science and Engineering, Nanjing University of Science and Technology; Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology

Anomaly detection (AD) has been extensively studied and applied across various scenarios in recent years. However, gaps remain between the current performance and the desired recognition accuracy required for practical *** paper analyzes two fundamental failure cases in the baseline AD model and identifies key reasons that limit the recognition accuracy of existing approaches. Specifically, in Case-1, we found that the main weakness of current AD methods is that the inputs to the recovery model contain a large number of detailed features to be recovered, so that normal areas are not recovered to their original state while abnormal areas are. In Case-2, we surprisingly found that abnormal areas that cannot be recognized in image-level representations can be easily recognized in feature-level representations. Based on these observations, we propose a novel recover-then-discriminate (ReDi) framework for *** takes a self-generated feature map (e.g., histogram of oriented gradients) and a selected prompted image as explicit input information to address the problem identified in Case-1. Additionally, a feature-level discriminative network is introduced to amplify abnormal differences between the recovered and input representations. Extensive experiments on two widely used yet challenging AD datasets demonstrate that ReDi achieves state-of-the-art recognition accuracy.

Keywords: recovery network; HOG; prompt; discriminative network; self-correlation loss; anomaly detection

Structured Light Center Extraction Study with Multiple Attention Mechanisms

IAENG International Journal of Computer Science, 2024, Vol. 51, No. 4, pp. 437-446

Authors: Sun, Hang; Zhou, Ziwei. Affiliation: School of Computer and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China

The integration of deep learning with conventional structured light center extraction techniques improves the accuracy of extracting structured light centers. The method is divided into three steps. The first step is calibration, which establishes a correlation between image coordinates and world coordinates. The second step identifies the laser fringe area. This study employs a self-designed Multi-Att DeepLabV3+ encoder-decoder neural network architecture to extract the laser fringe region. The self-designed SE-ResSkipNet module is incorporated into the structure as the backbone, and the decoder uses a parallel alternating dual attention mechanism. The third step extracts the center of the laser fringe using the Steger algorithm, which is based on the Hessian matrix. Experimental validation is conducted on an open-source laser image dataset. The experimental results indicate that the network architecture's mIoU, fwIoU, Acc, and Acc class evaluation metrics for complex laser fringe segmentation improve by 4.22%, 0.67%, 0.33%, and 4.96%, respectively. The algorithm demonstrates superior accuracy compared to other algorithms in laser fringe segmentation, playing a crucial role in the subsequent processes of 3D reconstruction and 3D measurement. © 2024, International Association of Engineers. All rights reserved.

Keywords: Decoding

How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites

Science China (Information Sciences), 2024, Vol. 67, No. 12, pp. 5-22

Authors: Zhe CHEN; Weiyun WANG; Hao TIAN; Shenglong YE; Zhangwei GAO; Erfei CUI; Wenwen TONG; Kongzhi HU; Jiapeng LUO; Zheng MA; Ji MA; Jiaqi WANG; Xiaoyi DONG; Hang YAN; Hewei GUO; Conghui HE; Botian SHI; Zhenjiang JIN; Chao XU; Bin WANG; Xingjian WEI; Wei LI; Wenjian ZHANG; Bo ZHANG; Pinlong CAI; Licheng WEN; Xiangchao YAN; Min DOU; Lewei LU; Xizhou ZHU; Tong LU; Dahua LIN; Yu QIAO; Jifeng DAI; Wenhai WANG. Affiliations: State Key Laboratory for Novel Software Technology, Nanjing University; Shanghai AI Laboratory; School of Computer Science, Fudan University; SenseTime Research; Department of Information Engineering, The Chinese University of Hong Kong; Department of Electronic Engineering, Tsinghua University

In this paper, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements. (1) Strong vision encoder: we explored a continuous learning strategy for the large-scale vision foundation model InternViT-6B, boosting its visual understanding capabilities and allowing it to be transferred and reused in different LLMs. (2) Dynamic high-resolution: we divide images into 1 to 40 tiles of 448×448 pixels according to the aspect ratio and resolution of the input images, which supports up to 4K resolution input. (3) High-quality bilingual dataset: we carefully collected a high-quality bilingual dataset that covers common scenes and document images, and annotated it with English and Chinese question-answer pairs, significantly enhancing performance in optical character recognition (OCR) and Chinese-related tasks. We evaluate InternVL 1.5 through a series of benchmarks and comparative studies. Compared to both open-source and proprietary commercial models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 multimodal benchmarks. Code and models are available at https://***/OpenGVLab/InternVL.

Keywords: multimodal model; open-source; vision encoder; dynamic resolution; bilingual dataset
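
The dynamic high-resolution step in the abstract above (1 to 40 tiles of 448×448 pixels, chosen by aspect ratio and resolution) can be illustrated with a small sketch. This is not the InternVL implementation: the function name `choose_grid` and the tie-breaking rule (prefer the grid whose pixel area is closest to the image's) are assumptions made here. Only the tile size and the tile limit come from the abstract.

```python
from typing import Tuple


def choose_grid(width: int, height: int,
                max_tiles: int = 40, tile: int = 448) -> Tuple[int, int]:
    """Pick a (cols, rows) tile grid for an image of the given size.

    Assumed rule for this sketch: choose the grid whose aspect ratio is
    closest to the image's; break ties by closest total pixel area.
    """
    aspect = width / height
    image_area = width * height

    # Every grid that uses between 1 and max_tiles tiles.
    candidates = [
        (cols, rows)
        for cols in range(1, max_tiles + 1)
        for rows in range(1, max_tiles + 1)
        if cols * rows <= max_tiles
    ]

    def cost(grid: Tuple[int, int]) -> Tuple[float, int]:
        cols, rows = grid
        ratio_gap = abs(aspect - cols / rows)
        area_gap = abs(image_area - cols * rows * tile * tile)
        return (ratio_gap, area_gap)

    return min(candidates, key=cost)
```

Under these assumptions, a 448×448 image maps to a single tile, an 896×448 image to a 2×1 grid, and a 3840×2160 (4K) image to a 7×4 grid of 28 tiles. The image would then be resized to `cols * 448` by `rows * 448` pixels and cut along the grid.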

A survey on model-based reinforcement learning

Science China (Information Sciences), 2024, Vol. 67, No. 2, pp. 59-84

Authors: Fan-Ming LUO; Tian XU; Hang LAI; Xiong-Hui CHEN; Weinan ZHANG; Yang YU. Affiliations: National Key Laboratory for Novel Software Technology, Nanjing University; Polixir.ai; Department of Computer Science and Engineering, Shanghai Jiao Tong University

Reinforcement learning (RL) interacts with the environment to solve sequential decision-making problems via a trial-and-error approach. Errors are always undesirable in real-world applications, even though RL excels at playing complex video games that permit many trial-and-error attempts. To improve sample efficiency and thus reduce errors, model-based reinforcement learning (MBRL) is believed to be a promising direction, as it constructs environment models in which trial and error can occur without incurring actual costs. In this survey, we investigate MBRL with a particular focus on recent advancements in deep RL. There is a generalization error between the learned model of a non-tabular environment and the actual environment. Consequently, it is crucial to analyze the disparity between policy training in the environment model and that in the actual environment, guiding algorithm design for improved model learning, model utilization, and policy training. In addition, we discuss recent developments of model-based techniques in other forms of RL, such as offline RL, goal-conditioned RL, multi-agent RL, and meta-RL. Furthermore, we discuss the applicability and benefits of MBRL for real-world tasks. Finally, this survey concludes with a discussion of promising future development prospects for MBRL. We believe that MBRL has great unrealized potential and benefits in real-world applications, and we hope this survey will encourage additional research on MBRL.

Keywords: reinforcement learning; model-based reinforcement learning; planning; model learning; model learning with reduced error; model usage

Energy efficiency aware dynamic rate and power adaptation in carrier sensing based WLANs under Rayleigh fading and shadowing

Digital Communications and Networks, 2024, Vol. 10, No. 4, pp. 918-933

Author: Forkan Uddin. Affiliation: Department of Electrical and Electronic Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh

We consider the problem of energy efficiency aware dynamic adaptation of data transmission rate and transmission power of the users in carrier sensing based Wireless Local Area Networks (WLANs) in the presence of path loss, Rayleigh fading and log-normal *** a data packet transmission, we formulate an optimization problem, solve the problem, and propose a rate and transmission power adaptation scheme with a restriction methodology of data packet transmission for achieving the optimal energy *** the restriction methodology of data packet transmission, a user does not transmit a data packet if the instantaneous channel gain of the user is lower than a *** evaluate the performance of the proposed scheme, we develop analytical models for computing the throughput and energy efficiency of WLANs under the proposed scheme considering a saturation traffic *** then validate the analytical models via *** find that the proposed scheme provides better throughput and energy efficiency with acceptable throughput fairness if the restriction methodology of data packet transmission is *** means of the analytical models and simulations, we demonstrate that the proposed scheme provides significantly higher throughput, energy efficiency and fairness index than a traditional non-adaptive scheme and an existing most relevant adaptive *** and energy efficiency gains obtained by the proposed scheme with respect to the existing adaptive scheme are about 75% and 103%, respectively, for a fairness index of *** also study the effect of various system parameters on throughput and energy efficiency and provide various engineering insights.

Keywords: Carrier sense multiple access; Energy efficiency; Fading and shadowing; Rate and power adaptation; Throughput; Wireless local area networks

MDEV Model: A Novel Ensemble-Based Transfer Learning Approach for Pneumonia Classification Using CXR Images

Computer Systems Science & Engineering, 2023, Vol. 46, No. 7, pp. 287-302

Authors: Mehwish Shaikh; Isma Farah Siddiqui; Qasim Arain; Jahwan Koo; Mukhtiar Ali Unar; Nawab Muhammad Faseeh Qureshi. Affiliations: Department of Software Engineering, Mehran University of Engineering and Technology, Jamshoro, Pakistan; College of Software, Sungkyunkwan University, Suwon, Korea; Department of Computer Systems, Mehran University of Engineering and Technology, Jamshoro, Pakistan; Department of Computer Education, Sungkyunkwan University, Seoul, Korea

Pneumonia is a dangerous respiratory disease that makes breathing incredibly difficult and painful; thus, catching it early is *** physicians' time is limited in outdoor situations due to many patients; therefore, automated systems can be a *** input images from the X-ray equipment are also highly unpredictable due to variances in radiologists' ***, radiologists require an automated system that can swiftly and accurately detect pneumonic lungs from chest *** medical classifications, deep convolution neural networks are commonly *** research aims to use deep pretrained transfer learning models to accurately categorize CXR images into binary classes, i.e., Normal and *** MDEV is a proposed novel ensemble approach that concatenates four heterogeneous transfer learning models: Mobile-Net, DenseNet-201, EfficientNet-B0, and VGG-16, which have been finetuned and trained on 5,856 CXR *** evaluation metrics used in this research to contrast different deep transfer learning architectures include precision, accuracy, recall, AUC-ROC, and *** model effectively decreases training loss while increasing *** findings conclude that the proposed MDEV model outperformed cutting-edge deep transfer learning models and obtains an overall precision of 92.26%, an accuracy of 92.15%, a recall of 90.90%, an AUC-ROC score of 90.9%, and an F-score of 91.49% with minimal data pre-processing, data augmentation, finetuning and hyperparameter adjustment in classifying Normal and Pneumonia chests.

Keywords: Deep transfer learning; convolution neural network; image processing; computer vision; ensemble learning; pneumonia classification; MDEV model

Enhanced facial action unit detection with adaptable patch sizes on representative landmarks

Neural Computing and Applications, 2025, Vol. 37, No. 5, pp. 3777-3791

Authors: Cakir, Duygu; Yilmaz, Gorkem; Arica, Nafiz. Affiliations: Department of Software Engineering, Bahcesehir University, Istanbul, Turkey; Department of Computer Engineering, Bahcesehir University, Istanbul, Turkey; Department of Computer Engineering, Piri Reis University, Istanbul, Turkey

The human face displays expressions through the contraction of various facial muscles. The Facial Action Coding System (FACS) is a widely accepted taxonomy that describes all visible changes in the face in terms of action units (AUs). In this study, AUs are examined by finding the most active landmarks of the face and then examining the most representative patch sizes of each landmark for the AU detection task. Sparse learning is employed to learn the most active landmarks for each AU, and then the active landmark patches are fed to ViT and Perceiver mechanisms independently. Experiments indicate that using active landmark patches with their most representative size improves the results when compared to using all the landmarks, especially when it is used on more challenging datasets as a support for the attention mechanism of the classifier. The results demonstrate that the proposed method improves the performance of the employed models and are further supported by experiments conducted across different datasets. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.

Keywords: Face recognition

A survey on cross-user federated recommendation

Science China (Information Sciences), 2025, Vol. 68, No. 4, pp. 7-32

Authors: Enyue YANG; Yudi XIONG; Wei YUAN; Weike PAN; Qiang YANG; Zhong MING. Affiliations: College of Computer Science and Software Engineering, Shenzhen University; School of Electrical Engineering and Computer Science, The University of Queensland; WeBank AI Lab, WeBank; Department of Computer Science and Engineering, Hong Kong University of Science and Technology; College of Big Data and Internet, Shenzhen Technology University; Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ)

Recommender systems are effective in mitigating information overload, yet the centralized storage of user data raises significant privacy concerns. Cross-user federated recommendation (CUFR) provides a promising distributed paradigm to address these concerns by enabling privacy-preserving recommendations directly on user devices. In this survey, we review and categorize current progress in CUFR, focusing on four key aspects: privacy, security, accuracy, and efficiency. Firstly, we conduct an in-depth privacy analysis, discuss various cases of privacy leakage, and then review recent methods for privacy protection. Secondly, we analyze security concerns and review recent methods for untargeted and targeted *** untargeted attack methods, we categorize them into data poisoning attack methods and parameter poisoning attack methods. For targeted attack methods, we categorize them into user-based methods and item-based methods. Thirdly, we provide an overview of the federated variants of some representative methods, and then review the recent methods for improving accuracy from two categories: data heterogeneity and high-order information. Fourthly, we review recent methods for improving training efficiency from two categories: client sampling and model compression. Finally, we conclude this survey and explore some potential future research topics in CUFR.

Keywords: cross-user federated recommendation; federated recommendation; federated learning; recommender systems; user privacy
