Search Results - Inner Mongolia University Library

MSC-YOLO: Improved YOLOv7 Based on Multi-Scale Spatial Context for Small Object Detection in UAV-View

Computers, Materials & Continua, 2024, Vol. 79, No. 4, pp. 983-1003

Authors: Xiangyan Tang, Chengchun Ruan, Xiulai Li, Binbin Li, Cebin Fu. Affiliations: School of Computer Science and Technology, Hainan University, Haikou 570228, China; Hainan Blockchain Technology Engineering Research Center, Hainan University, Haikou 570228, China; School of Cyberspace Security (School of Cryptology), Hainan University, Haikou 570228, China

Accurately identifying small objects in high-resolution aerial images presents a complex and crucial task in the field of small object detection on unmanned aerial vehicles (UAVs). This task is challenging due to variations in UAV flight altitude, differences in object scales, as well as factors like flight speed and motion *** enhance the detection efficacy of small targets in drone aerial imagery, we propose an enhanced You Only Look Once version 7 (YOLOv7) algorithm based on multi-scale spatial *** build the MSC-YOLO model, which incorporates an additional prediction head, denoted as P2, to improve adaptability for small *** replace conventional downsampling with a Spatial-to-Depth Convolutional Combination (CSPDC) module to mitigate the loss of intricate feature details related to small ***, we propose a Spatial Context Pyramid with Multi-Scale Attention (SCPMA) module, which captures spatial and channel-dependent features of small targets across multiple *** module enhances the perception of spatial contextual features and the utilization of multiscale feature *** the Visdrone2023 and UAVDT datasets, MSC-YOLO achieves remarkable results, outperforming the baseline method YOLOv7 by 3.0% in terms of mean average precision (mAP). The MSC-YOLO algorithm proposed in this paper has demonstrated satisfactory performance in detecting small targets in UAV aerial photography, providing strong support for practical applications.
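The abstract above mentions replacing conventional downsampling with a space-to-depth style module. As a rough illustration of the general idea only, the sketch below shows the generic space-to-depth rearrangement on a nested list: each 2 x 2 spatial neighbourhood is moved into the channel axis, so resolution halves while every input value is kept. The function name `space_to_depth` and the toy image are assumptions for illustration; this is not the authors' CSPDC module, whose details the listing does not give.

```python
def space_to_depth(image, block=2):
    """Rearrange an H x W x C nested list into (H/block) x (W/block) x (C*block*block)."""
    height, width = len(image), len(image[0])
    if height % block or width % block:
        raise ValueError("height and width must be divisible by block")
    out = []
    for i in range(0, height, block):
        row = []
        for j in range(0, width, block):
            cell = []
            # Stack the block x block neighbourhood along the channel axis,
            # so no pixel is discarded (unlike strided convolution or pooling).
            for di in range(block):
                for dj in range(block):
                    cell.extend(image[i + di][j + dj])
            row.append(cell)
        out.append(row)
    return out


# Hypothetical toy input: a 4 x 4 single-channel image holding the values 0..15.
image = [[[4 * r + c] for c in range(4)] for r in range(4)]
packed = space_to_depth(image)
print(len(packed), len(packed[0]), len(packed[0][0]))
```

In a detector, a rearrangement like this would typically be followed by a convolution that mixes the stacked channels; that part is omitted here.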

Keywords: small object detection; YOLOv7; multi-scale attention; spatial context

CF-AMVRGO: Collaborative Filtering based Adaptive Moment Variance Reduction Gradient Optimizer for Movie Recommendations

International Journal of Computers and Applications, 2022, Vol. 44, No. 11, pp. 1015-1023

Authors: Lakshmi Chetana, V.; Seetha, Hari. Affiliations: School of Computer Science and Engineering, VIT-AP University, Andhra Pradesh, Vijayawada, India; Center of Excellence, AI and Robotics, VIT-AP University, Andhra Pradesh, Vijayawada, India

In recent decades, recommendation systems have been used in a variety of applications such as social connections, movies, music, and venues. Existing algorithms have certain limitations, such as data sparsity, the cold start problem, and poor scalability, that degrade movie recommendation performance. To address these concerns, a new matrix factorization-based collaborative filtering algorithm is developed to enhance movie recommendation performance. In a neural collaborative filtering algorithm, an adaptive moment variance reduction gradient optimizer is applied in low-rank matrix factorization to represent the users and items in a low-dimensional latent space and obtain effective movie recommendation performance. In this paper, the performance of the proposed algorithm is evaluated on three benchmark datasets: the MovieLens 100K, 1M, and 10M datasets. Simulation results showed that the proposed algorithm obtained differences of 0.08 and 0.056 in root mean square error on the MovieLens 1M and 10M datasets, which are better than the existing collaborative filtering-based deep learning algorithm. © 2022 Informa UK Limited, trading as Taylor & Francis Group.
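To make the low-rank matrix factorization and root mean square error mentioned in the abstract above more concrete, here is a minimal sketch. It deliberately swaps in plain stochastic gradient descent, because the listing does not describe the paper's adaptive moment variance reduction gradient optimizer. The names `factorize` and `rmse`, the hyperparameters, and the tiny ratings list are all assumptions for illustration, not data or code from the paper.

```python
import math
import random


def predict(user_vec, item_vec):
    """Predicted rating: dot product of the user and item latent vectors."""
    return sum(a * b for a, b in zip(user_vec, item_vec))


def rmse(ratings, user_f, item_f):
    """Root mean square error over (user, item, rating) triples."""
    squared = sum((r - predict(user_f[u], item_f[i])) ** 2 for u, i, r in ratings)
    return math.sqrt(squared / len(ratings))


def factorize(ratings, n_users, n_items, rank=2, lr=0.01, reg=0.02, epochs=200, seed=0):
    """Fit user/item factors with plain SGD (an assumption, not the paper's optimizer)."""
    rng = random.Random(seed)
    user_f = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_users)]
    item_f = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_items)]
    history = [rmse(ratings, user_f, item_f)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(user_f[u], item_f[i])
            for k in range(rank):
                pu, qi = user_f[u][k], item_f[i][k]
                # Gradient step on squared error with L2 regularization.
                user_f[u][k] += lr * (err * qi - reg * pu)
                item_f[i][k] += lr * (err * pu - reg * qi)
        history.append(rmse(ratings, user_f, item_f))
    return user_f, item_f, history


# Hypothetical toy ratings: (user index, item index, rating on a 1-5 scale).
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
           (2, 1, 2), (2, 2, 5), (3, 0, 1), (3, 2, 4)]
user_f, item_f, history = factorize(ratings, n_users=4, n_items=3)
print(f"training RMSE: {history[0]:.3f} -> {history[-1]:.3f}")
```

The printed values are training error on toy data only; a real evaluation would hold out ratings for testing, as benchmark studies on MovieLens do.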

Keywords: collaborative filtering

A Normalizing Flow-Based Bidirectional Mapping Residual Network for Unsupervised Defect Detection

Computers, Materials & Continua, 2024, Vol. 78, No. 2, pp. 1631-1648

Authors: Lanyao Zhang, Shichao Kan, Yigang Cen, Xiaoling Chen, Linna Zhang, Yansen Huang. Affiliations: School of Mechanical Engineering, Guizhou University, Guiyang 550025, China; School of Computer Science and Engineering, Central South University, Changsha 410083, China; School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China; College of Civil Engineering, Guizhou University, Guiyang 550025, China; Guizhou Lianjian Civil Engineering Quality Inspection Monitoring Center Co. Ltd., Guiyang 550025, China

Unsupervised methods based on density representation have shown their abilities in anomaly detection, but detection performance still needs to be ***, approaches using normalizing flows can accurately evaluate sample distributions, mapping normal features to the normal distribution and anomalous features outside ***, this paper proposes a Normalizing Flow-based Bidirectional Mapping Residual Network (NF-BMR). It utilizes pre-trained Convolutional Neural Networks (CNN) and normalizing flows to construct discriminative source and target domain feature ***, to better learn feature information in both domain spaces, we propose the Bidirectional Mapping Residual Network (BMR), which maps sample features to these two spaces for anomaly *** two detection spaces effectively complement each other’s deficiencies and provide a comprehensive feature evaluation from two perspectives, which leads to the improvement of detection *** experimental results on the MVTec AD and DAGM datasets against the Bidirectional Pre-trained Feature Mapping Network (B-PFM) and other state-of-the-art methods demonstrate that the proposed approach achieves superior *** the MVTec AD dataset, NF-BMR achieves an average AUROC of 98.7% for all 15 ***, it achieves 100% optimal detection performance in five *** the DAGM dataset, the average AUROC across ten categories is 98.7%, which is very close to supervised methods.

Keywords: anomaly detection; normalizing flow; source domain feature space; target domain feature space; bidirectional mapping residual network

ViGT: proposal-free video grounding with a learnable token in the transformer

Science China (Information Sciences), 2023, Vol. 66, No. 10, pp. 196-212

Authors: Kun LI, Dan GUO, Meng WANG. Affiliations: School of Computer Science and Information Engineering, Hefei University of Technology; Key Laboratory of Knowledge Engineering with Big Data, Ministry of Education; Intelligent Interconnected Systems Laboratory of Anhui Province; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center

The video grounding (VG) task aims to locate the queried action or event in an untrimmed video based on rich linguistic descriptions. Existing proposal-free methods are trapped in the complex interaction between video and query, overemphasizing cross-modal feature fusion and feature correlation for VG. In this paper, we propose a novel boundary regression paradigm that performs regression token learning in a transformer. Particularly, we present a simple but effective proposal-free framework, namely video grounding transformer (ViGT), which predicts the temporal boundary using a learnable regression token rather than multi-modal or cross-modal features. In ViGT, the benefits of a learnable token are manifested as follows. (1) The token is unrelated to the video or the query and avoids data bias toward the original video and query. (2) The token simultaneously performs global context aggregation from video and query ***, we employed a sharing feature encoder to project both video and query into a joint feature space before performing cross-modal co-attention (i.e., video-to-query attention and query-to-video attention) to highlight discriminative features in each modality. Furthermore, we concatenated a learnable regression token [REG] with the video and query features as the input of a vision-language transformer. Finally, we utilized the token [REG] to predict the target moment and visual features to constrain the foreground and background probabilities at each timestamp. The proposed ViGT performed well on three public datasets: ANet-Captions, TACoS, and YouCookⅡ. Extensive ablation studies and qualitative analysis further validated the interpretability of ViGT.

Keywords: video grounding; temporal sentence grounding; boundary regression; token learning; proposal-free

Masked Generative Light Field Prompting for Pixel-Level Structure Segmentations

Research, 2024, Vol. 2024, No. 4, pp. 533-544

Authors: Mianzhao Wang, Fan Shi, Xu Cheng, Shengyong Chen. Affiliations: The Engineering Research Center of Learning-Based Intelligent System (Ministry of Education), Tianjin University of Technology, Tianjin 300384, China; Key Laboratory of Computer Vision and System (Ministry of Education), Tianjin University of Technology, Tianjin 300384, China; School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384, China

Pixel-level structure segmentations have attracted considerable attention, playing a crucial role in autonomous driving within the metaverse and enhancing comprehension in light field-based machine ***, current light field modeling methods fail to integrate appearance and geometric structural information into a coherent semantic space, thereby limiting the capability of light field transmission for visual *** this paper, we propose a general light field modeling method for pixel-level structure segmentation, comprising a generative light field prompting encoder (LF-GPE) and a prompt-based masked light field pretraining (LF-PMP) *** LF-GPE, serving as a light field backbone, can extract both appearance and geometric structural cues *** aligns these features into a unified visual space, facilitating semantic ***, our LF-PMP, during the pretraining phase, integrates a mixed light field and a multi-view light field *** prioritizes considering the geometric structural properties of the light field, enabling the light field backbone to accumulate a wealth of prior *** evaluate our pretrained LF-GPE on two downstream tasks: light field salient object detection and semantic *** results demonstrate that LF-GPE can effectively learn high-quality light field features and achieve highly competitive performance in pixel-level segmentation tasks.

Keywords: prompt; backbone; integrate

Dynamic Evolutionary Game-based Modeling, Analysis and Performance Enhancement of Blockchain Channels

IEEE/CAA Journal of Automatica Sinica, 2023, Vol. 10, No. 1, pp. 188-202

Authors: PeiYun Zhang, MengChu Zhou, ChenXi Li, Abdullah Abusorrah. Affiliations: IEEE; the School of Computer Science, Nanjing University of Information Science & Technology, Nanjing 210044, China; the Helen and John C. Hartmann Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ 07102, USA; the School of Computer and Information, Anhui Normal University, Wuhu 241003, China; the Department of Electrical and Computer Engineering, Faculty of Engineering, and Center of Research Excellence in Renewable Energy and Power Systems, King Abdulaziz University, Jeddah 21481, Saudi Arabia

The recent development of channel technology has promised to reduce the transaction verification time in blockchain *** transactions are transmitted through the channels created by nodes, the nodes need to cooperate with each *** one party refuses to do so, the channel is unstable. A stable channel is thus *** nodes may show uncooperative behavior, they may have a negative impact on the stability of such *** order to address this issue, this work proposes a dynamic evolutionary game model based on node *** model considers various defense strategies' cost and attack success ratio under *** can dynamically adjust their strategies according to the behavior of attackers to achieve their effective *** equilibrium stability of the proposed model can be *** proposed model can be applied to general channel *** is compared with two state-of-the-art blockchain channels: Lightning network and Spirit *** experimental results show that the proposed model can be used to improve a channel's stability and keep it in a good cooperative stable *** its use enables a blockchain to enjoy higher transaction success ratio and lower transaction transmission delay than the use of its two peers.

Keywords: blockchain; channel network; evolutionary game; malicious behavior; secure computing; stability analysis

Revealing Security Flaws in Cross-Modal Retrieval Models through Video Poisoning

IEEE Transactions on Circuits and Systems for Video Technology, 2025, Vol. 35, No. 6, pp. 6184-6194

Authors: Jin, Ming; Hu, Wenbo; Hong, Richang; Zhu, Lei. Affiliations: Hefei University of Technology, School of Computer and Information, Hefei, China; Data Space Research Institute, Hefei Comprehensive National Science Center, China; Tongji University, School of Electronic and Information Engineering, Shanghai, China

Video-text cross-modal retrieval is widely studied to improve retrieval accuracy. However, the security of video-text cross-modal retrieval models receives little attention. If attackers exploit the security vulnerabilities in these models, it poses a significant threat to the retrieval models. Thus, identifying security flaws in video-text cross-modal retrieval models becomes the focus of our research. We are the first to design a video poisoning model to uncover security vulnerabilities in retrieval models. Existing poisoning models have certain limitations when it comes to exploiting vulnerabilities in retrieval models. These include failing to comprehensively embed malicious information into the original video and being unable to maintain visual consistency between the original and poisoned videos. These limitations can result in unsuccessful attacks on retrieval models and an inability to effectively identify security flaws within them. To address these shortcomings, we design an efficient poisoning model that embeds malicious information thoroughly into the original clean data to attack video-text cross-modal retrieval models. We are the first to use a poisoning model to attack retrieval models, thereby uncovering their security vulnerabilities. Second, we introduce a bi-level poisoning module to ensure that malicious information is thoroughly embedded into the original video, thereby enhancing the attack capability of the poisoning model. Finally, we design an adversarial module to improve visual consistency between the original and poisoned videos, thus enhancing the concealment of malicious information within the training data of retrieval models. Our poisoning model can identify security flaws in video-text cross-modal retrieval models, providing insights into improving the security of retrieval models. The effectiveness of our model is validated on the MSR-VTT, LSMDC, and MSVD datasets. © 1991-2012 IEEE.

Keywords: content based retrieval

TFENet: Topological Feature Extraction Network for 6D Object Pose Estimation

2023 China Automation Congress, CAC 2023

Authors: Zhang, Zhihao; Li, Zhengrong; Lin, Jiacheng; Li, Zhiyong; Chen, Wenrui. Affiliations: College of Computer Science and Electronic Engineering, Hunan University, Changsha, China; School of Robotics, Hunan University, Changsha, China

ISBN: (print) 9798350303759

Grasping is an essential and challenging skill for robots to interact with their environment. Accurately obtaining the 6D pose information of the grasped object is the key to successful grasping. Due to the influence of illumination changes and object textures, previous methods had problems with low utilization of depth information and inaccurate feature information. We study feature information extraction and fusion to address these issues and propose a topological feature extraction network (TFENet). Firstly, we design a TFE module to achieve global feature extraction from object point clouds, through which features can be identified to use depth information efficiently. Secondly, we design an MFF module to better fuse color and geometric embeddings. Finally, experiments show that the proposed method outperforms existing methods on two datasets, LineMOD and YCB-Video. © 2023 IEEE.

Keywords: feature extraction

Multi-Scale Fusion and Decomposition Network for Single Image Deraining

IEEE Transactions on Image Processing, 2024, Vol. 33, pp. 191-204

Authors: Wang, Qiong; Jiang, Kui; Wang, Zheng; Ren, Wenqi; Zhang, Jianhui; Lin, Chia-Wen. Affiliations: Wuhan University, National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan 430072, China; Sun Yat-sen University, School of Cyber Science and Technology, Guangzhou 510275, China; Hangzhou Dianzi University, College of Computer Science and Technology, Hangzhou 310018, China; National Tsing Hua University, Department of Electrical Engineering, The Institute of Communications Engineering, Hsinchu 30013, Taiwan

Convolutional neural networks (CNNs) and self-attention (SA) have demonstrated remarkable success in low-level vision tasks, such as image super-resolution, deraining, and dehazing. The former excels in acquiring local connections with translation equivariance, while the latter is better at capturing long-range dependencies. However, both CNNs and Transformers suffer from individual limitations, such as the limited receptive field and weak diversity representation of CNNs, and the low efficiency and weak local relation learning of SA. To this end, we propose a multi-scale fusion and decomposition network (MFDNet) for rain perturbation removal, which unifies the merits of these two architectures while maintaining both effectiveness and efficiency. To achieve the decomposition and association of rain and rain-free features, we introduce an asymmetrical scheme designed as a dual-path mutual representation network that enables iterative refinement. Additionally, we incorporate high-efficiency convolutions throughout the network and use resolution rescaling to balance computational complexity with performance. Comprehensive evaluations show that the proposed approach outperforms most of the latest SOTA deraining methods and is versatile and robust in various image restoration tasks, including underwater image enhancement, image dehazing, and low-light image enhancement. The source codes and pretrained models are available at https://***/qwangg/MFDNet. © 1992-2012 IEEE.

Keywords: image reconstruction

ARCHER: a ReRAM-based accelerator for compressed recommendation systems

Frontiers of Computer Science, 2024, Vol. 18, No. 5, pp. 147-160

Authors: Xinyang SHEN, Xiaofei LIAO, Long ZHENG, Yu HUANG, Dan CHEN, Hai JIN. Affiliations: National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Clusters and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China

Modern recommendation systems are widely used in modern data *** random and sparse embedding lookup operations are the main performance bottleneck for processing recommendation systems on traditional platforms as they induce abundant data movements between computing units and ***-based processing-in-memory (PIM) can resolve this problem by processing embedding vectors where they are ***, the embedding table can easily exceed the capacity limit of a monolithic ReRAM-based PIM chip, which induces off-chip accesses that may offset the PIM ***, we deploy the decomposed model on-chip and leverage the high computing efficiency of ReRAM to compensate for the decompression performance *** this paper, we propose ARCHER, a ReRAM-based PIM architecture that implements fully on-chip recommendations under resource ***, we make a full analysis of the computation pattern and access pattern on the decomposed *** on the computation pattern, we unify the operations of each layer of the decomposed model in multiply-and-accumulate *** on the access observation, we propose a hierarchical mapping schema and a specialized hardware design to maximize resource *** the unified computation and mapping strategy, we can coordinate the inter-processing elements *** evaluation shows that ARCHER outperforms the state-of-the-art GPU-based DLRM system, the state-of-the-art near-memory processing recommendation system RecNMP, and the ReRAM-based recommendation accelerator REREC by 15.79×, 2.21×, and 1.21× in terms of performance and 56.06×, 6.45×, and 1.71× in terms of energy savings, respectively.

Keywords: recommendation system; ReRAM; processing-in-memory; embedding layer
