Accurately identifying small objects in high-resolution aerial images is a complex and crucial task in small object detection on unmanned aerial vehicles (UAVs). The task is challenging because of variations in UAV flight altitude, differences in object scale, and factors such as flight speed and motion blur. To enhance the detection of small targets in drone aerial imagery, we propose an enhanced You Only Look Once version 7 (YOLOv7) algorithm based on multi-scale spatial context. We build the MSC-YOLO model, which incorporates an additional prediction head, denoted P2, to improve adaptability to small targets. We replace conventional downsampling with a Spatial-to-Depth Convolutional Combination (CSPDC) module to mitigate the loss of fine feature details related to small targets. In addition, we propose a Spatial Context Pyramid with Multi-Scale Attention (SCPMA) module, which captures spatial and channel-dependent features of small targets across multiple scales. This module enhances the perception of spatial contextual features and the utilization of multi-scale feature information. On the VisDrone2023 and UAVDT datasets, MSC-YOLO outperforms the YOLOv7 baseline by 3.0% in mean average precision (mAP). The proposed MSC-YOLO algorithm detects small targets in UAV aerial photography effectively, providing strong support for practical applications.
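The abstract does not describe the internals of the CSPDC module, so the following is only a minimal sketch of the generic space-to-depth rearrangement that such a module presumably builds on. The function name `space_to_depth` and the block size of 2 are assumptions for illustration, and the convolution that would normally follow is omitted. The point it shows is that resolution is halved without discarding any input value, unlike strided convolution or pooling.

```python
def space_to_depth(feature_map, block=2):
    """Rearrange an H x W x C map into (H/block) x (W/block) x (C*block*block).

    Hypothetical helper: every input value is kept, and spatial resolution
    is traded for extra channels.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    if height % block or width % block:
        raise ValueError("height and width must be divisible by block")
    out = []
    for i in range(0, height, block):
        row = []
        for j in range(0, width, block):
            channels = []
            # Stack the block x block neighbourhood along the channel axis.
            for di in range(block):
                for dj in range(block):
                    channels.extend(feature_map[i + di][j + dj])
            row.append(channels)
        out.append(row)
    return out


# A 4 x 4 map with one channel, holding the values 0..15.
fmap = [[[4 * r + c] for c in range(4)] for r in range(4)]
down = space_to_depth(fmap)
print(len(down), len(down[0]), len(down[0][0]))  # 2 2 4
print(down[0][0])  # [0, 1, 4, 5]
```

The output has a quarter of the spatial positions and four times the channels, so fine details of small targets remain available to later layers.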
In recent decades, recommendation systems have been used in a variety of applications such as social connections, movies, music, and venues. Existing algorithms have certain limitations such as data sparsity, cold start proble...
Unsupervised methods based on density representation have shown their ability in anomaly detection, but their detection performance still needs to be improved. Approaches using normalizing flows can accurately evaluate sample distributions, mapping normal features to the normal distribution and anomalous features outside it. This paper proposes a Normalizing Flow-based Bidirectional Mapping Residual Network (NF-BMR). It uses pre-trained convolutional neural networks (CNNs) and normalizing flows to construct discriminative source-domain and target-domain feature spaces. To better learn feature information in both domain spaces, we propose the Bidirectional Mapping Residual Network (BMR), which maps sample features to these two spaces for anomaly detection. The two detection spaces compensate for each other's deficiencies and evaluate features from two perspectives, which improves detection performance. Experimental results on the MVTec AD and DAGM datasets, compared against the Bidirectional Pre-trained Feature Mapping Network (B-PFM) and other state-of-the-art methods, demonstrate that the proposed approach achieves superior performance. On the MVTec AD dataset, NF-BMR achieves an average AUROC of 98.7% across all 15 categories and reaches 100% detection performance in five categories. On the DAGM dataset, the average AUROC across ten categories is 98.7%, which is very close to that of supervised methods.
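To illustrate how a flow that maps normal features to the normal distribution yields an anomaly score, here is a minimal sketch under strong assumptions: a single one-dimensional affine map stands in for a learned normalizing flow, and scalar values stand in for CNN features. It is not the NF-BMR architecture; the function names are hypothetical.

```python
import math
import statistics


def fit_affine_flow(samples):
    """Fit z = (x - mu) / sigma so that normal samples map to roughly N(0, 1)."""
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)
    return mu, sigma


def anomaly_score(x, mu, sigma):
    """Negative log-likelihood of x under the flow (higher = more anomalous)."""
    z = (x - mu) / sigma
    log_base = -0.5 * (z * z + math.log(2 * math.pi))  # log N(z; 0, 1)
    log_det = -math.log(sigma)                         # log |dz/dx|
    return -(log_base + log_det)


# Toy "normal" features clustered around 10; values are made up.
normal_features = [9.8, 10.1, 10.0, 9.9, 10.2]
mu, sigma = fit_affine_flow(normal_features)
print(anomaly_score(10.0, mu, sigma) < anomaly_score(12.0, mu, sigma))  # True
```

A sample far from the training data lands in the tail of the base distribution after the mapping, so its likelihood drops and its score rises.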
The video grounding (VG) task aims to locate the queried action or event in an untrimmed video based on rich linguistic descriptions. Existing proposal-free methods are trapped in the complex interaction between video and query, overemphasizing cross-modal feature fusion and feature correlation for VG. In this paper, we propose a novel boundary regression paradigm that performs regression token learning in a transformer. In particular, we present a simple but effective proposal-free framework, the video grounding transformer (ViGT), which predicts the temporal boundary using a learnable regression token rather than multi-modal or cross-modal features. In ViGT, the benefits of a learnable token are as follows. (1) The token is unrelated to the video or the query and avoids data bias toward the original video and query. (2) The token simultaneously performs global context aggregation from video and query features. First, we employ a shared feature encoder to project both video and query into a joint feature space before performing cross-modal co-attention (i.e., video-to-query attention and query-to-video attention) to highlight discriminative features in each modality. Furthermore, we concatenate a learnable regression token [REG] with the video and query features as the input of a vision-language transformer. Finally, we use the token [REG] to predict the target moment and the visual features to constrain the foreground and background probabilities at each timestamp. The proposed ViGT performs well on three public datasets: ANet-Captions, TACoS, and YouCookⅡ. Extensive ablation studies and qualitative analysis further validate the interpretability of ViGT.
Pixel-level structure segmentation has attracted considerable attention, playing a crucial role in autonomous driving within the metaverse and enhancing comprehension in light field-based machine [...]. Current light field modeling methods fail to integrate appearance and geometric structural information into a coherent semantic space, thereby limiting the capability of light field transmission for visual [...]. In this paper, we propose a general light field modeling method for pixel-level structure segmentation, comprising a generative light field prompting encoder (LF-GPE) and a prompt-based masked light field pretraining (LF-PMP) [...]. The LF-GPE, serving as a light field backbone, extracts both appearance and geometric structural cues [...]. It aligns these features into a unified visual space, facilitating semantic [...]. Our LF-PMP, during the pretraining phase, integrates a mixed light field and a multi-view light field [...]. It prioritizes the geometric structural properties of the light field, enabling the light field backbone to accumulate a wealth of prior knowledge. We evaluate our pretrained LF-GPE on two downstream tasks: light field salient object detection and semantic segmentation. The results demonstrate that LF-GPE can effectively learn high-quality light field features and achieve highly competitive performance in pixel-level segmentation tasks.
The recent development of channel technology has promised to reduce the transaction verification time in blockchain systems. When transactions are transmitted through the channels created by nodes, the nodes need to cooperate with each other. If one party refuses to do so, the channel is unstable. A stable channel is thus needed. Because nodes may show uncooperative behavior, they may have a negative impact on the stability of such channels. To address this issue, this work proposes a dynamic evolutionary game model based on node behaviors. The model considers the cost and attack success ratio of various defense strategies under [...]. Nodes can dynamically adjust their strategies according to the behavior of attackers to achieve effective defense. The equilibrium stability of the proposed model can be proven. The proposed model can be applied to general channel networks. It is compared with two state-of-the-art blockchain channels: Lightning network and Spirit [...]. The experimental results show that the proposed model can be used to improve a channel's stability and keep it in a good cooperative stable state. Thus its use enables a blockchain to enjoy a higher transaction success ratio and a lower transaction transmission delay than the use of its two peers.
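Evolutionary game models are commonly analysed with replicator dynamics, in which the share of a strategy grows when it earns more than the population average. The abstract gives neither the authors' payoffs nor their update rule, so the sketch below is a generic two-strategy example with hypothetical payoffs, not the proposed model. It shows how a cooperative state can be a stable equilibrium when defection is made less profitable.

```python
def replicator_step(x, payoff, dt=0.01):
    """One Euler step of two-strategy replicator dynamics.

    x is the fraction of cooperating nodes; payoff[i][j] is the payoff to
    strategy i (0 = cooperate, 1 = defect) when it meets strategy j.
    """
    f_coop = payoff[0][0] * x + payoff[0][1] * (1 - x)
    f_defect = payoff[1][0] * x + payoff[1][1] * (1 - x)
    f_mean = x * f_coop + (1 - x) * f_defect
    return x + dt * x * (f_coop - f_mean)


def simulate(x0, payoff, steps=5000, dt=0.01):
    """Iterate the dynamics from an initial cooperating fraction x0."""
    x = x0
    for _ in range(steps):
        x = replicator_step(x, payoff, dt)
    return x


# Hypothetical payoffs: cooperation always earns more than defection.
payoff = [[3.0, 1.0],
          [2.0, 0.0]]
print(round(simulate(0.2, payoff), 3))  # 1.0
```

With these payoffs the cooperating fraction follows logistic growth toward 1, so full cooperation is the stable equilibrium; a population that starts with no cooperators stays at 0, which is an unstable rest point.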
Video-text cross-modal retrieval is widely studied to improve retrieval accuracy. However, the security of video-text cross-modal retrieval models receives little attention. If attackers exploit the security vulnerabi...
Grasping is an essential and challenging skill for robots to interact with their environment. Accurately obtaining the 6D pose information of the grasped object is the key to successful grasping. Due to the influence ...
Convolutional neural networks (CNNs) and self-attention (SA) have demonstrated remarkable success in low-level vision tasks, such as image super-resolution, deraining, and dehazing. The former excels in acquiring loca...
Modern recommendation systems are widely used in modern data centers. The random and sparse embedding lookup operations are the main performance bottleneck for processing recommendation systems on traditional platforms, as they induce abundant data movement between computing units and memory. ReRAM-based processing-in-memory (PIM) can resolve this problem by processing embedding vectors where they are stored. However, the embedding table can easily exceed the capacity limit of a monolithic ReRAM-based PIM chip, which induces off-chip accesses that may offset the PIM benefits. We deploy the decomposed model on-chip and leverage the high computing efficiency of ReRAM to compensate for the decompression performance loss. In this paper, we propose ARCHER, a ReRAM-based PIM architecture that implements fully on-chip recommendations under resource constraints. First, we make a full analysis of the computation pattern and access pattern on the decomposed model. Based on the computation pattern, we unify the operations of each layer of the decomposed model in multiply-and-accumulate operations. Based on the access observation, we propose a hierarchical mapping schema and a specialized hardware design to maximize resource utilization. With the unified computation and mapping strategy, we can coordinate the inter-processing-element [...]. The evaluation shows that ARCHER outperforms the state-of-the-art GPU-based DLRM system, the state-of-the-art near-memory processing recommendation system RecNMP, and the ReRAM-based recommendation accelerator REREC by 15.79×, 2.21×, and 1.21× in terms of performance and 56.06×, 6.45×, and 1.71× in terms of energy savings, respectively.
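As a rough illustration of how an embedding lookup on a decomposed table can be expressed purely as multiply-and-accumulate (MAC) operations, the sketch below assumes a two-factor low-rank decomposition, table ≈ A·B. The abstract does not name the decomposition ARCHER uses, so the factor shapes, values, and function names here are all hypothetical.

```python
def matvec_mac(vector, matrix):
    """Row vector times matrix using only multiply-and-accumulate steps."""
    cols = len(matrix[0])
    out = [0.0] * cols
    for i, v in enumerate(vector):
        for j in range(cols):
            out[j] += v * matrix[i][j]  # one MAC
    return out


def lookup(index, factor_a, factor_b):
    """Recover embedding row `index` from factors A (N x r) and B (r x d)."""
    one_hot = [1.0 if i == index else 0.0 for i in range(len(factor_a))]
    hidden = matvec_mac(one_hot, factor_a)  # selects row `index` of A
    return matvec_mac(hidden, factor_b)     # expands it to d dimensions


# Toy factors (assumed values): N = 4 rows, rank r = 2, dimension d = 3.
factor_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.5]]
factor_b = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(lookup(2, factor_a, factor_b))  # [5.0, 7.0, 9.0]
print(lookup(3, factor_a, factor_b))  # [4.0, 6.5, 9.0]
```

Both the row selection and the decompression step reduce to the same vector-matrix product, which is the operation a ReRAM crossbar computes natively. At this toy size the factors are no smaller than the full table; storage savings appear only when N and d are much larger than r.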