Wide-range multiscale object detection for multispectral scene perception from a drone perspective is challenging. Previous RGB-T perception methods directly reuse backbones pretrained on RGB images for thermal-infrared feature extraction, leading to an unexpected domain shift. We propose a novel multimodal feature-guided masked reconstruction pretraining method, named M2FP, which learns transferable representations for drone-based RGB-T environmental perception tasks without domain bias. This article makes two key contributions. 1) We design a cross-modal feature interaction module in M2FP that encourages the modality-specific backbones to actively learn cross-modal feature representations and avoids modality bias. 2) We design a global-aware feature interaction and fusion module suitable for various downstream tasks, which enhances the model's environmental perception from a global perspective in wide-range drone-based scenes. We fine-tune M2FP on a drone-based object detection dataset (DroneVehicle) and a semantic segmentation dataset (Kust4K). On these two tasks, M2FP achieves state-of-the-art performance, surpassing the second-best methods by 1.8% in mean average precision and 0.9% in mean intersection over union, respectively.
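The mask-and-reconstruct core that pretraining methods such as M2FP build on can be sketched in a few lines of numpy. This is an illustrative sketch only: the function names are hypothetical, and the cross-modal interaction and fusion modules specific to M2FP are omitted.

```python
import numpy as np

def random_mask_patches(tokens, mask_ratio, rng):
    """Randomly hide a fraction of patch tokens, MAE-style.
    tokens: (N, D). Returns visible tokens and a boolean mask
    (True = masked)."""
    n = tokens.shape[0]
    n_masked = int(round(n * mask_ratio))
    order = rng.permutation(n)
    mask = np.zeros(n, dtype=bool)
    mask[order[:n_masked]] = True
    return tokens[~mask], mask

def masked_reconstruction_loss(pred, target, mask):
    """Mean squared error computed only on the masked tokens:
    the decoder must reconstruct what the encoder never saw."""
    return (((pred - target) ** 2)[mask]).mean()

rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))   # 16 patches, 8-dim each
visible, mask = random_mask_patches(tokens, 0.75, rng)
```

With a 0.75 mask ratio, only 4 of the 16 tokens reach the encoder; the loss is evaluated exclusively on the 12 hidden ones.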
Accurately segmenting tubular structures, such as blood vessels or nerves, has significant clinical implications across various medical applications. However, existing methods often fall short in topological performance, particularly in preserving connectivity. To address this challenge, we propose a novel deep-learning approach, termed Deep Closing, inspired by the well-established classic closing operation. Deep Closing first leverages an autoencoder trained in the masked image modeling (MIM) paradigm, enhanced with digital-topology knowledge, to learn the inherent shape prior of tubular structures and indicate potentially disconnected regions. Subsequently, a Simple Components Erosion module is employed to generate topology-focused outcomes, refining the preceding segmentation results and ensuring that all generated regions are topologically significant. To evaluate the efficacy of Deep Closing, we conduct comprehensive experiments on four datasets: DRIVE, CHASE_DB1, DCA1, and CREMI. The results demonstrate that our approach yields considerable improvements in topological performance compared with existing methods. Furthermore, Deep Closing generalizes and transfers knowledge from external datasets, showcasing its robustness and adaptability. The code for this paper is available at: https://***/5k5000/DeepClosing.
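The classic closing operation that inspires Deep Closing is dilation followed by erosion, which bridges small gaps in a binary mask. A minimal numpy sketch with a 3x3 square structuring element (border treated as foreground for erosion, a common convention):

```python
import numpy as np

def dilate(img):
    """Binary dilation with a 3x3 square structuring element."""
    h, w = img.shape
    p = np.pad(img, 1, constant_values=False)
    out = np.zeros_like(img)
    for di in range(3):
        for dj in range(3):
            out |= p[di:di + h, dj:dj + w]
    return out

def erode(img):
    """Binary erosion with a 3x3 square (border treated as foreground)."""
    h, w = img.shape
    p = np.pad(img, 1, constant_values=True)
    out = np.ones_like(img)
    for di in range(3):
        for dj in range(3):
            out &= p[di:di + h, dj:dj + w]
    return out

def closing(img):
    """Classic closing = dilation then erosion; bridges small gaps,
    the behavior Deep Closing generalizes with a learned shape prior."""
    return erode(dilate(img))
```

Applied to a one-pixel-thick segment with a one-pixel gap, the gap is bridged while all original foreground pixels are kept (closing is extensive).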
Existing image semantic segmentation models have low accuracy when detecting tiny targets or multiple targets in overlapping regions. This work proposes a hybrid vision transformer with a unified-perceptual-parsing network (ViT-UperNet) for medical image segmentation. A self-attention mechanism is embedded in a vision transformer to extract multi-level features. Image features are extracted hierarchically from low to high dimensions using four groups of Transformer blocks of different depths. A unified-perceptual-parsing network based on a feature pyramid network (FPN) and a pyramid pooling module (PPM) then fuses the multi-scale contextual features and performs semantic segmentation. The FPN naturally exploits hierarchical features and generates strong semantic information at all scales; the PPM uses global prior knowledge to understand complex scenes and extracts features with global context to improve segmentation. During training, a scalable self-supervised learner, the masked autoencoder, is used for pre-training, which strengthens visual representation ability and improves the efficiency of feature learning. Experiments are conducted on cardiac magnetic resonance image segmentation, where the left and right atria and ventricles are segmented. The pixel accuracy is 93.85%, the Dice coefficient is 92.61%, and the Hausdorff distance is 11.16, all improved over the compared methods. The results show the superiority of ViT-UperNet in medical image segmentation, especially for hard-to-recognize and heavily occluded targets.
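The pyramid pooling idea used here can be sketched with plain numpy: pool the feature map at several grid sizes, upsample each pooled grid back (nearest neighbor), and stack the branches with the input. A single-channel sketch, assuming the standard PSPNet bin sizes (1, 2, 3, 6); the 1x1 convolutions and concatenation of a full PPM are omitted.

```python
import numpy as np

def adaptive_avg_pool(feat, bins):
    """Average-pool an (H, W) feature map into a (bins, bins) grid."""
    h, w = feat.shape
    out = np.zeros((bins, bins))
    for i in range(bins):
        for j in range(bins):
            r0, r1 = i * h // bins, (i + 1) * h // bins
            c0, c1 = j * w // bins, (j + 1) * w // bins
            out[i, j] = feat[r0:r1, c0:c1].mean()
    return out

def pyramid_pooling(feat, scales=(1, 2, 3, 6)):
    """Pool at several grid sizes, upsample back (nearest neighbor),
    and stack with the input: the context branches of a PPM."""
    h, w = feat.shape
    branches = [feat]
    for s in scales:
        pooled = adaptive_avg_pool(feat, s)
        rows = np.arange(h) * s // h
        cols = np.arange(w) * s // w
        branches.append(pooled[np.ix_(rows, cols)])
    return np.stack(branches)
```

The scale-1 branch is simply the global average broadcast over the map, which is how the PPM injects whole-image context into every pixel.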
Detecting anomalies in manufacturing processes is crucial for ensuring safety. However, noise significantly undermines the reliability of data-driven anomaly detection models. To address this challenge, we propose a slow feature-constrained decomposition autoencoder (SFC-DAE) for anomaly detection in noisy scenarios. Considering that the process can exhibit both long-term trends and periodic properties, the process data is decomposed into trends and cycles. The repetitive information is mitigated by slicing and randomly masking certain trends and cycles. Dependencies among slices are constructed to extract intrinsic information, while high-frequency noise is reduced using a slow feature-constrained loss. Anomalies are detected and localized through a reconstruction error strategy. The effectiveness of SFC-DAE is demonstrated using data from a sugar factory and a secure water treatment system.
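Two of the building blocks described above, trend/cycle decomposition and reconstruction-error scoring, can be sketched in numpy. This is a simplified stand-in: a moving-average trend replaces SFC-DAE's learned decomposition, and the slow feature-constrained loss is not shown.

```python
import numpy as np

def decompose(x, window):
    """Split a 1-D series into a moving-average trend and the cyclic
    remainder (window must be odd so the output aligns with the input)."""
    pad = window // 2
    xp = np.pad(x, pad, mode='edge')
    trend = np.convolve(xp, np.ones(window) / window, mode='valid')
    return trend, x - trend

def anomaly_scores(x, reconstruction):
    """Pointwise reconstruction error: large values both detect and
    localize anomalies, as in a reconstruction-error strategy."""
    return (x - reconstruction) ** 2
```

On a series with a long-term trend plus a cycle, the decomposition is lossless by construction, and an injected spike is localized at the index with the largest error.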
Deep convolutional neural networks (DCNNs) are widely used in content-based image retrieval (CBIR) because of their advantages in image feature extraction. However, training deep neural networks requires a large amount of labeled data, which limits their application. Self-supervised learning is a more general approach in unlabeled scenarios. A method of fine-tuning feature extraction networks based on masked learning is proposed: masked autoencoders (MAE) are used to fine-tune the vision transformer (ViT) model. In addition, a scheme for extracting image descriptors is presented. The encoder of the MAE uses the ViT to extract global features and performs self-supervised fine-tuning by reconstructing masked-area pixels. The method works well on category-level image retrieval datasets, with marked improvements on instance-level datasets. On the instance-level datasets Oxford5k and Paris6k, the retrieval accuracy of the base model is improved by 7% and 17%, respectively, compared to the original model.
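Once a global descriptor has been extracted per image, retrieval itself reduces to cosine-similarity ranking. A minimal sketch, assuming descriptors are already computed (e.g. a ViT global feature); function names are illustrative:

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """L2-normalize descriptors along the last axis."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)

def rank_gallery(query, gallery):
    """Rank gallery images by cosine similarity between the query's
    global descriptor and each gallery descriptor."""
    sims = l2_normalize(gallery) @ l2_normalize(query)
    return np.argsort(-sims), sims
```

Because descriptors are normalized first, the dot product is exactly the cosine similarity, and a gallery item pointing in the same direction as the query ranks first regardless of magnitude.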
ISBN (print): 9789819985456; 9789819985463
Muscle atrophy is a widespread condition that can reduce quality of life and increase morbidity and mortality. Developing a noninvasive method to evaluate muscle atrophy is of great practical value; however, obtaining accurate evaluation criteria under noninvasive conditions is extremely difficult. This paper proposes a self-supervised temporal ultrasound reconstruction method based on a masked autoencoder to explore the dynamic process of muscle atrophy, with a score-position embedding designed to enable quantitative evaluation. Ultrasound images of the hind-limb muscles of six macaque monkeys were acquired consecutively during 38 days of head-down bed-rest experiments. Given an ultrasound image sequence, an asymmetric encoder-decoder structure reconstructs the randomly masked images in order to model the dynamic atrophy process. We demonstrate the feasibility of using the position indicator as a muscle atrophy score, which can be used to predict the degree of atrophy. This study achieves quantitative evaluation of muscle atrophy in the absence of accurate evaluation criteria.
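One plausible way to embed a scalar score as a "position" is the transformer-style sinusoidal encoding, with the atrophy score playing the role of a continuous position on the atrophy trajectory. This is an illustrative stand-in only: the paper's actual score-position embedding is not specified here, and the construction below is an assumption.

```python
import numpy as np

def score_position_embedding(score, dim):
    """Sinusoidal embedding of a scalar score, transformer-style:
    sin/cos pairs at geometrically spaced frequencies. The score acts
    as a continuous 'position' (hypothetical construction)."""
    i = np.arange(dim // 2)
    freqs = 1.0 / (10000.0 ** (2.0 * i / dim))
    angles = score * freqs
    emb = np.empty(dim)
    emb[0::2] = np.sin(angles)
    emb[1::2] = np.cos(angles)
    return emb
```

Distinct scores map to distinct embeddings, and nearby scores map to nearby embeddings, which is what lets a decoder read the degree of atrophy back out of the position signal.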
ISBN (print): 9789819755745; 9789819755752
Anomaly detection on attributed networks has wide practical application in many domains, such as business and cybersecurity. Existing methods mainly rely on graph neural networks (GNNs) that aggregate information from neighbors to learn node representations for detecting anomalies. However, they may ignore information beyond the immediate neighbors, such as community associations. Furthermore, naively stacking multiple GNN layers may lead to the over-smoothing problem, making node representations more similar and anomalies indistinguishable. In this paper, we propose a novel method, named CARD, to tackle these issues. Specifically, we propose different augmentation strategies to offer diverse-scale information for CARD. Then, to better capture community associations, we establish a community-guided contrastive learning module that also captures structure information at different scales. To capture multiple kinds of attribute information and aid anomaly detection, we design an anomaly-aware masked autoencoder that effectively makes anomalies more distinguishable. Extensive experiments on nine datasets show the superiority of CARD. Our code is available at https://***/scu-kdde/OAM-CARD-2024.
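The intuition behind reconstruction-based anomaly scoring on an attributed graph can be shown with a deliberately simple stand-in for a masked autoencoder: predict each node's attributes from the mean of its neighbors', and score nodes by the residual. Attribute anomalies are exactly the nodes their neighborhood cannot explain. This sketch is not CARD's architecture, only the underlying principle.

```python
import numpy as np

def neighbor_reconstruction_scores(adj, attrs):
    """Score each node by how poorly its attributes are reconstructed
    from the mean of its neighbors' attributes.
    adj: (n, n) adjacency matrix, attrs: (n, d) node attributes."""
    deg = adj.sum(axis=1, keepdims=True)
    recon = adj @ attrs / np.maximum(deg, 1)   # mean over neighbors
    return np.linalg.norm(attrs - recon, axis=1)
```

On a small complete graph where one node's attributes are injected far from the rest, that node gets the largest score even though its neighbors' estimates of the normal nodes are also slightly polluted.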
ISBN (print): 9781450394086
This paper presents a novel approach to representation learning in recommender systems by integrating generative self-supervised learning (SSL) with a graph transformer architecture. We highlight the importance of high-quality data augmentation with relevant self-supervised pretext tasks for improving performance. Towards this end, we propose a new approach that automates the self-supervision augmentation process through rationale-aware generative SSL that distills informative user-item interaction patterns. The proposed recommender with Graph TransFormer (GFormer) offers parameterized collaborative rationale discovery for selective augmentation while preserving global-aware user-item relationships. In GFormer, the rationale-aware SSL inspires graph collaborative filtering with task-adaptive invariant rationalization in the graph transformer. The experimental results reveal that GFormer consistently improves performance over baselines on different datasets. Several in-depth experiments further investigate the invariant rationale-aware augmentation from various aspects. The source code for this work is publicly available at: https://***/HKUDS/GFormer.
ISBN (print): 9798350387780; 9798350387797
3D point clouds are widely used in robotics and autonomous driving systems. With the development of deep learning, an increasing number of models have been proposed for 3D point cloud processing tasks, including shape classification and 3D object detection. However, training these models requires large amounts of labeled data, which is expensive to obtain. Self-supervised learning methods for 3D point clouds, which train models on unlabeled data by designing pretext tasks, have therefore recently gained significant attention. This paper reviews these methods. Based on the pretext tasks they design, we divide the more than 30 existing methods into three categories: reconstruction-based, contrastive-based, and MAE (masked autoencoder)-based methods. We then introduce the research motivations, implementations, and characteristics of these methods one by one. Finally, two performance evaluation criteria are introduced, and the performance of each self-supervised learning method is assessed and analyzed against them.
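The input corruption used by the MAE-based category can be sketched in numpy: partition the cloud into local groups, then mask whole groups rather than individual points. The nearest-random-center grouping below is a simplification of the usual farthest-point-sampling + k-NN patch construction.

```python
import numpy as np

def mask_point_groups(points, n_groups, mask_ratio, rng):
    """Partition a point cloud into groups by nearest-random-center
    assignment, then mask a fraction of whole groups (simplified
    MAE-style corruption for point clouds).
    points: (N, 3). Returns visible points and a per-point mask."""
    centers = points[rng.choice(len(points), n_groups, replace=False)]
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
    group = d.argmin(axis=1)                 # nearest-center assignment
    n_masked = int(round(n_groups * mask_ratio))
    masked_groups = rng.choice(n_groups, n_masked, replace=False)
    point_mask = np.isin(group, masked_groups)
    return points[~point_mask], point_mask
```

Masking whole local patches, rather than scattered points, is what makes the reconstruction pretext task non-trivial: the model must infer missing geometry from surrounding structure.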
ISBN (print): 9798350353006
Limited training data is a long-standing problem for video emotion analysis (VEA). Existing works leverage the power of large-scale image datasets for transfer while failing to capture the temporal correlation of affective cues in the video. Inspired by psychology research and empirical theory, we verify that the degree of emotion may vary across different segments of a video, and thus introduce the sentiment complementarity and emotion intrinsics among temporal segments. We propose an MAE-style method, termed MART, for learning robust affective representations of videos via masking. First, we extract the affective cues of the lexicon and verify the extracted cues by computing their matching scores with the video content, in terms of sentiment and emotion scores along the temporal dimension. Then, with the verified cues, we propose masked affective modeling to recover the temporal emotion distribution. We present temporal affective complementary learning that pulls the complementary parts and pushes the intrinsic parts of masked multimodal features, where the constraint is set with cross-modal attention among features to mask the video and recover the degree of emotion among segments. Extensive experiments on five benchmarks show the superiority of our method in video sentiment analysis, video emotion recognition, multimodal sentiment analysis, and multimodal emotion recognition.