Mobile robots rely on SLAM (Simultaneous Localization and Mapping) for autonomous navigation and task execution in complex, unknown environments. However, it is hard to develop a dedicated algorithm for mobile robots in dynamic and challenging situations, such as poor lighting conditions and motion blur. To tackle this issue, we propose a tightly-coupled LiDAR-visual SLAM based on geometric features, which consists of two sub-systems (LiDAR and monocular visual SLAM) and a fusion framework. The fusion framework associates the depth and semantics of the multi-modal geometric features to complement the visual line landmarks and adds direction optimization in Bundle Adjustment (BA), which further constrains visual odometry. Conversely, the entire line segments detected by the visual subsystem overcome a limitation of the LiDAR subsystem, which can only compute geometric features locally; they adjust the direction of linear feature points and filter out outliers, leading to a more accurate odometry system. Finally, we employ a module that monitors each subsystem's operation and provides the LiDAR subsystem's output as a complementary trajectory when visual tracking fails. Evaluation results on the public dataset M2DGR, gathered by ground robots across various indoor and outdoor scenarios, show that our system achieves more accurate and robust pose estimation than current state-of-the-art multi-modal methods.
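A minimal sketch of the tracking-status fallback described above, i.e. using the LiDAR subsystem's trajectory when visual tracking fails. All class and method names here are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch (not the paper's code): selecting between subsystem trajectories.
# All class/method names are illustrative assumptions.

class VisualSubsystem:
    def __init__(self):
        self.tracking_ok = True
    def is_tracking(self):
        return self.tracking_ok
    def current_pose(self):
        return "pose_from_visual_odometry"

class LidarSubsystem:
    def current_pose(self):
        return "pose_from_lidar_odometry"

def fused_pose(visual, lidar):
    """Prefer the tightly-coupled visual estimate; fall back to the LiDAR
    trajectory when visual tracking fails (e.g. motion blur, poor lighting)."""
    if visual.is_tracking():
        return visual.current_pose()
    return lidar.current_pose()

visual, lidar = VisualSubsystem(), LidarSubsystem()
print(fused_pose(visual, lidar))   # visual trajectory while tracking holds
visual.tracking_ok = False
print(fused_pose(visual, lidar))   # LiDAR trajectory once tracking fails
```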
ISBN (digital): 9798350348811
ISBN (print): 9798350348828
Integrating information from multiple modalities enhances the robustness of scene perception systems in autonomous vehicles, providing a more comprehensive and reliable sensory framework. However, modality incompleteness in multi-modal segmentation remains under-explored. In this work, we establish a task called Modality-Incomplete Scene Segmentation (MISS), which encompasses both system-level modality absence and sensor-level modality errors. To avoid the predominant modality reliance in multi-modal fusion, we introduce a Missing-aware Modal Switch (MMS) strategy to proactively manage missing modalities during training; utilizing bit-level batch-wise sampling enhances the model's performance in both complete and incomplete testing scenarios. Furthermore, we introduce the Fourier Prompt Tuning (FPT) method to incorporate representative spectral information into a small number of learnable prompts that remain robust across all MISS scenarios, achieving effects akin to fine-tuning with far fewer tunable parameters (1.1%). Extensive experiments demonstrate the efficacy of our approach, showing an improvement of 5.84% mIoU over prior state-of-the-art parameter-efficient methods under missing modalities. The source code is publicly available at https://***/RuipingL/MISS.
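A minimal sketch of batch-wise modality switching during training, in the spirit of the MMS strategy described above. The modality names, drop probability, and tensor shapes are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of missing-aware modality switching during training.
# Modality names, drop probability, and shapes are illustrative assumptions.
import random
import torch

MODALITIES = ["rgb", "depth", "lidar"]

def switch_modalities(batch, p_drop=0.3):
    """Randomly zero out whole modalities per batch so the model cannot
    over-rely on a single predominant modality."""
    available = [m for m in MODALITIES if m in batch]
    dropped = [m for m in available if random.random() < p_drop]
    if len(dropped) == len(available):
        # Keep at least one modality so the batch remains usable.
        dropped.pop(random.randrange(len(dropped)))
    out = dict(batch)
    for m in dropped:
        out[m] = torch.zeros_like(batch[m])  # simulate a missing/failed sensor
    return out

batch = {m: torch.randn(4, 3, 64, 64) for m in MODALITIES}
augmented = switch_modalities(batch)  # fed to the segmentation model as usual
```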
Widely deployed power transmission lines have expedited the age of electricity. Maintaining such a power system therefore requires a great quantity of manpower and material resources, especially for crucia...
Controllable Depth-of-Field (DoF) imaging commonly produces amazing visual effects based on heavy and expensive high-end lenses. However, confronted with the increasing demand for mobile scenarios, it is desirable to ...
ISBN (digital): 9798350359312
ISBN (print): 9798350359329
3D human pose estimation captures the human joint points in three-dimensional space while keeping the depth information and physical structure. That is essential for applications that require precise pose information, such as human-computer interaction, scene understanding, and rehabilitation training. Due to the challenges in data collection, mainstream 3D human pose estimation datasets are primarily composed of multi-view video data collected in laboratory environments, which contain rich spatial-temporal correlation information besides the image frame content. Given the remarkable self-attention mechanism of transformers, capable of capturing the spatial-temporal correlation from multi-view video datasets, we propose a multi-stage framework for 3D sequence-to-sequence (seq2seq) human pose detection. Firstly, the spatial module represents the human pose feature from intra-image content, while the frame-image relation module extracts temporal relationships and 3D spatial positional relationship features between the multi-perspective images. Secondly, the self-attention mechanism is adopted to eliminate interference from non-human body parts and reduce computing resources. Our method is evaluated on Human3.6M, a popular 3D human pose detection dataset. Experimental results demonstrate that our approach achieves state-of-the-art performance on this dataset. The source code will be available at https://***/WUJINHUAN/3D-human-pose.
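A minimal sketch of the two-stage idea above: a per-frame spatial module followed by self-attention over the multi-view frame sequence. The dimensions, layer choices, and module names are assumptions for illustration; the paper's actual model will differ.

```python
# Hypothetical sketch of a two-stage seq2seq pose pipeline.
# Dimensions and layers are assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class SpatialModule(nn.Module):
    """Encodes each image's content (here, 2D keypoints) into a per-frame feature."""
    def __init__(self, in_dim=34, embed_dim=128):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(in_dim, embed_dim), nn.ReLU())

    def forward(self, x):                      # x: (B, T*V, in_dim)
        return self.proj(x)                    # (B, T*V, embed_dim)

class FrameImageRelationModule(nn.Module):
    """Self-attention across views and frames to capture spatial-temporal relations."""
    def __init__(self, embed_dim=128, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.head = nn.Linear(embed_dim, 17 * 3)   # 17 joints in 3D

    def forward(self, feats):                  # feats: (B, T*V, embed_dim)
        ctx, _ = self.attn(feats, feats, feats)
        return self.head(ctx)                  # per-frame 3D joint estimates

B, T, V = 2, 8, 4                              # batch, frames, views (assumed)
keypoints_2d = torch.randn(B, T * V, 34)       # e.g. 17 joints x (x, y) per image
poses_3d = FrameImageRelationModule()(SpatialModule()(keypoints_2d))  # (2, 32, 51)
```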
Recently, joint design approaches that simultaneously optimize optical systems and downstream algorithms through data-driven learning have demonstrated superior performance over traditional separate design approaches....
Understanding human actions from body poses is critical for assistive robots sharing space with humans in order to make informed and safe decisions about the next interaction. However, precise temporal localization an...
Bird’s Eye View (BEV) perception technology is crucial for autonomous driving, as it generates top-down 2D maps for environment perception, navigation, and decision-making. Nevertheless, the majority of current BEV m...
As the drone captures image targets at different flying altitudes, their scales may vary significantly, which poses challenges for the object detection model to detect them accurately. Additionally, tiny objects in the image contain minimal information, making them difficult to distinguish from the background. To overcome these two challenges, we propose a network architecture that aims to improve the accuracy of tiny object detection in drone images. Specifically, we design a tiny object detector (TOD) that can effectively extract features of tiny objects and distinguish tiny object features from the image background. Furthermore, this TOD module contains a Convolutional Visual Attention Network (CVAN) to better focus on the regions of tiny objects. Experimental results demonstrate that the proposed method achieves mAP@.5 accuracy of 53.9% on the VisDrone2021-test-dev dataset, an improvement of 2.8% over YOLOv7.
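A minimal sketch of a convolutional spatial-attention block in the spirit of the CVAN described above. Kernel sizes, channel counts, and structure are assumptions for illustration; the paper's actual module may differ.

```python
# Hypothetical sketch of convolutional spatial attention for tiny-object regions.
# Layer choices are assumptions, not the paper's CVAN design.
import torch
import torch.nn as nn

class ConvSpatialAttention(nn.Module):
    """Produces a per-pixel attention map and reweights the feature map,
    emphasizing regions likely to contain tiny objects."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(channels, channels // 4, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 1, kernel_size=1),
            nn.Sigmoid(),                      # attention map in [0, 1]
        )

    def forward(self, x):                      # x: (B, C, H, W)
        return x * self.attn(x)                # attention-weighted features

feat = torch.randn(1, 64, 80, 80)              # e.g. a high-resolution detection feature map
out = ConvSpatialAttention(64)(feat)           # same shape, background suppressed
```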