Search Results - Inner Mongolia University Library

arXiv, 2024

Authors: Qian, Xiaolong; Jiang, Qi; Gao, Yao; Gao, Shaohua; Yi, Zhonghua; Sun, Lei; Wei, Kai; Li, Haifeng; Yang, Kailun; Wang, Kaiwei; Bai, Jian. Affiliations: State Key Laboratory of Extreme Photonics and Instrumentation, Zhejiang University, Hangzhou 310027, China; School of Robotics, Hunan University, Changsha 410012, China; National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, Changsha 410082, China

Controllable Depth-of-Field (DoF) imaging commonly produces amazing visual effects based on heavy and expensive high-end lenses. However, confronted with the increasing demand for mobile scenarios, it is desirable to achieve a lightweight solution with Minimalist Optical Systems (MOS). This work centers around two major limitations of MOS, i.e., the severe optical aberrations and uncontrollable DoF, for achieving single-lens controllable DoF imaging via computational methods. A Depth-aware Controllable DoF Imaging (DCDI) framework is proposed, equipped with All-in-Focus (AiF) aberration correction and monocular depth estimation, where the recovered image and corresponding depth map are utilized to produce imaging results under diverse DoFs of any high-end lens via patch-wise convolution. To address the depth-varying optical degradation, we introduce a Depth-aware Degradation-adaptive Training (DA2T) scheme. At the dataset level, a Depth-aware Aberration MOS (DAMOS) dataset is established based on the simulation of Point Spread Functions (PSFs) under different object distances. Additionally, we design two plug-and-play depth-aware mechanisms to embed depth information into the aberration image recovery for better tackling depth-aware degradation. Furthermore, we propose a storage-efficient Omni-Lens-Field model to represent the 4D PSF library of various lenses. With the predicted depth map, recovered image, and depth-aware PSF map inferred by Omni-Lens-Field, single-lens controllable DoF imaging is achieved. To the best of our knowledge, we are the first to explore the single-lens controllable DoF imaging solution. Comprehensive experimental results demonstrate that the proposed framework enhances the recovery performance, and attains impressive single-lens controllable DoF imaging results, providing a seminal baseline for this field. The source code and the established dataset will be publicly available at https://***/XiaolongQian/DCDI. Copyright © 2024, The A

Keywords: Optical transfer function
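
The abstract above renders depth-of-field effects from a recovered image and a depth map using lens-specific PSFs. As a much simpler illustration of why blur depends on depth, the sketch below uses the textbook thin-lens circle-of-confusion formula. It is not the DCDI pipeline or its Omni-Lens-Field model; the function names, the lens parameters, and the sample depth map are assumptions made for this example.

```python
def circle_of_confusion(aperture_diameter, focal_length, focus_distance, object_distance):
    """Thin-lens blur-circle diameter on the sensor (all lengths in the same unit)."""
    if focus_distance <= focal_length:
        raise ValueError("focus_distance must be greater than focal_length")
    if object_distance <= 0:
        raise ValueError("object_distance must be positive")
    # Lateral magnification of the plane that is in focus.
    magnification = focal_length / (focus_distance - focal_length)
    # Relative defocus of the object with respect to the focus plane.
    defocus = abs(object_distance - focus_distance) / object_distance
    return aperture_diameter * magnification * defocus


def blur_map(depth_map, aperture_diameter, focal_length, focus_distance):
    """Per-pixel blur diameters for a depth map given as a list of rows."""
    return [
        [circle_of_confusion(aperture_diameter, focal_length, focus_distance, d) for d in row]
        for row in depth_map
    ]


# Hypothetical 50 mm f/2 lens (25 mm aperture) focused at 2 m; depths in millimetres.
depths = [[1000.0, 2000.0], [4000.0, 8000.0]]
blur = blur_map(depths, aperture_diameter=25.0, focal_length=50.0, focus_distance=2000.0)
```

Pixels at the focus distance get zero blur, and the blur diameter grows as the depth moves away from the focus plane. A depth-dependent kernel of that size could then be applied patch by patch.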

MateRobot: Material Recognition in Wearable Robotics for People with Visual Impairments

IEEE International Conference on Robotics and Automation (ICRA)

Authors: Junwei Zheng; Jiaming Zhang; Kailun Yang; Kunyu Peng; Rainer Stiefelhagen. Affiliations: Institute for Robotics and Anthropomatics, Karlsruhe Institute of Technology, Karlsruhe, Germany; Department of Engineering Science, University of Oxford, UK; School of Robotics and the National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, Changsha, China

ISBN (digital): 9798350384574

ISBN (print): 9798350384581

People with Visual Impairments (PVI) typically recognize objects through haptic perception. Knowing objects and materials before touching is desired by the target users but under-explored in the field of human-centered robotics. To fill this gap, in this work, a wearable vision-based robotic system, MATERobot, is established for PVI to recognize materials and object categories beforehand. To address the computational constraints of mobile platforms, we propose a lightweight yet accurate model MATEViT to perform pixel-wise semantic segmentation, simultaneously recognizing both objects and materials. Our methods achieve 40.2% and 51.1% mIoU on the COCOStuff-10K and DMS datasets, respectively, surpassing the previous method by +5.7% and +7.0%. Moreover, in the field test with participants, our wearable system reaches a score of 28 in the NASA-Task Load Index, indicating low cognitive demands and ease of use. Our MATERobot demonstrates the feasibility of recognizing material property through visual cues and offers a promising step towards improving the functionality of wearable robots for PVI. The source code has been made publicly available at MATERobot.

Keywords: Visualization; Source coding; Semantic segmentation; Visual impairment; Semantics; Wearable robots; Mobile applications
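
The mIoU figures quoted in the abstract above are a standard semantic-segmentation metric. The following is a minimal sketch of how mean Intersection-over-Union can be computed from flattened label lists. The function name and the toy labels are assumptions for illustration, and benchmark protocols (for example, accumulating counts over a whole dataset or ignoring certain labels) may differ from this simplified version.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in the prediction or the ground truth."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # A correctly labelled pixel counts once for both intersection and union.
            intersection[p] += 1
            union[p] += 1
        else:
            # A mislabelled pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with two classes and four pixels.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=2)
```

In the toy example, class 0 has an IoU of 1/2 and class 1 has an IoU of 2/3, so the mean is their average.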

Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers

arXiv, 2024

Authors: Jiao, Jianbin; Cheng, Xina; Chen, Weijie; Yin, Xiaoting; Shi, Hao; Yang, Kailun. Affiliations: School of Artificial Intelligence, Xidian University, China; State Key Laboratory of Extreme Photonics and Instrumentation, Zhejiang University, China; School of Robotics, Hunan University, China; National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, China

3D human pose estimation captures the human joint points in three-dimensional space while keeping the depth information and physical structure. That is essential for applications that require precise pose information, such as human-computer interaction, scene understanding, and rehabilitation training. Due to the challenges in data collection, mainstream datasets of 3D human pose estimation are primarily composed of multi-view video data collected in laboratory environments, which contains rich spatial-temporal correlation information besides the image frame content. Given the remarkable self-attention mechanism of transformers, capable of capturing the spatial-temporal correlation from multi-view video datasets, we propose a multi-stage framework for 3D sequence-to-sequence (seq2seq) human pose detection. Firstly, the spatial module represents the human pose feature by intra-image content, while the frame-image relation module extracts temporal relationships and 3D spatial positional relationship features between the multi-perspective images. Secondly, the self-attention mechanism is adopted to eliminate the interference from non-human body parts and reduce computing resources. Our method is evaluated on Human3.6M, a popular 3D human pose detection dataset. Experimental results demonstrate that our approach achieves state-of-the-art performance on this dataset. The source code will be available at https://***/WUJINHUAN/3D-human-pose. Copyright © 2024, The Authors. All rights reserved.

Keywords: Human computer interaction

Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers

International Joint Conference on Neural Networks (IJCNN)

Authors: Jianbin Jiao; Xina Cheng; Weijie Chen; Xiaoting Yin; Hao Shi; Kailun Yang. Affiliations: School of Artificial Intelligence, Xidian University, China; State Key Laboratory of Extreme Photonics and Instrumentation, Zhejiang University, China; School of Robotics, Hunan University, China; National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, China

ISBN (digital): 9798350359312

ISBN (print): 9798350359329

3D human pose estimation captures the human joint points in three-dimensional space while keeping the depth information and physical structure. That is essential for applications that require precise pose information, such as human-computer interaction, scene understanding, and rehabilitation training. Due to the challenges in data collection, mainstream datasets of 3D human pose estimation are primarily composed of multi-view video data collected in laboratory environments, which contains rich spatial-temporal correlation information besides the image frame content. Given the remarkable self-attention mechanism of transformers, capable of capturing the spatial-temporal correlation from multi-view video datasets, we propose a multi-stage framework for 3D sequence-to-sequence (seq2seq) human pose detection. Firstly, the spatial module represents the human pose feature by intra-image content, while the frame-image relation module extracts temporal relationships and 3D spatial positional relationship features between the multi-perspective images. Secondly, the self-attention mechanism is adopted to eliminate the interference from non-human body parts and reduce computing resources. Our method is evaluated on Human3.6M, a popular 3D human pose detection dataset. Experimental results demonstrate that our approach achieves state-of-the-art performance on this dataset. The source code will be available at https://***/WUJINHUAN/3D-human-pose.

Keywords: Training; Three-dimensional displays; Correlation; Computational modeling; Source coding; Pose estimation; Video sequences

Exploring Quasi-Global Solutions to Compound Lens Based Computational Imaging Systems

arXiv, 2024

Authors: Gao, Yao; Jiang, Qi; Gao, Shaohua; Sun, Lei; Yang, Kailun; Wang, Kaiwei. Affiliations: State Key Laboratory of Extreme Photonics and Instrumentation, Zhejiang University, Hangzhou 310027, China; School of Robotics, Hunan University, Changsha 410012, China; National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, Changsha 410082, China

Recently, joint design approaches that simultaneously optimize optical systems and downstream algorithms through data-driven learning have demonstrated superior performance over traditional separate design approaches. However, current joint design approaches heavily rely on the manual identification of initial lenses, posing challenges and limitations, particularly for compound lens systems with multiple potential starting points. In this work, we present Quasi-Global Search Optics (QGSO) to automatically design compound lens based computational imaging systems through two parts: (i) Fused Optimization Method for Automatic Optical Design (OptiFusion), which searches for diverse initial optical systems under certain design specifications; and (ii) Efficient Physic-aware Joint Optimization (EPJO), which conducts parallel joint optimization of initial optical systems and image reconstruction networks with the consideration of physical constraints, culminating in the selection of the optimal solution in all search results. Extensive experimental results illustrate that QGSO serves as a transformative end-to-end lens design paradigm for superior global search ability, which automatically provides compound lens based computational imaging systems with higher imaging quality compared to existing paradigms. The source code will be made publicly available at https://***/LiGpy/QGSO. Copyright © 2024, The Authors. All rights reserved.

Keywords: Imaging systems

PVPUFormer: Probabilistic Visual Prompt Unified Transformer for Interactive Image Segmentation

arXiv, 2023

Authors: Zhang, Xu; Yang, Kailun; Lin, Jiacheng; Yuan, Jin; Li, Zhiyong; Li, Shutao. Affiliations: College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China; School of Robotics, The National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, Changsha 410082, China; College of Electrical and Information Engineering, The Key Laboratory of Visual Perception and Artificial Intelligence of Hunan Province, Hunan University, Changsha 410082, China

Integration of diverse visual prompts like clicks, scribbles, and boxes in interactive image segmentation significantly facilitates users’ interaction as well as improves interaction efficiency. However, existing studies primarily encode the position or pixel regions of prompts without considering the contextual areas around them, resulting in insufficient prompt feedback, which is not conducive to performance acceleration. To tackle this problem, this paper proposes a simple yet effective Probabilistic Visual Prompt Unified Transformer (PVPUFormer) for interactive image segmentation, which allows users to flexibly input diverse visual prompts with the probabilistic prompt encoding and feature post-processing to excavate sufficient and robust prompt features for performance boosting. Specifically, we first propose a Probabilistic Prompt-unified Encoder (PPuE) to generate a unified one-dimensional vector by exploring both prompt and non-prompt contextual information, offering richer feedback cues to accelerate performance improvement. On this basis, we further present a Prompt-to-Pixel Contrastive (P2C) loss to accurately align both prompt and pixel features, bridging the representation gap between them to offer consistent feature representations for mask prediction. Moreover, our approach designs a Dual-cross Merging Attention (DMA) module to implement bidirectional feature interaction between image and prompt features, generating notable features for performance improvement. A comprehensive variety of experiments on several challenging datasets demonstrates that the proposed components achieve consistent improvements, yielding state-of-the-art interactive segmentation performance. Our code is available at https://***/XuZhang1211/PVPUFormer. Copyright © 2023, The Authors. All rights reserved.

Keywords: Pixels

TS-CGNet: Temporal-Spatial Fusion Meets Centerline-Guided Diffusion for BEV Mapping

arXiv, 2025

Authors: Hong, Xinying; Li, Siyu; Zeng, Kang; Shi, Hao; Peng, Bomin; Yang, Kailun; Li, Zhiyong. Affiliations: The College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China; The School of Robotics, The National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, Changsha 410082, China; The State Key Laboratory of Extreme Photonics and Instrumentation, Zhejiang University, Hangzhou 310027, China

Bird’s Eye View (BEV) perception technology is crucial for autonomous driving, as it generates top-down 2D maps for environment perception, navigation, and decision-making. Nevertheless, the majority of current BEV map generation studies focusing on visual map generation lack depth-aware reasoning capabilities. They exhibit limited efficacy in managing occlusions and handling complex environments, with a notable decline in perceptual performance under adverse weather conditions or low-light scenarios. Therefore, this paper proposes TS-CGNet, which leverages Temporal-Spatial fusion with Centerline-Guided diffusion. This visual framework, grounded in prior knowledge, is designed for integration into any existing network for building BEV maps. Specifically, this framework is decoupled into three parts: the local mapping system performs the initial generation of semantic maps using purely visual information; the Temporal-Spatial Aligner Module (TSAM) integrates historical information into mapping generation by applying transformation matrices; and the Centerline-Guided Diffusion Model (CGDM) is a prediction module based on the diffusion model. CGDM incorporates centerline information through spatial-attention mechanisms to enhance semantic segmentation reconstruction. We construct BEV semantic segmentation maps with our method on the public nuScenes dataset and on robustness benchmarks under various corruptions. Our method improves by 1.90%, 1.73%, and 2.87% for perceived ranges of 60×30m, 120×60m, and 240×60m in the task of BEV HD mapping. TS-CGNet attains an improvement of 1.92% for perceived ranges of 100×100m in the task of BEV semantic mapping. Moreover, TS-CGNet achieves an average improvement of 2.92% in detection accuracy under varying weather conditions and sensor interferences in the perception range of 240×60m. The source code will be publicly available at https://***/krabs-H/TS-CGNet. Copyright © 2025, The Authors. All rights reserved.

Keywords: Decision making

Learning to Learn Transferable Generative Attack for Person Re-Identification

arXiv, 2024

Authors: Bian, Yuan; Liu, Min; Wang, Xueping; Ma, Yunfeng; Wang, Yaonan. Affiliations: The College of Electrical and Information Engineering, Hunan University, and National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan, Changsha, China; The College of Information Science and Engineering, Hunan Normal University, and Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan, Changsha, China

Deep learning-based person re-identification (re-id) models are widely employed in surveillance systems and inevitably inherit the vulnerability of deep networks to adversarial attacks. Existing attacks merely consider cross-dataset and cross-model transferability, ignoring the cross-test capability to perturb models trained in different domains. To powerfully examine the robustness of real-world re-id models, the Meta Transferable Generative Attack (MTGA) method is proposed, which adopts meta-learning optimization to help the generative attacker produce highly transferable adversarial examples by learning comprehensively simulated transfer-based cross-model&dataset&test black-box meta attack tasks. Specifically, cross-model&dataset black-box attack tasks are first mimicked by selecting different re-id models and datasets for meta-train and meta-test attack processes. As different models may focus on different feature regions, the Perturbation Random Erasing module is further devised to prevent the attacker from learning to only corrupt model-specific features. To help the attacker learn cross-test transferability, the Normalization Mix strategy is introduced to imitate diverse feature embedding spaces by mixing multi-domain statistics of target models. Extensive experiments show the superiority of MTGA: in cross-model&dataset and cross-model&dataset&test attacks in particular, MTGA outperforms the SOTA methods by 21.5% and 11.3% on mean mAP drop rate, respectively. The code of MTGA will be released after the paper is accepted. Copyright © 2024, The Authors. All rights reserved.

Keywords: Generative adversarial networks

CT-UIO: Continuous-Time UWB-Inertial-Odometer Localization Using Non-Uniform B-spline with Fewer Anchors

arXiv, 2025

Authors: Sun, Jian; Sun, Wei; Zhang, Genwei; Yang, Kailun; Li, Song; Meng, Xiangqi; Deng, Na; Tan, Chongbin. Affiliations: National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, Changsha 410012, China; College of Electrical and Information Engineering, Hunan University, China; School of Robotics, Hunan University, Changsha 410012, China; State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China

Ultra-wideband (UWB) based positioning with fewer anchors has attracted significant research interest in recent years, especially under energy-constrained conditions. However, most existing methods rely on discrete-time representations and smoothness priors to infer a robot’s motion states, which often struggle with ensuring multi-sensor data synchronization. In this paper, we present an efficient UWB-Inertial-odometer localization system, utilizing a non-uniform B-spline framework with fewer anchors. Unlike traditional uniform B-spline-based continuous-time methods, we introduce an adaptive knot-span adjustment strategy for non-uniform continuous-time trajectory representation. This is accomplished by adjusting control points dynamically based on movement speed. To enable efficient fusion of IMU and odometer data, we propose an improved Extended Kalman Filter (EKF) with innovation-based adaptive estimation to provide short-term accurate motion prior. Furthermore, to address the challenge of achieving a fully observable UWB localization system under few-anchor conditions, the Virtual Anchor (VA) generation method based on multiple hypotheses is proposed. At the backend, we propose a CT-UIO factor graph with an adaptive sliding window for global trajectory estimation. Comprehensive experiments conducted on corridor and exhibition hall datasets validate the proposed system’s high precision and robust performance. The codebase and datasets of this work will be open-sourced at https://***/JasonSun623/CT-UIO. Copyright © 2025, The Authors. All rights reserved.

Keywords: Continuous time systems
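
The abstract above relies on a non-uniform B-spline trajectory whose knot spans adapt to movement speed. The sketch below shows the general idea with the standard Cox-de Boor recursion for B-spline basis functions over an arbitrary knot vector. It is not the CT-UIO implementation: the `adaptive_knots` rule (shorter spans at higher speed) and all parameter values are hypothetical choices made for this illustration.

```python
def bspline_basis(i, k, t, knots):
    """Cox-de Boor recursion: value of the i-th degree-k basis function at time t."""
    if k == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + k] > knots[i]:
        left = (t - knots[i]) / (knots[i + k] - knots[i]) * bspline_basis(i, k - 1, t, knots)
    if knots[i + k + 1] > knots[i + 1]:
        right = ((knots[i + k + 1] - t) / (knots[i + k + 1] - knots[i + 1])
                 * bspline_basis(i + 1, k - 1, t, knots))
    return left + right


def evaluate_spline(control_points, knots, t, degree=3):
    """Trajectory point at time t; valid for knots[degree] <= t < knots[len(control_points)]."""
    if len(knots) != len(control_points) + degree + 1:
        raise ValueError("expected len(knots) == len(control_points) + degree + 1")
    point = [0.0] * len(control_points[0])
    for i, cp in enumerate(control_points):
        w = bspline_basis(i, degree, t, knots)
        for d, value in enumerate(cp):
            point[d] += w * value
    return point


def adaptive_knots(speeds, t0=0.0, base_span=0.1, ref_speed=1.0, min_span=0.02, max_span=0.5):
    """Hypothetical rule: knot spans shrink when the platform moves faster."""
    knots = [t0]
    for v in speeds:
        span = base_span * ref_speed / max(v, 1e-6)
        knots.append(knots[-1] + min(max(span, min_span), max_span))
    return knots


# Non-uniform knot vector for five 2D control points of a cubic spline.
knots = [0.0, 0.5, 0.7, 1.0, 2.0, 2.2, 3.5, 4.0, 5.0]
control_points = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.0), (3.0, 1.0), (4.0, 1.5)]
position = evaluate_spline(control_points, knots, t=1.5)
```

Inside the valid time interval the basis functions sum to one, so the trajectory is a weighted average of nearby control points regardless of how unevenly the knots are spaced.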

HNSRRT*: A Path Planning Algorithm Based on Heuristic Non-Uniform Sampling Method in Complex Obstacle Environment

Chinese Control Conference (CCC)

Authors: Zhiwen Xu; Hui Zhang; Bo Chen; Xidong Zhou; Songtao Yin; Lian Yang. Affiliations: College of Electrical and Information Engineering, Changsha University of Science and Technology, Hunan, China; College of Robotics and Robot Visual Perception and Control Technology National Engineering Research Center, Hunan University, Hunan, China

Traditional sampling-based algorithms such as the Rapidly-exploring Random Tree (RRT) and its variants have achieved tremendous success in path planning. However, their excessive exploration of the state space leads to long search times and large memory usage, and they cannot guarantee the quality of the planned path (generally evaluated by search time and path length) in complex spaces. In this article, we propose an optimal path planning algorithm based on heuristic non-uniform sampling, namely HNSRRT*, which plans paths in complex obstacle environments with optimal length and minimum time cost. HNSRRT* uses a heuristic function to generate a non-uniform sampling distribution based on a Gaussian distribution, and constraints on the sampling points reduce the wasted time and increased path length caused by excessive exploration. We test the proposed HNSRRT* in 2D and 3D complex obstacle environments, comparing it with three traditional sampling-based algorithms. The simulation results indicate the effectiveness and efficiency of HNSRRT*, which shows an obvious improvement in time cost and path length compared with the existing algorithms.
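
The abstract describes replacing uniform sampling with a heuristic, Gaussian-shaped sampling distribution. The sketch below illustrates one simple way to bias samples in a 2D workspace: pick a point on the straight start-goal segment and perturb it with Gaussian noise, while keeping a share of uniform samples so the whole space can still be explored. This is an assumed illustration, not the HNSRRT* heuristic itself; `heuristic_sample`, `sigma`, and `uniform_ratio` are hypothetical names and values.

```python
import math
import random


def distance_to_segment(p, a, b):
    """Euclidean distance from point p to the 2D segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Projection parameter of p onto the segment, clamped to [0, 1].
    s = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    s = min(1.0, max(0.0, s))
    return math.hypot(p[0] - (a[0] + s * dx), p[1] - (a[1] + s * dy))


def heuristic_sample(start, goal, bounds, rng, sigma=1.0, uniform_ratio=0.2):
    """Draw one 2D sample, biased towards the straight line from start to goal."""
    (xmin, xmax), (ymin, ymax) = bounds
    if rng.random() < uniform_ratio:
        # Occasional uniform sample keeps the rest of the workspace reachable.
        return (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
    s = rng.random()
    cx = start[0] + s * (goal[0] - start[0])
    cy = start[1] + s * (goal[1] - start[1])
    # Gaussian perturbation around the chosen point, clamped to the workspace.
    x = min(xmax, max(xmin, rng.gauss(cx, sigma)))
    y = min(ymax, max(ymin, rng.gauss(cy, sigma)))
    return (x, y)


rng = random.Random(0)
start, goal, bounds = (0.0, 0.0), (20.0, 20.0), ((0.0, 20.0), (0.0, 20.0))
samples = [heuristic_sample(start, goal, bounds, rng) for _ in range(2000)]
```

In an RRT-style planner, such a sampler would replace the uniform random-state draw; obstacles that block the straight line are the reason the uniform share is kept above zero.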
