Relying on paired synthetic data, existing learning-based Computational Aberration Correction (CAC) methods are confronted with the intricate and multifaceted synthetic-to-real domain gap, which leads to suboptimal pe...
Vision-based occupancy prediction, also known as 3D Semantic Scene Completion (SSC), presents a significant challenge in computer vision. Previous methods, confined to onboard processing, struggle with simultaneous ge...
Online High-Definition (HD) maps have emerged as the preferred option for autonomous driving, overshadowing their offline counterparts thanks to flexible update capability and lower maintenance costs. However, contemporary online HD map models embed the parameters of visual sensors into training, resulting in a significant decrease in generalization performance when applied to visual sensors with different parameters. Inspired by the inherent potential of Inverse Perspective Mapping (IPM), where camera parameters are decoupled from the training process, we design a universal map generation framework, GenMapping. The framework is built on a triadic synergy architecture comprising a principal branch and two auxiliary branches. Given a coarse road image with local distortion produced by IPM, the principal branch learns robust global features under state space models. The two auxiliary branches are a dense perspective branch and a sparse prior branch: the former exploits correlation information between static and moving objects, whereas the latter introduces prior knowledge from OpenStreetMap (OSM). A triple-enhanced merging module is crafted to synergistically integrate the distinctive spatial features from all three branches. To further improve generalization, a Cross-View Map Learning (CVML) scheme is leveraged to realize joint learning within a common space. Additionally, a Bidirectional Data Augmentation (BiDA) module is introduced to reduce reliance on the training dataset. Extensive experimental results show that the proposed model surpasses current state-of-the-art methods in both semantic mapping and vectorized mapping, while maintaining a rapid inference speed. Moreover, in cross-dataset experiments, the generalization of semantic mapping improves by 17.3% in mIoU, and that of vectorized mapping by 12.1% in mAP. The source code will be publicly available at https://***/lynn-yu/GenMappin
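Since the framework hinges on IPM decoupling camera parameters from training, a minimal sketch of flat-ground IPM is given below, assuming known intrinsics K and world-to-camera extrinsics (R, t). The `ipm_warp` helper, grid resolution, and ground-plane range are illustrative assumptions, not details from the paper.

```python
# Minimal flat-ground Inverse Perspective Mapping (IPM) sketch.
# Assumes a planar z = 0 ground and a calibrated pinhole camera.
import numpy as np
import cv2

def ipm_warp(image, K, R, t, bev_size=(400, 400),
             bev_range=(-10.0, 10.0, 0.0, 20.0)):
    """Warp a perspective image onto a bird's-eye-view grid on the ground plane.

    K: 3x3 intrinsics; R, t: world-to-camera rotation and translation.
    bev_range: (x_min, x_max, y_min, y_max) in metres on the ground plane.
    """
    h, w = bev_size
    x_min, x_max, y_min, y_max = bev_range
    # Regular grid of ground-plane points (z = 0); top row = far range.
    xs = np.linspace(x_min, x_max, w)
    ys = np.linspace(y_max, y_min, h)
    gx, gy = np.meshgrid(xs, ys)
    ground = np.stack([gx, gy, np.zeros_like(gx)], axis=-1).reshape(-1, 3)
    # Project ground points into the camera: p ~ K (R X + t).
    cam = ground @ R.T + t
    behind = cam[:, 2] <= 1e-6
    pix = (cam @ K.T)[:, :2] / np.clip(cam[:, 2:3], 1e-6, None)
    pix[behind] = -1.0  # points behind the camera fall into the border fill
    map_x = pix[:, 0].reshape(h, w).astype(np.float32)
    map_y = pix[:, 1].reshape(h, w).astype(np.float32)
    # Sample the source image at the projected pixel locations.
    return cv2.remap(image, map_x, map_y, interpolation=cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT)
```

Because only K, R, and t enter the warp, swapping cameras only changes the remap tables, not any learned weights, which is the generalization property the abstract attributes to IPM.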
The success of deep neural networks (DNNs) has promoted the widespread application of person re-identification (ReID). However, ReID systems inherit the vulnerability of DNNs to malicious attacks of visually inconspic...
Vision sensors are widely applied in vehicles, robots, and roadside infrastructure. However, due to limitations in hardware cost and system size, camera Field-of-View (FoV) is often restricted and may not provide suff...
Simultaneous Localization And Mapping (SLAM) has become a crucial aspect in the fields of autonomous driving and robotics. One key component of visual SLAM is the Field-of-View (FoV) of the camera, as a larger FoV...
We propose a high-performance glass-plastic hybrid minimalist aspheric panoramic annular lens (ASPAL) to overcome several major limitations of the traditional panoramic annular lens (PAL), such as large size, high weight...
Learning driving policies with an end-to-end network has proven to be a promising solution for autonomous driving. Due to the lack of a benchmark driver behavior dataset that contains both visual and LiDAR data, existing works focus solely on learning driving from visual sensors. Besides, most works are limited to predicting the steering angle and neglect the more challenging vehicle speed control problem. In this paper, we propose a novel end-to-end network, FlowDriveNet, which jointly exploits sequential visual data and LiDAR data to predict steering angle and vehicle speed. The main challenges of this problem are how to efficiently extract driving-related information from images and point clouds, and how to fuse them effectively. To tackle these challenges, we introduce the concept of point flow and argue that image optical flow and LiDAR point flow are significant motion cues for driving policy learning. Specifically, we first create an enhanced dataset consisting of images, point clouds, and the corresponding human driver behaviors. Then, in FlowDriveNet, a deep yet efficient visual feature extraction module and a point feature extraction module extract spatial features from optical flow and point flow, respectively. Additionally, a novel temporal fusion and prediction module is designed to fuse temporal information from the extracted spatial feature sequences and predict vehicle driving commands. Finally, a series of ablation experiments verifies the importance of optical flow and point flow, and comparison experiments show that our flow-based method outperforms existing image-based approaches on the task of driving policy learning.
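To make the flow-based idea concrete, below is a minimal sketch of the two ingredients the abstract names: dense optical flow between consecutive frames and a temporal head that fuses feature sequences into driving commands. The Farneback flow call is a standard OpenCV routine; the `TemporalFusionHead` module, its GRU-based fusion, and all feature dimensions are illustrative assumptions rather than the authors' FlowDriveNet design.

```python
# Sketch: optical flow extraction plus a temporal fusion/prediction head.
import cv2
import torch
import torch.nn as nn

def optical_flow(prev_gray, next_gray):
    """Dense Farneback optical flow between two grayscale frames -> (H, W, 2)."""
    return cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2,
                                        flags=0)

class TemporalFusionHead(nn.Module):
    """Fuse per-frame visual and point features over time, regress commands."""
    def __init__(self, vis_dim=256, pt_dim=128, hidden=256):
        super().__init__()
        self.gru = nn.GRU(vis_dim + pt_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 2)  # [steering angle, vehicle speed]

    def forward(self, vis_seq, pt_seq):
        # vis_seq: (B, T, vis_dim); pt_seq: (B, T, pt_dim)
        fused = torch.cat([vis_seq, pt_seq], dim=-1)
        _, h = self.gru(fused)       # h: (num_layers, B, hidden)
        return self.out(h[-1])       # (B, 2)
```

Here `pt_seq` stands in for features extracted from LiDAR point flow; since point flow is the paper's own construct, any scene-flow-style motion estimate between consecutive sweeps could play that role in this sketch.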
This article presents a framework for quadrotors that integrates planning and control, which employs a heuristic depth-first search (HDFS) with data-driven model predictive control (MPC). The proposed framework intends...
Temporal information plays a pivotal role in Bird's-Eye-View (BEV) driving scene understanding, as it can alleviate visual information sparsity. However, an indiscriminate temporal fusion method will cause the b...