Search Results - Inner Mongolia University Library

FlowDriveNet: An End-to-End Network for Learning Driving Policies from Image Optical Flow and LiDAR Point Flow

IEEE International Conference on Robotics and Automation (ICRA)

Authors: Shuai Wang, Jiahu Qin, Menglin Li, Yaonan Wang. Affiliations: University of Science and Technology of China, Hefei, China; College of Electrical and Information Engineering, Hunan University, Changsha, China; National Engineering Laboratory for Robot Visual Perception and Control Technology, Changsha, China.

Learning driving policies using an end-to-end network has proven to be a promising solution for autonomous driving. Due to the lack of a benchmark driver behavior dataset that contains both visual and LiDAR data, existing works focus solely on learning driving from visual sensors. Besides, most works are limited to predicting the steering angle and neglect the more challenging vehicle speed control problem. In this paper, we propose a novel end-to-end network, FlowDriveNet, which takes advantage of sequential visual data and LiDAR data jointly to predict steering angle and vehicle speed. The main challenges of this problem are how to efficiently extract driving-related information from images and point clouds, and how to fuse them effectively. To tackle these challenges, we propose a concept of point flow and declare that image optical flow and LiDAR point flow are significant motion cues for driving policy learning. Specifically, we first create an enhanced dataset that consists of images, point clouds and corresponding human driver behaviors. Then, in FlowDriveNet, a deep but efficient visual feature extraction module and a point feature extraction module are utilized to extract spatial features from optical flow and point flow, respectively. Additionally, a novel temporal fusion and prediction module is designed to fuse temporal information from the extracted spatial feature sequences and predict vehicle driving commands. Finally, a series of ablation experiments verify the importance of optical flow and point flow, and comparison experiments show that our flow-based method outperforms the existing image-based approaches on the task of driving policy learning.

Keywords: Image motion analysis; Computer vision; Visualization; Laser radar; Fuses; Predictive models; Feature extraction

Generative Adversarial Network with Separate Learning Rule for Image Generation

Journal of Donghua University (English Edition), 2020, Vol. 37, No. 2, pp. 121-129

Authors: YIN Feng, CHEN Xinyu, QIU Jie, KANG Yongliang. Affiliations: College of Automation and Electronic Information, Xiangtan University, Xiangtan 411105, China; National Engineering Laboratory of Robot Vision Perception and Control Technology, Changsha 410012, China.

Boundary equilibrium generative adversarial networks (BEGANs) are an improved version of generative adversarial networks (GANs). In this paper, an improved BEGAN with a skip-connection technique in the generator and the discriminator is ***, an alternative time-scale update rule is adopted to balance the learning rate of the generator and the ***, the performance of the proposed method is quantitatively evaluated by Fréchet inception distance (FID) and inception score (IS). The test results show that the performance of the proposed method is better than that of the original BEGAN.

Keywords: generative adversarial network (GAN); boundary equilibrium generative adversarial network (BEGAN); Fréchet inception distance (FID); inception score (IS)

Design, analysis, and manufacturing of a glass-plastic hybrid minimalist aspheric panoramic annular lens

arXiv, 2024

Authors: Gao, Shaohua; Jiang, Qi; Liao, Yiqi; Qiu, Yi; Ying, Wanglei; Yang, Kailun; Wang, Kaiwei; Zhang, Benhao; Bai, Jian. Affiliations: State Key Laboratory of Extreme Photonics and Instrumentation, College of Optical Science and Engineering, Zhejiang University, Hangzhou 310027, China; Ningbo Lian Technology Co. Ltd, Ningbo Lian, Ningbo 315500, China; National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, Changsha 410082, China; Intelligent Optics & Photonics Research Center, Jiaxing Research Institute, Zhejiang University, Jiaxing 314031, China; Central Research Institute of Sunny Optical Technology, Sunny Optical Technology, Hangzhou 311215, China.

We propose a high-performance glass-plastic hybrid minimalist aspheric panoramic annular lens (ASPAL) to solve several major limitations of the traditional panoramic annular lens (PAL), such as large size, high weight, and system complexity. The field of view (FoV) of the ASPAL is 360°×(35°~110°) and the imaging quality is close to the diffraction limit. This large-FoV ASPAL is composed of only 4 lenses. Moreover, we establish a physical structure model of PAL using the ray tracing method and study the influence of its physical parameters on compactness ratio. In addition, for the evaluation of local tolerances of annular surfaces, we propose a tolerance analysis method suitable for ASPAL. This analytical method can effectively analyze surface irregularities on annular surfaces and provide clear guidance on manufacturing tolerances for ASPAL. Benefiting from high-precision glass molding and injection molding aspheric lens manufacturing techniques, we finally manufactured 20 ASPALs in small batches. The weight of an ASPAL prototype is only 8.5 g. Our framework provides promising insights for the application of panoramic systems in space- and weight-constrained environmental sensing scenarios such as intelligent security, micro-UAVs, and micro-robots. Copyright © 2024, The Authors. All rights reserved.

Keywords: Optical design

DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction

arXiv, 2024

Authors: Li, Siyu; Lin, Jiacheng; Shi, Hao; Zhang, Jiaming; Wang, Song; Yao, You; Li, Zhiyong; Yang, Kailun. Affiliations: The School of Robotics, The National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, Changsha 410082, China; The College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China; The State Key Laboratory of Extreme Photonics and Instrumentation, The National Engineering Research Center of Optical Instrumentation, Zhejiang University, Hangzhou 310027, China; The Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Karlsruhe 76131, Germany; The College of Computer Science, Zhejiang University, Hangzhou 310027, China; The USC Viterbi School of Engineering, The University of Southern California, Los Angeles, CA 90089, United States.

Temporal information plays a pivotal role in Bird’s-Eye-View (BEV) driving scene understanding, where it can alleviate the sparsity of visual information. However, indiscriminate temporal fusion introduces feature redundancy when constructing vectorized High-Definition (HD) maps. In this paper, we revisit the temporal fusion of vectorized HD maps, focusing on temporal instance consistency and temporal map consistency learning. To improve the representation of instances in single-frame maps, we introduce a novel method, DTCLMapper. This approach uses a dual-stream temporal consistency learning module that combines instance embedding with geometry maps. In the instance embedding component, our approach integrates temporal Instance Consistency Learning (ICL), ensuring consistency from vector points and instance features aggregated from points. A vectorized points pre-selection module is employed to enhance the regression efficiency of vector points from each instance. Then aggregated instance features obtained from the vectorized points pre-selection module are grounded in contrastive learning to realize temporal consistency, where positive and negative samples are selected based on position and semantic information. The geometry mapping component introduces Map Consistency Learning (MCL) designed with self-supervised learning. The MCL enhances the generalization capability of our consistent learning approach by concentrating on the global location and distribution constraints of the instances. Extensive experiments on well-recognized benchmarks indicate that the proposed DTCLMapper achieves state-of-the-art performance in vectorized mapping tasks, reaching 61.9% and 65.1% mAP scores on the nuScenes and Argoverse datasets, respectively. The source code is available at https://***/lynn-yu/DTCLMapper. Copyright © 2024, The Authors. All rights reserved.

Keywords: Contrastive Learning
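
The DTCLMapper abstract above says that instance features are made temporally consistent through contrastive learning with selected positive and negative samples. The abstract does not state the loss, so the following is only a minimal sketch assuming an InfoNCE-style objective with cosine similarity; the names `cosine`, `info_nce`, and `temperature` are hypothetical and do not come from the paper or its code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed choice, not confirmed by the abstract).

    The loss is low when the anchor is closer to its positive sample
    (e.g. the same map instance in an adjacent frame) than to the negatives.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[0]

# Toy instance features: the positive nearly matches the anchor.
anchor = [1.0, 0.0]
loss = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
```

How positives and negatives are actually chosen from position and semantic information is left to the paper; here they are simply passed in by the caller.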

TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation

arXiv, 2022

Authors: Liu, Ruiping; Yang, Kailun; Roitberg, Alina; Zhang, Jiaming; Peng, Kunyu; Liu, Huayao; Wang, Yaonan; Stiefelhagen, Rainer. Affiliations: The Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Karlsruhe 76131, Germany; The School of Robotics, The National Engineering Laboratory of Robot Visual Perception and Control Technology, Hunan University, Changsha 410082, China; The Institute for Artificial Intelligence, The University of Stuttgart, Stuttgart 70569, Germany; The Institute for Visual Computing, ETH Zurich, Zurich 8092, Switzerland; NIO, Shanghai 201804, China.

Semantic segmentation benchmarks in the realm of autonomous driving are dominated by large pre-trained transformers, yet their widespread adoption is impeded by substantial computational costs and prolonged training durations. To lift this constraint, we look at efficient semantic segmentation from a perspective of comprehensive knowledge distillation and aim to bridge the gap between multi-source knowledge extractions and transformer-specific patch embeddings. We put forward the Transformer-based Knowledge Distillation (TransKD) framework which learns compact student transformers by distilling both feature maps and patch embeddings of large teacher transformers, bypassing the long pre-training process and reducing the FLOPs by >85.0%. Specifically, we propose two fundamental modules to realize feature map distillation and patch embedding distillation, respectively: (1) Cross Selective Fusion (CSF) enables knowledge transfer between cross-stage features via channel attention and feature map distillation within hierarchical transformers; (2) Patch Embedding Alignment (PEA) performs dimensional transformation within the patchifying process to facilitate the patch embedding distillation. Furthermore, we introduce two optimization modules to enhance the patch embedding distillation from different perspectives: (1) Global-Local Context Mixer (GL-Mixer) extracts both global and local information of a representative embedding; (2) Embedding Assistant (EA) acts as an embedding method to seamlessly bridge teacher and student models with the teacher’s number of channels. Experiments on Cityscapes, ACDC, NYUv2, and Pascal VOC2012 datasets show that TransKD outperforms state-of-the-art distillation frameworks and rivals the time-consuming pre-training method. The source code is publicly available at https://***/RuipingL/TransKD. Copyright © 2022, The Authors. All rights reserved.

Keywords: Semantic Segmentation

Quantum machine learning for multiclass classification beyond kernel methods

arXiv, 2024

Authors: Ding, Chao; Wang, Shi; Wang, Yaonan; Gao, Weibo. Affiliations: College of Electrical and Information Engineering, Hunan University, Changsha 410082, China; Division of Physics and Applied Physics, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore; National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, Changsha 410082, China; Centre for Quantum Technologies, National University of Singapore, Singapore 117543, Singapore; The Photonics Institute, Centre for Disruptive Photonic Technologies, Nanyang Technological University, Singapore 637371, Singapore.

Quantum machine learning is considered one of the current research fields with great potential. In recent years, Havlíček et al. [Nature 567, 209-212 (2019)] have proposed a quantum machine learning algorithm with quantum-enhanced feature spaces, which effectively addressed a binary classification problem on a superconducting processor and offered a potential pathway to achieving quantum advantage. However, a straightforward binary classification algorithm falls short in solving multiclass classification problems. In this paper, we propose a quantum algorithm that rigorously demonstrates that quantum kernel methods enhance the efficiency of multiclass classification in real-world applications, providing quantum advantage. To demonstrate quantum advantage, we design six distinct quantum kernels within the quantum algorithm to map input data into quantum state spaces and estimate the corresponding quantum kernel matrices. The results from quantum simulations reveal that the quantum algorithm outperforms its classical counterpart in handling six real-world multiclass classification problems. Furthermore, we leverage a variety of performance metrics to comprehensively evaluate the classification and generalization performance of the quantum algorithm. The results demonstrate that the quantum algorithm achieves superior classification and better generalization performance relative to classical counterparts. Copyright © 2024, The Authors. All rights reserved.

Keywords: Quantum efficiency
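
The abstract above describes mapping input data into quantum state spaces and estimating quantum kernel matrices. It does not specify its six kernels, so the sketch below assumes a simple illustrative choice: angle encoding with one qubit per feature and the fidelity kernel k(x, y) = |<phi(x)|phi(y)>|^2, simulated classically. The names `feature_state`, `quantum_kernel`, and `kernel_matrix` are hypothetical.

```python
import math

def feature_state(x):
    """Angle-encode a real vector as a product state, one qubit per feature.

    Qubit i is cos(x_i / 2)|0> + sin(x_i / 2)|1>, stored as (amp0, amp1).
    This feature map is an assumption made for illustration only.
    """
    return [(math.cos(xi / 2.0), math.sin(xi / 2.0)) for xi in x]

def quantum_kernel(x, y):
    """Fidelity kernel |<phi(x)|phi(y)>|^2 for the product-state feature map."""
    overlap = 1.0
    for (a0, a1), (b0, b1) in zip(feature_state(x), feature_state(y)):
        # Amplitudes are real here, so the inner product needs no conjugation.
        overlap *= a0 * b0 + a1 * b1
    return overlap ** 2

def kernel_matrix(samples):
    """Gram matrix of the kernel over a list of input vectors."""
    return [[quantum_kernel(a, b) for b in samples] for a in samples]

# Toy inputs; the resulting matrix could be handed to any kernel classifier.
gram = kernel_matrix([[0.3, 1.2], [0.4, 1.0], [2.5, 0.1]])
```

A product-state kernel like this one is easy to compute classically, so it only illustrates how a kernel matrix is built; it is not a candidate for the quantum advantage discussed in the abstract.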

Domain-invariant Prototypes for Semantic Segmentation

arXiv, 2022

Authors: Yang, Zhengeng; Yu, Hongshan; Sun, Wei; Li-Cheng; Mian, Ajmal. Affiliations: The National Engineering Laboratory for Robot Visual Perception and Control Technology, College of Electrical and Information Engineering, Hunan University, Lushan South Rd., Yuelu 410082, China; The Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada; The Department of Computer Science, The University of Western Australia, Perth, WA 6009, Australia.

Deep learning has greatly advanced the performance of semantic segmentation; however, its success relies on the availability of large amounts of annotated data for training. Hence, many efforts have been devoted to domain adaptive semantic segmentation that focuses on transferring semantic knowledge from a labeled source domain to an unlabeled target domain. Existing self-training methods typically require multiple rounds of training, while another popular framework based on adversarial training is known to be sensitive to hyper-parameters. In this paper, we present an easy-to-train framework that learns domain-invariant prototypes for domain adaptive semantic segmentation. In particular, we show that domain adaptation shares a common character with few-shot learning in that both aim to recognize some types of unseen data with knowledge learned from large amounts of seen data. Thus, we propose a unified framework for domain adaptation and few-shot learning. The core idea is to use the class prototypes extracted from few-shot annotated target images to classify pixels of both source images and target images. Our method involves only one-stage training and does not need to be trained on large-scale un-annotated target images. Moreover, our method can be extended to variants of both domain adaptation and few-shot learning. Experiments on adapting GTA5-to-Cityscapes and SYNTHIA-to-Cityscapes show that our method achieves performance competitive with the state of the art. Copyright © 2022, The Authors. All rights reserved.

Keywords: Semantic Segmentation
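
The core idea stated in the abstract above is to classify pixels with class prototypes extracted from a few annotated images. A minimal sketch of that idea follows, assuming each prototype is the mean feature of its class and pixels are assigned by cosine similarity; both choices, and the names `class_prototypes`, `cosine`, and `classify`, are illustrative assumptions rather than the authors' implementation.

```python
import math

def class_prototypes(features, labels):
    """Average the feature vectors of each class into one prototype per class."""
    sums, counts = {}, {}
    for feat, cls in zip(features, labels):
        acc = sums.setdefault(cls, [0.0] * len(feat))
        for i, value in enumerate(feat):
            acc[i] += value
        counts[cls] = counts.get(cls, 0) + 1
    return {cls: [v / counts[cls] for v in acc] for cls, acc in sums.items()}

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def classify(feature, prototypes):
    """Assign the class whose prototype is most similar to the pixel feature."""
    return max(prototypes, key=lambda cls: cosine(feature, prototypes[cls]))

# Toy 2-D pixel features from a few annotated support pixels.
support = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.8]]
support_labels = ["road", "road", "sky", "sky"]
prototypes = class_prototypes(support, support_labels)
predicted = classify([0.8, 0.2], prototypes)
```

In a real segmentation network the features would come from a learned encoder and the same prototypes would be applied to both source and target pixels, as the abstract describes.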

Diff9D: Diffusion-Based Domain-Generalized Category-Level 9-DoF Object Pose Estimation

arXiv, 2025

Authors: Liu, Jian; Sun, Wei; Yang, Hui; Deng, Pengchao; Liu, Chongpei; Sebe, Nicu; Rahmani, Hossein; Mian, Ajmal. Affiliations: National Engineering Research Center for Robot Visual Perception and Control Technology, College of Electrical and Information Engineering, School of Robotics, State Key Laboratory of Advanced Design and Manufacturing for Vehicle Body, Hunan University, Changsha 410082, China; Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an 710049, China; Department of Information Engineering and Computer Science, University of Trento, Trento 38123, Italy; School of Computing and Communications, Lancaster University, LA1 4YW, United Kingdom; Department of Computer Science, The University of Western Australia, WA 6009, Australia.

Nine-degrees-of-freedom (9-DoF) object pose and size estimation is crucial for enabling augmented reality and robotic manipulation. Category-level methods have received extensive research attention due to their potential for generalization to intra-class unknown objects. However, these methods require manual collection and labeling of large-scale real-world training data. To address this problem, we introduce a diffusion-based paradigm for domain-generalized category-level 9-DoF object pose estimation. Our motivation is to leverage the latent generalization ability of the diffusion model to address the domain generalization challenge in object pose estimation. This entails training the model exclusively on rendered synthetic data to achieve generalization to real-world scenes. We propose an effective diffusion model to redefine 9-DoF object pose estimation from a generative perspective. Our model does not require any 3D shape priors during training or inference. By employing the Denoising Diffusion Implicit Model, we demonstrate that the reverse diffusion process can be executed in as few as 3 steps, achieving near real-time performance. Finally, we design a robotic grasping system comprising both hardware and software components. Through comprehensive experiments on two benchmark datasets and the real-world robotic system, we show that our method achieves state-of-the-art domain generalization performance. Our code will be made public at https://***/CNJianLiu/Diff9D. Copyright © 2025, The Authors. All rights reserved.

Keywords: Augmented reality
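
The abstract above reports that, with the Denoising Diffusion Implicit Model (DDIM), the reverse diffusion process can run in as few as 3 steps. The sketch below shows a generic deterministic DDIM update (eta = 0) on a plain list of numbers. The noise predictor `eps_model`, the schedule values, and the toy data are all hypothetical stand-ins; this is not the Diff9D network or its pose representation.

```python
import math

def ddim_sample(x_T, eps_model, alpha_bars):
    """Deterministic DDIM sampling (eta = 0) over a short schedule.

    alpha_bars[k] is the cumulative signal level at the k-th sampled step,
    ordered from noisiest to cleanest; the level after the final step is
    taken to be 1.0 (no remaining noise).
    """
    x = list(x_T)
    for k, a_t in enumerate(alpha_bars):
        a_prev = alpha_bars[k + 1] if k + 1 < len(alpha_bars) else 1.0
        eps = eps_model(x, k)
        # Clean sample implied by the current noise estimate.
        x0 = [(xi - math.sqrt(1.0 - a_t) * ei) / math.sqrt(a_t)
              for xi, ei in zip(x, eps)]
        # Step to the less noisy level, reusing the same noise direction.
        x = [math.sqrt(a_prev) * x0i + math.sqrt(1.0 - a_prev) * ei
             for x0i, ei in zip(x0, eps)]
    return x

# Toy check with an oracle predictor that knows the clean target.
target = [0.5, -1.0, 2.0]
noise = [0.3, -0.7, 1.1]
schedule = [0.1, 0.5, 0.9]  # three steps, values chosen arbitrarily

def oracle(x, k):
    a = schedule[k]
    return [(xi - math.sqrt(a) * ti) / math.sqrt(1.0 - a)
            for xi, ti in zip(x, target)]

start = [math.sqrt(schedule[0]) * t + math.sqrt(1.0 - schedule[0]) * n
         for t, n in zip(target, noise)]
recovered = ddim_sample(start, oracle, schedule)
```

With a perfect noise predictor the three updates return the clean target up to floating-point error, which is why a well-trained model can afford so few steps; a learned predictor would only approximate this.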

Computational Imaging for Machine Perception: Transferring Semantic Segmentation beyond Aberrations

arXiv, 2022

Authors: Jiang, Qi; Shi, Hao; Gao, Shaohua; Zhang, Jiaming; Yang, Kailun; Sun, Lei; Ni, Huajian; Wang, Kaiwei. Affiliations: The State Key Laboratory of Extreme Photonics and Instrumentation, The National Engineering Research Center of Optical Instrumentation, Zhejiang University, Hangzhou 310027, China; The Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Karlsruhe 76131, Germany; The School of Robotics, The National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, Changsha 410082, China; Shanghai SUPREMIND Technology Company Ltd, Shanghai 201210, China.

Semantic scene understanding with Minimalist Optical Systems (MOS) in mobile and wearable applications remains a challenge due to the corrupted imaging quality induced by optical aberrations. However, previous works only focus on improving the subjective imaging quality through the Computational Imaging (CI) technique, ignoring the feasibility of advancing semantic segmentation. In this paper, we pioneer the investigation of Semantic Segmentation under Optical Aberrations (SSOA) with MOS. To benchmark SSOA, we construct Virtual Prototype Lens (VPL) groups through optical simulation, generating Cityscapes-ab and KITTI-360-ab datasets under different behaviors and levels of aberrations. We look into SSOA via an unsupervised domain adaptation perspective to address the scarcity of labeled aberration data in real-world scenarios. Further, we propose Computational Imaging Assisted Domain Adaptation (CIADA) to leverage prior knowledge of CI for robust performance in SSOA. Based on our benchmark, we conduct experiments on the robustness of classical segmenters against aberrations. In addition, extensive evaluations of possible solutions to SSOA reveal that CIADA achieves superior performance under all aberration distributions, bridging the gap between computational imaging and downstream applications for MOS. The project page is at https://***/zju-jiangqi/CIADA. Copyright © 2022, The Authors. All rights reserved.

Keywords: Aberrations