Search Results - Inner Mongolia University Library

9th International Forum on Digital Multimedia Communication, IFTC 2022

Authors: Gong, Yafei; Lian, Xinkang; Ma, Xuanchao; Xia, Zhifang; Zhou, Chengxu

Affiliations: Faculty of Information Technology, Beijing University of Technology, Beijing, China; Engineering Research Center of Intelligent Perception and Autonomous Control, Ministry of Education, Beijing, China; Beijing Laboratory of Smart Environmental Protection, Beijing, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing, China; Beijing Artificial Intelligence Institute, Beijing, China; National Information Center, Beijing, China; School of Electronic and Information Engineering, Liaoning University of Technology, Liaoning, Jinzhou, China; Key Laboratory of Intelligent Control and Optimization for Industrial Equipment of Ministry of Education, Dalian University of Technology, Dalian, China

ISBN: (print) 9789819908554

Smoke detection plays a crucial role in the safe production of petrochemical enterprises and in fire prevention. Image-based machine learning and deep learning methods have been widely studied. Recently, many works have applied the transformer to computer vision tasks such as classification and object detection. To our knowledge, few studies have used the transformer structure to detect smoke. To explore the application potential and improve the performance of the transformer in smoke detection, we propose a model consisting of two transformer encoders and a convolutional neural network (CNN) module. The first transformer encoder establishes the global relationships within an image, and the CNN structure provides additional local information to the transformer. The fusion of global and local information helps the second transformer encoder make better decisions. Experimental results on a large-size dataset for industrial smoke detection illustrate the effectiveness of the proposed model. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Keywords: Smoke

Generative Adversarial Network with Separate Learning Rule for Image Generation

Journal of Donghua University (English Edition), 2020, Vol. 37, No. 2, pp. 121-129

Authors: YIN Feng; CHEN Xinyu; QIU Jie; KANG Yongliang

Affiliations: College of Automation and Electronic Information, Xiangtan University, Xiangtan 411105, China; National Engineering Laboratory of Robot Vision Perception and Control Technology, Changsha 410012, China

Boundary equilibrium generative adversarial networks (BEGANs) are an improved version of generative adversarial networks (GANs). In this paper, an improved BEGAN with a skip-connection technique in the generator and the discriminator is ***, an alternative time-scale update rule is adopted to balance the learning rate of the generator and the ***, the performance of the proposed method is quantitatively evaluated by Fréchet inception distance (FID) and inception score (IS). The test results show that the performance of the proposed method is better than that of the original BEGAN.

Keywords: generative adversarial network (GAN); boundary equilibrium generative adversarial network (BEGAN); Fréchet inception distance (FID); inception score (IS)
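
The Fréchet inception distance named in the abstract above compares the mean and covariance of feature vectors from real and generated images. The following is a minimal sketch of that formula only, not the authors' evaluation code. It assumes the features are already extracted (in practice they usually come from an Inception network; random arrays stand in for them here), and the function name `frechet_distance` is hypothetical.

```python
import numpy as np

def frechet_distance(feat_real, feat_gen):
    """Frechet distance between Gaussians fitted to two (N, D) feature sets.

    FID = ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 (C_r C_g)^(1/2))
    """
    mu_r, mu_g = feat_real.mean(axis=0), feat_gen.mean(axis=0)
    cov_r = np.cov(feat_real, rowvar=False)
    cov_g = np.cov(feat_gen, rowvar=False)
    # Tr((C_r C_g)^(1/2)) equals the sum of square roots of the eigenvalues
    # of C_r C_g, which are real and non-negative for covariance matrices.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)

# Stand-in features (assumption): 500 samples, 8 dimensions.
rng = np.random.default_rng(0)
real = rng.standard_normal((500, 8))
generated = real + 0.5  # same covariance, mean shifted by 0.5 per dimension

# Only the mean term contributes here: 8 * 0.5**2, so the result is close to 2.0.
print(frechet_distance(real, generated))
```

A lower value means the two feature distributions are closer. Identical feature sets give a distance of approximately zero.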

A Multi-stage Selection Filter Based on Wavelet Packet and 2DCNN for Fault Diagnosis of Rotating Machinery

42nd Chinese Control Conference

Authors: Wenbin He; Jianxu Mao; Li Liu; Zhe Li; Miao Yang; Yaonan Wang

Affiliations: College of Electrical and Information Engineering, Hunan University; National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University; State Grid Hunan Electric Power Company Limited Research Institute

Fault information of rotating machinery is often drowned in strong noise, so it is crucial to accurately identify faults from high-intensity noise signals. In this article, an end-to-end fault diagnosis model is developed, which consists of a multi-stage selection filter based on wavelet packet and 2D-CNN. First, the original measured mechanical signals are processed by three-level wavelet packet decomposition to obtain eight sub-bands with coefficient matrices. Second, the signal is reconstructed using different numbers of sub-bands, where the number is increased by one at a time to obtain eight different multi-stage reconstructed signals. Third, the reconstructed signals are reorganized into 2D signal maps, and a parallel training network is constructed using the signal maps and 2D-CNN to achieve fault classification. Then, guided by the training results, the eight parallel classification results are compared in order to train the best fault diagnosis model. Finally, a simulation experiment based on a bearing data set shows that the proposed multi-stage selection filter is effective and feasible in application.
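
The first three steps of the abstract above (three-level decomposition into eight sub-bands, reconstruction from a growing number of sub-bands, reshaping into 2D maps) can be sketched as follows. This is an illustrative sketch, not the authors' implementation. It assumes a Haar wavelet, sub-bands added in natural tree order, a 1024-sample synthetic signal, and 32×32 maps; none of these are stated in the abstract, and all function names are hypothetical.

```python
import numpy as np

def haar_split(x):
    """One Haar analysis step: (approximation, detail), each half the length."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar_merge(a, d):
    """Inverse of haar_split."""
    x = np.empty(a.size * 2)
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x

def wp_decompose(x, levels=3):
    """Full wavelet packet tree: 2**levels sub-bands in natural (tree) order."""
    bands = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        bands = [part for b in bands for part in haar_split(b)]
    return bands

def wp_reconstruct(bands):
    """Invert wp_decompose by merging neighbouring bands level by level."""
    bands = list(bands)
    while len(bands) > 1:
        bands = [haar_merge(bands[i], bands[i + 1]) for i in range(0, len(bands), 2)]
    return bands[0]

def multi_stage_signals(x, levels=3):
    """Stage k keeps the first k sub-bands and zeroes the rest (assumed order)."""
    bands = wp_decompose(x, levels)
    stages = []
    for k in range(1, len(bands) + 1):
        kept = [b if i < k else np.zeros_like(b) for i, b in enumerate(bands)]
        stages.append(wp_reconstruct(kept))
    return stages

def to_signal_map(x, side):
    """Reshape the first side*side samples into a 2D map for a 2D-CNN input."""
    return np.asarray(x[: side * side]).reshape(side, side)

# Synthetic noisy signal standing in for a measured vibration signal.
rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 40 * np.pi, 1024)) + 0.5 * rng.standard_normal(1024)

stages = multi_stage_signals(signal)          # eight reconstructed signals
maps = [to_signal_map(s, 32) for s in stages]  # eight 32x32 signal maps
```

The eighth stage uses all sub-bands and therefore reproduces the input signal. Each map would then feed one branch of the parallel 2D-CNN classifiers described in the abstract.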

LanCOPE: Language-Guided Category-Level Object Pose Estimation from a Single RGB Image

IEEE Robotics and Automation Letters, 2025, Vol. 10, No. 7, pp. 7555-7562

Authors: Yang, Hui; Sun, Wei; Liu, Jian; Zheng, Jin; Dai, Zhenqi; Mian, Ajmal

Affiliations: Hunan University, National Engineering Research Center for Robot Visual Perception and Control Technology, College of Electrical and Information Engineering, State Key Laboratory of Advanced Design and Manufacturing for Vehicle Body, Changsha 410082, China; Central South University, School of Architecture and Art, Changsha 410082, China; The University of Western Australia, Department of Computer Science and Software Engineering, 6009 WA, Australia

Monocular RGB-based category-level object pose estimation is more practical and cost-effective for robotics. However, existing methods do not fully exploit the rich semantic and contextual information in multimodal data (e.g., language) that provides additional object attributes to guide the model in extracting category features more reliably. We propose a language-guided category-level object pose estimation method (LanCOPE), taking a single RGB image as input. Our method uses DINOv2 to recover depth from a single RGB image and converts it into a point cloud to perceive the object's geometry. We then introduce language descriptions for the RGB image, estimated point cloud and overall scene to better guide the point cloud encoder and image encoder in learning category features. We develop a cross-modal differential perception feature fusion network to fuse multimodal features. This network employs a differential perception module to eliminate redundant information across different modalities, highlighting significant semantic differences and similarities. Furthermore, it uses a cross-attention mechanism to fuse the semantic information of the language and vision features, improving the overall perception. Finally, we design a denoising network based on the skip fusion transformer to recover the object pose accurately. Extensive experiments on REAL275 and Wild6D datasets show that LanCOPE achieves state-of-the-art performance. Our code is available at LanCOPE. © 2016 IEEE.

Keywords: category-level object pose estimation; language; RGB image
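
The abstract above mentions converting an estimated depth map into a point cloud. A common way to do this is pinhole back-projection, sketched below. This is an assumption about the general technique, not a description of LanCOPE's code. The function name, the intrinsics (`fx`, `fy`, `cx`, `cy`), and the toy depth map are all hypothetical.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map to an (N, 3) point cloud (pinhole model)."""
    h, w = depth.shape
    # u holds the column index and v the row index of every pixel.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Drop pixels that carry no valid (positive) depth.
    return points[points[:, 2] > 0]

# Toy depth map: a flat surface 2 units away, with one invalid pixel.
depth = np.full((4, 6), 2.0)
depth[0, 0] = 0.0
cloud = depth_to_point_cloud(depth, fx=500.0, fy=500.0, cx=3.0, cy=2.0)
print(cloud.shape)
```

The pixel at the principal point (`u = cx`, `v = cy`) maps onto the optical axis, so its `x` and `y` coordinates are zero.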

Towards Consistent Object Detection via LiDAR-Camera Synergy

arXiv, 2024

Authors: Luo, Kai; Wu, Hao; Yi, Kefu; Yang, Kailun; Hao, Wei; Hu, Rongdong

Affiliations: School of Traffic and Transportation Engineering, Changsha University of Science and Technology, China; College of Automotive and Mechanical Engineering, Changsha University of Science and Technology, China; School of Robotics, Hunan University, China; National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, China; Changsha Intelligent Driving Institute, China

As human-machine interaction continues to evolve, the capacity for environmental perception is becoming increasingly crucial. Integrating the two most common types of sensory data, images, and point clouds, can enhance detection accuracy. Currently, there is no existing model capable of detecting an object’s position in both point clouds and images while also determining their corresponding relationship. This information is invaluable for human-machine interactions, offering new possibilities for their enhancement. In light of this, this paper introduces an end-to-end Consistency Object Detection (COD) algorithm framework that requires only a single forward inference to simultaneously obtain an object’s position in both point clouds and images and establish their correlation. Furthermore, to assess the accuracy of the object correlation between point clouds and images, this paper proposes a new evaluation metric, Consistency Precision (CP). To verify the effectiveness of the proposed framework, an extensive set of experiments has been conducted on the KITTI and DAIR-V2X datasets. The study also explored how the proposed consistency detection method performs on images when the calibration parameters between images and point clouds are disturbed, compared to existing post-processing methods. The experimental results demonstrate that the proposed method exhibits excellent detection performance and robustness, achieving end-to-end consistency detection. The source code will be made publicly available at https://***/xifen523/COD. Copyright © 2024, The Authors. All rights reserved.

Keywords: Image enhancement

Towards Consistent Object Detection via LiDAR-Camera Synergy

IEEE International Conference on Systems, Man and Cybernetics

Authors: Kai Luo; Hao Wu; Kefu Yi; Kailun Yang; Wei Hao; Rongdong Hu

Affiliations: College of Automotive and Mechanical Engineering, Changsha University of Science and Technology, China; School of Traffic and Transportation Engineering, Changsha University of Science and Technology, China; School of Robotics, Hunan University, China; National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, China; Changsha Intelligent Driving Institute, China

ISBN: (digital) 9781665410205

ISBN: (print) 9781665410212

As human-machine interaction continues to evolve, the capacity for environmental perception is becoming increasingly crucial. Integrating the two most common types of sensory data, images, and point clouds, can enhance detection accuracy. Currently, there is no existing model capable of detecting an object's position in both point clouds and images while also determining their corresponding relationship. This information is invaluable for human-machine interactions, offering new possibilities for their enhancement. In light of this, this paper introduces an end-to-end Consistency Object Detection (COD) algorithm framework that requires only a single forward inference to simultaneously obtain an object's position in both point clouds and images and establish their correlation. Furthermore, to assess the accuracy of the object correlation between point clouds and images, this paper proposes a new evaluation metric, Consistency Precision (CP). To verify the effectiveness of the proposed framework, an extensive set of experiments has been conducted on the KITTI and DAIR-V2X datasets. The study also explored how the proposed consistency detection method performs on images when the calibration parameters between images and point clouds are disturbed, compared to existing post-processing methods. The experimental results demonstrate that the proposed method exhibits excellent detection performance and robustness, achieving end-to-end consistency detection. The source code will be made publicly available at https://***/xifen523/COD.

Keywords: Point cloud compression; Human computer interaction; Measurement; Accuracy; Correlation; Source coding; Instruments; Object detection; Robustness; Inference algorithms

Towards Source-free Domain Adaptive Semantic Segmentation via Importance-aware and Prototype-contrast Learning

arXiv, 2023

Authors: Cao, Yihong; Zhang, Hui; Lu, Xiao; Xiao, Zheng; Yang, Kailun; Wang, Yaonan

Affiliations: College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China; National Engineering Research Center of Robot Vision Perception and Control Technology, School of Robotics, Hunan University, Changsha 410082, China; College of Engineering and Design, Hunan Normal University, Changsha 410082, China

Domain adaptive semantic segmentation enables robust pixel-wise understanding in real-world driving scenes. Source-free domain adaptation, as a more practical technique, addresses the concerns of data privacy and storage limitations in typical unsupervised domain adaptation methods, making it especially relevant in the context of intelligent vehicles. It utilizes a well-trained source model and unlabeled target data to achieve adaptation in the target domain. However, in the absence of source data and target labels, current solutions cannot sufficiently reduce the impact of domain shift and fully leverage the information from the target data. In this paper, we propose an end-to-end source-free domain adaptation semantic segmentation method via Importance-Aware and Prototype-Contrast (IAPC) learning. The proposed IAPC framework effectively extracts domain-invariant knowledge from the well-trained source model and learns domain-specific knowledge from the unlabeled target domain. Specifically, considering the problem of domain shift in the prediction of the target domain by the source model, we put forward an importance-aware mechanism for the biased target prediction probability distribution to extract domain-invariant knowledge from the source model. We further introduce a prototype-contrast strategy, which includes a prototype-symmetric cross-entropy loss and a prototype-enhanced cross-entropy loss, to learn target intra-domain knowledge without relying on labels. A comprehensive variety of experiments on two domain adaptive semantic segmentation benchmarks demonstrates that the proposed end-to-end IAPC solution outperforms existing state-of-the-art methods. The source code is publicly available at https://***/yihong-97/Source-free-IAPC. Copyright © 2023, The Authors. All rights reserved.

Keywords: Semantics

EchoTrack: Auditory Referring Multi-Object Tracking for Autonomous Driving

arXiv, 2024

Authors: Lin, Jiacheng; Chen, Jiajun; Peng, Kunyu; He, Xuan; Li, Zhiyong; Stiefelhagen, Rainer; Yang, Kailun

Affiliations: The College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China; The School of Robotics, Hunan University, Changsha 410012, China; The National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, Changsha 410082, China; The Institute for Robotics and Anthropomatics, Karlsruhe Institute of Technology, Karlsruhe 76131, Germany

This paper introduces the task of Auditory Referring Multi-Object Tracking (AR-MOT), which dynamically tracks specific objects in a video sequence based on audio expressions and appears as a challenging problem in autonomous driving. Due to the lack of semantic modeling capacity in audio and video, existing works have mainly focused on text-based multi-object tracking, which often comes at the cost of tracking quality, interaction efficiency, and even the safety of assistance systems, limiting the application of such methods in autonomous driving. In this paper, we delve into the problem of AR-MOT from the perspective of audio-video fusion and audio-video tracking. We put forward EchoTrack, an end-to-end AR-MOT framework with dual-stream vision transformers. The dual streams are intertwined with our Bidirectional Frequency-domain Cross-attention Fusion Module (Bi-FCFM), which bidirectionally fuses audio and video features from both frequency- and spatiotemporal domains. Moreover, we propose the Audio-visual Contrastive Tracking Learning (ACTL) regime to extract homogeneous semantic features between expressions and visual objects by learning homogeneous features between different audio and video objects effectively. Aside from the architectural design, we establish the first set of large-scale AR-MOT benchmarks, including Echo-KITTI, Echo-KITTI+, and Echo-BDD. Extensive experiments on the established benchmarks demonstrate the effectiveness of the proposed EchoTrack and its components. The source code and datasets are available at https://***/lab206/EchoTrack. Copyright © 2024, The Authors. All rights reserved.

Keywords: Semantics

Minimalist and High-Quality Panoramic Imaging with PSF-aware Transformers

arXiv, 2023

Authors: Jiang, Qi; Gao, Shaohua; Gao, Yao; Yang, Kailun; Yi, Zhonghua; Shi, Hao; Sun, Lei; Wang, Kaiwei

Affiliations: The State Key Laboratory of Modern Optical Instrumentation, The National Engineering Research Center of Optical Instrumentation, Zhejiang University, Hangzhou 310027, China; The School of Robotics, The National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, Changsha 410082, China

High-quality panoramic images with a Field of View (FoV) of 360° are essential for contemporary panoramic computer vision tasks. However, conventional imaging systems come with sophisticated lens designs and heavy optical components. This disqualifies their usage in many mobile and wearable applications where thin and portable, minimalist imaging systems are desired. In this paper, we propose a Panoramic Computational Imaging Engine (PCIE) to achieve minimalist and high-quality panoramic imaging. With fewer than three spherical lenses, a Minimalist Panoramic Imaging Prototype (MPIP) is constructed based on the design of the Panoramic Annular Lens (PAL), but with low-quality imaging results due to aberrations and small image plane size. We propose two pipelines, i.e., Aberration Correction (AC) and Super-Resolution and Aberration Correction (SR&AC), to solve the image quality problems of MPIP, with imaging sensors of small and large pixel size, respectively. To leverage the prior information of the optical system, we propose a Point Spread Function (PSF) representation method to produce a PSF map as an additional modality. A PSF-aware Aberration-image Recovery Transformer (PART) is designed as a universal network for the two pipelines, in which the self-attention calculation and feature extraction are guided by the PSF map. We train PART on synthetic image pairs from simulation and put forward the PALHQ dataset to fill the gap of real-world high-quality PAL images for low-level vision. A comprehensive variety of experiments on synthetic and real-world benchmarks demonstrates the impressive imaging results of PCIE and the effectiveness of the PSF representation. We further deliver heuristic experimental findings for minimalist and high-quality panoramic imaging, in terms of the choices of prototype and pipeline, network architecture, training strategies, and dataset construction. Our dataset and code will be available at https://***/zju-jiangqi/PCIE-PART.

Keywords: Optical transfer function

OAFuser: Towards Omni-Aperture Fusion for Light Field Semantic Segmentation

arXiv, 2023

Authors: Teng, Fei; Zhang, Jiaming; Peng, Kunyu; Wang, Yaonan; Stiefelhagen, Rainer; Yang, Kailun

Affiliations: The School of Robotics, The National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, Changsha 410082, China; The Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Karlsruhe 76131, Germany; The Institute for Visual Computing, ETH Zurich, Zurich 8092, Switzerland

Light field cameras are capable of capturing intricate angular and spatial details. This allows for acquiring complex light patterns and details from multiple angles, significantly enhancing the precision of image semantic segmentation. However, two significant issues arise: (1) The extensive angular information of light field cameras contains a large amount of redundant data, which is overwhelming for the limited hardware resources of intelligent agents. (2) A relative displacement difference exists in the data collected by different micro-lenses. To address these issues, we propose an Omni-Aperture Fusion model (OAFuser) that leverages dense context from the central view and extracts the angular information from sub-aperture images to generate semantically consistent results. To simultaneously streamline the redundant information from the light field cameras and avoid feature loss during network propagation, we present a simple yet very effective Sub-Aperture Fusion Module (SAFM). This module efficiently embeds sub-aperture images in angular features, allowing the network to process each sub-aperture image with a minimal computational demand of only (∼1 GFLOPs). Furthermore, to address the mismatched spatial information across viewpoints, we present a Center Angular Rectification Module (CARM) to realize feature resorting and prevent feature occlusion caused by misalignment. The proposed OAFuser achieves state-of-the-art performance on four UrbanLF datasets in terms of all evaluation metrics and sets a new record of 84.93% in mIoU on the UrbanLF-Real Extended dataset, with a gain of +3.69%. The source code for OAFuser is available at https://***/FeiBryantkit/OAFuser. Impact Statement: To solve the data abundance problem, we have reduced the significant computational consumption of light field cameras while not introducing any additional parameters. The proposed method has practical value for the deployment and application of light field cameras.

Keywords: Semantic Segmentation
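
The abstract above reports results in mIoU (mean intersection over union). A minimal sketch of how this metric is commonly computed from a confusion matrix follows. It is not taken from the OAFuser code; the function name `mean_iou` and the toy labels are hypothetical, and individual benchmarks may differ in how they treat ignored labels or absent classes.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in the prediction or ground truth."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # confusion[i, j] counts pixels of ground-truth class i predicted as class j.
    confusion = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection
    valid = union > 0  # skip classes that never occur
    return float((intersection[valid] / union[valid]).mean())

# Toy example with two classes and four pixels.
target = np.array([0, 0, 1, 1])
pred = np.array([0, 1, 1, 1])
# Class 0: IoU = 1/2, class 1: IoU = 2/3, so mIoU = 7/12.
print(mean_iou(pred, target, num_classes=2))
```

By this measure, the reported 84.93% with a gain of +3.69% implies a previous best of 81.24% on the same dataset.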
