Banding artifacts in images stem from limitations in color bit depth, image compression, or over-editing, significantly degrading image quality, especially in regions with smooth gradients. Image debanding aims to eliminate these artifacts while preserving the authenticity of image details. This paper introduces a novel approach to image debanding using a cross-scale invertible neural network (INN). The proposed INN is information-lossless and enhanced by a more effective cross-scale scheme. Additionally, we present a technique called banded deformable convolution, which fully leverages the anisotropic properties of banding artifacts. This technique is more compact and efficient, and exhibits better generalization, compared to existing deformable convolution methods. Our proposed INN exhibits superior performance in both quantitative metrics and visual quality, as evidenced by the experimental results.
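The abstract does not detail the INN's architecture, so as a generic illustration of the information-lossless property it relies on, here is a minimal affine coupling layer (the standard building block of invertible networks), sketched in NumPy with toy stand-ins for the learned conditioning networks:

```python
import numpy as np

def coupling_forward(x1, x2, scale_net, shift_net):
    """Affine coupling: y1 passes through unchanged; y2 is
    transformed conditioned only on x1, so the map is invertible."""
    y1 = x1
    y2 = x2 * np.exp(scale_net(x1)) + shift_net(x1)
    return y1, y2

def coupling_inverse(y1, y2, scale_net, shift_net):
    """Exact inverse: x2 is recovered with no information loss."""
    x1 = y1
    x2 = (y2 - shift_net(y1)) * np.exp(-scale_net(y1))
    return x1, x2

# Toy conditioning functions (stand-ins for learned sub-networks).
scale = lambda x: 0.1 * np.tanh(x)
shift = lambda x: 0.5 * x

x1 = np.random.randn(4)
x2 = np.random.randn(4)
y1, y2 = coupling_forward(x1, x2, scale, shift)
rx1, rx2 = coupling_inverse(y1, y2, scale, shift)
assert np.allclose(x1, rx1) and np.allclose(x2, rx2)
```

The round trip is exact up to floating-point error regardless of what the conditioning functions compute, which is the property that lets an INN-based restorer avoid discarding image detail.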
Language-conditioned robotic manipulation in unstructured environments presents significant challenges for intelligent robotic systems. However, due to partial observation or imprecise action prediction, failure may b...
ISBN (digital): 9798350355413
ISBN (print): 9798350355420
Simultaneous Localization and Mapping (SLAM) enables robots to perform localization and mapping in unknown environments. Currently, mainstream single-sensor SLAM (Lidar SLAM or Visual SLAM) tends to diverge and fail when faced with unstructured or textureless scenes. To address this issue, this paper proposes a robust tightly-coupled Lidar-Vision-Inertial Odometry framework (LRI-LVIO) that achieves high-precision and robust SLAM. LRI-LVIO consists of two subsystems: a visual-inertial system and a lidar-inertial system. These two subsystems are tightly coupled through an error-state iterative Kalman filter, allowing one subsystem to maintain stable operation even if the other fails, thus enhancing the robustness of SLAM in textureless and featureless environments. Additionally, the depth of visual feature points is recovered using lidar point cloud information, improving the algorithm's efficiency and accuracy. Finally, LRI-LVIO incorporates loop closure detection based on lidar point cloud keyframes, further reducing cumulative errors and enhancing localization precision. Experiments conducted in various indoor and outdoor environments show that by combining the advantages of each sensor, LRI-LVIO achieves higher accuracy and stronger robustness compared to single-sensor SLAM.
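The error-state Kalman filter that couples the two subsystems can be shown schematically. The sketch below is a single (non-iterated) error-state update in NumPy on a hypothetical 1-D position state, not the paper's full lidar/visual/inertial state vector:

```python
import numpy as np

def eskf_update(x, P, z, h, H, R):
    """One error-state Kalman update: estimate the error state
    from the measurement innovation, then inject it into the
    nominal state and shrink the covariance accordingly."""
    y = z - h(x)                       # innovation
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    dx = K @ y                         # estimated error state
    x_new = x + dx                     # inject correction
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new

# Toy example: nominal position drifted to 0, a sensor reads 1.
x = np.array([0.0])
P = np.array([[1.0]])       # prior uncertainty
H = np.array([[1.0]])       # direct position measurement
R = np.array([[0.5]])       # measurement noise
x, P = eskf_update(x, P, np.array([1.0]), lambda s: H @ s, H, R)
```

After the update the state moves partway toward the measurement (weighted by the relative uncertainties) and the covariance shrinks; in the paper's framework either the lidar or the visual residual can feed this update, which is what keeps one subsystem running when the other degenerates.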
Talking face generation is a promising approach within various domains, such as digital assistants, video editing, and virtual video conferences. Previous works with audio-driven talking faces focused primarily on the synchronization between audio and video. However, existing methods still have certain limitations in synthesizing photo-realistic video with high identity preservation, audiovisual synchronization, and facial details like blink movements. To solve these problems, a novel talking face generation framework, termed video portraits transformer (VPT) with controllable blink movements, is proposed and applied. It separates the process of video generation into two stages, i.e., audio-to-landmark and landmark-to-face stages. In the audio-to-landmark stage, the transformer encoder serves as the generator used for predicting whole facial landmarks from given audio and a continuous eye aspect ratio (EAR). During the landmark-to-face stage, the video-to-video (vid-to-vid) network is employed to transfer landmarks into realistic talking face videos. Moreover, to imitate real blink movements during inference, a transformer-based spontaneous blink generation module is devised to generate the EAR sequence. Extensive experiments demonstrate that the VPT method can produce photo-realistic videos of talking faces with natural blink movements, and the spontaneous blink generation module can generate blink movements close to the real blink duration distribution and frequency.
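The eye aspect ratio (EAR) that drives the blink module is a standard landmark-based measure: the ratio of the eye's vertical landmark distances to its horizontal extent, which drops toward zero during a blink. A minimal sketch with hypothetical 2-D landmark coordinates:

```python
import numpy as np

def eye_aspect_ratio(eye):
    """EAR from six eye landmarks p1..p6 (outer corner, upper lid
    x2, inner corner, lower lid x2):
    (|p2-p6| + |p3-p5|) / (2 * |p1-p4|)."""
    v1 = np.linalg.norm(eye[1] - eye[5])   # first vertical distance
    v2 = np.linalg.norm(eye[2] - eye[4])   # second vertical distance
    h = np.linalg.norm(eye[0] - eye[3])    # horizontal distance
    return (v1 + v2) / (2.0 * h)

# Illustrative landmark sets (not from a real detector).
open_eye = np.array([[0, 0], [1, 1], [2, 1], [3, 0], [2, -1], [1, -1]], float)
closed_eye = np.array([[0, 0], [1, .1], [2, .1], [3, 0], [2, -.1], [1, -.1]], float)
assert eye_aspect_ratio(open_eye) > eye_aspect_ratio(closed_eye)
```

Feeding a time sequence of such values to the landmark generator is what gives the framework explicit, controllable blinks, and the spontaneous blink module's job is to synthesize realistic EAR sequences.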
ISBN (digital): 9798350389807
ISBN (print): 9798350389814
Effective capture of multi-scale features is crucial for improving performance in 3D point cloud semantic segmentation tasks. This paper introduces a novel framework that enhances the extraction of semantic information from complex objects in 3D point clouds using multi-resolution techniques. By utilizing varying voxel resolutions and convolutional kernel sizes, we integrate high-resolution voxels to capture fine details and low-resolution voxels to extract global features, achieving robust feature fusion. Experimental results on the ScanNet v2 dataset demonstrate the effectiveness of our proposed network, which particularly excels in semantic segmentation of small objects and complex scenes. This study highlights the significance of multi-resolution strategies in 3D scene understanding, providing new insights for future research in the field.
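The multi-resolution idea can be sketched with a toy voxel-occupancy example; the voxel sizes below are illustrative, not the paper's settings:

```python
import numpy as np

def voxelize(points, voxel_size):
    """Assign each point to a voxel index at the given resolution
    and return the set of occupied voxels."""
    idx = np.floor(points / voxel_size).astype(int)
    return {tuple(i) for i in idx}

np.random.seed(0)
points = np.random.rand(1000, 3)   # toy point cloud in the unit cube
fine = voxelize(points, 0.05)      # high resolution: fine local detail
coarse = voxelize(points, 0.25)    # low resolution: global context
assert len(fine) > len(coarse)
```

The fine grid preserves many small occupied cells (local geometry), while the coarse grid collapses the same points into a few cells (scene-level context); the paper's network extracts features at both resolutions and fuses them.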
ISBN (digital): 9798350355413
ISBN (print): 9798350355420
The detection of key components in transmission lines faces challenges such as significant variations in object scales, complex backgrounds, and difficulties in detecting small targets, leading to low detection accuracy and missed detections. In this study, we propose an improved YOLOv8 algorithm for detecting key components of transmission lines. First, we incorporate Deformable Convolution (DCNv3) to improve the backbone network's feature extraction capability and mitigate accuracy degradation caused by occlusion and angle variations. Subsequently, we use Adaptively Spatial Feature Fusion (ASFF) to progressively fuse features of different scales and add same-layer skip connections, enabling efficient feature fusion and strengthening the model's ability to detect small objects. Finally, we replace the original CIoU with RIoU to further boost the detection capability for small objects. The modified algorithm achieves an mAP@50 of 96.5% and an mAP@50:95 of 79.2%, representing improvements of 3.1% and 4.8%, respectively, over the original YOLOv8 model.
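The abstract does not define the RIoU variant it substitutes for CIoU, so as a baseline reference only, the plain IoU that all of these box-regression losses build on is:

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection corners
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Two unit-overlap boxes: intersection 1, union 4 + 4 - 1 = 7.
assert abs(iou((0, 0, 2, 2), (1, 1, 3, 3)) - 1 / 7) < 1e-12
```

Variants such as CIoU add penalty terms (center distance, aspect ratio) on top of this quantity; for small objects, tiny absolute localization errors cause large IoU drops, which is why the choice of variant matters here.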
ISBN (digital): 9798331506797
ISBN (print): 9798331506803
Indoor mobile robots require reliable solutions for mapping, localization, and navigation tasks. This paper presents a mobile robot system that performs mapping using visual SLAM with ORB-SLAM3 and LiDAR-based mapping with Gmapping for indoor perception and navigation, organized in a modular ROS architecture. The framework also implements autonomous navigation and an autonomous Rapidly-exploring Random Trees (RRT) exploration module with adaptive reset mechanisms. The system operates on a ROS-based platform equipped with an Intel RealSense D435i RGB-D camera and an LDS-02 LiDAR scanner. Experimental validation in simulated indoor environments demonstrates the system's capabilities: the presented architecture achieves computational efficiency through selective utilization of mapping modules while maintaining mapping accuracy through complementary sensor modalities.
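Collision checking, frontier selection, and the paper's adaptive reset mechanism are omitted here, but the core RRT extension step the exploration module builds on can be sketched as:

```python
import math
import random

def rrt_extend(tree, sample, step):
    """One RRT iteration: find the tree node nearest the random
    sample, move at most `step` toward it, and add the new node
    with its parent recorded."""
    nearest = min(tree, key=lambda n: math.dist(n, sample))
    d = math.dist(nearest, sample)
    if d < 1e-9:
        return nearest
    t = min(1.0, step / d)
    new = (nearest[0] + t * (sample[0] - nearest[0]),
           nearest[1] + t * (sample[1] - nearest[1]))
    tree[new] = nearest        # parent pointer for path extraction
    return new

random.seed(0)
tree = {(0.0, 0.0): None}      # root node has no parent
for _ in range(200):           # grow the tree into a 10 m x 10 m area
    sample = (random.uniform(0, 10), random.uniform(0, 10))
    rrt_extend(tree, sample, step=0.5)
```

Because uniform sampling biases growth toward unexplored space, repeatedly extending the tree doubles as an exploration strategy; a reset mechanism like the paper's would reinitialize the tree when growth stalls.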