Banding artifacts in images stem from limitations in color bit depth, image compression, or over-editing, significantly degrading image quality, especially in regions with smooth gradients. Image debanding aims to eliminate these artifacts while preserving the authenticity of image details. This paper introduces a novel approach to image debanding using a cross-scale invertible neural network (INN). The proposed INN is information-lossless and enhanced by a more effective cross-scale scheme. Additionally, we present a technique called banded deformable convolution, which fully leverages the anisotropic properties of banding artifacts. This technique is more compact and efficient, and exhibits better generalization, compared to existing deformable convolution methods. Our proposed INN exhibits superior performance in both quantitative metrics and visual quality, as evidenced by the experimental results.
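The abstract does not detail the INN's architecture, so as a generic illustration of the information-lossless property it relies on, here is a minimal affine coupling layer (the standard building block of invertible networks), sketched in NumPy with toy stand-ins for the learned conditioning networks:

```python
import numpy as np

def coupling_forward(x1, x2, scale_net, shift_net):
    """Affine coupling: y1 passes through unchanged; y2 is
    transformed conditioned only on x1, so the map is invertible."""
    y1 = x1
    y2 = x2 * np.exp(scale_net(x1)) + shift_net(x1)
    return y1, y2

def coupling_inverse(y1, y2, scale_net, shift_net):
    """Exact inverse: x2 is recovered with no information loss."""
    x1 = y1
    x2 = (y2 - shift_net(y1)) * np.exp(-scale_net(y1))
    return x1, x2

# Toy conditioning functions (stand-ins for learned sub-networks).
scale = lambda x: 0.1 * np.tanh(x)
shift = lambda x: 0.5 * x

x1 = np.random.randn(4)
x2 = np.random.randn(4)
y1, y2 = coupling_forward(x1, x2, scale, shift)
rx1, rx2 = coupling_inverse(y1, y2, scale, shift)
assert np.allclose(x1, rx1) and np.allclose(x2, rx2)
```

The round trip is exact up to floating-point error regardless of what the conditioning functions compute, which is the property that lets an INN-based restorer avoid discarding image detail.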
Language-conditioned robotic manipulation in unstructured environments presents significant challenges for intelligent robotic systems. However, due to partial observation or imprecise action prediction, failure may b...
ISBN (digital): 9798350355413
ISBN (print): 9798350355420
Simultaneous Localization and Mapping (SLAM) enables robots to perform localization and mapping in unknown environments. Currently, mainstream single-sensor SLAM (Lidar SLAM or Visual SLAM) tends to diverge and fail when faced with unstructured or textureless scenes. To address this issue, this paper proposes a robust tightly-coupled Lidar-Vision-Inertial Odometry framework (LRI-LVIO) that achieves high-precision and robust SLAM. LRI-LVIO consists of two subsystems: a visual-inertial system and a lidar-inertial system. These two subsystems are tightly coupled through an error-state iterative Kalman filter, allowing one subsystem to maintain stable operation even if the other fails, thus enhancing the robustness of SLAM in textureless and featureless environments. Additionally, the depth of visual feature points is recovered using lidar point cloud information, improving the algorithm's efficiency and accuracy. Finally, LRI-LVIO incorporates loop closure detection based on lidar point cloud keyframes, further reducing cumulative errors and enhancing localization precision. Experiments conducted in various indoor and outdoor environments show that by combining the advantages of each sensor, LRI-LVIO achieves higher accuracy and stronger robustness compared to single-sensor SLAM.
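The error-state Kalman filter that couples the two subsystems can be shown schematically. The sketch below is a single (non-iterated) error-state update in NumPy on a hypothetical 1-D position state, not the paper's full lidar/visual/inertial state vector:

```python
import numpy as np

def eskf_update(x, P, z, h, H, R):
    """One error-state Kalman update: estimate the error state
    from the measurement innovation, then inject it into the
    nominal state and shrink the covariance accordingly."""
    y = z - h(x)                       # innovation
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    dx = K @ y                         # estimated error state
    x_new = x + dx                     # inject correction
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new

# Toy example: nominal position drifted to 0, a sensor reads 1.
x = np.array([0.0])
P = np.array([[1.0]])       # prior uncertainty
H = np.array([[1.0]])       # direct position measurement
R = np.array([[0.5]])       # measurement noise
x, P = eskf_update(x, P, np.array([1.0]), lambda s: H @ s, H, R)
```

After the update the state moves partway toward the measurement (weighted by the relative uncertainties) and the covariance shrinks; in the paper's framework either the lidar or the visual residual can feed this update, which is what keeps one subsystem running when the other degenerates.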
Talking face generation is a promising approach within various domains, such as digital assistants, video editing, and virtual video conferences. Previous works with audio-driven talking faces focused primarily on the synchronization between audio and video. However, existing methods still have certain limitations in synthesizing photo-realistic video with high identity preservation, audiovisual synchronization, and facial details like blink movements. To solve these problems, a novel talking face generation framework, termed video portraits transformer (VPT) with controllable blink movements, is proposed and applied. It separates the process of video generation into two stages, i.e., audio-to-landmark and landmark-to-face stages. In the audio-to-landmark stage, the transformer encoder serves as the generator used for predicting whole facial landmarks from given audio and a continuous eye aspect ratio (EAR). During the landmark-to-face stage, the video-to-video (vid-to-vid) network is employed to transfer landmarks into realistic talking face videos. Moreover, to imitate real blink movements during inference, a transformer-based spontaneous blink generation module is devised to generate the EAR sequence. Extensive experiments demonstrate that the VPT method can produce photo-realistic videos of talking faces with natural blink movements, and the spontaneous blink generation module can generate blink movements close to the real blink duration distribution and frequency.
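The eye aspect ratio (EAR) that drives the blink module is a standard landmark-based measure: the ratio of the eye's vertical landmark distances to its horizontal extent, which drops toward zero during a blink. A minimal sketch with hypothetical 2-D landmark coordinates:

```python
import numpy as np

def eye_aspect_ratio(eye):
    """EAR from six eye landmarks p1..p6 (outer corner, upper lid
    x2, inner corner, lower lid x2):
    (|p2-p6| + |p3-p5|) / (2 * |p1-p4|)."""
    v1 = np.linalg.norm(eye[1] - eye[5])   # first vertical distance
    v2 = np.linalg.norm(eye[2] - eye[4])   # second vertical distance
    h = np.linalg.norm(eye[0] - eye[3])    # horizontal distance
    return (v1 + v2) / (2.0 * h)

# Illustrative landmark sets (not from a real detector).
open_eye = np.array([[0, 0], [1, 1], [2, 1], [3, 0], [2, -1], [1, -1]], float)
closed_eye = np.array([[0, 0], [1, .1], [2, .1], [3, 0], [2, -.1], [1, -.1]], float)
assert eye_aspect_ratio(open_eye) > eye_aspect_ratio(closed_eye)
```

Feeding a time sequence of such values to the landmark generator is what gives the framework explicit, controllable blinks, and the spontaneous blink module's job is to synthesize realistic EAR sequences.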
ISBN (digital): 9798350389807
ISBN (print): 9798350389814
Effective capture of multi-scale features is crucial for improving performance in 3D point cloud semantic segmentation tasks. This paper introduces a novel framework that enhances the extraction of semantic information from complex objects in 3D point clouds using multi-resolution techniques. By utilizing varying voxel resolutions and convolutional kernel sizes, we integrate high-resolution voxels to capture fine details and low-resolution voxels to extract global features, achieving robust feature fusion. Experimental results on the ScanNet v2 dataset demonstrate the effectiveness of our proposed network, which particularly excels in semantic segmentation of small objects and complex scenes. This study highlights the significance of multi-resolution strategies in 3D scene understanding, providing new insights for future research in the field.
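The multi-resolution idea can be sketched with a toy voxel-occupancy example; the voxel sizes below are illustrative, not the paper's settings:

```python
import numpy as np

def voxelize(points, voxel_size):
    """Assign each point to a voxel index at the given resolution
    and return the set of occupied voxels."""
    idx = np.floor(points / voxel_size).astype(int)
    return {tuple(i) for i in idx}

np.random.seed(0)
points = np.random.rand(1000, 3)   # toy point cloud in the unit cube
fine = voxelize(points, 0.05)      # high resolution: fine local detail
coarse = voxelize(points, 0.25)    # low resolution: global context
assert len(fine) > len(coarse)
```

The fine grid preserves many small occupied cells (local geometry), while the coarse grid collapses the same points into a few cells (scene-level context); the paper's network extracts features at both resolutions and fuses them.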
ISBN (digital): 9798350355413
ISBN (print): 9798350355420
The detection of key components in transmission lines faces challenges such as significant variations in object scales, complex backgrounds, and difficulties in detecting small targets, leading to low detection accuracy and missed detections. In this study, we propose an improved YOLOv8 algorithm for detecting key components of transmission lines. First, we incorporate Deformable Convolution (DCNv3) to improve the backbone network's feature extraction capability and mitigate accuracy degradation caused by occlusion and angle variations. Subsequently, we use Adaptively Spatial Feature Fusion (ASFF) to progressively fuse features of different scales and add same-layer skip connections, enabling efficient feature fusion and strengthening the model's ability to detect small objects. Finally, we replace the original CIoU with RIoU to further boost the detection capability for small objects. The modified algorithm achieves an mAP@50 of 96.5% and an mAP@50:95 of 79.2%, representing improvements of 3.1% and 4.8%, respectively, over the original YOLOv8 model.
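The abstract does not define the RIoU variant it substitutes for CIoU, so as a baseline reference only, the plain IoU that all of these box-regression losses build on is:

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection corners
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Two unit-overlap boxes: intersection 1, union 4 + 4 - 1 = 7.
assert abs(iou((0, 0, 2, 2), (1, 1, 3, 3)) - 1 / 7) < 1e-12
```

Variants such as CIoU add penalty terms (center distance, aspect ratio) on top of this quantity; for small objects, tiny absolute localization errors cause large IoU drops, which is why the choice of variant matters here.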
ISBN (digital): 9798331506797
ISBN (print): 9798331506803
Indoor mobile robots require reliable solutions for mapping, localization, and navigation tasks. This paper presents a mobile robot system that performs mapping using visual SLAM with ORB-SLAM3 and LiDAR-based mapping with Gmapping for indoor perception and navigation, organized in a modular ROS architecture. The framework also implements autonomous navigation and an autonomous Rapidly-exploring Random Trees (RRT) exploration module with adaptive reset mechanisms. The system operates on a ROS-based platform equipped with an Intel RealSense D435i RGB-D camera and an LDS-02 LiDAR scanner. Experimental validation in simulated indoor environments demonstrates the system's capabilities: the presented architecture achieves computational efficiency through selective utilization of mapping modules while maintaining mapping accuracy through complementary sensor modalities.
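Collision checking, frontier selection, and the paper's adaptive reset mechanism are omitted here, but the core RRT extension step the exploration module builds on can be sketched as:

```python
import math
import random

def rrt_extend(tree, sample, step):
    """One RRT iteration: find the tree node nearest the random
    sample, move at most `step` toward it, and add the new node
    with its parent recorded."""
    nearest = min(tree, key=lambda n: math.dist(n, sample))
    d = math.dist(nearest, sample)
    if d < 1e-9:
        return nearest
    t = min(1.0, step / d)
    new = (nearest[0] + t * (sample[0] - nearest[0]),
           nearest[1] + t * (sample[1] - nearest[1]))
    tree[new] = nearest        # parent pointer for path extraction
    return new

random.seed(0)
tree = {(0.0, 0.0): None}      # root node has no parent
for _ in range(200):           # grow the tree into a 10 m x 10 m area
    sample = (random.uniform(0, 10), random.uniform(0, 10))
    rrt_extend(tree, sample, step=0.5)
```

Because uniform sampling biases growth toward unexplored space, repeatedly extending the tree doubles as an exploration strategy; a reset mechanism like the paper's would reinitialize the tree when growth stalls.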