ISBN: (print) 9798350377040; 9798350377033
We propose a computer vision framework that integrates multi-scale signal processing with an enhanced ConvNeXt-YOLO architecture for robust object detection. The framework addresses three challenges in visual recognition: multi-scale feature representation, signal quality enhancement, and model generalization. For image preprocessing, it implements a two-stage signal-processing pipeline. First, an adaptive resolution normalization algorithm maintains consistent feature quality across varying input dimensions. Second, a context-aware Gaussian filtering mechanism optimizes the signal-to-noise ratio while preserving essential feature characteristics. These preprocessing techniques enhance the framework's ability to extract discriminative features and maintain computational stability. To optimize the learning process, we introduce a systematic data augmentation strategy that incorporates both geometric and signal-level transformations. The approach combines predetermined rotation sampling (90, 180, and 270 degrees) with continuous-space ROI augmentation during inference. This hybrid strategy enables the framework to achieve rotation invariance and better generalization, which is particularly beneficial in complex object detection scenarios. The core innovation is the architectural integration of ConvNeXt with YOLO. We redesign the feature extraction backbone using hierarchical ConvNeXt blocks, enabling efficient multi-scale feature learning. The cross-branch information fusion mechanism, coupled with the signal-aware design, substantially improves the model's representational capacity. Experimental results on standard computer vision benchmarks demonstrate superior performance, achieving state-of-the-art accuracy (improvement of X%) and recall rates (improvement of Y%) compared to conventional approaches.
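The predetermined rotation sampling described above can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption made for illustration: the function names (`rot90_ccw`, `rotate_sample`, `rotation_variants`), the counter-clockwise rotation direction, and the `[x1, y1, x2, y2]` pixel-edge box format are hypothetical and not taken from the paper.

```python
def rot90_ccw(image):
    """Rotate a 2-D image (a list of rows) by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*image)][::-1]


def rotate_sample(image, boxes, k):
    """Rotate an image by k * 90 degrees (CCW) and remap its bounding boxes.

    boxes: iterable of (x1, y1, x2, y2) in pixel-edge coordinates,
    with x along columns and y along rows (assumed format).
    """
    boxes = [tuple(b) for b in boxes]
    for _ in range(k % 4):
        width = len(image[0])  # image width before this 90-degree step
        image = rot90_ccw(image)
        # A point (x, y) maps to (y, width - x); the corners are reordered
        # so that x1 <= x2 and y1 <= y2 still hold after the rotation.
        boxes = [(y1, width - x2, y2, width - x1) for x1, y1, x2, y2 in boxes]
    return image, boxes


def rotation_variants(image, boxes):
    """Return the 90, 180 and 270 degree variants keyed by angle."""
    return {90 * k: rotate_sample(image, boxes, k) for k in (1, 2, 3)}
```

Restricting the sampled angles to multiples of 90 degrees keeps the remapped boxes axis-aligned and requires no pixel interpolation, which is one plausible reason for choosing a fixed set of rotations.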
In virtual reality, there is a growing demand for reconstructing accurate full-body 3D avatars from the sparse motion captured through head-mounted displays and hand-held controllers. However, because sparse inputs carry limited information, precisely reconstructing full-body poses is an ill-posed and challenging task. Existing methods often exhibit notable errors in lower-body poses, which result in unrealistic poses and occasional floor penetration artifacts. To address this issue, an MLP-based model with Kinematic Constraints and Temporal Diversity (KCTD) is proposed for full-body pose reconstruction. It combines a Kinematic Constraints Hierarchical Decoder, a Temporal Diversity Awareness Module, and a Generative Feedback Module to improve the accuracy of full-body pose reconstruction. Specifically, the latent constraints of the human kinematic chain are incorporated into the model through a hierarchical decoder, which raises overall precision through the interaction of joints along the kinematic chain. A temporal diversity awareness module is then integrated into the hierarchical decoder to help the model capture information at different frequencies in the time domain. In addition, a generative feedback module is applied to leg pose reconstruction to further improve its accuracy without increasing the model's inference time. Test results on the AMASS dataset demonstrate that the proposed model effectively improves the reconstruction accuracy of full-body poses, with a mean per joint rotation error of 2.60 and a mean per joint position error of 3.62, surpassing state-of-the-art methods. In particular, the proposed model alleviates implausible lower-body poses and reduces floor penetration artifacts.
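The two reported metrics can be sketched as follows. The abstract does not define them or state their units, so this is only an illustration under assumptions: position error is taken as the mean Euclidean distance between predicted and ground-truth joints, and rotation error as the mean geodesic angle between 3x3 rotation matrices. The paper may use a different rotation-error definition, and all function names are hypothetical.

```python
import math


def mpjpe(pred, gt):
    """Mean per joint position error.

    pred, gt: sequences of frames, each frame a sequence of (x, y, z) joints.
    The result is in the same unit as the input coordinates.
    """
    dists = [math.dist(p, g)
             for pred_frame, gt_frame in zip(pred, gt)
             for p, g in zip(pred_frame, gt_frame)]
    return sum(dists) / len(dists)


def geodesic_angle_deg(r_pred, r_gt):
    """Angle in degrees of the relative rotation between two 3x3 matrices."""
    # trace(R_pred^T R_gt) equals the sum of element-wise products.
    trace = sum(r_pred[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def mean_rotation_error_deg(pred, gt):
    """Mean geodesic rotation error over all frames and joints (assumed form)."""
    angles = [geodesic_angle_deg(p, g)
              for pred_frame, gt_frame in zip(pred, gt)
              for p, g in zip(pred_frame, gt_frame)]
    return sum(angles) / len(angles)
```

Both functions average over every joint in every frame, so restricting the joint list to the legs would give a lower-body error of the kind the abstract highlights.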