Search Results - Inner Mongolia University Library

7th International Conference on Automation Electronics and Electrical Engineering

Authors: Jiang, Bojun; Lin, Zekai; Li, Zhuoyuan; Jia, Ce; Lu, Haiyang

Affiliations: Northeastern Univ, Shenyang, Peoples R China; Liaoning Tech Univ, Coll Min, Fuxin, Peoples R China; Liaoning Geol Engn Vocat Coll, Dandong, Peoples R China

ISBN: (print) 9798350377040; 9798350377033

We propose a comprehensive computer vision framework that integrates multi-scale signal processing with an enhanced ConvNeXt-YOLO architecture for robust object detection. Our framework addresses three critical challenges in visual recognition: multi-scale feature representation, signal quality enhancement, and model generalization. For image preprocessing, the framework implements a multi-stage signal processing pipeline. First, we develop an adaptive resolution normalization algorithm that maintains consistent feature quality across varying input dimensions. Second, we design a context-aware Gaussian filtering mechanism that optimizes the signal-to-noise ratio while preserving essential feature characteristics. These preprocessing techniques significantly enhance the framework's ability to extract discriminative features and maintain computational stability. To optimize the learning process, we introduce a systematic data augmentation strategy incorporating both geometric and signal-level transformations. Our approach combines predetermined rotation sampling (90°, 180°, 270°) with continuous-space ROI augmentation during inference. This hybrid strategy gives the framework rotation invariance and improved generalization, which is particularly beneficial in complex object detection scenarios. The core innovation lies in our architectural integration of ConvNeXt with YOLO. We redesign the feature extraction backbone using hierarchical ConvNeXt blocks, enabling efficient multi-scale feature learning. The cross-branch information fusion mechanism, coupled with our signal-aware design, substantially improves the model's representational capacity. Experimental results on standard computer vision benchmarks demonstrate superior performance, achieving state-of-the-art accuracy (improvement of X%) and recall rates (improvement of Y%) compared to conventional approaches.
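
The abstract does not say how bounding boxes are updated under its fixed 90°, 180° and 270° rotations. The Python sketch below shows one way to do it, assuming NumPy, axis-aligned boxes given as [x_min, y_min, x_max, y_max] in pixel coordinates (x along columns, y along rows), and the counterclockwise convention of np.rot90. The names rot90_with_boxes, rotation_augment and ANGLES are hypothetical; this is not the authors' code, and it covers only the predetermined-angle sampling, not the continuous-space ROI augmentation.

```python
import numpy as np

# Fixed rotation angles named in the abstract (degrees; counterclockwise here,
# following the np.rot90 convention, which is an assumption of this sketch).
ANGLES = (90, 180, 270)


def rot90_with_boxes(image, boxes, k):
    """Hypothetical helper: rotate `image` by k * 90 degrees counterclockwise
    and remap axis-aligned boxes so they still enclose the same pixels.

    image: array of shape (H, W) or (H, W, C).
    boxes: array of shape (N, 4) as [x_min, y_min, x_max, y_max] in pixels.
    """
    k = k % 4
    out = np.asarray(boxes, dtype=float).reshape(-1, 4)
    h, w = image.shape[:2]
    for _ in range(k):
        x1, y1, x2, y2 = out.T
        # One counterclockwise quarter turn maps a point (x, y) to (y, w - x):
        # old y-extents become x-extents, old x-extents flip into y-extents.
        out = np.stack([y1, w - x2, y2, w - x1], axis=1)
        h, w = w, h  # height and width swap after every quarter turn
    return np.rot90(image, k=k, axes=(0, 1)), out


def rotation_augment(image, boxes):
    """Hypothetical generator: yield (angle, image, boxes) for the original
    sample followed by each fixed rotation in ANGLES."""
    yield 0, image, np.asarray(boxes, dtype=float).reshape(-1, 4)
    for angle in ANGLES:
        rotated_image, rotated_boxes = rot90_with_boxes(image, boxes, angle // 90)
        yield angle, rotated_image, rotated_boxes


if __name__ == "__main__":
    image = np.zeros((4, 6))
    image[0:2, 1:3] = 1.0  # object occupies rows 0-1 and columns 1-2
    boxes = np.array([[1, 0, 3, 2]])
    for angle, img, bxs in rotation_augment(image, boxes):
        print(angle, img.shape, bxs[0])
```

Restricting rotations to multiples of 90° keeps every rotated box axis-aligned and exact, with no pixel interpolation and no loosening of the box; arbitrary angles would need both.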

Keywords: Computer Vision Framework; multi-scale signal processing; Deep Learning Architecture; ConvNeXt-YOLO Integration; Feature Fusion; Advanced Object Detection; Semi-supervised Learning; Object Summary

Enhancing Motion Reconstruction From Sparse Tracking Inputs With Kinematic Constraints

IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2025, Vol. 22, pp. 5029-5037

Authors: Dai, Xiaokun; Zhang, Xinkang; Li, Shiman; Chen, Xinrong

Affiliations: Fudan Univ, Acad Engn & Technol, Shanghai 200433, Peoples R China; Fudan Univ, Yiwu Res Inst, Yiwu 322000, Peoples R China; Fudan Univ, Sch Basic Med Sci, Shanghai 200433, Peoples R China; Shanghai Key Lab Med Image Comp & Comp Assisted In, Shanghai 200032, Peoples R China

In virtual reality, there is a growing demand for reconstructing accurate full-body 3D avatars from the sparse motion signals captured by head-mounted displays and hand-held controllers. However, because sparse inputs carry limited information, precisely reconstructing full-body poses is an ill-posed and challenging task. Existing methods often exhibit notable errors in lower-body poses, which result in unrealistic poses and occasional floor-penetration artifacts. To address this issue, an MLP-based model with Kinematic Constraints and Temporal Diversity (KCTD) is proposed for full-body pose reconstruction. It combines a Kinematic Constraints Hierarchical Decoder equipped with a Temporal Diversity Awareness Module, together with a Generative Feedback Module, to further improve the accuracy of full-body pose reconstruction. Specifically, the implicit constraints of the human kinematic chain are incorporated into the model through the hierarchical decoder, which raises overall precision through interactions along the kinematic chain. The temporal diversity awareness module is integrated into the hierarchical decoder to help the model capture information at different frequencies in the time domain. In addition, the generative feedback module is applied to leg pose reconstruction to further improve its accuracy without increasing the model's inference time. Test results on the AMASS dataset demonstrate that the proposed model effectively improves full-body reconstruction accuracy, with a mean per joint rotation error of 2.60 and a mean per joint position error of 3.62, surpassing state-of-the-art methods. In particular, the proposed model can alleviate implausible lower-body poses and reduce floor-penetration artifacts.
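
The abstract reports a mean per joint rotation error of 2.60 and a mean per joint position error of 3.62 on AMASS without stating units. The Python sketch below shows how such metrics are commonly computed, assuming NumPy arrays of joint positions and joint rotation matrices with matching shapes. The function names mpjpe and mean_rotation_error_deg are hypothetical, and the rotation metric uses the geodesic angle between rotation matrices, which is an assumed definition; the evaluation protocol behind the reported numbers may define the rotation error differently.

```python
import numpy as np


def mpjpe(pred, target):
    """Mean per joint position error (hypothetical helper name).

    pred, target: arrays of shape (T, J, 3) with joint positions for T frames
    and J joints in the same length unit; the result is in that unit.
    """
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    # Euclidean distance for every (frame, joint) pair, averaged over both axes.
    return float(np.linalg.norm(pred - target, axis=-1).mean())


def mean_rotation_error_deg(pred_rot, target_rot):
    """Mean geodesic angle, in degrees, between predicted and ground-truth
    joint rotations (an assumed definition, not necessarily the paper's).

    pred_rot, target_rot: arrays of shape (T, J, 3, 3) of rotation matrices.
    """
    pred_rot = np.asarray(pred_rot, float)
    target_rot = np.asarray(target_rot, float)
    # Relative rotation R_pred^T @ R_target for every (frame, joint) pair.
    rel = np.einsum("...ji,...jk->...ik", pred_rot, target_rot)
    # For a rotation by angle theta, trace(R) = 1 + 2 * cos(theta).
    cos_theta = (np.einsum("...ii->...", rel) - 1.0) / 2.0
    # Clipping guards against values slightly outside [-1, 1] from rounding.
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    return float(np.degrees(theta).mean())
```

Both metrics average over every frame and joint, so they summarize overall accuracy rather than isolating the lower-body errors the abstract singles out.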

Keywords: Kinematics; Accuracy; Solid modeling; Image reconstruction; Decoding; Task analysis; Predictive models; Virtual reality; pose reconstruction; human kinematic prior; multi-scale signal processing
