In recent years, the automobile industry has achieved astonishing success in making autonomous cars safer, more affordable, and more reliable. However, current autonomous driving technology is mainly based on reactive controllers that attempt to respond to the various events the car encounters. Yet, achieving a truly safe and reliable autonomous system necessitates anticipating such events and planning the correct actions in advance to avoid undesirable behavior. Recent advances in deep learning have shown remarkable performance in predicting future frames from video sequences. However, most of these approaches can only handle a few moving elements in the scene and perform poorly when the camera is in motion. This is mainly due to the difficulty of disentangling camera intrinsic motion from object-dependent motion. In this work, we equip autonomous cars with an object-oriented next-frame predictor that leverages a Transformer architecture to extract, for each moving object in the scene, a spatial transformation applied to the object to predict its configuration in the next frame. Static elements of the scene are then used to estimate camera intrinsic motion, which is applied to the background to predict how it will be viewed in the next frame. Notably, our approach significantly reduces the complexity typically associated with such models by requiring the estimation of only 14 parameters per moving object, independent of image resolution. We have validated the generalization capabilities of our model through training on simulated datasets and testing on real-world datasets. The results indicate that our model not only outperforms existing models trained solely on real data but also exhibits superior resilience to occlusions and incomplete data in the input sequences. These findings underscore the potential of our model to significantly improve the predictive analytics capabilities of autonomous driving systems, thereby enhancing their safety and reliability in dynamic environments.
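The per-object transformation idea can be pictured with a short sketch. The PyTorch snippet below (an illustration, not the authors' code) uses a Transformer encoder to produce 14 parameters for each object token and applies the first six as a 2D affine warp to the object's patch; the split of the 14 parameters into affine and auxiliary terms is an assumption made for this example.

```python
# Hedged sketch: per-object motion parameters from a Transformer encoder.
# The 6-affine + 8-auxiliary split of the 14 parameters is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ObjectMotionHead(nn.Module):
    def __init__(self, d_model=256, n_params=14):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.to_params = nn.Linear(d_model, n_params)

    def forward(self, object_tokens):
        # object_tokens: (B, N_objects, d_model) features of the moving objects
        return self.to_params(self.encoder(object_tokens))  # (B, N_objects, 14)

def warp_object(crop, params):
    # crop: (N, C, H, W) object patches; first 6 params form a 2x3 affine matrix
    theta = params[:, :6].view(-1, 2, 3)
    grid = F.affine_grid(theta, crop.shape, align_corners=False)
    return F.grid_sample(crop, grid, align_corners=False)

tokens = torch.randn(2, 5, 256)               # 2 sequences, 5 objects each
params = ObjectMotionHead()(tokens)           # (2, 5, 14)
patches = torch.randn(10, 3, 64, 64)          # one 64x64 crop per object
next_patches = warp_object(patches, params.view(-1, 14))
```

Note that the parameter count is independent of image resolution, as the abstract states: the warp is simply resampled onto whatever patch size the object occupies.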
ISBN:
(Print) 9789819785070; 9789819785087
Reconstructing 3D models from single-view images is a longstanding problem in computer vision. The latest advances in single-image 3D reconstruction extract a textual description from the input image and further utilize it to synthesize 3D models. However, existing methods focus on capturing a single key attribute of the image (e.g., object type, artistic style) and fail to consider the multi-perspective information required for accurate 3D reconstruction, such as object shape and material properties. Besides, the reliance on Neural Radiance Fields hinders their ability to reconstruct intricate surfaces and texture details. In this work, we propose MTFusion, which leverages both image data and textual descriptions for high-fidelity 3D reconstruction. Our approach consists of two stages. First, we adopt a novel multi-word textual inversion technique to extract a detailed text description capturing the image's characteristics. Then, we use this description and the image to generate a 3D model with FlexiCubes. Additionally, MTFusion enhances FlexiCubes by employing a special decoder network for Signed Distance Functions, leading to faster training and finer surface representation. Extensive evaluations demonstrate that MTFusion surpasses existing image-to-3D methods on a wide range of synthetic and real-world images. Furthermore, an ablation study proves the effectiveness of our network designs.
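The special SDF decoder can be pictured as a small MLP that maps grid-vertex positions (plus a latent feature) to a signed-distance value and a vertex deformation, which FlexiCubes then turns into a mesh. A minimal sketch, with all names and layer sizes assumed for illustration:

```python
# Hedged sketch of an SDF decoder in the spirit of MTFusion's FlexiCubes
# enhancement; dimensions and architecture are assumptions, not the paper's.
import torch
import torch.nn as nn

class SDFDecoder(nn.Module):
    def __init__(self, latent_dim=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1 + 3),  # 1 SDF value + a 3D vertex deformation
        )

    def forward(self, xyz, latent):
        # xyz: (N, 3) grid-vertex positions; latent: (N, latent_dim) features
        out = self.net(torch.cat([xyz, latent], dim=-1))
        sdf, deform = out[:, :1], out[:, 1:]
        return sdf, torch.tanh(deform) * 0.5  # keep deformation inside a cell

verts = torch.rand(1024, 3) * 2 - 1           # vertices of a [-1, 1]^3 grid
sdf, deform = SDFDecoder()(verts, torch.randn(1024, 64))
```

A direct decoder of this kind is cheap to evaluate at every grid vertex, which is consistent with the faster training the abstract reports.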
With the rapid development of digital technology and deep learning, recovering 3D scene information and reconstructing human bodies from a single image have become a focal point of research in computer vision and compu...
ISBN:
(Print) 9789819785070; 9789819785087
Artificial Intelligence Generated Content (AIGC) has experienced significant advancements, particularly in the areas of natural language processing and 2D image generation. However, the generation of three-dimensional (3D) content from a single image still poses challenges, particularly when the input image contains complex backgrounds. This limitation hinders the potential applications of AIGC in areas such as human-machine interaction, virtual reality (VR), and architectural design. Despite the progress made so far, existing methods face difficulties when dealing with single images that have intricate backgrounds: their reconstructed 3D shapes tend to be incomplete, noisy, or missing partial geometric structures. In this paper, we introduce a 3D generation framework for indoor scenes that produces realistic and visually pleasing 3D geometry from a single image, without requiring point clouds, multi-view images, depth, or masks as input. The main idea of our method is clustering-based 3D shape learning and prediction, followed by a shape deformation. Since indoor scenes typically contain more than one object, our framework simultaneously generates multiple objects and predicts the scene layout with a camera pose, as well as 3D object bounding boxes, for holistic 3D scene understanding. We have evaluated the proposed framework on benchmark datasets including ShapeNet, SUN RGB-D, and Pix3D, and state-of-the-art performance has been achieved. We also give examples illustrating immediate applications in virtual reality.
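The clustering-based shape learning step can be illustrated with a small sketch: embeddings of training shapes are grouped into prototypes, an image-predicted embedding is snapped to its nearest prototype, and a deformation stage then refines the result. The k-means routine and all dimensions below are assumptions for illustration.

```python
# Hedged sketch of clustering-based shape prediction; not the paper's code.
import torch

def kmeans(codes, k=8, iters=20):
    # codes: (N, D) shape embeddings -> (k, D) cluster prototypes
    centers = codes[torch.randperm(codes.size(0))[:k]].clone()
    for _ in range(iters):
        assign = torch.cdist(codes, centers).argmin(dim=1)   # (N,)
        for j in range(k):
            members = codes[assign == j]
            if members.numel() > 0:
                centers[j] = members.mean(dim=0)
    return centers

codes = torch.randn(500, 32)                  # embeddings of training shapes
prototypes = kmeans(codes)                    # coarse per-cluster templates
query = torch.randn(1, 32)                    # embedding predicted from image
nearest = prototypes[torch.cdist(query, prototypes).argmin(dim=1)]
# a deformation network would then warp the prototype shape toward the image;
# per-object boxes and a camera pose place each deformed shape in the layout
```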
Photoacoustic imaging (PAI) offers significant advantages but faces challenges in data processing and reconstruction. Sparse reconstruction techniques and compressed sensing theory have advanced its development. Regul...
ISBN:
(Print) 9783031776090
The proceedings contain 23 papers. The special focus in this conference is on Skin Imaging Collaboration, Interpretability of Machine Intelligence in Medical Image Computing, the Embodied AI and Robotics for HealTHcare Workshop, and the MICCAI Workshop on Distributed, Collaborative and Federated Learning. The topics include: DeCaF 2024 Preface; I2M2Net: Inter/Intra-modal Feature Masking Self-distillation for Incomplete Multimodal Skin Lesion Diagnosis; From Majority to Minority: A Diffusion-Based Augmentation for Underrepresented Groups in Skin Lesion Analysis; Segmentation Style Discovery: Application to Skin Lesion Images; A Vision Transformer with Adaptive Cross-Image and Cross-Resolution Attention; Lesion Elevation Prediction from Skin Images Improves Diagnosis; DWARF: Disease-Weighted Network for Attention Map Refinement; PIPNet3D: Interpretable Detection of Alzheimer in MRI Scans; Detecting Unforeseen Data Properties with Diffusion Autoencoder Embeddings Using Spine MRI Data; Interpretability of Uncertainty: Exploring Cortical Lesion Segmentation in Multiple Sclerosis; TextCAVs: Debugging Vision Models Using Text; Evaluating Visual Explanations of Attention Maps for Transformer-Based Medical Imaging; Exploiting XAI Maps to Improve MS Lesion Segmentation and Detection in MRI; EndoGS: Deformable Endoscopic Tissues Reconstruction with Gaussian Splatting; VISAGE: Video Synthesis Using Action Graphs for Surgery; A Review of 3D Reconstruction Techniques for Deformable Tissues in Robotic Surgery; SurgTrack: CAD-Free 3D Tracking of Real-World Surgical Instruments; MUTUAL: Towards Holistic Sensing and Inference in the Operating Room; Complex-Valued Federated Learning with Differential Privacy and MRI Applications; Enhancing Privacy in Federated Learning: Secure Aggregation for Real-World Healthcare Applications; Federated Impression for Learning with Distributed Heterogeneous Data; A Federated Learning-Friendly Approach for Parameter-Efficient Fine-Tuning of SAM in 3D Segmentation; Probing the Effic...
ISBN:
(Print) 9789819785070; 9789819785087
This work mainly addresses the challenges in 3D human pose and shape estimation from real partial point clouds. Existing 3D human estimation methods from point clouds usually have limited generalization ability on real data due to factors such as self-occlusion, random noise, and the domain gap between real and synthetic data. In this paper, we propose a pose-aware auto-augmentation framework for 3D human pose and shape estimation from partial point clouds. Specifically, we design an occlusion-aware module for the estimator network that obtains refined features to accurately regress human pose and shape parameters from partial point clouds, even when the point clouds are self-occluded. Based on the pose parameters and global features of the point clouds from the estimator network, we carefully design a learnable augmentor network that can intelligently drive and deform real data to enrich data diversity during training of the estimator network. To guide the augmentor network to generate challenging augmented samples, we adopt an adversarial learning strategy based on the error feedback of the estimator. Experimental results on real and synthetic data demonstrate that the proposed approach can accurately estimate 3D human pose and shape from partial point clouds and outperforms prior works in terms of reconstruction accuracy.
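The adversarial interplay between estimator and augmentor can be sketched as below; the interfaces (a PointNet-style estimator regressing pose and shape parameters, a bounded point-wise deformation as the augmentor) are assumptions for illustration, not the paper's architecture.

```python
# Hedged sketch of pose-aware adversarial auto-augmentation.
import torch
import torch.nn as nn

class Estimator(nn.Module):
    def __init__(self, n_params=82):          # e.g. 72 SMPL pose + 10 shape
        super().__init__()
        self.feat = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 128))
        self.head = nn.Linear(128, n_params)

    def forward(self, pts):                   # (B, N, 3) -> (B, n_params)
        return self.head(self.feat(pts).max(dim=1).values)

class Augmentor(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, pts):                   # bounded point-wise deformation
        return pts + 0.05 * torch.tanh(self.mlp(pts))

def adversarial_step(est, aug, pts, target, est_opt, aug_opt):
    # augmentor update: maximize the estimator's error on deformed clouds
    aug_opt.zero_grad()
    err_aug = (est(aug(pts)) - target).pow(2).mean()
    (-err_aug).backward()
    aug_opt.step()
    # estimator update: minimize error on real and (detached) augmented clouds
    est_opt.zero_grad()
    with torch.no_grad():
        aug_pts = aug(pts)
    loss = (est(pts) - target).pow(2).mean() + (est(aug_pts) - target).pow(2).mean()
    loss.backward()
    est_opt.step()

est, aug = Estimator(), Augmentor()
est_opt = torch.optim.Adam(est.parameters(), lr=1e-4)
aug_opt = torch.optim.Adam(aug.parameters(), lr=1e-4)
adversarial_step(est, aug, torch.randn(8, 1024, 3), torch.randn(8, 82), est_opt, aug_opt)
```

The bounded tanh deformation mirrors the idea of driving real data toward harder but still plausible samples, with the estimator's error serving as the feedback signal.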
ISBN:
(Digital) 9798350368741
ISBN:
(Print) 9798350368758
Compressive imaging (CI) consists of reconstructing images from incomplete observed data. The reconstruction process involves solving an ill-posed inverse problem that is highly dependent on the number of real measurements, with a greater number of measurements typically leading to more accurate reconstructions. Due to their ability to learn data distributions, diffusion models (DMs) have emerged as promising techniques for various inverse problems. Primarily, DMs solve inverse problems by conditioning the generation process on the acquired measurements. In this work, we introduce a new approach to improve this conditioning by exploiting synthetic measurements, which come from a synthetic sensing matrix. Synthetic measurements are estimated from real data via a neural network. The combined real and synthetic measurements form an augmented set, which is input into the conditional DM to enhance reconstruction capacity. Computational experiments demonstrate that augmenting measurements with the conditional DM improves performance compared to using only real measurements.
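The measurement-augmentation idea reduces to a small conditioning change, sketched below with assumed shapes and names: a network maps real measurements to synthetic ones, and the concatenation of the two is what conditions the diffusion sampler.

```python
# Hedged sketch: augmenting real compressive measurements with learned
# synthetic ones before conditioning a diffusion model; names are assumed.
import torch
import torch.nn as nn

class SyntheticMeasurementNet(nn.Module):
    """Estimates measurements under a synthetic sensing matrix from real ones."""
    def __init__(self, m_real=128, m_syn=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(m_real, 256), nn.ReLU(), nn.Linear(256, m_syn))

    def forward(self, y_real):                # y_real: (B, m_real)
        return self.net(y_real)

y_real = torch.randn(4, 128)                  # acquired compressive samples
y_syn = SyntheticMeasurementNet()(y_real)     # estimated synthetic samples
y_aug = torch.cat([y_real, y_syn], dim=-1)    # augmented conditioning set
# y_aug would condition each denoising step, e.g. through data-consistency
# guidance against the stacked real-plus-synthetic sensing operator
```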
Purpose: Recent advancements in generative adversarial networks (GANs) have demonstrated substantial potential in medical image processing. Despite this progress, reconstructing images from incomplete data remains a c...
ISBN:
(Digital) 9798331512248
ISBN:
(Print) 9798331512255
Stable Fast 3D is widely recognized for its remarkable capacity to generate 3D models from a single 2D image in as little as 0.5 seconds. This can be further improved by leveraging text-to-image latent diffusion, in particular the inpainting technique in Stable Diffusion. The purpose of this work is to improve the quality and fidelity of generated 3D models by allowing user-guided customizations during the reconstruction process. Inpainting addresses two significant challenges, incomplete or noisy input data and visualization differences, by completing unobserved areas and improving input textures. It also enables users to iteratively modify their inputs, potentially yielding more coherent and aesthetically pleasing final 3D models. Experimental results indicate that incorporating inpainting with Stable Fast 3D increases model precision while retaining the original speed of model generation. The method proposed in this paper expands the use of 3D reconstruction techniques to domains including gaming, virtual reality, and product design by providing a more interactive solution that makes it easier to create high-quality 3D assets.
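A plausible shape of this pipeline, assuming the Hugging Face diffusers inpainting API; the checkpoint id and the final Stable Fast 3D call are placeholders, since the exact interface depends on the release used.

```python
# Hedged sketch: user-guided inpainting before single-image 3D generation.
# The checkpoint id below is a commonly used example, not necessarily the
# one used in the paper; stable_fast_3d is a hypothetical wrapper.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.open("input.png").convert("RGB").resize((512, 512))
mask = Image.open("mask.png").convert("L").resize((512, 512))  # white = redo

# complete occluded regions / clean up textures before lifting to 3D
edited = pipe(prompt="a complete, clean product photo",
              image=image, mask_image=mask).images[0]

# mesh = stable_fast_3d(edited)  # hypothetical call into the SF3D model
```

Because the inpainting pass runs once per user edit and the 3D step is unchanged, the generation speed of Stable Fast 3D itself is preserved, as the abstract claims.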