In recent years, the automobile industry has achieved astonishing success in making autonomous cars safer, more affordable, and more reliable. However, current autonomous driving technology is mainly based on reactive controllers that attempt to respond to the various events the car encounters. Yet, achieving a truly safe and reliable autonomous system necessitates anticipating such events and planning the correct actions in advance to avoid undesirable behavior. Recent advances in deep learning have shown remarkable performance in predicting future frames from video sequences. However, most of these approaches can only handle a few moving elements in the scene and perform poorly when the camera is in motion. This is mainly due to the difficulty of disentangling camera intrinsic motion from object-dependent motion. In this work, we equip autonomous cars with an object-oriented next-frame predictor that leverages a Transformer architecture to extract, for each moving object in the scene, a spatial transformation applied to the object to predict its configuration in the next frame. Static elements of the scene are then used to estimate camera intrinsic motion, which is applied to the background to predict how it will be viewed in the next frame. Notably, our approach significantly reduces the complexity typically associated with such models by requiring the estimation of only 14 parameters per moving object, independent of image resolution. We have validated the generalization capabilities of our model through training on simulated datasets and testing on real-world datasets. The results indicate that our model not only outperforms existing models trained solely on real data but also exhibits superior resilience to occlusions and incomplete data in the input sequences. These findings underscore the potential of our model to significantly improve the predictive analytics capabilities of autonomous driving systems, thereby enhancing their safety and reliability in dynamic environments.
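The per-object transformation idea can be pictured with a short sketch. The PyTorch snippet below (an illustration, not the authors' code) uses a Transformer encoder to produce 14 parameters for each object token and applies the first six as a 2D affine warp to the object's patch; the split of the 14 parameters into affine and auxiliary terms is an assumption made for this example.

```python
# Hedged sketch: per-object motion parameters from a Transformer encoder.
# The 6-affine + 8-auxiliary split of the 14 parameters is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ObjectMotionHead(nn.Module):
    def __init__(self, d_model=256, n_params=14):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.to_params = nn.Linear(d_model, n_params)

    def forward(self, object_tokens):
        # object_tokens: (B, N_objects, d_model) features of the moving objects
        return self.to_params(self.encoder(object_tokens))  # (B, N_objects, 14)

def warp_object(crop, params):
    # crop: (N, C, H, W) object patches; first 6 params form a 2x3 affine matrix
    theta = params[:, :6].view(-1, 2, 3)
    grid = F.affine_grid(theta, crop.shape, align_corners=False)
    return F.grid_sample(crop, grid, align_corners=False)

tokens = torch.randn(2, 5, 256)               # 2 sequences, 5 objects each
params = ObjectMotionHead()(tokens)           # (2, 5, 14)
patches = torch.randn(10, 3, 64, 64)          # one 64x64 crop per object
next_patches = warp_object(patches, params.view(-1, 14))
```

Note that the parameter count is independent of image resolution, as the abstract states: the warp is simply resampled onto whatever patch size the object occupies.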
ISBN:
(Print) 9789819785070; 9789819785087
Reconstructing 3D models from single-view images is a longstanding problem in computer vision. The latest advances in single-image 3D reconstruction extract a textual description from the input image and further utilize it to synthesize 3D models. However, existing methods focus on capturing a single key attribute of the image (e.g., object type, artistic style) and fail to consider the multi-perspective information required for accurate 3D reconstruction, such as object shape and material properties. Besides, the reliance on Neural Radiance Fields hinders their ability to reconstruct intricate surfaces and texture details. In this work, we propose MTFusion, which leverages both image data and textual descriptions for high-fidelity 3D reconstruction. Our approach consists of two stages. First, we adopt a novel multi-word textual inversion technique to extract a detailed text description capturing the image's characteristics. Then, we use this description and the image to generate a 3D model with FlexiCubes. Additionally, MTFusion enhances FlexiCubes by employing a special decoder network for Signed Distance Functions, leading to faster training and finer surface representation. Extensive evaluations demonstrate that MTFusion surpasses existing image-to-3D methods on a wide range of synthetic and real-world images. Furthermore, an ablation study proves the effectiveness of our network designs.
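The special SDF decoder can be pictured as a small MLP that maps grid-vertex positions (plus a latent feature) to a signed-distance value and a vertex deformation, which FlexiCubes then turns into a mesh. A minimal sketch, with all names and layer sizes assumed for illustration:

```python
# Hedged sketch of an SDF decoder in the spirit of MTFusion's FlexiCubes
# enhancement; dimensions and architecture are assumptions, not the paper's.
import torch
import torch.nn as nn

class SDFDecoder(nn.Module):
    def __init__(self, latent_dim=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1 + 3),  # 1 SDF value + a 3D vertex deformation
        )

    def forward(self, xyz, latent):
        # xyz: (N, 3) grid-vertex positions; latent: (N, latent_dim) features
        out = self.net(torch.cat([xyz, latent], dim=-1))
        sdf, deform = out[:, :1], out[:, 1:]
        return sdf, torch.tanh(deform) * 0.5  # keep deformation inside a cell

verts = torch.rand(1024, 3) * 2 - 1           # vertices of a [-1, 1]^3 grid
sdf, deform = SDFDecoder()(verts, torch.randn(1024, 64))
```

A direct decoder of this kind is cheap to evaluate at every grid vertex, which is consistent with the faster training the abstract reports.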
With the rapid development of digital technology and deep learning, recovering 3D scene information and reconstructing human bodies from a single image have become a focal point of research in computer vision and compu...
ISBN:
(Print) 9789819785070; 9789819785087
Artificial Intelligence Generated Content (AIGC) has experienced significant advancements, particularly in the areas of natural language processing and 2D image generation. However, the generation of three-dimensional (3D) content from a single image still poses challenges, particularly when the input image contains complex backgrounds. This limitation hinders the potential applications of AIGC in areas such as human-machine interaction, virtual reality (VR), and architectural design. Despite the progress made so far, existing methods face difficulties when dealing with single images that have intricate backgrounds: their reconstructed 3D shapes tend to be incomplete, noisy, or missing partial geometric structures. In this paper, we introduce a 3D generation framework for indoor scenes that produces realistic and visually pleasing 3D geometry from a single image, without requiring point clouds, multi-view images, depth, or masks as input. The main idea of our method is clustering-based 3D shape learning and prediction, followed by a shape deformation. Since indoor scenes typically contain more than one object, our framework simultaneously generates multiple objects and predicts the scene layout with a camera pose, as well as 3D object bounding boxes, for holistic 3D scene understanding. We have evaluated the proposed framework on benchmark datasets including ShapeNet, SUN RGB-D, and Pix3D, and state-of-the-art performance has been achieved. We also give examples illustrating immediate applications in virtual reality.
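The clustering-based shape learning step can be illustrated with a small sketch: embeddings of training shapes are grouped into prototypes, an image-predicted embedding is snapped to its nearest prototype, and a deformation stage then refines the result. The k-means routine and all dimensions below are assumptions for illustration.

```python
# Hedged sketch of clustering-based shape prediction; not the paper's code.
import torch

def kmeans(codes, k=8, iters=20):
    # codes: (N, D) shape embeddings -> (k, D) cluster prototypes
    centers = codes[torch.randperm(codes.size(0))[:k]].clone()
    for _ in range(iters):
        assign = torch.cdist(codes, centers).argmin(dim=1)   # (N,)
        for j in range(k):
            members = codes[assign == j]
            if members.numel() > 0:
                centers[j] = members.mean(dim=0)
    return centers

codes = torch.randn(500, 32)                  # embeddings of training shapes
prototypes = kmeans(codes)                    # coarse per-cluster templates
query = torch.randn(1, 32)                    # embedding predicted from image
nearest = prototypes[torch.cdist(query, prototypes).argmin(dim=1)]
# a deformation network would then warp the prototype shape toward the image;
# per-object boxes and a camera pose place each deformed shape in the layout
```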
Photoacoustic imaging (PAI) offers significant advantages but faces challenges in data processing and reconstruction. Sparse reconstruction techniques and compressed sensing theory have advanced its development. Regul...
ISBN:
(Print) 9783031776090
The proceedings contain 23 papers. The special focus in this conference is on Skin Imaging Collaboration, Interpretability of Machine Intelligence in Medical Image Computing, the Embodied AI and Robotics for HealTHcare Workshop, and the MICCAI Workshop on Distributed, Collaborative and Federated Learning. The topics include: DeCaF 2024 Preface; I2M2Net: Inter/Intra-modal Feature Masking Self-distillation for Incomplete Multimodal Skin Lesion Diagnosis; From Majority to Minority: A Diffusion-Based Augmentation for Underrepresented Groups in Skin Lesion Analysis; Segmentation Style Discovery: Application to Skin Lesion Images; A Vision Transformer with Adaptive Cross-Image and Cross-Resolution Attention; Lesion Elevation Prediction from Skin Images Improves Diagnosis; DWARF: Disease-Weighted Network for Attention Map Refinement; PIPNet3D: Interpretable Detection of Alzheimer in MRI Scans; Detecting Unforeseen Data Properties with Diffusion Autoencoder Embeddings Using Spine MRI Data; Interpretability of Uncertainty: Exploring Cortical Lesion Segmentation in Multiple Sclerosis; TextCAVs: Debugging Vision Models Using Text; Evaluating Visual Explanations of Attention Maps for Transformer-Based Medical Imaging; Exploiting XAI Maps to Improve MS Lesion Segmentation and Detection in MRI; EndoGS: Deformable Endoscopic Tissues Reconstruction with Gaussian Splatting; VISAGE: Video Synthesis Using Action Graphs for Surgery; A Review of 3D Reconstruction Techniques for Deformable Tissues in Robotic Surgery; SurgTrack: CAD-Free 3D Tracking of Real-World Surgical Instruments; MUTUAL: Towards Holistic Sensing and Inference in the Operating Room; Complex-Valued Federated Learning with Differential Privacy and MRI Applications; Enhancing Privacy in Federated Learning: Secure Aggregation for Real-World Healthcare Applications; Federated Impression for Learning with Distributed Heterogeneous Data; A Federated Learning-Friendly Approach for Parameter-Efficient Fine-Tuning of SAM in 3D Segmentation; Probing the Effic...
ISBN:
(Print) 9789819785070; 9789819785087
This work mainly addresses the challenges in 3D human pose and shape estimation from real partial point clouds. Existing 3D human estimation methods from point clouds usually have limited generalization ability on real data due to factors such as self-occlusion, random noise, and the domain gap between real and synthetic data. In this paper, we propose a pose-aware auto-augmentation framework for 3D human pose and shape estimation from partial point clouds. Specifically, we design an occlusion-aware module for the estimator network that obtains refined features to accurately regress human pose and shape parameters from partial point clouds, even when the point clouds are self-occluded. Based on the pose parameters and global features of the point clouds from the estimator network, we carefully design a learnable augmentor network that can intelligently drive and deform real data to enrich data diversity during training of the estimator network. To guide the augmentor network to generate challenging augmented samples, we adopt an adversarial learning strategy based on the error feedback of the estimator. Experimental results on real and synthetic data demonstrate that the proposed approach can accurately estimate 3D human pose and shape from partial point clouds and outperforms prior works in terms of reconstruction accuracy.
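The adversarial interplay between estimator and augmentor can be sketched as below; the interfaces (a PointNet-style estimator regressing pose and shape parameters, a bounded point-wise deformation as the augmentor) are assumptions for illustration, not the paper's architecture.

```python
# Hedged sketch of pose-aware adversarial auto-augmentation.
import torch
import torch.nn as nn

class Estimator(nn.Module):
    def __init__(self, n_params=82):          # e.g. 72 SMPL pose + 10 shape
        super().__init__()
        self.feat = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 128))
        self.head = nn.Linear(128, n_params)

    def forward(self, pts):                   # (B, N, 3) -> (B, n_params)
        return self.head(self.feat(pts).max(dim=1).values)

class Augmentor(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, pts):                   # bounded point-wise deformation
        return pts + 0.05 * torch.tanh(self.mlp(pts))

def adversarial_step(est, aug, pts, target, est_opt, aug_opt):
    # augmentor update: maximize the estimator's error on deformed clouds
    aug_opt.zero_grad()
    err_aug = (est(aug(pts)) - target).pow(2).mean()
    (-err_aug).backward()
    aug_opt.step()
    # estimator update: minimize error on real and (detached) augmented clouds
    est_opt.zero_grad()
    with torch.no_grad():
        aug_pts = aug(pts)
    loss = (est(pts) - target).pow(2).mean() + (est(aug_pts) - target).pow(2).mean()
    loss.backward()
    est_opt.step()

est, aug = Estimator(), Augmentor()
est_opt = torch.optim.Adam(est.parameters(), lr=1e-4)
aug_opt = torch.optim.Adam(aug.parameters(), lr=1e-4)
adversarial_step(est, aug, torch.randn(8, 1024, 3), torch.randn(8, 82), est_opt, aug_opt)
```

The bounded tanh deformation mirrors the idea of driving real data toward harder but still plausible samples, with the estimator's error serving as the feedback signal.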
ISBN:
(Digital) 9798350368741
ISBN:
(Print) 9798350368758
Compressive imaging (CI) consists of reconstructing images from incomplete observed data. The reconstruction process involves solving an ill-posed inverse problem that is highly dependent on the number of real measurements, with a greater number of measurements typically leading to more accurate reconstructions. Due to their ability to learn data distributions, diffusion models (DMs) have emerged as promising techniques for various inverse problems. Primarily, DMs solve inverse problems by conditioning the generation process on the acquired measurements. In this work, we introduce a new approach to improve this conditioning by exploiting synthetic measurements, which come from a synthetic sensing matrix. Synthetic measurements are estimated from real data via a neural network. The combined real and synthetic measurements form an augmented set, which is input into the conditional DM to enhance reconstruction capacity. Computational experiments demonstrate that augmenting measurements with the conditional DM improves performance compared to using only real measurements.
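The measurement-augmentation idea reduces to a small conditioning change, sketched below with assumed shapes and names: a network maps real measurements to synthetic ones, and the concatenation of the two is what conditions the diffusion sampler.

```python
# Hedged sketch: augmenting real compressive measurements with learned
# synthetic ones before conditioning a diffusion model; names are assumed.
import torch
import torch.nn as nn

class SyntheticMeasurementNet(nn.Module):
    """Estimates measurements under a synthetic sensing matrix from real ones."""
    def __init__(self, m_real=128, m_syn=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(m_real, 256), nn.ReLU(), nn.Linear(256, m_syn))

    def forward(self, y_real):                # y_real: (B, m_real)
        return self.net(y_real)

y_real = torch.randn(4, 128)                  # acquired compressive samples
y_syn = SyntheticMeasurementNet()(y_real)     # estimated synthetic samples
y_aug = torch.cat([y_real, y_syn], dim=-1)    # augmented conditioning set
# y_aug would condition each denoising step, e.g. through data-consistency
# guidance against the stacked real-plus-synthetic sensing operator
```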
Purpose: Recent advancements in generative adversarial networks (GANs) have demonstrated substantial potential in medical image processing. Despite this progress, reconstructing images from incomplete data remains a c...
ISBN:
(Digital) 9798331512248
ISBN:
(Print) 9798331512255
Stable Fast 3D is widely recognized for its remarkable capacity to generate 3D models from a single 2D image in as little as 0.5 seconds. This can be further improved by leveraging text-to-image latent diffusion, in particular the inpainting technique in Stable Diffusion. The purpose of this work is to improve the quality and fidelity of generated 3D models by allowing user-guided customizations during the reconstruction process. Inpainting addresses two significant challenges, incomplete or noisy input data and visualization differences, by completing unobserved areas and improving input textures. It also enables users to iteratively modify their inputs, potentially yielding more coherent and aesthetically pleasing final 3D models. Experimental results indicate that incorporating inpainting with Stable Fast 3D increases model precision while retaining the original speed of model generation. The method proposed in this paper expands the use of 3D reconstruction techniques to domains including gaming, virtual reality, and product design by providing a more interactive solution that makes it easier to create high-quality 3D assets.
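A plausible shape of this pipeline, assuming the Hugging Face diffusers inpainting API; the checkpoint id and the final Stable Fast 3D call are placeholders, since the exact interface depends on the release used.

```python
# Hedged sketch: user-guided inpainting before single-image 3D generation.
# The checkpoint id below is a commonly used example, not necessarily the
# one used in the paper; stable_fast_3d is a hypothetical wrapper.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.open("input.png").convert("RGB").resize((512, 512))
mask = Image.open("mask.png").convert("L").resize((512, 512))  # white = redo

# complete occluded regions / clean up textures before lifting to 3D
edited = pipe(prompt="a complete, clean product photo",
              image=image, mask_image=mask).images[0]

# mesh = stable_fast_3d(edited)  # hypothetical call into the SF3D model
```

Because the inpainting pass runs once per user edit and the 3D step is unchanged, the generation speed of Stable Fast 3D itself is preserved, as the abstract claims.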