Cardiac Magnetic Resonance Imaging (CMRI) presents complex morphological features in 3D space due to different cardiac cycles, respiratory motions and individual differences, with diverse variations in global morpholo...
ISBN (digital): 9781665469463
ISBN (print): 9781665469463
Digitizing physical objects into the virtual world has the potential to unlock new research and applications in embodied AI and mixed reality. This work focuses on recreating interactive digital twins of real-world articulated objects, which can be directly imported into virtual environments. We introduce Ditto to learn articulation model estimation and 3D geometry reconstruction of an articulated object through interactive perception. Given a pair of visual observations of an articulated object before and after interaction, Ditto reconstructs part-level geometry and estimates the articulation model of the object. We employ implicit neural representations for joint geometry and articulation modeling. Our experiments show that Ditto effectively builds digital twins of articulated objects in a category-agnostic way. We also apply Ditto to real-world objects and deploy the recreated digital twins in physical simulation. Code and additional results are available at https://***/ditto/
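To make the idea of an articulation model concrete, the sketch below shows how an estimated joint could move the points of a mobile part. The abstract does not specify the parameterization, so the revolute joint (pivot, axis, angle) and prismatic joint (axis, distance) used here are assumptions, and the function names `rotate_about_axis` and `translate_along_axis` are hypothetical. This is an illustration of the general technique (Rodrigues' rotation formula), not the authors' implementation.

```python
import math


def _normalize(vec):
    """Return vec scaled to unit length; reject the zero vector."""
    norm = math.sqrt(sum(c * c for c in vec))
    if norm == 0.0:
        raise ValueError("axis must be non-zero")
    return [c / norm for c in vec]


def rotate_about_axis(point, pivot, axis, angle):
    """Revolute joint (assumed form): rotate `point` by `angle` radians
    about the line through `pivot` with direction `axis`."""
    k = _normalize(axis)
    v = [p - c for p, c in zip(point, pivot)]  # point relative to pivot
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    k_dot_v = sum(a * b for a, b in zip(k, v))
    # Rodrigues' formula: v cos(t) + (k x v) sin(t) + k (k . v)(1 - cos(t))
    rotated = [
        v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
        for i in range(3)
    ]
    return [r + c for r, c in zip(rotated, pivot)]


def translate_along_axis(point, axis, distance):
    """Prismatic joint (assumed form): slide `point` by `distance`
    along the unit direction of `axis`."""
    k = _normalize(axis)
    return [p + distance * c for p, c in zip(point, k)]


# Example: a door corner rotating a quarter turn about a vertical hinge
# that passes through (1, 0, 0).
moved = rotate_about_axis([2.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0], math.pi / 2)
```

Applying the same transform to every point of the mobile part, while leaving the static part fixed, gives the object's geometry in a new joint state.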
ISBN (print): 9789819989362
The proceedings contain 69 papers. The special focus in this conference is on Big Data, IoT and Machine Learning. The topics include: Phylogeny Reconstruction Using k-mer Derived Transition Features; Developing an Interpretable Machine Learning Model for Divorce Prediction; Riot Perception and Safety Navigation of Autonomous Vehicles Using Deep Learning; An Explainable AI Enable Approach to Reveal Feature Influences on Social Media Customer Purchase Decisions; Field Programmable Gate Array in DNA Computing; XAI-Driven Model Explainability and Prediction of P2P Bank Loan Default Network; Design Implication of a Compact-Sized, Low-Fidelity Rover for Tough Terrain Exploration; VioNet: An Enhanced Violence Detection Approach for Videos Using a Fusion Model of Vision Transformer with Bi-LSTM and 3D Convolutional Neural Networks; Rank Your Summaries: Enhancing Bengali Text Summarization Via Ranking-Based Approach; An Efficient Machine Learning Classification Model for Rainfall Prediction in Bangladesh; Study on the Analysis and Prediction of Drug Addiction Among University Students of Bangladesh Using Machine Learning; A Deep CNN-Based Approach for Revolutionizing Bengali Handwritten Numeral Recognition; Performance Analysis of Multiple Deep Learning Models for Image Retrieval Problems; Advancing Lung Cancer Diagnosis Through Deep Learning and Grad-CAM-Based Visualization Techniques; A Novel Approach to Detect Stroke from 2D Images Using Deep Learning; Enhancing Pneumonia Diagnosis: An Ensemble of Deep CNN Architectures for Accurate Chest X-Ray Image Analysis; Dataset for Road Roughness Assessment Using Image Classification Techniques and Deep Learning Models: A Case Study on Bangladeshi National Highways; Noise-Aware-Based Texture Descriptor, Evaluation Adjacent Distance Local Ternary Pattern EADLTP for Image Classification.
ISBN (digital): 9798350374490
ISBN (print): 9798350374506
Using photorealistic look-alike avatars may enhance the likeability and realism of avatars in collaborative virtual environments. This research seeks to determine the influence of head shape, texture fidelity and head orientation of a look-alike avatar on perception of likeability and visual realism, especially when judged by other people. Two textured look-alike avatars were generated using: (i) three-dimensional (3D) stereophotogrammetry; and (ii) 3D face reconstruction from a single full-face image. Participants compared three different head orientations (0°, 45°, 90°) of the look-alike avatars' textured heads to their corresponding head silhouettes, to emphasize the differences in head shapes. Results suggest that participants prefer geometrically accurate photorealistic avatars of other people due to the accuracy of the head shape and texture fidelity. Participants ranked the likeability and realism of the look-alike avatars similarly regardless of the head orientation.
ISBN (digital): 9781510649385
ISBN (print): 9781510649385; 9781510649378
Inherent to computed tomography (CT) is image reconstruction: constructing 3D voxel values from noisy projection data. Modeling this inverse operation is not straightforward. Given the ill-posed nature of the inverse problem in CT reconstruction, data-driven methods need regularization to enhance the accuracy of the reconstructed images. In addition, generalization of the results hinges upon the availability of large training datasets with access to ground truth. This paper offers a new strategy to reconstruct CT images with the advantage of ground truth accessible through a virtual imaging trial (VIT) platform. A learned primal-dual deep neural network (LPD-DNN) employed the forward model and its adjoint as a surrogate of the imaging geometry and physics. VIT offered simulated CT projections paired with ground truth labels from anthropomorphic human models without image noise and resolution degradation. The models included a library of anthropomorphic, computational patient models (XCAT). The DukeSim simulator was utilized to form realistic projection data emulating the impact of the physics and geometry of a commercial-equivalent CT scanner. The resultant noisy sinogram data associated with each slice was thus generated for training. Corresponding linear attenuation coefficients of the phantoms' materials at the effective energy of the x-ray spectrum were used as the ground truth labels. The LPD-DNN was deployed to learn the complex operators and hyper-parameters in the proximal primal-dual optimization. The obtained validation results showed a 12% normalized root mean square error with respect to the ground truth labels, a peak signal-to-noise ratio of 32 dB, a signal-to-noise ratio of 1.5, and a structural similarity index of 96%. These results were highly favorable compared to standard filtered-back projection reconstruction (65%, 17 dB, 1.0, 26%).
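For readers unfamiliar with the reported metrics, the sketch below computes a normalized root mean square error and a peak signal-to-noise ratio on flattened image values. The abstract does not state which normalization was used, so dividing the RMSE by the dynamic range of the ground truth and taking the ground-truth maximum as the PSNR peak are assumptions; other conventions (for example, dividing by the mean or the Euclidean norm) give different numbers. The function names are hypothetical.

```python
import math


def _mse(reference, estimate):
    """Mean squared error between two equal-length sequences of values."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


def nrmse(reference, estimate):
    """RMSE divided by the dynamic range of the reference
    (assumed convention, one of several in common use)."""
    value_range = max(reference) - min(reference)
    if value_range == 0:
        raise ValueError("reference has zero dynamic range")
    return math.sqrt(_mse(reference, estimate)) / value_range


def psnr_db(reference, estimate):
    """PSNR in dB, using the reference maximum as the peak value (assumption)."""
    mse = _mse(reference, estimate)
    if mse == 0:
        return math.inf  # identical images
    peak = max(reference)
    if peak <= 0:
        raise ValueError("peak value must be positive")
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy example with four voxel values; one voxel is off by 1.
truth = [0.0, 1.0, 2.0, 4.0]
recon = [0.0, 1.0, 2.0, 3.0]
error_fraction = nrmse(truth, recon)   # RMSE 0.5 over a range of 4
quality_db = psnr_db(truth, recon)     # 10 * log10(16 / 0.25)
```

Multiplying `error_fraction` by 100 gives a percentage comparable in form to the NRMSE figures quoted in the abstract.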
Estimating depth from multiple images to recover 3D object surfaces is an important and challenging computer vision task. In this paper, an effective multi-view stereo network is proposed with a multi-scale feature ag...
ISBN (digital): 9781510651838
ISBN (print): 9781510651838; 9781510651821
A face recognition system includes face detection, face positioning, and face identification. With the advent of the information age, identity authentication has become more and more important, and face recognition has become the mainstream method of identity verification due to its non-invasiveness, easy availability, and high reliability. With the current rapid development of artificial intelligence, it is of practical significance to introduce deep learning methods into face recognition. My work uses the Caffe framework to design an 18-layer network model. 448,808 pictures of 1,583 subjects in the YouTube Face data set were used as the training set, and 111,403 pictures of 1,583 subjects were used as the verification set. After preprocessing these images, we fed them into the network to train our model. The face recognition accuracy of the final model reached 99.7%. Next, my work took a forward-looking step toward 3D face verification: single-view 3D face reconstruction. At present, the lack of open-source 3D face databases is arguably the biggest obstacle to 3D face verification with deep learning algorithms. To address this problem, my work focuses on the reconstruction of 3D human faces. Traditional 3D face reconstruction methods are either unstable or over-regularized, so I tried to apply deep learning to 3D face reconstruction. First, to solve the problem of lacking 3D model data, we improved current multi-view 3D face reconstruction methods, using 3D Morphable Models (3DMM) to generate large numbers of labeled examples. Next, we designed and trained the network, which finally constructs a 3D model of a subject from a single 2D picture of that subject.
ISBN (digital): 9798350354171
ISBN (print): 9798350354188
3D image super-resolution (3D-SR) using convolutional neural networks (CNNs) is a technique that is being explored to enhance the resolution of 3D images. By leveraging the power of deep learning and CNN architectures, 3D-SR is able to produce higher-resolution 3D images from low-resolution 3D images. The goal of this method is to better infer the features missing from a low-resolution 3D image in order to recover the lost details. The 3D-SR model uses a deep learning framework to learn a mapping between a low-resolution 3D image and a high-resolution 3D reconstruction. This mapping is then used to reconstruct a higher-resolution 3D image of the same scene from the low-resolution version. Using this method, a clear improvement in resolution can be achieved, which can then enable further 3D image analysis.
ISBN (digital): 9798331514846
ISBN (print): 9798331525637
We propose LIVE-GS, a highly realistic interactive Gaussian splatting system in VR environments powered by LLM. Our pipeline supports reconstructions and physically-based interactions in VR, integrating object-aware reconstruction, GPT-assisted inpainting, and a computationally efficient simulation framework. To enhance scene understanding, we prompt GPT-4o to analyze the physical properties of objects in the scene, thereby guiding physical simulations to align with real-world phenomena. Our experimental results demonstrate that with the assistance of LLM’s understanding and scene enhancement, our VR system can support complex and realistic interactions without requiring additional manual design or annotation.
ISBN (print): 9798350307184
Creating relightable and animatable human characters from monocular video at a low cost is a critical task for digital human modeling and virtual reality applications. This task is complex due to intricate articulation motion, a wide range of ambient lighting conditions, and pose-dependent clothing deformations. In this paper, we introduce a novel self-supervised framework that takes a monocular video of a moving human as input and generates a 3D neural representation capable of being rendered with novel poses under arbitrary lighting conditions. Our framework decomposes dynamic humans under varying illumination into neural fields in canonical space, taking into account geometry and spatially varying BRDF material properties. Additionally, we introduce pose-driven deformation fields, enabling bidirectional mapping between canonical space and observation. Leveraging the proposed appearance decomposition and deformation fields, our framework learns in a self-supervised manner. Ultimately, based on pose-driven deformation, recovered appearance, and physically-based rendering, the reconstructed human figure becomes relightable and can be explicitly driven by novel poses. We demonstrate significant performance improvements over previous works and provide compelling examples of relighting from monocular videos of moving humans in challenging, uncontrolled capture scenarios.
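As a rough illustration of why separating material properties from lighting makes a surface relightable, the sketch below shades one surface point with the simplest BRDF, a Lambertian (purely diffuse) one, under a single directional light. The abstract does not say which BRDF model the framework uses, so the Lambertian choice, the single-light setup, and the function name `lambertian_radiance` are all assumptions made for illustration only.

```python
import math


def _unit(vec):
    """Return vec scaled to unit length; reject the zero vector."""
    norm = math.sqrt(sum(c * c for c in vec))
    if norm == 0.0:
        raise ValueError("vector must be non-zero")
    return [c / norm for c in vec]


def lambertian_radiance(albedo, normal, light_dir, light_intensity):
    """Outgoing RGB radiance of a diffuse surface point.

    albedo          -- per-channel reflectance in [0, 1] (the material property)
    normal          -- surface normal (the geometry)
    light_dir       -- direction from the surface toward the light
    light_intensity -- scalar irradiance of the light (the illumination)
    """
    n = _unit(normal)
    l = _unit(light_dir)
    # Clamp so that light arriving from behind the surface contributes nothing.
    cos_theta = max(0.0, sum(a * b for a, b in zip(n, l)))
    # Lambertian BRDF is albedo / pi, independent of viewing direction.
    return [a / math.pi * light_intensity * cos_theta for a in albedo]


# The same recovered albedo and normal, rendered under two different lights.
albedo = [0.5, 0.5, 0.5]
normal = [0.0, 0.0, 1.0]
front_lit = lambertian_radiance(albedo, normal, [0.0, 0.0, 2.0], math.pi)
back_lit = lambertian_radiance(albedo, normal, [0.0, 0.0, -1.0], math.pi)
```

Once albedo and normals are recovered, changing only `light_dir` and `light_intensity` re-renders the point under new illumination, which is the basic idea behind relighting.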