Computer graphics research has long prioritized image quality over frame rate. Yet demand for an alternative is growing, with many esports players turning off visual effects to improve frame rates. Is it time for graphics researchers to reconsider their goals? A workshop at the 2023 SIGGRAPH Conference explored this question. Three researchers made provocative presentations, each of which was then discussed by dozens of research and industry attendees. We summarize those presentations and discussions here, concluding with potential research questions and future plans for esports at SIGGRAPH.
The inference of 3D motion and dynamics of the human musculoskeletal system has traditionally been solved using physics-based methods that exploit physical parameters to provide realistic simulations. Yet such methods suffer from computational complexity and reduced stability, hindering their use in computer graphics applications that require real-time performance. With the recent explosion of data capture (mocap, video), machine learning (ML) has become popular because it can build surrogate models that harness the huge amount of data stemming from various sources, minimizing computation time (rather than resource usage) and, most importantly, approximating real-time solutions. The main purpose of this paper is to provide a review and classification of the most recent works on motion prediction, motion synthesis, and musculoskeletal dynamics estimation using ML techniques, in order to offer sufficient insight into the state of the art and to draw new research directions. While the study of motion may appear distinct from musculoskeletal dynamics, the two application domains together provide the link to more natural computer graphics character animation, since ML-based musculoskeletal dynamics estimation enables modeling of longer-term, temporally evolving, ergonomic effects while offering automated and fast solutions. Overall, our review offers an in-depth presentation and classification of ML applications in human motion analysis, unlike previous survey articles that focus on specific aspects of motion prediction.
Collision detection is a fundamental problem in various domains, such as robotics, computational physics, and computer graphics. In general, collision detection is tackled as a computational geometry problem, with the Gilbert-Johnson-Keerthi (GJK) algorithm being the most widely adopted solution nowadays. Although introduced in 1988, GJK remains the most effective solution to compute the distance or the collision between two 3D convex geometries. Over the years, it has been shown to be efficient, scalable, and generic, operating on a broad class of convex shapes, ranging from simple primitives (sphere, ellipsoid, box, cone, capsule, etc.) to complex meshes involving thousands of vertices. In this article, we introduce several contributions to accelerate collision detection and distance computation between convex geometries by leveraging the fact that these two problems are fundamentally optimization problems. Notably, we establish that the GJK algorithm is a specific subcase of the well-established Frank-Wolfe (FW) algorithm in convex optimization. By adapting recent works linking Polyak and Nesterov accelerations to FW methods, we also propose two accelerated extensions of the classic GJK algorithm. Through an extensive benchmark over millions of collision pairs involving objects of daily life, we show that these two accelerated GJK extensions significantly reduce the overall computational burden of collision detection, leading to computation times that are up to two times faster. Finally, we hope this work will significantly reduce the computational cost of modern robotic simulators, allowing the speedup of modern robotic applications that heavily rely on simulation, such as reinforcement learning or trajectory optimization.
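To make the optimization view above concrete, here is a minimal sketch that computes the distance between two convex polytopes by running vanilla Frank-Wolfe on their Minkowski difference, with the support function acting as the linear minimization oracle. It illustrates the GJK-as-Frank-Wolfe connection only; it is not the paper's accelerated variants nor the simplex bookkeeping of full GJK, and the cube example is made up.

import numpy as np

def support(vertices, d):
    """Support point of a convex polytope (vertex array) in direction d."""
    return vertices[np.argmax(vertices @ d)]

def fw_distance(A, B, iters=200, tol=1e-9):
    """Distance between conv(A) and conv(B) via Frank-Wolfe on the
    Minkowski difference A - B; A, B are (n, 3) and (m, 3) vertex arrays."""
    x = support(A, np.ones(3)) - support(B, -np.ones(3))  # any point of A - B
    for k in range(iters):
        # Linear minimization oracle: support point of A - B along -grad = -x
        s = support(A, -x) - support(B, x)
        g = x @ (x - s)                 # Frank-Wolfe gap; zero at the optimum
        if g < tol:
            break
        d = s - x
        gamma = min(1.0, g / (d @ d))   # exact line search for f(x) = 0.5*||x||^2
        x = x + gamma * d
    return np.linalg.norm(x)            # 0 means the shapes overlap

# Example: two unit cubes separated by 0.5 along x should be 0.5 apart.
cube = np.array([[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)], float)
print(fw_distance(cube, cube + np.array([1.5, 0.0, 0.0])))  # ~0.5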
The Coronavirus Disease 2019 (COVID-19) epidemic has constituted a Public Health Emergency of International Concern. Chest computed tomography (CT) can help reveal early abnormalities indicative of lung disease. Thus, accurate and automatic localisation of lung lesions is particularly important to assist physicians in the rapid diagnosis of COVID-19 patients. The authors propose a classifier-augmented generative adversarial network framework for weakly supervised COVID-19 lung lesion localisation. It consists of an abnormality map generator, a discriminator, and a classifier. The generator produces an abnormality feature map M that locates lesion regions and then constructs pseudo-healthy images by adding M to the input patient images. Besides constraining the generated healthy images to the real distribution through the discriminator, a pre-trained classifier is introduced so that the generated healthy images share high-level semantic feature representations with images of real healthy subjects. Moreover, an attention gate is employed in the generator to reduce the effect of noise in the irrelevant regions of M. Experimental results on the COVID-19 CT dataset show that the method is effective in capturing more lesion areas and generating less noise in unrelated areas, and it has significant advantages in terms of quantitative and qualitative results over existing methods.
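The following sketch illustrates the loss composition described in this abstract: an additive abnormality map turns a patient image into a pseudo-healthy one, which is pushed toward realism by a discriminator and toward healthy semantics by a classifier. The toy networks, feature sizes, and loss weights are stand-ins, not the paper's architecture, and the attention gate is omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGenerator(nn.Module):
    """Predicts an additive abnormality map M from a patient CT slice."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Tanh())
    def forward(self, x):
        return self.net(x)

disc = nn.Sequential(nn.Conv2d(1, 8, 4, stride=2), nn.ReLU(),
                     nn.Flatten(), nn.LazyLinear(1))          # real/fake score
clf = nn.Sequential(nn.Conv2d(1, 8, 4, stride=2), nn.ReLU(),
                    nn.Flatten(), nn.LazyLinear(2))           # healthy/COVID logits
# In the paper the classifier is pre-trained and frozen; here it is a toy stand-in.

gen = TinyGenerator()
patient = torch.randn(4, 1, 64, 64)           # stand-in for patient CT slices

M = gen(patient)                              # abnormality map locating lesions
pseudo_healthy = patient + M                  # reconstructed "healthy" image

adv_loss = F.binary_cross_entropy_with_logits(
    disc(pseudo_healthy), torch.ones(4, 1))   # fool the discriminator
cls_loss = F.cross_entropy(
    clf(pseudo_healthy), torch.zeros(4, dtype=torch.long))  # class 0 = healthy
sparsity = M.abs().mean()                     # discourage noise outside lesions

gen_loss = adv_loss + cls_loss + 0.1 * sparsity   # illustrative weighting
gen_loss.backward()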
The accurate reconstruction of the topology and texture details of three-dimensional (3D) objects from a single two-dimensional image presents a significant challenge in computer vision. Existing methods have achieved varying degrees of success by utilizing different geometric representations, but they all suffer from limitations when accurately reconstructing surfaces with complex topology and texture. Therefore, this study proposes an approach that combines the convolutional block attention module (CBAM), texture detail fusion, and multimodal fusion to address this challenge effectively. To enhance the model's focus on important areas within images, we integrate the CBAM mechanism with ResNet for feature extraction. Texture detail fusion plays a crucial role, as it effectively captures changes in the object's surface, while multimodal fusion improves the accuracy of predicting the signed distance function. We have developed an implicit single-view 3D reconstruction network capable of retrieving the topology and surface details of 3D models from a single input image. The integration of global, local, and surface texture features improves shape representation and accurately captures surface textures, filling a crucial gap in the field. During reconstruction, we extract features representing global information, local information, and texture variation from the input image. By using global information to approximate the object's shape, refining shape and surface texture details with local information, and applying distinct loss terms to constrain different aspects of the reconstruction, our method achieves accurate single-image 3D model reconstruction with detailed surface textures. Through qualitative and quantitative analysis, we demonstrate the superiority of our model over state-of-the-art techniques on the ShapeNet dataset. The significance of our work lies in its ability t
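The multimodal-fusion step can be sketched as an MLP that predicts a signed distance for each 3D query point from concatenated global, local, and texture features. The feature dimensions and layer widths below are illustrative assumptions, not the paper's configuration.

import torch
import torch.nn as nn

class SDFHead(nn.Module):
    """Toy SDF predictor fusing a 3D query point with image features."""
    def __init__(self, d_global=256, d_local=128, d_tex=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + d_global + d_local + d_tex, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1))                    # signed distance value

    def forward(self, pts, f_global, f_local, f_tex):
        # pts: (B, N, 3) query points; the global feature is broadcast per point.
        B, N, _ = pts.shape
        g = f_global.unsqueeze(1).expand(B, N, -1)
        fused = torch.cat([pts, g, f_local, f_tex], dim=-1)
        return self.mlp(fused).squeeze(-1)        # (B, N) SDF values

head = SDFHead()
pts = torch.rand(2, 1024, 3) * 2 - 1              # query points in [-1, 1]^3
sdf = head(pts,
           torch.randn(2, 256),                   # global image feature
           torch.randn(2, 1024, 128),             # per-point local feature
           torch.randn(2, 1024, 64))              # per-point texture feature
print(sdf.shape)                                  # torch.Size([2, 1024])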
Reconstructing 3D garment models usually requires laborious data-fetching processes, such as expensive lidar, multiple-view images, or SMPL models of the garments. In this paper, we propose a neat framework that takes single-image inputs for generating pseudo-sparse views of 3D garments and synthesizing multi-view images into a high-quality 3D neural model. Specifically, our framework combines a pretrained pseudo sparse view generator and a volumetric signed distance function (SDF) representation-based network for 3D garment modeling, which uses neural networks to represent both the density and radiance fields. We further introduce a stride fusion strategy to minimize the pixel-level loss in key viewpoints and semantic loss in random viewpoints, which produces view-consistent geometry and sharp texture details. Finally, a multi-view rendering module utilizes the learned SDF representation to generate multi-view garment images and extract accurate mesh and texture from them. We evaluate our proposed framework on the Deep Fashion 3D dataset and achieve state-of-the-art performance in terms of both quantitative and qualitative evaluations.
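One common way to render such an SDF-based volumetric representation is to map the signed distance to a density before volume rendering; the Laplace-CDF mapping sketched below (in the style of VolSDF-like models) is one possibility and not necessarily the exact mapping used by this framework.

import torch

def sdf_to_density(sdf, beta=0.1):
    """Monotone mapping: large positive SDF (outside) -> ~0 density,
    negative SDF (inside) -> ~1/beta density."""
    alpha = 1.0 / beta
    return alpha * torch.where(
        sdf <= 0,
        1 - 0.5 * torch.exp(sdf / beta),
        0.5 * torch.exp(-sdf / beta))

sdf_samples = torch.linspace(-0.3, 0.3, 7)   # SDF values along a camera ray
print(sdf_to_density(sdf_samples))           # density peaks inside the surface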
Visual speech, referring to the visual domain of speech, has attracted increasing attention due to its wide applications, such as public security, medical treatment, military defense, and film entertainment. As powerful AI techniques, deep learning methods have greatly promoted the development of visual speech learning. Over the past five years, numerous deep learning based methods have been proposed to address various problems in this area, especially automatic visual speech recognition and generation. To push forward future research on visual speech, this paper presents a comprehensive review of recent progress in deep learning methods for visual speech analysis. We cover different aspects of visual speech, including fundamental problems, challenges, benchmark datasets, a taxonomy of existing methods, and state-of-the-art performance. In addition, we identify gaps in current research and discuss inspiring directions for future work.
Knowing the pole axis of an asteroid is vital to autonomous asteroid exploration efforts. Ground-based initial pole estimation methods are time- and data-intensive and produce estimates with large uncertainties. These errors have a significant impact on proximity navigation, shape modeling, and scientific data for small-body missions. In this paper, a new method of obtaining this information from onboard spacecraft imagery is presented. The proposed method estimates the pole from onboard infrared imagery using the camera-asteroid geometry. It requires no prior estimate and, because it uses infrared images, is designed to work on the vast majority of approach trajectories. The method is applied to simulated infrared images of asteroids 101955 Bennu and 25143 Itokawa as well as real infrared images of asteroid 162173 Ryugu from the Hayabusa2 mission. The average pole errors using this method on Bennu and Itokawa images are approximately 2 and 6 deg, respectively. The pole estimate error on the Ryugu images is approximately 8 deg. The algorithm is shown to be sensitive to the percentage of the spin period imaged and the spacing between the images.
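The quoted pole errors correspond to the angle between the estimated spin axis and the reference one, both taken as unit vectors; a minimal version of that metric is sketched below with made-up vectors.

import numpy as np

def pole_error_deg(est, true):
    """Angle in degrees between two pole-axis vectors."""
    est = est / np.linalg.norm(est)
    true = true / np.linalg.norm(true)
    return np.degrees(np.arccos(np.clip(np.dot(est, true), -1.0, 1.0)))

true_pole = np.array([0.0, 0.0, 1.0])
est_pole = np.array([0.03, 0.02, 1.0])        # small hypothetical misalignment
print(f"pole error: {pole_error_deg(est_pole, true_pole):.2f} deg")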
The multilayer reflectors of insect epidermis can produce unique structural color through interactions with light. Many fossilized insects, like amber-entombed wasps, present structural colors. However, how this multilayer structure and its structural colors are preserved during fossilization remains poorly understood. We use transfer matrix method (TMM) and non-standard finite-difference time-domain (NS-FDTD) simulations to analyze how the compressions and expansions of epidermis layer thickness expected during fossilization affect structural colors. We estimate the variation of epidermis layer thickness caused by fossilization by measuring the resulting color distances. Surprisingly, we find that the structural coloration of the multilayer reflectors, ranging from blue to green, emitted by many insects remains essentially unchanged for thickness changes between about +5% expansion and -12% compression. These findings suggest, first, that insects might have kept their original colors during the fossilization process and, second, that the appearance of these structural colors in insects might not be a matter of chance but could be the result of specific evolutionary choices.
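As a rough illustration of the transfer matrix method used in such analyses, the sketch below computes the normal-incidence reflectance of an alternating-index stack; the chitin-like indices and layer thicknesses are illustrative values, not measurements from the fossil specimens.

import numpy as np

def multilayer_reflectance(wavelength_nm, n_layers, d_layers_nm,
                           n_ambient=1.0, n_substrate=1.56):
    """Normal-incidence reflectance of a thin-film stack via the
    characteristic-matrix (transfer matrix) formulation."""
    M = np.eye(2, dtype=complex)
    for n, d in zip(n_layers, d_layers_nm):
        delta = 2 * np.pi * n * d / wavelength_nm   # phase thickness of the layer
        layer = np.array([[np.cos(delta), 1j * np.sin(delta) / n],
                          [1j * n * np.sin(delta), np.cos(delta)]])
        M = M @ layer
    B, C = M @ np.array([1.0, n_substrate])
    r = (n_ambient * B - C) / (n_ambient * B + C)
    return abs(r) ** 2

# Hypothetical cuticle-like stack: alternating high/low index layers,
# roughly tuned to reflect around ~520 nm (green).
n_stack = [1.73, 1.40] * 5
d_stack = [75, 95] * 5                # layer thicknesses in nm

for wl in np.linspace(400, 700, 7):
    print(f"{wl:5.0f} nm  R = {multilayer_reflectance(wl, n_stack, d_stack):.2f}")

# Rescaling every thickness by e.g. 0.88 (12% compression) and recomputing the
# spectrum mimics the compression scenarios analyzed in the paper.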
Global Illumination (GI) is a technique employed in computer graphics to enhance realism. Various methods have been used to achieve it in computer-generated imagery. The most precise method is conventional ray tracing, which yields highly realistic results but is computationally intensive and unsuitable for real-time applications. Alternatively, faster algorithms apply post-processing on top of rasterization, making them more suitable for real-time scenarios. However, these algorithms are also resource-intensive and may produce inaccurate lighting due to the limited information available from screen-space features. Our proposal uses a Generative Adversarial Network (GAN) to achieve real-time GI effects, following the methodology of conventional screen-space GI techniques. We take surrounding graphical information into account by going beyond screen space, producing consistent GI effects that are closer to their physically correct ray-traced counterparts. Moreover, our model produces higher-quality output than a recent model that used a similar approach, scoring 0.90811 in SSIM, 0.00093 in MSE, and 30.30576 dB in PSNR on our developed dataset.
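For context, MSE and PSNR can be computed as in the sketch below on images scaled to [0, 1]; SSIM is more involved and is typically taken from a library implementation such as skimage.metrics.structural_similarity. The random images here are stand-ins, not samples from the developed dataset.

import numpy as np

def mse(pred, ref):
    """Mean squared error between two images of the same shape."""
    return float(np.mean((pred - ref) ** 2))

def psnr(pred, ref, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    m = mse(pred, ref)
    return float("inf") if m == 0 else 10 * np.log10(max_val ** 2 / m)

ref = np.random.rand(256, 256, 3)                                  # reference frame stand-in
pred = np.clip(ref + np.random.normal(0, 0.03, ref.shape), 0, 1)   # generated frame stand-in

print(f"MSE  = {mse(pred, ref):.5f}")
print(f"PSNR = {psnr(pred, ref):.2f} dB")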