In recent years, most video segmentation methods have used deep CNNs to process the input image, but they have not fully mined the rich intermediate predictions in spatio-temporal space. Moreover, segmentation challenges such as occlusion, severe deformation, and illumination changes have not yet been well solved. To alleviate these problems, this paper focuses on constructing multi-module network structures that represent multiple semantics and proposes a video object segmentation network based on a coupled-stream architecture with a feature memory mechanism. The network first extracts high-level semantic features, edge features, and long-term and short-term stable depth features of the target, and then decodes them into the segmentation mask of the target. In addition, negative skeleton inhibition and frame interpolation are used to prevent interference from similar objects and motion blur, respectively. The method has low GPU memory usage regardless of the number of objects in the video, and achieves 86.5% and 62.4% in the J&F measure on the DAVIS 2016 and DAVIS 2017 validation sets, respectively, without fine-tuning or online training.
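As a point of reference for the J&F measure quoted above: in the DAVIS benchmark, J is the region similarity (intersection over union of the predicted and ground-truth masks), F is a contour-accuracy F-score, and J&F is their mean. The sketch below computes only J for binary masks given as nested lists. The function name and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; this is not the paper's evaluation code.

```python
def region_similarity(pred, gt):
    """Jaccard index (IoU) between two binary masks of equal shape."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


pred = [[0, 1, 1],
        [0, 1, 0]]
gt = [[0, 1, 0],
      [0, 1, 1]]
print(region_similarity(pred, gt))  # 2 overlapping pixels / 4 in the union
```

A per-sequence score would typically average this value over all annotated frames and objects.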
University of Stuttgart educators have updated three computer science courses to incorporate forward-compatible OpenGL. To help students, they developed an educational framework that abstracts some of modern OpenGL's difficult aspects.
We introduce a new representation of the recently introduced perturbed Bernstein operators $B_n^M$ in terms of classical Bernstein operators. Using this new representation, we investigate the shape-preserving properties of the operator $B_n^M$. In particular, we prove that perturbed Bernstein operators preserve monotonicity and convexity of functions in certain cases. On the other hand, we demonstrate with counterexamples that the monotonicity- and convexity-preserving properties fail in other cases. Moreover, we present some weaker results for these cases.
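For context, the classical Bernstein operator referred to above is $B_n f(x) = \sum_{k=0}^{n} f(k/n) \binom{n}{k} x^k (1-x)^{n-k}$ on $[0,1]$, and it is known to preserve monotonicity and convexity. The sketch below evaluates only this classical operator and checks monotonicity numerically on a grid. The perturbed operator is not defined in the abstract, so it is not implemented here; the function names and the test function are illustrative assumptions.

```python
from math import comb


def bernstein(f, n, x):
    """Classical Bernstein operator B_n f evaluated at x in [0, 1]."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x) ** (n - k)
               for k in range(n + 1))


def is_nondecreasing(values, tol=1e-12):
    """True if consecutive values never drop by more than tol."""
    return all(b >= a - tol for a, b in zip(values, values[1:]))


f = lambda t: t ** 3  # increasing and convex on [0, 1]
grid = [i / 50 for i in range(51)]
approx = [bernstein(f, 10, x) for x in grid]
print(is_nondecreasing(approx))
```

A grid check like this can only suggest a shape property or expose a counterexample; it does not replace a proof.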
IEEE Computer Graphics and Applications began publishing "Applications" as a regular department under its present editor 30 years ago, in November 1994, with the goal of featuring interesting examples of using computer graphics to solve real-world problems. The Applications department has appeared in every issue since, making the present article the 181st such article to appear. To mark this occasion, the Applications department takes a look back by revisiting the most cited articles that have appeared since the department's inception.
This study explored the utilization of digital technology in public art design for urban landscapes, focusing on CAD and computer graphics to create dynamic and interactive installations. The research presented a design approach aligned with contemporary design practices by summarizing the attributes of public art and emphasizing the use of curve constraint forms. Advanced parametric design techniques significantly reduced the development time of 3D patterns from 30 h to just 10 h, demonstrating enhanced efficacy and precision. Innovative curve sampling techniques, including iso-parametric and iso-arc-length methods, resulted in a 25% improvement in the smoothness and continuity of design curves compared to traditional techniques. The public art installations developed in this study were designed to reflect and celebrate the cultural and historical aspects of urban areas, fostering a stronger connection between residents and their environment. Surveys conducted with 200 participants revealed that 85% felt a deeper connection to their cultural heritage due to the installations. The interactive nature of these art pieces engaged the public through visual, tactile, and behavioral interactions, significantly enhancing the aesthetic and functional quality of urban spaces. Despite these successes, the study identified several limitations, including a scarcity of authoritative literature and a lack of comprehensive case analyses for large-scale implementation. These gaps suggest the need for further investigation to explore the long-term feasibility and sustainability of digital public art installations. Future studies should investigate the durability, maintenance costs, and environmental impact of these installations, as well as promote interdisciplinary collaboration to develop new materials and technologies. Overall, this research highlighted the transformative potential of digital technology in public art design, offering practical insights and techniques for creating
Most existing multi-view stereo (MVS) methods fail to consider global context information in the feature extraction and cost aggregation stages. As transformers have shown remarkable performance on various vision tasks due to their ability to perceive global contextual information, this paper proposes a transformer-based feature enhancement network (TF-MVSNet) to facilitate feature representation learning by combining local features (both 2D and 3D) with long-range contextual information. To reduce the memory consumption of feature matching, the cross-attention mechanism is leveraged to efficiently construct 3D cost volumes under the epipolar constraint. Additionally, a color-guided network is designed to refine depth maps at the coarse stage, thereby reducing incorrect depth predictions at the fine stage. Extensive experiments were performed on the DTU dataset and the Tanks and Temples (T&T) benchmark, and the results are reported. (1) We introduce Vision Transformers to enhance both 2D feature representations and global 3D spatial information aggregation. (2) We design a color-guided network to refine depth maps. (3) The proposed method achieves competitive performance on both the DTU dataset and the Tanks and Temples benchmark.
Image denoising is an important research direction in computer vision. We treat the image denoising task as the problem of estimating a filtering policy related to image features, which differs from end-to-end image mapping. Commonly used simple filters such as Gaussian filtering and bilateral filtering have fixed global denoising policies. However, the denoising policies of different filters can only adapt to limited image features. To solve this problem, we propose a method that applies different filters to different spatial ranges and adjusts the parameters of these filters simultaneously. Since not all filters can be easily transformed into differentiable forms and it is difficult to obtain paired datasets of filter action areas, we use reinforcement learning (RL) methods to estimate the spatial-domain action range and the adjustable parameters of the filters, respectively. Furthermore, for removing higher-intensity noise, simple filters can iteratively approximate higher-order denoising policies and obtain more accurate and stable denoising results as the number of iteration steps increases. Experimental results show that our proposed method can not only generate intuitive and interpretable denoising policies but also achieve visual effects and computational efficiency comparable to or better than those of baseline methods. (1) Selection of filter parameters for image denoising using reinforcement learning. (2) Increased solution space range compared to using discrete action space. (3) The use of filters improves the interpretability of the denoising process. (4) Show the range of filters and the parameter selection for each ***
Approximate convex decomposition simplifies complex shapes into manageable convex components. In this work, we propose a novel surface-based method that achieves efficient computation times and sufficiently convex results while avoiding overapproximation of the input model. We start the approximation with mesh simplification. Then we iterate over the surface polygons of the mesh and divide them into convex groups. We use planar and angular equations to determine which neighboring polygons are suitable for inclusion in a convex group. To ensure that our method outputs a sufficient result for a wide range of input shapes, we run multiple iterations of our algorithm with varying planar thresholds and mesh simplification levels. For each level of simplification, we find the planar threshold that leads to the decomposition with the fewest pieces while remaining under a certain concavity threshold. Subsequently, we find the simplification level that yields the decomposition with the least concavity, and output that decomposition as our result. We present experimental results that show the stability of our method and compare our work with two convex decomposition algorithms, discussing the shortcomings and advantages of the proposed method. Notably, our main advantage is time efficiency: we produce output faster than our competitors, which, however, outperform our results for some models in terms of accuracy.
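One way to picture the planar test mentioned above: a neighboring polygon can join a convex group only if its vertices do not rise above the supporting planes of the polygons already in the group, up to a tolerance. The sketch below is an assumption about what such a check could look like, not the authors' formulation. The names `signed_distance`, `can_join`, and `planar_threshold`, and the sign convention (outward unit normals, positive distance meaning "outside"), are hypothetical.

```python
def signed_distance(point, plane_point, unit_normal):
    """Signed distance from point to the plane through plane_point."""
    return sum((p - q) * n for p, q, n in zip(point, plane_point, unit_normal))


def can_join(candidate_vertices, group_faces, planar_threshold=1e-3):
    """Return True if every candidate vertex lies on or behind every
    face plane of the group, within planar_threshold.

    group_faces is a list of (plane_point, outward_unit_normal) pairs.
    """
    for plane_point, normal in group_faces:
        for v in candidate_vertices:
            if signed_distance(v, plane_point, normal) > planar_threshold:
                return False
    return True


# Group: one face in the plane z = 0 with outward normal +z.
group = [((0.0, 0.0, 0.0), (0.0, 0.0, 1.0))]
folds_down = [(1.0, 0.0, 0.0), (2.0, 0.0, -0.5), (1.0, 1.0, 0.0)]
folds_up = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.5), (1.0, 1.0, 0.0)]
print(can_join(folds_down, group), can_join(folds_up, group))
```

A complete check would also test the group's vertices against the candidate polygon's own plane. Under this sketch's convention, loosening `planar_threshold` admits slightly concave neighbors, which is consistent with the trade-off between piece count and concavity that the abstract describes.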
How we perceive and experience the world around us is inherently multisensory. Most of the Virtual Reality (VR) literature is based on the senses of sight and hearing. However, there is considerable potential for integrating additional stimuli into Virtual Environments (VEs), especially in a training context. Identifying the stimuli needed to obtain a virtual experience that is perceptually equivalent to a real experience will lead users to behave the same way across environments, which adds substantial value for several training areas, such as firefighting. In this article, we present an experiment that assesses the impact of different sensory stimuli on the stress, fatigue, cybersickness, Presence, and knowledge transfer of users during a firefighter training VE. The results suggested that the conditions that significantly impacted the users' responses were wearing a firefighter's uniform and combining all sensory stimuli under study: heat, weight, uniform, and mask. The results also showed that the VE did not induce cybersickness and that it was successful in transferring knowledge.
Humans constantly interact with objects to accomplish tasks. To understand such interactions, computers need to reconstruct them in 3D from images of whole bodies manipulating objects, e.g., grasping, moving, and using them. This involves key challenges, such as occlusion between the body and objects, motion blur, depth ambiguities, and the low image resolution of hands and graspable object parts. To make the problem tractable, the community has followed a divide-and-conquer approach, focusing either only on interacting hands, ignoring the body, or on interacting bodies, ignoring the hands. However, these are only parts of the problem. In contrast, recent work focuses on the whole problem. The GRAB dataset addresses whole-body interaction with dexterous hands but captures motion via markers and lacks video, while the BEHAVE dataset captures video of body-object interaction but lacks hand detail. We address the limitations of prior work with InterCap, a novel method that reconstructs interacting whole bodies and objects from multi-view RGB-D data, using the parametric whole-body SMPL-X model and known object meshes. To tackle the above challenges, InterCap uses two key observations: (i) Contact between the body and object can be used to improve the pose estimation of both. (ii) Consumer-level Azure Kinect cameras let us set up a simple and flexible multi-view RGB-D system for reducing occlusions, with spatially calibrated and temporally synchronized cameras. With our InterCap method we capture the InterCap dataset, which contains 10 subjects (5 males and 5 females) interacting with 10 daily objects of various sizes and affordances, including contact with the hands or feet. To this end, we introduce a new data-driven hand motion prior and explore simple ways to detect contact automatically based on 2D and 3D cues. In total, InterCap has 223 RGB-D videos, resulting in 67,357 multi-view frames, each containing 6 RGB-D images, paired with pseudo gro