Fine-grained visual parsing, including fine-grained part segmentation and fine-grained object recognition, has attracted considerable attention due to its importance in many real-world applications, e.g., agriculture, remote sensing, and space technologies. Most research efforts tackle these fine-grained sub-tasks under different paradigms and neglect the inherent relations between them. Because most of this research remains fragmented, we conduct an in-depth study of advanced work from a new perspective: learning the part relationship. From this perspective, we first consolidate recent research and benchmark syntheses with new taxonomies. Based on this consolidation, we revisit the universal challenges in fine-grained part segmentation and recognition tasks and propose new solutions to these challenges through part relationship learning. Finally, we outline several promising directions for future research in fine-grained visual parsing.
Drogue detection is one of the challenging tasks in autonomous aerial refueling due to the requirement for accuracy and *** detection based on image intrinsic cues can achieve fast detection, but with poor *** studies reveal that optimization-based methods provide accurate and quick solutions for saliency *** paper presents a hybrid pigeon-inspired optimization method, the optimized color opponent, that aims to adjust the weight of color opponent channels to detect the drogue *** can optimize the weights in the selected aerial refueling scene offline, and the results are applied for drogue detection in the scene. A novel algorithm aggregated by the optimized color opponent and robust background detection is presented to provide better precision and *** results on benchmark datasets and aerial refueling images show that the proposed method successfully extracts the saliency region or drogue and exhibits superior performance against the other saliency detection methods with intrinsic *** algorithm designed in this paper is competent for the drogue detection task of autonomous aerial refueling.
It is a challenging task to create realistic 3D avatars that accurately replicate individuals' speech and unique talking styles for speech-driven facial animation. Existing techniques have made remarkable progress but still struggle to achieve lifelike mimicry. This paper proposes “TalkingStyle”, a novel method to generate personalized talking avatars while retaining the talking style of the person. Our approach uses a set of audio and animation samples from an individual to create new facial animations that closely resemble their specific talking style, synchronized with speech. We disentangle the style codes from the motion patterns, allowing our method to associate a distinct identifier with each person. To manage each aspect effectively, we employ three separate encoders for style, speech, and motion, ensuring the preservation of the original style while maintaining consistent motion in our stylized talking avatars. Additionally, we propose a new style-conditioned transformer decoder, offering greater flexibility and control over the facial avatar styles. We comprehensively evaluate TalkingStyle through qualitative and quantitative assessments, as well as user studies demonstrating its superior realism and lip synchronization accuracy compared to current state-of-the-art methods. To promote transparency and further advancements in the field, we also make the source code publicly available at https://***/wangxuanx/TalkingStyle. IEEE
Time-triggered architecture, as a mainstream design for distributed real-time systems, has been successfully applied in the aerospace, automotive and mechanical ***, time-triggered scheduling is a challenging NP-hard *** are few studies that could quickly solve the scheduling problem of large distributed time-triggered *** solve this problem, a communication affinity parameter is defined in this paper to describe the degree of bias of the shaper task towards sending or receiving *** on this, an innovative task-message decoupling model named D-scheduler is built to reduce the computational complexity of the scheduling problem in large-scale ***, we provide mathematical proof that our model is a convex optimization that is easy to solve with existing computational *** experiments substantiate the efficacy of the *** dramatically reduces the scheduling complexity of large-scale real-time systems with a small loss of solving space compared to the federal scheduler.
In recent years, deep learning techniques have been used to estimate gaze, a significant task in computer vision and human-computer *** studies have made significant achievements in predicting 2D or 3D gazes from monocular face *** study presents a deep neural network for 2D gaze estimation on mobile *** achieves state-of-the-art 2D gaze point regression error, while significantly improving gaze classification error on quadrant divisions of the *** this end, an efficient attention-based module that correlates and fuses the left and right eye contextual features is first proposed to improve gaze point regression ***, through a unified perspective for gaze estimation, metric learning for gaze classification on quadrant divisions is incorporated as additional ***, both gaze point regression and quadrant classification performances are *** experiments demonstrate that the proposed method outperforms existing gaze-estimation methods on the GazeCapture and MPIIFaceGaze datasets.
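To make "gaze classification on quadrant divisions" concrete, the sketch below shows one plausible way to turn a 2D gaze point into a quadrant label. It is an illustration only, not the paper's implementation: the function name `gaze_quadrant`, the top-left pixel origin, the label numbering, and the tie-breaking rule on the dividing lines are all assumptions.

```python
def gaze_quadrant(x, y, width, height):
    """Map a 2D gaze point to a screen-quadrant label.

    Assumed (hypothetical) conventions: origin at the top-left corner,
    x grows to the right, y grows downward, and labels are
    0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right.
    Points exactly on a dividing line go to the right/bottom quadrant.
    """
    col = 1 if x >= width / 2 else 0   # 0 = left half, 1 = right half
    row = 1 if y >= height / 2 else 0  # 0 = top half, 1 = bottom half
    return row * 2 + col
```

Labels of this kind could serve as classification targets alongside the regression target (the gaze point itself); how the two objectives are combined is not specified in the abstract.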
The collective behaviors of animals, from schooling fish to packing wolves and flocking birds, display plenty of fascinating phenomena that result from simple interaction rules among *** emergent intelligent properties of the animal collective behaviors, such as self-organization, robustness, adaptability and expansibility, have inspired the design of autonomous unmanned swarm *** article reviews several typical natural collective behaviors, introduces the origin and connotation of swarm intelligence, and gives the application case of animal collective *** this basis, the article focuses on the forefront of progress and bionic achievements of aerial, ground and marine robotics swarms, illustrating the mapping relationship from biological cooperative mechanisms to cooperative unmanned cluster ***, considering the significance of the coexisting-cooperative-cognitive human-machine system, the key technologies to be solved are given as the reference directions for the subsequent exploration.
Despite the progress made in few-shot video action recognition, existing methods still struggle to achieve satisfactory performance when support samples are limited (e.g., 1-shot task). This paper proposes to augment ...
Image and video stitching have made tremendous progress in constructing wide field-of-view (FOV) imagery. However, some long-standing challenges remain, including wide baselines between cameras, large parallaxes, and low texture in overlapping areas. The augmented virtual environment (AVE) captures videos as live textures of 3D models in a virtual environment, and provides an alternative 3D solution to these challenges. Existing AVE methods primarily rely on video projection, and cannot produce stitching results comparable to those of image stitching. In this paper, we propose a novel model-guided 3D stitching algorithm for AVE. The algorithm recovers an approximate 3D model for each video stream and optimizes the warping of the models to satisfy feature point matching between the 3D models of adjacent videos. Experimental results show that our method significantly improves stitching quality compared with previous state-of-the-art methods.
Depth information can benefit various computer vision tasks on both images and ***, depth maps may suffer from invalid values in many pixels, and also large *** improve such data, we propose a joint self-supervised and reference-guided learning approach for depth *** the self-supervised learning strategy, we introduce an improved spatial convolutional sparse coding module in which total variation regularization is employed to enhance the structural information while preserving edge *** module alternately learns a convolutional dictionary and sparse coding from a corrupted depth ***, both the learned convolutional dictionary and sparse coding are convolved to yield an initial depth map, which is effectively smoothed using local contextual *** reference-guided learning part is inspired by the fact that adjacent pixels with close colors in the RGB image tend to have similar depth *** thus construct a hierarchical joint bilateral filter module using the corresponding color image to fill in large *** summary, our approach integrates a convolutional sparse coding module to preserve local contextual information and a hierarchical joint bilateral filter module for filling using specific adjacent *** results show that the proposed approach works well for both invalid value restoration and large hole inpainting.
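The reference-guided idea, that neighboring pixels with similar colors tend to share similar depth, can be illustrated with a plain single-level joint bilateral filter. The sketch below is not the hierarchical module described in the abstract; the function name, the use of a grayscale guidance image instead of RGB, the `None` marker for invalid depth, and all parameter values are assumptions made for illustration.

```python
import math


def fill_holes_joint_bilateral(depth, guide, radius=2, sigma_s=2.0, sigma_c=10.0):
    """Fill invalid depth pixels using a guidance image.

    depth: list of rows; each entry is a float, or None for an invalid pixel.
    guide: list of rows of guidance intensities (same shape as depth).
    Each hole becomes a weighted average of the valid depths in its window,
    weighted by spatial closeness and by guidance similarity.
    """
    h, w = len(depth), len(depth[0])
    out = [row[:] for row in depth]  # valid pixels are copied unchanged
    for y in range(h):
        for x in range(w):
            if depth[y][x] is not None:
                continue
            num = den = 0.0
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    d = depth[ny][nx]
                    if d is None:
                        continue  # only valid neighbors contribute
                    dist2 = (ny - y) ** 2 + (nx - x) ** 2
                    diff2 = (guide[ny][nx] - guide[y][x]) ** 2
                    weight = (math.exp(-dist2 / (2 * sigma_s ** 2))
                              * math.exp(-diff2 / (2 * sigma_c ** 2)))
                    num += weight * d
                    den += weight
            if den > 0:
                out[y][x] = num / den  # otherwise the hole stays None
    return out
```

Because the guidance term suppresses neighbors that look different in the color image, a hole next to an object boundary is filled mainly from the side it visually belongs to. A hierarchical variant would presumably repeat this at several resolutions so that holes larger than the window can also be filled, but that detail is not given in the abstract.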
We propose an automatic image matting and fusing system for portrait synthesis in this *** firstly use a face detection algorithm to determine if the input contains a ***, we use a semantic segmentation neural network to generate a trimap and feed the trimap and the portrait into the neural network to predict the alpha channel ***, the input portrait’s background is replaced with the given background via an image synthesis algorithm to obtain the synthesized portrait.
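The abstract does not say which image synthesis algorithm performs the background replacement. A common choice once an alpha channel is available is the standard compositing equation, out = alpha * foreground + (1 - alpha) * background, applied per pixel. The sketch below assumes that equation; the function name and the list-of-tuples image representation are hypothetical.

```python
def composite(foreground, background, alpha):
    """Blend a portrait over a new background using an alpha matte.

    foreground, background: lists of rows of (r, g, b) tuples, same shape.
    alpha: list of rows of floats in [0, 1]; 1 keeps the portrait pixel,
    0 keeps the background pixel.
    """
    out = []
    for fg_row, bg_row, a_row in zip(foreground, background, alpha):
        row = []
        for fg, bg, a in zip(fg_row, bg_row, a_row):
            # Per-channel linear blend, rounded back to integer intensities.
            row.append(tuple(round(a * f + (1.0 - a) * b)
                             for f, b in zip(fg, bg)))
        out.append(row)
    return out
```

Fractional alpha values, which a matting network typically predicts along hair and other soft edges, produce a smooth transition between the portrait and the new background instead of a hard cut.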