In recent years, with the development of ultrasonic sensing technology, ultrasound has come to be considered a promising alternative to surface electromyography for the recognition of human hand movements. However, the existin...
Modern consumer cameras commonly employ the rolling shutter (RS) imaging mechanism, via which images are captured by scanning scenes row-by-row, resulting in RS distortion for dynamic scenes. To correct RS distortion,...
In this work, we focus on the task of procedure planning from instructional videos with text supervision, where a model aims to predict an action sequence that transforms the initial visual state into the goal visual state. A critical challenge of this task is the large semantic gap between observed visual states and unobserved intermediate actions, which previous works ignore. Specifically, the contents of the observed visual states are semantically different from the elements of some action text labels in a procedure. To bridge this gap, we propose a novel event-guided paradigm, which first infers events from the observed states and then plans actions based on both the states and the predicted events. Our inspiration is that planning a procedure from an instructional video amounts to completing a specific event, and a specific event usually involves specific actions. Based on the proposed paradigm, we contribute an Event-guided Prompting-based Procedure Planning (E3P) model, which encodes event information into the sequential modeling process to support procedure planning. To further exploit the strong action associations within each event, E3P adopts a mask-and-predict approach for relation mining, incorporating a probabilistic masking scheme for regularization. Extensive experiments on three datasets demonstrate the effectiveness of the proposed model.
In recent years, with the rapid development of deep learning technologies, some neural network models have been applied to generate fake ***, a deep learning based forgery technology, can tamper with the face easily and generate fake videos that are difficult to be distinguished by human *** spread of face manipulation videos is very easy to bring fake ***, it is important to develop effective detection methods to verify the authenticity of the *** to that it is still challenging for current forgery technologies to generate all facial details and the blending operations are used in the forgery process, the texture details of the fake face are ***, in this paper, a new method is proposed to detect DeepFake ***, the texture features are constructed, which are based on the gradient domain, standard deviation, gray level co-occurrence matrix and wavelet transform of the face ***, the features are processed by a feature selection method to form a discriminant feature vector, which is finally fed to an SVM for classification at the frame *** experimental results on the mainstream DeepFake datasets show that the proposed method achieves strong performance, demonstrating its effectiveness for DeepFake video detection.
Vision and language navigation is a task that requires an agent to navigate according to a natural language instruction. Recent methods predict sub-goals on a constructed topology map at each step to enable long-term ac...
This paper presents RoGSplat, a novel approach for synthesizing high-fidelity novel views of unseen humans from sparse multi-view images, while requiring no cumbersome per-subject optimization. Unlike previous methods ...
Video domain generalization aims to learn generalizable video classification models for unseen target domains by training in a source domain. A critical challenge of video domain generalization is to defend against th...
It is well known that weakly supervised semantic segmentation requires only image-level labels for training, which greatly reduces the annotation cost. In recent years, prototype-based approaches, which prove to subst...
At present, the scale of real quantum computers is still small, and quantum simulation has become one of the important approaches to quantum theory research. The Grover quantum search algorithm is suitable for the ...
Class incremental learning (CIL) aims to recognize both old and new classes across incremental tasks. Deep neural networks in CIL suffer from catastrophic forgetting and some approaches rely on saving exemplars f...