Search Results - Inner Mongolia University Library

SelfDRSC++: Self-Supervised Learning for Dual Reversed Rolling Shutter Correction

arXiv, 2024

Authors: Shang, Wei; Ren, Dongwei; Zhang, Wanying; Wang, Qilong; Zhu, Pengfei; Zuo, Wangmeng. The Faculty of Computing, Harbin Institute of Technology, Harbin, China; The Tianjin Key Laboratory of Machine Learning, College of Intelligence and Computing, Tianjin University, Tianjin, China

Modern consumer cameras commonly employ the rolling shutter (RS) imaging mechanism, in which images are captured by scanning scenes row by row, resulting in RS distortion for dynamic scenes. To correct RS distortion, existing methods rely on fully supervised learning that requires high-framerate global shutter (GS) images as ground truth for supervision. In this paper, we propose an enhanced Self-supervised learning framework for Dual reversed RS distortion Correction (SelfDRSC++). First, we introduce a lightweight DRSC network that incorporates a bidirectional correlation matching block to refine the joint optimization of optical flows and corrected RS features, thereby improving correction performance while reducing network parameters. Then, to effectively train the DRSC network, we propose a self-supervised learning strategy that ensures cycle consistency between input and reconstructed dual reversed RS images. The RS reconstruction in SelfDRSC++ can be formulated as a specialized instance of video frame interpolation, where each row in the reconstructed RS images is interpolated from predicted GS images by utilizing RS distortion time maps. By achieving superior performance while simplifying the training process, SelfDRSC++ enables feasible one-stage self-supervised training. In addition to the start and end RS scanning times, SelfDRSC++ allows supervision of GS images at arbitrary intermediate scanning times, thus enabling the learned DRSC network to generate high-framerate GS videos. On a synthetic dataset, SelfDRSC++ achieves better or comparable quantitative metrics compared with state-of-the-art methods trained with full supervision. SelfDRSC++ can produce high-framerate GS videos with finer correction textures and better temporal consistency when dealing with real-world RS cases. The code and trained models are available at https://***/shangwei5/SelfDRSC_plusplus. Copyright © 2024, The Authors. All righ

Keywords: Self-supervised learning
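
The row-wise RS reconstruction mentioned in the abstract above can be illustrated with a toy sketch. It assumes plain linear blending between two predicted GS frames, which is a stand-in for the video frame interpolation the abstract refers to and is not the authors' implementation. The names row_time_map and synthesize_rs are hypothetical.

```python
def row_time_map(height, reverse=False):
    """Per-row scanning time in [0, 1] (assumed linear in the row index).

    reverse=False models top-to-bottom scanning, reverse=True models the
    bottom-to-top scan of the second (reversed) RS image.
    """
    if height == 1:
        return [0.0]
    times = [r / (height - 1) for r in range(height)]
    return times[::-1] if reverse else times


def synthesize_rs(gs_start, gs_end, reverse=False):
    """Build an RS image whose row r is sampled at its own scanning time.

    gs_start and gs_end are GS frames (lists of rows) at times 0 and 1.
    Linear blending replaces a learned interpolator here (an assumption).
    """
    times = row_time_map(len(gs_start), reverse)
    return [
        [(1 - t) * a + t * b for a, b in zip(row0, row1)]
        for t, row0, row1 in zip(times, gs_start, gs_end)
    ]
```

In a cycle-consistency setup of the kind the abstract describes, the two synthesized images (reverse=False and reverse=True) would be compared against the two input RS images; the loss itself is omitted from this sketch.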

Event-Guided Procedure Planning from Instructional Videos with Text Supervision

International Conference on Computer Vision (ICCV)

Authors: An-Lan Wang, Kun-Yu Lin, Jia-Run Du, Jingke Meng, Wei-Shi Zheng. School of Computer Science and Engineering, Sun Yat-sen University, China; Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China

In this work, we focus on the task of procedure planning from instructional videos with text supervision, where a model aims to predict an action sequence that transforms the initial visual state into the goal visual state. A critical challenge of this task is the large semantic gap between observed visual states and unobserved intermediate actions, which is ignored by previous works. Specifically, this semantic gap means that the contents of the observed visual states are semantically different from the elements of some action text labels in a procedure. To bridge this semantic gap, we propose a novel event-guided paradigm, which first infers events from the observed states and then plans out actions based on both the states and the predicted events. Our inspiration is that planning a procedure from an instructional video amounts to completing a specific event, and a specific event usually involves specific actions. Based on the proposed paradigm, we contribute an Event-guided Prompting-based Procedure Planning (E3P) model, which encodes event information into the sequential modeling process to support procedure planning. To further consider the strong action associations within each event, our E3P adopts a mask-and-predict approach for relation mining, incorporating a probabilistic masking scheme for regularization. Extensive experiments on three datasets demonstrate the effectiveness of our proposed model.

Keywords:

DeepFake Videos Detection Based on Texture Features

Computers, Materials & Continua, 2021, Vol. 68, No. 7, pp. 1375-1388

Authors: Bozhi Xu, Jiarui Liu, Jifan Liang, Wei Lu, Yue Zhang. School of Computer Science and Engineering, Guangdong Province Key Laboratory of Information Security Technology, Ministry of Education Key Laboratory of Machine Intelligence and Advanced Computing, Sun Yat-sen University, Guangzhou 510006, China; Department of Computer Science, University of Massachusetts Lowell, Lowell 01854, MA, USA

In recent years, with the rapid development of deep learning technologies, some neural network models have been applied to generate fake ***, a deep learning based forgery technology, can tamper with the face easily and generate fake videos that are difficult to be distinguished by human *** spread of face manipulation videos is very easy to bring fake ***, it is important to develop effective detection methods to verify the authenticity of the *** to that it is still challenging for current forgery technologies to generate all facial details and the blending operations are used in the forgery process, the texture details of the fake face are ***, in this paper, a new method is proposed to detect DeepFake ***, the texture features are constructed, which are based on the gradient domain, standard deviation, gray level co-occurrence matrix and wavelet transform of the face ***, the features are processed by the feature selection method to form a discriminant feature vector, which is finally employed to SVM for classification at the frame *** experimental results on the mainstream DeepFake datasets demonstrate that the proposed method can achieve ideal performance, proving the effectiveness of the proposed method for DeepFake videos detection.

Keywords: DeepFake; video tampering; tampering detection; texture feature
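
One of the texture descriptors named in the abstract above, the gray level co-occurrence matrix (GLCM), can be sketched in a few lines. This is a generic textbook GLCM with a contrast statistic, not the authors' feature pipeline. The function names, the single pixel offset, and the choice of contrast as the statistic are assumptions made for illustration.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized gray level co-occurrence matrix for one pixel offset.

    image is a list of rows of integer gray levels in [0, levels).
    Entry [i][j] is the fraction of pixel pairs (p, p + offset) whose
    gray levels are (i, j).
    """
    height, width = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                counts[image[y][x]][image[ny][nx]] += 1
                total += 1
    if total == 0:
        raise ValueError("offset leaves no valid pixel pairs")
    return [[c / total for c in row] for row in counts]


def glcm_contrast(p):
    """Contrast statistic: large when neighboring gray levels differ."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))
```

A statistic of this kind, computed on the face region of each frame, would form one entry of the feature vector that the abstract says is passed through feature selection and then to an SVM; those later stages are not shown.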

PRET: Planning with Directed Fidelity Trajectory for Vision and Language Navigation

arXiv, 2024

Authors: Lu, Renjie; Meng, Jingke; Zheng, Wei-Shi. School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; Peng Cheng Laboratory, Shenzhen, China; Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, Guangzhou, China

Vision and language navigation is a task that requires an agent to navigate according to a natural language instruction. Recent methods predict sub-goals on constructed topology map at each step to enable long-term action planning. However, they suffer from high computational cost when attempting to support such high-level predictions with GCN-like models. In this work, we propose an alternative method that facilitates navigation planning by considering the alignment between instructions and directed fidelity trajectories, which refers to a path from the initial node to the candidate locations on a directed graph without detours. This planning strategy leads to an efficient model while achieving strong performance. Specifically, we introduce a directed graph to illustrate the explored area of the environment, emphasizing directionality. Then, we firstly define the trajectory representation as a sequence of directed edge features, which are extracted from the panorama based on the corresponding orientation. Ultimately, we assess and compare the alignment between instruction and different trajectories during navigation to determine the next navigation target. Our method outperforms previous SOTA method BEVBert on RxR dataset and is comparable on R2R dataset while largely reducing the computational cost. Code is available: https://***/iSEE-laboratory/VLN-PRET. Copyright © 2024, The Authors. All rights reserved.

Keywords: Navigation

RoGSplat: Learning Robust Generalizable Human Gaussian Splatting from Sparse Multi-View Images

arXiv, 2025

Authors: Xiao, Junjin; Zhang, Qing; Nie, Yonewei; Zhu, Lei; Zheng, Wei-Shi. School of Computer Science and Engineering, Sun Yat-sen University, China; South China University of Technology, China; China; Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China

This paper presents RoGSplat, a novel approach for synthesizing high-fidelity novel views of unseen humans from sparse multi-view images, while requiring no cumbersome per-subject optimization. Unlike previous methods that typically struggle with sparse views that overlap little and are less effective in reconstructing complex human geometry, the proposed method enables robust reconstruction in such challenging conditions. Our key idea is to lift SMPL vertices to dense and reliable 3D prior points representing accurate human body geometry, and then regress human Gaussian parameters based on the points. To account for possible misalignment between the SMPL model and the images, we propose to predict image-aligned 3D prior points by leveraging both pixel-level features and voxel-level features, from which we regress the coarse Gaussians. To enhance the ability to capture high-frequency details, we further render depth maps from the coarse 3D Gaussians to help regress fine-grained pixel-wise Gaussians. Experiments on several benchmark datasets demonstrate that our method outperforms state-of-the-art methods in novel view synthesis and cross-dataset generalization. Our code is available at https://***/iSEE-laboratory/RoGSplat. Copyright © 2025, The Authors. All rights reserved.

Keywords: Pixels

Diversifying Spatial-Temporal Perception for Video Domain Generalization

arXiv, 2023

Authors: Lin, Kun-Yu; Du, Jia-Run; Gao, Yipeng; Zhou, Jiaming; Zheng, Wei-Shi. School of Computer Science and Engineering, Sun Yat-sen University, China; Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China

Video domain generalization aims to learn generalizable video classification models for unseen target domains by training in a source domain. A critical challenge of video domain generalization is to defend against the heavy reliance on domain-specific cues extracted from the source domain when recognizing target videos. To this end, we propose to perceive diverse spatial-temporal cues in videos, aiming to discover potential domain-invariant cues in addition to domain-specific cues. We contribute a novel model named Spatial-Temporal Diversification Network (STDN), which improves the diversity from both space and time dimensions of video data. First, our STDN proposes to discover various types of spatial cues within individual frames by spatial grouping. Then, our STDN proposes to explicitly model spatial-temporal dependencies between video contents at multiple space-time scales by spatial-temporal relation modeling. Extensive experiments on three benchmarks of different types demonstrate the effectiveness and versatility of our approach. © 2023, CC BY-SA.

Keywords: Video recording

Region-Specific Prototype Customization for Weakly Supervised Semantic Segmentation

26th European Conference on Artificial Intelligence, ECAI 2023

Authors: Yu, Ruiguo; Zhao, Yihang; Yu, Mei; Gao, Jie; Wang, Chenhan; Zhang, Ruixuan; Li, Xuewei. College of Intelligence and Computing, Tianjin University, Tianjin 300350, China; Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin 300350, China; Tianjin Key Laboratory of Advanced Networking, Tianjin 300350, China; School of Future Technology, Tianjin University, Tianjin 300350, China; It Co. Ltd., Tianjin 300456, China

ISBN: (print) 9781643684369

It is well known that weakly supervised semantic segmentation requires only image-level labels for training, which greatly reduces the annotation cost. In recent years, prototype-based approaches, which prove to substantially improve the segmentation performance, have been favored by a wide range of researchers. However, we are surprised to find that there are semantic gaps between different regions within the same object, hindering the optimization of prototypes, so the traditional prototypes can not adequately represent the entire object. Therefore, we propose region-specific prototypes to adaptively describe the regions themselves, which alleviate the effect of semantic gap by separately obtaining prototypes for different regions of an object. In addition, to obtain more representative region-specific prototypes, a plug-and-play Spatially Fused Attention Module is proposed for combining the spatial correlation and the scale correlation of hierarchical features. Extensive experiments are conducted on PASCAL VOC 2012 and MS COCO 2014, and the results show that our method achieves state-of-the-art performance using only image-level labels. © 2023 The Authors.

Keywords: Semantic Segmentation

Circuit Simulation and Optimization of Quantum Search Algorithm

6th International Conference on Electronic Information Technology and Computer Engineering, EITCE 2022

Authors: Liu, Xiaonan; Zhao, Chenyan; Xie, Haoshan; Liu, Zhengyu. State Key Laboratory of Mathematical Engineering and Advanced Computing, Information Engineering University, China; School of Computer and Artificial Intelligence, Zhengzhou University, China

ISBN: (print) 9781450397148

At present, the scale of real quantum computers is still small, and quantum simulation has become an important tool for quantum theory research. The Grover quantum search algorithm is suited to searching unordered databases. First, based on the implementation principle of the Grover algorithm and Boolean logic relationships, the design of a multi-target Oracle is analyzed, and the quantum circuit of the Grover algorithm with multiple target items is simulated on the IBMQ quantum cloud platform. Then, based on the characteristics of the Grover algorithm and the simulation process of quantum gates, the actions of multiple identical quantum gates are combined to reduce the number of probability-amplitude updates and improve simulation efficiency. Experiments with the libquantum quantum simulator successfully find the target item, which demonstrates the feasibility of the optimization method and provides a reference for the simulation and optimization of other quantum algorithms. © 2022 Association for Computing Machinery.

Keywords: Timing circuits
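
The Grover search with multiple target items described in the abstract above can be sketched as a direct update of the probability amplitudes. This is a generic state-vector simulation, not the paper's optimized method, and it uses neither IBMQ nor libquantum. The function name grover_probabilities and the default iteration count are assumptions.

```python
import math


def grover_probabilities(n_qubits, targets, iterations=None):
    """Simulate Grover search and return measurement probabilities.

    targets is an iterable of marked basis-state indices. When iterations
    is None, the usual estimate floor(pi/4 * sqrt(N / M)) is used, with
    N = 2**n_qubits states and M marked states (at least one iteration).
    """
    n_states = 2 ** n_qubits
    marked = set(targets)
    if iterations is None:
        iterations = max(1, math.floor(math.pi / 4 * math.sqrt(n_states / len(marked))))
    # Uniform superposition over all basis states.
    amp = [1 / math.sqrt(n_states)] * n_states
    for _ in range(iterations):
        # Oracle: flip the sign of every marked amplitude.
        amp = [-a if i in marked else a for i, a in enumerate(amp)]
        # Diffusion: reflect every amplitude about the mean.
        mean = sum(amp) / n_states
        amp = [2 * mean - a for a in amp]
    return [a * a for a in amp]
```

For example, grover_probabilities(4, {3, 12}) runs two iterations and concentrates most of the probability on states 3 and 12. Each iteration here touches every amplitude twice, which is the kind of update cost the abstract's gate-merging optimization aims to reduce.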

Rotation Augmented Distillation for Exemplar-Free Class Incremental Learning with Detailed Analysis

arXiv, 2023

Authors: Chen, Xiuwei; Chang, Xiaobin. School of Artificial Intelligence, Sun Yat-sen University, China; Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou 510006, China; Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China

Class incremental learning (CIL) aims to recognize both the old and new classes along the increment tasks. Deep neural networks in CIL suffer from catastrophic forgetting and some approaches rely on saving exemplars from previous tasks, known as the exemplar-based setting, to alleviate this problem. On the contrary, this paper focuses on the Exemplar-Free setting with no old class sample preserved. Balancing the plasticity and stability in deep feature learning with only supervision from new classes is more challenging. Most existing Exemplar-Free CIL methods report the overall performance only and lack further analysis. In this work, different methods are examined with complementary metrics in greater detail. Moreover, we propose a simple CIL method, Rotation Augmented Distillation (RAD), which achieves one of the top-tier performances under the Exemplar-Free setting. Detailed analysis shows our RAD benefits from the superior balance between plasticity and stability. Finally, more challenging exemplar-free settings with fewer initial classes are undertaken for further demonstrations and comparisons among the state-of-the-art methods. Copyright © 2023, The Authors. All rights reserved.

Keywords: Distillation