Search Results - Inner Mongolia University Library

Forgery-Aware Adaptive Learning With Vision Transformer for Generalized Face Forgery Detection

IEEE Transactions on Circuits and Systems for Video Technology, 2025, vol. 35, no. 5, pp. 4116-4129

Authors: Luo, Anwei; Cai, Rizhao; Kong, Chenqi; Ju, Yakun; Kang, Xiangui; Huang, Jiwu; Kot, Alex C.
Affiliations: Sun Yat-sen University School of Computer Science and Engineering Guangzhou 510006 China; Jiangxi University of Finance and Economics School of Computing and Artificial Intelligence Nanchang 330013 China; Laboratory School of Electrical and Electronic Engineering Jurong West 639798 Singapore; Sun Yat-sen University Guangdong Key Laboratory of Information Security School of Computer Science and Engineering Guangzhou 510006 China; Shenzhen MSU-BIT University Guangdong Laboratory of Machine Perception and Intelligent Computing Faculty of Engineering Shenzhen 518116 China; China-Singapore International Joint Research Institute Guangzhou China

With the rapid progress of generative models, the current challenge in face forgery detection is how to effectively detect realistic manipulated faces from different unseen domains. Though previous studies show that pre-trained Vision Transformer (ViT) based models can achieve some promising results after full fine-tuning on the Deepfake dataset, their generalization performance is still unsatisfactory. To this end, we present a Forgery-aware Adaptive Vision Transformer (FA-ViT) under the adaptive learning paradigm for generalized face forgery detection, where the parameters in the pre-trained ViT are kept fixed while the designed adaptive modules are optimized to capture forgery features. Specifically, a global adaptive module is designed to model long-range interactions among input tokens, which takes advantage of the self-attention mechanism to mine global forgery clues. To further explore essential local forgery clues, a local adaptive module is proposed to expose local inconsistencies by enhancing the local contextual association. In addition, we introduce a fine-grained adaptive learning module that emphasizes the common compact representation of genuine faces through relationship learning in fine-grained pairs, driving these proposed adaptive modules to be aware of fine-grained forgery-aware information. Extensive experiments demonstrate that our FA-ViT achieves state-of-the-art results in the cross-dataset evaluation, and enhances the robustness against unseen perturbations. Particularly, FA-ViT achieves 93.83% and 78.32% AUC scores on Celeb-DF and DFDC datasets in the cross-dataset evaluation. © 1991-2012 IEEE.

Keywords: Forgery; Faces; Adaptation models; Transformers; Feature extraction; Adaptive learning; Computer vision; Face recognition; Deepfakes; Visualization

ThicknessVAE: Learning a Lateral Prior for Clothed Human Body Reconstruction

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025

Authors: Wu, Xiaotao; Fan, Zhaoxin; He, Huiguang; Shen, Dinggang
Affiliations: School of Biomedical Engineering State Key Laboratory of Advanced Medical Materials and Devices ShanghaiTech University Shanghai 201210 China; NeuBCI Group State Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology Institute of Automation Chinese Academy of Sciences Beijing China; Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing Institute of Artificial Intelligence Beihang University Beijing 100191 China; Beijing Academy of Blockchain and Edge Computing China; Shanghai United Imaging Intelligence Co. Ltd. Shanghai 200232 China; Shanghai Clinical Research and Trial Center Shanghai 201210 China

ISBN: (print) 9798350368741

Sandwich-like structures have shown remarkable efficacy in clothed human reconstruction. However, these approaches often generate unrealistic side geometries due to inadequate handling of lateral regions. This paper addresses this limitation by incorporating the side geometry of clothed humans as a prior. We propose ThicknessVAE, a novel two-stage method that makes two key contributions: (1) We learn a prototype from point clouds for the lateral regions of clothed humans to extract common and detailed geometric features. (2) We utilize this prototype as a prior to transform geometric features into a thickness map associated with clothed human images, enabling refined normal integration for sandwich-like reconstruction methods. By seamlessly integrating our model into the sandwich-like reconstruction pipeline, we achieve highly realistic side views. Both qualitative and quantitative experiments demonstrate that our approach is comparable to state-of-the-art methods in terms of side-view realism. © 2025 IEEE.

Keywords: 3D Human Reconstruction; Normal Integration; Prior Learning

Augmenting Short Enrollment Speech via Synthesis for Target Speaker Extraction

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025

Authors: Huang, Zikang; Lin, Jingru; Ge, Meng; Jiang, Yu; Wang, Xiaobao; Wang, Longbiao; Dang, Jianwu
Affiliations: Tianjin Key Laboratory of Cognitive Computing and Application College of Intelligence and Computing Tianjin University Tianjin China; Shenzhen China; Department of Electrical and Computer Engineering National University of Singapore Singapore; Co. Ltd. Tianjin China; Shenzhen Institute of Advanced Technology Chinese Academy of Sciences Shenzhen China

ISBN: (print) 9798350368741

A high-quality enrollment speech is crucial to target speaker extraction (TSE), since it provides essential cues for identifying the target speaker in the mixture. However, real applications usually only permit a short enrollment speech, e.g. a wakeup word for a mobile device, that provides limited cues. To address this issue, we propose an enrollment augmentation strategy that allows us to enrich the limited enrollment speech with massive text data through speech synthesis. By doing so, the extended enrollment speech contains enhanced speaker timbre and phonetic content which leads to better extraction quality. Furthermore, we propose a training data augmentation strategy to improve the model's robustness and generalization in short enrollment speech scenarios. Experiments on Libri2Mix demonstrate that our proposed strategies bring a significant improvement in extreme scenarios where only 0.5s and 1-word enrollment speech is provided. We also release our code at https://***/HuangZikang-TJU/Aug4TSE. © 2025 IEEE.

Keywords: continue speaking; data augmentation; short enrollment; speech synthesis; Target speaker extraction

Synthesis of interfacial electric field-enhanced CdS/CdxZn1-xS/ZnO ternary heterojunction by lye dissolution etching mechanism for photocatalytic H2 production and CO2 reduction

Journal of Materials Science & Technology, 2025, vol. 204, no. 1, pp. 152-165

Authors: Qi Li; Shengchao Yang; Yufan Huang; Yuwei Liang; Chunling Hu; Min Wang; Zhiyong Liu; Yanlong Tai; Jichang Liu; Yongsheng Li
Affiliations: School of Chemistry and Chemical Engineering Shihezi University/Key Laboratory of Green Process for Chemical Engineering/Key Laboratory for Chemical Materials of Xinjiang Uygur Autonomous Region/Engineering Center for Chemical Materials of Xinjiang Bingtuan Shihezi University Shihezi 832003 China; Shanghai Institute of Ceramics Chinese Academy of Sciences Shanghai 200050 China; School of Chemistry and Chemical Engineering Shihezi University/Key Laboratory of Green Process for Chemical Engineering/Key Laboratory for Chemical Materials of Xinjiang Uygur Autonomous Region/Engineering Center for Chemical Materials of Xinjiang Bingtuan Shihezi University Shihezi 832003 China; Key Laboratory of Human-Machine Intelligence-Synergy Systems of Chinese Academy of Sciences (CAS) Shenzhen Institutes of Advanced Technology Shenzhen 518055 China; School of Chemistry and Chemical Engineering Shihezi University/Key Laboratory of Green Process for Chemical Engineering/Key Laboratory for Chemical Materials of Xinjiang Uygur Autonomous Region/Engineering Center for Chemical Materials of Xinjiang Bingtuan Shihezi University Shihezi 832003 China; Lab of Low-Dimensional Materials Chemistry Key Laboratory for Ultrafine Materials of Ministry of Education School of Materials Science and Engineering East China University of Science and Technology Shanghai Shanghai 200237 China

The difficulty in fabricating a multifaceted composite heterojunction system based on CdxZn1-xS limits the enhancement of photocatalytic *** the present scrutiny, novel ZnO/CdxZn1-xS/CdS composite heterojunctions are successfully prepared by the alkaline dissolution etching *** internal electric field at the interface of Ⅰ-type and Z-scheme heterojunction improved the effective charge *** ZC 8 sample exhibits excellent photocatalytic performance and the H2 production efficiency is 15.67 mmol g⁻¹ h⁻¹ with good stability up to 82.9% in 24-hour *** performance of CH4 and CO capacity in the CO2RR process is 3.47 μmol g⁻¹ h⁻¹ and 23.5 μmol g⁻¹ h⁻¹, *** photogenerated accelerated charge transport is then examined in detail by in situ X-ray photoelectron spectroscopy (ISXPS) and density functional theory (DFT) *** work presents a new idea for the synthesis of CdxZn1-xS solid-solution-based materials and provides a solid reference for the detailed mechanism regarding the electric field at the heterojunction interface.

Keywords: Photocatalysis; Interface electric field; Composite heterostructure; Photocatalytic mechanism; CdxZn1-xS solid-solution

TightLLM: Maximizing Throughput for LLM Inference via Adaptive Offloading Policy

IEEE Transactions on Computers, 2025

Authors: Hu, Yitao; Liu, Xiulong; Yang, Guotao; Li, Linxuan; Zeng, Kai; Zhao, Zhixin; Chen, Sheng; Zhao, Laiping; Li, Wenxin; Li, Keqiu
Affiliations: Tianjin University Tianjin Key Laboratory of Advanced Networking Tianjin 300350 China; Tianjin University Department of Intelligence and Computing Tianjin 300350 China

Large language models (LLMs) have demonstrated remarkable performance across a wide range of tasks, largely due to their substantial model size. However, this also results in significant GPU memory demands during inference. To address these challenges on hardware with limited GPU memory, existing approaches employ offloading techniques that offload unused tensors to CPU memory, thereby reducing GPU memory usage. Since offloading involves data transfer between GPU and CPU, it introduces transfer overhead. To mitigate this, prior works typically overlap data transfer with GPU computation using a fixed pipelining strategy applied uniformly across all inference iterations, referred to as static offloading. However, static offloading policies fail to maximize inference throughput because they cannot adapt to the dynamically changing transfer overhead during the inference process, leading to increasing GPU idleness and reduced inference *** propose that offloading policies should be adaptive to the varying transfer overhead across inference iterations to maximize inference throughput. To this end, we design and implement an adaptive offloading-based inference system called TightLLM with two key innovations. First, its key-value (KV) distributor employs a trade-compute-for-transfer strategy to address growing transfer overhead by dynamically recomputing portions of the KV cache, effectively overlapping data transfer with computation and minimizing GPU idleness. Second, TightLLM’s weight loader slices model weights and distributes the loading process across multiple batches, amortizing the excessive weight loading overhead and significantly improving throughput. Evaluation across various combinations of GPU hardware and LLM models shows that TightLLM achieves 1.3 to 23 times higher throughput during the decoding phase and 1.2 to 22 times higher throughput in the prefill phase compared to state-of-the-art offloading systems. Due to the higher throughput in prefill

Keywords: Decoding

RoGSplat: Learning Robust Generalizable Human Gaussian Splatting from Sparse Multi-View Images

arXiv, 2025

Authors: Xiao, Junjin; Zhang, Qing; Nie, Yongwei; Zhu, Lei; Zheng, Wei-Shi
Affiliations: School of Computer Science and Engineering Sun Yat-sen University China; South China University of Technology China; China; Key Laboratory of Machine Intelligence and Advanced Computing Ministry of Education China

This paper presents RoGSplat, a novel approach for synthesizing high-fidelity novel views of unseen humans from sparse multi-view images, while requiring no cumbersome per-subject optimization. Unlike previous methods that typically struggle with sparse views that overlap little and are less effective in reconstructing complex human geometry, the proposed method enables robust reconstruction in such challenging conditions. Our key idea is to lift SMPL vertices to dense and reliable 3D prior points representing accurate human body geometry, and then regress human Gaussian parameters based on the points. To account for possible misalignment between the SMPL model and images, we propose to predict image-aligned 3D prior points by leveraging both pixel-level features and voxel-level features, from which we regress the coarse Gaussians. To enhance the ability to capture high-frequency details, we further render depth maps from the coarse 3D Gaussians to help regress fine-grained pixel-wise Gaussians. Experiments on several benchmark datasets demonstrate that our method outperforms state-of-the-art methods in novel view synthesis and cross-dataset generalization. Our code is available at https://***/iSEE-laboratory/RoGSplat. Copyright © 2025, The Authors. All rights reserved.

Keywords: Pixels

RL-Based USV Path Planning under the Marine Multimodal Features Considerations

IEEE Internet of Things Journal, 2025, vol. 12, no. 11, pp. 15274-15287

Authors: Lin, Quanbao; Gou, Huaxing; Tian, Peidong; Zuo, Tian-Yu; Zhang, Hanzhong; Wang, Xin; Sun, Poly Z. H.
Affiliations: Shanghai Jiao Tong University School of Aeronautics and Astronautics Shanghai 200240 China; Southeast University School of Computer Science and Engineering Nanjing 211189 China; Shanghai University School of Microelectronics Shanghai 200444 China; Chinese Academy of Sciences CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems Shenzhen Institute of Advanced Technology Guangdong Shenzhen 518055 China; Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems Guangdong Shenzhen 518055 China; East China Normal University School of Psychology and Cognitive Science Shanghai 200062 China; Shanghai Jiao Tong University Department of Industrial Engineering Shanghai 200240 China

Path planning is an important step in ensuring the safety of Unmanned Surface Vehicle (USV) navigation and executing missions quickly and efficiently. However, current USV path planning methods lack comprehensive consideration of electronic nautical charts and meteorological data, resulting in planned paths being unable to fully utilize marine environmental conditions, which may easily lead to collisions and long navigation times. Based on the above considerations, our study designs a USV path planning system that comprehensively considers the multimodal information from electronic nautical charts and meteorological data. The system consists of three parts: (1) image processing module, (2) meteorological analysis module, and (3) path planning module. In detail, the image processing module obtains the geographical feature information from the electronic chart and constructs a static obstacle environment. The meteorological analysis module obtains the meteorological feature information from meteorological data and constructs a dynamic meteorological vector field environment. The path planning module introduces a designed double deep Q-Network (DQN) structure, a multivariate weighted Dueling network, and a priority sampling mechanism to enhance the DQN algorithm for promising performance in USV path planning. Extensive experiments illustrate the superior performance of the proposed fusion DQN algorithm. Furthermore, the feasibility of the entire path planning system is confirmed. © 2014 IEEE.

Keywords: Unmanned surface vehicles
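
The abstract above names a double DQN structure and a Dueling network as building blocks. As a rough illustration only, the textbook forms of those two ideas can be sketched as follows. This is not the paper's multivariate weighted variant, it omits the priority sampling mechanism, and the function names and argument layout are hypothetical.

```python
def dueling_q(value, advantages):
    """Standard dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean(A(s, .)).

    Assumption: this is the textbook form, not the paper's weighted variant.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Standard Double DQN target for a single transition.

    The online network's Q-values pick the next action; the target
    network's Q-values score it, which reduces overestimation bias.
    """
    if done:
        # Terminal transition: no bootstrapped future value.
        return reward
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]
```

Here `q_online_next` and `q_target_next` stand for the per-action Q-values of the next state from the online and target networks; in a USV setting the actions might be heading changes, but that mapping is an assumption rather than something the abstract states.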

Cube Attacks Against Trivium, Kreyvium and ACORN with Practical Complexity

20th International Conference on Information Security and Cryptology, Inscrypt 2024

Authors: Chen, Yanqi; Li, Ting; Sun, Yao
Affiliations: Key Laboratory of Cyberspace Security Defense Institute of Information Engineering Chinese Academy of Sciences Beijing China; School of Cyber Security University of Chinese Academy of Sciences Beijing China; Laboratory for Advanced Computing and Intelligence Engineering Wuxi China

ISBN: (print) 9789819647330

The cube attack is a powerful cryptanalysis technique used against stream ciphers. It enables the retrieval of secret key information by computing the values of superpolys, with unknown secret key bits as variables. A practical key-recovery attack seeks to recover all key bits within a reasonable time complexity. The complexity of such attacks typically involves two main components: (1) the complexity of calculating the superpoly values under the real key, which depends on the size of the mother cube, and (2) the complexity of solving the superpoly system, where guess-and-determine techniques are often used, especially when the superpolys are nonlinear. In this paper, we improve the best-known practical key-recovery attack on Trivium by enhancing the techniques used in these two areas. First, we introduce a heuristic method to search for good mother cubes, enabling the recovery of many balanced superpolys. Second, we propose an efficient MILP-based model to search for a minimal number of guessed variables to solve the balanced superpoly systems, thus reducing the complexity of solving the system. With these advancements, we achieve key-recovery attacks on the 830- and 832-round Trivium within practical time complexity, surpassing the previous best result of 825 rounds. Additionally, we applied our new model to attack ACORN with 128-bit keys, achieving practical key-recovery attacks on the 507- and 611-round versions, compared to the previous highest of 477 rounds. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.

Keywords: Cube attack; key-recovery attack; Practical attack; Stream ciphers; Trivium
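
The first complexity component in the abstract above, computing a superpoly value under the real key, comes down to XOR-summing the cipher output over every assignment of the cube's IV bits. A minimal sketch on a made-up three-bit-key toy function follows. `toy_cipher` is hypothetical and has nothing to do with Trivium, Kreyvium, or ACORN, and the heuristic cube search and MILP model from the paper are not shown.

```python
from itertools import product


def toy_cipher(key, iv):
    """Hypothetical toy output bit over GF(2), for illustration only."""
    k0, k1, k2 = key
    v0, v1 = iv
    return (v0 & v1 & (k0 ^ k1)) ^ (v0 & k2) ^ v1 ^ (k0 & k1)


def cube_sum(cipher, key, cube_indices, iv_len):
    """XOR the cipher output over all assignments of the cube IV bits.

    IV bits outside the cube are fixed to 0. The result is the value of
    the cube's superpoly under the given key.
    """
    total = 0
    for bits in product((0, 1), repeat=len(cube_indices)):
        iv = [0] * iv_len
        for idx, bit in zip(cube_indices, bits):
            iv[idx] = bit
        total ^= cipher(key, iv)
    return total
```

For the cube {v0, v1}, every term not divisible by v0·v1 appears an even number of times and cancels, so `cube_sum(toy_cipher, key, [0, 1], 2)` equals `k0 ^ k1`, a linear superpoly that leaks one bit of key information. The loop runs 2^|cube| times, which is why the abstract ties this cost to the size of the mother cube.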

LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models

arXiv, 2025

Authors: Fu, Shenghao; Yang, Qize; Mo, Qijie; Yan, Junkai; Wei, Xihan; Meng, Jingke; Xie, Xiaohua; Zheng, Wei-Shi
Affiliations: School of Computer Science and Engineering Sun Yat-sen University China; Tongyi Lab Alibaba Group China; Peng Cheng Laboratory China; Key Laboratory of Machine Intelligence and Advanced Computing Ministry of Education China; Guangdong Province Key Laboratory of Information Security Technology China; China

Recent open-vocabulary detectors achieve promising performance with abundant region-level annotated data. In this work, we show that co-training an open-vocabulary detector with a large language model, by generating image-level detailed captions for each image, can further improve performance. To achieve the goal, we first collect a dataset, GroundingCap-1M, wherein each image is accompanied by associated grounding labels and an image-level detailed caption. With this dataset, we finetune an open-vocabulary detector with training objectives including a standard grounding loss and a caption generation loss. We take advantage of a large language model to generate both region-level short captions for each region of interest and image-level long captions for the whole image. Under the supervision of the large language model, the resulting detector, LLMDet, outperforms the baseline by a clear margin, enjoying superior open-vocabulary ability. Further, we show that the improved LLMDet can in turn build a stronger large multi-modal model, achieving mutual benefits. The code, model, and dataset are available at https://***/iSEE-laboratory/LLMDet. Copyright © 2025, The Authors. All rights reserved.

Keywords: Object detection

ViSpeak: Visual Instruction Feedback in Streaming Videos

arXiv, 2025

Authors: Fu, Shenghao; Yang, Qize; Li, Yuan-Ming; Peng, Yi-Xing; Lin, Kun-Yu; Wei, Xihan; Hu, Jian-Fang; Xie, Xiaohua; Zheng, Wei-Shi
Affiliations: School of Computer Science and Engineering Sun Yat-sen University China; Tongyi Lab Alibaba Group China; Peng Cheng Laboratory China; Key Laboratory of Machine Intelligence and Advanced Computing Ministry of Education China; Guangdong Province Key Laboratory of Information Security Technology China; China

Recent advances in Large Multi-modal Models (LMMs) are primarily focused on offline video understanding. Instead, streaming video understanding poses great challenges to recent models due to its time-sensitive, omni-modal and interactive characteristics. In this work, we aim to extend the streaming video understanding from a new perspective and propose a novel task named Visual Instruction Feedback in which models should be aware of visual contents and learn to extract instructions from them. For example, when users wave their hands to agents, agents should recognize the gesture and start conversations with welcome information. Thus, following instructions in visual modality greatly enhances user-agent interactions. To facilitate research, we define seven key subtasks highly relevant to visual modality and collect the ViSpeak-Instruct dataset for training and the ViSpeak-Bench for evaluation. Further, we propose the ViSpeak model, which is a SOTA streaming video understanding LMM with GPT-4o-level performance on various streaming video understanding benchmarks. After finetuning on our ViSpeak-Instruct dataset, ViSpeak is equipped with basic visual instruction feedback ability, serving as a solid baseline for future research. Copyright © 2025, The Authors. All rights reserved.

Keywords: Video streaming
