Search Results - Inner Mongolia University Library

arXiv, 2024

Authors: Xie, Yiping; Yu, Zitong; Wu, Bingjie; Xie, Weicheng; Shen, Linlin. Affiliations: Computer Vision Institute, School of Computer Science & Software Engineering, Shenzhen Institute of Artificial Intelligence and Robotics for Society, Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University, Shenzhen 518060, China; School of Computing and Information Technology, Great Bay University, Dongguan 523000, China; National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen 518060, China; Singapore

Remote Photoplethysmography (rPPG) is a non-contact method that uses facial video to predict changes in blood volume, enabling physiological metrics measurement. Traditional rPPG models often struggle with poor generalization capacity in unseen domains. Current solutions improve generalization in the target domain through Domain Generalization (DG) or Domain Adaptation (DA). However, both approaches require access to source domain data as well as target domain data, which is infeasible in scenarios with limited access to source data and also raises privacy concerns about accessing source domain data. In this paper, we propose the first Source-free Domain Adaptation benchmark for rPPG measurement (SFDA-rPPG), which overcomes these limitations by enabling effective domain adaptation without access to source domain data. Our framework incorporates a Three-Branch Spatio-Temporal Consistency Network (TSTC-Net) to enhance feature consistency across domains. Furthermore, we propose a new rPPG distribution alignment loss based on the Frequency-domain Wasserstein Distance (FWD), which leverages optimal transport to align power spectrum distributions across domains effectively and further enforces the alignment of the three branches. Extensive cross-domain experiments and ablation studies demonstrate the effectiveness of our proposed method in source-free domain adaptation settings. Our findings highlight the significant contribution of the proposed FWD loss for distributional alignment, providing a valuable reference for future research and applications. The source code is available at https://***/XieYiping66/SFDA-rPPG. Copyright © 2024, The Authors. All rights reserved.

Keywords: Photoplethysmography
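
The abstract above names a Frequency-domain Wasserstein Distance (FWD) loss, but this listing does not give its formula. As a minimal sketch of the general idea only, and not the authors' implementation, the following Python computes a 1-D Wasserstein-1 distance between the normalized power spectra of two equal-length signals. The function names `power_spectrum` and `frequency_wasserstein`, the naive DFT, the mean removal, and the use of frequency-bin units are all assumptions made for illustration.

```python
import cmath
import math


def power_spectrum(signal):
    """Normalized one-sided power spectrum of a real signal (naive O(n^2) DFT)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC offset (assumption)
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0:
        raise ValueError("signal is constant; its spectrum has no energy")
    return [p / total for p in power]  # sums to 1, so it acts as a distribution


def frequency_wasserstein(sig_a, sig_b):
    """Wasserstein-1 distance between two power spectra, in frequency-bin units.

    On a shared, evenly spaced 1-D grid, W1 equals the sum of absolute
    differences between the two cumulative distributions.
    """
    if len(sig_a) != len(sig_b):
        raise ValueError("signals must have the same length")
    cdf_a = cdf_b = 0.0
    distance = 0.0
    for pa, pb in zip(power_spectrum(sig_a), power_spectrum(sig_b)):
        cdf_a += pa
        cdf_b += pb
        distance += abs(cdf_a - cdf_b)
    return distance
```

For two pure sinusoids whose frequencies differ by two DFT bins, this distance comes out at about two bins, which is the sense in which optimal transport measures how far spectral mass has to move. A trainable loss would need a differentiable FFT from a deep learning framework, which this sketch does not attempt.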

Joint Design of Radar Receive Filter and Unimodular ISAC Waveform with Sidelobe Level Control

arXiv, 2025

Authors: Zhang, Kecheng; Liu, Ya-Feng; Wang, Zhongbin; Yuan, Weijie; Keskin, Musa Furkan; Wymeersch, Henk; Xia, Shuqiang. Affiliations: School of System Design and Intelligent Manufacturing, The Shenzhen Key Laboratory of Robotics and Computer Vision, Southern University of Science and Technology, Shenzhen 518055, China; State Key Laboratory of Scientific and Engineering Computing, Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; Department of Electrical Engineering, Chalmers University of Technology, Gothenburg 41296, Sweden; ZTE Corporation, The State Key Laboratory of Mobile Network and Mobile Multimedia Technology, Shenzhen 518055, China

Integrated sensing and communication (ISAC) has been considered a key feature of next-generation wireless networks. This paper investigates the joint design of the radar receive filter and dual-functional transmit waveform for the multiple-input multiple-output (MIMO) ISAC system. While optimizing the mean square error (MSE) of the radar receive spatial response and maximizing the achievable rate at the communication receiver, besides the constraints of a full-power radar receive filter and a unimodular transmit sequence, we control the maximum range sidelobe level, which is often overlooked in existing ISAC waveform design literature, for better radar imaging performance. To solve the formulated optimization problem with convex and nonconvex constraints, we propose an inexact augmented Lagrangian method (ALM) algorithm. For each subproblem in the proposed inexact ALM algorithm, we custom-design a block successive upper-bound minimization (BSUM) scheme with closed-form solutions for all blocks of variables to enhance the computational efficiency. Convergence analysis shows that the proposed algorithm is guaranteed to provide a stationary and feasible solution. Extensive simulations are performed to investigate the impact of different system parameters on communication and radar imaging performance. Comparison with existing works shows the superiority of the proposed algorithm. © 2025, CC BY-NC-SA.

Keywords: Mean square error

GM-DF: Generalized Multi-Scenario Deepfake Detection

arXiv, 2024

Authors: Lai, Yingxin; Yu, Zitong; Yang, Jing; Li, Bin; Kang, Xiangui; Shen, Linlin. Affiliations: The School of Computing and Information Technology, Great Bay University, Dongguan 523000, China; National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen 518060, China; Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen University, Shenzhen 518060, China; The Guangdong Key Laboratory of Information Security, The School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510080, China; Computer Vision Institute, School of Computer Science & Software Engineering, Shenzhen Institute of Artificial Intelligence and Robotics for Society, Guangdong Key Laboratory of Intelligent Information Processing, National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen 518060, China

Existing face forgery detection usually follows the paradigm of training models in a single domain, which leads to limited generalization capacity when unseen scenarios and unknown attacks occur. In this paper, we investigate in detail the generalization capacity of deepfake detection models when jointly trained on multiple face forgery detection datasets. We first find a rapid degradation of detection accuracy when models are directly trained on combined datasets due to the discrepancy across collection scenarios and generation methods. To address this issue, a Generalized Multi-Scenario Deepfake Detection framework (GM-DF) is proposed to serve multiple real-world scenarios with a unified model. First, we propose a hybrid expert modeling approach for domain-specific real/forgery feature extraction. For the commonality representation, we use CLIP to extract the common features for better aligning visual and textual features across domains. Meanwhile, we introduce a masked image reconstruction mechanism to force models to capture rich forged details. Finally, we supervise the models via a domain-aware meta-learning strategy to further enhance their generalization capacities. Specifically, we design a novel domain alignment loss to strongly align the distributions of the meta-test domains and meta-train domains. Thus, the updated models are able to represent both specific and common real/forgery features across multiple datasets. Given the lack of study of multi-dataset training, we establish a new benchmark leveraging multi-source data to fairly evaluate the models' generalization capacity on unseen scenarios. Both qualitative and quantitative experiments on five datasets, conducted on traditional protocols as well as the proposed benchmark, demonstrate the effectiveness of our approach. The code will be available at https://***/laiyingxin2/GM-DF. Copyright © 2024, The Authors. All rights reserved.

Keywords: Image reconstruction

Deep Multi-Instance Learning with Adaptive Recurrent Pooling for Medical Image Classification

IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Authors: Yi Ding; Lu Zhao; Liming Yuan; Xianbin Wen. Affiliations: School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China; Key Laboratory of Computer Vision and System, Ministry of Education, Tianjin, China; Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin, China; School of Computer and Information Engineering, Tianjin Chengjian University, Tianjin, China

ISBN: (digital) 9781665468190

ISBN: (print) 9781665468206

Recently, deep multi-instance neural networks have been successfully applied to medical image classification, where only image-level labels rather than fine-grained patch-level labels are available. One key issue for these multi-instance neural networks is how to aggregate all patch (instance) features into an entire image (bag) representation, referred to as multi-instance pooling, e.g., max, mean, and attention-based pooling. Nevertheless, these multi-instance pooling operations do not take the structural information within an image into account. This is inappropriate for medical image classification, since there often exist dependencies among regional patches/lesions. We propose an adaptive recurrent pooling based deep multi-instance neural network in this paper. In this network, we first extract multi-view global structural features from every bag using the self-attention mechanism, and then aggregate these multi-view features into a whole bag representation based on the adaptive recurrent pooling operation in order to further capture the contextual information within the bag. Moreover, we introduce the cross-normalization operation used in the Unit Force Operated Vision Transformer into the self-attention module to reduce its computational complexity. We have experimentally evaluated the performance of the proposed network on three medical image datasets, namely UCSB breast, Messidor, and Colon cancer. The results demonstrate the advantage of our network over current state-of-the-art deep multi-instance networks in terms of classification accuracy and interpretability.

Keywords: Technological innovation; Adaptive systems; Aggregates; Neural networks; Force; Feature extraction; Transformers
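
The abstract above cites max, mean, and attention-based pooling as the standard ways to turn a bag of instance features into one bag representation. The following Python is a minimal sketch of those three baselines only, not of the proposed adaptive recurrent pooling. The function names are hypothetical, and the attention score here is a plain dot product with a weight vector `w`, a simplifying assumption (published attention-MIL variants typically add a learned nonlinearity).

```python
import math


def max_pool(bag):
    """Element-wise maximum over all instance feature vectors in the bag."""
    return [max(column) for column in zip(*bag)]


def mean_pool(bag):
    """Element-wise mean over all instance feature vectors in the bag."""
    return [sum(column) / len(bag) for column in zip(*bag)]


def attention_pool(bag, w):
    """Softmax-weighted sum of instances; returns (bag_vector, attention_weights).

    The score of each instance is the dot product with `w` (simplified form).
    """
    scores = [sum(wi * xi for wi, xi in zip(w, inst)) for inst in bag]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(bag[0])
    pooled = [sum(a * inst[d] for a, inst in zip(alphas, bag)) for d in range(dim)]
    return pooled, alphas
```

All three operations are permutation-invariant: shuffling the instances leaves the bag vector unchanged, which illustrates the abstract's point that such pooling ignores structural dependencies among patches.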

Multi-View Representation Learning for Multi-Instance Learning with Applications to Medical Image Classification

IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Authors: Lu Zhao; Liming Yuan; Zhenliang Li; Xianbin Wen. Affiliations: School of Computer and Information Engineering, Tianjin Chengjian University, Tianjin, China; School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China; Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin, China; Key Laboratory of Computer Vision and System, Ministry of Education, Tianjin, China

ISBN: (digital) 9781665468190

ISBN: (print) 9781665468206

Multi-Instance Learning (MIL) is a weakly supervised learning paradigm in which every training example is a labeled bag of unlabeled instances. In typical MIL applications, instances are often used to describe the features of regions/parts of a whole object, e.g., regional patches/lesions in an eye-fundus image. However, for a (semantically) complex part, the standard MIL formulation puts a heavy burden on the representation ability of the corresponding instance. To alleviate this pressure, we still adopt a bag of instances as an example in this paper, but extract from each instance a set of representations using $1 \times 1$ convolutions. The advantages of this tactic are two-fold: i) this set of representations can be regarded as multi-view representations for an instance; ii) compared to building multi-view representations directly from scratch, extracting them automatically using $1 \times 1$ convolutions is more economical, and may be more effective since $1 \times 1$ convolutions can be embedded into the whole network. Furthermore, we apply two consecutive multi-instance pooling operations on the reconstituted bag, which has become a bag of sets of multi-view representations. We have conducted extensive experiments on several canonical MIL data sets from different application domains. The experimental results show that the proposed framework outperforms the standard MIL formulation in terms of classification performance and has good interpretability.

Keywords: Training; Representation learning; Retinopathy; Supervised learning; Buildings; Diabetes; Bioinformatics
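
The abstract above relies on the fact that a $1 \times 1$ convolution acts on one position at a time, so for a single instance feature vector it reduces to a linear map over channels. The Python below is a minimal sketch of that reduction, turning each instance into a set of "views". The names `conv1x1` and `multi_view`, and the representation of each view as a hypothetical `(weights, bias)` pair, are assumptions for illustration and do not reproduce the paper's architecture.

```python
def conv1x1(instance, weights, bias):
    """Apply one 1x1 convolution to a feature vector: out = W @ x + b."""
    return [sum(w * x for w, x in zip(row, instance)) + b
            for row, b in zip(weights, bias)]


def multi_view(bag, views):
    """Map every instance to a set of representations, one per view.

    `views` is a list of (weights, bias) pairs, where `weights` is a list of
    rows (output channels) over the input channels. The result is a bag of
    sets: result[i][v] is view v of instance i.
    """
    return [[conv1x1(instance, weights, bias) for weights, bias in views]
            for instance in bag]
```

A call such as `multi_view(bag, views)` with two views yields two representations per instance, giving the "bag of sets" shape on which the abstract says two consecutive pooling operations are applied.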

StyleGene: Crossover and Mutation of Region-level Facial Genes for Kinship Face Synthesis

Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Hao Li; Xianxu Hou; Zepeng Huang; Linlin Shen. Affiliations: Computer Vision Institute, College of Computer Science and Software Engineering, Shenzhen University; National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University; School of AI and Advanced Computing, Xi'an Jiaotong-Liverpool University; Shenzhen Institute of Artificial Intelligence and Robotics for Society; Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University

High-fidelity kinship face synthesis has many potential applications, such as kinship verification, missing child identification, and social media analysis. However, it is challenging to synthesize high-quality descendant faces with genetic relations due to the lack of large-scale, high-quality annotated kinship data. This paper proposes an RFG (Region-level Facial Gene) extraction framework to address this issue. We propose to use an IGE (Image-based Gene Encoder), an LGE (Latent-based Gene Encoder), and a Gene Decoder to learn the RFGs of a given face image and the relationships between RFGs and the latent space of Style-GAN2. Because cycle-like losses are designed to measure the $\mathcal{L}_{2}$ distances between the outputs of the Gene Decoder and the image encoder, and between the outputs of the LGE and the IGE, only face images are required to train our framework, i.e., no paired kinship face data is required. Based upon the proposed RFGs, a crossover and mutation module is further designed to inherit the facial parts of parents. A Gene Pool is also used to introduce variations into the mutation of RFGs, so the diversity of descendant faces can be significantly increased. Qualitative, quantitative, and subjective experiments on FIW, TSKinFace, and FF-Databases clearly show that the quality and diversity of kinship faces generated by our approach are much better than those of existing state-of-the-art methods.

A Kind of new parallax algorithm based on symmetric continuous optimization

2020 5th International Seminar on Computer Technology, Mechanical and Electrical Engineering, ISCME 2020

Authors: Jiang, Kaiwen; Zhang, Degan; Xu, Haixia. Affiliations: Key Laboratory of Computer Vision and System, Tianjin University of Technology, Ministry of Education, 300384, China; Tianjin Key Lab of Intelligent Computing and Novel Software Technology, Tianjin University of Technology, Tianjin, China; School of Electronic and Information Engineering, Tianjin Vocational Institute, Tianjin 300410, China

With the rapid development of 3D imaging and growing interest in it, multi-view-stereo-based 3D reconstruction has become an important technique for such applications. To facilitate the subsequent processing of 3D reconstruction, reduce the possibility of other algorithms falling into local optima, and achieve better and faster performance, this paper proposes a new parallax calculation based on symmetric continuous optimization. The proposed algorithm is tested on the same data to verify that it performs better than the traditional minimize El algorithm. © Published under licence by IOP Publishing Ltd.

Keywords: Optimization

Data protection and provenance in cloud of things environment: Research challenges

International Journal of Information and Computer Security, 2020, vol. 12, no. 4, pp. 416-435

Authors: Wang, Chundong; Yang, Lei; Guo, Hao; Wan, Fujin. Affiliations: Key Laboratory of Computer Vision and System, Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Ministry of Education, Tianjin University of Technology, Tianjin, China; Global Energy Internet Research Institute, Beijing University of Technology, Tianjin, China; Department of College of Computer and Control Engineering, Nankai University, Tianjin, China

Internet of Things systems are increasingly being deployed over the cloud (also referred to as cloud of things, CoT) to provide a broader range of services. However, CoT faces serious challenges in data protection and security provenance. This paper proposes a data privacy protection and provenance model (DPSPM) based on CoT. It can protect users' private data and trace the source of leaked data. In detail, security encryption and watermarking algorithms are proposed. The scheme also uses an improved k-anonymity data masking algorithm and a pseudo-row watermarking algorithm. These algorithms provide security control over the whole process of data publishing, especially data encryption, data masking, and provenance verification. Finally, the experimental results show that the scheme has good efficiency. The data masking time is shown to be proportional to the parameters k and L, and the results also show good robustness to common database watermarking attacks. © 2020 Inderscience Enterprises Ltd.

Keywords: Data privacy
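
The abstract above builds on k-anonymity data masking, whose basic property is that every combination of quasi-identifier values is shared by at least k records. The Python below is a minimal sketch of that baseline property and of one simple generalization step. It is not the paper's improved algorithm, and the function names, the `age` and `zip` fields, and the 10-year bucket width are hypothetical choices for illustration.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k records."""
    groups = Counter(tuple(record[q] for q in quasi_identifiers)
                     for record in records)
    return all(count >= k for count in groups.values())


def generalize_age(record, width=10):
    """Return a copy of the record with the exact age replaced by a range."""
    low = (record["age"] // width) * width
    masked = dict(record)  # leave the original record untouched
    masked["age"] = f"{low}-{low + width - 1}"
    return masked
```

Masking exact ages into ranges merges records into larger groups, so a table that fails the check on raw ages can pass it after generalization, at the cost of less precise data.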

A New Dynamic Routing Network for Monocular Depth Estimation

12th International Conference on CYBER Technology in Automation, Control, and Intelligent Systems, CYBER 2022

Authors: Luo, Zhehao; Luo, Sijin; Liang, Guoyuan; Wu, Xinyu. Affiliations: Guangdong Provincial Key Lab of Robotics and Intelligent System, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Guangdong Province, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China; Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Chinese Academy of Sciences, Guangdong Province, Shenzhen, China; Guangdong Provincial Key Laboratory of Computer Vision and Virtual Reality Technology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Guangdong Province, Shenzhen, China

ISBN: (digital) 9781665472678

ISBN: (print) 9781665472678

Depth estimation is an essential task for understanding the geometry of 3D scenes. Compared with multi-view-based methods, monocular depth estimation is more challenging because it requires integrating not only global information but also local cues from various parts. Recently, numerous Convolutional Neural Network (CNN) based approaches relying on pre-defined static architectures have been reported. Although many of them have achieved remarkable improvements in accuracy, a problem remains with the generality of the static model when it comes to diverse scenes. In this paper, a novel deep network with a dynamic routing architecture is proposed, whose structure adjusts dynamically to different inputs, leading not only to improvements in both accuracy and generality but also to a decrease in model parameters. Extensive experiments on monocular depth estimation have been conducted on two well-known challenging datasets (i.e., NYUD-v2 and Make3D) as well as the mixed dataset, and have verified the effectiveness of the proposed dynamic framework. © 2022 IEEE.

Keywords: Deep neural networks

FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs

arXiv, 2025

Authors: Wang, Xiaoqin; Ma, Xusen; Hou, Xianxu; Ding, Meidan; Li, Yudong; Chen, Junliang; Chen, Wenting; Peng, Xiaoyang; Shen, Linlin. Affiliations: Computer Vision Institute, College of Computer Science and Software Engineering, Shenzhen University, China; National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, China; Guangdong Provincial Key Laboratory of Intelligent Information Processing, China; AIAC, Xi’an Jiaotong-Liverpool University, China; Tsinghua University, China; The Hong Kong Polytechnic University, Hong Kong; City University of Hong Kong, Hong Kong; Sun Yat-sen University, China

Multimodal large language models (MLLMs) have demonstrated remarkable capabilities in various tasks. However, effectively evaluating these MLLMs on face perception remains largely unexplored. To address this gap, we introduce FaceBench, a dataset featuring hierarchical multi-view and multi-level attributes specifically designed to assess the comprehensive face perception abilities of MLLMs. First, we construct a hierarchical facial attribute structure, which encompasses five views with up to three levels of attributes, totaling over 210 attributes and 700 attribute values. Based on this structure, the proposed FaceBench consists of 49,919 visual question-answering (VQA) pairs for evaluation and 23,841 pairs for fine-tuning. Moreover, we develop a robust face perception MLLM baseline, Face-LLaVA, by training with our proposed face VQA data. Extensive experiments on various mainstream MLLMs and Face-LLaVA are conducted to test their face perception ability, with results also compared against human performance. The results reveal that existing MLLMs are far from satisfactory in understanding fine-grained facial attributes, while our Face-LLaVA significantly outperforms existing open-source models with a small amount of training data and is comparable to commercial ones like GPT-4o and Gemini. The dataset will be released at https://***/CVI-SZU/FaceBench. Copyright © 2025, The Authors. All rights reserved.
