Search results - Inner Mongolia University Library

Inserting anybody in diffusion models via celeb basis

Proceedings of the 37th International Conference on Neural Information Processing Systems

Authors: Ge Yuan, Xiaodong Cun, Yong Zhang, Maomao Li, Chenyang Qi, Xintao Wang, Ying Shan, Huicheng Zheng

Affiliations: School of Computer Science and Engineering, Sun Yat-sen University, and Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China, and Guangdong Key Laboratory of Information Security Technology, and Tencent AI Lab; Tencent AI Lab; The Hong Kong University of Science and Technology; School of Computer Science and Engineering, Sun Yat-sen University, and Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China, and Guangdong Key Laboratory of Information Security Technology

There is strong demand for customizing pretrained large text-to-image models, e.g., Stable Diffusion, to generate novel concepts, such as the users themselves. However, concepts newly added by previous customization methods often show weaker combination abilities than the original ones, even when several images are given during training. We thus propose a new personalization method that allows for the seamless integration of a unique individual into the pre-trained diffusion model using just one facial photograph and only 1024 learnable parameters in under 3 minutes. We can then effortlessly generate stunning images of this person in any pose or position, interacting with anyone and doing anything imaginable from text prompts. To achieve this, we first analyze and build a well-defined celeb basis from the embedding space of the pre-trained large text encoder. Then, given one facial photo as the target identity, we generate its own embedding by optimizing the weight of this basis and locking all other parameters. Empowered by the proposed celeb basis, the new identity in our customized model showcases a better concept combination ability than previous personalization methods. In addition, our model can learn several new identities at once and have them interact with each other, where previous customization models fail. Project page is at: http://***. Code is at: https://***/ygtxr1997/CelebBasis.

Keywords: (none listed)
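The abstract above describes representing a new identity by optimizing only the weights of a fixed basis while all other parameters stay locked. A minimal sketch of that idea follows, under loud assumptions: the linear form `mean + sum(w_k * basis_k)`, the function names, and the toy squared-error objective are all hypothetical stand-ins chosen for illustration. The record does not state how the basis is built or what loss is used, and this is not the authors' implementation.

```python
from typing import List

Vector = List[float]


def compose_embedding(mean: Vector, basis: List[Vector], weights: List[float]) -> Vector:
    """Return mean + sum_k weights[k] * basis[k] (hypothetical parameterization)."""
    if len(basis) != len(weights):
        raise ValueError("one weight per basis vector is required")
    out = list(mean)
    for w, b in zip(weights, basis):
        if len(b) != len(mean):
            raise ValueError("basis vectors must match the mean's dimension")
        for i, x in enumerate(b):
            out[i] += w * x
    return out


def fit_weights(target: Vector, mean: Vector, basis: List[Vector],
                steps: int = 200, lr: float = 0.1) -> List[float]:
    """Gradient descent on 0.5 * ||embedding - target||^2.

    Only the weights are updated; mean and basis stay frozen, which mirrors
    the "optimize the weights, lock everything else" idea. The squared-error
    objective is an assumption made so the example runs without a model.
    """
    weights = [0.0] * len(basis)
    for _ in range(steps):
        current = compose_embedding(mean, basis, weights)
        residual = [c - t for c, t in zip(current, target)]
        # Gradient w.r.t. weights[k] is the dot product residual . basis[k].
        grads = [sum(r * x for r, x in zip(residual, b)) for b in basis]
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights


if __name__ == "__main__":
    mean = [0.0, 0.0, 1.0]
    basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    target = [2.0, -1.0, 1.0]
    print(fit_weights(target, mean, basis))
```

The number of trainable values here equals the number of basis vectors, which is the point of the sketch: the search happens in a small coefficient space rather than over the full embedding.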

DarkFed: A Data-Free Backdoor Attack in Federated Learning

arXiv, 2024

Authors: Li, Minghui; Wan, Wei; Ning, Yuxuan; Hu, Shengshan; Xue, Lulu; Zhang, Leo Yu; Wang, Yichen

Affiliations: School of Software Engineering, Huazhong University of Science and Technology, China; National Engineering Research Center for Big Data Technology and System, China; Services Computing Technology and System Lab, China; Hubei Engineering Research Center on Big Data Security, China; Hubei Key Laboratory of Distributed System Security, China; School of Cyber Science and Engineering, Huazhong University of Science and Technology, China; School of Computer Science and Technology, Huazhong University of Science and Technology, China; School of Information and Communication Technology, Griffith University, Australia

Federated learning (FL) has been demonstrated to be susceptible to backdoor attacks. However, existing academic studies on FL backdoor attacks rely on a high proportion of real clients with main task-related data, which is impractical. In the context of real-world industrial scenarios, even the simplest defense suffices to defend against the state-of-the-art attack, 3DFed. A practical FL backdoor attack remains in a nascent stage of development. To bridge this gap, we present DarkFed. Initially, we emulate a series of fake clients, thereby achieving the attacker proportion typical of academic research scenarios. Given that these emulated fake clients lack genuine training data, we further propose a data-free approach to backdoor FL. Specifically, we delve into the feasibility of injecting a backdoor using a shadow dataset. Our exploration reveals that impressive attack performance can be achieved, even when there is a substantial gap between the shadow dataset and the main task dataset. This holds true even when employing synthetic data devoid of any semantic information as the shadow dataset. Subsequently, we strategically construct a series of covert backdoor updates in an optimized manner, mimicking the properties of benign updates, to evade detection by defenses. A substantial body of empirical evidence validates the tangible effectiveness of DarkFed. Copyright © 2024, The Authors. All rights reserved.

Keywords: Semantics

Deep Feature Surgery: Towards Accurate and Efficient Multi-Exit Networks

arXiv, 2024

Authors: Gong, Cheng; Chen, Yao; Luo, Qiuyang; Lu, Ye; Li, Tao; Zhang, Yuzhi; Sun, Yufei; Zhang, Le

Affiliations: College of Software, Nankai University, China; College of Computer Science, Nankai University, China; National University of Singapore, Singapore; School of Information and Communication Engineering, University of Electronic Science and Technology of China, China; HAIHE Lab of ITAI, China; Tianjin Key Laboratory of Network and Data Security Technology, China; Key Laboratory of Data and Intelligent System Security, Ministry of Education

The multi-exit network is a promising architecture for efficient model inference that shares backbone networks and weights among multiple exits. However, the gradient conflict of the shared weights results in sub-optimal accuracy. This paper introduces Deep Feature Surgery (DFS), which consists of feature partitioning and feature referencing approaches to resolve gradient conflict issues during the training of multi-exit networks. Feature partitioning separates shared features along the depth axis among all exits to alleviate gradient conflict while simultaneously promoting joint optimization for each exit. Subsequently, feature referencing enhances multi-scale features for distinct exits across varying depths to improve the model accuracy. Furthermore, DFS reduces the training operations with the reduced complexity of backpropagation. Experimental results on the Cifar100 and ImageNet datasets show that DFS provides up to a 50.00% reduction in training time and attains up to a 6.94% improvement in accuracy compared with baseline methods across diverse models and tasks. Budgeted batch classification evaluation on MSDNet demonstrates that DFS uses about 2× fewer average FLOPs per image to achieve the same classification accuracy as baseline methods on Cifar100. The code is available at https://***/GongCheng1919/dfs. Copyright © 2024, The Authors. All rights reserved.

Keywords: Budget control

View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network

Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Quan Zhang, Lei Wang, Vishal M. Patel, Xiaohua Xie, Jianhuang Lai

Affiliations: School of Computer Science and Engineering, Sun Yat-Sen University, China; Department of Electrical and Computer Engineering, Johns Hopkins University, USA; Pazhou Lab (HuangPu), Guangdong, China; Guangdong Province Key Laboratory of Information Security Technology, Guangdong, China; Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China

ISBN: (digital) 9798350353006

ISBN: (print) 9798350353013

Existing person re-identification methods have achieved remarkable advances in appearance-based identity association across homogeneous cameras, such as ground-ground matching. However, as a more practical scenario, aerial-ground person re-identification (AGPReID) among heterogeneous cameras has received minimal attention. To alleviate the disruption of discriminative identity representation by dramatic view discrepancy as the most significant challenge in AGPReID, the view-decoupled transformer (VDT) is proposed as a simple yet effective framework. Two major components are designed in VDT to decouple view-related and view-unrelated features, namely hierarchical subtractive separation and orthogonal loss, where the former separates these two features inside the VDT, and the latter constrains these two to be independent. In addition, we contribute a large-scale AGPReID dataset called CARGO, consisting of five/eight aerial/ground cameras, 5,000 identities, and 108,563 images. Experiments on two datasets show that VDT is a feasible and effective solution for AGPReID, surpassing the previous method on mAP/Rank1 by up to 5.0%/2.7% on CARGO and 3.7%/5.2% on AG-ReID, keeping the same magnitude of computational complexity. Our project is available at https://***/LinlyAC/VDT-AGPReID.

Keywords: Computer vision; Cameras; Transformers; Pattern recognition; Computational complexity; Identification of persons
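The abstract above mentions an orthogonal loss that constrains view-related and view-unrelated features to be independent. One common way to express such a constraint is to penalize the cosine similarity between paired feature vectors, sketched below. The function names and the choice of mean absolute cosine similarity are assumptions made for illustration. The record does not give the loss formula, so this should not be read as the VDT definition.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def orthogonal_loss(view_feats: List[Sequence[float]],
                    identity_feats: List[Sequence[float]]) -> float:
    """Mean absolute cosine similarity over a batch of paired features.

    Hypothetical formulation: the value is 0 when every pair is orthogonal
    and 1 when every pair is parallel, so minimizing it pushes the two
    feature sets apart in direction.
    """
    if len(view_feats) != len(identity_feats) or not view_feats:
        raise ValueError("expected two non-empty batches of equal size")
    total = sum(abs(cosine_similarity(v, f))
                for v, f in zip(view_feats, identity_feats))
    return total / len(view_feats)


if __name__ == "__main__":
    print(orthogonal_loss([[1.0, 0.0]], [[0.0, 1.0]]))
    print(orthogonal_loss([[1.0, 0.0]], [[2.0, 0.0]]))
```

Using the absolute value treats anti-parallel features as just as entangled as parallel ones. A squared similarity would be a smoother alternative with the same minimum.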

Cell Graph Transformer for Nuclei Classification

arXiv, 2024

Authors: Lou, Wei; Li, Guanbin; Wan, Xiang; Li, Haofeng

Affiliations: Shenzhen Research Institute of Big Data, Shenzhen, China; The Chinese University of Hong Kong, Shenzhen, China; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; GuangDong Province Key Laboratory of Information Security Technology, China

Nuclei classification is a critical step in computer-aided diagnosis with histopathology images. In the past, various methods have employed graph neural networks (GNN) to analyze cell graphs that model inter-cell relationships by considering nuclei as vertices. However, they are limited by the GNN mechanism that only passes messages among local nodes via fixed edges. To address the issue, we develop a cell graph transformer (CGT) that treats nodes and edges as input tokens to enable learnable adjacency and information exchange among all nodes. Nevertheless, training the transformer with a cell graph presents another challenge. Poorly initialized features can lead to noisy self-attention scores and inferior convergence, particularly when processing the cell graphs with numerous connections. Thus, we further propose a novel topology-aware pretraining method that leverages a graph convolutional network (GCN) to learn a feature extractor. The pre-trained features may suppress unreasonable correlations and hence ease the finetuning of CGT. Experimental results suggest that the proposed cell graph transformer with topology-aware pretraining significantly improves the nuclei classification results, and achieves the state-of-the-art performance. Code and models are available at https://***/lhaof/CGT. Copyright © 2024, The Authors. All rights reserved.

Keywords: Computer aided diagnosis

VFGCN: A Vertical Federated Learning Framework With Privacy Preserving for Graph Convolutional Network

IEEE Transactions on Dependable and Secure Computing, 2025

Authors: Wang, Gang; Li, Qingming; Liu, Ximeng; Yan, Xiaoran; Dong, Qingkuan; Wu, Huiwen; Kong, Xiangjie; Zhou, Li

Affiliations: Zhejiang Lab, Hangzhou 310000, China; Zhejiang University, College of Computer Science and Technology, Hangzhou 310058, China; Fuzhou University, Key Laboratory of Information Security of Network Systems, College of Computer and Big Data, Fuzhou 350108, China; Xidian University, State Key Laboratory of Integrated Services Networks, Xi'an 710071, China; Zhejiang University of Technology, College of Computer Science and Technology, Hangzhou 310023, China; Dalian University of Technology, School of Software, Dalian 116620, China

Due to the robust representational capabilities of graph data, employing graph neural networks for its processing has demonstrated superior performance over conventional deep learning algorithms. Graph data encompasses abundant features and structural information; however, its large-scale collection is often challenging in practice. This difficulty arises because data predominantly exists in isolated compartments, making it arduous to harmonize information across various organizations or to enable multiple organizations to collaborate effectively while safeguarding local data privacy. In light of an extreme data distribution scenario, where each client possesses distinct nodes with partially overlapping segments yet divergent data features, we introduce a dual-cloud server architecture. This framework encompasses the design of four secure subprotocols: ReEnc (secure re-encryption), SecPSI (secure outsourcing of PSI), SecWeight (secure weight calculation), and SecAgg (secure aggregation). Together, these components facilitate a vertical federated learning framework for graph convolutional networks, ensuring privacy preservation. We provide a security proof for the entire system, and an extensive evaluation on three benchmark datasets (Cora, Citeseer, and Pubmed) shows that our Vertical Federated Graph Convolutional Network (VFGCN) surpasses existing privacy-preserving methodologies. © 2004-2012 IEEE.

Keywords: Privacy by design

EFFICIENT ONLINE LABEL CONSISTENT HASHING FOR LARGE-SCALE CROSS-MODAL RETRIEVAL

2021 IEEE International Conference on Multimedia and Expo, ICME 2021

Authors: Yi, Jinhan; Liu, Xin; Cheung, Yiu-Ming; Xu, Xing; Fan, Wentao; He, Yi

Affiliations: Department of Computer Science and Technology, Huaqiao University, Xiamen 361021, China; Xiamen Key Lab. of Computer Vision and Pattern Recognition; Fujian Key Lab. of Big Data Intelligence and Security, China; Department of Computer Science, Hong Kong Baptist University, Kowloon, Hong Kong; School of Computer Science and Engineering, University of Electronic Science and Technology of China, China; Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, China

ISBN: (print) 9781665438643

Existing cross-modal hashing still faces three challenges: (1) Most batch-based methods are unsuitable for processing large-scale and streaming data. (2) Current online methods often suffer from insufficient semantic association, while lacking flexibility to learn the hash functions for varying streaming data. (3) Existing supervised methods always require much computation time or accumulate large quantization loss to learn hash codes. To address the above challenges, we present an efficient Online Label Consistent Hashing (OLCH) for cross-modal retrieval, which aims to incrementally learn hash codes for the currently arriving data, while updating the hash functions in a streaming manner. To be specific, an online semantic representation learning framework is designed to adaptively preserve the semantic similarity across different modalities, and a mini-batch online gradient descent approach associated with forward-backward splitting is developed to optimize the hash functions. Accordingly, the hash codes are adaptively learned online with high discriminative capability, while avoiding high computation complexity to process the streaming data. Experimental results show its outstanding performance in comparison with the state of the art. © 2021 IEEE Computer Society. All rights reserved.

Keywords: Hash functions

FedPHE: A Secure and Efficient Federated Learning via Packed Homomorphic Encryption

IEEE Transactions on Dependable and Secure Computing, 2025

Authors: Li, Yuqing; Yan, Nan; Chen, Jing; Wang, Xiong; Hong, Jianan; He, Kun; Wang, Wei; Li, Bo

Affiliations: Wuhan University, Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan 430072, China; Wuhan University, RiZhao Information Technology Institute, Rizhao 276800, China; Huazhong University of Science and Technology, National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab/Cluster and Grid Computing Lab, School of Computer Science and Technology, Wuhan 430074, China; Shanghai Jiao Tong University, School of Cyber Science and Engineering, Shanghai 200240, China; Hong Kong University of Science and Technology, Department of Computer Science and Engineering, Hong Kong

Cross-silo federated learning (FL) enables multiple institutions (clients) to collaboratively build a global model without sharing private data. To prevent privacy leakage during aggregation, homomorphic encryption (HE) is widely used to encrypt model updates, yet incurs high computation and communication overheads. To reduce these overheads, packed HE (PHE) has been proposed to encrypt multiple plaintexts into a single ciphertext. However, the original design of PHE assumes all clients share a single private key, making the system vulnerable to security threats of ciphertexts being intercepted and decrypted by honest-but-curious clients. Also, it does not consider the heterogeneity among different clients, resulting in undermined training efficiency with slow convergence and stragglers. To address these challenges, we propose FedPHE, a secure and efficient FL framework with PHE by jointly exploiting contribution-aware secure aggregation and straggler-resistant client selection. Using CKKS with sparsification and blinding, FedPHE achieves efficient secure aggregation that allows clients to only provide obscured encrypted updates while the server can perform aggregation by accounting for contributions of local updates. To mitigate the straggler effect, we devise a perturbed sketch-based selection to cherry-pick representative clients with heterogeneous models and computing capabilities in a communication-efficient and privacy-preserving manner. We show, through rigorous security analysis and extensive experiments, that FedPHE can efficiently safeguard clients' privacy, achieve 2.45-6.56× training speedup, cut the communication overhead by 1.32-24.85×, and reduce straggler effects by 1.89-2.78×. © 2004-2012 IEEE.

Keywords: Privacy by design
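The abstract above relies on packing, where several plaintexts travel in one ciphertext so that a single homomorphic addition aggregates many values at once. The sketch below illustrates only the slot arithmetic behind that idea, using plain Python integers. It performs no encryption and is not CKKS, which encodes vectors differently. The function names and the fixed-width slot layout are assumptions made for illustration.

```python
from typing import List


def pack(values: List[int], slot_bits: int) -> int:
    """Place each non-negative integer in its own slot_bits-wide slot of one big integer."""
    packed = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << slot_bits):
            raise ValueError("value does not fit in a slot")
        packed |= v << (i * slot_bits)
    return packed


def unpack(packed: int, count: int, slot_bits: int) -> List[int]:
    """Read back `count` slots from a packed integer."""
    mask = (1 << slot_bits) - 1
    return [(packed >> (i * slot_bits)) & mask for i in range(count)]


def aggregate(packed_updates: List[int]) -> int:
    """Add packed integers; slot-wise sums stay correct while no slot overflows.

    Under an additively homomorphic scheme, this one addition would be
    carried out on ciphertexts instead of on plain integers.
    """
    return sum(packed_updates)


if __name__ == "__main__":
    slot_bits = 16  # headroom: each slot must hold the sum over all clients
    client_a = pack([1, 2, 3], slot_bits)
    client_b = pack([10, 20, 30], slot_bits)
    print(unpack(aggregate([client_a, client_b]), 3, slot_bits))
```

The slot width is the main design choice: it must cover the largest possible per-slot sum across all clients, otherwise a carry spills into the neighboring slot and corrupts it.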

Semi-Supervised Image Captioning Considering Wasserstein Graph Matching

arXiv, 2024

Authors: Yang, Yang

Affiliations: The Nanjing University of Science and Technology, Nanjing 210094, China; PCA Lab, Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Jiangsu Key Lab of Image and Video Understanding for Social Security, School of Computer Science and Engineering, Nanjing University of Science and Technology, China; Ministry of Education, State Key Lab. for Novel Software Technology, Nanjing University, China

Image captioning can automatically generate captions for the given images, and the key challenge is to learn a mapping function from visual features to natural language features. Existing approaches are mostly supervised ones, i.e., each image has a corresponding sentence in the training set. However, considering that describing images always requires a huge amount of manpower, we usually have a limited number of described images (i.e., image-text pairs) and a large number of undescribed images in real-world applications. This gives rise to the problem of "Semi-Supervised Image Captioning". To solve this problem, we propose a novel Semi-Supervised Image Captioning method considering Wasserstein Graph Matching (SSIC-WGM), which adopts the raw image inputs to supervise the generated sentences. Different from traditional single-modal semi-supervised methods, the difficulty of semi-supervised cross-modal learning lies in constructing intermediately comparable information among heterogeneous modalities. In this paper, SSIC-WGM adopts the successful scene graphs as intermediate information, and constrains the generated sentences from two aspects: 1) inter-modal consistency. SSIC-WGM constructs the scene graphs of the raw image and generated sentence respectively, then employs the Wasserstein distance to better measure the similarity between region embeddings of different graphs. 2) intra-modal consistency. SSIC-WGM applies data augmentation techniques to the raw images, then constrains the consistency among augmented images and generated sentences. Consequently, SSIC-WGM combines the cross-modal pseudo supervision and structure invariant measure for efficiently using the undescribed images, and learns a more reasonable mapping function. Experiments show that our method can outperform state-of-the-art comparison methods on the MS-COCO "Karpathy" offline test split, under different complex semi-supervised scenarios. Copyright © 2024, The Authors. All rights reserved.

Keywords: Visual languages