In conventional few-shot learning approaches, masked image modeling paradigms such as masked autoencoders are typically used as feature extractors, followed by classifiers. Traditional masked autoencoders depend on st...
ISBN: (digital) 9798350368741
ISBN: (print) 9798350368758
Rate control is a critical component of image and video compression. Particularly under limited network bandwidth, bitrate control is essential to ensure efficient image transmission by effectively allocating channel resources. In this research, since both the channel and spatial dimensions affect rate allocation, we first propose a joint Channel-wise and Spatial-wise Quantization scheme to determine optimal quantization parameters. Subsequently, we develop a quantization step estimation network to obtain parameters that efficiently allocate rate according to the target rate. Experiments demonstrate that our algorithm significantly improves compressed image quality with minimal bitrate distortion and achieves accurate rate control with an average bitrate error of nearly 3%.
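To make the idea of matching a target rate concrete, here is a minimal sketch under stated assumptions. It is not the method of the abstract above: a plain bisection search over a single scalar step stands in for the learned quantization step estimation network, and the empirical entropy of the quantized symbols is used as an assumed proxy for the bitrate. The names `rate_bits_per_sample`, `find_step`, and `relative_rate_error` are hypothetical.

```python
import math
from collections import Counter


def rate_bits_per_sample(values, step):
    """Empirical entropy (bits/sample) of values after uniform quantization."""
    symbols = Counter(round(v / step) for v in values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in symbols.values())


def find_step(values, target_rate, lo=1e-3, hi=1e3, iters=60):
    """Bisection in the log domain, assuming a coarser step lowers the rate."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if rate_bits_per_sample(values, mid) > target_rate:
            lo = mid  # rate too high: quantize more coarsely
        else:
            hi = mid  # rate at or below target: try a finer step
    return math.sqrt(lo * hi)


def relative_rate_error(actual, target):
    """Bitrate error as a fraction of the target rate."""
    return abs(actual - target) / target


# Hypothetical toy "coefficients"; a real codec would use latent features.
values = [0.37 * i - 18.5 for i in range(100)]
step = find_step(values, target_rate=3.0)
achieved = rate_bits_per_sample(values, step)
print(step, achieved, relative_rate_error(achieved, 3.0))
```

A relative error of 0.03 from `relative_rate_error` corresponds to the roughly 3% bitrate error quoted in the abstract. The entropy proxy is only approximately monotonic in the step size, so the bisection is a heuristic here.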
Robust Reversible Watermarking (RRW) enables perfect recovery of cover images and watermarks in lossless channels while ensuring robust watermark extraction under lossy channels. However, existing RRW methods, mostly non-deep learning-based, suffer from complex designs, high computational costs, and poor robustness, limiting their practical applications. To address these issues, this paper proposes Deep Robust Reversible Watermarking (DRRW), a deep learning-based RRW scheme. DRRW introduces an Integer Invertible Watermark Network (iIWN) to achieve an invertible mapping between integer data distributions, fundamentally addressing the limitations of conventional RRW approaches. Unlike traditional RRW methods requiring task-specific designs for different distortions, DRRW adopts an encoder-noise layer-decoder framework, enabling adaptive robustness against various distortions through end-to-end training. During inference, the cover image and watermark are mapped into an overflowed stego image and latent variables. Arithmetic coding efficiently compresses these into a compact bitstream, which is embedded via reversible data hiding to ensure lossless recovery of both the image and watermark. To reduce pixel overflow, we introduce an overflow penalty loss, significantly shortening the auxiliary bitstream while improving both robustness and stego image quality. Additionally, we propose an adaptive weight adjustment strategy that eliminates the need to manually preset the watermark loss weight, ensuring improved training stability and performance. Experiments on multiple datasets demonstrate that DRRW achieves notable performance advantages. Compared to state-of-the-art RRW methods, DRRW improves robustness and reduces embedding, extraction, and recovery complexities by 55.14×, 5.95×, and 3.57×, respectively. The auxiliary bitstream is shortened by 43.86×, and reversible embedding succeeds on 16,762 images in the PASCAL VOC 2012 dataset, marking a significant step toward pra
Due to the high cost of Image Quality Assessment (IQA) datasets, achieving robust generalization remains challenging for prevalent deep learning-based IQA methods. To address this, this paper proposes a novel end-to-end blind IQA method: Causal-IQA. Specifically, we first analyze the causal mechanisms in IQA tasks and construct a causal graph to understand the interplay and confounding effects between distortion types, image contents, and subjective human ratings. Then, through shifting the focus from correlations to causality, Causal-IQA aims to improve the estimation accuracy of image quality scores by mitigating the confounding effects using a causality-based optimization strategy. This optimization strategy is implemented on the sample subsets constructed by a Counterfactual Division process based on the Backdoor Criterion. Extensive experiments illustrate the superiority of Causal-IQA.
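The Backdoor Criterion mentioned above licenses the standard adjustment formula P(Y | do(X=x)) = Σ_z P(Y | X=x, Z=z) P(Z=z). The sketch below only illustrates that textbook formula on made-up numbers. The roles assigned to the variables (Z as image content, X as distortion, Y as a good rating) and the helper `backdoor_adjusted` are assumptions for illustration, not the causal graph or optimization strategy of Causal-IQA.

```python
def backdoor_adjusted(p_y_given_xz, p_z, x):
    """P(Y=1 | do(X=x)) = sum over z of P(Y=1 | X=x, Z=z) * P(Z=z)."""
    return sum(p_y_given_xz[(x, z)] * p_z[z] for z in p_z)


# Hypothetical toy numbers, chosen only to make the arithmetic visible.
# Z: image content class; X: distortion present (1) or absent (0);
# Y = 1 when observers rate the image as good.
p_z = {"texture": 0.4, "smooth": 0.6}
p_y_given_xz = {
    (0, "texture"): 0.9, (0, "smooth"): 0.8,
    (1, "texture"): 0.7, (1, "smooth"): 0.3,
}

with_distortion = backdoor_adjusted(p_y_given_xz, p_z, 1)
without_distortion = backdoor_adjusted(p_y_given_xz, p_z, 0)
effect = with_distortion - without_distortion
print(with_distortion, without_distortion, effect)
```

Averaging over the marginal distribution of Z, rather than the distribution of Z among distorted images, is what removes the confounding path through image content in this toy setting.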
Self-supervised learning usually uses a large amount of unlabeled data to pre-train an encoder that can serve as a general-purpose feature extractor, so that downstream users only need to fine-tune it to enjoy the benefits of a "large model". Despite this promising prospect, the security of pre-trained encoders has not yet been thoroughly investigated, especially when the pre-trained encoder is publicly available for commercial use. In this paper, we propose AdvEncoder, the first framework for generating downstream-agnostic universal adversarial examples based on the pre-trained encoder. AdvEncoder aims to construct a universal adversarial perturbation or patch for a set of natural images that can fool all the downstream tasks inheriting the victim pre-trained encoder. Unlike in traditional adversarial example work, the pre-trained encoder only outputs feature vectors rather than classification labels. Therefore, we first exploit the high-frequency component information of the image to guide the generation of adversarial examples. Then we design a generative attack framework to construct adversarial perturbations/patches by learning the distribution of the attack surrogate dataset to improve their attack success rates and transferability. Our results show that an attacker can successfully attack downstream tasks without knowing either the pre-training dataset or the downstream dataset. We also tailor four defenses for pre-trained encoders, the results of which further prove the attack ability of AdvEncoder. Our code is available at: https://***/CGCL-codes/AdvEncoder.
In this paper, we consider the optimization of federated learning (FL) over a realistic wireless multiple-input multiple-output (MIMO) communication system with digital modulation and over-the-air computation (AirComp). In such a system, MIMO devices transmit their locally trained FL models to a parameter server (PS) using beamforming to maximize the number of devices scheduled for transmission. AirComp enables efficient wireless model aggregation by the PS in bandwidth-limited settings. However, wireless channel fading can produce distortions in AirComp-based FL. To tackle this challenge, we develop a novel aggregation scheme that combines digital modulation with AirComp to mitigate wireless fading while ensuring communication efficiency. We formulate this as a joint transmit-receive beamforming design optimization problem which dynamically adjusts the beamforming matrices to minimize the FL training loss with transmission errors. To solve this problem based on limited information at the PS, we employ an artificial neural network (ANN) to estimate the local FL models of all devices. Then, we derive a closed-form optimal design of the transmit and receive beamforming matrices based on predicted FL models. Numerical evaluations validate the advantages of the proposed methodology in terms of model training performance compared with baselines.
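The following is a deliberately simplified sketch of over-the-air aggregation, included only to show why fading distorts the superposed signal and how pre-equalization removes it. It assumes scalar model updates, single-antenna links, and analog channel inversion, so it is not the digital-modulation MIMO beamforming design described above. The function `aircomp_average` and all numbers are hypothetical.

```python
def aircomp_average(local_models, channels, noise=0j):
    """Superpose pre-equalized scalar updates and rescale to their mean."""
    # Each device pre-scales its update by 1/h_k (channel inversion),
    # so the fading coefficient h_k cancels in the superposed signal.
    received = sum(h * (w / h) for w, h in zip(local_models, channels)) + noise
    return (received / len(local_models)).real


def faded_average(local_models, channels):
    """Same superposition without pre-equalization, for comparison."""
    received = sum(h * w for w, h in zip(local_models, channels))
    return (received / len(local_models)).real


local_models = [0.2, -0.4, 0.9]                    # hypothetical scalar updates
channels = [0.8 + 0.3j, -0.5 + 1.1j, 1.2 - 0.7j]   # hypothetical fading gains

estimate = aircomp_average(local_models, channels)
distorted = faded_average(local_models, channels)
true_mean = sum(local_models) / len(local_models)
print(estimate, distorted, true_mean)
```

With zero noise the pre-equalized estimate matches the true mean of the local updates, while the unequalized superposition does not.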
Images captured in haze conditions, especially at nighttime with low light, often suffer from degraded visibility, contrasts, and vividness, which makes it difficult to carry out the following vision tasks. In this ar...
In unmanned aerial systems, especially in complex environments, accurately detecting tiny objects is crucial. Resizing images is a common strategy to improve detection accuracy, particularly for small objects. However...
ISBN: (digital) 9798350317152
ISBN: (print) 9798350317169
The sparse interactions between users and items have aggravated the difficulty of learning their representations in recommender systems. Existing methods leverage tags to alleviate the sparsity problem but ignore prevalent logical relations among items and tags (e.g., membership, hierarchy, and exclusion), which can be leveraged to enhance the accuracy of modeling user preferences and conducting recommendations. To this end, we propose to extract logical relations among item tags from existing tag taxonomies and exploit the individual strengths of the Poincaré and the Lorentz models in hyperbolic space for logical relation modeling towards enhanced recommendations. Moreover, we find that the logical relations directly extracted from existing tag taxonomies can be inaccurate and coarse. Therefore, we further devise innovative consistency-based and granularity-based weighting mechanisms based on user behavior patterns for data-driven logical relation mining that can be jointly optimized along with recommendations in an end-to-end fashion. Extensive experiments on four real-world benchmark datasets show drastic performance gains brought by our proposed framework, which consistently achieves an average of 8.25% improvement over state-of-the-art competitors regarding both Recall and NDCG metrics. Insightful case studies further demonstrate that our automatically refined logical relations are highly accurate and interpretable.
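For readers unfamiliar with the two hyperbolic models named above, the sketch below computes the standard geodesic distance in the Poincaré ball and in the Lorentz (hyperboloid) model at curvature -1, and checks that they agree under the usual mapping between the two. It illustrates the geometry only. The tag embeddings and function names are hypothetical, and nothing here reproduces the framework's relation modeling or weighting mechanisms.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v))
    return math.acosh(max(1.0, arg))  # clamp guards against rounding below 1


def to_lorentz(p):
    """Map a Poincare-ball point onto the hyperboloid (Lorentz) model."""
    sq = sum(a * a for a in p)
    return [(1.0 + sq) / (1.0 - sq)] + [2.0 * a / (1.0 - sq) for a in p]


def lorentz_distance(x, y):
    """Geodesic distance on the hyperboloid via the Lorentzian inner product."""
    inner = -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))
    return math.acosh(max(1.0, -inner))


# Hypothetical 2-D tag embeddings: a general tag near the origin
# and a more specific tag closer to the boundary of the ball.
root_tag = [0.0, 0.0]
leaf_tag = [0.5, 0.0]

d_poincare = poincare_distance(root_tag, leaf_tag)
d_lorentz = lorentz_distance(to_lorentz(root_tag), to_lorentz(leaf_tag))
print(d_poincare, d_lorentz)
```

Distances grow rapidly toward the boundary of the ball, which is why hyperbolic space is a common choice for embedding hierarchies such as tag taxonomies.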
ISBN: (print) 9780936406350
With the wide application of the Global Navigation Satellite System (GNSS) in autonomous driving scenarios, the demand for high-precision positioning of navigation systems has increased dramatically in complex multipath environments. Conventional model-based methods are constrained by strict assumptions about noise models and can hardly model complex environment errors. In contrast, learning-based approaches have become an important direction for solving the problem of high-precision positioning because they only require simple assumptions. However, current learning-based approaches face the following issues. The existing Graph Neural Network-based (GNN) method can hardly adapt to dynamically changing driving environment scenarios since it considers positioning discretely. On the other hand, existing Reinforcement Learning-based (RL) approaches ignore the relationship between multi-constellation satellites, resulting in an inadequate description of the driving correction environment observations. In this paper, we construct a GNN-driven recurrent reinforcement learning method to consider the GNSS measurements of multi-constellation satellites and to learn a real-time correction strategy in the dynamic driving environment. To establish a comprehensive positioning correction environment, we construct a multi-constellation graph observation, based on the feature vector concerning GNSS measurements of multi-constellation satellites and edges for satellites in and between constellations. To make more effective use of GNSS measurements, we employ the graph embedding module to deal with the multi-constellation graph inputs, to extract hidden topological features to form the brief states about relationships between multi-constellation satellites for the RL environment. Finally, we construct a recurrent actor-critic structured RL model with cumulative reward and continuous action space to exploit historical information and a