Federated learning inherently provides a certain level of privacy protection, which is nevertheless inadequate in many real-world scenarios. Existing privacy-preserving methods frequently incur unbearable time overheads or non-negligible deterioration in model performance, and thus suffer from a tradeoff between performance and privacy. In this work, we propose a novel Federated Privacy-Preserving Knowledge Transfer framework, namely FedPPKT, which employs data-free knowledge distillation in a meta-learning manner to rapidly generate pseudo data and perform privacy-preserving knowledge transfer. FedPPKT establishes a protective barrier between the original private data and the federated model, thereby ensuring user privacy. Furthermore, its few-round strategy reduces the number of communication rounds, further mitigating the risk of privacy exposure for user data. With the help of the meta generator, uneven local label distributions across clients are alleviated, mitigating data heterogeneity and improving model performance. Experiments show that FedPPKT outperforms state-of-the-art privacy-preserving federated learning methods. Our code is publicly available at https://***/HIT-weiqb/FedPPKT.
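As a concrete illustration of generator-driven data-free knowledge distillation, the sketch below alternates a generator step (synthesizing pseudo data the teacher is confident about, with a class-balance term addressing uneven label distributions) and a student distillation step. All module interfaces, losses, and hyperparameters here are assumptions for illustration, not the FedPPKT implementation.

```python
# Minimal data-free distillation sketch, loosely following the idea above.
import torch
import torch.nn.functional as F

def distill_step(generator, teacher, student, g_opt, s_opt,
                 batch_size=64, z_dim=100, num_classes=10, temp=4.0):
    """One alternating step; `teacher` is assumed frozen (requires_grad=False)."""
    # 1) Generator step: pseudo data the teacher classifies confidently,
    #    plus a class-balance term (minimizing negative entropy of the
    #    mean prediction pushes toward a uniform class distribution).
    z = torch.randn(batch_size, z_dim)
    y = torch.randint(0, num_classes, (batch_size,))
    x_fake = generator(z, y)                 # assumed conditional generator
    t_logits = teacher(x_fake)
    ce = F.cross_entropy(t_logits, y)
    p_mean = F.softmax(t_logits, dim=1).mean(dim=0)
    balance = (p_mean * (p_mean + 1e-8).log()).sum()
    g_opt.zero_grad()
    (ce + balance).backward()
    g_opt.step()

    # 2) Student step: knowledge transfers through pseudo data only, so the
    #    original private data never touches the global model.
    x_fake = x_fake.detach()
    with torch.no_grad():
        t_logits = teacher(x_fake)
    s_logits = student(x_fake)
    kd = F.kl_div(F.log_softmax(s_logits / temp, dim=1),
                  F.softmax(t_logits / temp, dim=1),
                  reduction="batchmean") * temp * temp
    s_opt.zero_grad()
    kd.backward()
    s_opt.step()
```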
With the rapid development of video-on-demand (VOD) and real-time streaming video technologies, the accurate objective assessment of streaming video Quality of Experience (QoE) has become a focal point for optimizing streaming-related technologies. However, due to the inherent transmission distortions caused by poor Quality of Service (QoS) conditions in streaming videos, such as intermittent stalling, rebuffering, and drastic changes in video sharpness due to bitrate fluctuations, evaluating streaming video QoE presents numerous challenges. This paper introduces a large and diverse in-the-wild streaming video QoE evaluation dataset: the SJLIVE-1k dataset. It addresses the limitations of existing datasets, which lack in-the-wild video sequences captured under real network conditions and contain insufficient video content. Furthermore, we propose an end-to-end objective QoE evaluation strategy that extracts video content and QoS features from the video itself without using any extra information. By using self-supervised contrastive learning as a "reminder" that bridges the gap between the different types of features, our approach achieves state-of-the-art results across three datasets. Our proposed dataset will be released to facilitate further research.
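The "reminder" described above can be read as a symmetric InfoNCE objective between paired content and QoS embeddings of the same clip; the sketch below, with assumed feature shapes and projection already applied, shows that standard form. The paper's exact loss may differ.

```python
# Hedged sketch: contrastive bridging of content and QoS features.
import torch
import torch.nn.functional as F

def contrastive_bridge_loss(content_feat, qos_feat, temperature=0.07):
    """content_feat, qos_feat: (B, D) embeddings from the two branches."""
    c = F.normalize(content_feat, dim=1)
    q = F.normalize(qos_feat, dim=1)
    logits = c @ q.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(c.size(0), device=c.device)
    # Matching clip pairs sit on the diagonal; all others act as negatives.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```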
Currently, with the extensive application of digital cameras in dynamic capturing, factors such as camera jitter, defocus, and target motion induce various types and degrees of image blurring. Deep learning (DL) is a powerful technique that offers data-adaptive recovery without prior characterization of deblurring filter kernels. However, end-to-end networks can still be improved to restore regions with severe localized blurring. Therefore, we propose a multi-scale circular transformer (MSC-Former) employing averaged neighborhood attention (AvgNA) to solve this problem. It computes the local attention of each feature pixel by learning the correlation between the center and the surrounding windowed neighborhood, then produces integrated attention with direct averaging. We employ a multiscale circular strategy (MSCS) to compute attention at different spatial scales, expanding the receptive field while maintaining a low parameter count. It defines neighborhoods at different scales with concentric circular regions of varying radii, which enlarges the receptive field during attention computation while capturing spatial continuity across larger neighborhoods. Experimental results demonstrate that the proposed method surpasses recent state-of-the-art deblurring techniques on the benchmark dataset.
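One plausible reading of AvgNA and MSCS is sketched below, with square windows standing in for the concentric circular regions: each pixel attends to its local neighborhood at several scales, and the per-scale outputs are integrated by direct averaging. Shapes, window sizes, and this interpretation of the averaging step are assumptions, not the authors' code.

```python
# Sketch of windowed neighborhood attention with multi-scale averaging.
import torch
import torch.nn.functional as F

def neighborhood_attention(q, k, v, window):
    """q, k, v: (B, C, H, W); each pixel attends to its window x window patch."""
    B, C, H, W = q.shape
    pad = window // 2
    # Gather every pixel's neighborhood of keys and values.
    k_n = F.unfold(k, window, padding=pad).view(B, C, window * window, H * W)
    v_n = F.unfold(v, window, padding=pad).view(B, C, window * window, H * W)
    q_c = q.view(B, C, 1, H * W)
    # Correlation between the center query and its neighborhood keys.
    attn = (q_c * k_n).sum(dim=1, keepdim=True) / (C ** 0.5)
    attn = attn.softmax(dim=2)
    return (attn * v_n).sum(dim=2).view(B, C, H, W)

def msc_avgna(q, k, v, windows=(3, 7, 11)):
    # Growing windows approximate circular regions of varying radii; the
    # per-scale attention outputs are integrated by direct averaging.
    outs = [neighborhood_attention(q, k, v, w) for w in windows]
    return torch.stack(outs).mean(dim=0)
```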
The imaging quality of automotive cameras is crucial in complex driving environments. Therefore, it is essential to conduct subjective experiments that realistically reflect drivers' evaluation of the imaging quality of automotive cameras in real traffic scenarios. To accurately assess the imaging quality of automotive cameras, this paper proposes a no-reference quality assessment method whose quality scores are highly consistent with human subjective perception. This study first constructs a new image quality assessment dataset and obtains subjective quality scores through subjective experiments. To construct the dataset, a variety of realistic props are used to simulate scene elements that an automotive camera might capture, and the scenes are photographed with a wide range of cameras differing in sensor type, lens focus, and viewing angle, resulting in a diverse set of images. The objective quality assessment method proposed in this paper consists of an object detection network and a multi-branch quality evaluation network. The object detection network identifies and classifies scene elements, while the multi-branch quality evaluation network performs feature extraction and score regression on the various types of elements to effectively evaluate the imaging quality of automotive cameras. In the experiments, this no-reference quality assessment method is tested on our built dataset, and the results show that the proposed method exhibits the best performance compared with state-of-the-art image quality assessment methods.
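The sketch below illustrates the two-stage design under stated assumptions: detected element crops are routed by class to per-type regression branches, and the element scores are fused into an overall image score (here by simple averaging, which is our assumption; the paper's fusion may differ, as may the backbone and branch designs).

```python
# Illustrative multi-branch quality head fed by a detector's crops.
import torch
import torch.nn as nn

class MultiBranchIQA(nn.Module):
    def __init__(self, num_element_types, feat_dim=512):
        super().__init__()
        self.backbone = nn.Sequential(          # shared feature extractor
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # One regression head per element type (e.g., sign, vehicle, lane).
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                          nn.Linear(128, 1))
            for _ in range(num_element_types))

    def forward(self, crops, element_types):
        """crops: (N, 3, H, W) element patches from the detection network;
        element_types: length-N class indices assigned by the detector."""
        feats = self.backbone(crops)
        scores = torch.stack([self.branches[int(t)](f)
                              for f, t in zip(feats, element_types)])
        return scores.mean()   # assumed fusion: average over elements
```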
Open vocabulary object detection (OVD), which detects novel categories through detectors trained on base categories, has achieved remarkable advancement attributable to large-scale vision-language models such as CLIP. Prior OVD works mainly focus on improving the classification accuracy of proposals while neglecting localization ability for novel categories. In this work, we propose IoU-aware language-image model tuning (IoU-CLIP) for open vocabulary object detection. Specifically, we construct a region image dataset with varying IoU values and adopt these IoU values as labels to fine-tune the CLIP model, learning IoU-aware, class-agnostic semantic prompts and visual embeddings. The fine-tuned IoU-CLIP can predict IoU scores for proposals, which interact with the classification scores. Meanwhile, the IoU-aware, class-agnostic visual embeddings are utilized for box regression to enhance the generalization of the localization capability. We evaluate our method on the COCO and LVIS OVD benchmarks, outperforming the baseline (RegionCLIP) by 5.5% AP50 and 5.8% AP on novel categories, respectively, achieving state-of-the-art performance.
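At inference, the predicted IoU score can interact with the classification score, for example through the geometric-mean fusion sketched below; the fusion form and the weight alpha are assumptions rather than the paper's exact rule.

```python
# Hedged sketch of classification/IoU score interaction per proposal.
import torch

def fuse_scores(cls_scores, iou_scores, alpha=0.5):
    """cls_scores: (N, K) per-class probabilities for N proposals;
    iou_scores: (N, 1) localization quality predicted by IoU-CLIP."""
    eps = 1e-6  # avoid 0 ** alpha edge cases
    return (cls_scores.clamp_min(eps) ** alpha) * \
           (iou_scores.clamp_min(eps) ** (1.0 - alpha))
```

With alpha = 0.5 this downweights well-classified but poorly localized proposals, which is the usual motivation for letting an IoU estimate interact with classification confidence.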
Lighting conditions significantly affect the quality of both real and AI-generated images. Facial images are particularly sensitive to lighting due to their detailed nature and the importance of facial features in conveying identity. Poor lighting can easily obscure these critical details. To address this issue, various portrait relighting methods have been developed to adjust the lighting in improperly exposed images. However, these methods often encounter challenges such as overexposure, underexposure, and detail loss in the relighted portraits. Consequently, there is a need for effective quality assessment and control of relighted human heads (RHHs). In this study, a simple proposed baseline and three typical relighting methods are applied to six selected human head (HH) images, producing a quality assessment dataset named ReLI-QA that comprises 840 RHHs. A multidimensional subjective quality assessment method based on visual guidance is proposed to accurately evaluate the visual quality of each RHH in the dataset. Analysis of the subjective experiments shows that the quality of RHHs is affected by multiple factors. Finally, based on ReLI-QA, some typical image quality assessment (IQA) methods are selected for benchmark experiments. The experimental results show the limitations of the existing methods in RHH quality assessment. The dataset and code for this research have been released at https://***/zyj-2000/ReLI-QA.
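For reference, IQA benchmark experiments such as these typically report the Pearson (PLCC) and Spearman (SRCC) correlations between each method's objective predictions and the subjective scores; a minimal sketch with placeholder array names:

```python
# Standard correlation metrics for benchmarking IQA methods against MOS.
from scipy.stats import pearsonr, spearmanr

def iqa_correlations(predicted, mos):
    """predicted: objective quality scores; mos: subjective mean opinion scores."""
    plcc, _ = pearsonr(predicted, mos)   # linear consistency
    srcc, _ = spearmanr(predicted, mos)  # rank (monotonic) consistency
    return plcc, srcc
```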
HTTP adaptive streaming (HAS) constructs bitrate ladders to deliver videos with the best possible quality under varying network conditions. Though per-shot content adaptive encoding (CAE) largely improves compression efficiency by constructing the optimal bitrate ladder for each video shot, it suffers from excessive encoding complexity, as all the points in the operating space (typically resolution × bitrate) need to be encoded and compared. To address this issue, this paper proposes an efficient bitrate ladder construction method that encodes only a subset of operating points, then uses curve fitting and inter-curve prediction to estimate the rate-distortion (RD) performance of the remaining points. The proposed method enables low-complexity ladder construction even for high-dimensional operating spaces that incorporate dimensions such as encoding presets. Experiments show that this method achieves RD performance comparable to the original per-shot CAE with only 42% of the encoding points. Even when the encoded points are reduced to 3.6% of those in the original CAE, it achieves a 15% BD-rate improvement over a fixed bitrate ladder.
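A minimal sketch of the curve-fitting step: fit a parametric RD model to the few encoded points of one resolution's curve, then query it at unencoded bitrates instead of encoding them. The logarithmic model, the illustrative numbers, and the initial guess are assumptions, not the paper's exact choices.

```python
# Fit an assumed logarithmic RD model to a sparse set of encoded points.
import numpy as np
from scipy.optimize import curve_fit

def rd_model(rate, a, b, c):
    # Quality grows roughly logarithmically with bitrate.
    return a * np.log(rate + c) + b

# Suppose only three operating points of one resolution were encoded.
rates = np.array([500.0, 2000.0, 8000.0])     # kbps (illustrative)
quality = np.array([76.2, 86.2, 97.0])        # e.g., VMAF (illustrative)
params, _ = curve_fit(rd_model, rates, quality, p0=(5.0, 20.0, 50.0))

# Estimate the RD performance of an unencoded point from the fitted curve.
est_quality = rd_model(4000.0, *params)
```

Inter-curve prediction would then reuse such fitted curves across neighboring resolutions or presets, so that even fewer points per curve need actual encoding.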
Video-based point cloud compression (V-PCC) converts dynamic point cloud data into video sequences, using traditional video codecs for efficient encoding. However, this lossy compression scheme introduces artifacts that degrade the color attributes of the data. This paper introduces a framework designed to enhance the color quality of V-PCC compressed point clouds. We propose the lightweight de-compression Unet (LDC-Unet), a 2D neural network, to optimize the projection maps generated during V-PCC encoding. The optimized 2D maps are then back-projected to 3D space to enhance the corresponding point cloud attributes. Additionally, we introduce a transfer learning strategy and develop a customized natural image dataset for the initial training; the model is then fine-tuned using the projection maps of the compressed point clouds. This strategy effectively addresses the scarcity of point cloud training data. Our experiments, conducted on the public 8i voxelized full bodies long sequences (8iVSLF) dataset, demonstrate the effectiveness of the proposed method in improving color quality.
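The two-stage transfer-learning schedule can be sketched as below; the training loop, data loaders, L1 restoration loss, and learning rates are placeholder assumptions, with only the pretrain-then-fine-tune structure taken from the abstract.

```python
# Sketch: pretrain on natural images, then fine-tune on projection maps.
import torch
import torch.nn as nn

def train(model, loader, epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.L1Loss()   # assumed restoration loss
    for _ in range(epochs):
        for degraded, clean in loader:
            opt.zero_grad()
            loss = loss_fn(model(degraded), clean)
            loss.backward()
            opt.step()

# Stage 1: initial training on the customized natural-image dataset.
#   train(ldc_unet, natural_image_loader, epochs=50, lr=1e-4)
# Stage 2: fine-tune on projection maps of compressed point clouds, at a
# lower learning rate so the pretrained features are preserved.
#   train(ldc_unet, projection_map_loader, epochs=10, lr=1e-5)
```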
We present a new image compression paradigm to achieve "intelligently coding for machine" by cleverly leveraging the common sense of Large Multimodal Models (LMMs). We are motivated by the evidence that large language/multimodal models are powerful general-purpose semantic predictors for understanding the real world. Unlike traditional image compression, which is typically optimized for human eyes, the image coding for machines (ICM) framework we focus on requires the compressed bitstream to better serve different downstream intelligent analysis tasks. To this end, we employ an LMM to tell the codec what to compress: 1) we first utilize the powerful semantic understanding of LMMs, with respect to object grounding, identification, and importance ranking via prompts, to disentangle the image content before compression; 2) based on these semantic priors, we then encode and transmit the objects of the image in order, as a structured bitstream. In this way, diverse vision benchmarks, including image classification, object detection, instance segmentation, etc., can be well supported by such a semantically structured bitstream. We dub our method "SDComp" for "Semantically Disentangled Compression" and compare it with state-of-the-art codecs on a wide variety of vision tasks. The SDComp codec yields more flexible reconstruction, promising decoded visual quality, and more generic, satisfactory support for intelligent tasks.
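A hedged sketch of the "LMM tells the codec what to compress" flow: prompt a multimodal model for grounded objects and an importance ranking, then encode regions in that order into a structured bitstream. The prompt wording, the query_lmm helper, and the encode_region codec are all hypothetical placeholders, not SDComp's actual interfaces.

```python
# Hypothetical pipeline: semantic disentanglement before region-wise coding.
import json

PROMPT = ("List the objects in this image with bounding boxes and rank them "
          "by importance for downstream analysis. Answer as JSON: "
          '[{"label": ..., "box": [x1, y1, x2, y2], "rank": ...}]')

def semantically_structured_encode(image, query_lmm, encode_region):
    objects = json.loads(query_lmm(image, PROMPT))   # grounding + ranking
    objects.sort(key=lambda o: o["rank"])            # most important first
    # Encode object regions in importance order, then the residual
    # background, so a truncated bitstream still carries the content
    # that downstream analysis tasks need most.
    chunks = [encode_region(image, o["box"]) for o in objects]
    chunks.append(encode_region(image, None))        # background / residual
    return b"".join(chunks)
```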
Currently, there is high demand for neural network-based image compression codecs. These codecs employ non-linear transforms to create compact bit representations and facilitate faster coding speeds on devices compared to the hand-crafted transforms used in classical frameworks. The scientific and industrial communities are highly interested in these properties, leading to the standardization effort of JPEG-AI. The JPEG-AI verification model has been released and is currently under development for standardization. Utilizing neural networks, it can outperform the classic codec VVC intra by over 10% BD-rate at the base operation point. Researchers attribute this success to flexible bit distribution in the spatial domain, in contrast to the VVC intra anchor, which is generated at a constant quality point. However, our study reveals that VVC intra achieves an even more adaptable bit distribution structure through its use of variable block sizes. Based on these observations, we propose a spatial bit allocation method to optimize the JPEG-AI verification model's bit distribution and enhance visual quality. Furthermore, by applying the VVC bit distribution strategy, the objective performance of the JPEG-AI verification model can be further improved, yielding a maximum gain of 0.45 dB in PSNR-Y.
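To make the idea of spatial bit allocation concrete, the sketch below modulates a learned codec's latent with a per-region importance map before quantization, so that important areas receive a finer effective step size and hence more bits. This illustrates the general mechanism under stated assumptions, not the exact method applied to the JPEG-AI verification model.

```python
# Hedged sketch of spatially adaptive quantization in a learned codec.
import torch

def spatially_allocated_quantize(latent, importance_map, base_step=1.0,
                                 strength=0.5):
    """latent: (B, C, h, w) analysis-transform output;
    importance_map: (B, 1, h, w) in [0, 1], higher = more bits."""
    # Smaller effective quantization step where importance is high.
    step = base_step * (1.0 - strength * importance_map)
    q = torch.round(latent / step)   # per-position rounding (quantization)
    return q * step                  # dequantized latent for synthesis
```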