ISBN: (Print) 9798331529543; 9798331529550
Plenoptic cameras are light field capturing devices that can acquire large amounts of angular and spatial information. The lenslet video produced by such cameras presents on each frame a distinctive hexagonal pattern of micro-images. Due to the particular structure of lenslet images, traditional video codecs perform poorly on lenslet video. Previous works have proposed a preprocessing scheme that cuts and realigns the micro-images on each lenslet frame. While effective, this method introduces high-frequency components into the processed image. In this paper, we propose an additional step for the aforementioned scheme: an invertible smoothing transform. We evaluate the enhanced scheme on lenslet video sequences captured with single-focused and multi-focused plenoptic cameras. On average, the enhanced scheme achieves a 9.85% bitrate reduction compared to the existing scheme.
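The abstract does not specify the smoothing transform, so the following NumPy sketch only illustrates how a smoothing filter can be made exactly invertible: circular convolution with a kernel whose frequency response never vanishes, undone by spectral division. The kernel shape and parameters are illustrative assumptions, not the authors' design.

import numpy as np

def kernel_fft(shape, a=0.4):
    # k = (1 - a)*delta + a*(3x3 box); its frequency response never reaches zero for a < 0.5
    h, w = shape
    k = np.zeros((h, w))
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            k[dy % h, dx % w] += a / 9.0
    k[0, 0] += 1.0 - a
    return np.fft.fft2(k)

def smooth(frame, K):
    # forward transform: mild blur that attenuates the high frequencies
    # introduced by cutting and realigning the micro-images
    return np.real(np.fft.ifft2(np.fft.fft2(frame) * K))

def unsmooth(frame, K):
    # inverse transform applied after decoding
    return np.real(np.fft.ifft2(np.fft.fft2(frame) / K))

frame = np.random.rand(64, 96)                    # stand-in for a realigned lenslet frame
K = kernel_fft(frame.shape)
assert np.allclose(frame, unsmooth(smooth(frame, K), K))  # invertible up to float error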
Blind Image Quality Assessment (BIQA) is essential in computational vision for predicting the visual quality of digital images without reference counterparts. Despite advances driven by convolutional neural networks (CNNs), a significant challenge in BIQA remains the long-tail distribution of image quality scores, which leads to biased training and reduced model generalization. To address this, we restructured the KonIQ-10k dataset to create an imbalanced version named KonIQ-10k-LT, manipulating the distribution of image quality scores so that the training and validation sets have opposing distributions. This restructuring increases the proportion of certain quality scores in the training set while decreasing them in the validation set. Experimental results show a significant performance decline of BIQA models on KonIQ-10k-LT compared with the original KonIQ-10k, highlighting the challenge posed by the long-tail distribution. To mitigate this issue, we propose a Proportion Weighted Balancing (PWB) method as a baseline, designed to enhance the robustness and generalization ability of BIQA models. Our findings demonstrate that the proposed PWB method improves the performance and reliability of BIQA models under these challenging conditions.
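The abstract does not give the PWB formulation; one natural reading is to weight each training sample by the inverse proportion of its quality-score bin in the long-tailed training set. A minimal sketch under that assumption (the bin count and the loss are illustrative):

import numpy as np
import torch

def proportion_weights(mos_scores, num_bins=10):
    # per-sample weights proportional to 1 / (bin proportion), normalized to mean 1
    scores = np.asarray(mos_scores, dtype=np.float64)
    edges = np.linspace(scores.min(), scores.max(), num_bins + 1)
    bins = np.clip(np.digitize(scores, edges[1:-1]), 0, num_bins - 1)
    props = np.bincount(bins, minlength=num_bins) / len(scores)
    w = 1.0 / props[bins]
    return torch.tensor(w / w.mean(), dtype=torch.float32)

def weighted_l1(pred, target, weights):
    # re-balanced regression loss for BIQA training
    return (weights * (pred - target).abs()).mean()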
While saliency detection for images has been extensively studied over the past decades, little work has explored the influence of different viewing devices (e.g., tablet computer, mobile phone) on human visual attention behavior. The lack of research in this area hinders progress in cross-device image saliency detection. In this paper, we first establish a novel cross-device saliency detection (CDSD) database based on eye-tracking experiments and investigate subjects' visual attention behavior when using different viewing devices. Then, we evaluate several classic saliency detection models on the CDSD database; the evaluation results indicate that the cross-device performance of these models needs further improvement. Finally, we provide some meaningful discussions that may inform the design of cross-device saliency detection models. The proposed CDSD database will be made publicly available.
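The abstract does not list the evaluation metrics; saliency models are typically scored against recorded fixations with standard measures such as Normalized Scanpath Saliency (NSS), sketched below for a single image.

import numpy as np

def nss(saliency_map, fixation_map):
    # mean of the z-scored saliency map at human fixation locations
    s = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-8)
    return s[fixation_map > 0].mean()

# usage: one predicted map against the binary fixation map recorded on one device
pred = np.random.rand(480, 640)
fixations = (np.random.rand(480, 640) > 0.999).astype(np.uint8)
print(nss(pred, fixations))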
Generative models have significantly advanced generative AI, particularly in image and video generation. Recognizing their potential, researchers have begun exploring their application to image compression. However, existing methods face two primary challenges: limited performance improvement and high model complexity. In this paper, to address these two challenges, we propose a perceptual image compression solution by introducing a conditional diffusion model. Given that compression performance heavily depends on the decoder's generative capability, we base our decoder on the diffusion transformer architecture. To address the model complexity problem, we implement the diffusion transformer architecture with the Swin Transformer. With this enhanced generative capability in place, we further augment the decoder with informative features through a multi-scale feature fusion module. Experimental results demonstrate that our approach surpasses existing perceptual image compression methods while achieving lower model complexity.
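As a rough illustration of the multi-scale feature fusion idea named in the abstract, the PyTorch sketch below upsamples decoder features from several scales and projects them to a common width; the channel counts and the concatenate-then-project design are assumptions, not the paper's specification.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    def __init__(self, channels=(64, 128, 256), out_channels=128):
        super().__init__()
        self.proj = nn.Conv2d(sum(channels), out_channels, kernel_size=1)

    def forward(self, feats):
        # feats: features at decreasing spatial resolutions
        h, w = feats[0].shape[-2:]
        up = [F.interpolate(f, size=(h, w), mode="bilinear", align_corners=False)
              for f in feats]
        return self.proj(torch.cat(up, dim=1))

fused = MultiScaleFusion()([torch.randn(1, 64, 32, 32),
                            torch.randn(1, 128, 16, 16),
                            torch.randn(1, 256, 8, 8)])
print(fused.shape)  # torch.Size([1, 128, 32, 32])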
When training a Learned Image Compression (LIC) model, the loss function is minimized so that the encoder and the decoder attain a target Rate-Distortion trade-off. Therefore, a distinct model must be trained and stored at the transmitter and the receiver for each target rate, motivating the quest for efficient variable-bitrate compression schemes. This paper proposes plugging Low-Rank Adapters into a transformer-based pre-trained LIC model and training them to meet different target rates. With our method, encoding an image at a variable rate is as simple as training the corresponding adapters and plugging them into the frozen pre-trained model. Our experiments show performance comparable with state-of-the-art fixed-rate LIC models at a fraction of the training and deployment cost. We publicly released the code at https://***/EIDOSLAB/ALICE.
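A minimal sketch of the plug-in idea: a low-rank adapter wrapped around one frozen linear layer of a pre-trained transformer-based LIC model, with one adapter set trained per target rate. The rank, scaling, and layer choice are illustrative, not the released ALICE configuration.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=8, alpha=8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # pre-trained weights stay frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)       # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))

# swap a projection of the frozen LIC transformer for its adapted version,
# then train only the adapter parameters for the new target rate
layer = LoRALinear(nn.Linear(192, 192))
y = layer(torch.randn(2, 16, 192))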
Lookup tables (LUTs) are commonly used to speed up image processing by handling complex mathematical functions such as sine and exponential calculations. They are used in various applications such as camera image processing, high-dynamic-range imaging, and edge-preserving filtering. However, due to the increasing gap between computing and input/output performance, LUTs are becoming less effective. Even though specialized circuits such as SIMD units can improve LUT efficiency, they still cannot fully bridge the performance gap. This gap makes it difficult to choose between direct numerical calculation and LUT calculation. For this problem, a register-LUT method with nearest-neighbor lookup was proposed; however, it is limited to functions with narrow-range values approaching zero. In this paper, we propose a method for using register LUTs to process images efficiently over a wide range of values. Our contributions include a register LUT with linear interpolation for efficient computation, the use of a smaller data type for further efficiency, and an efficient data-retrieval method.
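As a numerical illustration of the linear-interpolation register LUT (shown here in NumPy rather than the SIMD/register-level implementation the paper targets), a small table of function samples is looked up and the two nearest entries are blended linearly; the table size and the example function are assumptions.

import numpy as np

def build_lut(fn, lo, hi, size=16):
    xs = np.linspace(lo, hi, size)
    return xs, fn(xs)

def lut_interp(x, xs, ys):
    # approximate fn(x) by linear interpolation between the two nearest LUT entries
    step = xs[1] - xs[0]
    idx = np.clip(((x - xs[0]) / step).astype(int), 0, len(xs) - 2)
    frac = (x - xs[idx]) / step
    return ys[idx] * (1.0 - frac) + ys[idx + 1] * frac

xs, ys = build_lut(np.exp, 0.0, 1.0, size=16)   # small enough to keep in registers
x = np.random.rand(1000)
print(np.abs(lut_interp(x, xs, ys) - np.exp(x)).max())  # small approximation error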
Recent advancements in learned image compression methods have demonstrated superior rate-distortion performance and remarkable potential compared to traditional compression techniques. However, the core operation of quantization, inherent to lossy image compression, introduces errors that can degrade the quality of the reconstructed image. To address this challenge, we propose a novel Quantization Error Compensator (QEC), which leverages spatial context within latent representations and hyperprior information to effectively mitigate the impact of quantization error. Moreover, we propose a tailored quantization error optimization training strategy to further improve rate-distortion performance. Notably, QEC serves as a lightweight, plug-and-play module, offering high flexibility and seamless integration into various learned image compression methods. Extensive experimental results consistently demonstrate significant coding efficiency improvements achievable by incorporating the proposed QEC into state-of-the-art methods, with a slight increase in runtime.
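A minimal PyTorch sketch in the spirit of QEC (the layer widths and the residual formulation are assumptions, not the paper's architecture): a light module conditioned on the quantized latent and on hyperprior features predicts a correction that is added back before the synthesis transform.

import torch
import torch.nn as nn

class QuantErrorCompensator(nn.Module):
    def __init__(self, latent_ch=192, hyper_ch=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(latent_ch + hyper_ch, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, latent_ch, 3, padding=1),
        )

    def forward(self, y_hat, hyper_feat):
        # y_hat: quantized latent; hyper_feat: hyperprior features at the same resolution
        return y_hat + self.net(torch.cat([y_hat, hyper_feat], dim=1))

y_hat = torch.round(torch.randn(1, 192, 16, 16))
hyper = torch.randn(1, 128, 16, 16)
y_comp = QuantErrorCompensator()(y_hat, hyper)   # fed to the decoder in place of y_hat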
Depth estimation from light field images is a crucial technique in various applications, including 3D reconstruction, autonomous driving, and object tracking. However, current deep-learning methods ignore the geometric information of the light field image and are limited when learning repetitive textures, which leads to inaccurate depth estimates. This paper proposes a light field depth estimation network that fuses multi-scale semantic information with geometric information to address the poor adaptation to repeated-texture regions. The core of the network is the semantic and geometric information fusion (SGI) module, which adaptively combines semantic and geometric information to improve the efficiency of cost aggregation. Furthermore, the SGI module establishes a direct link between feature extraction and cost aggregation, providing feedback that guides more efficient feature extraction. Experimental results on the HCI 4D synthetic light field dataset demonstrate that the method achieves high accuracy and generalisation performance.
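The SGI module is described as adaptively combining semantic and geometric information; one simple way to realize such an adaptive combination is a learned per-pixel gate, sketched below in PyTorch (the paper's actual fusion design may differ).

import torch
import torch.nn as nn

class SGIFusion(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, semantic, geometric):
        g = self.gate(torch.cat([semantic, geometric], dim=1))
        return g * semantic + (1.0 - g) * geometric   # per-pixel adaptive mix

fused = SGIFusion()(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))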
Recent advancements in neural compression have surpassed traditional codecs in PSNR and MS-SSIM measurements. However, at low bit-rates, these methods can introduce visually displeasing artifacts such as blurring, color shifting, and texture loss, thereby compromising the perceptual quality of images. To address these issues, this study presents an enhanced neural compression method designed for optimal visual fidelity. We train our model with a sophisticated semantic ensemble loss, integrating Charbonnier loss, perceptual loss, style loss, and a non-binary adversarial loss, to enhance the perceptual quality of image reconstructions. Additionally, we implement a latent refinement process to generate content-aware latent codes. These codes adhere to bit-rate constraints and prioritize bit allocation to regions of greater importance. Our empirical findings demonstrate that this approach significantly improves the statistical fidelity of neural image compression.
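A minimal sketch of the Charbonnier term and of how the four loss components could be combined; the weights and the exact perceptual, style, and adversarial formulations are assumptions (the latter three would come from a feature extractor and a discriminator defined elsewhere).

import torch

def charbonnier(x, y, eps=1e-3):
    # smooth L1-like reconstruction term
    return torch.sqrt((x - y) ** 2 + eps ** 2).mean()

def ensemble_loss(x_hat, x, perceptual, style, adversarial,
                  w=(1.0, 0.1, 10.0, 0.01)):
    # weighted sum of the four terms used to train the decoder
    return (w[0] * charbonnier(x_hat, x) + w[1] * perceptual
            + w[2] * style + w[3] * adversarial)

x, x_hat = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
loss = ensemble_loss(x_hat, x, perceptual=torch.tensor(0.5),
                     style=torch.tensor(0.02), adversarial=torch.tensor(0.7))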
Traffic sign recognition plays a crucial role in self-driving cars, but unfortunately it is vulnerable to adversarial patches (AP). Although previous studies have shown that APs can efficiently fool DNN-based models, the connection between image forensics and AP detection remains largely unexplored. From a high-level point of view, their goals are the same: to find tampered regions while preventing false positives. A natural question arises: "Is achieving application-agnostic anomaly detection possible?" In this paper, we propose Image Forensics Defense Against Adversarial Patch (IDAP), a framework that defends against adversarial patches via generalizable features learned from tampered images. In addition, we incorporate the Hausdorff erosion loss into our network for joint training to complete the shape of the predicted mask. Extensive experimental comparisons on three datasets, COCO, DFG, and APRICOT, demonstrate that IDAP outperforms state-of-the-art AP detection methods.
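A minimal PyTorch sketch in the spirit of erosion-based Hausdorff losses, which penalize prediction errors more heavily the more erosions they survive; the number of iterations and the weighting are assumptions, not necessarily the formulation used in IDAP.

import torch
import torch.nn.functional as F

def soft_erode(x):
    # morphological erosion approximated by min-pooling
    return -F.max_pool2d(-x, kernel_size=3, stride=1, padding=1)

def hausdorff_erosion_loss(pred, target, iters=5, alpha=2.0):
    # pred, target: (N, 1, H, W) masks with values in [0, 1]
    err = (pred - target) ** 2
    loss = 0.0
    for k in range(1, iters + 1):
        err = soft_erode(err)
        loss = loss + (k ** alpha) * err.mean()
    return loss

pred = torch.rand(1, 1, 64, 64)
mask = (torch.rand(1, 1, 64, 64) > 0.9).float()
print(hausdorff_erosion_loss(pred, mask))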