ISBN: 9781728185514 (print)
Advances in media compression indicate significant potential to drive future media coding standards, e.g., the Joint Photographic Experts Group's learning-based image coding technologies (JPEG AI) and the Joint Video Experts Team's (JVET) deep neural network (DNN) based video coding. These codecs in fact represent a new type of media format. As a dire consequence, traditional media security and forensic techniques will no longer be of use. This paper presents an initial study of the effectiveness of traditional watermarking on two state-of-the-art learning-based image codecs. Results indicate that traditional watermarking methods are no longer effective. We also examine the forensic trails of various DNN architectures in learning-based codecs by proposing a residual-noise-based source identification algorithm that achieved 79% accuracy.
ISBN: 9781665475921 (print)
With the rapid development of 3D technologies, effective no-reference stereoscopic image quality assessment (NR-SIQA) methods are in great demand. In this paper, we propose a parallel multi-scale feature extraction convolutional neural network (CNN) model combined with a novel binocular feature interaction that is consistent with the human visual system (HVS). To simulate the way the HVS senses multi-scale information simultaneously, a parallel multi-scale feature extraction module (PMSFM) followed by compensation information is proposed. A modified convolutional block attention module (MCBAM) with lower computational complexity is designed to generate visual attention maps for the multi-scale features extracted by the PMSFM. In addition, we employ a cross-stacked strategy for multi-level binocular fusion maps and binocular disparity maps to simulate the hierarchical perception characteristics of the HVS. Experimental results show that our method is superior to state-of-the-art metrics and achieves excellent performance.
ISBN: 9798331529543; 9798331529550 (print)
Image dehazing plays a crucial role in autonomous driving and outdoor surveillance. However, haze affects different components of an image in various ways and to various degrees, yet existing methods treat the image as a single input and overlook the need to decouple these components, leading to mutual interference during the enhancement of each component. Consequently, issues such as insufficient color restoration or blurred edges may arise. In this paper, we introduce a novel tri-branch network for single image dehazing that independently extracts low-frequency, high-frequency, and semantic information from images using three distinct sub-networks. A fusion network is then employed to integrate the information from these three branches to produce the final dehazed image. To facilitate the training of such a complex network, we propose a two-stage training approach. Experimental results demonstrate that our approach achieves state-of-the-art (SOTA) performance.
ISBN: 9781665475921 (print)
Single image desnowing is an important and challenging task for many computer vision applications, such as visual tracking and video surveillance. Although existing deep learning-based methods have achieved promising results, most of them rely on local deep features and neglect the global relationships between local regions, which inevitably leads to over-smoothed results or loss of detail. To solve this issue, we design a UNet-based end-to-end architecture for image desnowing. Specifically, to better characterize global information and preserve image detail, we combine a Window-based Self-Attention (WSA) transformer block with Residue Spatial Attention (RSA) to build the basic unit of our network. In addition, to protect the structure of the image effectively, we introduce a Residue Channel (RC) loss to guide high-quality image restoration. Extensive experimental results on both synthetic and real-world datasets demonstrate that the proposed model achieves new state-of-the-art results.
ISBN: 9798350343557 (print)
In this study, the alignment of video-text and image-text datasets is studied. First, similarities are calculated over the texts in the two datasets. A retrieval setup with visual similarities is then applied to the subset created via the calculated text similarities. A BERT-based embedding vector method is applied to the raw and pure texts. As visual features, object-based and CLIP-based methods are used to represent video frames. According to the results, alignment with CLIP features achieves the best results in the subset created by filtering using raw text.
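The filter-then-retrieve pipeline described above can be sketched in a few lines. This is a minimal illustration, not the study's implementation: the embeddings are assumed to be computed beforehand (for example, BERT-based text vectors and CLIP-based visual vectors), and the dict layout, the `text_threshold` value, and the function names `cosine` and `align` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align(video, images, text_threshold=0.8):
    """Return the id of the best-matching image for one video, or None.

    `video` and each entry of `images` are hypothetical dicts with keys
    "id", "text_vec" (caption embedding) and "visual_vec" (frame or
    image embedding), all assumed to be computed beforehand.
    """
    # Stage 1: keep only images whose caption is similar to the video's.
    subset = [img for img in images
              if cosine(video["text_vec"], img["text_vec"]) >= text_threshold]
    if not subset:
        return None
    # Stage 2: rank the surviving candidates by visual similarity.
    best = max(subset,
               key=lambda img: cosine(video["visual_vec"], img["visual_vec"]))
    return best["id"]


# Toy 2-D vectors stand in for real embeddings.
video = {"id": "v1", "text_vec": [1.0, 0.0], "visual_vec": [0.0, 1.0]}
images = [
    {"id": "a", "text_vec": [0.9, 0.1], "visual_vec": [1.0, 0.0]},
    {"id": "b", "text_vec": [1.0, 0.2], "visual_vec": [0.1, 0.9]},
    {"id": "c", "text_vec": [0.0, 1.0], "visual_vec": [0.0, 1.0]},
]
result = align(video, images)
```

In this toy example, image "c" is visually identical to the video but is discarded in stage 1 because its text differs, which shows how the text filter constrains the visual retrieval.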
ISBN: 9798331529543; 9798331529550 (print)
Counterfeit medicines present a severe public health threat, especially in low-resource countries where consumers lack reliable means to verify the medicines they purchase. Visual inspection of medicine packaging images through keypoint matching techniques offers a promising approach for detecting design inconsistencies that could indicate counterfeit products. However, conventional methods often struggle with high computational costs and reduced accuracy when processing images of varying quality and perspectives. To address these limitations, we propose the Angle and Scale Voting (ASVote) method, which enhances keypoint-based image matching by introducing a 2D voting mechanism that leverages the relative angles and scales of the keypoints to eliminate false matches (outliers) while identifying consistent matches (inliers). Experiments on a real-world dataset of medicine packages show that ASVote improves both processing time and accuracy, outperforming conventional methods.
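The idea of voting over relative angle and scale can be illustrated with a minimal sketch. This is not the authors' ASVote implementation: the match tuple layout, the bin sizes, and the function name `vote_filter` are assumptions made for illustration. Each tentative match casts one vote in a 2D histogram indexed by relative rotation and log scale ratio, and only the matches in the winning bin are kept as inliers.

```python
import math
from collections import defaultdict


def vote_filter(matches, angle_bin_deg=30.0, scale_bin_log2=0.5):
    """Keep matches that fall in the most-voted (angle, scale) bin.

    Each match is a hypothetical tuple
    (angle_a_deg, scale_a, angle_b_deg, scale_b) holding the keypoint
    orientation and scale in image A and in image B.
    """
    bins = defaultdict(list)
    for idx, (angle_a, scale_a, angle_b, scale_b) in enumerate(matches):
        # Relative rotation between the two keypoints, wrapped to [0, 360).
        rel_angle = (angle_b - angle_a) % 360.0
        # Relative scale on a log axis so that 2x and 0.5x are symmetric.
        rel_scale = math.log2(scale_b / scale_a)
        key = (int(rel_angle // angle_bin_deg),
               math.floor(rel_scale / scale_bin_log2))
        bins[key].append(idx)
    if not bins:
        return []
    # The bin with the most votes is taken as the consistent transform.
    best = max(bins.values(), key=len)
    return [matches[i] for i in best]


matches = [
    (10.0, 1.0, 55.0, 2.1),    # about 45 degrees of rotation, about 2x scale
    (90.0, 2.0, 137.0, 4.2),   # about 47 degrees of rotation, about 2x scale
    (200.0, 1.5, 250.0, 3.1),  # about 50 degrees of rotation, about 2x scale
    (30.0, 1.0, 300.0, 0.4),   # inconsistent with the others
]
inliers = vote_filter(matches)
```

Hard binning as used here can split a consistent group across a bin boundary; a practical variant might also vote into neighboring bins, which this sketch omits for brevity.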
ISBN: 9781728185514 (print)
Recently, pre-processed video transcoding has attracted wide attention and has been increasingly used in practical applications to improve the perceptual experience and save transmission resources. However, very few works have evaluated the performance of pre-processing methods. In this paper, we select source (SRC) videos and various pre-processing approaches to construct the first Pre-processed and Transcoded Video Database (PTVD). We then conduct a subjective experiment, showing that, compared with video sent to the codec directly at the same bitrate, appropriate pre-processing methods indeed improve the perceptual quality. Finally, existing image/video quality metrics are evaluated on our database. The results indicate that the performance of existing image/video quality assessment (IQA/VQA) approaches remains to be improved. We will make our database publicly available soon.
ISBN: 0819444111 (print)
Complexity scalable algorithms are capable of trading resource usage for output quality in a near-optimal way. We present a complexity scalable motion estimation algorithm based on the 3-D recursive search block matcher. We introduce data prioritizing as a new approach to scalability. With this approach, we achieve a near-constant complexity and a continuous quality-resource distribution. While maintaining acceptable quality, it is possible to vary the resource usage from below 1 match-error calculation per block on average to more than 5 match-error calculations per block on average.
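To make the notion of a budget of match-error calculations per block concrete, here is a minimal sketch. It is not the 3-D recursive search algorithm or the paper's data prioritizing scheme: the sum of absolute differences (SAD) as match error, the pre-sorted candidate list, and the function names `sad` and `estimate` are assumptions for illustration. A budget of 0 reuses the top-priority candidate without any calculation, which is one way an average below 1 calculation per block could arise.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block and its displaced match."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def estimate(cur, ref, bx, by, size, candidates, budget):
    """Evaluate at most `budget` candidate vectors, in priority order.

    `candidates` is assumed to be sorted by priority already. Returns
    the chosen vector and the number of match-error calculations spent.
    """
    if budget <= 0:
        # No calculation at all: reuse the highest-priority candidate.
        return candidates[0], 0
    scored = [(sad(cur, ref, bx, by, dx, dy, size), (dx, dy))
              for dx, dy in candidates[:budget]]
    best_error, best_vector = min(scored)
    return best_vector, len(scored)


# Toy 4x4 frames: the 2x2 block at (1, 1) in `cur` matches `ref` at (2, 1).
ref = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
cur = [[0, 0, 0, 0], [0, 6, 7, 0], [0, 10, 11, 0], [0, 0, 0, 0]]
candidates = [(0, 0), (1, 0), (0, 1)]

cheap = estimate(cur, ref, 1, 1, 2, candidates, budget=0)
full = estimate(cur, ref, 1, 1, 2, candidates, budget=3)
```

With a budget of 0 the block keeps the predicted vector (0, 0) at no cost, while a budget of 3 finds the exact displacement (1, 0); intermediate budgets trade accuracy for computation.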
ISBN: 9781728185514 (print)
Multimodal image registration has received increasing attention in computer vision and computational photography. However, nonlinear intensity variations prevent accurate feature point matching between image pairs of different modalities. Thus, a robust image descriptor for multi-modal image registration is proposed, named the shearlet-based modality robust descriptor (SMRD). The anisotropic features of edge and texture information at multiple scales are encoded to describe the region around a point of interest based on the discrete shearlet transform. We conducted experiments to compare the proposed SMRD with several state-of-the-art multi-modal/multispectral descriptors on four different multi-modal datasets. The experimental results showed that SMRD outperforms the other methods in terms of precision, recall and F1-score.
ISBN: 9798331529543; 9798331529550 (print)
The increasing demand for high-quality, real-time visual communication and growing user expectations, coupled with limited network resources, necessitate novel approaches to semantic image communication. This paper presents a method to enhance semantic image communication that combines a novel lossy semantic encoding approach with spatially adaptive semantic image synthesis models. By developing a model-agnostic training augmentation strategy, our approach substantially reduces susceptibility to distortion introduced during encoding, effectively eliminating the need for lossless semantic encoding. Comprehensive evaluation across two spatially adaptive conditioning methods and three popular datasets indicates that this approach enhances semantic image communication in very low bit rate regimes.