When transmission medium and compression degradation are intertwined, new challenges emerge. This study addresses the problem of raindrop removal from compressed images, where raindrops obscure large areas of the background and compression leads to the loss of high-frequency (HF) information. Restoring the former requires global contextual information, while the latter necessitates guidance from high-frequency details, creating a conflict between these two types of information in the design of existing methods. To address this issue, we propose a novel transformer architecture that leverages the advantages of the attention mechanism and an HF-friendly design to effectively restore compressed raindrop images at the framework, component, and module levels. Specifically, at the framework level, we integrate relative-position multi-head self-attention and convolutional layers into the proposed low-high-frequency transformer (LHFT), where the former captures global contextual information and the latter focuses on high-frequency information. Their combination effectively resolves the issue of mixed degradation. At the component level, we employ high-frequency depth-wise convolution (HFDC) with zero-mean kernels to improve the extraction of high-frequency features, drawing inspiration from typical high-frequency filters such as the Prewitt and Sobel operators. Finally, at the module level, we introduce a low-high-attention module (LHAM) to adaptively allocate the importance of low and high frequencies along channels for effective fusion. We establish a JPEG-compressed raindrop image dataset and conduct extensive experiments at different compression rates. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods without increasing computational cost.
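The zero-mean constraint of HFDC can be illustrated with a short PyTorch sketch. The class name and the mean-subtraction re-parameterization below are our assumptions for illustration; the paper's exact formulation may differ:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ZeroMeanDepthwiseConv2d(nn.Module):
    """Depth-wise convolution whose kernels are constrained to sum to zero."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.weight = nn.Parameter(0.1 * torch.randn(channels, 1, kernel_size, kernel_size))
        self.groups = channels
        self.padding = kernel_size // 2

    def forward(self, x):
        # Subtracting the per-kernel mean makes each kernel zero-mean, so it
        # rejects the DC component and responds only to high frequencies,
        # as the Prewitt and Sobel operators do.
        w = self.weight - self.weight.mean(dim=(2, 3), keepdim=True)
        return F.conv2d(x, w, padding=self.padding, groups=self.groups)

conv = ZeroMeanDepthwiseConv2d(8)
flat = torch.full((1, 8, 32, 32), 5.0)       # a purely low-frequency (constant) input
out = conv(flat)
print(out[..., 1:-1, 1:-1].abs().max())      # ~0 away from borders: DC is filtered out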
Tampered images can easily be used for illegal activities, such as spreading rumors, economic fraud, fabricating false news, and illegally obtaining experience benefits. With the development of artificial intelligence (AI), image manipulation technology has also advanced, and more and more retouching software in daily life adopts AI techniques. So far, however, there has been no dataset of AI-based tampered images. To address this challenge, we propose a dataset, IPM15K, which utilizes the most advanced image processing technology and contains a total of 15,000 doctored images. The dataset could also serve as a catalyst for many vision tasks, e.g., localization, segmentation, and alpha matting. Additionally, we propose an effective multi-feature fusion identification network (MFI-Net) to identify these challenging images. Our model consists of four modules: the detail extraction module (DEM), which utilizes convolutions of different sizes and receptive fields to extract more valuable information about tampered locations; the multi-branch attention fusion module (MAFM), which fully exploits contextual information at different levels to capture subtle traces of tampering; the feature decoder component (FDC), which combines the fused features to identify tampered regions; and the detail enhancement block (DEB), which further supplements the detailed information of the detected regions. Extensive experiments on three public datasets and the proposed dataset show that MFI-Net outperforms various state-of-the-art (SOTA) manipulation detection baselines.
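As a rough illustration of the DEM idea, parallel convolutions with different kernel sizes and receptive fields fused by a 1x1 convolution, consider the following PyTorch sketch; the block structure and names are our assumptions, not the paper's exact design:

import torch
import torch.nn as nn

class MultiScaleDetailBlock(nn.Module):
    """Parallel convolutions with growing kernel sizes/receptive fields,
    fused by concatenation and a 1x1 convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch // 4, k, padding=k // 2)
            for k in (1, 3, 5, 7)                 # four receptive-field scales
        ])
        self.fuse = nn.Conv2d(out_ch, out_ch, 1)  # 1x1 conv merges the branches

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

y = MultiScaleDetailBlock(3, 64)(torch.randn(1, 3, 64, 64))
print(y.shape)  # torch.Size([1, 64, 64, 64])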
In low-bitrate audio coding, modern coders often rely on efficient parametric techniques to enhance the performance of the waveform-preserving transform coder core. While the latter features well-known perceptually adapted quantization of spectral coefficients, parametric techniques reconstruct the signal parts that have been quantized to zero by the encoder to meet the low-bitrate constraint. Large numbers of zeroed spectral values, and especially consecutive zeros constituting gaps, often lead to audible artifacts at the decoder. To avoid such artifacts, the new 3GPP Enhanced Voice Services (EVS) coding standard utilizes noise filling and intelligent gap filling (IGF) techniques, guided by spectral envelope information. In this paper, the underlying considerations of the parametric energy adjustment and transmission in EVS and their relation to noise filling, IGF, and tonality preservation are presented. It is further shown that complex-valued IGF envelope calculation in the encoder improves the temporal energy stability of some signals while retaining real-valued decoder-side processing.
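Why a complex-valued envelope is temporally stabler can be seen in a toy NumPy experiment. We use a plain DFT in place of the MDCT/MDST pair of EVS, so this illustrates the principle rather than the standard's actual processing:

import numpy as np

fs, n = 16000, 256
t = np.arange(8 * n) / fs
x = np.sin(2 * np.pi * 1015.0 * t)        # tone not bin-aligned: phase drifts per frame
frames = x.reshape(8, n) * np.hanning(n)  # eight consecutive analysis frames

spec = np.fft.rfft(frames, axis=1)        # complex spectrum of each frame
band = slice(12, 21)                      # an arbitrary scale-factor band around the tone

real_energy = np.sum(spec[:, band].real ** 2, axis=1)    # real-part-only band energy
cplx_energy = np.sum(np.abs(spec[:, band]) ** 2, axis=1) # complex (magnitude) band energy

print(real_energy / real_energy.mean())   # fluctuates from frame to frame
print(cplx_energy / cplx_energy.mean())   # nearly constant: phase-independent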
Existing deep learning-based steganography detection methods utilize convolution to automatically capture and learn steganographic features, yielding higher detection efficiency compared to manually designed steganography detection methods. Detection methods based on convolutional neural network frameworks can extract global features by increasing the network's depth and width, but such frameworks are not highly sensitive to global features and can lead to significant resource consumption. This manuscript proposes a lightweight steganography detection method based on multiple residual structures and a transformer (ResFormer). A multi-residual block based on channel rearrangement is designed in the preprocessing layer, where multiple residuals are used to enrich the residual features and channel shuffle is used to enhance the feature representation capability. A lightweight convolutional and transformer feature extraction backbone is constructed, which reduces the computational and parameter complexity of the network by employing depth-wise separable convolutions. This backbone integrates local and global image features through the fusion of convolutional layers and the transformer, enhancing the network's ability to learn global features and effectively enriching feature diversity. An effective weighted loss function is introduced for learning both local and global features: the Bias Loss function is used to give full play to the role of feature diversity in classification, and the cross-entropy loss function and contrastive loss function are organically combined to enhance the expressive ability of features. Based on BOSSbase-1.01, BOWS2, and ALASKA#2, extensive experiments are conducted on stego images generated by spatial- and JPEG-domain adaptive steganographic algorithms, employing both classical and state-of-the-art steganalysis techniques. The experimental results demonstrate that, compared to the SRM, SRNet, SiaStegNet, CSANet, LWENet, and SiaIRNet methods, the proposed ResFormer achieves superior detection performance.
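Two of the named ingredients, channel shuffle and depth-wise separable convolution, are standard building blocks. A minimal PyTorch sketch of both (our own illustration, not the paper's code):

import torch
import torch.nn as nn

def channel_shuffle(x, groups):
    # Rearranges channels so information flows across groups: reshape to
    # (B, g, C/g, H, W), swap the two channel axes, and flatten back.
    b, c, h, w = x.shape
    return x.view(b, groups, c // groups, h, w).transpose(1, 2).reshape(b, c, h, w)

class DepthwiseSeparableConv(nn.Module):
    """3x3 depth-wise conv followed by a 1x1 point-wise conv: far fewer
    multiply-adds and parameters than a dense 3x3 convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.dw = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        self.pw = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pw(self.dw(x))

x = torch.randn(2, 32, 16, 16)
y = DepthwiseSeparableConv(32, 64)(channel_shuffle(x, groups=4))
print(y.shape)  # torch.Size([2, 64, 16, 16])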
ISBN (print): 9781510679344; 9781510679351
Current video coding standards, including H.264/AVC, HEVC, and VVC, utilize the discrete cosine transform (DCT) and discrete sine transform (DST) to decorrelate the intra-prediction residuals. However, these transforms often face challenges in effectively decorrelating signals with complex, non-smooth, and non-periodic structures. Even in smooth areas, an abrupt transition (due to noise or prediction artifacts) can limit their effectiveness. This paper presents a novel block-adaptive separable path graph-based transform (GBT) that is particularly adept at handling such signals. The method adaptively modifies the block size and learns the GBT to enhance performance. The GBT is learned online using sequential K-means clustering, where each available block size has K clusters and K GBT kernels. This approach allows the GBT for the current block to be dynamically learned from previously reconstructed areas with the same block size and similar characteristics. Our evaluation, integrating this method with H.264/AVC intra-coding tools, shows significant improvement over the traditional H.264/AVC DCT in processing high-resolution natural images.
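The core of a path GBT, taking the eigenvectors of a graph Laplacian over a line graph as the transform basis, fits in a few lines of NumPy. The edge weights and block size below are illustrative stand-ins for the learned, cluster-specific weights:

import numpy as np

def path_gbt(weights):
    """GBT basis for a path graph: eigenvectors of the graph Laplacian.
    `weights` holds the N-1 edge weights between consecutive samples."""
    n = len(weights) + 1
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = W[idx + 1, idx] = weights
    L = np.diag(W.sum(axis=1)) - W           # combinatorial Laplacian L = D - W
    eigvals, eigvecs = np.linalg.eigh(L)     # eigenvectors in ascending frequency order
    return eigvecs.T                         # rows are the transform basis vectors

# Uniform edge weights recover the DCT-II basis (up to sign); a weak edge
# (e.g. across a discontinuity in the residual) adapts the basis to it.
T_uniform = path_gbt(np.ones(7))
T_adapted = path_gbt(np.array([1, 1, 1, 0.1, 1, 1, 1.0]))
residual = np.random.randn(8)
print(T_adapted @ residual)                  # transform coefficients of the residual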
Iris recognition for identity authentication and verification is one of the most precise and accepted biometrics in the world. The use of portable iris systems, mostly in law enforcement applications, has been increasing rapidly. A portable device, however, may be restricted to a narrow-bandwidth communication channel to transmit the iris code or iris image. Though a full-resolution iris image is preferred for accurate recognition of individuals, image compression should be used to minimize the image size and thus the transmission time over a narrow-bandwidth channel for emergency identification. This paper investigates the effects of compression on iris images in the wavelet domain, using the spatial-orientation tree wavelet (STW), embedded zerotree wavelet (EZW), and set partitioning in hierarchical trees (SPIHT) coders, to identify the most suitable image compression scheme. The Haar wavelet transform is utilized for image decomposition and compression, with varying decomposition levels. The results are examined in terms of peak signal-to-noise ratio (PSNR), mean square error (MSE), bits per pixel (BPP), and compression ratio (CR). It is found that wavelet-based compression is effective, as recognition performance is minimally affected, and that the Haar transform is well suited to this task. The CASIA and MMU iris databases have been used for this purpose.
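The evaluation pipeline can be sketched with PyWavelets: a multi-level Haar decomposition, coefficient truncation standing in for the STW/EZW/SPIHT embedded coders, reconstruction, and MSE/PSNR measurement. The 5% retention threshold and the random image are placeholders:

import numpy as np
import pywt  # PyWavelets

img = np.random.rand(128, 128) * 255          # stand-in for an iris image

# Multi-level Haar decomposition; `level` is the decomposition depth
# varied in the experiments.
coeffs = pywt.wavedec2(img, 'haar', level=3)
arr, slices = pywt.coeffs_to_array(coeffs)

# Crude embedded-style compression: keep only the largest 5% of coefficients.
thresh = np.percentile(np.abs(arr), 95)
arr_c = np.where(np.abs(arr) >= thresh, arr, 0.0)

rec = pywt.waverec2(pywt.array_to_coeffs(arr_c, slices, output_format='wavedec2'),
                    'haar')
mse = np.mean((img - rec) ** 2)
psnr = 10 * np.log10(255.0 ** 2 / mse)
print(f"MSE={mse:.2f}  PSNR={psnr:.2f} dB")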
Deep neural networks (DNNs) have shown great potential in no-reference image quality assessment (NR-IQA). However, the annotation of NR-IQA is labor-intensive and time-consuming, which severely limits its application, especially for authentic images. To relieve the dependence on quality annotation, some works have applied unsupervised domain adaptation (UDA) to NR-IQA. However, these methods ignore the fact that the alignment space used in classification is sub-optimal, since that space is not elaborately designed for perception. To address this challenge, we propose an effective perception-oriented unsupervised domain adaptation method, StyleAM (Style Alignment and Mixup), for NR-IQA, which transfers sufficient knowledge from label-rich source-domain data to label-free target-domain images. Specifically, we find a more compact and reliable space, the feature style space, for perception-oriented UDA, based on an interesting observation: the feature style (i.e., the mean and variance) of deep layers in DNNs is closely associated with the quality score in NR-IQA. Therefore, we propose to align the source and target domains in this more perception-oriented space, the feature style space, to reduce interference from quality-irrelevant feature factors. Furthermore, to increase the consistency (i.e., the ordinal/continuous characteristics) between quality scores and feature styles, we also propose a novel feature augmentation strategy, Style Mixup, which mixes the feature styles (i.e., the mean and variance) before the last layer of the DNN together with mixing their labels. Extensive experimental results on many cross-domain settings (e.g., synthetic to authentic, and multiple distortions to one distortion) demonstrate the effectiveness of our proposed StyleAM on NR-IQA.
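Style Mixup as described, interpolating the channel-wise means and standard deviations of two samples' deep features together with their labels, can be sketched as follows. The AdaIN-style re-stylization and the Beta-distributed mixing coefficient are our assumptions:

import torch

def style_mixup(feat_a, feat_b, label_a, label_b, alpha=0.4):
    """Mix the feature styles (channel-wise mean and std) of two samples and
    mix their quality labels with the same coefficient.
    Features: (B, C, H, W); labels: (B,)."""
    lam = torch.distributions.Beta(alpha, alpha).sample()

    mu_a, sig_a = feat_a.mean((2, 3), keepdim=True), feat_a.std((2, 3), keepdim=True)
    mu_b, sig_b = feat_b.mean((2, 3), keepdim=True), feat_b.std((2, 3), keepdim=True)

    mu_mix = lam * mu_a + (1 - lam) * mu_b      # interpolated style statistics
    sig_mix = lam * sig_a + (1 - lam) * sig_b

    # Normalize feat_a, then re-stylize it with the mixed statistics
    # (AdaIN-style): content is kept while the style is interpolated.
    mixed_feat = sig_mix * (feat_a - mu_a) / (sig_a + 1e-6) + mu_mix
    mixed_label = lam * label_a + (1 - lam) * label_b
    return mixed_feat, mixed_label

f, g = torch.randn(4, 64, 8, 8), torch.randn(4, 64, 8, 8)
mf, ml = style_mixup(f, g, torch.rand(4), torch.rand(4))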
Recent advances in Synthetic Aperture Radar (SAR) sensors and innovative advanced imaging techniques have enabled SAR systems to acquire very high-resolution images with wide swaths, large bandwidth, and multiple polarization channels. The improved capabilities of SAR systems also imply a significant increase in SAR data acquisition rates, such that efficient and effective compression methods become necessary. The compression of SAR raw data plays a crucial role in addressing the challenges posed by downlink and memory limitations onboard SAR satellites and directly affects the quality of the generated SAR image. Neural data compression techniques using deep models have attracted much interest for natural image compression tasks and demonstrated promising results. In this study, neural data compression is extended into the complex domain to develop a Complex-Valued (CV) autoencoder-based data compression method for SAR raw data. To this end, the fundamentals of data compression and Rate-Distortion (RD) theory are reviewed; the well-known Block Adaptive Quantization (BAQ) and JPEG2000 compression methods are implemented and tested for SAR raw data compression; and a neural data compression method based on CV autoencoders is developed for SAR raw data. Furthermore, since the available Sentinel-1 SAR raw products are already compressed with Flexible Dynamic BAQ (FDBAQ), an adaptation procedure is applied to the decoded SAR raw data to generate SAR raw data with quasi-uniform quantization whose statistics resemble those of the uncompressed SAR raw data onboard the satellite.
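Of the baselines, BAQ is simple enough to sketch directly. The NumPy toy below scales each block by its own statistics and applies a uniform low-bit quantizer to I and Q; operational BAQ instead uses Lloyd-Max optimal levels, so treat this as a simplification:

import numpy as np

def baq(raw, block=256, bits=3):
    """Toy Block Adaptive Quantization of complex SAR raw data: each block is
    normalized by its own standard deviation, then I and Q are uniformly
    quantized to 2**bits levels and dequantized (decoder side)."""
    half = 2 ** (bits - 1)
    rec = np.empty_like(raw)
    for s in range(0, raw.size, block):
        blk = raw[s:s + block]
        step = (np.std(blk.real) + np.std(blk.imag)) / half + 1e-12  # per-block gain
        qi = np.clip(np.round(blk.real / step), -half, half - 1)     # quantized I
        qq = np.clip(np.round(blk.imag / step), -half, half - 1)     # quantized Q
        rec[s:s + block] = (qi + 1j * qq) * step                     # dequantize
    return rec

raw = (np.random.randn(4096) + 1j * np.random.randn(4096)).astype(np.complex64)
rec = baq(raw)
snr = 10 * np.log10(np.mean(np.abs(raw) ** 2) / np.mean(np.abs(raw - rec) ** 2))
print(f"{snr:.1f} dB reconstruction SNR at 3 bits per I/Q sample")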
Semantic communication (SC) is an emerging communication paradigm that transmits only task-related semantic features to receivers, offering advantages in speed. However, existing robust steganography cannot extract messages correctly after SC. To address this issue, we propose a novel steganography framework for SC based on Generative Adversarial Networks (GANs), called "Image Semantic Steganography". Our framework embeds messages into semantic features to guarantee extraction, while considering both pixel-level and semantic-level distortions to enhance security. Experimental results show that our framework not only achieves successful message extraction and behavioral covertness during and after SC, but also does not impact the implementation of SC.
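The dual-distortion objective suggests a training loss of the following shape. This is a speculative sketch: the weights, the MSE/BCE choices, and all tensor names are our assumptions rather than the paper's formulation:

import torch
import torch.nn.functional as F

def stego_loss(cover_img, stego_img, cover_feat, stego_feat, msg, msg_pred,
               w_pix=1.0, w_sem=1.0, w_msg=10.0):
    """Combined objective for semantic-feature steganography: pixel-level
    distortion + semantic-level distortion + message recovery."""
    pixel_d = F.mse_loss(stego_img, cover_img)        # pixel-level distortion
    semantic_d = F.mse_loss(stego_feat, cover_feat)   # semantic-level distortion
    msg_loss = F.binary_cross_entropy_with_logits(msg_pred, msg)  # extraction term
    return w_pix * pixel_d + w_sem * semantic_d + w_msg * msg_loss

msg = torch.randint(0, 2, (2, 64)).float()
loss = stego_loss(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64),
                  torch.randn(2, 256), torch.randn(2, 256),
                  msg, torch.randn(2, 64))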
We propose DeepPCC, an end-to-end learning-based approach for the lossy compression of large-scale object point clouds. For both the geometry and attribute components, we introduce the Multiscale Neighborhood Information Aggregation (NIA) mechanism, which applies resolution downscaling progressively (i.e., dyadic downsampling of geometry and average pooling of attributes) and combines sparse convolution and local self-attention at each resolution scale for effective feature representation. Under a simple autoencoder structure, scale-wise NIA blocks are stacked as the analysis and synthesis transforms in the encoder-decoder pair to best characterize spatial neighbors for accurate approximation of geometry occupancy probability and attribute intensity. Experiments demonstrate that DeepPCC remarkably outperforms the state-of-the-art rule-based MPEG G-PCC and learning-based solutions both quantitatively and qualitatively, providing strong evidence that DeepPCC is a promising solution for emerging AI-based PCC.
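The progressive downscaling step, dyadic downsampling of geometry with average pooling of attributes, can be illustrated in NumPy. DeepPCC itself operates on sparse tensors with sparse convolutions; the dense sketch below shows only the pooling logic:

import numpy as np

def downscale_point_cloud(coords, attrs):
    """One dyadic downscaling step: integer coordinates are halved, and the
    attributes of points falling into the same coarse voxel are averaged."""
    coarse = coords // 2                              # dyadic downsampling of geometry
    uniq, inv = np.unique(coarse, axis=0, return_inverse=True)
    inv = inv.ravel()
    pooled = np.zeros((len(uniq), attrs.shape[1]))
    np.add.at(pooled, inv, attrs)                     # sum attributes per coarse voxel
    counts = np.bincount(inv, minlength=len(uniq))[:, None]
    return uniq, pooled / counts                      # average pooling of attributes

coords = np.random.randint(0, 64, (1000, 3))          # voxelized geometry
attrs = np.random.rand(1000, 3)                       # e.g. RGB attributes
c2, a2 = downscale_point_cloud(coords, attrs)
print(len(c2), "coarse voxels from", len(coords), "points")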