We propose DeepPCC, an end-to-end learning-based approach for the lossy compression of large-scale object point clouds. For both the geometry and attribute components, we introduce the Multiscale Neighborhood Information Aggregation (NIA) mechanism, which applies resolution downscaling progressively (i.e., dyadic downsampling of geometry and average pooling of attributes) and combines sparse convolution and local self-attention at each resolution scale for effective feature representation. Under a simple autoencoder structure, scale-wise NIA blocks are stacked as the analysis and synthesis transforms in the encoder-decoder pair to best characterize spatial neighbors for accurate approximation of geometry occupancy probability and attribute intensity. Experiments demonstrate that DeepPCC remarkably outperforms the state-of-the-art rule-based MPEG G-PCC and learning-based solutions both quantitatively and qualitatively, providing strong evidence that DeepPCC is a promising solution for emerging AI-based PCC.
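The progressive downscaling step described above can be illustrated in isolation. The following is a minimal sketch (not the paper's sparse-convolution implementation) of one dyadic downscaling stage: integer voxel coordinates are halved, and the attributes of points that merge into the same parent voxel are average-pooled.

```python
import numpy as np

def downscale_point_cloud(coords, attrs):
    """One dyadic downscaling step: halve the voxel resolution of the
    geometry and average-pool the attributes of points that merge."""
    parent = coords // 2                        # dyadic downsampling of coordinates
    uniq, inverse = np.unique(parent, axis=0, return_inverse=True)
    pooled = np.zeros((len(uniq), attrs.shape[1]))
    counts = np.bincount(inverse).astype(float)
    np.add.at(pooled, inverse, attrs)           # sum attributes per parent voxel
    pooled /= counts[:, None]                   # average pooling
    return uniq, pooled

coords = np.array([[0, 0, 0], [1, 0, 0], [2, 2, 2], [3, 3, 3]])
attrs  = np.array([[10.0], [20.0], [30.0], [50.0]])
c1, a1 = downscale_point_cloud(coords, attrs)
print(c1)   # two parent voxels remain
print(a1)   # averaged attributes per parent voxel
```

Stacking this step yields the multiscale pyramid over which the NIA blocks operate.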
Lossless compression of remote sensing images is critically important for minimizing storage requirements while preserving the complete integrity of the data. The main challenge in lossless compression lies in striking a good balance between reasonable compression durations and high compression ratios. In this article, we introduce an innovative lossless compression framework that uniquely utilizes lossy compression data as prior knowledge to enhance the compression process. Our framework employs a checkerboard segmentation technique to divide the original remote sensing image into various subimages. The main diagonal subimages are compressed using a traditional lossy method to obtain prior knowledge for facilitating the compression of all subimages. These subimages are then subjected to lossless compression using our newly developed lossy prior probability prediction network (LP3Net) and arithmetic coding in a specific order. The proposed LP3Net is an advanced network architecture, consisting of an image preprocessing module, a channel enhancement module, and a pixel probability transformer module, that learns the discrete probability distribution of each pixel within every subimage, enhancing the accuracy and efficiency of the compression process. Experiments on high-resolution remote sensing image datasets demonstrate the effectiveness and efficiency of the proposed LP3Net and lossless compression framework, achieving a minimum of 4.57% improvement over traditional compression methods and 1.86% improvement over deep learning-based compression methods.
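The checkerboard segmentation can be sketched as regular subsampling: a k×k checkerboard yields k² subimages, and the "main diagonal" subimages are those whose row and column offsets coincide. This is an illustrative interpretation of the abstract, not the paper's exact partitioning code.

```python
import numpy as np

def checkerboard_split(img, k=2):
    """Split an image into k*k subimages by regular subsampling; the
    'main diagonal' subimages are those with equal row/column offsets."""
    subs = {(i, j): img[i::k, j::k] for i in range(k) for j in range(k)}
    diagonal = [subs[(d, d)] for d in range(k)]
    return subs, diagonal

img = np.arange(16).reshape(4, 4)
subs, diag = checkerboard_split(img)
print(subs[(0, 0)])   # rows/cols with offset 0
print(diag[1])        # offset (1, 1) subimage
```

Under this scheme the diagonal subimages cover the image evenly, so their lossy reconstructions are informative priors for every other subimage.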
In this paper, a highly efficient hardware allocation framework for arbitrary transform blocks is proposed, which adapts to the prediction tree structure to improve the utilization ratio, together with a parallel hardware design that improves the data throughput. The method configures an appropriate combination of five inverse transform units of different sizes: fast IDST, 4×4 IDCT, 8×8 IDCT, 16×16 IDCT, and 32×32 IDCT. If the input video stream changes, it reconfigures the combination and reallocates the hardware resources to retain a high utilization ratio. Experiments show that the utilization ratio of the proposed method improves from 48.8% to 96.2% under various conditions. The proposed method can enhance the efficiency of an H.265 decoder.
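The reconfiguration idea can be sketched abstractly. The allocator below is purely hypothetical (the paper describes a hardware framework, not this algorithm): given a histogram of transform-block sizes seen in the incoming stream, it assigns parallel inverse-transform units proportionally, guaranteeing at least one unit per size in use.

```python
def allocate_units(block_histogram, total_units):
    """Hypothetical proportional allocator: give each transform size a
    number of parallel units proportional to its share of the blocks,
    with at least one unit for any size that actually occurs."""
    total = sum(block_histogram.values())
    return {size: max(1, round(total_units * n / total))
            for size, n in block_histogram.items() if n > 0}

# Stream dominated by 4x4 and 8x8 blocks
hist = {"IDST4": 10, "IDCT4": 40, "IDCT8": 30, "IDCT16": 15, "IDCT32": 5}
print(allocate_units(hist, 20))
```

When the block-size statistics of the stream shift, rerunning the allocator models the reconfiguration step that keeps utilization high.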
Existing JPEG encryption approaches pose a security risk due to the difficulty in changing all block-feature values while considering format compatibility and file size expansion. To address these concerns, this paper introduces a novel JPEG image encryption scheme. First, the security of sketch information against chosen-plaintext attacks is improved by increasing the change rate of block-feature values. Second, a classification global permutation approach is designed to encrypt the undivided run/size, value (RSV)-based AC groups to achieve larger changes in the block-feature values. Third, to reduce file size expansion while maintaining format compatibility, the DC coefficients are rotated based on the mapped DC differences in the same category, and the nonzero AC coefficients are mapped in the same category. Extensive experiments demonstrate that the proposed algorithm is superior to existing schemes in terms of security. Notably, the average change rate of block-feature values is increased by at least 20%. Furthermore, the proposed scheme reduces the file size by an average of 2.036% compared to existing JPEG image encryption methods.
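A keyed global permutation is the core invertible primitive behind the second step. The sketch below is a simplified stand-in for the paper's classification global permutation: it shuffles a list of (here hypothetical) RSV-based AC groups with a key-seeded PRNG and shows that the key holder can invert the shuffle. A real scheme would derive the permutation from a cryptographic key stream, not `random`.

```python
import random

def permute_groups(groups, key):
    """Keyed global permutation of AC groups (illustrative only)."""
    idx = list(range(len(groups)))
    random.Random(key).shuffle(idx)        # key-seeded permutation
    return [groups[i] for i in idx], idx

def invert(permuted, idx):
    """Undo the permutation given the same index order."""
    out = [None] * len(permuted)
    for pos, i in enumerate(idx):
        out[i] = permuted[pos]
    return out

groups = ["g0", "g1", "g2", "g3", "g4"]
enc, idx = permute_groups(groups, key=1234)
assert invert(enc, idx) == groups          # round-trip recovers the groups
```

Permuting whole groups rather than individual coefficients is what preserves JPEG format compatibility while still scrambling block features.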
With the wide application of image editing tools, forged images (splicing, copy-move, removal, etc.) have become a great public concern. Although existing image forgery localization methods can achieve fairly good results on several public datasets, most of them perform poorly when the forged images are JPEG compressed, as is usually the case on social networks. To tackle this issue, in this paper, a self-supervised domain adaptation network, which is composed of a backbone network with Siamese architecture and a compression approximation network (ComNet), is proposed for JPEG-resistant image forgery detection and localization. To improve the performance against JPEG compression, ComNet is customized to approximate the JPEG compression operation through self-supervised learning, generating JPEG-agent images with general JPEG compression characteristics. The backbone network is then trained with a domain adaptation strategy to localize the tampering boundary and region, and to alleviate the domain shift between uncompressed and JPEG-agent images. Extensive experimental results on several public datasets show that the proposed method outperforms or rivals other state-of-the-art methods in image forgery detection and localization, especially for JPEG compression with unknown quality factors (QFs).
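Why approximate JPEG with a network at all? Hard quantization, the lossy core of JPEG, has zero gradient almost everywhere, so it cannot be trained through directly. A common differentiable surrogate (shown here as a generic sketch, not the paper's learned ComNet) is soft rounding, which interpolates smoothly between identity and hard rounding:

```python
import numpy as np

def soft_round(x, alpha=8.0):
    """Differentiable rounding surrogate: for large alpha this tracks
    hard rounding closely away from the half-integer midpoints, while
    remaining smooth enough to backpropagate through."""
    m = np.floor(x) + 0.5                 # nearest midpoint below/at x
    return m + 0.5 * np.tanh(alpha * (x - m)) / np.tanh(alpha / 2)

x = np.linspace(0.0, 2.0, 5)
print(soft_round(x))   # stays close to hard rounding, but smooth
```

A learned approximation such as ComNet goes further by also imitating blockwise DCT-domain artifacts, which is what makes the JPEG-agent images useful for domain adaptation.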
The demand for interpretable models has driven the exploration of explainable approaches grounded in human-friendly case-based reasoning. Among these approaches, prototype-based methods have proven effective in performing case-based reasoning by utilizing prototypes and similarity scores. However, their interpretability is affected by degraded similarity in the input space and latent space. This semantic gap leads to inconsistent explanations for images that are perceived to be similar, undermining the reliability of the explanation. In this paper, we propose a distributional embedding framework in which the embedding is randomly sampled from a parameterized distribution in a regularized latent space. With a simple modification, our method significantly improves the reliability of the model's explanation by bridging the gap between similarity in human perception and explanation. To demonstrate this, we conduct experiments ranging from small-scale scenarios to direct explanation regarding similarity. Extensive comparisons with a real-world dataset and multiple backbone networks showcase the usability and efficacy of the proposed framework.
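Sampling an embedding from a parameterized distribution is typically done with reparameterization, so the draw stays differentiable with respect to the distribution's parameters. The sketch below assumes (as is common, though the abstract does not specify) a diagonal Gaussian predicted by the encoder:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_embedding(mu, log_var):
    """Reparameterized draw z = mu + sigma * eps: gradients flow to
    (mu, log_var) because the randomness is isolated in eps."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

mu = np.zeros(4)
log_var = np.full(4, -2.0)      # small variance: samples cluster near mu
z = sample_embedding(mu, log_var)
print(z)
```

Regularizing `log_var` (e.g., with a KL term) is what keeps the latent space smooth enough for similarity scores to align with human perception.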
Recently, learning-based light field (LF) image compression methods have achieved impressive progress, while end-to-end spatially scalable LF image compression (SS-LFIC) has not been explored. To tackle this problem, this paper proposes an end-to-end spatially scalable LF compression network (SSLFC-Net). In the SSLFC-Net, a spatial-angular domain-specific enhancement layer coding strategy is designed to boost the coding performance of the enhancement layers (ELs). Specifically, by referencing domain-specific features, the ELs compress spatial features by predictive coding in the spatial domain to effectively remove inter-layer spatial redundancy, and reconstruct angular features by a decoder-side generative method in the angular domain to strategically avoid angular compression. In particular, to produce accurate spatial predictions and reconstruct high-quality LF images, an inter-layer spatial prediction module and a spatial-angular context-aware reconstruction module are presented to collaboratively promote EL compression. Experiments show that the proposed SSLFC-Net effectively supports spatial scalability and achieves state-of-the-art rate-distortion performance.
This article develops a Scalable Point Cloud Attribute Compression solution, termed ScalablePCAC. In a two-layer example, ScalablePCAC uses the standard G-PCC at the base layer to directly encode the thumbnail point cloud that is downscaled from the original input, and a learning-based model at the enhancement layer to compress and restore the full-resolution input point cloud conditioned on the base layer reconstruction. As such, the base layer provides a coarse reconstruction of the input point cloud and the enhancement layer further improves the quality. We then adopt a cross-layer rate allocation strategy that flexibly determines the resolution downscaling factor, the quantization parameter of the base layer, and the quality controlling factor of the enhancement layer to adapt the bitrate of the two layers for approximately optimal Rate-Distortion (R-D) performance. We conduct extensive experiments on popular point clouds following the MPEG common test conditions. Results demonstrate that the proposed ScalablePCAC achieves >10% BD-BR reduction against the latest G-PCC version 22 (TMC13v22) on the Y component; it also significantly outperforms existing learning-based solutions for point cloud attribute compression, e.g., compared with a recent work showing state-of-the-art performance, it achieves >20% BD-BR reduction.
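The base/enhancement split can be shown on a toy 1D signal. This is a deliberately simplified sketch of the two-layer idea only (ScalablePCAC's actual layers are G-PCC and a learned network): the base layer carries a coarsely quantized, downscaled signal, and the enhancement layer carries the residual against the upsampled base reconstruction.

```python
import numpy as np

def two_layer_code(signal, q_base=8.0):
    """Toy two-layer scalable coder on a 1D signal."""
    base = signal[::2]                               # downscale for the base layer
    base_rec = np.round(base / q_base) * q_base      # coarse base-layer quantization
    upsampled = np.repeat(base_rec, 2)[:len(signal)] # decoder-side upscaling
    residual = signal - upsampled                    # enhancement-layer payload
    return base_rec, residual

sig = np.array([10.0, 12.0, 40.0, 41.0, 90.0, 88.0])
base_rec, residual = two_layer_code(sig)
rec = np.repeat(base_rec, 2)[:len(sig)] + residual
assert np.allclose(rec, sig)   # full enhancement restores the signal exactly
```

The cross-layer rate allocation then amounts to trading off `q_base`, the downscaling factor, and how finely the residual is coded.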
Crypto-space reversible data hiding (RDH) has emerged as an effective technique for transmitting secret information over the Internet. However, most existing schemes are designed for uncompressed images, while almost all images are processed and transmitted in compressed formats. There is an urgent need to develop methods for compressed images, such as Joint Photographic Experts Group (JPEG) images. In this article, we propose an RDH scheme for encrypted JPEG images, where the bitstreams of alternating current (AC) coefficients and the secret data are mapped to numbers over a Galois field. The obtained numbers are then used to construct a polynomial for secret sharing. By converting them into secret shares, the AC coefficients and the secret data are secured. In addition, a block sorting strategy is used to reduce image distortion under low data payload. Experimental results demonstrate that the proposed scheme outperforms state-of-the-art methods in embedding capacity while preserving the file size and conforming to the JPEG format.
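Polynomial secret sharing over a Galois field is the classic Shamir construction: the secret becomes the constant term of a random polynomial, shares are evaluations at distinct points, and any k shares recover the secret by Lagrange interpolation at zero. The sketch below works over the prime field GF(257) as an illustrative stand-in (the paper's exact field and coefficient choices are not specified here).

```python
P = 257   # prime modulus: arithmetic below is over GF(257)

def make_shares(secret, coeffs, n):
    """Shares (x, f(x)) of f(x) = secret + coeffs[0]*x + ... mod P.
    Threshold is len(coeffs) + 1; coeffs are fixed here for reproducibility."""
    poly = [secret] + coeffs
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(poly)) % P)
            for x in range(1, n + 1)]

def recover(shares):
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        secret = (secret + yj * num * pow(den, P - 2, P)) % P   # den^-1 via Fermat
    return secret

shares = make_shares(secret=123, coeffs=[17, 42], n=5)
assert recover(shares[:3]) == 123   # any 3 of the 5 shares suffice
```

In the RDH setting, embedding the secret data as extra polynomial inputs is what lets the AC coefficients and the payload be secured and later separated reversibly.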
Multimedia file fragment classification (MFFC) aims to identify file fragment types, e.g., image/video, audio, and text, without system metadata. It is of vital importance in multimedia storage and communication. Existing MFFC methods typically treat fragments as 1D byte sequences and emphasize the relations between separate bytes (interbytes) for classification. However, the more informative relations inside bytes (intrabytes) are overlooked and seldom investigated. By looking inside bytes, the bit-level details of file fragments can be accessed, enabling a more accurate classification. Motivated by this, we first propose Byte2Image, a novel visual representation model that incorporates previously overlooked intrabyte information into file fragments and reinterprets these fragments as 2D grayscale images. This model involves a sliding byte window to reveal the intrabyte information and a rowwise stacking of intrabyte n-grams for embedding fragments into a 2D space. Thus, complex interbyte and intrabyte correlations can be mined simultaneously using powerful vision networks. Additionally, we propose an end-to-end dual-branch network, ByteNet, to enhance robust correlation mining and feature representation. ByteNet makes full use of the raw 1D byte sequence and the converted 2D image through a shallow byte branch feature extraction (BBFE) network and a deep image branch feature extraction (IBFE) network. In particular, the BBFE, composed of a single fully-connected layer, adaptively recognizes the co-occurrence of several specific bytes within the raw byte sequence, while the IBFE, built on a vision transformer, effectively mines the complex interbyte and intrabyte correlations from the converted image. Experiments on two representative benchmarks, covering 14 cases, validate that our proposed method outperforms state-of-the-art approaches on different cases by up to 12.2%.
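The byte-to-image conversion can be sketched directly from the description: slide a byte window over the fragment and unpack each window's bits into one image row, so intrabyte (bit-level) structure becomes visible as a 2D grayscale pattern. This is a minimal interpretation of the Byte2Image idea, not the paper's exact embedding.

```python
import numpy as np

def byte2image(fragment, window=4):
    """Sliding byte window -> rowwise bit unpacking -> 2D grayscale image."""
    arr = np.frombuffer(fragment, dtype=np.uint8)
    rows = [np.unpackbits(arr[i:i + window])   # one window becomes one bit row
            for i in range(len(arr) - window + 1)]
    return np.stack(rows) * 255                # scale bits to 0/255 grayscale

img = byte2image(b"\x0f\xf0\xaa\x55\xcc")
print(img.shape)   # (2, 32): 2 sliding windows of 4 bytes, 8 bits per byte
```

Because consecutive rows share `window - 1` bytes, local texture in the image encodes exactly the interbyte and intrabyte correlations that a vision network can then exploit.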