ISBN (print): 9781665475921
This paper applies Time-Frequency Analysis (TFA) techniques from signal processing to computer vision tasks. Our main idea is to build a simple network architecture that does not require two or more convolutional neural networks (CNNs): hidden features are analyzed by the Discrete Wavelet Transform (DWT) and fed into filters as weights via convolutions, transformers, or other methods. This does not require a network with two or more stages; instead, we apply TFA techniques directly to a CNN to build a one-stage network. Networks built this way retain strong performance while consuming fewer computing resources. In this paper, we mainly apply the DWT to a CNN to solve image inpainting problems, and the results show that our model works stably in the frequency domain to achieve free-form image inpainting.
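As an illustration of the idea (not the authors' code), the sketch below applies a 2-D Haar DWT to a CNN feature map with PyWavelets and fuses the four subbands with a 1x1 convolution. The DWTBlock name and layer sizes are hypothetical, and gradients do not flow through the NumPy-based transform here; a trainable network would use a convolutional DWT instead.

```python
import torch
import torch.nn as nn
import pywt  # PyWavelets

class DWTBlock(nn.Module):  # hypothetical name
    def __init__(self, channels):
        super().__init__()
        # the four subbands (LL, LH, HL, HH) are stacked along the channel axis
        self.fuse = nn.Conv2d(4 * channels, channels, kernel_size=1)

    def forward(self, x):
        # x: (N, C, H, W); a single-level Haar DWT halves the spatial size
        ll, (lh, hl, hh) = pywt.dwt2(x.detach().cpu().numpy(), "haar")
        bands = torch.cat([torch.from_numpy(b) for b in (ll, lh, hl, hh)], dim=1)
        return self.fuse(bands.to(x.device, x.dtype))

feat = torch.randn(1, 16, 64, 64)
print(DWTBlock(16)(feat).shape)  # torch.Size([1, 16, 32, 32])
```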
ISBN (print): 9781728185514
With the emergence of various machine-to-machine and machine-to-human deep learning tasks, the amount of deep feature data is increasing. Deep product quantization is widely applied in deep feature retrieval tasks and has achieved good accuracy. However, it does not primarily focus on the compression target, and its output is a fixed-length quantization index, which is not suitable for subsequent compression. In this paper, we propose an entropy-based deep product quantization algorithm for deep feature compression. Firstly, it introduces entropy into the hard and soft quantization strategies, adapting to the codebook optimization and codeword determination operations in the training and testing processes, respectively. Secondly, entropy-related loss functions are designed to adjust the distribution of quantization indices so that it accommodates the subsequent entropy coding module. Experimental results on retrieval tasks show that the proposed method can be combined with deep product quantization and its extended schemes in general, and achieves better compression performance under near-lossless conditions.
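A minimal sketch of entropy-regularised soft quantization in the spirit described above (an assumption, not the paper's code): features are softly assigned to codewords, and an entropy term on the mean assignment distribution shapes the index statistics for a subsequent entropy coder.

```python
import torch
import torch.nn.functional as F

def soft_quantize(x, codebook, temperature=1.0):
    # x: (N, D) features, codebook: (K, D) codewords
    d2 = torch.cdist(x, codebook) ** 2       # squared distances, (N, K)
    p = F.softmax(-d2 / temperature, dim=1)  # soft assignment probabilities
    return p @ codebook, p                   # soft-quantized features, probs

def entropy_loss(p):
    # entropy of the average codeword usage; lowering it skews the index
    # distribution so the subsequent entropy coder spends fewer bits
    q = p.mean(dim=0)
    return -(q * (q + 1e-9).log()).sum()

x = torch.randn(128, 32)
codebook = torch.randn(256, 32, requires_grad=True)
xq, p = soft_quantize(x, codebook)
loss = F.mse_loss(xq, x) + 0.1 * entropy_loss(p)  # weight 0.1 is illustrative
loss.backward()
```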
ISBN (print): 9780819466211
Compressed video is very sensitive to channel errors. A few bit losses can derail the entire decoding process. Thus, protecting compressed video is imperative to enable visual communications. Since different elements in a compressed video stream vary in their impact on the quality of the decoded video, unequal error protection can be used to provide efficient protection. This paper describes an unequal error protection method for protecting data elements in a video stream, via a Wyner-Ziv encoder that consists of a coarse quantizer and a Turbo-code-based lossless Slepian-Wolf encoder. Data elements that significantly impact the visual quality of the decoded video, such as the modes and motion vectors used by H.264, are given more parity bits than coarsely quantized transform coefficients. When the transmitted sequence is corrupted by transmission errors, this yields better decoded video quality than equal error protection.
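As a toy illustration of the unequal-error-protection principle (not the paper's Wyner-Ziv codec), the sketch below splits a fixed parity budget across stream-element classes in proportion to hypothetical importance weights, so modes and motion vectors receive more parity than coarsely quantized coefficients.

```python
def allocate_parity(budget_bits, importance):
    # split a fixed parity budget in proportion to importance weights
    total = sum(importance.values())
    return {k: round(budget_bits * w / total) for k, w in importance.items()}

# hypothetical weights: syntax elements that matter most get the most parity
importance = {"modes": 0.4, "motion_vectors": 0.4, "coarse_coefficients": 0.2}
print(allocate_parity(12000, importance))
# {'modes': 4800, 'motion_vectors': 4800, 'coarse_coefficients': 2400}
```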
ISBN (print): 9798331529543; 9798331529550
Depth estimation for light field images is a crucial technique in various applications, including 3D reconstruction, autonomous driving, and object tracking. However, current deep-learning methods ignore the geometric information of the light field image and are limited when learning repetitive textures, which leads to inaccurate depth estimates. This paper proposes a light field depth estimation network that fuses multi-scale semantic information with geometric information to address poor adaptation to repeated-texture regions. The core of the network is the semantic and geometric information fusion (SGI) module, which adaptively combines semantic and geometric information to improve the efficiency of cost aggregation. Furthermore, the SGI module establishes a direct link between feature extraction and cost aggregation, providing feedback that guides more efficient feature extraction. Experimental results on the synthetic light field dataset HCI 4D demonstrate that the method achieves high accuracy and generalisation performance.
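The abstract does not specify the SGI module at implementation level; the following is a hedged sketch of one plausible adaptive fusion step, where a learned gate mixes the semantic and geometric feature streams per position before cost aggregation. FusionGate and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class FusionGate(nn.Module):  # hypothetical name
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, semantic, geometric):
        # per-position mixing weight learned from both streams
        g = self.gate(torch.cat([semantic, geometric], dim=1))
        return g * semantic + (1 - g) * geometric

sem = torch.randn(1, 32, 64, 64)
geo = torch.randn(1, 32, 64, 64)
print(FusionGate(32)(sem, geo).shape)  # torch.Size([1, 32, 64, 64])
```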
ISBN (print): 9780819466211
Recently, there has been increasing interest in the design of very fast wavelet image encoders focused on applications (interactive real-time image/video applications, GIS systems, etc.) and devices (digital cameras, mobile phones, PDAs, etc.) where coding delay and/or available computing resources (working memory and processing power) are critical for proper operation. Most of these fast wavelet image encoders are non-embedded in order to reduce complexity, so no rate control tools are available for scalable coding applications. In this work, we analyze the impact of simple rate control tools for these encoders in order to determine whether including rate control functionality is worthwhile compared with popular embedded encoders like SPIHT and JPEG2000. We perform the study by adding rate control to the non-embedded LTW encoder, showing that despite the increase in complexity, LTW remains competitive with SPIHT and JPEG2000 in terms of R/D performance, coding delay, and memory consumption.
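A minimal sketch of the kind of simple rate control discussed above: bisect on the quantization step until the encoder's output size meets a target bit budget. The encode_size callable is a stand-in for a real (e.g., LTW) encoder invocation, and the toy bits-versus-step model is purely illustrative.

```python
def rate_control(encode_size, target_bits, q_lo=0.1, q_hi=256.0, iters=20):
    # bisect on the quantizer step: larger steps produce fewer bits
    for _ in range(iters):
        q = (q_lo + q_hi) / 2.0
        if encode_size(q) > target_bits:
            q_lo = q   # over budget: quantize more coarsely
        else:
            q_hi = q   # under budget: try a finer step
    return q_hi        # finest tested step that met the budget

# toy monotone bits-vs-step model, for illustration only
fake_encoder = lambda q: int(2_000_000 / q)
print(rate_control(fake_encoder, target_bits=100_000))  # ~20.0
```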
ISBN (print): 9798331529543; 9798331529550
Recent advancements in neural compression have surpassed traditional codecs in PSNR and MS-SSIM measurements. However, at low bit rates, these methods can introduce visually displeasing artifacts, such as blurring, color shifting, and texture loss, thereby compromising the perceptual quality of images. To address these issues, this study presents an enhanced neural compression method designed for optimal visual fidelity. We train our model with a sophisticated semantic ensemble loss, integrating Charbonnier loss, perceptual loss, style loss, and a non-binary adversarial loss, to enhance the perceptual quality of image reconstructions. Additionally, we implement a latent refinement process to generate content-aware latent codes. These codes adhere to bit-rate constraints and prioritize bit allocation to regions of greater importance. Our empirical findings demonstrate that this approach significantly improves the statistical fidelity of neural image compression.
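A hedged sketch of a combined objective in the spirit of the semantic ensemble loss above (not the paper's implementation): the Charbonnier and Gram-matrix style terms are computed on raw pixels for brevity, the perceptual term (typically VGG features) is omitted to keep the sketch dependency-free, and the loss weights are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def charbonnier(x, y, eps=1e-3):
    return torch.sqrt((x - y) ** 2 + eps ** 2).mean()

def gram(x):
    # channel-correlation (Gram) matrix used by the style term
    n, c, h, w = x.shape
    f = x.reshape(n, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def ensemble_loss(recon, target, d_fake_logits,
                  w_char=1.0, w_style=10.0, w_adv=0.01):  # illustrative weights
    l_char = charbonnier(recon, target)
    l_style = (gram(recon) - gram(target)).abs().mean()
    # non-saturating generator loss, a stand-in for the paper's
    # "non-binary" adversarial term
    l_adv = F.softplus(-d_fake_logits).mean()
    return w_char * l_char + w_style * l_style + w_adv * l_adv

recon = torch.rand(2, 3, 64, 64, requires_grad=True)
target = torch.rand(2, 3, 64, 64)
ensemble_loss(recon, target, torch.randn(2, 1)).backward()
```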
ISBN (print): 9781728185514
Synthetic DNA has received much attention recently as a long-term archival medium alternative due to its high density and durability. However, most current work has focused on using DNA as a precise storage medium. In this work, we take an alternate view of DNA. Using neural-network-based compression techniques, we transform images into a latent-space representation, which we then store on DNA. By doing so, we turn DNA into an approximate image storage medium, as images recovered from DNA are only approximate representations of the originals. Using several datasets, we investigate the storage benefits of approximation and study the impact of DNA storage errors (substitutions, indels, bias) on the quality of approximation. In doing so, we demonstrate the feasibility and potential of viewing DNA as an approximate storage medium.
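As a toy illustration of the storage step (real pipelines add biochemical constraints such as GC balance and homopolymer limits, plus error correction), the sketch below maps a quantized latent code to nucleotides at 2 bits per base and decodes it back.

```python
BASES = "ACGT"  # 2 bits per nucleotide

def bits_to_dna(data: bytes) -> str:
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):         # most significant pair first
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)

def dna_to_bits(seq: str) -> bytes:
    out = bytearray()
    for i in range(0, len(seq), 4):        # four bases per byte
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)

latent = bytes([137, 42, 255, 0])          # stand-in for a quantized latent code
strand = bits_to_dna(latent)
assert dna_to_bits(strand) == latent       # lossless round trip
print(strand)                              # GAGCAGGGTTTTAAAA
```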
ISBN (print): 9780819466211
In this paper, we investigate spatial and temporal models for texture analysis and synthesis. The goal is to use these models to increase coding efficiency for video sequences containing textures. The models are used to segment texture regions in a frame at the encoder and synthesize the textures at the decoder. These methods can be incorporated into a conventional video coder (e.g., H.264), where the regions modeled as textures are not coded in the usual manner; instead, texture model parameters are sent to the decoder as side information. We show that this approach can reduce the data rate by as much as 15%.
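A toy sketch of the analysis/synthesis idea (not the paper's texture models): blocks classified as texture by a crude variance test are not coded; only their sample mean and standard deviation are sent as side information, and the decoder resynthesises them as noise.

```python
import numpy as np

def encode(frame, block=8, tex_thresh=20.0):
    coded, params = {}, {}
    for y in range(0, frame.shape[0], block):
        for x in range(0, frame.shape[1], block):
            blk = frame[y:y + block, x:x + block]
            if blk.std() > tex_thresh:                    # crude texture test
                params[(y, x)] = (blk.mean(), blk.std())  # side information only
            else:
                coded[(y, x)] = blk                       # "conventionally" coded
    return coded, params

def decode(coded, params, shape, block=8, seed=0):
    rng = np.random.default_rng(seed)
    out = np.zeros(shape)
    for (y, x), blk in coded.items():
        out[y:y + block, x:x + block] = blk
    for (y, x), (mu, sigma) in params.items():            # synthesize textures
        out[y:y + block, x:x + block] = rng.normal(mu, sigma, (block, block))
    return out

frame = np.random.rand(32, 32) * 255.0
reconstruction = decode(*encode(frame), frame.shape)
```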
ISBN (print): 9780819466211
The impact of using different lossy compression algorithms on the matching accuracy of fingerprint and face recognition systems is investigated. In particular, we relate rate-distortion performance, as measured in PSNR, to the matching scores obtained by the recognition systems. JPEG2000 and SPIHT are correctly predicted by PSNR to be the compression algorithms best suited for use in fingerprint and face recognition systems. Fractal compression is identified as least suited for use in the investigated recognition systems, although PSNR suggests JPEG would deliver worse recognition results in the case of face imagery. JPEG compression performs surprisingly well at high bitrates in face recognition systems, given the low PSNR performance observed.
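A minimal sketch of the evaluation protocol implied above, with stand-ins for the real codecs and matchers: compress each probe at several rates, measure PSNR against the original, and query the recognition system with the decompressed probe.

```python
import numpy as np

def psnr(a, b, peak=255.0):
    mse = np.mean((a.astype(float) - b.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def evaluate(codecs, rates, probe, match_score):
    # one row per (codec, rate): PSNR of the decoded probe and its match score
    results = []
    for name, compress in codecs.items():
        for r in rates:
            decoded = compress(probe, r)
            results.append((name, r, psnr(probe, decoded), match_score(decoded)))
    return results

gallery = np.random.randint(0, 256, (64, 64))
probe = np.clip(gallery + np.random.randint(-5, 6, gallery.shape), 0, 255)
toy_codec = lambda img, rate: (img // (256 >> rate)) * (256 >> rate)  # requantizer
toy_match = lambda img: -float(np.mean((img - gallery) ** 2))         # toy matcher
for row in evaluate({"toy": toy_codec}, [1, 2, 3], probe, toy_match):
    print(row)
```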
ISBN (print): 9781728185514
We have witnessed the rapid development of learned image compression (LIC). The latest LIC models have outperformed almost all traditional image compression standards in terms of rate-distortion (RD) performance. However, the time complexity of LIC models remains under-explored, limiting their practical application in industry. Even with GPU acceleration, LIC models still struggle with long coding times, especially on the decoder side. In this paper, we analyze and test a few prevailing and representative LIC models, and compare their complexity with traditional codecs, including H.265/HEVC intra and H.266/VVC intra. We provide a comprehensive analysis of every module in the LIC models and investigate how bitrate changes affect coding time. We observe that the time complexity bottleneck lies mainly in entropy coding and context modelling. Although this paper focuses on experimental statistics, our analysis reveals some insights for further acceleration of LIC models, such as model modification for parallel computing, model pruning, and more parallel context models.
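As an assumption about how such profiling is typically done (not the paper's harness), the sketch below wall-clocks a codec stage, synchronising the GPU when needed so asynchronous kernels are counted in the measurement.

```python
import time
import torch

def time_stage(fn, *args, device="cpu", repeats=10):
    # average wall-clock time of one codec stage; synchronize so asynchronous
    # GPU kernels are included in the measurement
    if device == "cuda":
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(repeats):
        out = fn(*args)
    if device == "cuda":
        torch.cuda.synchronize()
    return out, (time.perf_counter() - t0) / repeats

# toy stand-in for an analysis transform; a real study would time every
# module (transforms, context model, entropy coder) at several bitrates
x = torch.randn(1, 3, 256, 256)
conv = torch.nn.Conv2d(3, 192, kernel_size=5, stride=2, padding=2)
_, t = time_stage(conv, x)
print(f"analysis transform: {t * 1e3:.2f} ms")
```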