ISBN: 9781665492584 (print)
Light field, a new data representation format in multimedia, captures both the intensity and the direction of light rays. However, the additional angular information also brings a large volume of data. Classical coding methods do not effectively describe the relationship between different views, which leaves redundancy in the coded data. To address this problem, we propose a novel light field compression scheme based on implicit neural representation to reduce redundancies between views. We store the information of a light field image implicitly in a neural network and adopt model compression methods to further compress the implicit representation. Extensive experiments demonstrate the effectiveness of our proposed method, which achieves comparable rate-distortion performance and superior perceptual quality relative to traditional methods.
ISBN: 9781479989591 (print)
With the increasing popularity of mobile devices, there are more and more screens with heterogeneous resolutions. To solve the mismatch that arises when images are displayed on different screens, various image retargeting techniques have been proposed. However, few effective objective quality assessment metrics for image retargeting have been proposed. In this paper, we propose an objective image retargeting quality assessment method based on a Hybrid Distortion Pooled Model (HDPM) that considers local image similarity, content information loss, and structural distortion. The proposed HDPM method measures the retargeted image's local similarity by matching similar blocks using Scale-Invariant Feature Transform (SIFT) features and computing the similarity of the corresponding blocks with the structural similarity (SSIM) index. Furthermore, the content information loss in the retargeted image, which is regarded as the SIFT feature loss, is taken into account. We also consider the image's structural distortion, which is measured using the gray-level co-occurrence matrix (GLCM). To evaluate the effectiveness of the proposed method, extensive experiments have been conducted, and the results show improved consistency between the proposed HDPM method and the corresponding subjective evaluations.
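To make the GLCM term concrete, here is a minimal sketch of how a gray-level co-occurrence matrix and one common texture statistic (contrast) can be computed. The function names, the single horizontal offset, and the choice of contrast are assumptions made for illustration; the abstract does not say which offsets or GLCM statistics HDPM actually uses.

```python
def glcm(img, levels, dx=1, dy=0):
    """Count co-occurrences of gray levels at offset (dx, dy).

    img is a 2-D list of integers in [0, levels). The offset and the
    lack of symmetrization are illustrative assumptions.
    """
    h, w = len(img), len(img[0])
    m = [[0] * levels for _ in range(levels)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                m[img[y][x]][img[ny][nx]] += 1
    return m


def glcm_contrast(m):
    """Contrast: mean squared gray-level difference over all counted pairs."""
    total = sum(sum(row) for row in m)
    weighted = sum(
        (i - j) ** 2 * m[i][j]
        for i in range(len(m))
        for j in range(len(m))
    )
    return weighted / total


image = [[0, 0, 1],
         [1, 2, 2]]
matrix = glcm(image, levels=3)
contrast = glcm_contrast(matrix)
```

Comparing such a statistic between the source and the retargeted image is one plausible way to quantify structural change, though the paper's exact formulation may differ.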
To facilitate large-scale deployment of convolutional networks, integer-arithmetic-only inference has been shown to be effective: it not only reduces computational cost but also ensures cross-platform consistency. However, previous studies on integer networks usually report a decline in inference accuracy, given the same number of parameters as floating-point-number (FPN) networks. In this paper, we propose to finetune and quantize a well-trained FPN convolutional network to obtain an integer convolutional network. Our key idea is to adjust the upper bound of a bounded rectified linear unit (ReLU), which replaces the normal ReLU and effectively controls the dynamic range of activations. Based on the tradeoff between the learning ability and the quantization error of networks, we manage to preserve full accuracy after quantization and obtain efficient integer networks. Our experiments on ResNet for image classification demonstrate that our 8-bit integer networks achieve state-of-the-art performance compared with Google's TensorFlow and NVIDIA's TensorRT. Moreover, we experiment on VDSR for image super-resolution and on VRCNN for compression artifact reduction, both of which are regression tasks that natively require high inference accuracy. Besides matching the performance of the corresponding FPN networks, our integer networks have only 1/4 of the memory cost and run 2× faster on GPUs.
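The role of the bounded ReLU can be illustrated with a small sketch: clipping activations to [0, upper] fixes their dynamic range, so they can be mapped uniformly onto 8-bit codes. The function names, the uniform rounding scheme, and the example bound of 6.0 are assumptions for illustration, not the paper's actual finetuning or quantization procedure.

```python
def bounded_relu(x, upper):
    """ReLU clipped to [0, upper]; `upper` is the adjustable bound."""
    return min(max(x, 0.0), upper)


def quantize_activation(x, upper, bits=8):
    """Map a bounded-ReLU activation to an unsigned integer code."""
    levels = (1 << bits) - 1  # 255 for 8 bits
    return int(round(bounded_relu(x, upper) * levels / upper))


def dequantize_activation(q, upper, bits=8):
    """Recover the approximate real value represented by code q."""
    levels = (1 << bits) - 1
    return q * upper / levels


upper = 6.0               # hypothetical bound chosen for this example
step = upper / 255        # width of one quantization step
code = quantize_activation(1.234, upper)
error = abs(dequantize_activation(code, upper) - 1.234)
```

A smaller bound gives a finer step and hence a smaller rounding error, but clips more activations; this is one way to read the tradeoff between learning ability and quantization error mentioned above.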
For current learned image compression methods, padding input images is necessary to meet the resolution requirements of down-sampling layers. However, the impact of padding has not been studied thoroughly. Most previous studies ignore padded images in the training process. In this paper, we analyze the impact of padding on compression performance. Then, we propose a padding-aware training (PAT) strategy, which handles the padding effect during training. Specifically, our PAT strategy calculates the loss on the pre-padding image through a masking operation. Finally, our systematic experimental results show that images with different resolutions tend to favor different padding modes. Therefore, we further propose to conduct padding mode decision in the encoding process for rate-distortion optimization. Experiments demonstrate that our proposed PAT strategy and padding mode decision effectively compensate for the performance drop caused by padding.
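The masking operation can be sketched as follows: pad the image so that its size is a multiple of the down-sampling factor, then compute the distortion only over the original (pre-padding) region. The helper names, the bottom/right constant padding, and the use of mean squared error are assumptions for illustration; the abstract does not specify the padding modes or the loss used in PAT.

```python
def pad_image(img, multiple, value=0.0):
    """Pad a 2-D list on the bottom and right to multiples of `multiple`.

    Constant padding is an assumption; other padding modes are possible.
    Returns the padded image and the original (height, width).
    """
    h, w = len(img), len(img[0])
    new_h = -(-h // multiple) * multiple  # ceiling to a multiple
    new_w = -(-w // multiple) * multiple
    padded = [list(row) + [value] * (new_w - w) for row in img]
    padded += [[value] * new_w for _ in range(new_h - h)]
    return padded, (h, w)


def masked_mse(recon, target, orig_size):
    """Mean squared error restricted to the pre-padding region."""
    h, w = orig_size
    total = 0.0
    for y in range(h):
        for x in range(w):
            diff = recon[y][x] - target[y][x]
            total += diff * diff
    return total / (h * w)


image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0]]
padded, size = pad_image(image, multiple=4)

# A reconstruction that is wrong only inside the padded area.
recon = [[9.0] * 4 for _ in range(4)]
for y in range(size[0]):
    for x in range(size[1]):
        recon[y][x] = image[y][x]

loss = masked_mse(recon, padded, size)
```

Because the mask excludes the padded pixels, errors there do not contribute to the loss, so the model is not rewarded or penalized for reconstructing content that is discarded after decoding.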
ISBN: 9781728180687 (digital), 9781728180694 (print)
In video-based point cloud compression (V-PCC), the occupancy map video is used to indicate whether a 2-D pixel corresponds to a valid 3-D point. In the current design of V-PCC, the occupancy map video is compressed losslessly with High Efficiency Video Coding (HEVC). However, the coding tools in HEVC are specifically designed for natural images and are thus unsuitable for the occupancy map. In this paper, we present a novel quadtree-based scheme for lossless occupancy map coding. In this scheme, the occupancy map is first divided into several coding tree units (CTUs). Then, each CTU is divided into coding units (CUs) recursively using a quadtree. The quadtree partition terminates when one of three conditions is satisfied. First, all the pixels in the CU have the same value. Second, the pixels in the CU take only two values and can be separated by a continuous edge whose endpoints lie on the sides of the CU; the continuous edge is then coded using a chain code. Third, the CU reaches the minimum size. This scheme simplifies the block partitioning design of HEVC and introduces simpler yet more effective coding tools. Experimental results show a significant reduction in bit-rate and complexity compared with the occupancy map coding scheme in V-PCC. In addition, this scheme is also efficient for compressing semantic maps.
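The recursive partition can be sketched in a few lines. This simplified version implements only the first and third termination conditions (uniform block, minimum size); the second condition, an edge-separable block coded with a chain code, is deliberately omitted. The function name, the leaf tuple layout, and the assumption of a square block whose side is a power of two are all illustrative choices, not the paper's design.

```python
def quadtree_leaves(occ, x, y, size, min_size=1):
    """Split a square block of a binary map into quadtree leaves.

    Returns a list of (x, y, size, is_uniform) tuples. Assumes `size`
    is a power of two. The edge-based termination condition described
    in the abstract is not implemented in this sketch.
    """
    values = {occ[j][i] for j in range(y, y + size)
              for i in range(x, x + size)}
    if len(values) == 1:       # condition 1: all pixels share one value
        return [(x, y, size, True)]
    if size <= min_size:       # condition 3: minimum CU size reached
        return [(x, y, size, False)]
    half = size // 2
    leaves = []
    for dy in (0, half):       # visit the four quadrants in raster order
        for dx in (0, half):
            leaves += quadtree_leaves(occ, x + dx, y + dy, half, min_size)
    return leaves


occupancy = [[1, 1, 0, 0],
             [1, 1, 0, 0],
             [0, 0, 0, 1],
             [0, 0, 0, 0]]
leaves = quadtree_leaves(occupancy, 0, 0, 4)
coarse = quadtree_leaves(occupancy, 0, 0, 4, min_size=2)
```

In this example three of the four quadrants are uniform and stop immediately, while the bottom-right quadrant is split down to single pixels unless the minimum size stops it first.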
ISBN: 9781728180687 (digital), 9781728180694 (print)
We propose to improve neural network-based compression artifact reduction by transmitting side information for the neural network. The side information consists of artifact descriptors that are obtained by analyzing the original and compressed images in the encoder. In the decoder, the received descriptors are used as additional input to a well-designed conditional post-processing neural network. To reduce the transmission overhead, the entire model is optimized under the rate-distortion constraint via end-to-end learning. Experimental results show that introducing the side information greatly improves the ability of the post-processing neural network, and improves the rate-distortion performance.
In this paper, we propose a learned scalable/progressive image compression scheme based on deep neural networks (DNN), named Bidirectional Context Disentanglement Network (BCD-Net). For learning hierarchical represent...
ISBN: 9781479947164 (print)
Online music services have become a popular way for end users to obtain music. User interests, as reflected by downloading records, are crucial for service providers to understand users and thus to provide personalization. However, the raw downloading records are huge in volume and difficult to analyze intuitively. We study a visualization approach to analyzing downloading records so as to present user interests. To reveal the underlying relevance between music tracks, we use not only the metadata of music (especially genres) but also collaborative relevance that is voted by users. To present time-varying user interests, we design several new figures, namely the Bean plot, the Instrument plot, and the Transitional Pie plot, which are capable of displaying different aspects of the variation in user interests. We have performed experiments with a real-world data set, and the results show the effectiveness of our proposed visualization method. Our work can also inspire the visualization of time-varying data in other applications.
This paper addresses the issue of how to coordinate depth with RGB more effectively in order to boost the performance of RGB-D object detection. In particular, we investigate two primary ideas under the CNN model: property derivation and property fusion. First, we propose that depth can be used not only as extra information besides RGB but also to derive more visual properties for comprehensively describing the objects of interest. We therefore construct a two-stage learning framework consisting of property derivation and fusion. Here the properties can be derived either from the provided color or depth alone or from their pairs (e.g., the geometry contour adopted in this paper). Second, we explore how to fuse different properties in feature learning, which under the CNN model boils down to deciding at which layer the properties should be fused. The analysis shows that different semantic properties should be learned separately and combined before being passed to the final classifier. This detection approach is consistent with the mechanism of the primary visual cortex (V1) in the brain. We experimentally evaluate the proposed method on a challenging dataset and achieve state-of-the-art performance.
ISBN: 9781509053179 (print)
Content-aware image retargeting has attracted substantial interest in the research community. However, there is still no method that can adequately preserve important image contents and structure, without introducing conspicuous visible deformation, in a relatively short period of time. To address this problem, we propose a Fast Genetic Multi-operator (FGM) method that integrates multiple retargeting operators. To improve efficiency, the FGM method uses Genetic Algorithms (GAs) to reach the optimal operator ratio, adopting saliency and the Gray-Level Co-occurrence Matrix (GLCM) as its energy function. The FGM method not only preserves salient contents and structure well but also greatly reduces the computational complexity. Experimental results demonstrate that our method outperforms state-of-the-art image retargeting methods.