ISBN: 9781665475921 (print)
In recent years, many deep convolutional neural networks have been successfully applied to single image super-resolution (SISR). Even with small convolution kernels, these methods still require a large number of parameters and considerable computation. To tackle this problem, we propose a novel framework that extracts features more efficiently. Inspired by depthwise separable convolution, we improve the standard residual block and propose the inverted bottleneck block (IBNB). The IBNB replaces small convolution kernels with large ones without introducing additional computation. The proposed IBNB shows that large-kernel convolution is viable for SISR. Comprehensive experiments demonstrate that our method surpasses most methods by 0.10 to 0.32 dB in quantitative metrics with fewer parameters.
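The parameter argument above can be checked with simple arithmetic. The following is a minimal sketch, not the authors' architecture: it assumes the inverted bottleneck is a 1x1 expansion, a k x k depthwise convolution, and a 1x1 projection, and the channel count (64), expansion ratio (2), and kernel size (7) are illustrative values that do not come from the abstract.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution (bias terms omitted)."""
    return (c_in // groups) * c_out * k * k


def residual_block_params(channels, k=3):
    """Standard residual block: two dense k x k convolutions."""
    return 2 * conv_params(channels, channels, k)


def inverted_bottleneck_params(channels, expansion, k):
    """Assumed layout: 1x1 expand -> k x k depthwise -> 1x1 project."""
    hidden = channels * expansion
    expand = conv_params(channels, hidden, 1)
    # Depthwise: one k x k filter per channel, so groups == hidden.
    depthwise = conv_params(hidden, hidden, k, groups=hidden)
    project = conv_params(hidden, channels, 1)
    return expand + depthwise + project


if __name__ == "__main__":
    channels = 64  # illustrative value
    print("residual block, 3x3:", residual_block_params(channels, k=3))
    print("inverted bottleneck, 7x7:", inverted_bottleneck_params(channels, 2, 7))
```

Under these assumptions the depthwise layer grows only linearly with the channel count, which is why a large kernel can fit inside the budget of a block built from dense 3x3 convolutions.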
ISBN: 9798331529543; 9798331529550 (print)
With the advancement of deep learning techniques, learned image compression (LIC) has surpassed traditional compression methods. However, these methods typically require training separate models to achieve optimal rate-distortion performance, leading to increased time and resource consumption. To tackle this challenge, we propose leveraging multi-gain and inverse multi-gain unit pairs to enable variable rate adaptation within a single model. Nevertheless, experiments have shown that rate-distortion performance may degrade at certain bitrates. Therefore, we introduce weighted probability assignment, where different selection probabilities are assigned during training based on lambda values, to increase the model's training frequency under specific bitrate conditions. To validate our approach, extensive experiments were conducted on Transformer-based and CNN-based models. The experimental results validate the efficiency of our proposed method.
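The two ideas in this abstract, gain and inverse-gain pairs and weighted selection of lambda values during training, can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: the gain is modeled as a per-channel scaling vector, the inverse gain is taken to be its exact reciprocal (in a learned model both would normally be trained), and the function names and all numeric values are hypothetical.

```python
import random


def sample_rate_index(rng, weights):
    """Pick which lambda (rate point) to train on, with unequal probabilities."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def quantize_with_gain(latent, gain):
    """Scale each channel by its gain, round, then undo the scaling.

    A larger gain means a finer effective quantization step for that channel.
    The inverse gain is assumed to be exactly 1 / gain here.
    """
    symbols = [round(y * g) for y, g in zip(latent, gain)]
    recon = [q / g for q, g in zip(symbols, gain)]
    return symbols, recon


if __name__ == "__main__":
    lambdas = [0.0018, 0.0035, 0.0067, 0.0130]   # hypothetical values
    weights = [2.0, 1.0, 1.0, 2.0]               # hypothetical: favor the end points
    rng = random.Random(0)
    idx = sample_rate_index(rng, weights)
    print("training step uses lambda =", lambdas[idx])
    print(quantize_with_gain([0.26, 1.4], [4.0, 1.0]))
```

Raising the weight of a lambda value makes the model see that rate point more often, which is the mechanism the abstract describes for repairing degraded bitrates.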
ISBN: 0819439886 (print)
In this paper, we propose fractal-based gradient match vector quantizers (FGMVQs) and fractal-based side match vector quantizers (FSMVQs) for the image coding framework. The proposed schemes are based on the non-iterative fractal block coding (FBC) technique and on the concepts of gradient match vector quantizers (GMVQs) and side match vector quantizers (SMVQs). Unlike ordinary GMVQs and SMVQs, the super codebooks in the proposed FGMVQs and FSMVQs are generated from the affine-transformed domain blocks of the non-iterative FBC technique. The codewords in the state codebook are dynamically extracted from the super codebook using the side-match and gradient-match criteria. The redundancy in the affine-transformed domain blocks is greatly reduced, and the compression ratio can be significantly increased. Our simulation results show that the proposed FGMVQs and FSMVQs save about 10% to 20% of the bit rate of the non-iterative FBC techniques.
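The side-match selection of a state codebook can be illustrated with a small sketch. It follows the usual side-match idea (compare a candidate's top row and left column against the adjacent edges of already decoded neighbors) and is not taken from the paper: the super codebook here is an arbitrary list of square blocks rather than affine-transformed domain blocks, and the function names are hypothetical.

```python
def side_match_distortion(codeword, upper_block, left_block):
    """Squared mismatch between a candidate block and its decoded neighbors.

    Blocks are square lists of rows. The candidate's top row is compared with
    the bottom row of the upper neighbor, and its left column with the right
    column of the left neighbor.
    """
    n = len(codeword)
    top = sum((codeword[0][j] - upper_block[-1][j]) ** 2 for j in range(n))
    left = sum((codeword[i][0] - left_block[i][-1]) ** 2 for i in range(n))
    return top + left


def state_codebook(super_codebook, upper_block, left_block, size):
    """Indices of the `size` codewords with the smallest side-match distortion."""
    ranked = sorted(
        range(len(super_codebook)),
        key=lambda idx: side_match_distortion(
            super_codebook[idx], upper_block, left_block
        ),
    )
    return ranked[:size]


if __name__ == "__main__":
    upper = [[0, 0], [10, 10]]
    left = [[0, 10], [0, 10]]
    super_cb = [
        [[0, 0], [0, 0]],
        [[10, 10], [10, 10]],
        [[9, 10], [10, 10]],
    ]
    print(state_codebook(super_cb, upper, left, size=2))
```

Because the decoder can rebuild the same state codebook from the same neighbors, only an index into the small state codebook has to be transmitted, which is where the bit-rate saving comes from.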
ISBN: 0780367251 (print)
The MPEG-7 standard is currently being developed to specify standardized interfaces that help users or agents identify, filter, browse, and efficiently retrieve audio-visual material. The purpose of the presentation is to provide an overview of the scope and potential of the MPEG-7 standard, with particular emphasis on non-text descriptors for visual content. The MPEG-7 specifications make a wealth of applications possible. At the conference, various application scenarios will be presented and discussed.
ISBN: 9781728185514 (print)
With the development of the game industry and the popularization of mobile devices, mobile games have come to play an important role in people's entertainment. The aesthetic quality of mobile game images determines the users' Quality of Experience (QoE) to a certain extent. In this paper, we propose a multi-task deep learning method to evaluate the aesthetic quality of mobile game images in multiple dimensions (i.e., fineness, color harmony, colorfulness, and overall quality). Specifically, we first extract a quality-aware feature representation by integrating the features from all intermediate layers of a convolutional neural network (CNN), and then map these quality-aware features into the quality score space of each dimension via a quality regressor module, which consists of three fully connected (FC) layers. The proposed model is trained in a multi-task learning manner, where the quality-aware features are shared by the prediction tasks of the different quality dimensions, and the multi-dimensional quality scores of each image are regressed by separate quality regression modules. We further introduce an uncertainty principle to balance the loss of each task in the training stage. The experimental results show that our proposed model achieves the best performance on the Multi-dimensional Aesthetic assessment for Mobile Game image database (MAMG) among state-of-the-art image quality assessment (IQA) algorithms and aesthetic quality assessment (AQA) algorithms.
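The abstract does not spell out how uncertainty balances the task losses. One common choice in multi-task learning is homoscedastic-uncertainty weighting, where each task has a learnable log-variance s_i and the total loss is the sum of 0.5 * exp(-s_i) * L_i + 0.5 * s_i. The sketch below assumes that form; whether the paper uses exactly this formula is an assumption, and the function name is hypothetical.

```python
import math


def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses using one log-variance per task.

    A task with a large log-variance (high uncertainty) is down-weighted by
    exp(-s), while the + 0.5 * s term stops s from growing without bound.
    """
    return sum(
        0.5 * math.exp(-s) * loss + 0.5 * s
        for loss, s in zip(task_losses, log_vars)
    )


if __name__ == "__main__":
    # Four tasks: fineness, color harmony, colorfulness, overall quality.
    losses = [1.0, 2.0, 3.0, 4.0]    # illustrative per-task losses
    log_vars = [0.0, 0.0, 0.0, 0.0]  # in training these would be learned
    print(uncertainty_weighted_loss(losses, log_vars))
```

In a real training loop the log-variances would be parameters optimized together with the network weights, so the balance between the four quality dimensions adapts during training instead of being hand-tuned.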
ISBN: 9798350343557 (print)
Multi-modal image registration is fundamental in remote sensing and visual navigation applications. However, existing image registration methods designed for single-modality images do not provide satisfactory results when applied to multi-modal image registration. In this research, our objective is to achieve highly accurate alignment of infrared and optical (visible range) images. To accomplish this goal, we explore the effectiveness of the Swin Transformer encoder and cosine loss in enhancing the keypoint-based image registration process. Simulation results show the improvement achieved in multi-modal registration by using a Transformer-based Siamese network.
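A cosine loss on keypoint descriptors from the two branches of a Siamese network can be sketched as below. The abstract does not give the exact loss, so this uses the common cosine embedding form as an assumption: matching pairs are pulled toward similarity 1, and non-matching pairs are penalized only when their similarity exceeds a margin. Descriptors are plain lists here, and the function names and margin value are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_loss(desc_infrared, desc_optical, is_match, margin=0.0):
    """Cosine embedding loss for one infrared/optical descriptor pair."""
    sim = cosine_similarity(desc_infrared, desc_optical)
    if is_match:
        return 1.0 - sim
    return max(0.0, sim - margin)


if __name__ == "__main__":
    print(cosine_loss([3.0, 4.0], [3.0, 4.0], is_match=True))
    print(cosine_loss([1.0, 0.0], [0.0, 1.0], is_match=False))
```

Because the loss depends only on the angle between descriptors, it is insensitive to overall intensity scale, which is one plausible reason to prefer it when the two modalities have very different brightness statistics.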
ISBN: 9781728180687 (print)
One of the principal contradictions in the video field today lies between the booming demand for evaluating streaming video quality and the low precision of Quality of Experience (QoE) prediction results. In this paper, we propose Convolutional Neural Network and Gated Recurrent Unit (CGNN)-QoE, a deep learning QoE model that can accurately predict overall and continuous scores of video streaming services in real time. We further implement state-of-the-art models based on their original works and compare them with our method on six publicly available datasets. In all considered scenarios, CGNN-QoE outperforms existing methods.
ISBN: 9781665475921 (print)
With the increasing popularity of commercial depth cameras, 3D reconstruction of dynamic scenes has aroused widespread interest. Although many novel 3D applications have been unlocked, real-time performance remains a major problem. In this paper, a low-cost, real-time system, LiveRecon3D, is presented, with multiple RGB-D cameras connected to a single computer. The goal of the system is to provide an interactive frame rate for 3D content capture and rendering at a reduced cost. In the proposed system, we adopt a scalable volume structure and employ ray casting to extract the surface of the 3D content. Based on a pipeline design, all the modules in the system run in parallel and are designed to minimize latency, achieving an interactive frame rate of 30 FPS. Finally, experimental results from an implementation with three Kinect v2 cameras are presented to verify the system's effectiveness in terms of visual quality and real-time performance.
ISBN: 9781728180687 (print)
Though learning-based low-light enhancement methods have achieved significant success, existing methods are still sensitive to noise and prone to unnatural appearance. These problems may come from a lack of structural awareness and from confusion between noise and texture. Thus, we present a low-light image enhancement method that consists of an image disentanglement network and an illumination boosting network. The disentanglement network is first used to decompose the input image into image details and image illumination. The extracted illumination part then goes through a multi-branch enhancement network designed to improve the dynamic range of the image. The multi-branch network extracts multi-level image features and enhances them via numerous subnets. These enhanced features are then fused to generate the enhanced illumination part. Finally, the denoised image details and the enhanced illumination are entangled to produce the normal-light image. Experimental results show that our method can produce visually pleasing images on many public datasets.
ISBN: 9798331529543; 9798331529550 (print)
HTTP adaptive streaming (HAS) constructs bitrate ladders to deliver videos with the best possible quality under varying network conditions. Though per-shot content adaptive encoding (CAE) largely improves compression efficiency by constructing the optimal bitrate ladder for each video shot, it suffers from excessive encoding complexity, as all the points in the operating space (typically resolution × bitrate) need to be encoded and compared. To address this issue, this paper proposes an efficient bitrate ladder construction method that encodes only a subset of operating points, then uses curve fitting and inter-curve prediction to estimate the RD performance of the other points. The proposed method enables low-complexity ladder construction even for high-dimensional operating spaces that incorporate dimensions such as encoding presets. Experiments show that this method achieves RD performance comparable to the original per-shot CAE with only 42% of the encoding points. Even when the encoding points are reduced to 3.6% of the original CAE, it achieves a 15% BD-rate improvement compared to using a fixed bitrate ladder.
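The curve-fitting step can be illustrated with a small sketch. The abstract does not name the curve model, so this assumes quality is roughly linear in the logarithm of bitrate, quality = a + b * ln(bitrate), fitted by ordinary least squares to the few encoded points of one resolution. The inter-curve prediction step is not sketched, and the function names and sample values are hypothetical.

```python
import math


def fit_log_curve(bitrates, qualities):
    """Least-squares fit of quality = a + b * ln(bitrate)."""
    xs = [math.log(r) for r in bitrates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(qualities) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, qualities))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_quality(a, b, bitrate):
    """Estimate the quality of an operating point that was never encoded."""
    return a + b * math.log(bitrate)


if __name__ == "__main__":
    # Three encoded points for one resolution (illustrative values, kbps and dB).
    encoded_bitrates = [500.0, 1000.0, 4000.0]
    encoded_quality = [33.1, 35.9, 41.2]
    a, b = fit_log_curve(encoded_bitrates, encoded_quality)
    print("estimated quality at 2000 kbps:", predict_quality(a, b, 2000.0))
```

Once each resolution has a fitted curve, the ladder can be built by comparing the estimated curves at each target bitrate instead of encoding every candidate point.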