ISBN: 9781665475921 (print)
In this paper, we propose a convolutional neural network (CNN)-based post-processing filter for video compression with multi-scale feature representation. The discrete wavelet transform (DWT) decomposes an image into multi-frequency and multi-directional sub-bands, and can therefore expose the artifacts caused by video compression at multiple scales. Thus, we combine DWT with CNN and construct two sub-networks: a step-like sub-band network (SLSB) and a mixed enhancement network (ME). SLSB takes the wavelet sub-bands as input and feeds them into the Res2Net group (R2NG) from high frequency to low frequency. R2NG consists of Res2Net modules and adopts spatial and channel attention to adaptively enhance features. We combine the high-frequency sub-band output with the low-frequency sub-band in R2NG to capture multi-scale features. ME uses mixed convolution, composed of dilated convolution and standard convolution, as the basic block to expand the receptive field without the blind spots of dilated convolution and to further improve the reconstruction quality. Experimental results demonstrate that the proposed CNN filter achieves average BD-rate reductions of 2.13%, 2.63%, 2.99%, 4.8%, 3.72% and 4.5% over the VTM 11.0-NNVC anchor for the Y channel on the A1, A2, B, C, D and E classes of the common test conditions (CTC) in AI, RA and LDP configurations, respectively.
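The sub-band decomposition mentioned in the abstract can be illustrated with a minimal sketch of a one-level 2D DWT. The abstract does not say which wavelet is used, so the orthonormal Haar wavelet is assumed here purely for illustration, and the function name `haar_dwt2` is hypothetical. The sub-band naming (LL, LH, HL, HH) also varies between libraries.

```python
def haar_dwt2(image):
    """One-level 2D orthonormal Haar transform (an assumed wavelet, for illustration).

    `image` is a list of equal-length rows with even height and width.
    Returns four sub-bands, each half the size of the input.
    """
    rows, cols = len(image), len(image[0])
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, rows, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for c in range(0, cols, 2):
            # The 2x2 block: a b on the top row, d e on the bottom row.
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            ll_row.append((a + b + d + e) / 2.0)  # low-pass in both directions
            lh_row.append((a + b - d - e) / 2.0)  # top minus bottom: horizontal edges
            hl_row.append((a - b + d - e) / 2.0)  # left minus right: vertical edges
            hh_row.append((a - b - d + e) / 2.0)  # diagonal detail
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh
```

Because the transform is orthonormal, the sum of squared coefficients over the four sub-bands equals the sum of squared pixel values, and a flat region produces zeros in all three detail sub-bands.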
ISBN: 0780339061 (print)
In this paper, some of the most significant image quality indexes are reviewed and compared with a new method for blockness distortion evaluation. The paper begins with a brief survey of classical measures based on the numerical difference between original and reconstructed image data (e.g., MSE, SNR and PSNR) and of advanced methods that account for the perceptual aspects of image degradation (e.g., Hosaka Plots and other methods based on Human Visual System properties, such as Information Content or Perceptual Image Distortion). Next, four innovative methods for blockness distortion measurement are proposed: two based on DCT analysis, and two on the differential Sobel operator. Results on standard pictures confirm the efficiency of the proposed measures.
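As a reference point for the classical measures named above, here is a minimal sketch of MSE and PSNR using their standard definitions. The peak value of 255 assumes 8-bit samples, which the abstract does not state, and the function names are illustrative.

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two images given as lists of rows."""
    total = 0.0
    count = 0
    for row_o, row_r in zip(original, reconstructed):
        for o, r in zip(row_o, row_r):
            total += (o - r) ** 2
            count += 1
    return total / count


def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB; `peak` assumes 8-bit samples."""
    error = mse(original, reconstructed)
    if error == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / error)
```

Both measures depend only on pixel-wise differences, which is why they cannot distinguish a visible block-edge artifact from an equally large but less visible error.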
ISBN: 0819452114 (print)
A motion-compensated wavelet video coder is presented that uses adaptive mode selection (AMS) for each macroblock (MB). The block-based motion estimation is performed in the spatial domain, and an embedded zerotree wavelet coder (EZW) is employed to encode the residue frame. In contrast to other motion-compensated wavelet video coders, where all the MBs are forced to be in INTER mode, we construct the residue frame by combining the prediction residual of the INTER MBs with the coding residual of the INTRA and INTER-ENCODE MBs. Unlike INTER MBs, which are not coded, the INTRA and INTER-ENCODE MBs are encoded separately by a DCT coder. By adaptively selecting the quantizers of the INTRA and INTER-ENCODE coded MBs, our goal is to equalize the characteristics of the residue frame in order to improve the overall coding efficiency of the wavelet coder. The mode selection is based on the variance of the MB, the variance of the prediction error, and the variance of the neighboring MBs' residual. Simulations show that the proposed motion-compensated wavelet video coder achieves a gain of around 0.7-0.8 dB PSNR over MPEG-2 TM5, and a PSNR comparable to that of other 2D motion-compensated wavelet-based video codecs. It also provides potential visual quality improvement.
ISBN: 0819459763 (print)
Label location and recognition has become a crucial task for today's Unmanned Aerial Vehicles (UAVs). We propose a morphology-based algorithm to locate and recognize labels. This algorithm is insensitive to scaling and rotation, and is able to work at low resolution. The label positioning and recognition strategy we designed is divided into two steps. First, at an altitude of about 10 m, we apply dilation processing and edge detection to the images sent back by the UAV. Then, combining this with the current heading information of the vehicle, we are able to produce the topology map of all labels. After that, the vehicle is lowered to about 5 m, and we apply erosion processing to the returned image and then recognize each label using image measurement and image analysis methods. The validity of this algorithm is well verified at ARCC 2004.
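The two morphological operations named in the abstract, dilation and erosion, can be sketched for a binary image as follows. The 3x3 square structuring element and the treatment of out-of-bounds pixels as background are assumptions made for this illustration; the abstract does not specify either.

```python
def _neighbourhood(image, r, c):
    """Values in the 3x3 window centred on (r, c); outside the image counts as 0."""
    rows, cols = len(image), len(image[0])
    values = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < rows and 0 <= cc < cols
            values.append(image[rr][cc] if inside else 0)
    return values


def dilate(image):
    """Binary dilation: a pixel becomes 1 if any pixel in its 3x3 window is 1."""
    return [[max(_neighbourhood(image, r, c)) for c in range(len(image[0]))]
            for r in range(len(image))]


def erode(image):
    """Binary erosion: a pixel stays 1 only if every pixel in its 3x3 window is 1."""
    return [[min(_neighbourhood(image, r, c)) for c in range(len(image[0]))]
            for r in range(len(image))]
```

Dilation grows foreground regions, which helps merge a label's fragments into one blob for coarse location, while erosion shrinks them and removes thin noise before finer analysis.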
ISBN: 9781467355636; 9781467355629 (print)
In this paper, we propose a new super resolution technique based on interpolation followed by registration using iterative back projection (IBP). Low resolution images are interpolated, and the interpolated images are then registered in order to generate a sharper high resolution image. The proposed technique has been tested on Lena, Elaine, Pepper, and Baboon. The quantitative peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) results, as well as the visual results, show the superiority of the proposed technique over conventional and state-of-the-art image super resolution techniques. For the Lena image, the PSNR is 6.52 dB higher than that of bicubic interpolation.
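The general idea of iterative back projection can be shown with a minimal one-dimensional sketch. This is not the technique proposed in the paper: it omits registration entirely, works on a 1D signal, and assumes a hypothetical imaging model in which each low resolution sample is the average of two high resolution samples. All function names are illustrative.

```python
def downsample(signal):
    """Assumed imaging model: average each pair of neighbouring samples."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Back-projection kernel: repeat each sample twice."""
    out = []
    for value in signal:
        out.extend([value, value])
    return out


def iterative_back_projection(low_res, initial_high_res, iterations=10, step=1.0):
    """Refine a high resolution estimate until it reproduces `low_res`.

    Each pass simulates the low resolution signal from the current estimate,
    then projects the simulation error back onto the estimate.
    """
    estimate = list(initial_high_res)
    for _ in range(iterations):
        simulated = downsample(estimate)
        error = [l - s for l, s in zip(low_res, simulated)]
        correction = upsample(error)
        estimate = [e + step * c for e, c in zip(estimate, correction)]
    return estimate
```

For example, starting from an interpolated guess `[1.0, 1.5, 2.5, 3.0]` for the observation `[1.0, 3.0]`, the loop shifts the estimate until downsampling it reproduces the observation, while keeping the detail supplied by the initial interpolation.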
ISBN: 9781665450928 (print)
Designing visual content and characters for games is a time-consuming task, even for experienced designers and illustrators. Most game companies and developers use procedural methods to automate the design process, but the visual content produced by these algorithms is limited in terms of variation. In this paper, we propose to use Generative Adversarial Networks (GANs) for visual content production. Two different RPG and DnD visual image datasets were collected from the internet for training, and 6 different GAN models were trained on them. In 3 of 18 experiments, transfer learning methods were used because of the limited datasets. The Fréchet Inception Distance metric was used to compare the model results. SNGAN was the most successful on both datasets. Moreover, the transfer learning method (WGAN-GP, BigGAN) was more successful than training from scratch.
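The Fréchet Inception Distance compares two Gaussians fitted to feature vectors of real and generated images. The sketch below shows only the univariate special case, where the distance reduces to the squared difference of the means plus the squared difference of the standard deviations. The actual metric uses multivariate Inception network features and a matrix square root, neither of which is reproduced here, and the function names are illustrative.

```python
import math


def gaussian_stats(values):
    """Mean and (population) variance of a list of scalar features."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance


def frechet_distance_1d(mean_a, var_a, mean_b, var_b):
    """Squared Fréchet distance between two univariate Gaussians.

    This is the one-dimensional case of
    |mu_a - mu_b|^2 + Tr(S_a + S_b - 2 (S_a S_b)^(1/2)).
    """
    return (mean_a - mean_b) ** 2 + (math.sqrt(var_a) - math.sqrt(var_b)) ** 2
```

A lower value means the two feature distributions are closer, which is why the model with the lowest score is reported as the most successful.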
ISBN: 9781479961399 (print)
Many users want to preserve a visual record of a moment that they want to commemorate. Nonetheless, it is still challenging to recall the actual emotional feeling of that moment, even when looking at the old picture. Existing methods include tagging the image or hiding a message within it. However, both involve tradeoffs: the former attaches additional data, and the latter degrades the quality of the image. It is difficult to avoid these two tradeoffs. In this paper, we propose D-mago to preserve the moment to remember as an image, which consists of the visual information and the emotional feeling, without binding extra data or degrading the quality of the image. To verify the benefit of our proposed algorithm, we conducted a series of evaluation studies on the effectiveness of the proposed scheme. The results indicate that D-mago overcomes the preceding tradeoffs by maintaining a PSNR above 40 dB.
ISBN: 9781538615010 (print)
Inpainting applications include object removal in images and videos, crack filling, error concealment, and texture synthesis. In this paper, the use of inpainting for image coherence and perspective emphasis on video frames in a 2D image-to-video conversion system is analysed. In addition, the performance of different techniques in object removal and image reconstruction is compared using visual experiments and quality metrics.
ISBN: 9781538615010 (print)
In this study, a new method is proposed for inserting advertisement visuals into images automatically and without disturbing the image content. In this method, important areas are determined using deep learning based object, face and text detection, and edge and saliency maps are obtained. This information is then used to identify the best location for inserting the advertisement visual. Shape and color features are used to select the best available advertisement visual from an advertisement pool.
ISBN: 9781665475921 (print)
Magnetic Resonance Imaging (MRI) is widely used for medical diagnosis, staging and follow-up of disease. However, MRI images may contain artifacts caused by factors such as patient movement or machine distortion, which may be unintentionally introduced during medical image acquisition, processing, etc. These artifacts may reduce the effectiveness of diagnosis or even cause false diagnosis. To solve this problem, we propose a general medical image quality assessment (MIQA) methodology, including subjective MIQA procedures and objective MIQA algorithms. We apply this methodology to MRI images in this paper because of the widespread use of MRI in practical applications. We first establish a magnetic resonance imaging quality assessment (MRIQA) database, which contains 3809 MRI images. Then a subjective image quality assessment experiment is conducted by expert doctors according to the diagnostic value of these images, which splits all MRI images into 1285 low quality images and 2524 high quality images. We then conduct a baseline deep learning experiment, and propose an attention-based MIQANet model to automatically separate MRI images into high quality and low quality based on their diagnostic value. Our proposed method achieves a quality assessment accuracy of 96.59%. The constructed MRIQA database and the proposed MIQA model will be made publicly available to further promote medical IQA research.