ISBN: 9781509028603 (print)
Because the traditional center/surround Retinex algorithm cannot accurately estimate the lighting conditions in regions of strong light/dark contrast, over-enhancement and color distortion may occur. To address this, a tone-preserving color image enhancement algorithm informed by human visual characteristics was proposed. A decision function was added to the bilateral filter to estimate the illumination image more accurately and to reduce over-enhancement. Based on the human visual masking effect, an improved gamma correction was used to adaptively correct the brightness of the illumination image, and the local contrast of the reflectance image, obtained by division, was enhanced using local statistics. Finally, the enhanced image was obtained by recombining the illumination image with the reflectance image, which makes the result look more natural. Subjective and objective comparisons with similar algorithms show that, when applied to low-contrast color images, the method not only improves image clarity but also reduces color distortion.
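The decompose, correct and recombine pipeline described above can be sketched as follows. This is a minimal single-channel illustration under stated assumptions: it uses a plain bilateral filter (without the paper's added decision function), a fixed gamma instead of the adaptive masking-based correction, and omits the local-statistics contrast step on the reflectance image. The names `bilateral` and `retinex_enhance` are hypothetical.

```python
import numpy as np

def bilateral(img, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Plain bilateral filter for a 2-D float image in [0, 1] (edge-preserving smoothing)."""
    h, w = img.shape
    pad = np.pad(img, radius, mode="edge")
    num = np.zeros_like(img)
    den = np.zeros_like(img)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            nb = pad[radius + dy: radius + dy + h, radius + dx: radius + dx + w]
            # spatial closeness weight times intensity similarity weight
            wgt = np.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2)
                         - (nb - img) ** 2 / (2 * sigma_r ** 2))
            num += wgt * nb
            den += wgt
    return num / den  # den >= 1 because the centre weight is exp(0)

def retinex_enhance(img, gamma=0.6, eps=1e-6):
    """Split into illumination/reflectance, brighten illumination, recombine."""
    illum = np.maximum(bilateral(img), eps)  # illumination estimate
    refl = img / illum                       # reflectance obtained by division
    illum_corr = illum ** gamma              # fixed gamma < 1 brightens dark regions (assumption)
    return np.clip(illum_corr * refl, 0.0, 1.0)
```

Because the output equals `img * illum ** (gamma - 1)` before clipping and `illum <= 1`, every pixel is brightened or left unchanged; the paper's adaptive correction would instead vary the exponent per region.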
ISBN: 9781728180687 (print)
Intra prediction is an essential component of image coding. This paper presents an intra prediction framework based entirely on neural network modes (NMs). Each NM can be regarded as a regression from the neighboring reference blocks to the current coding block. (1) Different network structures are used for different block sizes: fully connected networks for small 4x4 and 8x8 blocks, and convolutional neural networks for large 16x16 and 32x32 blocks. (2) A dedicated pre-trained network is developed for each prediction mode to improve regression accuracy. When integrated into the HEVC test model, the framework saves 3.55%, 3.03%, and 3.27% BD-rate for the Y, U, and V components, respectively, compared with the anchor. To our knowledge, this is the first work to explore a fully NM-based framework for intra prediction, and it achieves a higher coding gain with lower complexity than previous work.
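To make the "regression from references to block" idea concrete, here is a minimal sketch that fits one linear least-squares predictor as a stand-in for a trained NM. Assumptions: the reference layout (2N samples above, 2N to the left, plus the corner), the synthetic planar training blocks, and the names `gather_refs`, `planar_frame` and `predict` are hypothetical; the paper uses fully connected or convolutional networks, not a linear fit.

```python
import numpy as np

N = 4  # block size; the paper uses fully connected networks for 4x4 and 8x8 blocks

def gather_refs(frame, y, x, n=N):
    """Reference samples for the n x n block at (y, x): 2n above, 2n to the left, and the corner."""
    top = frame[y - 1, x: x + 2 * n]    # top and top-right neighbours
    left = frame[y: y + 2 * n, x - 1]   # left and bottom-left neighbours
    corner = frame[y - 1, x - 1]
    return np.concatenate([top, left, [corner]])  # length 4n + 1

def planar_frame(a, b, c, size=16):
    """Synthetic smooth content a + b*x + c*y used as toy training data."""
    yy, xx = np.mgrid[0:size, 0:size]
    return a + b * xx + c * yy

# "Train" one mode: least-squares map from references (+ bias) to the block pixels
rng = np.random.default_rng(0)
X, Y = [], []
for _ in range(200):
    f = planar_frame(*rng.uniform(-1, 1, size=3))
    X.append(np.append(gather_refs(f, 4, 4), 1.0))
    Y.append(f[4:4 + N, 4:4 + N].ravel())
W, *_ = np.linalg.lstsq(np.array(X), np.array(Y), rcond=None)

def predict(frame, y, x):
    """Predict the N x N block at (y, x) from its reconstructed neighbours."""
    return (np.append(gather_refs(frame, y, x), 1.0) @ W).reshape(N, N)
```

A linear map is exact on planar content, which keeps the example checkable; the point of the paper's per-mode networks is to capture non-linear texture that such a fit cannot.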
ISBN: 9798331529543; 9798331529550 (print)
In streaming media services, video transcoding is a common practice for reducing bandwidth demands. Unfortunately, traditional methods that apply a uniform rate factor (RF) to all videos are often highly inefficient. Content-adaptive encoding (CAE) techniques address this by dynamically adjusting encoding parameters according to video content characteristics. However, existing CAE methods are often tightly coupled with specific encoding strategies, which makes them inflexible. In this paper, we propose a model that predicts both the RF-quality and RF-bitrate curves, from which a complete bitrate-quality curve can be derived. This allows the encoding strategy to be adjusted flexibly without retraining the model. The model leverages codec features, content features, and anchor features to predict the bitrate-quality curve accurately. Additionally, we introduce an anchor suspension method to further improve prediction accuracy. Experiments confirm that the measured quality (VMAF) of the compressed video stays within ±1 of the target, with an accuracy of 99.14%. By combining our quality improvement strategy with the rate-quality curve prediction model, we ran online A/B tests and obtained improvements of +0.107% in both video views and video completions and +0.064% in app usage duration. Our model has been deployed in the Xiaohongshu App.
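Once RF-quality and RF-bitrate curves are predicted for a video, deriving the bitrate-quality curve and choosing an RF for a target VMAF is simple interpolation. The sketch below assumes the curves are sampled on an RF grid and that quality decreases monotonically with RF; the curve shapes are toy values, not the paper's predictions, and the function names are hypothetical.

```python
import numpy as np

def bitrate_quality_curve(bitrate, quality):
    """Combine predicted RF-bitrate and RF-quality samples into (bitrate, quality) pairs."""
    order = np.argsort(bitrate)
    return bitrate[order], quality[order]

def pick_rf(rf_grid, quality, bitrate, target_vmaf):
    """Largest RF (cheapest encode) whose predicted VMAF still reaches the target.

    Assumes quality decreases monotonically as RF increases."""
    # np.interp needs increasing x-coordinates, so reverse the decreasing quality curve
    rf = float(np.interp(target_vmaf, quality[::-1], rf_grid[::-1]))
    return rf, float(np.interp(rf, rf_grid, bitrate))

# Hypothetical predicted curves for one video (toy shapes for illustration only)
rf_grid = np.arange(18.0, 35.0)
quality = 120.0 - 1.5 * rf_grid              # predicted VMAF per RF
bitrate = 8000.0 * 0.9 ** (rf_grid - 18.0)   # predicted kbps per RF
```

For example, `pick_rf(rf_grid, quality, bitrate, 90.0)` selects RF 20 on these toy curves. Changing the strategy (a different target, or a bitrate cap) only changes this lookup, which is why the curves can be reused without retraining the predictor.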
ISBN: 9798331529543; 9798331529550 (print)
Neural Radiance Fields (NeRF) have demonstrated exceptional performance in generating novel views of scenes by learning implicit volumetric representations from calibrated RGB images, without depth information. A major limitation of neural-network-based view synthesis frameworks is their need for large training datasets, and effective data augmentation for view synthesis remains an open problem. NeRF models require extensive scene coverage from multiple views to estimate radiance and density accurately; insufficient coverage limits the model's ability to interpolate or extrapolate to unseen parts of the scene. In this paper, we propose a novel pipeline that addresses this data augmentation problem using depth map information. We use depth image-based rendering (DIBR) to compensate for the shortage of training views for NeRF. Experimental results indicate that our approach improves the quality of images rendered with the NeRF framework, achieving an average peak signal-to-noise ratio (PSNR) gain of 7.2 dB and a maximum gain of 12 dB.
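The core of DIBR is back-projecting each pixel with its depth, moving the 3-D point into a new camera, and re-projecting it. The sketch below assumes a pinhole camera with intrinsics `K`, a rigid transform `(R, t)` from the reference to the virtual camera, nearest-pixel splatting with a z-buffer, and a grayscale image; holes are left as NaN. It is a generic illustration, not the paper's pipeline, and the function names are hypothetical.

```python
import numpy as np

def warp_pixel(u, v, depth, K, R, t):
    """Back-project pixel (u, v) with its depth, move it into the target camera, re-project."""
    X = depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))  # 3-D point, reference camera frame
    Xt = R @ X + t                                          # same point, target camera frame
    q = K @ Xt
    return q[0] / q[2], q[1] / q[2], Xt[2]

def dibr_forward(img, depth, K, R, t):
    """Forward-warp a grayscale image into a virtual view; unfilled pixels (holes) stay NaN."""
    h, w = img.shape
    out = np.full((h, w), np.nan)
    zbuf = np.full((h, w), np.inf)
    for v in range(h):
        for u in range(w):
            ut, vt, z = warp_pixel(u, v, depth[v, u], K, R, t)
            x, y = int(round(ut)), int(round(vt))
            if 0 <= x < w and 0 <= y < h and 0 < z < zbuf[y, x]:
                zbuf[y, x] = z          # keep the nearest surface (occlusion handling)
                out[y, x] = img[v, u]
    return out
```

With a pure horizontal translation `t = (tx, 0, 0)`, a pixel at depth `Z` shifts by `fx * tx / Z` pixels, so nearer content moves more; the disoccluded holes this leaves are what a practical augmentation pipeline has to fill or mask before the warped views are used for training.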
ISBN: 9781665475921 (print)
In this paper, we propose a convolutional neural network (CNN)-based post-processing filter for video compression with multi-scale feature representation. The discrete wavelet transform (DWT) decomposes an image into multi-frequency, multi-directional sub-bands, which helps expose compression artifacts at multiple scales. We therefore combine the DWT with a CNN and construct two sub-networks: a step-like sub-band network (SLSB) and a mixed enhancement network (ME). SLSB takes the wavelet sub-bands as input and feeds them into the Res2Net group (R2NG) in order from high frequency to low frequency. R2NG consists of Res2Net modules and uses spatial and channel attention to enhance features adaptively. Within R2NG, the high-frequency sub-band output is combined with the low-frequency sub-band to capture multi-scale features. ME uses a mixed convolution, composed of dilated and standard convolutions, as its basic block; this enlarges the receptive field without the blind spots of dilated convolution and further improves reconstruction quality. Experimental results demonstrate that the proposed CNN filter achieves average Y-channel BD-rate reductions of 2.13%, 2.63%, 2.99%, 4.8%, 3.72%, and 4.5% over the VTM 11.0-NNVC anchor on classes A1, A2, B, C, D, and E, respectively, of the common test conditions (CTC) in the AI, RA, and LDP configurations.
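The sub-band decomposition that SLSB consumes can be illustrated with one level of the 2-D Haar DWT. The abstract does not name the wavelet, so Haar is an assumption here, and `haar_dwt2`/`haar_idwt2` are hypothetical names. The transform is orthonormal, so it is perfectly invertible and preserves energy.

```python
import numpy as np

def haar_dwt2(x):
    """One level of the orthonormal 2-D Haar DWT; x must have even height and width.

    Returns (ll, h, v, d): low-pass approximation plus horizontal-, vertical- and
    diagonal-edge detail sub-bands, each at half resolution."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]   # top-left, top-right of each 2x2 cell
    c, d = x[1::2, 0::2], x[1::2, 1::2]   # bottom-left, bottom-right
    ll = (a + b + c + d) / 2
    h = (a + b - c - d) / 2               # top minus bottom rows: horizontal edges
    v = (a - b + c - d) / 2               # left minus right columns: vertical edges
    dg = (a - b - c + d) / 2              # diagonal detail
    return ll, h, v, dg

def haar_idwt2(ll, h, v, dg):
    """Exact inverse of haar_dwt2."""
    out = np.empty((2 * ll.shape[0], 2 * ll.shape[1]))
    out[0::2, 0::2] = (ll + h + v + dg) / 2
    out[0::2, 1::2] = (ll + h - v - dg) / 2
    out[1::2, 0::2] = (ll - h + v - dg) / 2
    out[1::2, 1::2] = (ll - h - v + dg) / 2
    return out
```

Block-boundary and ringing artifacts concentrate in the detail bands, which is one reason a filter might process them separately, from high to low frequency, before merging with the approximation band.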
ISBN: 0780339061 (print)
In this paper, some of the most significant image quality indexes are reviewed and compared with a new method for blockness distortion evaluation. The paper begins with a brief survey of classical measures based on the numerical difference between original and reconstructed image data (e.g., MSE, SNR, and PSNR) and of advanced methods that account for the perceptual aspects of image degradation (e.g., Hosaka plots and other methods based on Human Visual System properties, such as Information Content or Perceptual Image Distortion). Four new methods for measuring blockness distortion are then proposed: two based on DCT analysis and two based on the differential Sobel operator. Results on standard pictures confirm the effectiveness of the proposed measures.
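The classical difference-based measures mentioned above take only a few lines. The sketch also includes a crude blockiness cue, the ratio of horizontal pixel differences across 8-pixel block boundaries to those inside blocks; this is an illustrative assumption and not one of the paper's four DCT- or Sobel-based measures. Function names are hypothetical.

```python
import numpy as np

def mse(ref, test):
    diff = ref.astype(float) - test.astype(float)
    return float(np.mean(diff ** 2))

def snr_db(ref, test):
    ref = ref.astype(float)
    noise = ref - test.astype(float)
    return float(10 * np.log10(np.sum(ref ** 2) / np.sum(noise ** 2)))

def psnr_db(ref, test, peak=255.0):
    return float(10 * np.log10(peak ** 2 / mse(ref, test)))

def boundary_ratio(img, block=8):
    """Crude blockiness cue: mean |horizontal difference| across block boundaries vs. inside blocks."""
    d = np.abs(np.diff(img.astype(float), axis=1))           # d[:, j] = |x[:, j+1] - x[:, j]|
    at_edge = (np.arange(d.shape[1]) % block) == block - 1   # pairs straddling a block boundary
    return float(d[:, at_edge].mean() / (d[:, ~at_edge].mean() + 1e-12))
```

Casting to float before subtracting matters for 8-bit images, where `uint8` arithmetic would wrap around. A ratio near 1 means boundary steps look like ordinary image gradients, while large values indicate visible block edges that MSE-style measures do not single out.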
ISBN: 0780386744 (print)
An image is often corrupted by additive Gaussian noise during acquisition and transmission. Denoising must be performed on such images to retain the signal while suppressing the noise. Denoising can be performed by various methods, such as thresholding and filtering, but these methods do not consider the local space-scale information of the image. Here, a new type of neural network is constructed for noise reduction that takes the space-scale information of the image into account. This method gives good numerical results as well as better visual quality. Keywords: Denoising, Discrete Wavelet Transform, Continuous soft thresholding, Least Mean Square rule.
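Wavelet soft thresholding, the baseline operation behind this approach, can be sketched on a 1-D signal. Assumptions: a single-level Haar transform, a noise level estimated from the median absolute detail coefficient, and the universal (VisuShrink) threshold. The paper instead learns its thresholding with a neural network trained by the LMS rule, which is not shown here; the function names are hypothetical.

```python
import numpy as np

def soft_threshold(w, t):
    """Shrink coefficients toward zero by t; those with |w| <= t become 0."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def haar_denoise_1d(x):
    """One-level Haar DWT, soft-threshold the detail band, inverse DWT (len(x) must be even)."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)      # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)      # detail coefficients
    sigma = np.median(np.abs(d)) / 0.6745     # robust noise estimate from the details
    t = sigma * np.sqrt(2 * np.log(x.size))   # universal (VisuShrink) threshold
    d = soft_threshold(d, t)
    out = np.empty_like(x, dtype=float)
    out[0::2] = (a + d) / np.sqrt(2)
    out[1::2] = (a - d) / np.sqrt(2)
    return out
```

With only one decomposition level, noise in the approximation band is untouched, so on piecewise-constant signals the error roughly halves; deeper decompositions, or a learned, space-scale-aware threshold as the paper proposes, are needed to do better.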
ISBN: 9781467355636; 9781467355629 (print)
In this paper, we propose a new super-resolution technique based on interpolation followed by registration using iterative back projection (IBP). Low-resolution images are interpolated, and the interpolated images are then registered to generate a sharper high-resolution image. The proposed technique was tested on the Lena, Elaine, Pepper, and Baboon images. The quantitative peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) results, as well as the visual results, show the superiority of the proposed technique over conventional and state-of-the-art image super-resolution techniques. For the Lena image, the PSNR is 6.52 dB higher than that of bicubic interpolation.
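The back-projection loop at the heart of IBP can be sketched for a single low-resolution image. Assumptions: the degradation is 2x2 block averaging, the initial estimate comes from separable linear interpolation, and the residual is back-projected by pixel replication. The multi-image registration step described in the abstract is omitted, and all function names are hypothetical.

```python
import numpy as np

def downsample(x, s=2):
    """Assumed degradation model: average each s x s block."""
    h, w = x.shape
    return x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def upsample_nearest(y, s=2):
    return np.repeat(np.repeat(y, s, axis=0), s, axis=1)

def upsample_linear(y, s=2):
    """Separable linear interpolation with pixel-centre alignment (edges clamped)."""
    h, w = y.shape
    hr_r = (np.arange(h * s) + 0.5) / s - 0.5   # HR row centres in LR coordinates
    hr_c = (np.arange(w * s) + 0.5) / s - 0.5   # HR column centres in LR coordinates
    rows = np.array([np.interp(hr_c, np.arange(w), r) for r in y])          # (h, w*s)
    return np.array([np.interp(hr_r, np.arange(h), c) for c in rows.T]).T   # (h*s, w*s)

def ibp(lr, s=2, iters=10, step=1.0):
    x = upsample_linear(lr, s)                       # initial interpolated estimate
    for _ in range(iters):
        err = lr - downsample(x, s)                  # simulate the LR observation and compare
        x = x + step * upsample_nearest(err, s)      # back-project the residual
    return x
```

The interpolated estimate alone does not reproduce the observed image when degraded again; IBP drives that residual to zero. With this particular averaging/replication pair it converges in one iteration, whereas blur kernels and sub-pixel shifts in real multi-frame settings make the iterations matter.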
ISBN: 9781665450928 (print)
Designing visual content and characters for games is a time-consuming task, even for experienced designers and illustrators. Most game companies and developers use procedural methods to automate the design process, but the visual content produced by these algorithms is limited in variety. In this paper, we propose using Generative Adversarial Networks (GANs) for visual content production. Two visual image datasets, one of RPG imagery and one of DnD imagery, were collected from the internet, and six different GAN models were trained on them. In 3 of the 18 experiments, transfer learning was used because of the limited dataset sizes. The Fréchet Inception Distance (FID) metric was used to compare the model results. SNGAN was the most successful model on both datasets. Moreover, transfer learning (WGAN-GP, BigGAN) was more successful than training from scratch.
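The Fréchet Inception Distance used above compares the mean and covariance of feature vectors from real and generated images: FID = ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^{1/2}). The sketch below assumes the features have already been extracted (standard FID uses Inception-v3 pooling features) and computes the matrix-square-root trace with NumPy eigendecompositions rather than a library call; function names are hypothetical.

```python
import numpy as np

def sqrtm_psd(a):
    """Symmetric square root of a positive semi-definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def frechet_distance(mu1, cov1, mu2, cov2):
    """||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 (cov1 cov2)^{1/2})."""
    s1 = sqrtm_psd(cov1)
    # Tr((cov1 cov2)^{1/2}) equals Tr((s1 cov2 s1)^{1/2}), whose argument is symmetric PSD
    eig = np.linalg.eigvalsh(s1 @ cov2 @ s1)
    tr_cross = np.sum(np.sqrt(np.clip(eig, 0.0, None)))
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * tr_cross)

def fid_from_features(feat_real, feat_fake):
    """Rows are feature vectors for individual images."""
    mu1, mu2 = feat_real.mean(axis=0), feat_fake.mean(axis=0)
    c1 = np.cov(feat_real, rowvar=False)
    c2 = np.cov(feat_fake, rowvar=False)
    return frechet_distance(mu1, c1, mu2, c2)
```

Lower is better, and identical feature statistics give 0. Working with the symmetric product `s1 @ cov2 @ s1` avoids taking the square root of the non-symmetric `cov1 @ cov2` directly while yielding the same trace.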
The article focuses on audio and video analysis for multimedia interactive services. It describes a system that automates home video editing: the system automatically extracts a set of highlight segments from a set of raw home videos and aligns them with user-supplied incidental music based on the content of both the video and the music. Finally, it introduces a method for interactive image retrieval using query feedback. The method learns the user's query, as well as the correspondence between high-level user concepts and their low-level machine representation, by performing retrievals according to multiple queries supplied by the user during a retrieval session.
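As a point of reference for query feedback, the classical Rocchio update moves the query vector toward images the user marks as relevant and away from those marked non-relevant, then re-ranks the database. This is a swapped-in textbook technique, not the article's method; the weights `alpha`, `beta`, `gamma` and the function names are assumptions.

```python
import numpy as np

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Move the query toward features the user marked relevant and away from the rest."""
    q = alpha * np.asarray(query, dtype=float)
    if len(relevant):
        q += beta * np.mean(relevant, axis=0)
    if len(nonrelevant):
        q -= gamma * np.mean(nonrelevant, axis=0)
    return q

def rank(db, query):
    """Indices of database feature vectors sorted by Euclidean distance to the query."""
    return np.argsort(np.linalg.norm(db - query, axis=1))
```

Calling `rocchio` once per feedback round approximates learning from multiple queries in a session, but it only shifts a point in low-level feature space; the article's stated goal of linking high-level concepts to low-level representations goes beyond this baseline.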