ISBN (digital): 9781728180687
ISBN (print): 9781728180694
Saliency prediction can be treated as the activity of the human visual system (HVS). The most effective method should closely approximate the response of the HVS to the perceived information. Motivated by the fact that the orientation selectivity (OS) mechanism occurring in the primary visual cortex (PVC) reveals how the HVS extracts visual information for scene understanding, we propose a novel saliency model that combines an orientation-selectivity-based local feature called the "excitement" map and a visual-acuity-based global feature called the "acuity" map. Further, a saliency augmentation operator based on visual error sensitivity is designed to enhance the saliency map. Experimental results on three benchmark databases demonstrate the superior performance of the proposed method compared to ten classical and state-of-the-art algorithms.
ISBN (print): 9781728123929
Image upscaling to obtain high-quality digital images is an active research topic because of its applications in the consumer electronics industry. Traditional image upscaling techniques have low computational complexity and are suitable for real-time processing, but the reconstructed image often contains artifacts and undesirable visual effects. The relationship between image interpolation and super-resolution leads to our assumption that the interpolated image can be further optimized and may be considered part of a super-resolution algorithm. In this paper, we propose a new image super-resolution method that combines fast image interpolation with iterative back-projection. This method does not require any external pre-trained datasets and has low computation time, while the quality of the reconstructed image is comparable to that of high-complexity methods such as dictionary-based methods and deep convolutional neural networks.
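The idea of refining an interpolated image by iterative back-projection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the box-average `downsample`, the nearest-neighbour `upsample` (standing in for the fast interpolator), and the `step`, `iters` and `init` parameters are all hypothetical choices made for the example.

```python
def downsample(img, s=2):
    """Assumed degradation model: average each s x s block."""
    h, w = len(img) // s, len(img[0]) // s
    return [[sum(img[i * s + di][j * s + dj]
                 for di in range(s) for dj in range(s)) / (s * s)
             for j in range(w)] for i in range(h)]


def upsample(img, s=2):
    """Nearest-neighbour interpolation (stand-in for a fast interpolator)."""
    return [[img[i // s][j // s] for j in range(len(img[0]) * s)]
            for i in range(len(img) * s)]


def back_project(lr, s=2, iters=30, step=0.5, init=None):
    """Refine a high-resolution estimate so that it re-degrades to `lr`."""
    # Start from the given estimate, or from the interpolated image.
    hr = [row[:] for row in init] if init is not None else upsample(lr, s)
    for _ in range(iters):
        simulated = downsample(hr, s)
        # Error between the observed and the simulated low-resolution image.
        err = [[a - b for a, b in zip(r_lr, r_sim)]
               for r_lr, r_sim in zip(lr, simulated)]
        # Project the error back onto the high-resolution grid.
        correction = upsample(err, s)
        hr = [[v + step * c for v, c in zip(r_hr, r_c)]
              for r_hr, r_c in zip(hr, correction)]
    return hr
```

Each iteration shrinks the low-resolution reconstruction error by a factor of `1 - step` in this toy setting, so the estimate becomes consistent with the observed image without any external training data.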
ISBN (digital): 9781728180687
ISBN (print): 9781728180694
In this paper, we propose an optimized model based on the visual attention mechanism (VAM) for no-reference stereoscopic image quality assessment (SIQA). A CNN model is designed based on a dual attention mechanism (DAM), which includes a channel attention mechanism and a spatial attention mechanism. The channel attention mechanism gives a high weight to features that contribute strongly to the final quality score and a small weight to features with a low contribution. The spatial attention mechanism considers the inner regions of a feature, and different areas are assigned different weights according to the importance of each region within the feature. In addition, a data selection strategy is designed for the CNN model. Following the VAM, visual saliency is applied to guide data selection, and a certain proportion of salient patches are used to fine-tune the network. The same operation is performed on the test set, which removes data redundancy and improves algorithm performance. Experimental results on two public databases show that the proposed model is superior to state-of-the-art SIQA methods. Cross-database validation shows the high generalization ability and effectiveness of our model.
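The difference between channel and spatial attention can be illustrated with a small parameter-free sketch. It is an assumption-laden simplification, not the DAM described in the abstract: a real model would learn the weights, whereas here the hypothetical `channel_attention` and `spatial_attention` functions derive them directly from average activations through a sigmoid.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat):
    """feat: list of C channels, each an H x W list of floats.

    One weight per channel, from the channel's global average (assumption).
    """
    weights = [sigmoid(sum(map(sum, ch)) / (len(ch) * len(ch[0])))
               for ch in feat]
    scaled = [[[w * v for v in row] for row in ch]
              for w, ch in zip(weights, feat)]
    return scaled, weights


def spatial_attention(feat):
    """One weight per spatial position, from the mean across channels."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    mask = [[sigmoid(sum(feat[k][i][j] for k in range(c)) / c)
             for j in range(w)] for i in range(h)]
    scaled = [[[mask[i][j] * feat[k][i][j] for j in range(w)]
               for i in range(h)] for k in range(c)]
    return scaled, mask
```

Channel attention rescales whole feature maps, while spatial attention rescales individual locations shared across all maps; applying both gives the "dual" weighting the abstract refers to.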
Due to the rapid development of digital technology, image enhancement has become necessary for extracting data for use in many fields, including medicine, agricultural security, and many others. Many image enhancement techniques are used in image processing. The method presented in this paper attempts to improve the performance of histogram equalization (HE) on different kinds of medical images by combining it with a Gaussian filter and gamma in order to improve the illumination and contrast of the images, reduce noise, and improve the image quality coefficients. The entropy of the image is measured, and the output image is compared with the input image to check whether the desired result is obtained. The quality coefficients are evaluated for the medical images processed with HE, the Gaussian filter, and gamma at a certain value. When the image quality was improved, the visual perception was improved as well.
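The building blocks named above can be sketched for 8-bit grayscale data as follows. This is only an illustration under assumptions: the abstract does not give the order of operations, the filter size, or the gamma value, so the 3x3 kernel, the function names, and any parameter values used with them are hypothetical.

```python
import math
from collections import Counter


def entropy(pixels):
    """Shannon entropy in bits of a flat list of grey levels."""
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())


def gamma_correct(pixels, gamma):
    """Power-law mapping of 8-bit grey levels."""
    return [round(255 * (p / 255) ** gamma) for p in pixels]


def equalize(pixels):
    """Classic histogram equalization via the cumulative histogram."""
    n = len(pixels)
    hist = Counter(pixels)
    cdf, running = {}, 0
    for level in sorted(hist):
        running += hist[level]
        cdf[level] = running
    cdf_min = min(cdf.values())
    if n == cdf_min:  # constant image: nothing to stretch
        return list(pixels)
    return [round((cdf[p] - cdf_min) / (n - cdf_min) * 255) for p in pixels]


def gaussian_blur(img):
    """3x3 Gaussian-like smoothing (1-2-1 kernel) with edge replication."""
    k = [1, 2, 1]
    h, w = len(img), len(img[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = 0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y = min(max(i + di, 0), h - 1)
                    x = min(max(j + dj, 0), w - 1)
                    acc += k[di + 1] * k[dj + 1] * img[y][x]
            row.append(round(acc / 16))
        out.append(row)
    return out
```

A possible pipeline would smooth the image with `gaussian_blur`, flatten it, apply `gamma_correct` and `equalize`, and then compare `entropy` of the input and output pixel lists.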
ISBN (digital): 9783030368029
ISBN (print): 9783030368029; 9783030368012
The attention mechanism has been widely adopted in prevalent image captioning frameworks. In most prior work, attention weights were determined only by the visual features and the hidden states of a Recurrent Neural Network (RNN), while the interaction among visual features was not modelled. In this paper, we introduce self-attention into the current image captioning framework to leverage the nonlocal correlation among visual features. Moreover, we propose three distinct methods to fuse self-attention with the conventional attention mechanism. Extensive experiments on the MSCOCO dataset show that self-attention enables the captioning model to achieve performance competitive with state-of-the-art methods.
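How self-attention lets each visual feature interact with every other one can be sketched as follows. This is a generic scaled dot-product self-attention without learned projections, given as an assumption for illustration; it is not the captioning model or any of the three fusion methods from the abstract, and the `self_attention` name is hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(feats):
    """feats: list of N region features, each a list of d floats.

    Every feature is replaced by a weighted average of all features,
    with weights given by scaled dot-product similarity.
    """
    d = len(feats[0])
    out = []
    for query in feats:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d)
                  for key in feats]
        weights = softmax(scores)
        out.append([sum(w * value[c] for w, value in zip(weights, feats))
                    for c in range(d)])
    return out
```

Because the weights depend on similarity between any pair of regions, regardless of their position in the image, the output captures the nonlocal correlation that a conventional attention step conditioned only on the RNN state does not model.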
ISBN (digital): 9781728162898
ISBN (print): 9781728162904
Facial expressions are an integral part of human communication. Therefore, correct classification of facial expressions in image and video data has been an important goal for researchers and the software development industry. In this paper, we propose a video classification method that uses Recurrent Neural Networks (RNN) in addition to Convolutional Neural Networks (CNN) to capture the temporal as well as the spatial features of a video sequence. The methodology is tested on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). Since no other results were available on this dataset using only visual analysis, the proposed method provides the first benchmark of 61% test accuracy on this dataset.
Underwater image enhancement is important for images captured underwater because such images often suffer from color cast, low contrast and degraded visibility due to the absorption and scattering of light in...
Image retargeting is a technique for displaying images on devices with various aspect ratios and sizes. Traditional content-aware retargeting methods rely on low-level features to predict pixel-wise importance and can ...
ISBN (digital): 9781728180687
ISBN (print): 9781728180694
This paper proposes Graph Grouping (GG) loss for metric learning and its application to face verification. GG loss encourages image embeddings of the same identity to be close to each other, and those of different identities to be far from each other, by constructing and optimizing graphs that represent the relations between images. Further, to reduce the computational cost, we propose an efficient way to compute GG loss for cases where the embeddings are L2-normalized. In experiments, we demonstrate the effectiveness of the proposed method for face verification on the VoxCeleb dataset. The results show that the proposed GG loss outperforms conventional losses for metric learning.
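As background for why L2 normalization matters in embedding-based verification, the sketch below shows a generic verification check and the standard identity that, for unit-length vectors, squared Euclidean distance equals 2 minus twice the cosine similarity. It does not reproduce GG loss, whose computation the abstract does not describe; the function names and the `threshold` value are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def same_identity(e1, e2, threshold=0.5):
    """Accept a pair if the cosine similarity of its embeddings is high.

    The threshold is an arbitrary illustrative value (assumption).
    """
    return cosine(l2_normalize(e1), l2_normalize(e2)) >= threshold
```

With normalized embeddings, distances can therefore be obtained from dot products alone, which is one reason normalization often simplifies metric-learning computations.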
It is well known that visually induced motion sickness (VIMS) is often experienced in virtual environments. Learning the visual attention of people with VIMS contributes to related research in the field of virtual reali...