Blind image quality assessment (BIQA) metrics based on deep neural networks (DNNs) currently achieve the best evaluation accuracy, and network depth plays a crucial role in deep learning-based BIQA. However, training a DNN for quality assessment is known to be hard because labeled data are scarce, and obtaining quality labels for a large number of images is time consuming and costly. Training a deep BIQA metric directly is therefore very likely to over-fit. To solve this problem, we introduced a weakly supervised approach for learning a deep BIQA metric. First, we pre-trained a novel encoder-decoder architecture on training data with weak quality annotations. Each annotation is the error map between a distorted image and its undistorted version, which roughly describes the distribution of distortion and can be easily acquired for training. Next, we fine-tuned the pre-trained encoder on the quality-labeled data set. Moreover, we used group convolution to reduce the number of parameters of the proposed metric and further reduce the risk of over-fitting. These training strategies, which reduce the risk of over-fitting, enable us to build a very deep neural network for BIQA with better performance. Experimental results showed that the proposed model achieves state-of-the-art performance on images with various distortion types.
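A minimal PyTorch sketch of the two ingredients named above, the error-map weak annotation and a grouped-convolution encoder block. The error measure, layer sizes, and block structure are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def error_map(distorted, pristine):
    # Weak annotation: per-pixel error between the distorted image and its
    # undistorted version (absolute difference averaged over channels is an
    # assumption; the abstract only says the map roughly describes distortion).
    return (distorted - pristine).abs().mean(dim=1, keepdim=True)

class GroupedBlock(nn.Module):
    # One encoder block using group convolution to cut the parameter count.
    def __init__(self, in_ch, out_ch, groups=4):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, groups=groups)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return F.relu(self.bn(self.conv(x)))

# Pre-training: an encoder-decoder regresses the error map (loss choice assumed).
distorted, pristine = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
target = error_map(distorted, pristine)                     # (2, 1, 64, 64)
features = GroupedBlock(16, 32)(torch.rand(2, 16, 64, 64))  # grouped encoder stage
print(target.shape, features.shape)
```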
The task of person re-identification (re-id) is to find the same pedestrian across non-overlapping cameras. The performance of person re-id is often affected by background clutter. However, it is hard for existing segmentation algorithms to obtain perfect foreground person images. To effectively leverage the body (foreground) cue while also attending to discriminative information in the background (e.g., a companion or vehicle), we propose a cross-learning strategy that takes both the foreground and the other discriminative information into account. In addition, since existing foreground segmentation results always contain noise, we use Label Smoothing Regularization (LSR) to strengthen the generalization capability during learning. In experiments, we select two state-of-the-art person re-id methods to verify the effectiveness of the proposed cross-learning strategy. Our experiments are carried out on two publicly available person re-id datasets, and obvious performance improvements are observed on both.
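The cross-learning strategy itself is not specified in detail in the abstract, but the LSR component is standard; a minimal sketch in PyTorch follows. The identity count is an assumption.

```python
import torch
import torch.nn.functional as F

def lsr_loss(logits, targets, epsilon=0.1):
    # Label Smoothing Regularization: mix the one-hot target with a uniform
    # distribution so the network is less confident on noisy foreground masks.
    num_classes = logits.size(1)
    log_probs = F.log_softmax(logits, dim=1)
    one_hot = F.one_hot(targets, num_classes).float()
    smoothed = (1.0 - epsilon) * one_hot + epsilon / num_classes
    return -(smoothed * log_probs).sum(dim=1).mean()

# Toy usage with 751 identities (a common re-id dataset size, assumed here).
logits = torch.randn(8, 751)
ids = torch.randint(0, 751, (8,))
print(lsr_loss(logits, ids).item())
```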
ISBN (print): 9781538644584
In this paper, we propose a near-duplicate image retrieval method based on multiple features. By combining deep features extracted from the VGG relu6 layer with improved local feature descriptors, we attempt to simulate the near-duplicate image retrieval process of the human brain through a two-layer retrieval structure. Inspired by the CroW feature, we compute weights on a shallow VGG pooling layer and extract interest regions for screening SURF feature points. At the same time, a center weight is proposed to improve the VLAD algorithm. Experiments show that our method can obtain not only visually similar results for an image, but also results that contain the visually prominent parts of the image.
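A hedged sketch of the first retrieval layer only: a global descriptor taken from the VGG-16 fc6/relu6 activation and ranked by cosine similarity. The CroW-style spatial weighting and the center-weighted VLAD stage described above are omitted, and the preprocessing is assumed rather than taken from the paper.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Pretrained VGG-16; weights are downloaded on first use.
vgg = models.vgg16(weights="IMAGENET1K_V1").eval()

def relu6_descriptor(batch):
    # batch: (N, 3, 224, 224), already normalized as VGG expects.
    with torch.no_grad():
        x = vgg.features(batch)
        x = vgg.avgpool(x).flatten(1)
        x = vgg.classifier[:2](x)        # fc6 + ReLU ("relu6")
    return F.normalize(x, dim=1)         # L2-normalize for cosine similarity

query = relu6_descriptor(torch.rand(1, 3, 224, 224))
gallery = relu6_descriptor(torch.rand(5, 3, 224, 224))
ranking = (query @ gallery.t()).argsort(descending=True)
print(ranking)
```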
Image upscaling to obtain high-quality digital images is an active research topic with applications in the consumer electronics industry. Traditional image upscaling techniques have low computational complexity and are suitable for real-time processing, but the reconstructed images often contain artifacts and undesirable visual effects. The relationship between image interpolation and super-resolution leads to our assumption that an interpolated image can be further optimized and treated as part of a super-resolution algorithm. In this paper, we propose a new image super-resolution method that combines fast image interpolation with iterative back-projection. The method does not require any external pre-trained dataset and has low computation time, while the quality of the reconstructed image is comparable to that of high-complexity methods such as dictionary-based and deep convolutional neural network approaches.
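Iterative back-projection on top of a fast interpolation is a classic scheme; a minimal OpenCV sketch follows. The back-projection kernel, step size, and iteration count are assumptions, and the paper's interpolation stage may differ.

```python
import numpy as np
import cv2

def ibp_upscale(lr, scale=2, iters=10, step=1.0):
    # Fast interpolation followed by iterative back-projection: repeatedly
    # re-simulate the low-resolution image from the current estimate and
    # project the residual back into the high-resolution estimate.
    h, w = lr.shape[:2]
    hr = cv2.resize(lr, (w * scale, h * scale), interpolation=cv2.INTER_CUBIC)
    for _ in range(iters):
        simulated_lr = cv2.resize(hr, (w, h), interpolation=cv2.INTER_AREA)
        err = lr.astype(np.float32) - simulated_lr.astype(np.float32)
        err_up = cv2.resize(err, (w * scale, h * scale), interpolation=cv2.INTER_CUBIC)
        hr = np.clip(hr.astype(np.float32) + step * err_up, 0, 255).astype(np.uint8)
    return hr

lr = (np.random.rand(60, 80, 3) * 255).astype(np.uint8)   # stand-in for a real image
print(ibp_upscale(lr).shape)                              # (120, 160, 3)
```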
ISBN (print): 9781538644584
An automatic calibration procedure for a fisheye camera is presented in this paper, employing a flat panel monitor. The procedure requires neither precise camera-monitor alignment nor any manual input of data or commands, making it useful for factory automation in the mass production of such cameras. The fully automatic calibration procedure, which requires generating various test patterns on the display and analyzing the fisheye images of these patterns, consists of the following steps: (i) estimate the image center of the camera, (ii) identify the line on the monitor that intersects the optical axis of the camera perpendicularly, and (iii) along this line, obtain the calibration data needed to de-warp the fisheye image. Experimental results demonstrate that the proposed approach performs satisfactorily in terms of effectiveness and accuracy.
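A sketch of the de-warping step (iii) only, assuming an equidistant fisheye model r = f·θ; in the procedure above the radius-angle relation would instead come from the calibration data measured with the monitor. The image center, focal lengths, and field of view are placeholder values.

```python
import numpy as np
import cv2

def dewarp_equidistant(fisheye_img, cx, cy, f_fish, out_size=512, fov_deg=90):
    # Map each pixel of the perspective output back to the fisheye image.
    f_out = (out_size / 2) / np.tan(np.radians(fov_deg) / 2)
    u, v = np.meshgrid(np.arange(out_size), np.arange(out_size))
    x, y = u - out_size / 2, v - out_size / 2
    r_out = np.sqrt(x**2 + y**2)
    theta = np.arctan2(r_out, f_out)                 # angle from the optical axis
    r_fish = f_fish * theta                          # equidistant projection (assumed)
    scale = np.divide(r_fish, r_out, out=np.zeros_like(r_fish), where=r_out > 0)
    map_x = (cx + x * scale).astype(np.float32)
    map_y = (cy + y * scale).astype(np.float32)
    return cv2.remap(fisheye_img, map_x, map_y, cv2.INTER_LINEAR)

img = np.zeros((960, 1280, 3), np.uint8)             # stand-in fisheye frame
view = dewarp_equidistant(img, cx=640, cy=480, f_fish=300)
print(view.shape)
```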
The cross-entropy loss (CEL) is widely used for training multi-class classification deep convolutional neural networks (DCNNs). While CEL has been successfully applied to image classification tasks, it only considers the posterior probability of the correct class when the labels of the training images are one-hot; it cannot directly discriminate against the classes that do not belong to the correct class (the wrong classes). The Negative Log Likelihood Ratio loss (NLLR) was proposed to better discriminate the correct class from the competing wrong classes. However, optimizing a loss function is normally posed as a minimization problem, and when training a DCNN the value of NLLR is not constantly positive or negative, which adversely affects its convergence. We therefore propose the competing ratio loss (CRL), which computes the posterior probability ratio between the correct class and the competing wrong classes to better widen the gap between the probability of the correct class and the probabilities of the wrong classes, while also ensuring that the value of CRL is always positive. Through extensive experiments, we demonstrate the effectiveness and robustness of CRL on deep convolutional neural networks; CRL outperforms CEL and NLLR on the CIFAR-10/100 datasets.
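A small numeric check of the sign issue motivating CRL: once the correct-class probability exceeds 0.5, NLLR turns negative while CEL stays positive. Only CEL and NLLR are computed here; the CRL formulation itself is not reproduced from the abstract.

```python
import torch
import torch.nn.functional as F

def cel(probs, target):
    # Cross-entropy with a one-hot label: only the correct-class probability matters.
    return -torch.log(probs[target])

def nllr(probs, target):
    # Negative Log Likelihood Ratio: correct class vs. the sum over wrong classes.
    wrong = probs.sum() - probs[target]
    return -torch.log(probs[target] / wrong)

confident = F.softmax(torch.tensor([4.0, 1.0, 0.5]), dim=0)   # p_correct ~ 0.93
uncertain = F.softmax(torch.tensor([1.0, 0.9, 0.8]), dim=0)   # p_correct ~ 0.37
for probs in (confident, uncertain):
    print(f"CEL={cel(probs, 0).item():.3f}  NLLR={nllr(probs, 0).item():.3f}")
```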
ISBN (print): 9781538644584
With the introduction of Convolutional Neural Networks, models for image classification achieve higher classification accuracy. Following the common pattern of CNN architecture design, increasing the number of layers yields higher classification accuracy, but it also increases the number of parameters and the model size, which negatively affects training time, processing time, and memory requirements. We develop ZipNet, a CNN architecture with higher classification accuracy than ZFNet, the winner of ILSVRC 2013, but with a 48.5x smaller model size and 48.7x fewer parameters. The classification accuracy of ZipNet is higher than that of ZFNet and SqueezeNet on all configurations of the Caltech-256 dataset with varying numbers of training examples.
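The abstract does not describe ZipNet's layers, so the sketch below only illustrates how the parameter-count/model-size trade-off it refers to can be measured in PyTorch; the two toy networks are placeholders, not ZipNet or ZFNet.

```python
import torch.nn as nn

def count_params_and_size(model):
    # Trainable parameter count and model size in MB, assuming float32 weights.
    n = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return n, n * 4 / 2**20

# Adding channels/layers quickly inflates parameters (257 = Caltech-256 + clutter).
small = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 257))
large = nn.Sequential(nn.Conv2d(3, 256, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(256, 257))
for name, m in [("small", small), ("large", large)]:
    n, mb = count_params_and_size(m)
    print(f"{name}: {n:,} params, {mb:.2f} MB")
```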
ISBN (print): 9781538644584
Image retrieval with convolutional neural networks (CNNs) has attracted a lot of attention. In this paper, we focus on a more challenging task: fine-grained image retrieval. We propose a simple and effective feature aggregation method using generalized-mean pooling (GeM pooling), which makes better use of the information in the output tensor of the convolutional layer. In addition, we propose a simple feature selection scheme to remove noise and background. Experimental results demonstrate that our aggregation method not only outperforms state-of-the-art aggregation methods for general image retrieval, but also reaches the level of existing aggregation methods for fine-grained image retrieval, with a more compact representation and lower memory cost.
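GeM pooling itself has a standard form, f = (mean(x^p))^(1/p) over the spatial locations of each channel; a minimal sketch follows. The feature selection scheme for removing noise and background is not reproduced here.

```python
import torch
import torch.nn.functional as F

def gem_pool(feature_map, p=3.0, eps=1e-6):
    # Generalized-mean pooling over the spatial dimensions of a CNN output
    # tensor (N, C, H, W); p=1 recovers average pooling, large p approaches max.
    x = feature_map.clamp(min=eps).pow(p)
    return x.mean(dim=(-2, -1)).pow(1.0 / p)

# Toy tensor standing in for the last convolutional layer's output.
feats = torch.rand(2, 512, 7, 7)
desc = F.normalize(gem_pool(feats), dim=1)   # L2-normalized global descriptor
print(desc.shape)                            # torch.Size([2, 512])
```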
ISBN (print): 9781728152103
Based on the speed perception characteristics of human vision, a video quality assessment (VQA) algorithm is introduced in this paper. The scheme incorporates natural video statistics feature extraction and weighting factors. Considering the impact of both the video content itself and the characteristics of the human visual system (HVS) on subjective perception, we propose weighting factors to scale the effect of the extracted features; the factors contain two parts, motion information and perception noise. Natural video statistics features related to the spatial and temporal domains are extracted, and the weighting factors are used to combine the temporal and spatial features to generate the quality of each frame. Finally, the video quality score is obtained by a pooling scheme. The LIVE database, the EPFL-PoliMI database, and several other generated test videos were used in our experiments, and the results indicate that our model has outstanding performance.
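An illustrative-only sketch of the overall flow, per-frame combination of spatial and temporal feature scores by a motion/noise weighting factor, followed by temporal pooling. The specific features, the form of the weighting factor, and the pooling scheme are assumptions, as the abstract does not define them.

```python
import numpy as np

def frame_quality(spatial_feat, temporal_feat, motion, noise):
    # Weighting factor built from motion information and perception noise
    # (this particular form is an assumption, not the paper's definition).
    weight = motion / (motion + noise + 1e-8)
    return (1 - weight) * spatial_feat + weight * temporal_feat

spatial = np.random.rand(120)        # per-frame spatial NVS feature score
temporal = np.random.rand(120)       # per-frame temporal NVS feature score
motion = np.random.rand(120)         # per-frame motion magnitude
noise = np.full(120, 0.2)            # perception-noise term (assumed constant)
video_score = frame_quality(spatial, temporal, motion, noise).mean()  # mean pooling assumed
print(video_score)
```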
ISBN (print): 9781538644584
Stereoscopic-3D (S3D) displays are widely used but can cause visual discomfort for human viewers. One aspect of this issue is the movement of the gaze point across different depth fields. Here we aim to analyze the relationship between eye movement patterns and the visual comfort experienced when viewing S3D images. Rather than simply labeling eye movement data with categories such as gaze and saccade, we deploy a nonparametric Bayesian method to analyze and cluster several eye movement patterns and relate them to visual comfort. The results are relevant to the automatic prediction of visual comfort in S3D images.
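A hedged sketch of nonparametric-Bayesian-style clustering of eye movement samples using a Dirichlet-process Gaussian mixture from scikit-learn. The feature set (velocity, dispersion, disparity) and the synthetic data are placeholders, not the paper's exact model or measurements.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Synthetic eye movement samples: fixation-like and saccade-like groups.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal([0.5, 0.1, 0.0], 0.05, size=(200, 3)),   # fixation-like samples
    rng.normal([8.0, 2.0, 0.5], 0.50, size=(200, 3)),   # saccade-like samples
])

# Dirichlet-process mixture: the number of clusters is inferred, not fixed.
dpgmm = BayesianGaussianMixture(
    n_components=10,                                  # upper bound; unused components shrink
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    random_state=0,
).fit(features)

labels = dpgmm.predict(features)
print("effective clusters:", len(np.unique(labels)))
```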