ISBN: 9781538644584 (print)
In an Image Quality Assessment (IQA) scenario, the Human Vision System (HVS) is the ultimate receiver and evaluator of generated images. Visual attention, an important feature of the HVS, has been shown to improve the performance of existing objective quality metrics. However, this feature has not yet been well explored in the IQA of image interpolation. In this paper, we conduct an eye-tracking test on an interpolated image database and investigate the impact of visual attention on the IQA of image interpolation. Two visual attention models, a saliency map and a Region Of Interest (ROI), are then obtained from the eye-tracking data. We further incorporate these models into a non-integer interpolated IQA metric and examine their performance. Experiments show that introducing eye-tracking features clearly improves the conventional IQA metric for non-integer image interpolation.
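One common way to fold visual attention into an objective metric is to weight each pixel's error by its saliency. The sketch below illustrates that idea with a saliency-weighted MSE and PSNR; it is not the metric used in the paper, and the function names, the list-of-rows image layout and the `peak` parameter are assumptions made for illustration. A binary ROI mask can be passed as the saliency map.

```python
import math

def saliency_weighted_mse(reference, distorted, saliency):
    """Mean squared error with each pixel's error weighted by its saliency.

    All three arguments are 2D lists (rows of grayscale values) of equal size.
    """
    num = 0.0
    den = 0.0
    for ref_row, dis_row, sal_row in zip(reference, distorted, saliency):
        for r, d, s in zip(ref_row, dis_row, sal_row):
            num += s * (r - d) ** 2
            den += s
    if den == 0:
        raise ValueError("saliency map must contain at least one positive weight")
    return num / den

def saliency_weighted_psnr(reference, distorted, saliency, peak=255.0):
    """PSNR computed from the saliency-weighted MSE (hypothetical helper)."""
    mse = saliency_weighted_mse(reference, distorted, saliency)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)
```

With a uniform saliency map this reduces to the ordinary MSE and PSNR, so the weighting only changes the score when attention is concentrated on particular regions.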
High dynamic range (HDR) images capture the luminance information of the real world and contain more detail than low dynamic range (LDR) images. In this paper, we propose a dual-stream, globally guided end-to-end learning method that reconstructs an HDR image from a single LDR input by combining global information and local image features. In our framework, global and local features are learned separately in two stream branches. In the reconstruction phase, a fusion layer fuses them so that the global features can guide the local features to better reconstruct the HDR image. Furthermore, we design a mixed loss function comprising a multi-scale pixel-wise loss, a color similarity loss and a gradient loss to jointly train our network. Comparative experiments with other state-of-the-art methods show that our method achieves superior performance.
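The mixed loss described above can be sketched as a weighted sum of three terms. The abstract does not give the exact formulations, so everything below is an assumption: an L1 pixel loss averaged over scales obtained by 2x2 average pooling, a color term based on cosine similarity between RGB vectors, and a gradient term on forward differences. Images are lists of rows of `(r, g, b)` tuples, and the weights are hypothetical.

```python
import math

def pixel_l1(a, b):
    """Mean absolute difference over all pixels and channels."""
    diffs = [abs(x - y) for ra, rb in zip(a, b)
             for pa, pb in zip(ra, rb) for x, y in zip(pa, pb)]
    return sum(diffs) / len(diffs)

def downsample2(img):
    """2x2 average pooling; assumes even height and width."""
    out = []
    for i in range(0, len(img) - 1, 2):
        row = []
        for j in range(0, len(img[0]) - 1, 2):
            block = (img[i][j], img[i][j + 1], img[i + 1][j], img[i + 1][j + 1])
            row.append(tuple(sum(p[c] for p in block) / 4.0 for c in range(3)))
        out.append(row)
    return out

def multiscale_pixel_loss(a, b, scales=2):
    """Average of the L1 loss at full resolution and at each coarser scale."""
    total = 0.0
    for s in range(scales):
        if s > 0:
            a, b = downsample2(a), downsample2(b)
        total += pixel_l1(a, b)
    return total / scales

def color_loss(a, b, eps=1e-8):
    """Mean (1 - cosine similarity) between corresponding RGB vectors."""
    vals = []
    for ra, rb in zip(a, b):
        for pa, pb in zip(ra, rb):
            dot = sum(x * y for x, y in zip(pa, pb))
            na = math.sqrt(sum(x * x for x in pa))
            nb = math.sqrt(sum(y * y for y in pb))
            vals.append(1.0 - dot / (na * nb + eps))
    return sum(vals) / len(vals)

def gradients(img):
    """Horizontal and vertical forward differences per channel, flattened."""
    h, w = len(img), len(img[0])
    g = []
    for i in range(h):
        for j in range(w):
            for c in range(3):
                if j + 1 < w:
                    g.append(img[i][j + 1][c] - img[i][j][c])
                if i + 1 < h:
                    g.append(img[i + 1][j][c] - img[i][j][c])
    return g

def gradient_loss(a, b):
    """Mean absolute difference between the two images' gradients."""
    ga, gb = gradients(a), gradients(b)
    return sum(abs(x - y) for x, y in zip(ga, gb)) / len(ga)

def mixed_loss(a, b, w_pixel=1.0, w_color=1.0, w_grad=1.0):
    """Weighted sum of the three terms; the weights are illustrative only."""
    return (w_pixel * multiscale_pixel_loss(a, b)
            + w_color * color_loss(a, b)
            + w_grad * gradient_loss(a, b))
```

A uniform brightness offset leaves the gradient term at zero while the pixel term grows, which shows why the terms are complementary: each penalizes a different kind of reconstruction error.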
Inter-frame prediction plays an important role in video coding by predicting the current frame from previously encoded pictures, called reference pictures. Under camera motion, the content of the current frame can differ greatly from its reference pictures, which makes Motion Compensation (MC) more difficult. The main idea of this paper is to process the input 2D video sequence to estimate the 3D geometry of the scene and then use this data to synthesize virtual, "geometrically compensated" reference pictures. Since these virtual reference pictures are more similar to the current frame, motion estimation and consequently coding efficiency can be improved. The proposed method is tested on six different video sequences and achieves a bitrate reduction of around 11% compared to the High Efficiency Video Coding (HEVC) standard.
Affine motion compensation (AMC) has been adopted in the latest working draft of the Versatile Video Coding (VVC) standard jointly developed by ITU-T VCEG and ISO/IEC MPEG. The AMC in the VVC working draft is implemented as sub-block-based MC rather than pixel-based MC in order to reduce memory access bandwidth and computational complexity, at the cost of prediction efficiency. The algorithm proposed in this paper improves the MC granularity by using pixel-based optical flow refinement, bringing the affine prediction efficiency close to that of pixel-based MC without increasing external memory access. Experimental results show that, compared to VTM-4.0, the proposed algorithm achieves average BD rate savings of 0.86% and 0.87% for the random access and low delay B configurations, respectively. The BD rate saving is up to 3.10%.
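The gap between sub-block-based and pixel-based affine MC can be illustrated with the standard 4-parameter affine motion model. The sketch below is not the paper's implementation: it assumes a 4-parameter model with control-point MVs at the top-left and top-right corners, a 4x4 sub-block size, pixel centres at half-integer coordinates, and precomputed prediction gradients `grad_x` and `grad_y`. The refinement follows the optical flow equation, in which the intensity change equals the gradient multiplied by the MV difference.

```python
def affine_mv(x, y, v0, v1, width):
    """4-parameter affine motion vector at position (x, y) inside a block.

    v0 and v1 are the control-point MVs at the top-left and top-right corners.
    """
    a = (v1[0] - v0[0]) / width
    b = (v1[1] - v0[1]) / width
    return (a * x - b * y + v0[0], b * x + a * y + v0[1])

def subblock_mv(x, y, v0, v1, width, sb=4):
    """MV shared by the sb x sb sub-block containing pixel (x, y), sampled at its centre."""
    cx = (x // sb) * sb + sb / 2.0
    cy = (y // sb) * sb + sb / 2.0
    return affine_mv(cx, cy, v0, v1, width)

def optical_flow_refinement(x, y, grad_x, grad_y, v0, v1, width, sb=4):
    """Correction added to the sub-block prediction at pixel (x, y).

    (dvx, dvy) is the difference between the per-pixel MV and the sub-block MV.
    """
    pvx, pvy = affine_mv(x + 0.5, y + 0.5, v0, v1, width)
    svx, svy = subblock_mv(x, y, v0, v1, width, sb)
    dvx, dvy = pvx - svx, pvy - svy
    return grad_x * dvx + grad_y * dvy
```

For a purely translational block (`v0 == v1`) the per-pixel and sub-block MVs coincide and the correction vanishes, so the refinement only acts where the affine field actually varies within a sub-block.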
Visual 3D reconstruction builds a 3D map of the environment from images and is essential in a wide range of applications such as robotics, augmented reality and relic preservation. In this paper, we integrate visual 3D reconstruction with the mobility of mobile platforms to address the active visual 3D reconstruction problem in multi-agent networks. We first establish a statistical model of the visual 3D reconstruction problem in multi-agent networks. We then propose a next-best-view selection scheme to find the best camera configuration in active reconstruction. Moreover, we propose a statistical evaluation criterion that replaces traditional laser-scanned-model-based methods for measuring the reconstruction quality under a given camera configuration. Numerical results verify the effectiveness of our methods.
Fine-grained sketch-based image retrieval (FG-SBIR) addresses the problem of matching an input sketch with a specific photo containing the same instance. The key challenge in learning an FG-SBIR model is to bridge the domain gap between photo and sketch. Most existing approaches build a joint embedding space in which the two domains can be directly compared. They focus only on the highly abstract features in the final fully connected (FC) layer and ignore low-level semantic concepts in the convolutional layers. In this paper, we propose a multiple triplet-ranking model for the FG-SBIR task. Specifically, we introduce an auxiliary supervision loss function in the convolutional layer, and we fuse the features from the convolutional layer and the final FC layer to build the joint embedding space. Extensive experiments show that the proposed multiple triplet-ranking model significantly outperforms the state of the art.
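A triplet-ranking loss pushes a sketch closer to its matching photo than to a non-matching one by at least a margin. The sketch below shows one plausible reading of the "multiple triplet-ranking" idea: a main triplet loss on fused features plus an auxiliary triplet loss on convolutional-layer features. Concatenation as the fusion operator, squared Euclidean distance, the margin and the `aux_weight` parameter are all assumptions; the abstract does not specify them.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge loss: zero once the negative is farther than the positive by `margin`."""
    return max(0.0, margin + sq_dist(anchor, positive) - sq_dist(anchor, negative))

def fuse(conv_feat, fc_feat):
    """Concatenation as a stand-in for the fusion of conv and FC features."""
    return list(conv_feat) + list(fc_feat)

def multi_triplet_loss(sketch, pos, neg, aux_weight=0.5, margin=0.3):
    """Main loss on fused features plus a weighted auxiliary loss on conv features.

    Each of sketch, pos and neg is a (conv_feat, fc_feat) pair.
    """
    aux = triplet_loss(sketch[0], pos[0], neg[0], margin)
    main = triplet_loss(fuse(*sketch), fuse(*pos), fuse(*neg), margin)
    return main + aux_weight * aux
```

The auxiliary term gives the convolutional layer its own ranking signal, so low-level features are supervised directly instead of only through the final embedding.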
ISBN: 9781538644584 (print)
The risk of solitary death is rising in Japan because an increasing number of elderly people live alone. Attempts have therefore been made to let family members watch over elderly people remotely. However, these systems have problems such as difficulty in confirming the status of the elderly person in real time and privacy issues. In this paper, we propose a method to detect abnormal conditions using an infrared array sensor.
In recent years, with the rapid development of deep learning, single image super-resolution (SR) based on convolutional neural networks (CNNs) has been extensively researched. However, most CNN-based methods are difficult to train and struggle to produce high-quality images at large scale factors. To address these issues, we propose a network that reconstructs HR images at large factors by progressively performing 2× SR on the input from the previous level. At each level, cascaded residual multi-scale aggregation blocks are used. The U-residual unit in each block makes the network simpler and training easier without performance degradation. The multi-scale dilated unit in each block provides more comprehensive information for image reconstruction. Before upsampling, a channel attention mechanism is adopted to recalibrate features. We train the network with a two-stage training strategy, which accelerates convergence and achieves better performance. Experimental results show that our proposed method is superior to state-of-the-art methods on most datasets, especially on Urban100.
To learn the optimal similarity function between probe and gallery images in person re-identification, effective deep metric learning methods have been extensively explored to obtain discriminative feature embeddings. However, existing metric losses such as triplet loss and its variants emphasize pair-wise relations but ignore the distribution context in feature space, leading to inconsistent and sub-optimal results. In fact, the similarity of one pair not only decides whether this pair matches, but also potentially affects other sample pairs. In this paper, we propose a novel Distribution Context Aware (DCA) loss based on triplet loss that combines numerical similarity and relation similarity in feature space for better clustering. Extensive experiments on three benchmarks, Market-1501, DukeMTMC-reID and MSMT17, demonstrate the favorable performance of our method against the corresponding baseline and other state-of-the-art methods.
ISBN: 9789811379857 (print)
The proceedings contain 96 papers. The special focus in this conference is on Cognitive Systems and Information Processing. The topics include: RBF Network Imitation Posture Judgment Algorithm Based on Improved PSO; Path Planning of Maritime Autonomous Surface Ships in Unknown Environment with Reinforcement Learning; Evaluation of sEMG-Based Feature Extraction and Effective Classification Method for Gait Phase Detection; Computer-Based Attention Training Improves Brain Cognitive Control Function: Evidences from Event-Related Potentials; Comparison of Facial Emotion Recognition Based on Image Visual Features and EEG Features; A Measure of the Consciousness Revealed by the Cerebral Cortex Spontaneous Activity; Multi-scale Neural Style Transfer Based on Deep Semantic Matching; A Method of Attitude Control Based on Deep Deterministic Policy Gradient; The Semaphore Identification and Fault Troubleshooting Modus for Spacecraft Originating from Deep Learning and RF Method; Utilizing Chinese Dictionary Information in Named Entity Recognition; Network Improved by Auxiliary Part Features for Person Re-identification; A Unified Framework of Deep Neural Networks by Capsules; Reformative Vehicle License Plate Recognition Algorithm Based on Deep Learning; Performance Comparison Between Genetic Fuzzy Tree and Reinforcement Learning in Gaming Environment; The Third Kind of Bayes’ Theorem Links Membership Functions to Likelihood Functions and Sampling Distributions; Speech Signal Classification Based on Convolutional Neural Networks; Two-Input Gegenbauer Orthogonal Neural Network with Growing-and-Pruning Weights and Structure Determination; A Lightweight Convolutional Neural Network for Silkworm Cocoons Fast Classification; A Novel Convolutional Neural Network for Facial Expression Recognition; Deep CNN-Based Radar Detection for Real Maritime Target Under Different Sea States and Polarizations.