ISBN: 9781450366151 (print)
Optimizing the tradeoff between computation time and image quality is essential for reconstructing high-quality magnetic resonance images (MRI) from a limited number of acquired samples in a short time using compressed sensing (CS) algorithms. In this paper, we achieve this for edge-preserving non-linear diffusion reconstruction (NLDR), which eliminates the critical step-size tuning of total variation (TV) based CS-MRI. Based on optimization of the contrast parameter that controls noise and signal in sensitivity-modulated channel images, we propose an a-switching NLDR technique for faster approximation of the reconstructed image without affecting image quality. The proposed algorithm exploits the difference in the extent of undersampling artifacts between the signal and background regions of the channel images to arrive at different estimates of the contrast parameter, leading to an effective optimization of speed and quality. While maintaining better image quality than conventional TV reconstruction, the switched NLDR also achieves a 25-35% gain in convergence time over NLDR without switching. This makes the switched NLDR a better candidate for fast reconstruction than traditional TV and NLDR approaches. In detailed numerical experiments, we compare and optimize the tradeoff for various state-of-the-art choices of the contrast parameter.
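The role of the contrast parameter in non-linear diffusion can be illustrated with a small sketch. The abstract does not state which diffusivity function is used, so the Perona-Malik form below, the 1-D signal, and the names `diffusivity`, `diffuse_step`, `kappa` and `dt` are all assumptions made for illustration; the switching strategy itself is not reproduced here.

```python
def diffusivity(grad, kappa):
    """Perona-Malik diffusivity (assumed form): close to 1 for small
    gradients, close to 0 for gradients much larger than kappa."""
    return 1.0 / (1.0 + (grad / kappa) ** 2)


def diffuse_step(signal, kappa, dt=0.2):
    """One explicit 1-D non-linear diffusion step; end values stay fixed."""
    out = list(signal)
    for i in range(1, len(signal) - 1):
        right = signal[i + 1] - signal[i]   # forward difference
        left = signal[i - 1] - signal[i]    # backward difference
        out[i] = signal[i] + dt * (
            diffusivity(abs(right), kappa) * right
            + diffusivity(abs(left), kappa) * left
        )
    return out


edge = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
preserved = diffuse_step(edge, kappa=1.0)    # small kappa: edge barely moves
smoothed = diffuse_step(edge, kappa=100.0)   # large kappa: edge is blurred
```

In this sketch a small contrast parameter treats the jump as an edge and leaves it nearly intact, while a large one treats it as noise and smooths it, which is the noise/signal control the abstract refers to.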
ISBN: 9781510619425 (digital)
ISBN: 9781510619425 (print)
Nowadays, machine learning has become one of the basic technologies used in solving various computer vision tasks such as feature detection, image segmentation, object recognition and tracking. In many applications, complex systems such as robots are equipped with visual sensors from which they learn the state of the surrounding environment by solving the corresponding computer vision tasks. Solutions of these tasks are used for making decisions about possible future actions. It is therefore not surprising that, when solving computer vision tasks, we should take into account the special aspects of their subsequent application in model-based predictive control. Reinforcement learning is a modern machine learning technology in which learning is carried out through interaction with the environment. In recent years, reinforcement learning has been used both for applied tasks such as the processing and analysis of visual information, and for specific computer vision problems such as filtering, extracting image features, localizing objects in scenes, and many others. The paper briefly describes reinforcement learning and its use for solving computer vision problems.
ISBN: 9781450366151 (print)
In this work, a memory-efficient topological map generation algorithm is proposed using local descriptors. A topological map is a graph data structure in which each node signifies an area within an environment. These nodes are connected by links that indicate the presence of a physical path between the pair. Earlier experiments with feature descriptors have used a vocabulary-based approach, which requires large amounts of memory and time. To deal with this, a KD-tree based map generation algorithm is proposed in which each node of the tree stores a descriptor and a table of occurrence. This table stores the node ids of the locations where the corresponding descriptor is present. Map generation is a two-stage algorithm. In the first stage, visual-similarity-based position identification is conducted in order to check for loop closures. It is followed by a corrective step that validates the loop-closure decision, if any. The table of occurrence keeps track of the presence of each descriptor. The least frequently occurring descriptors are pruned at regular intervals, making the algorithm memory-efficient. The approach has been evaluated on several benchmark datasets.
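The table of occurrence and its pruning can be sketched as follows. This is only an illustration under stated assumptions: a plain dictionary keyed by exact descriptor tuples stands in for the KD-tree (so there is no nearest-neighbour matching), and the class name `OccurrenceTable`, the voting scheme in `candidates`, and the `min_occurrences` threshold are hypothetical rather than taken from the paper.

```python
class OccurrenceTable:
    """Maps each descriptor to the set of location node ids where it was seen.
    A dict replaces the KD-tree of the abstract purely for brevity."""

    def __init__(self):
        self.table = {}  # descriptor (tuple) -> set of location node ids

    def add(self, descriptor, node_id):
        """Record that a descriptor was observed at a location node."""
        self.table.setdefault(descriptor, set()).add(node_id)

    def candidates(self, descriptors):
        """Count shared descriptors per known location; high counts suggest
        a possible loop closure that a later step would have to validate."""
        votes = {}
        for d in descriptors:
            for node in self.table.get(d, ()):
                votes[node] = votes.get(node, 0) + 1
        return votes

    def prune(self, min_occurrences):
        """Drop descriptors seen at fewer than min_occurrences locations."""
        self.table = {d: nodes for d, nodes in self.table.items()
                      if len(nodes) >= min_occurrences}


table = OccurrenceTable()
table.add((1, 2), 0)
table.add((1, 2), 1)
table.add((3, 4), 1)
votes = table.candidates([(1, 2), (3, 4)])  # location 1 shares both descriptors
table.prune(min_occurrences=2)              # (3, 4) was seen only once
```

Pruning by occurrence count keeps the table bounded as the map grows, which is the memory-saving idea the abstract describes.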
ISBN: 9781450366151 (print)
Monocular SLAM refers to using a single camera to estimate robot ego-motion while building a map of the environment. While Monocular SLAM is a well-studied problem, automating it by integrating it with trajectory planning frameworks is particularly challenging. This paper presents a novel formulation based on Reinforcement Learning (RL) that generates fail-safe trajectories for which the SLAM outputs do not deviate largely from their true values. In essence, the RL framework learns the otherwise complex relation between perceptual inputs and motor actions and uses this knowledge to generate trajectories that do not cause SLAM to fail. We show systematically in simulations how the quality of SLAM improves dramatically when trajectories are computed using RL. Our method scales effectively across Monocular SLAM frameworks, both in simulation and in real-world experiments with a mobile robot.
ISBN: 9781450366151 (print)
Accurate and robust visual object tracking is one of the most challenging computer vision problems. Recently, discriminative correlation filter trackers have shown promising results on benchmark datasets, with continuous improvements in tracking accuracy and robustness. Still, these algorithms fail to track when the target object and background conditions undergo drastic changes over time. They are also incapable of resuming tracking once the target is lost, which limits their ability to track long term. The proposed BoVW-CFT is a classifier-based generic technique for handling tracking uncertainties in correlation filter trackers. Tracking failures in correlation trackers are automatically identified, and an image classifier with training, testing and online update stages, using Bag of Visual Words (BoVW) features, is proposed as the detector in the tracking scenario. The proposed detector falls under the parts-based model and is well suited to the tracking framework. Further, the online training stage of the proposed framework, with an updated model or training samples, incorporates temporal information, helping to detect rotated, blurred and scaled versions of the target. On detecting a target loss in the correlation tracker, the trained classifier, referred to as the detector, is invoked to re-initialize the tracker with the actual target location. Therefore, for each tracking uncertainty, two output patches are obtained, one from the base tracker and one from the classifier. The final target location is estimated using the normalized cross-correlation with the initial target patch. The method mitigates model drift in correlation trackers and learns a robust model that tracks long term. Extensive experimental results demonstrate an improvement of 4.1% in expected overlap, 1.86% in accuracy and 15.46% in robustness on VOT2016, and 1.82% in overlap precision, 2.32% in AUC and 2.87% in success rate on OTB100.
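The final selection step, choosing between the tracker patch and the detector patch by normalized cross-correlation with the initial target patch, might look roughly like the sketch below. Patches are flattened to lists of intensities, and the function names `ncc` and `select_patch`, as well as the tie-breaking rule in favour of the tracker, are assumptions made for illustration.

```python
import math


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length patches,
    in [-1, 1]; returns 0.0 if either patch is constant."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0


def select_patch(initial, tracker_patch, detector_patch):
    """Return the label of the candidate most similar to the initial patch.
    Ties go to the tracker (an assumption, not stated in the abstract)."""
    tracker_score = ncc(initial, tracker_patch)
    detector_score = ncc(initial, detector_patch)
    return "tracker" if tracker_score >= detector_score else "detector"


initial = [1.0, 2.0, 3.0, 4.0]
drifted = [4.0, 3.0, 2.0, 1.0]    # tracker output after drift
recovered = [2.0, 4.0, 6.0, 8.0]  # detector output, brighter but same pattern
choice = select_patch(initial, drifted, recovered)
```

Because the correlation is normalized, a uniformly brighter or darker copy of the target still scores highly, which is why the detector patch wins in this toy example.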
ISBN: 9781450366151 (print)
In this paper we present a novel methodology for recognizing human activity in egocentric video based on the Bag of Visual Features. The proposed technique is based on the assumption that only a portion of the whole video can be sufficient to identify an activity. We further argue that, for activity recognition in egocentric videos, the proposed approach performs better than deep learning based methods, because in egocentric videos the person wearing the sensor often remains static for a long time or moves his or her head frequently. In both cases, it becomes difficult to learn the spatio-temporal pattern of the video during the action. The proposed approach divides the video into smaller video segments called Video Units. Spatio-temporal features extracted from the units are clustered to construct a dictionary of Action Units (AUs). The AUs are ranked by their likeliness scores. The scores are obtained by constructing a weighted graph with the AUs as vertices and with edge weights calculated from the frequencies of occurrence of the AUs during the activity. The less significant AUs are pruned from the dictionary, and the revised dictionary of key AUs is used for activity *** test our approach on a benchmark egocentric dataset and achieve good accuracy.
ISBN: 9781450366151 (print)
This article presents an algorithm for salient object detection that leverages the Bayesian surprise of the Restricted Boltzmann Machine (RBM). Here an RBM is trained on patches sampled randomly from the input image. Due to this random sampling, the RBM is likely to be exposed to more background patches than object patches. Thus, the trained RBM will minimize the free energy of its hidden states with respect to the background patches rather than the object. According to the free energy principle, this implies minimizing Bayesian surprise, a measure of saliency based on the Kullback-Leibler divergence between the input and reconstructed patch distributions. Hence, when the trained RBM is exposed to patches from the object region, it yields a high divergence and in turn a high Bayesian surprise. Pixels with high Bayesian surprise can therefore be considered salient. For each pixel, a neighborhood (of the same size as the training patch) is fed to the trained RBM to obtain the reconstructed patch. Thereafter, the Kullback-Leibler divergence between the input and reconstructed neighborhood of each pixel is computed to measure the Bayesian surprise and is stored at the corresponding position in a matrix to form the saliency map. Experiments are carried out on three datasets, namely MSRA-10K, ECSSD and DUTS. The results show promising performance for the proposed approach.
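The per-pixel surprise computation can be sketched as below. No RBM is trained here: the `reconstruct` argument is a hypothetical stand-in for the trained model, and normalizing pixel intensities into a probability distribution (with a small `eps` to avoid zeros) is an assumption, since the abstract does not say how the patch distributions are formed.

```python
import math


def to_distribution(patch, eps=1e-12):
    """Normalize non-negative intensities so that they sum to 1."""
    total = sum(patch) + eps * len(patch)
    return [(v + eps) / total for v in patch]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def surprise(patch, reconstruct):
    """Divergence between a patch and its reconstruction by the model."""
    return kl_divergence(to_distribution(patch),
                         to_distribution(reconstruct(patch)))


def background_model(patch):
    """Stand-in for a model that has mostly seen flat background patches:
    it reconstructs every patch as its mean intensity."""
    mean = sum(patch) / len(patch)
    return [mean] * len(patch)


flat_patch = [1.0, 1.0, 1.0, 1.0]     # background-like neighborhood
object_patch = [4.0, 0.0, 0.0, 0.0]   # structured, object-like neighborhood
low = surprise(flat_patch, background_model)
high = surprise(object_patch, background_model)
```

Evaluating `surprise` on the neighborhood of every pixel and storing the values in a matrix of the image's shape would give the saliency map described above.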
ISBN: 9781450366151 (print)
Person re-identification has important applications in video surveillance. It can be viewed as recognizing the same person across non-overlapping cameras. Video-based person re-identification methods are gaining increased attention due to the better discriminative nature of spatio-temporal feature representations. Current video-based methods make use of RNNs to extract temporal information. In this paper, we propose a novel Moving Average Recurrent Neural Network (MA-RNN) model that builds a strong feature representation by taking both previous and present inputs at each time step. Specifically, the recurrent layer produces better sequential information by looking back directly into past input values, whereas general RNNs have only an indirect dependence on previous values in the form of hidden-state information. The proposed model is tested on two publicly available datasets, iLIDS-VID and PRID-2011, and outperforms state-of-the-art methods by a significant margin. We also analyze the effect of the depth of previous-input dependence of the MA-RNN model on matching accuracy.
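The difference between an indirect and a direct dependence on past inputs can be made concrete with a scalar sketch. The abstract does not give the MA-RNN equations, so the update rule below, a tanh unit fed by a weighted sum of the current and the most recent past inputs plus the previous hidden state, is an assumed form, and `ma_rnn`, `input_weights` and `recurrent_weight` are hypothetical names.

```python
import math


def ma_rnn(inputs, input_weights, recurrent_weight):
    """Scalar recurrent layer with direct look-back over past inputs.
    input_weights[j] multiplies the input from j steps ago, so a list of
    length 1 reduces to an ordinary RNN cell."""
    hidden = 0.0
    states = []
    for t in range(len(inputs)):
        # Direct contribution of the present and previous inputs.
        direct = sum(w * inputs[t - j]
                     for j, w in enumerate(input_weights) if t - j >= 0)
        hidden = math.tanh(direct + recurrent_weight * hidden)
        states.append(hidden)
    return states


sequence = [1.0, 0.0]
plain = ma_rnn(sequence, input_weights=[1.0], recurrent_weight=0.0)
lookback = ma_rnn(sequence, input_weights=[1.0, 0.5], recurrent_weight=0.0)
```

With the recurrent weight set to zero, the plain cell forgets the first input entirely at the second step, while the look-back variant still sees it directly; the length of `input_weights` plays the role of the depth of previous-input dependence that the abstract analyzes.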
ISBN: 9781450366151 (print)
Face Recognition (FR) using Convolutional Neural Network (CNN) based models has achieved considerable success in constrained environments. These models, however, fail to perform well in unconstrained scenarios, especially when the images are captured using surveillance cameras. Such probe samples suffer from degradations such as noise, poor illumination, low resolution, blur and aliasing when compared to the rich training (gallery) set, which consists mostly of mugshot images captured in laboratory settings. The images in the training (gallery) set are crisp and have high contrast compared to the probe samples. To cope with this scenario, we propose a novel dual-pathway generative adversarial network (DP-GAN) that maps low-resolution images captured using surveillance cameras to corresponding gallery-like high-resolution images, using a novel combination of multi-scale reconstruction and a Jensen-Shannon divergence based loss. The images thus obtained are then used to train a deep domain adaptation (deep-DA) network to perform FR. The proposed network achieves superior rank-1 recognition rates (>90%) on four benchmark surveillance face datasets when compared with recent state-of-the-art CNN-based techniques.