ISBN (print): 9781450366151
Recent trends in image segmentation algorithms have produced various large-scale networks with impressive performance on natural scene images. However, most of these networks come with costly overheads, such as large memory requirements or dependence on a large number of parallel processing units. In most cases, expensive graphics processing units (GPUs) are used to boost computational capability. However, when creating real-world products, we need to consider speed, performance, and cost of deployment. We propose a novel "spark" module, which combines the "fire" module of SqueezeNet with depth-wise separable convolutions. Along with this modified SqueezeNet as an encoder, we also propose the use of depth-wise separable transposed convolutions for the decoder. The resulting encoder-decoder network has approximately 49 times fewer parameters than SegNet and almost 223 times fewer parameters than fully convolutional networks (FCN). Even on a CPU, the network completes a forward pass for a single sample in approximately 0.39 seconds, which is almost 5.1 times faster than SegNet and almost 8.7 times faster than FCN.
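As a rough illustration of why depth-wise separable convolutions shrink a network, the sketch below counts weights (biases ignored) for a standard k×k convolution, a depth-wise separable one, and a SqueezeNet fire module. The `spark_params` function is a hypothetical stand-in that simply replaces the fire module's 3×3 expand branch with a depth-wise separable convolution; the abstract does not specify the exact layout of the authors' "spark" module, and the channel sizes are chosen only for illustration.

```python
def conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def dws_params(k, c_in, c_out):
    """Depth-wise separable: one k x k filter per input channel, then a 1x1 point-wise conv."""
    return k * k * c_in + c_in * c_out


def fire_params(c_in, s, e1, e3):
    """SqueezeNet fire module: 1x1 squeeze to s channels, then parallel 1x1 (e1) and 3x3 (e3) expands."""
    return conv_params(1, c_in, s) + conv_params(1, s, e1) + conv_params(3, s, e3)


def spark_params(c_in, s, e1, e3):
    """Hypothetical 'spark'-like variant (assumption): the 3x3 expand branch is made depth-wise separable."""
    return conv_params(1, c_in, s) + conv_params(1, s, e1) + dws_params(3, s, e3)


if __name__ == "__main__":
    print(conv_params(3, 64, 128), dws_params(3, 64, 128))      # 73728 8768
    print(fire_params(96, 16, 64, 64), spark_params(96, 16, 64, 64))  # 11776 3728
```

For a k×k kernel the saving of a depth-wise separable layer is k²·c_out / (k² + c_out), so with 3×3 kernels it approaches a factor of 9 as the number of output channels grows.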
Optimizing the tradeoff between computation time and image quality is essential for reconstructing high-quality magnetic resonance images (MRI) from a limited number of acquired samples in a short time using compressed sensing (CS) algorithms. In this paper, we achieve this for edge-preserving non-linear diffusion reconstruction (NLDR), which eliminates the critical step-size tuning of total variation (TV) based CS-MRI. Based on optimizing the contrast parameter that controls noise and signal in sensitivity-modulated channel images, we propose an a-switching NLDR technique for a faster approximation of the reconstructed image without affecting image quality. The proposed algorithm exploits the difference in the extent of undersampling artifacts between the signal and background regions of the channel images to arrive at different estimates of the contrast parameter, leading to an effective optimization of speed and quality. While maintaining better image quality than conventional TV reconstruction, the switched NLDR also achieves a 25-35% gain in convergence time over NLDR without switching. This makes the switched NLDR a better candidate for fast reconstruction than traditional TV and NLDR approaches. In detailed numerical experiments, we compare and optimize the tradeoff for various state-of-the-art choices of the contrast parameter.
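The role of a contrast parameter in non-linear diffusion can be illustrated with classical Perona-Malik diffusion. This is not the authors' multi-channel CS-MRI reconstruction; the sketch below only shows a single-image explicit diffusion step with conductance g(d) = exp(-(d/κ)²), where `kappa` decides which intensity differences are smoothed as noise and which are preserved as edges. The step size `dt = 0.2`, the iteration count, and the toy step-edge image are assumptions chosen for a stable, self-contained demo.

```python
import numpy as np


def perona_malik_step(u, kappa, dt=0.2):
    """One explicit Perona-Malik step with conductance g(d) = exp(-(d / kappa)**2).

    dt <= 0.25 keeps the 4-neighbour explicit scheme stable.
    """
    def g(d):
        return np.exp(-(d / kappa) ** 2)

    p = np.pad(u, 1, mode="edge")   # replicate borders (zero flux at the image boundary)
    dn = p[:-2, 1:-1] - u           # difference to the north neighbour
    ds = p[2:, 1:-1] - u            # south
    de = p[1:-1, 2:] - u            # east
    dw = p[1:-1, :-2] - u           # west
    return u + dt * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)


def diffuse(u, kappa, n_iter=20, dt=0.2):
    """Run n_iter explicit diffusion steps."""
    for _ in range(n_iter):
        u = perona_malik_step(u, kappa, dt)
    return u


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.zeros((32, 32))
    clean[:, 16:] = 1.0                                  # vertical step edge
    noisy = clean + 0.05 * rng.standard_normal(clean.shape)
    for kappa in (0.1, 10.0):                            # small vs. large contrast parameter
        out = diffuse(noisy, kappa)
        # noise left in the flat region, and the remaining jump across the edge
        print(kappa, np.std(out[:, :12]), (out[:, 16] - out[:, 15]).mean())
```

With the small `kappa`, noise in the flat region is smoothed while the step edge stays sharp; with the large one, the scheme behaves almost like linear heat diffusion and blurs the edge.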
For challenging visual recognition tasks such as scene classification and object detection, there is a need to bridge the semantic gap between low-level features and the semantic concept descriptors. This requires mapp...
Monocular SLAM refers to using a single camera to estimate robot ego-motion while building a map of the environment. While Monocular SLAM is a well-studied problem, automating it by integrating it with trajectory planning frameworks is particularly challenging. This paper presents a novel formulation based on Reinforcement Learning (RL) that generates fail-safe trajectories, wherein the SLAM-generated outputs do not deviate significantly from their true values. Essentially, the RL framework successfully learns the otherwise complex relation between perceptual inputs and motor actions, and uses this knowledge to generate trajectories that do not cause SLAM failure. We show systematically in simulation how the quality of SLAM improves dramatically when trajectories are computed using RL. Our method scales effectively across Monocular SLAM frameworks, both in simulation and in real-world experiments with a mobile robot.
In this work, a memory-efficient topological map generation algorithm is proposed using local descriptors. A topological map is a graph data structure in which each node represents an area within an environment. Nodes are connected by links, each of which indicates the presence of a physical path between the pair. Experiments have been conducted with feature descriptors using a vocabulary-based approach; such approaches require large amounts of memory and time. To address this, a KD-tree based map generation algorithm is proposed, in which each node of the tree stores a descriptor and a table of occurrence. This table stores the node ids of the locations where the corresponding descriptor is present. Map generation is a two-stage algorithm. In the first stage, visual-similarity-based position identification is performed to check for loop closures. This is followed by a corrective step that validates the loop-closure decision, if any. The table of occurrence keeps track of the presence of each descriptor, and the least frequently occurring descriptors are pruned at regular intervals, making the algorithm memory-efficient. The approach has been evaluated on several benchmark datasets.
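A minimal sketch of the table-of-occurrence idea follows. It is hypothetical throughout: a brute-force nearest-neighbour search stands in for the KD-tree, the matching threshold `match_thresh` and the pruning rule `min_count` are assumptions, and the corrective loop-closure validation stage is omitted. Each stored descriptor keeps the set of map-node ids where it was observed; a new place votes for earlier nodes that share matched descriptors, and rarely seen descriptors are pruned.

```python
from collections import Counter

import numpy as np


class OccurrenceMap:
    """Hypothetical descriptor store with a table of occurrence.

    descs[i] is a stored descriptor; table[i] is the set of map-node ids where it was seen.
    Brute-force nearest-neighbour search stands in for a KD-tree here.
    """

    def __init__(self, match_thresh):
        self.match_thresh = match_thresh  # assumed max Euclidean distance for a match
        self.descs = []
        self.table = []

    def _nearest(self, d):
        """Index of the closest stored descriptor within match_thresh, else None."""
        if not self.descs:
            return None
        dists = np.linalg.norm(np.stack(self.descs) - d, axis=1)
        i = int(np.argmin(dists))
        return i if dists[i] <= self.match_thresh else None

    def add_place(self, node_id, descriptors):
        """Vote for earlier nodes sharing matched descriptors, then record node_id."""
        votes = Counter()
        for d in np.asarray(descriptors, dtype=float):
            i = self._nearest(d)
            if i is None:
                self.descs.append(d.copy())   # unseen descriptor: new entry
                self.table.append({node_id})
            else:
                votes.update(self.table[i] - {node_id})  # one vote per earlier node
                self.table[i].add(node_id)
        return votes  # votes.most_common(1) gives the loop-closure candidate, if any

    def prune(self, min_count):
        """Drop descriptors seen at fewer than min_count distinct nodes."""
        keep = [i for i, nodes in enumerate(self.table) if len(nodes) >= min_count]
        self.descs = [self.descs[i] for i in keep]
        self.table = [self.table[i] for i in keep]
```

For example, after adding node 0 with three descriptors and node 1 with two unrelated ones, re-adding node 0's descriptors (slightly perturbed) as node 2 returns three votes for node 0, and `prune(min_count=2)` then keeps only the three descriptors seen at two nodes.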
In this paper, we present a novel methodology for recognizing human activity in egocentric video based on a Bag of Visual Features. The proposed technique is based on the assumption that only a portion of the whole video may be sufficient to identify an activity. Moreover, we argue that for activity recognition in egocentric videos, the proposed approach performs better than deep-learning-based methods, because in egocentric videos the person wearing the sensor often remains static for long periods or moves their head frequently. In both cases, it becomes difficult to learn the spatio-temporal pattern of the video during an action. The proposed approach divides the video into smaller segments called Video Units. Spatio-temporal features extracted from the units are clustered to construct a dictionary of Action Units (AUs). The AUs are ranked by a likeliness score, obtained by constructing a weighted graph with the AUs as vertices and edge weights calculated from the frequencies of occurrence of the AUs during the activity. The less significant AUs are pruned from the dictionary, and the revised dictionary of key AUs is used for activity recognition. We test our approach on a benchmark egocentric dataset and achieve good accuracy.
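The abstract does not give the exact scoring rule, so the sketch below assumes one: the edge weight between two AUs counts how often they occur in consecutive Video Units, self-transitions are ignored, and an AU's likeliness score is its normalised weighted degree. `rank_action_units` and `prune_dictionary` are hypothetical names, and the AU labels are assumed to already come from the clustering step.

```python
import numpy as np


def rank_action_units(au_sequence, n_aus):
    """Hypothetical likeliness scores from a weighted AU graph.

    w[a, b] counts how often AUs a and b occur in consecutive Video Units
    (self-transitions ignored); an AU's score is its weighted degree, normalised to sum to 1.
    """
    w = np.zeros((n_aus, n_aus))
    for a, b in zip(au_sequence[:-1], au_sequence[1:]):
        if a != b:
            w[a, b] += 1
            w[b, a] += 1
    score = w.sum(axis=1)
    total = score.sum()
    return score / total if total > 0 else score


def prune_dictionary(scores, keep):
    """Indices of the `keep` highest-scoring AUs (ties broken by lower index)."""
    order = np.argsort(-scores, kind="stable")
    return sorted(order[:keep].tolist())


if __name__ == "__main__":
    seq = [0, 1, 0, 1, 2, 0, 1, 3, 3, 3]
    scores = rank_action_units(seq, n_aus=4)
    print(scores, prune_dictionary(scores, keep=2))
```

In this toy sequence, AU 3 occurs three times but mostly as a repeated label at the end, so under the assumed rule it scores lowest and is pruned, mimicking a wearer who stays static.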
Person re-identification has important applications in video surveillance. It can be viewed as recognizing the same person across non-overlapping cameras. Video-based person re-identification methods are gaining increased attention due to the greater discriminative power of spatio-temporal feature representations. Current video-based methods use RNNs to extract temporal information. In this paper, we propose a novel Moving Average Recurrent Neural Network (MA-RNN) model that builds a strong feature representation by taking both previous and present inputs at each time step. Specifically, the recurrent layer captures better sequential information by looking back directly at past input values, whereas general RNNs have only an indirect dependence on previous values through the hidden state. The proposed model is tested on two publicly available datasets, iLIDS-VID and PRID-2011, and outperforms state-of-the-art methods by a significant margin. We also analyze the effect of the depth of previous-input dependence of the MA-RNN model on matching accuracy.
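The exact MA-RNN update is not given in the abstract. The sketch below assumes one plausible form, with one input matrix per look-back lag up to a depth k: h_t = tanh(Σ_{i=0}^{k-1} W_i x_{t-i} + W_h h_{t-1} + b). With k = 1 it reduces to a vanilla RNN; with k > 1, h_t depends directly on x_{t-1} through x_{t-k+1} rather than only through the hidden state. `ma_rnn_forward` and its weight layout are hypothetical.

```python
import numpy as np


def ma_rnn_forward(xs, W_in, W_h, b):
    """Hypothetical MA-RNN forward pass.

    xs: (T, d_in) input sequence; W_in: (k, d_h, d_in), one matrix per look-back lag;
    W_h: (d_h, d_h); b: (d_h,).
    h_t = tanh(sum_{i < k, t - i >= 0} W_in[i] @ x_{t-i} + W_h @ h_{t-1} + b)
    """
    k, d_h, _ = W_in.shape
    h = np.zeros(d_h)
    hs = []
    for t in range(len(xs)):
        pre = W_h @ h + b                 # indirect dependence via the hidden state
        for i in range(min(k, t + 1)):    # direct dependence on the last k inputs
            pre += W_in[i] @ xs[t - i]
        h = np.tanh(pre)
        hs.append(h)
    return np.stack(hs)                   # (T, d_h), e.g. to be pooled over time
```

Increasing k corresponds to the "depth of previous input dependence" the abstract analyses; in this sketch it only adds k-1 extra input matrices, while the hidden-state path is unchanged.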
Super-resolution (SR) is a technique to improve the resolution of an image from a sequence of input images or from a single image. As SR is an ill-posed inverse problem, it leads to many suboptimal solutions. Since mo...
Face Recognition (FR) using Convolutional Neural Network (CNN) based models has achieved considerable success in constrained environments. These models, however, fail to perform well in unconstrained scenarios, especially when the images are captured using surveillance cameras. Such probe samples suffer from degradations such as noise, poor illumination, low resolution, blur, and aliasing, compared to the rich training (gallery) set, which comprises mostly mugshot images captured in laboratory settings. The images in the training (gallery) set are crisp and have high contrast compared to the probe samples. To cope with this scenario, we propose a novel dual-pathway generative adversarial network (DP-GAN) that maps low-resolution images captured by surveillance cameras to corresponding gallery-like high-resolution images, using a novel combination of multi-scale reconstruction and Jensen-Shannon divergence based loss. The images thus obtained are then used to train a deep domain adaptation (deep-DA) network to perform FR. The proposed network achieves superior rank-1 recognition rates (>90%) on four benchmark surveillance face datasets compared with recent state-of-the-art CNN-based techniques.
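The two loss ingredients named above can be sketched separately. `js_divergence` computes the Jensen-Shannon divergence between two discrete distributions (for the original GAN objective with an optimal discriminator, the generator's loss equals 2·JSD - log 4); `multiscale_l1` is a hypothetical multi-scale reconstruction term that averages the L1 error over a 2×2 average-pooling pyramid. How DP-GAN weights and combines these terms is not stated in the abstract, so any weighting between them is an assumption.

```python
import numpy as np


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (natural log, so at most ln 2) between discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # zero-probability entries of `a` contribute 0; eps guards the log
        return np.sum(a * np.log((a + eps) / (b + eps)))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def downsample2(img):
    """2x2 average pooling (assumes even height and width)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def multiscale_l1(pred, target, n_scales=3):
    """Hypothetical multi-scale reconstruction loss: mean L1 error averaged over a pyramid.

    Image sides must be divisible by 2 ** (n_scales - 1).
    """
    losses = []
    for s in range(n_scales):
        if s > 0:
            pred, target = downsample2(pred), downsample2(target)
        losses.append(np.abs(pred - target).mean())
    return float(np.mean(losses))
```

Identical distributions give a JSD of 0 and disjoint ones give ln 2; the pyramid term penalises errors in coarse structure as well as fine detail.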
Super-resolving a noisy image is a challenging problem and, when the noise power is unknown, needs special care compared to conventional super-resolution approaches. In this scenario, we propose an approach ...