Saliency plays a key role in various computer vision tasks. Extracting salient regions from images and videos has been a well-established problem in computer vision. While segmenting salient objects from images depen...
We propose two novel approaches to classify Indian monuments according to their distinct architectural styles. While the historical significance of most Indian monuments is well documented, the details of their archit...
ISBN: 9781450366151 (print)
Image co-segmentation is the task of jointly segmenting two or more images that share common foreground objects. In this paper, we propose a novel graph convolution neural network (graph CNN) based end-to-end model for performing co-segmentation. First, each input image is over-segmented into a set of superpixels. Next, a weighted graph is formed from the over-segmented images, exploiting spatial adjacency and both intra-image and inter-image feature similarities among the image superpixels (nodes). Subsequently, the proposed network, consisting of graph convolution layers followed by node classification layers, classifies each superpixel either into the common foreground or its complement. During training, along with the co-segmentation network, an additional network is introduced to exploit the corresponding semantic labels, and the two networks share the same weights in the graph convolution layers. The whole model is learned in an end-to-end fashion using a novel cost function composed of a superpixel-wise binary cross-entropy and a multi-label cross-entropy. We also use empirical class probabilities in the loss function to deal with class imbalance. Experimental results show that the proposed technique is very competitive with the state-of-the-art methods on two challenging datasets, Internet and Pascal-VOC.
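The abstract does not say how the empirical class probabilities enter the loss. As a minimal sketch of one common choice, assumed here rather than taken from the paper, each class can be weighted by the empirical probability of the other class, so that a rare foreground and a dominant background contribute comparably. The function name `class_weighted_bce` and its parameters are hypothetical.

```python
import math


def class_weighted_bce(probs, labels, eps=1e-12):
    """Superpixel-wise binary cross-entropy with class-balancing weights.

    probs:  predicted foreground probability for each superpixel
    labels: ground truth per superpixel (1 = common foreground, 0 = background)
    """
    n = len(labels)
    p_fg = sum(labels) / n   # empirical foreground probability
    p_bg = 1.0 - p_fg        # empirical background probability
    # Assumed weighting: each class is weighted by the frequency of the other.
    w_fg, w_bg = p_bg, p_fg
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= w_fg * y * math.log(p) + w_bg * (1 - y) * math.log(1.0 - p)
    return total / n


# One foreground superpixel among four, with an uninformative prediction of 0.5.
loss = class_weighted_bce([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 0])
```

With these weights, the single foreground superpixel and the three background superpixels contribute equally to the total, which is the balancing effect the weighting is meant to achieve.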
ISBN: 9781450366151 (print)
Popular events are often video recorded simultaneously by a general crowd using smartphones. In the present work, we propose a robust recurrent neural network (RNN) based approach for geo-localizing these events using sensor data collected by user smartphones while recording such events. For this task we use GPS and compass sensors, which are commonly available on smartphones. The circular nature (modulo 2π) of the orientation data from the compass limits the ability of classical neural networks (NN) to geo-localize these events. We mitigate this issue by incorporating circular nodes in our network and show the performance improvements. We train the proposed NN model using simulated data and apply it directly to real data. We train several RNN models using this strategy and present our analyses. The proposed work outperforms all previous approaches in terms of event geo-localization accuracy.
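To illustrate why orientation data modulo 2π is awkward for ordinary networks, the sketch below uses a (cos, sin) encoding and a wrapped angular difference. This is a standard workaround for circular quantities, not the circular nodes proposed in the paper, and all function names are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def encode_heading(theta):
    """Map a heading in radians to a point on the unit circle."""
    return (math.cos(theta), math.sin(theta))


def decode_heading(c, s):
    """Recover a heading in [0, 2*pi) from its (cos, sin) encoding."""
    return math.atan2(s, c) % TWO_PI


def angular_difference(a, b):
    """Smallest signed difference a - b, returned in (-pi, pi]."""
    d = (a - b) % TWO_PI  # lies in [0, 2*pi)
    return d - TWO_PI if d > math.pi else d


# Headings of 359 and 1 degrees are 2 degrees apart, not 358.
diff = angular_difference(math.radians(359), math.radians(1))
```

A plain subtraction would report a 358 degree error for two nearly identical headings, which is the kind of discontinuity the abstract refers to.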
ISBN: 9781450366151 (print)
Phase unwrapping is an important problem in several applications that attempts to restore the original phase from the wrapped phase. In this paper, we propose a novel phase unwrapping model based on a deep convolutional neural network (DCNN) by formulating phase unwrapping as a semantic segmentation problem. The proposed architecture consists of a convolutional encoder network and a corresponding decoder network followed by a pixel-wise classification layer. One of the critical challenges in DCNNs is the availability of a large set of labeled training data. This issue is effectively circumvented for the proposed framework through a generic simulation procedure that automatically generates large amounts of labeled data. Results from the proposed method are compared with the widely used quality-guided phase unwrapping algorithm for various SNR values. It is found that the proposed method performs well in terms of both accuracy and computational time, even in the presence of strong noise. To the best of our knowledge, this is the first work that uses a convolutional neural network for phase unwrapping, and this will hopefully pave the way to a new class of techniques for unwrapping the phase.
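One plausible reading of the segmentation formulation, assumed here since the abstract does not define the classes, is that each pixel is labeled with its integer wrap count k, where true phase = wrapped phase + 2πk. The sketch below shows how simulated true phase yields both the network input (wrapped phase) and the label (wrap count). The function names are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def wrap(phase):
    """Wrap a phase value into the interval [-pi, pi)."""
    return (phase + math.pi) % TWO_PI - math.pi


def wrap_count(phase):
    """Integer k such that phase == wrap(phase) + 2*pi*k (the assumed class label)."""
    return math.floor((phase + math.pi) / TWO_PI)


def unwrap_from_labels(wrapped, counts):
    """Reconstruct the phase from wrapped values and per-sample wrap counts."""
    return [w + TWO_PI * k for w, k in zip(wrapped, counts)]


# Simulated 1D phase ramp: wrapped values are the input, counts are the labels.
true_phase = [0.5 * i for i in range(40)]
wrapped = [wrap(p) for p in true_phase]
labels = [wrap_count(p) for p in true_phase]
restored = unwrap_from_labels(wrapped, labels)
```

Because the labels are computed directly from the simulated phase, arbitrarily many labeled pairs can be generated without manual annotation.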
ISBN: 9781450366151 (print)
In this paper, we propose a pipeline for generating a 2D floorplan using depth cameras. In our pipeline we use an existing approach to recover the camera motion trajectories from the depth and RGB sequences. Given these motion estimates, we construct a full 3D representation of the scanned indoor spaces. To generate a floorplan, we need to abstract the large volumes of registered 3D data into a simplified rectilinear representation. We evaluate two approaches to this problem, viz. slicing the reconstructed volume at a given height and direct segmentation of the 3D point cloud representation into individual planar segments. We also note that the fidelity of our estimated floorplan depends crucially on the accuracy of the estimated ground plane orientation. We examine the comparative accuracies of two ground plane estimation methods for each of the above-mentioned approaches to rectilinear abstraction. Given the line drawing abstractions of the individual rooms, we merge them into a consistent floorplan. We present results on a real-world floorplan estimation problem and demonstrate its accuracy. Additionally, the implications of errors in the individual components of our pipeline are also studied.
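The slicing approach can be pictured with a minimal sketch, assuming the point cloud has already been rotated so that the ground plane is z = 0 (which is why the ground plane estimate matters). The tolerance, cell size, and function names are hypothetical and not taken from the paper.

```python
import math


def slice_at_height(points, height, tolerance=0.05):
    """Keep the (x, y) coordinates of 3D points whose z lies near the given height."""
    return [(x, y) for x, y, z in points if abs(z - height) <= tolerance]


def occupied_cells(points_2d, cell_size):
    """Rasterize 2D points into the set of grid cells they fall in."""
    return {(math.floor(x / cell_size), math.floor(y / cell_size))
            for x, y in points_2d}


# Two wall points near 1 m above the floor, plus one ceiling-level point.
cloud = [(0.0, 0.0, 1.0), (1.0, 2.0, 1.02), (3.0, 3.0, 2.5)]
wall_points = slice_at_height(cloud, height=1.0)
cells = occupied_cells(wall_points, cell_size=0.5)
```

If the ground plane orientation is wrong, a fixed-height slice cuts the walls obliquely, and the resulting outline is distorted.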
ISBN: 9781450366151 (print)
This paper addresses Bayesian Sparse Signal Recovery (SSR) for Multiple Measurement Vectors when the elements of each row of the solution matrix are correlated. We propose a standard linear Gaussian observation model and a three-level hierarchical estimation framework, based on a Gaussian Scale Mixture (GSM) model with some random and some deterministic parameters, to model each row of the unknown solution matrix. This hierarchical model induces a heavy-tailed marginal distribution over each row, which encompasses several choices of distribution, viz. the Laplace distribution, Student's t distribution and the Jeffreys prior. The Automatic Relevance Determination (ARD) phenomenon introduces sparsity in the model. Notably, the Block Sparse Bayesian Learning framework is a special case of the proposed framework when the induced marginal is the Jeffreys prior. Experimental results for synthetic signals are provided to demonstrate its effectiveness. We also explore the possibility of using Multiple Measurement Vectors to model the Dynamic Hand Posture Database, which consists of sequences of temporally correlated hand postures. By exploiting the temporal correlation present in successive image samples, the proposed framework can reconstruct the data with high fidelity from fewer linear random measurements.
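For orientation, a generic form of the multiple measurement vector model with a Gaussian scale mixture prior on each row can be written as follows. The symbols (Φ, X, V, γ_i, B) are assumptions chosen for illustration, and the paper's exact three-level hierarchy is not given in this abstract.

```latex
\begin{align}
  % Linear Gaussian observation model with L measurement vectors
  Y &= \Phi X + V, \qquad
  Y \in \mathbb{R}^{N \times L},\;
  \Phi \in \mathbb{R}^{N \times M},\;
  X \in \mathbb{R}^{M \times L} \\
  % Row i of X is Gaussian given its scale; B models correlation within the row
  x_i \mid \gamma_i &\sim \mathcal{N}(0,\, \gamma_i B) \\
  % Integrating out the scale gives a heavy-tailed marginal over the row
  p(x_i) &= \int_0^{\infty} \mathcal{N}(x_i;\, 0,\, \gamma_i B)\, p(\gamma_i)\, d\gamma_i
\end{align}
```

Different choices of p(γ_i) yield different heavy-tailed marginals, and rows whose scale γ_i is driven to zero are pruned, which is how ARD produces row sparsity.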
In this paper, a new approach is proposed for the detection of JPEG anti-forensic operations. It is based on the fact that when a JPEG anti-forensic operation is applied, the values of DCT coefficients are changed. th...
ISBN: 9781450366151 (print)
The contour tree represents the topology of level sets of a scalar function. Nodes of the tree correspond to critical level sets, and arcs of the tree represent a collection of topologically equivalent level sets connecting two critical level sets. The augmented contour tree contains degree-2 nodes on the arcs that represent regular level sets. The degree-2 nodes correspond to regular points of the scalar function and other critical points that do not affect the number of level set components. The augmented contour tree is significantly larger and requires more effort to compute than the contour tree. Applications of the contour tree to data exploration and visualization require the augmented contour tree. Current approaches propose algorithms to compute the contour tree and the augmented contour tree from scratch. Precomputing and storing the large augmented contour tree will not be necessary if the contour tree can be augmented on demand. This paper poses the problem of computing the augmented contour tree given a contour tree as input. Computational experiments demonstrate that the on-demand augmentation can be computed quickly while yielding good memory savings.
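The final step of augmentation can be sketched as follows, assuming the regular vertices belonging to an arc are already known. Finding that assignment is the substantive part of the problem and is not shown here. The function name and data layout are hypothetical.

```python
def augment_arc(arc, regular_vertices, value):
    """Split one contour tree arc into a chain of degree-2 nodes.

    arc:              (lower_node, upper_node) of the contour tree
    regular_vertices: vertices assumed to lie on this arc
    value:            dict mapping each vertex to its scalar function value
    """
    lower, upper = arc
    # Order the regular vertices by scalar value between the two critical nodes.
    chain = [lower] + sorted(regular_vertices, key=value.__getitem__) + [upper]
    # Consecutive pairs form the arcs of the augmented tree.
    return list(zip(chain, chain[1:]))


scalar = {"a": 0.0, "b": 1.0, "r1": 0.7, "r2": 0.3}
augmented = augment_arc(("a", "b"), ["r1", "r2"], scalar)
```

Each arc can be expanded independently in this way, which is what makes augmenting only the arcs an application needs conceivable.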
ISBN: 9781450366151 (print)
Haze during bad weather drastically degrades the visibility of the scene. The degradation of scene visibility varies with the transmission coefficient/map (T-c) of the scene. Accurate estimation of T-c is a key step in reconstructing the haze-free scene. Previously, local as well as global priors were proposed to estimate T-c. We, on the other hand, propose integrating local and global approaches to learn both point-level and object-level T-c. The proposed local encoder-decoder network (LEDNet) estimates the scene transmission map in two stages. During the first stage, the network estimates the point-level T-c using parallel convolutional filters and spatial invariance filtering. The second stage comprises a two-level encoder-decoder architecture which anticipates the object-level T-c. We also propose a local air-light estimation (LAE) algorithm, which obtains the air-light component of the outdoor scene. The combination of LEDNet and LAE improves the accuracy of the haze model in recovering the scene radiance. Structural similarity index, mean square error and peak signal-to-noise ratio are used to evaluate the performance of the proposed approach for single image haze removal. Experiments on benchmark datasets show that LEDNet outperforms the existing state-of-the-art methods for single image haze removal.
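The haze model is not spelled out in the abstract. The sketch below assumes the widely used atmospheric scattering model, I = J·t + A·(1 − t), where I is the observed intensity, J the scene radiance, t the transmission and A the air-light. It shows how J is recovered once t and A have been estimated. The function names and the lower bound `t_min` are hypothetical.

```python
def add_haze(radiance, transmission, airlight):
    """Forward model: observed intensity for one pixel and channel."""
    return radiance * transmission + airlight * (1.0 - transmission)


def recover_radiance(observed, transmission, airlight, t_min=0.1):
    """Invert the model; t is bounded below to avoid amplifying noise."""
    t = max(transmission, t_min)
    return (observed - airlight) / t + airlight


# Round trip for a single pixel with known transmission and air-light.
hazy = add_haze(radiance=0.2, transmission=0.5, airlight=0.9)
restored = recover_radiance(hazy, transmission=0.5, airlight=0.9)
```

Because the observed intensity is divided by t, errors in the estimated transmission map or air-light translate directly into errors in the recovered radiance.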