ISBN: (print) 9781467385640
Purely data-driven large-scale image classification has been achieved using various feature descriptors such as SIFT and HOG. A major milestone in this regard is the family of Convolutional Neural Network (CNN) based methods, which learn optimal feature descriptors as filters. Little attention has been given to the use of domain knowledge. Ontology plays an important role in learning to categorize images into abstract classes where there may be no clear visual connection between category and image, for example identifying image mood (happy, sad or neutral). Our algorithm combines CNN and ontology priors to infer abstract patterns in Indian monument images. We use a transfer-learning-based approach in which domain knowledge is transferred to the CNN during training (top-down transfer) and inference is made using the CNN prediction and the ontology tree/priors (bottom-up transfer). We classify images into categories such as Tomb, Fort and Mosque. We demonstrate that our method improves markedly over a logistic classifier and other transfer learning approaches. We conclude with a remark on possible applications of the model and a note on scaling it to a bigger ontology.
An automatic technique for 2D-to-3D image conversion is proposed using manifold learning and sequential labeling; it generates reliable and accurate 3D depth maps that are very close to the ground-truth depths. In this paper, locally linear embedding (LLE), a nonlinear, neighborhood-preserving embedding algorithm, is used to estimate the depth of a 2D image. A fixed-point supervised learning algorithm is then applied to construct a consistent and smooth 3D output. The high-dimensional data points (pixels) of the input frames can each be represented by a linear combination of their nearest neighbors, and a lower-dimensional point is reconstructed while preserving the local geometric properties of the frames. Neighbors are assigned to each input point in the image data set, and the weight vectors that best linearly reconstruct the input point from its neighbors are computed. To obtain the depth value of an input point in a new image, the reconstruction weights of its closest neighbors in the training samples are multiplied by their corresponding ground-truth depth values. The fixed-point learning algorithm takes the depths from the manifold and other image features as input vectors and generates more consistent and accurate depth images for better 3D conversion.
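The LLE depth-transfer step described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the regularization term `reg`, the choice of `k`, and the toy feature vectors are all assumptions, and the weighted products are summed to give a single depth value, which is the usual reading of the step.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def lle_weights(query, neighbors, reg=1e-3):
    """Weights that best linearly reconstruct `query` from `neighbors` and sum to 1."""
    k = len(neighbors)
    diffs = [[q - v for q, v in zip(query, nb)] for nb in neighbors]
    # Local Gram matrix of the differences between the query and each neighbor.
    gram = [[sum(a * b for a, b in zip(diffs[i], diffs[j])) for j in range(k)]
            for i in range(k)]
    trace = sum(gram[i][i] for i in range(k))
    eps = reg * trace if trace > 0 else reg  # assumed regularizer for stability
    for i in range(k):
        gram[i][i] += eps
    w = solve_linear(gram, [1.0] * k)
    total = sum(w)
    return [wi / total for wi in w]

def estimate_depth(query, train_features, train_depths, k=3):
    """Transfer depth to `query` from its k nearest training samples."""
    order = sorted(range(len(train_features)),
                   key=lambda i: math.dist(query, train_features[i]))[:k]
    w = lle_weights(query, [train_features[i] for i in order])
    # Each weight multiplies the ground-truth depth of its neighbor.
    return sum(wi * train_depths[i] for wi, i in zip(w, order))

# Toy data (hypothetical): 2-D feature vectors with known depths.
features = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [10.0, 10.0]]
depths = [1.0, 3.0, 5.0, 100.0]
print(estimate_depth([1.0, 0.0], features, depths, k=2))
```

The sum-to-one constraint makes the weights invariant to translation of the neighborhood, which is why the same weights can be reused on the depth values.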
Human attention tends to focus on the most prominent components of a scene, which are in sharp contrast with the background. These are termed salient regions. Saliency is defined in terms of local and global feature contrasts. The human brain perceives an object as salient based on its difference from the surroundings in terms of color and texture. There have been many color-based approaches to salient object detection in the past. In this paper, we define the uncertainty of a window being salient or background in terms of information extracted from different color components. The uncertainty associated with the elements of a fuzzy set is described by a membership function, which gives the degree of association of each element with the set. The overall uncertainty is quantified by an entropy function. To locate the salient parts of the image, we use the entropy to compute a new set of features from the color and luminance components of the image. Extensive comparisons with state-of-the-art methods in terms of precision, recall and F-measure are made on a publicly available dataset to demonstrate the effectiveness of this approach.
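To make the membership-plus-entropy idea concrete, here is a small sketch. The abstract does not state which membership or entropy function the authors use, so both choices below are assumptions: a linear ramp membership and the normalized De Luca-Termini fuzzy entropy, applied to a hypothetical window of 8-bit intensities.

```python
import math

def membership(value, low, high):
    """Linear ramp membership: 0 at or below `low`, 1 at or above `high`."""
    if high <= low:
        raise ValueError("high must exceed low")
    return min(1.0, max(0.0, (value - low) / (high - low)))

def fuzzy_entropy(memberships):
    """Normalized De Luca-Termini entropy.

    Returns 0 when every membership is 0 or 1 (no uncertainty) and
    1 when every membership is 0.5 (maximum uncertainty).
    """
    if not memberships:
        raise ValueError("need at least one membership value")
    total = 0.0
    for mu in memberships:
        for p in (mu, 1.0 - mu):
            if p > 0.0:  # treat 0 * log(0) as 0
                total -= p * math.log(p)
    return total / (len(memberships) * math.log(2.0))

# Hypothetical window of one color component (8-bit values).
window = [12, 40, 128, 200, 250]
mus = [membership(v, 0, 255) for v in window]
print(fuzzy_entropy(mus))
```

A window whose memberships cluster near 0 or 1 yields low entropy, so such a score could serve as one feature per color or luminance component.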
This paper presents an efficient combination of two well-known tracking algorithms, Tracking-Learning-Detection (TLD) and Compressive Tracking (CT), to devise an algorithm that takes advantage of both and uses the strengths of each to compensate for the weaknesses of the other. TLD fails in cases including full out-of-plane rotation, fast motion and articulated object tracking, while CT fails to resume tracking once the object leaves the frame and comes back. We propose a combining algorithm, given as Algorithm 1, which robustly handles all of these tracking challenges. Different thresholds are set, which can be varied to weight each component as required. The proposed algorithm is tested on different test sequences involving challenging tracking scenarios such as fast motion, and the success rates are reported in Table I. The proposed algorithm compares favourably with both algorithms in terms of robustness and success rate.
Fine-grained visual classification has been considered for image data in various domains of environmental importance such as birds, animals and plants. This work considers the classification problem for plants, based on leaf shape. Traditional works in such areas typically propose better features or more sophisticated classification frameworks. In this work, we ask a different question: given simple and efficient features and a well-known binary classifier such as the support vector machine (SVM), what is a good way, among various strategies, to pose the multi-class classification problem as multiple binary classifications? In this respect, we compare three different strategies, all of which use the same set of features. From our results, we conclude that one of these three approaches, based on hierarchical class grouping, clearly outperforms the others, with high classification accuracy. This suggests that the classification strategy is an important aspect for the given features and classifiers. To our knowledge, such a study has not yet been carried out in the fine-grained classification area, and particularly not in the nascent area of leaf classification.
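The sketch below illustrates how a multi-class problem can be decomposed into binary ones. The abstract names only the hierarchical class-grouping strategy; one-vs-rest and one-vs-one are shown here as the usual baselines, which is an assumption about the other two strategies. The grouping rule (halving an ordered class list) and the class names are placeholders, since the paper's actual grouping criterion is not given in the abstract.

```python
from itertools import combinations

def one_vs_rest(classes):
    """One binary problem per class: that class against all others."""
    return [([c], [o for o in classes if o != c]) for c in classes]

def one_vs_one(classes):
    """One binary problem per unordered pair of classes."""
    return [([a], [b]) for a, b in combinations(classes, 2)]

def hierarchical(classes):
    """Binary problems from recursively splitting the class list in half.

    Placeholder grouping: a real system would group visually similar
    classes together rather than split by list position.
    """
    if len(classes) < 2:
        return []
    mid = len(classes) // 2
    left, right = classes[:mid], classes[mid:]
    return [(left, right)] + hierarchical(left) + hierarchical(right)

# Hypothetical leaf species labels.
species = ["acer", "betula", "fagus", "ilex", "quercus", "salix", "tilia", "ulmus"]
for name, strategy in [("one-vs-rest", one_vs_rest),
                       ("one-vs-one", one_vs_one),
                       ("hierarchical", hierarchical)]:
    print(name, len(strategy(species)), "binary classifiers")
```

Each returned pair is a (positive group, negative group) split on which one binary SVM would be trained. With a hierarchy, a test sample only passes through the classifiers along one root-to-leaf path.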
Underwater images suffer from non-uniform contrast and poor visibility due to bad illumination and color cast in deep water. Such images have a hazy, color-diminished appearance, which makes underwater studies difficult. Research in recent decades has performed color correction on the assumption that underwater images have a bluish color cast, which is not always true. In this paper, a new image enhancement approach is proposed that modifies the gray world algorithm by finding the color cast using fuzzy logic and then removing it, with the correction method optimized using Bacterial Foraging Optimisation (BFO). The proposed approach is adaptive in nature because it estimates the intensity of the color cast instead of assuming it, which improves the quality of underwater images. The results show enhanced visual detail, contrast and color performance.
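For reference, the unmodified gray world algorithm that this approach starts from can be sketched as below. This shows only the classical baseline; the fuzzy-logic cast estimation and the BFO optimization described in the abstract are not reproduced, and the pixel representation (a list of 8-bit RGB tuples) is an assumption made for brevity.

```python
def gray_world(pixels):
    """Classical gray world color correction.

    Scales each RGB channel so that its mean equals the average of the
    three channel means, on the assumption that the scene averages to gray.
    """
    n = len(pixels)
    if n == 0:
        return []
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3.0
    # A channel with zero mean is left unchanged to avoid division by zero.
    gains = [gray / m if m > 0 else 1.0 for m in means]
    return [tuple(min(255, round(p[c] * gains[c])) for c in range(3))
            for p in pixels]

# Hypothetical image with a bluish cast.
image = [(40, 90, 160), (60, 110, 140), (50, 100, 150)]
print(gray_world(image))
```

Because the gains depend only on channel means, the baseline applies the same correction regardless of how strong or what color the cast is, which is the limitation an adaptive estimate of the cast aims to address.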
Banknote identification systems, with their wide applications in Automated Teller Machines (ATMs), vending machines and currency recognition aids for the visually impaired, are one of the most widely researched fields today. The present paper proposes a novel technique for the recognition of Indian currency banknotes by adopting a modular approach. The proposed work extracts distinct and unique features of Indian currency notes, such as the central numeral, RBI seal, colour band and identification mark for the visually impaired, and employs algorithms optimized for the detection of each specific feature. The proposed technique has been evaluated on a large data set for the recognition of Indian banknotes of various denominations and physical conditions, including new notes, wrinkled notes and non-uniform illumination. Thorough analysis yields a high true positive rate (desired feature identified correctly) of 95.11% and a low false positive rate (undesired feature recognition minimized) of 0.09765% for emblem recognition, an accuracy of 97.02% for central numeral detection, and 100% accuracy for both recognition of the identification mark and colour matching in CIE LAB colour space.
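As a reminder of how the two reported rates are conventionally computed from detection counts, a short sketch follows. The standard definitions are assumed here (the abstract does not spell them out), and the counts in the example are hypothetical, not taken from the paper.

```python
def detection_rates(tp, fp, tn, fn):
    """Return (true positive rate, false positive rate) as fractions in [0, 1].

    tp: feature present and detected    fn: feature present but missed
    fp: feature absent but detected     tn: feature absent and not detected
    """
    positives = tp + fn
    negatives = fp + tn
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    return tp / positives, fp / negatives

# Hypothetical counts for one feature detector.
tpr, fpr = detection_rates(tp=90, fp=1, tn=999, fn=10)
print(f"TPR = {tpr:.2%}, FPR = {fpr:.2%}")
```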
We propose a method to address the problem of video summarization, which aims to generate a summarized video of a user-specified duration that preserves the salient activities of the input video. We model the motion of feature points as a Gaussian Mixture Model (GMM) to select the key feature points, which in turn are used to estimate the salient frames. The saliency of a feature point depends on the contribution of its motion to the entire video and on the user-specified duration of the summary. We generate the summarized video while keeping the chronology of the salient frames to avoid ambiguity for viewers. We demonstrate the proposed method on different stored surveillance videos and achieve a retention ratio of 1 for the closest condensation ratio obtained with the stroboscopic approach. We also demonstrate the proposed GMM method with results based on an interactively selected region of interest (ROI).
Hand gesture recognition is one of the natural means of human-computer interaction (HCI) and has a wide range of technological as well as social applications. A dynamic hand gesture can be characterized by its shape, position and movement. This paper presents a user-independent framework for dynamic hand gesture recognition in which a novel algorithm for the extraction of key frames is proposed. The algorithm uses the change in hand shape and position, together with certain parameters and a dynamic threshold, to find the most important and distinguishing frames in the video of the hand gesture. For classification, a Multiclass Support Vector Machine (MSVM) is used. Experiments using videos of hand gestures from Indian Sign Language show the effectiveness of the proposed system for various dynamic hand gestures. The key frame extraction algorithm speeds up the system by selecting only the essential frames, thereby eliminating extra computation on redundant frames.
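One way a key-frame selector driven by shape and position change with a dynamic threshold could look is sketched below. This is a hypothetical illustration and not the paper's algorithm: the per-frame descriptor `(cx, cy, area)` with normalized coordinates, the change score, the `scale` parameter, and the threshold rule (a multiple of the mean frame-to-frame change) are all assumptions.

```python
import math

def frame_change(a, b):
    """Change between two frames, each described as (cx, cy, area).

    Combines the shift of the hand centroid with the relative change in
    hand area, used here as a crude stand-in for shape change.
    """
    shift = math.hypot(a[0] - b[0], a[1] - b[1])
    area_change = abs(a[2] - b[2]) / max(a[2], b[2], 1e-9)
    return shift + area_change

def key_frames(frames, scale=1.0):
    """Indices of frames that differ enough from the last selected key frame."""
    if not frames:
        return []
    steps = [frame_change(frames[i], frames[i + 1])
             for i in range(len(frames) - 1)]
    # Dynamic threshold: adapts to how much this particular video moves.
    threshold = scale * (sum(steps) / len(steps)) if steps else 0.0
    keys = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        if frame_change(frames[keys[-1]], frames[i]) > threshold:
            keys.append(i)
    return keys

# Hypothetical gesture: the hand rests, jumps to the right, then rests again.
video = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0),
         (1.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(key_frames(video))
```

Only the selected frames would then be passed to feature extraction and the classifier, which is where the saving in computation comes from.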
In this paper we address the problem of hole filling in a point cloud of a 3D object. Even with the most popular 3D scanning devices, such as Microsoft Kinect and Time of Flight (ToF) cameras, occlusions during the scanning process result in missing regions, or holes, in the 3D data. We propose a framework for hole filling in a point cloud of a 3D object that uses the Riemannian metric tensor and Christoffel symbols as a set of geometric features, which capture the inherent geometry of the 3D object. The framework involves detection and extraction of the boundary points surrounding the hole, decomposition of the boundary points into basic shapes, and selective surface interpolation to fill the hole. We demonstrate the performance of the proposed method on point clouds of different complexities and sizes, for both synthetically generated holes and real regions missed during the capturing process, on 3D models of heritage sites.