ISBN (print): 9781479915880
Compact representation of visual content has emerged as an important topic in the context of large-scale image/video retrieval. The recently proposed Vector of Locally Aggregated Descriptors (VLAD) has been shown to outperform other existing techniques for retrieval. In this paper, we propose two spatio-temporal features for constructing VLAD vectors for videos in the context of large-scale video retrieval. Given a query video, our aim is to retrieve similar videos from the database. Experiments are conducted on the UCF50 and HMDB51 datasets, which pose challenges in the form of camera motion, viewpoint variation, large intra-class variation, etc. The paper proposes the following two spatio-temporal features for constructing VLADs: i) Local Histogram of Oriented Optical Flow (LHOOF), and ii) Space-Time Invariant Points (STIP). The performance of these proposed features is compared with that of a SIFT-based spatial feature. The mean average precision (MAP) indicates better retrieval performance for the proposed spatio-temporal features than for the spatial feature.
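The abstract does not spell out how a VLAD vector is built. The sketch below follows the standard VLAD recipe under stated assumptions: the visual-word centroids are assumed to be learned offline (e.g., by k-means), each local descriptor (LHOOF, STIP or SIFT) is hard-assigned to its nearest centroid, residuals are summed per centroid, and the concatenated vector is normalized. The signed power normalization (`alpha`) is a common later refinement that the abstract does not mention; `alpha=1.0` gives plain L2-normalized VLAD.

```python
import math

def nearest(centroids, x):
    """Index of the centroid closest to x in squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda k: sum((c - v) ** 2 for c, v in zip(centroids[k], x)))

def vlad(descriptors, centroids, alpha=0.5):
    """Aggregate a video's local descriptors into one VLAD vector.

    descriptors: list of D-dim local features (e.g. LHOOF, STIP or SIFT)
    centroids:   K visual words (D-dim), assumed learned offline, e.g. by k-means
    alpha:       exponent of the signed power normalization (1.0 disables it)
    """
    K, D = len(centroids), len(centroids[0])
    v = [[0.0] * D for _ in range(K)]
    for x in descriptors:
        k = nearest(centroids, x)
        for d in range(D):
            v[k][d] += x[d] - centroids[k][d]   # accumulate residual to the assigned word
    flat = [val for row in v for val in row]      # concatenate the K residual sums (K*D values)
    flat = [math.copysign(abs(val) ** alpha, val) for val in flat]
    norm = math.sqrt(sum(val * val for val in flat))
    return [val / norm for val in flat] if norm > 0 else flat
```

Because every VLAD vector is unit-length, two videos can be compared by the dot product of their vectors (cosine similarity); ranking the database by this score gives the retrieval list from which MAP is computed.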
The use of artificial intelligence has made life easier for farmers, as analysis of agricultural data allows farmers to make informed decisions, supported by large data sets and processed by machine learning algorithm...
ISBN (print): 9781467385640
Stamps and logos are generally used to authenticate the source of a document. For automatic document processing, identification and segmentation of stamps and logos are essential. In the past, methods to detect stamps and logos were limited to specific shapes, colors, or training data. However, stamps and logos can be of any shape or color. In this paper, we propose a novel stamp and logo detection technique. Our approach is based on the fact that stamps and logos, in general, are not the primary contents of a document. This fact motivates us to detect them as outliers in a feature space. Based on some geometric features, the detected outliers are then classified as stamps or logos. Our method performs well at separating them from text. Moreover, this technique is capable of detecting logos as well as chromatic and achromatic stamps.
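The abstract does not say which outlier detector is used. As a hedged illustration only, the sketch below swaps in a simple robust method, the modified z-score built from the median and the median absolute deviation (MAD), applied per dimension to one feature vector per connected component. The features themselves (e.g., component height and width) are hypothetical, and the subsequent geometric classification into stamps versus logos is not shown.

```python
import statistics

def robust_outliers(features, threshold=3.5):
    """Return indices of feature vectors that lie far from the bulk of the data.

    features:  one vector per connected component (hypothetical features,
               e.g. height and width); text dominates a page, so text
               components define the 'normal' distribution.
    threshold: cut-off on the modified z-score (3.5 is a common choice).
    """
    dims = len(features[0])
    outliers = set()
    for d in range(dims):
        col = [f[d] for f in features]
        med = statistics.median(col)
        mad = statistics.median(abs(v - med) for v in col)
        if mad == 0:
            continue  # no spread in this dimension; skip to avoid dividing by zero
        for i, v in enumerate(col):
            # modified z-score: 0.6745 * (x - median) / MAD
            if abs(0.6745 * (v - med) / mad) > threshold:
                outliers.add(i)
    return sorted(outliers)
```

Median-based statistics are used here because a large stamp or logo would itself inflate a mean and standard deviation, masking the very outlier being sought.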
ISBN (print): 9781467385640
One of the important requirements for a good object detector is a set of robust visual features. These features, extracted from reference images containing the desired object instance, are used to identify the objects in test images. In this paper, we propose a new feature set for object detection, called the Histogram of Radon Projections (HRP). To compute this feature descriptor, the image is first divided into smaller cells; for each cell, Radon transform values are calculated for different orientations, and weighted votes for each transform coefficient are accumulated into bins. These bin values are block-normalized and collected together to form the final descriptor. We use this descriptor for car detection in gray-scale images and pedestrian detection in RGB images. The performance of this descriptor is compared with that of HOG, and the new descriptor is found to perform better for both gray-scale and RGB images.
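The cell, orientation and block steps described above can be sketched in plain Python for a gray-scale image given as a 2D list. The Radon transform here is a simple pixel-driven discrete approximation, and two details are assumptions because the abstract leaves them open: each orientation gets one bin into which its projection coefficients vote with weight equal to their squared value (a plain sum would carry no information, since every projection sums to the total cell intensity), and blocks are 2x2 cells with L2 normalization, as in HOG.

```python
import math

def radon_cell(cell, n_orient):
    """Pixel-driven discrete Radon transform of one cell.
    Returns one projection (list of line sums) per orientation in [0, pi)."""
    h, w = len(cell), len(cell[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    n_bins = int(math.ceil(math.hypot(h, w))) + 1
    center = n_bins // 2
    projections = []
    for k in range(n_orient):
        theta = math.pi * k / n_orient
        ct, st = math.cos(theta), math.sin(theta)
        proj = [0.0] * n_bins
        for r in range(h):
            for c in range(w):
                t = (c - cx) * ct + (r - cy) * st   # signed offset of the pixel along theta
                proj[int(math.floor(t + center + 0.5))] += cell[r][c]
        projections.append(proj)
    return projections

def cell_histogram(cell, n_orient=8):
    """One bin per orientation; each coefficient votes with its squared value (assumed)."""
    return [sum(v * v for v in proj) for proj in radon_cell(cell, n_orient)]

def hrp_descriptor(image, cell_size=4, n_orient=8, eps=1e-6):
    """HOG-style layout: cells -> per-cell histograms -> 2x2 block L2 normalization."""
    rows, cols = len(image) // cell_size, len(image[0]) // cell_size
    hist = [[cell_histogram([row[j * cell_size:(j + 1) * cell_size]
                             for row in image[i * cell_size:(i + 1) * cell_size]], n_orient)
             for j in range(cols)] for i in range(rows)]
    desc = []
    for i in range(rows - 1):
        for j in range(cols - 1):
            block = hist[i][j] + hist[i][j + 1] + hist[i + 1][j] + hist[i + 1][j + 1]
            norm = math.sqrt(sum(v * v for v in block) + eps ** 2)
            desc.extend(v / norm for v in block)
    return desc
```

A vertical stroke concentrates all of its mass into a single bin of the 0-radian projection, so that orientation's squared-vote bin dominates, which is the orientation selectivity a HOG-like descriptor needs.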
ISBN (print): 9781467385640
Purely data-driven large-scale image classification has been achieved using various feature descriptors such as SIFT and HOG. A major milestone in this regard is Convolutional Neural Network (CNN) based methods, which learn optimal feature descriptors as filters. Little attention has been given to the use of domain knowledge. Ontology plays an important role in learning to categorize images into abstract classes where there may not be a clear visual connection between category and image, for example identifying image mood (happy, sad, neutral). Our algorithm combines a CNN and ontology priors to infer abstract patterns in Indian monument images. We use a transfer-learning-based approach in which knowledge of the domain is transferred to the CNN during training (top-down transfer), and inference is made using the CNN prediction and the ontology tree/priors (bottom-up transfer). We classify images into categories such as Tomb, Fort, and Mosque. We demonstrate that our method improves remarkably over a logistic classifier and other transfer learning approaches. We conclude with remarks on possible applications of the model and a note on scaling it to a bigger ontology.
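The abstract leaves open how the CNN prediction and the ontology priors are combined during bottom-up inference. One simple possibility, shown purely as a hypothetical sketch, is to marginalize the CNN's probabilities over intermediate concepts through ontology edges weighted by P(category | concept). The concept names and edge weights used below are invented for illustration.

```python
def ontology_posterior(cnn_probs, ontology):
    """Bottom-up inference: push CNN concept probabilities through an ontology
    whose edges store P(category | concept), then renormalize.

    cnn_probs: {concept: probability} from the CNN softmax (hypothetical concepts)
    ontology:  {concept: {category: P(category | concept)}} (hypothetical priors)
    """
    scores = {}
    for concept, p in cnn_probs.items():
        for category, w in ontology.get(concept, {}).items():
            # total-probability rule: sum over concepts of P(concept) * P(category | concept)
            scores[category] = scores.get(category, 0.0) + p * w
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()} if total > 0 else scores
```

For instance, with hypothetical CNN outputs {dome: 0.6, minaret: 0.3, rampart: 0.1} and edges dome to {Tomb: 0.5, Mosque: 0.5}, minaret to {Mosque: 1.0} and rampart to {Fort: 1.0}, the posterior is Tomb 0.3, Mosque 0.6 and Fort 0.1.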
ISBN (print): 9781467385640
Lung tumor estimation on imaging modalities is required to assess the extent of the tumor for diagnosis. Segmentation of tumors in Cone-Beam Computed Tomography (CBCT) images is non-trivial due to imaging artifacts. Here we propose a novel technique for registering 18F-fluorodeoxyglucose Positron Emission Tomography (PET) and Computed Tomography (CT) images with CBCT images. The computation is performed in two stages. In the first stage, mutual-information-based rigid image registration is performed to obtain a rough global alignment of the CBCT image with the corresponding PET and CT images. This result is fed to the second stage, which performs deformable image registration between a pair of corresponding CBCT volumes of the same patient captured at different time instances, using a viscous fluid model. The technique is adapted to both 2D space (for slice-wise computation) and 3D space (for computing with volumes), and a comparative performance evaluation is presented using a simulated deformation model.
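The first stage relies on mutual information (MI) as its similarity measure. A minimal sketch, assuming intensities pre-scaled to [0, 1] and a fixed joint-histogram bin count, estimates MI from a joint histogram and then searches exhaustively over circular integer shifts. The shift search is a simplified, translation-only stand-in for full rigid registration (no rotation, no interpolation), and the viscous-fluid deformable stage is not shown.

```python
import math
from collections import Counter

def mutual_information(a, b, bins=16, lo=0.0, hi=1.0):
    """Joint-histogram estimate of I(A;B) in nats for two equally sized images.

    a, b: flattened intensity lists with values in [lo, hi] (assumed pre-scaled)
    """
    def q(v):  # quantize an intensity to a bin index in [0, bins - 1]
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)
    n = len(a)
    joint = Counter((q(x), q(y)) for x, y in zip(a, b))
    pa = Counter(q(x) for x in a)
    pb = Counter(q(y) for y in b)
    mi = 0.0
    for (i, j), c in joint.items():
        # p(i,j) * log(p(i,j) / (p(i) p(j))), written with raw counts
        mi += c / n * math.log(c * n / (pa[i] * pb[j]))
    return mi

def best_circular_shift(fixed, moving, bins=16):
    """Exhaustive search over circular integer shifts (a translation-only
    stand-in for rigid registration) that maximizes MI."""
    h, w = len(fixed), len(fixed[0])
    a = [v for row in fixed for v in row]
    best_mi, best = float("-inf"), (0, 0)
    for dy in range(h):
        for dx in range(w):
            b = [moving[(r + dy) % h][(c + dx) % w] for r in range(h) for c in range(w)]
            mi = mutual_information(a, b, bins)
            if mi > best_mi:
                best_mi, best = mi, (dy, dx)
    return best
```

MI is used instead of, say, squared intensity differences because it only requires a statistical relationship between intensities, which is what makes it suitable across modalities such as CBCT, CT and PET.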
ISBN (print): 9798400710759
Thoracic trauma often results in rib fractures, which demand swift and accurate diagnosis for effective treatment. However, detecting these fractures on rib CT scans poses considerable challenges, since it involves analyzing many image slices in sequence. Despite notable advances in algorithms for automated fracture segmentation, challenges persist because of the diverse shapes and sizes of these fractures. To address these issues, this study introduces a deep-learning model with an auxiliary classification task designed to improve the accuracy of rib fracture segmentation. The auxiliary classification task is crucial for distinguishing fractured ribs from negative regions, which include non-fractured ribs and surrounding tissues, in patches obtained from CT scans. By leveraging this auxiliary task, the model aims to improve the feature representation at the bottleneck layer by highlighting the regions of interest. Experimental results on the RibFrac dataset demonstrate a significant improvement in segmentation performance.
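The abstract does not give the training objective. A common way to train a segmentation network with an auxiliary classification head is to add a weighted classification term to the segmentation loss. The sketch below assumes a soft Dice loss for the voxel mask, binary cross-entropy for the patch label, a weight `lam` of 0.5, and a patch label derived from the mask (positive if any voxel is fractured); all of these are illustrative choices, not the study's reported settings. The network itself, with its classification head on the bottleneck features, is omitted.

```python
import math

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss over flattened voxel probabilities and a binary mask."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for the patch-level label (1 = fractured rib)."""
    p = min(max(p, eps), 1.0 - eps)  # clip to keep both logarithms finite
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def total_loss(seg_pred, seg_mask, cls_prob, lam=0.5):
    """Segmentation loss plus the weighted auxiliary classification loss.
    The patch label is derived from the mask: positive if any voxel is fractured."""
    cls_label = 1 if any(seg_mask) else 0
    return dice_loss(seg_pred, seg_mask) + lam * bce(cls_prob, cls_label)
```

Since both terms share the encoder, gradients from the classification term shape the bottleneck features as well, which is the mechanism the abstract credits for highlighting regions of interest.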
ISBN (print): 9781479915880
Online handwriting recognition research has recently received a significant thrust. For Indian scripts in particular, handwriting recognition received little attention until the recent past. However, owing to generous Government funding through the Technology Development for Indian Languages (TDIL) group of the Ministry of Communication & Information Technology (MC&IT), Govt. of India, research in this area has received due attention, and several groups are now engaged in research and development work on online handwriting recognition for different Indian scripts. A major bottleneck to the desired progress in this area is the difficulty of collecting large sample databases of online handwriting in various scripts. To address this, a user-friendly tool for the Android platform has recently been developed to collect data on handheld devices. This tool, called ISIgraphy, has been uploaded to Google Play for free download. The application is designed to store large numbers of handwritten data samples under user-given file names for distinct users. Its use is script-independent: it can collect and store handwriting samples written in any language, not necessarily an Indian script. It has an additional module for retrieving and displaying stored data. Moreover, it can send the collected data directly to others via e-mail.
ISBN (print): 9781450347532
Music transcription refers to the process of analyzing a piece of music to generate a sequence of its constituent notes and their durations. Transcription of music from audio signals is fraught with problems due to auditory interference such as ambient noise, multiple instruments playing simultaneously, accompanying vocals, or polyphonic sounds. For several instruments, additional information for music transcription can be derived from a video sequence of the instrument as it is being played. This paper proposes a method that uses this visual information for keyboard-like instruments to generate a transcript automatically by analyzing the video frames. We present encouraging results under varying lighting conditions on different song sequences played on a keyboard.
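Once the video frames have been analyzed, the transcript is a list of notes with onsets and durations. The sketch below shows only that final step, assuming a hypothetical per-frame detector has already produced the set of keys pressed in each frame: a key's note starts in the frame where it first appears and ends in the frame where it disappears.

```python
def frames_to_notes(pressed, fps):
    """Turn per-frame pressed-key sets into (key, onset_seconds, duration_seconds).

    pressed: list where pressed[t] is the set of key indices detected as down in
             frame t (hypothetical output of a per-frame key-press detector)
    fps:     video frame rate in frames per second
    """
    notes, onset = [], {}
    for t, keys in enumerate(pressed + [set()]):  # empty sentinel frame closes open notes
        for k in keys - set(onset):               # newly pressed key: note onset
            onset[k] = t
        for k in set(onset) - keys:               # released key: note offset
            start = onset.pop(k)
            notes.append((k, start / fps, (t - start) / fps))
    return sorted(notes, key=lambda n: (n[1], n[0]))
```

For example, at 10 frames per second the frame sequence [{60}, {60, 64}, {64}, set(), {67}] yields (60, 0.0, 0.2), (64, 0.1, 0.2) and (67, 0.4, 0.1).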
ISBN (print): 9781450366151
Image registration is an essential step in many computer vision applications, and it demands high accuracy under significantly random and complex deformations. In medical image processing applications, image registration is a basic preprocessing step that non-rigidly aligns images from different acquisition environments to an atlas image. We propose a novel non-rigid image registration method that achieves more reliable registration even under noisy conditions, with manageable time complexity. The proposed method exploits the inherent multi-resolution capability of wavelets to perform non-rigid registration in a graph environment. The computational cost inherent in calculating the wavelet feature maps is avoided by approximating the wavelet operators with Chebyshev polynomials.
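The Chebyshev approximation mentioned above avoids eigendecomposing the graph Laplacian L: a spectral kernel g applied as g(L)f is approximated by a truncated Chebyshev series evaluated with the three-term recurrence, so only matrix-vector products are needed. The sketch below is the standard form of this approximation, not the paper's code; the kernel `g`, the order, and the Gershgorin bound used for the largest eigenvalue are assumptions. For a wavelet at scale s one might pass a band-pass kernel such as `lambda x: s * x * math.exp(-s * x)`.

```python
import math

def cheb_coeffs(g, order, lmax, n_points=64):
    """Chebyshev coefficients of g on [0, lmax] via Gauss-Chebyshev quadrature."""
    coeffs = []
    for k in range(order + 1):
        s = 0.0
        for j in range(n_points):
            theta = math.pi * (j + 0.5) / n_points
            s += math.cos(k * theta) * g(lmax / 2 * (math.cos(theta) + 1))
        coeffs.append(2.0 / n_points * s)
    return coeffs

def matvec(M, v):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def cheb_filter(L, f, g, order=20, lmax=None):
    """Approximate g(L) f for a combinatorial graph Laplacian L (order >= 1)."""
    n = len(L)
    if lmax is None:
        # Gershgorin bound: every eigenvalue of L = D - W is at most 2 * max degree
        lmax = max(2 * L[i][i] for i in range(n))
    c = cheb_coeffs(g, order, lmax)
    a = lmax / 2

    def shifted(v):  # (L - a I) v / a maps the spectrum [0, lmax] onto [-1, 1]
        Lv = matvec(L, v)
        return [(x - a * y) / a for x, y in zip(Lv, v)]

    t_prev, t_cur = f[:], shifted(f)  # T0 applied to f, T1 applied to f
    out = [c[0] / 2 * x + c[1] * y for x, y in zip(t_prev, t_cur)]
    for k in range(2, order + 1):
        t_next = [2 * x - z for x, z in zip(shifted(t_cur), t_prev)]  # T_k = 2 L' T_{k-1} - T_{k-2}
        out = [o + c[k] * x for o, x in zip(out, t_next)]
        t_prev, t_cur = t_cur, t_next
    return out
```

Each extra order costs one sparse matrix-vector product in a practical implementation, which is where the savings over a full eigendecomposition come from on large image graphs.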