A perceptual video hashing function maps the perceptual content of a video into a fixed-length binary string called the perceptual hash. Perceptual hashing is a promising solution to the content-identification and content-authentication problems. Projections of image and video data onto a subspace have been exploited in the literature to obtain compact hash functions. We propose a new perceptual video hashing algorithm based on Achlioptas's random projections. Simulation results show that the proposed perceptual hash function is robust to common signal- and image-processing attacks.
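The sketch below illustrates hashing by Achlioptas-style random projection. It is not the authors' algorithm. Two choices are assumptions: the input is a hypothetical feature vector (for example, a flattened, downsampled luminance signal of a video), and each projection is binarized against the median projection. The seed stands in for a secret key. The sparse projection matrix uses Achlioptas's distribution: entries sqrt(3) times +1, 0, or -1, with probabilities 1/6, 2/3, and 1/6.

```python
import math
import random

def achlioptas_matrix(k, d, seed=0):
    """k x d matrix with entries sqrt(3) * {+1, 0, -1} drawn with
    probabilities {1/6, 2/3, 1/6} (Achlioptas's sparse distribution)."""
    rng = random.Random(seed)  # assumption: the seed plays the role of a secret key
    s = math.sqrt(3.0)
    rows = []
    for _ in range(k):
        row = []
        for _ in range(d):
            u = rng.random()
            if u < 1 / 6:
                row.append(s)
            elif u < 5 / 6:
                row.append(0.0)
            else:
                row.append(-s)
        rows.append(row)
    return rows

def perceptual_hash(features, k=32, seed=0):
    """Project a (hypothetical) feature vector onto k random directions and
    binarize each projection against the median projection (assumed rule)."""
    R = achlioptas_matrix(k, len(features), seed)
    proj = [sum(r_i * x_i for r_i, x_i in zip(row, features)) for row in R]
    med = sorted(proj)[k // 2]
    return [1 if p > med else 0 for p in proj]

def hamming(a, b):
    """Number of differing bits; small distances suggest perceptually similar content."""
    return sum(x != y for x, y in zip(a, b))
```

Median thresholding makes the hash invariant to multiplying all features by a positive constant, such as a global gain change, because every projection is scaled by the same factor.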
Automatic segmentation of brain tumors from magnetic resonance images is a challenging task due to the wide variation in the intensity, size, and location of tumors in images. Defining a precise boundary for a tumor is essent...
A tone mapping operator converts high dynamic range (HDR) images into low dynamic range (LDR) images that can be shown on LDR displays. Considerable research has been directed toward an optimal tone mapping ...
We propose an evolving scheme to detect slow- as well as fast-moving objects in a video sequence. The proposed scheme employs both spatio-temporal and temporal segmentation to obtain the video object plane and hence perform detection. We propose a compound Markov random field (MRF) model as the a priori image model; it takes into account the spatial distribution of the current frame, the temporal frames, and the edge maps of the temporal frames. The spatio-temporal segmentation is cast as a pixel-labeling problem, and the labels are the maximum a posteriori (MAP) estimates. The MAP estimates of a frame are obtained by a hybrid algorithm. The spatial segmentation of a given frame evolves to generate the spatial segmentation of the subsequent frames. The evolved spatial segmentation, together with the temporal segmentation, produces the Video Object Plane (VOP) and hence detection. Our scheme requires the spatio-temporal segmentation to be computed only for the initial frame, which speeds up the whole process. The results of the proposed scheme are compared with those of the JSEG method and are found to be better in terms of misclassification error.
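To show what "labels as MAP estimates under an MRF prior" means in practice, here is a minimal single-frame sketch. It is a deliberately simpler, swapped-in technique. It uses a Gaussian likelihood with a Potts (pairwise) prior on 4-neighbours and optimizes with iterated conditional modes (ICM). It is not the compound MRF model and not the hybrid algorithm described above. The class means, `sigma`, and `beta` are assumed inputs.

```python
def icm_segment(img, means, sigma=1.0, beta=1.0, iters=5):
    """Approximate MAP labeling by ICM (assumed Gaussian + Potts model):
    each pixel takes the label l minimizing
    (x - means[l])^2 / (2 sigma^2) + beta * (# 4-neighbours labeled differently)."""
    h, w = len(img), len(img[0])
    # Start from the maximum-likelihood labels (nearest class mean).
    labels = [[min(range(len(means)), key=lambda l: (img[i][j] - means[l]) ** 2)
               for j in range(w)] for i in range(h)]
    for _ in range(iters):
        changed = False
        for i in range(h):
            for j in range(w):
                nbrs = [labels[a][b] for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                        if 0 <= a < h and 0 <= b < w]

                def energy(l):
                    data = (img[i][j] - means[l]) ** 2 / (2 * sigma ** 2)
                    prior = beta * sum(n != l for n in nbrs)
                    return data + prior

                best = min(range(len(means)), key=energy)
                if best != labels[i][j]:
                    labels[i][j] = best
                    changed = True
        if not changed:  # converged to a local minimum of the energy
            break
    return labels
```

The prior term relabels an isolated pixel whose intensity slightly favors the "object" class to match its background neighbours. This is the smoothing effect the MRF prior contributes.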
Entropy of order q (which depends on the information contained in a sequence of q gray levels) and the conditional entropy of an image are defined. Using these definitions, two algorithms are formulated and implemented with the help of the image's co-occurrence matrix. Their superiority for image thresholding (object-background classification) is established.
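The sketch below shows how a co-occurrence matrix supports entropy-based thresholding. It uses a related, simpler criterion than the order-q and conditional entropies defined above. A threshold t splits the matrix into quadrants, and the code picks the t that maximizes the entropy of the background-background quadrant (both levels at most t) plus the object-object quadrant (both levels above t). Counting only right and down neighbours is an assumption of the sketch.

```python
import math

def cooccurrence(img, levels):
    """Count horizontal (right) and vertical (down) neighbour transitions i -> j."""
    t = [[0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    for r in range(h):
        for c in range(w):
            if c + 1 < w:
                t[img[r][c]][img[r][c + 1]] += 1
            if r + 1 < h:
                t[img[r][c]][img[r + 1][c]] += 1
    return t

def quadrant_entropy(t, rows, cols):
    """Shannon entropy (nats) of the normalized sub-block t[rows][cols]."""
    total = sum(t[i][j] for i in rows for j in cols)
    if total == 0:
        return 0.0
    h = 0.0
    for i in rows:
        for j in cols:
            if t[i][j]:
                p = t[i][j] / total
                h -= p * math.log(p)
    return h

def entropy_threshold(img, levels):
    """Return the t maximizing background-background plus object-object quadrant entropy."""
    t = cooccurrence(img, levels)
    best_t, best_h = 0, -1.0
    for th in range(levels - 1):
        low, high = range(th + 1), range(th + 1, levels)
        h = quadrant_entropy(t, low, low) + quadrant_entropy(t, high, high)
        if h > best_h:
            best_t, best_h = th, h
    return best_t
```

For a 4 x 4 image whose left half alternates between levels 0 and 1 and whose right half alternates between 2 and 3, this criterion selects t = 1. That threshold separates the two halves.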
We consider here a change detection problem: finding regions of change in a test image with respect to a reference image. Unlike state-of-the-art change detection and background subtraction algorithms that compute only local (pixel-location-based) changes, we propose to minimize a novel region-based energy functional based on the Bhattacharyya coefficient between histograms of image features. If a crude segmentation, such as a bounding box around the region of change, is sufficient, optimizing the proposed energy functional reduces to two very efficient searches. The functional also permits variational optimization via level-set-based curve evolution for supervised binary image labeling. The framework is shown to cope well with considerable camera motion and with shifts of objects between the test and reference images. We show encouraging results on finding bounding boxes around abnormalities in brain MRI, object detection for maritime surveillance, and segmenting oil-sand particles in conveyor-belt images.
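The Bhattacharyya coefficient between normalized histograms p and q is BC = sum_k sqrt(p_k * q_k). It equals 1 for identical histograms and 0 for histograms with disjoint support. The minimal sketch below uses it to score candidate boxes. It is a brute-force stand-in, not the two efficient searches or the energy functional described above. Three assumptions apply: fixed-size square boxes, plain intensity histograms with `bins` bins, and windows compared at the same location in both images. The last assumption means that, unlike the framework above, it does not handle camera motion.

```python
import math

def histogram(img, top, left, size, bins, max_val):
    """Normalized intensity histogram of the size x size window at (top, left)."""
    counts = [0] * bins
    for r in range(top, top + size):
        for c in range(left, left + size):
            counts[min(img[r][c] * bins // (max_val + 1), bins - 1)] += 1
    n = size * size
    return [k / n for k in counts]

def bhattacharyya(p, q):
    """BC = sum_k sqrt(p_k q_k); 1 for identical histograms, 0 for disjoint ones."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def most_changed_box(ref, test, size, bins=8, max_val=255):
    """Exhaustive search for the aligned window whose test/reference histograms
    overlap least (lowest BC). Returns (bc, top, left)."""
    h, w = len(ref), len(ref[0])
    best = None
    for top in range(h - size + 1):
        for left in range(w - size + 1):
            bc = bhattacharyya(histogram(ref, top, left, size, bins, max_val),
                               histogram(test, top, left, size, bins, max_val))
            if best is None or bc < best[0]:
                best = (bc, top, left)
    return best
```

Because the score compares histograms rather than individual pixels, any rearrangement of the pixels inside a window leaves it unchanged. This is one reason a region-based criterion tolerates small misalignments better than pixel-wise differencing.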
We propose a way to incorporate a priori information into a 3D stereo reconstruction process from a pair of calibrated face images. A 3D mesh modeling the surface is iteratively deformed to minimize an energy function. Differential information about the object shape is used to generate an adaptive mesh that can meet both compactness and accuracy requirements. Moreover, in areas where the stereo information is not reliable enough to recover the surface shape accurately, because of inappropriate texture or poor lighting conditions, we incorporate geometric constraints related to the differential properties of the surface; these constraints can be intuitive or can refer to predefined geometric properties of the object to be reconstructed. They can be applied to scalar fields, such as curvature values, or to structural features, such as crest lines. As a result, we generate a 3D face model, using computer vision techniques, that is compact, accurate, and consistent with the a priori knowledge about the underlying surface.
Segmentation of cursive handwriting is one of the most challenging problems in handwritten character recognition. In this paper, we propose a novel approach to character segmentation in a handwritten document. It is based on the vertex characterization of outer isothetic polygonal covers, such that each cover corresponds to a particular word or part of a word. The proposed method has the potential to segment skewed text without deskewing it. Experiments were conducted on several Bangla handwriting samples from different individuals, with an average success rate of 96.04%. This method can be considered a significant preprocessing step toward the development of a handwritten Bangla OCR system.
In this paper, we present algorithms for resizing and transcoding images in the transform domain. The approach is based on wavelet filtering (analysis/synthesis) combined with downsampling/upsampling operations in the block DCT space. We use linear filtering in the block DCT domain to perform convolution. To obtain results equivalent to linear convolution, filtering is performed on three adjacent blocks. To reduce complexity, we perform the sampling-rate change and the filtering in a single combined step. The proposed approach achieves better quality than the DCT-domain technique it is compared with.
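The two complexity arguments in this abstract can be illustrated in one dimension and in the pixel domain. First, filtering a block with a short filter needs samples only from its two neighbouring blocks. Second, filtering and decimation merge into one step by evaluating the convolution only at the retained positions. The sketch assumes an odd-length filter whose half-length is at most the block length, and an even block length. It does not reproduce the DCT-domain formulation. Because the block DCT is an invertible linear transform, an equivalent operator could in principle be applied to the DCT coefficients instead.

```python
def conv_same(x, h):
    """Centred linear convolution of x with an odd-length filter h (zero padding)."""
    r = len(h) // 2
    n = len(x)
    return [sum(h[k] * x[i + r - k] for k in range(len(h)) if 0 <= i + r - k < n)
            for i in range(n)]

def filter_decimate_block(prev_blk, blk, next_blk, h):
    """Low-pass filter the middle block and keep every second sample in one step,
    using only the three adjacent blocks (assumes len(h) // 2 <= len(blk) and
    an even block length, so kept samples are even-indexed globally)."""
    B = len(blk)
    x = prev_blk + blk + next_blk
    r = len(h) // 2
    # Evaluate the convolution only at the positions that survive decimation,
    # halving the number of multiply-adds compared with filter-then-decimate.
    return [sum(h[k] * x[i + r - k] for k in range(len(h))) for i in range(B, 2 * B, 2)]
```

For interior blocks, the result matches filtering the whole signal and then discarding every other sample: `filter_decimate_block(s[0:8], s[8:16], s[16:24], h) == conv_same(s, h)[8:16:2]`.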
ISBN (print): 9798400716256
A Visual Question Answering (VQA) system responds to a natural-language question in the context of an image. This problem has primarily been formulated as a classification problem, with the answers forming a finite set of classes. Thus, the generated response consists of a single word or a short phrase, which limits the linguistic capabilities of such a system. In contrast, this work presents Sentence-based VQA (S-VQA), which responds to questions with complete sentences as answers. The first contribution of this work is a dataset derived from the Task Directed Image Understanding Challenge (TDIUC) VQA dataset using natural language rules and pretrained paraphrasers. This new dataset is referred to as TDIUC-SVQA. The second contribution is the performance evaluation of multiple models on the TDIUC-SVQA dataset. This evaluation uses two multi-modal models, in which the Bottom-Up Top-Down Attention-based VQA model is combined with an LSTM decoder and with an Attention-on-Attention Network for answer generation. The proposed models are observed to provide improved results compared to the baseline model.
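Rule-based conversion of short answers into sentences can look roughly like the minimal sketch below. The three question patterns and the output templates are hypothetical illustrations. They are not the rules used to build TDIUC-SVQA, and the pretrained paraphrasing step is not reproduced. Any question that no rule matches falls back to a generic template.

```python
import re

def to_sentence(question, answer):
    """Turn a (question, short answer) pair into a full-sentence answer
    using a few hypothetical, hand-written rules."""
    q = question.strip().rstrip("?").strip()
    m = re.match(r"(?i)what colou?r is the (.+)$", q)
    if m:
        return f"The {m.group(1)} is {answer}."
    m = re.match(r"(?i)how many (.+?) are (?:there|in the (?:image|picture|photo))$", q)
    if m:
        return f"There are {answer} {m.group(1)} in the image."
    m = re.match(r"(?i)is there (an?) (.+?)(?: in the (?:image|picture|photo))?$", q)
    if m:
        article, obj = m.group(1).lower(), m.group(2)
        if answer.strip().lower() == "yes":
            return f"Yes, there is {article} {obj} in the image."
        return f"No, there is no {obj} in the image."
    return f"The answer is {answer}."  # fallback when no rule matches
```

Hand-written templates like these produce rigid phrasing and ignore details such as number agreement (for example, an answer of "1" to a "how many" question). A real dataset pipeline would need to handle such cases.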