ISBN (digital): 9783642538421
ISBN (print): 9783642538414
This book constitutes the thoroughly refereed post-conference proceedings of the 6th Pacific Rim Symposium on Image and Video Technology, PSIVT 2013, held in Guanajuato, México in October/November 2013. The 43 revised papers were carefully reviewed and selected from 90 submissions. The papers are organized in topical sections on image/video processing and analysis, image/video retrieval and scene understanding, applications of image and video technology, biomedical image processing and analysis, biometrics and image forensics, computational photography and arts, computer and robot vision, pattern recognition and video surveillance.
ISBN (print): 9781450366151
This article presents an algorithm for salient object detection that leverages the Bayesian surprise of a Restricted Boltzmann Machine (RBM). An RBM is trained on patches sampled randomly from the input image. Because of this random sampling, the RBM is likely to be exposed to more background patches than object patches. The trained RBM therefore minimizes the free energy of its hidden states with respect to the background patches rather than the object. According to the free energy principle, this implies minimizing Bayesian surprise, a measure of saliency based on the Kullback-Leibler divergence between the input and reconstructed patch distributions. Hence, when the trained RBM is exposed to patches from the object region, it yields a high divergence and, in turn, a high Bayesian surprise, so pixels with high Bayesian surprise can be considered salient. For each pixel, a neighborhood (of the same size as the training patch) is fed to the trained RBM to obtain the reconstructed patch. The Kullback-Leibler divergence between the input and reconstructed neighborhood of each pixel is then computed to measure the Bayesian surprise and is stored at the corresponding position in a matrix to form the saliency map. Experiments are carried out on three datasets, namely MSRA-10K, ECSSD and DUTS. The results indicate promising performance for the proposed approach.
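The per-pixel step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `reconstruct` is a hypothetical stand-in for the trained RBM's reconstruction pass, patches are normalized to sum to 1 so they can be treated as distributions, `patch_size` is assumed odd, and borders are handled by clamping coordinates. None of these details are given in the abstract.

```python
import math

def kl_divergence(p_patch, q_patch, eps=1e-8):
    """KL(P || Q) between two flat patches, each normalized to sum to 1.
    eps is a smoothing constant (an assumption) that avoids log(0)."""
    p_sum = sum(p_patch) + eps * len(p_patch)
    q_sum = sum(q_patch) + eps * len(q_patch)
    kl = 0.0
    for p, q in zip(p_patch, q_patch):
        p_n = (p + eps) / p_sum
        q_n = (q + eps) / q_sum
        kl += p_n * math.log(p_n / q_n)
    return kl

def saliency_map(image, reconstruct, patch_size):
    """image: 2D list of non-negative intensities.
    reconstruct: callable mapping a flat patch to a flat reconstructed patch
    (hypothetical stand-in for the trained RBM).
    Returns a 2D list holding the KL divergence for each pixel's neighborhood."""
    h, w = len(image), len(image[0])
    r = patch_size // 2
    sal = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            patch = []
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp at the borders
                    xx = min(max(x + dx, 0), w - 1)
                    patch.append(image[yy][xx])
            sal[y][x] = kl_divergence(patch, reconstruct(patch))
    return sal
```

With a reconstruction that has only learned a flat background, a neighborhood containing an unusual pixel yields a larger divergence than a uniform neighborhood, which is the behavior the abstract relies on.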
Author: Qaffou, Issam
LISI Laboratory, Faculty of Science Semlalia, Cadi Ayyad University, Computer Science Department, Marrakech, Morocco
Selecting optimal parameter values in a computer vision task is a challenge for users because of the multitude of possible cases. This paper discusses the use of artificial intelligence methods and techniques to...
ISBN (print): 9781450347532
Autonomous navigation of a generic monocular quadcopter in an indoor environment requires sophisticated approaches for perception, planning and control. This paper presents a system that enables a miniature quadcopter with a frontal monocular camera to autonomously navigate and explore an unknown indoor environment. Initially, the system estimates a dense depth map of the environment from a single video frame using our proposed novel supervised Hierarchical Structured Learning (HSL) technique, which yields both high accuracy and better generalization. The proposed HSL approach discretizes the overall depth range into multiple sets. It structures these sets hierarchically and recursively by partitioning the set of classes into two subsets, each representing an apportioned depth range of the parent set, forming a binary tree. A binary classifier is applied separately at each internal node of the binary tree using a Support Vector Machine (SVM). The depth estimation of each pixel of the image starts from the root node and proceeds top-down, classifying repeatedly until it reaches a leaf node representing the estimated depth. The generated depth map is provided as input to a Convolutional Neural Network (CNN), which generates flight planning commands. Finally, a trajectory planning and control module employs a convex programming technique to generate a collision-free minimum-time trajectory that follows these flight planning commands and produces appropriate control inputs for the quadcopter. The results clearly show the advantages of depth perception by HSL, while repeated successful flights in typical indoor corridors confirm the efficacy of the pipeline.
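The hierarchical partitioning and top-down classification described above might look roughly like the following sketch. It is not the authors' HSL code: the even split of bins, the dictionary-based node layout, and the `classify` callable (a placeholder for the per-node SVM) are all assumptions made for illustration.

```python
def build_tree(bins):
    """Recursively split an ordered list of depth bins (lo, hi) into a binary tree.
    A leaf holds a single bin; an internal node holds its two halves and the
    depth value at which they are split (assumed: split at the midpoint index)."""
    if len(bins) == 1:
        return {"leaf": bins[0]}
    mid = len(bins) // 2
    return {
        "split": bins[mid][0],
        "left": build_tree(bins[:mid]),
        "right": build_tree(bins[mid:]),
    }

def predict_depth(node, features, classify):
    """Walk from the root to a leaf. classify(node, features) returns True to
    descend right and False to descend left; in the paper this decision is made
    by an SVM at each internal node, here it is any caller-supplied callable."""
    while "leaf" not in node:
        node = node["right"] if classify(node, features) else node["left"]
    return node["leaf"]

# Toy usage: a threshold rule stands in for the trained per-node SVMs.
depth_bins = [(0, 1), (1, 2), (2, 4), (4, 8)]
tree = build_tree(depth_bins)
toy_classifier = lambda node, f: f[0] >= node["split"]
estimated_bin = predict_depth(tree, [3.0], toy_classifier)
```

One property of this layout is that a pixel needs only about log2(N) binary decisions to reach one of N depth bins, rather than a single N-way decision.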
The objective of image fusion is to combine information from multiple images of the same scene in order to deliver only the useful information. The discrete cosine transform (DCT) based methods of image fusion are mor...
Automatically describing the contents of an image is one of the fundamental problems in artificial intelligence. Recent research has primarily focussed on improving the quality of the generated descriptions. It is pos...
ISBN (print): 9781467308762
We propose an image search system for flowers on a mobile phone. Mobile phones have more limited resources than desktop computers in terms of CPU, RAM and data storage. The database that we created has 45 classes. We used 182 training images and 246 test images. We used an HSV histogram as a color feature. The accuracy rate using only the color feature was 44.86% with radius C=20. We used SURF as a shape feature. The accuracy rate using only the shape-based feature was 47.31% with SURF vectors S=25. Combining both color and shape features achieved an accuracy of 61.61%.
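As a rough illustration of the color feature, an HSV histogram can be computed as sketched below. The bin counts, the histogram-intersection similarity, and the weighted-sum fusion in `combined_score` are assumptions chosen for the example; the abstract does not say how the histograms were binned or how the color and shape features were combined, and the parameters C and S are not modeled here.

```python
import colorsys

def hsv_histogram(pixels, h_bins=8, s_bins=4, v_bins=4):
    """pixels: iterable of (r, g, b) tuples with channels in 0..255.
    Returns a flat histogram over HSV bins, normalized to sum to 1."""
    hist = [0.0] * (h_bins * s_bins * v_bins)
    n = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hi = min(int(h * h_bins), h_bins - 1)  # clamp the value 1.0 into the last bin
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1.0
        n += 1
    return [c / n for c in hist] if n else hist

def histogram_intersection(a, b):
    """Similarity in [0, 1] between two normalized histograms."""
    return sum(min(x, y) for x, y in zip(a, b))

def combined_score(color_sim, shape_sim, weight=0.5):
    """Hypothetical late fusion: weighted sum of color and shape similarities."""
    return weight * color_sim + (1.0 - weight) * shape_sim
```

A coarse histogram of this kind keeps the per-image descriptor small, which suits the limited CPU, RAM and storage the abstract mentions for mobile phones.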
This paper presents an integrated model that uses machine learning techniques to perform text-to-text, image-to-text, and audio-to-text conversions, with a particular focus on Indian languages. The proposed model whic...
In this paper, new two-dimensional (TD) adaptive filter algorithms are introduced. The presented algorithms are TD variable step-size (VSS) normalized least mean squares (TD-VSS-NLMS) and TD-VSS affine projection a...
ISBN (print): 9798400710759
This paper presents a self-supervised learning (SSL) based framework specifically designed for Handwritten Mathematical Expression Recognition (HMER). The proposed approach incorporates a momentum encoding technique and a non-linear projection head into the image encoder to address the common issue of dimensional collapse in SSL methods. Our approach consists of two main steps: first, we use self-supervised pre-training to train the image encoder to obtain strong feature representations from HME images. Subsequently, we fine-tune the model using a Transformer network to predict LaTeX sequences from HME images. The assessment demonstrates that our SSL framework surpasses other existing SSL frameworks as well as several supervised methods in terms of performance. The findings indicate the potential of our approach to enhance the robustness and efficiency of feature representations in HMER tasks. The integration of momentum encoding and a non-linear projection head in the image encoder is shown to enhance the robustness and effectiveness of feature representations, leading to superior performance in HMER tasks. Our experiments reveal that our approach achieves an expression recognition rate (ExpRate) of 62.17%, 61.03%, and 64.8% on the CROHME 2014, 2016, and 2019 test datasets, respectively. The highest ExpRate is achieved on the CROHME 2019 test data and is state-of-the-art (SOTA). This success is achieved by overcoming the challenges of dimensional collapse and leveraging the advantages of both self-supervised and supervised learning.
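The abstract does not spell out the momentum encoding technique. One common reading, assumed here, is that a second encoder's parameters track the trained encoder as an exponential moving average. The sketch below shows only that update rule on plain lists of floats; the function name, the parameter layout, and the momentum value are illustrative and not taken from the paper.

```python
def momentum_update(target_params, online_params, m=0.99):
    """Return new target-encoder parameters as an exponential moving average:
    target <- m * target + (1 - m) * online.
    Both arguments are flat lists of floats (an assumption for this sketch)."""
    return [m * t + (1.0 - m) * o for t, o in zip(target_params, online_params)]

# Toy usage: the target slowly drifts toward the online parameters.
target = [0.0, 0.0]
online = [1.0, -2.0]
for _ in range(3):
    target = momentum_update(target, online, m=0.9)
```

With m close to 1 the target encoder changes slowly, which gives the pre-training step a stable reference representation.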