ISBN: 9789897584022 (print)
The process of adding geographical identification data to an image is called geotagging and is important for a range of applications, from tourism to law enforcement. The most convenient way of adding location metadata to an image is GPS geotagging. This article presents an alternative way of adding approximate location metadata to an urban scene image by finding similar images in a dataset of geotagged images. The matching is done by extracting image features and descriptors and matching them. The dataset consists of geotagged 360-degree panoramic images. We explored three methods of matching the images, each one being an iteration of the previous method. The first method used only feature detection and matching with AKAZE and FLANN, the second method performed image segmentation to provide a mask for extracting features and descriptors only from buildings, and the third method preprocessed the dataset to obtain better accuracy. We managed to improve the accuracy of the system by 25%. Following an in-depth analysis, we present the results as well as future improvements.
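The retrieval step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes binary descriptors have already been extracted and packed into integers (AKAZE's default descriptors are binary), uses brute-force Hamming matching as a stand-in for FLANN, and applies a ratio test that the abstract does not mention. The names `count_good_matches` and `locate` and the `ratio` threshold are hypothetical.

```python
from typing import Dict, List, Tuple

Descriptor = int  # assumption: one binary descriptor packed into an integer
GeoTag = Tuple[float, float]  # (latitude, longitude)


def hamming(a: Descriptor, b: Descriptor) -> int:
    """Number of differing bits between two packed binary descriptors."""
    return bin(a ^ b).count("1")


def count_good_matches(query: List[Descriptor],
                       candidate: List[Descriptor],
                       ratio: float = 0.7) -> int:
    """Count query descriptors whose nearest neighbour passes the ratio test."""
    good = 0
    for q in query:
        dists = sorted(hamming(q, c) for c in candidate)
        # Keep a match only if it is clearly better than the runner-up.
        if len(dists) >= 2 and dists[0] < ratio * dists[1]:
            good += 1
    return good


def locate(query: List[Descriptor],
           dataset: Dict[GeoTag, List[Descriptor]],
           ratio: float = 0.7) -> GeoTag:
    """Return the geotag of the dataset image with the most good matches."""
    return max(dataset,
               key=lambda tag: count_good_matches(query, dataset[tag], ratio))
```

The query image inherits the location of the best-matching dataset image, which is why the result is only an approximate geotag.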
ISBN: 9781479915880 (print)
One of the foremost requisites for human perception and computer vision tasks is an image with all objects in focus. Image fusion, as one solution, produces a clear fused image from several images of a scene acquired at different focus levels. In this paper, a novel framework for multi-focus image fusion is proposed, which is computationally simple since it operates only in the spatial domain. The proposed framework incorporates the fractal dimensions of the images into the fusion process. Extensive experiments on different multi-focus image sets demonstrate that it is consistently superior to conventional image fusion methods in terms of visual and quantitative evaluations.
ISBN: 9789897584022 (print)
Image captioning (IC) is the process of automatically augmenting an image with semantically laden descriptive text. While English IC has made remarkable strides in the past decade, very little work exists on IC for other languages. One possible solution to this problem is to bootstrap off existing English IC systems for image understanding and then translate the outcome into the required language. Unfortunately, as this paper shows, translated IC is lacking due to the error accumulation of the two tasks: IC and translation. In this paper, we address the problem of image captioning in Arabic. We propose an end-to-end model that directly transcribes images into Arabic text. Due to the lack of Arabic resources, we develop an annotated dataset for Arabic image captioning (AIC). We also develop a base model for AIC that relies on text translation from English image captions. The two models are evaluated with the new dataset, and the results show the superiority of our end-to-end model.
ISBN: 9781479915880 (print)
In all image processing applications, it is important to extract the appropriate information from an image. Often, however, the captured image is not clear enough to give the required information because of the imaging environment. Thus, it is essential to enhance the clarity of the image with post-processing techniques. Image deblurring is one such technique, removing blur from the captured image. This paper looks at the problem in a different way, addressing the deblurring of an image by solving an image super-resolution problem. The blurred image is first down-sampled and then fed to the super-resolution framework to produce the deblurred high-resolution image. In addition, the proposed approach states the requirement of edge preservation in the problem. The experimental results are comparable with those of existing image deblurring algorithms.
ISBN: 9781479915880 (print)
In this paper, we present a fast and efficient algorithm for regularization and resampling of triangular meshes generated by 3D reconstruction methods such as stereoscopy and laser scanning. We also present a scheme for efficient parallel implementation of the proposed algorithm and report the time gain with an increasing number of processor cores.
ISBN: 9781479915880 (print)
A perceptual video hashing function maps the perceptual content of a video into a fixed-length binary string called the perceptual hash. Perceptual hashing is a promising solution to the content-identification and content-authentication problems. Projections of image and video data onto a subspace have been exploited in the literature to obtain a compact hash function. We propose a new perceptual video hashing algorithm based on Achlioptas's random projections. Simulation results show that the proposed perceptual hash function is robust to common signal and image processing attacks.
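The idea of hashing by random projection can be sketched as follows. This is a minimal illustration, not the algorithm proposed in the paper: the abstract does not say how video features are extracted or how projections are quantized, so the sketch assumes a precomputed feature vector and one sign bit per projection. The matrix entries follow Achlioptas's sparse construction (sqrt(3) times +1, 0, or -1 with probabilities 1/6, 2/3, and 1/6). The function names and the `seed` acting as a key are hypothetical.

```python
import math
import random
from typing import List


def achlioptas_matrix(rows: int, cols: int, seed: int) -> List[List[float]]:
    """Sparse random matrix with entries sqrt(3) * {+1, 0, -1}."""
    rng = random.Random(seed)
    scale = math.sqrt(3.0)
    # Six equally likely slots give probabilities 1/6, 4/6, 1/6.
    values = [scale, 0.0, 0.0, 0.0, 0.0, -scale]
    return [[rng.choice(values) for _ in range(cols)] for _ in range(rows)]


def perceptual_hash(features: List[float],
                    matrix: List[List[float]]) -> List[int]:
    """Project the features and keep one sign bit per projection (assumption)."""
    return [1 if sum(m * f for m, f in zip(row, features)) >= 0 else 0
            for row in matrix]


def hamming_distance(a: List[int], b: List[int]) -> int:
    """Number of positions where two hashes differ."""
    return sum(x != y for x, y in zip(a, b))
```

Two videos would then be compared by the Hamming distance between their hashes, with a small distance suggesting the same perceptual content.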
ISBN: 9798400710759 (print)
We explore the applicability of spectrograms in deep learning applications and in guiding creative decisions. To this end, we propose Spectrogrand, a novel spectrogram-driven end-to-end generative AI pipeline that creates interesting audiovisuals from text prompts and incorporates lightweight computational creativity metrics. This process involves selecting a music piece to underpin the audiovisual, generating an album cover image for the music, and performing neural style transfer on spectrogram chunks to generate the frames for the audiovisual. To democratise the benefits of this pipeline, we open-source the tool, computational creativity metrics, and associated data (1).
A mathematical model of controlled motion (called the Dubins car in the literature on optimal control) is considered. This model is widely used to describe various motions: an airplane in a horizontal plane, a car, et...
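For reference, the standard Dubins car moves at constant speed v with heading θ and a bounded turn rate u: x' = v cos θ, y' = v sin θ, θ' = u, with |u| ≤ u_max. The sketch below integrates these equations exactly over one time step with constant control. It illustrates the textbook model only, since the abstract is truncated and the paper's exact formulation is not shown here. The function name `dubins_step` and the default values of `speed` and `u_max` are assumptions.

```python
import math
from typing import Tuple

State = Tuple[float, float, float]  # (x, y, heading in radians)


def dubins_step(state: State, u: float, dt: float,
                speed: float = 1.0, u_max: float = 1.0) -> State:
    """Advance the Dubins car by dt under a constant turn-rate control u."""
    x, y, theta = state
    u = max(-u_max, min(u_max, u))  # enforce the bound on the control
    if abs(u) < 1e-12:
        # Straight-line motion when the control is zero.
        return (x + speed * math.cos(theta) * dt,
                y + speed * math.sin(theta) * dt,
                theta)
    # Circular arc of radius speed / |u| when the control is non-zero.
    new_theta = theta + u * dt
    x += speed / u * (math.sin(new_theta) - math.sin(theta))
    y += speed / u * (math.cos(theta) - math.cos(new_theta))
    return (x, y, new_theta)
```

Holding the control at its maximum traces a circle of the minimum turning radius, speed / u_max.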
ISBN: 9789897584022 (print)
Text-based CAPTCHAs (completely automated public Turing tests to tell computers and humans apart) are widely used to prevent unauthorized access by bots. However, advances in image segmentation and character recognition techniques can be exploited for bot access; therefore, distorted characters that are difficult even for humans to recognize are often used. Thus, a new text-based CAPTCHA technology with anti-segmentation properties is required. In this study, we propose a CAPTCHA that uses stereoscopy based on binocular disparity. Because the character area and its background are generated with identical color patterns, it becomes impossible to extract the character regions if the left and right images are analyzed separately, which is a major advantage of our method. However, character regions can be extracted by disparity estimation or subtraction processing using both images; thus, to prevent such attacks, we intentionally add noise to the image. The parameters characterizing the amount of added noise are adjusted based on experiments with subjects wearing a head-mounted display to realize stereo vision. With optimal parameters, the recognition rate reaches 0.84; moreover, sufficient robustness against bot attacks is achieved.
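The principle of hiding a character in binocular disparity can be sketched with a classic random-dot stereogram. This is an illustration under stated assumptions, not the authors' generator: it assumes a binary random-dot texture, a single fixed `disparity` in pixels, and a character region given as a set of pixel coordinates, and it omits the added noise that the abstract describes as a defence against disparity estimation. The name `make_stereo_pair` and all of its parameters are hypothetical.

```python
import random
from typing import List, Set, Tuple

Image = List[List[int]]
Pixel = Tuple[int, int]  # (row, column)


def make_stereo_pair(height: int, width: int, mask: Set[Pixel],
                     disparity: int, seed: int) -> Tuple[Image, Image]:
    """Build left/right random-dot images where only `mask` is shifted."""
    rng = random.Random(seed)
    left = [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]
    right = [row[:] for row in left]
    # Positions the character occupies in the right image after the shift.
    shifted = {(r, c - disparity) for (r, c) in mask if c - disparity >= 0}
    # Copy the character's texture to its shifted position.
    for (r, c) in shifted:
        right[r][c] = left[r][c + disparity]
    # Pixels uncovered by the shift get fresh random dots.
    for (r, c) in mask - shifted:
        right[r][c] = rng.randint(0, 1)
    return left, right
```

Each image on its own is a random texture, while comparing the two reveals the shifted region, which is the attack that the added noise is meant to hinder.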
ISBN: 9781479915880 (print)
The Approximate Nearest-Neighbour Field (ANNF) has been an area of interest in recent research on a wide variety of topics in the graphics and multimedia community. Medical image processing has been relatively unaffected by these developments in ANNF computation, which were brought about by extremely efficient algorithms such as PatchMatch. In this paper, we use Generalized PatchMatch for optic disk detection in retinal images and show that, by making use of efficient ANNF computations, we are able to generate results with 98% accuracy in an average time of 0.5 sec. This is significantly faster than conventional optic disk detection methods, which average 95-97% accuracy with 3-5 sec average computation time.