ISBN: (print) 9781467379625
This paper describes an approach to recognizing gestures in digital images. Two shape descriptors are used: the histogram of oriented gradients (HOG) and Zernike invariant moments (ZIM). A feature vector combining the information from both descriptors is used to train and test a two-stage neural network, which performs the recognition. To evaluate the approach in a practical context, a dataset of 9600 images representing 40 different gestures (signs) from Brazilian Sign Language (Libras) was assembled. The approach achieved high recognition rates (hit rates), reaching a final average of 96.77%.
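As a rough illustration of the descriptor stage, the sketch below computes a magnitude-weighted orientation histogram for a single image cell, which is the basic building block of HOG, and concatenates it with a second descriptor. It is a simplified sketch and not the authors' implementation: the function names, the 9-bin unsigned layout, and the placeholder Zernike values are all assumptions, and block normalization and the neural network are omitted.

```python
import math


def cell_orientation_histogram(img, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees)."""
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    # Only interior pixels are visited so central differences stay in bounds.
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Clamp guards against an angle that rounds up to exactly 180.
            hist[min(int(angle // bin_width), n_bins - 1)] += magnitude
    return hist


def combine_descriptors(hog_features, zernike_features):
    """Concatenate both descriptors into one feature vector (assumed layout)."""
    return list(hog_features) + list(zernike_features)


# Hypothetical 4x4 cell whose intensity grows from left to right.
horizontal_ramp = [[10 * x for x in range(4)] for _ in range(4)]
hog = cell_orientation_histogram(horizontal_ramp)
zim = [0.31, 0.07]  # placeholder values, not real Zernike moments
feature_vector = combine_descriptors(hog, zim)
```

A purely horizontal intensity ramp puts all gradient energy into the first orientation bin, which makes the histogram easy to check by hand.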
ISBN: (print) 9781538606193
Deep Learning methods are currently the state of the art in many computer vision and image processing problems, in particular image classification. After years of intensive investigation, a few models matured and became important tools, including Convolutional Neural Networks (CNNs), Siamese and Triplet Networks, Auto-Encoders (AEs) and Generative Adversarial Networks (GANs). The field is fast-paced, and there is a lot of terminology to catch up on for those who want to venture into Deep Learning waters. This paper introduces the most fundamental concepts of Deep Learning for computer vision, in particular CNNs, AEs and GANs, including architectures, inner workings and optimization. We offer an updated description of the theoretical and practical knowledge needed to work with those models. After that, we describe Siamese and Triplet Networks, which are not often covered in tutorial papers, and review the literature on recent and exciting topics such as visual stylization, pixel-wise prediction and video processing. Finally, we discuss the limitations of Deep Learning for computer vision.
ISBN: (print) 9781509035687
This paper presents a novel 3D partial shape retrieval algorithm based on time-series analysis. Given a piece of a 3D shape, the proposed method encodes the shape descriptor given by the Heat Kernel Signature (HKS) as a time series, where time is taken to be the ordered sequence of vertices provided by the Fiedler vector. A similarity metric is then built using a well-known tool in time-series analysis, the Cross Recurrence Plot (CRP). The good performance of the method is also demonstrated on a large collection of shape models.
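To make the Cross Recurrence Plot concrete, the sketch below builds a thresholded CRP for two scalar series and reduces it to a single score. It rests on several assumptions: the abstract does not say how the similarity metric is derived from the CRP, so the recurrence rate used here is only one common choice, and the threshold `eps` and the sample values are hypothetical.

```python
def cross_recurrence_plot(x, y, eps):
    """Binary matrix R with R[i][j] = 1 when |x[i] - y[j]| <= eps."""
    return [[1 if abs(xi - yj) <= eps else 0 for yj in y] for xi in x]


def recurrence_rate(plot):
    """Fraction of recurrent points; a crude similarity score (assumption)."""
    total = sum(len(row) for row in plot)
    return sum(map(sum, plot)) / total if total else 0.0


# Hypothetical HKS values, assumed to be already ordered by the Fiedler vector.
query = [0.10, 0.40, 0.80]
target = [0.12, 0.42, 0.79, 0.30]
crp = cross_recurrence_plot(query, target, eps=0.05)
score = recurrence_rate(crp)
```

In this toy case the three query values each match one target value, so the recurrent points line up along a diagonal, which is the pattern a partial match is expected to produce.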
ISBN: (print) 0769520324
In addition to speech, gestures have been considered as a means of interacting with a computer as naturally as possible. Like speech, gestures can be acquired and recognized using Hidden Markov Models (HMMs), but several problems must be overcome. In this paper we propose solutions to two of these problems: feature extraction and HMM training. First, acquisition is done with a high-speed vision camera, which allows the position of a hand to be obtained every 1 ms. This simplifies the feature extraction task and also allows low-level fusion with speech to be considered, which is a future goal. Second, after carefully selecting the extracted features, we introduce quantized features in order to avoid drastically increasing the size of the gesture database needed for good training of the HMMs. Finally, we show results that demonstrate the ability of such quantized features to significantly improve the recognition rate despite a rather small database and to allow user-independent recognition of gestures.
Surface representation and processing is one of the key topics in computer graphics and geometric modeling, since it greatly affects the range of possible applications. In this paper we present recent advances in geometry processing that are related to the Laplacian processing framework and differential representations. This framework is based on linear operators defined on polygonal meshes and supports a variety of processing applications, such as shape approximation and compact representation, mesh editing, watermarking and morphing. The core of the framework is the definition of differential coordinates and new bases for efficient mesh geometry representation, based on the mesh Laplacian operator.
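Differential coordinates can be illustrated with a few lines of code. The sketch below uses the uniform-weight mesh Laplacian, in which each vertex's differential coordinate is its position minus the centroid of its one-ring neighbors. The choice of uniform weights is an assumption (the abstract does not name a weighting scheme), and the function name and the tetrahedron data are hypothetical.

```python
def differential_coordinates(vertices, neighbors):
    """delta_i = v_i - mean(v_j for j in the one-ring of i), uniform weights."""
    deltas = []
    for i, v in enumerate(vertices):
        ring = neighbors[i]
        centroid = [sum(vertices[j][k] for j in ring) / len(ring) for k in range(3)]
        deltas.append(tuple(v[k] - centroid[k] for k in range(3)))
    return deltas


# Hypothetical tetrahedron: every vertex is adjacent to the other three.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
neighbors = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
deltas = differential_coordinates(vertices, neighbors)
```

Because every vertex here has the same number of neighbors, the differential coordinates sum to zero. This reflects the fact that they encode local shape detail and not absolute position.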
ISBN: (print) 9798350338737; 9798350338720
Facial expression synthesis has gained significant attention in the image synthesis field. Generative Adversarial Network (GAN) models have recently gained popularity due to the high-quality synthetic images they produce. However, these models require complex network architectures that can take days to train, even with high-performance Graphics Processing Units (GPUs). Many efforts have been made to accelerate and compress such models, but little attention has been paid to the resolution of the images. This study assesses the impact of input/output spatial resolution on the resources needed to train a facial expression synthesis model, as well as on the quality of the results. Our results indicate that the images and videos produced at spatial resolutions of 128 x 128, 256 x 256, and 480 x 480 were of similar quality according to objective measures. Furthermore, we found that lower-resolution images could significantly reduce the time required to generate new facial expressions without compromising quality, again according to objective measures.
ISBN: (print) 9798350376043; 9798350376036
Data acquisition and analysis are important areas of science and are directly related to image reconstruction. Acquired data are often corrupted by various factors, such as external noise sources or noise inherent to the application, but this corruption can be treated mathematically. This work aims to reconstruct images corrupted by Gaussian and Rician noise using DC programming and a non-convex version of the total variation (TV) model. The tests are performed with a variation of the BDCA (smoothing of the first DC component) and nmBDCA algorithms. The results are evaluated in terms of both quality (PSNR and SSIM) and CPU time, on medical computed tomography (CT) images and magnetic resonance images (MRI).
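Of the two quality measures named above, PSNR is simple enough to sketch directly from its standard definition, PSNR = 10 log10(peak^2 / MSE). The code below is a minimal illustration on nested lists. The 8-bit peak value of 255 and the sample pixel values are assumptions, and nothing here reproduces the paper's reconstruction algorithms.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized grayscale images."""
    n = len(a) * len(a[0])
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)) / n


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(reference, estimate)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)


# Hypothetical 2x2 images: two pixels are off by 10 gray levels.
clean = [[100, 100], [100, 100]]
noisy = [[110, 90], [100, 100]]
quality_db = psnr(clean, noisy)
```

Higher PSNR means the reconstruction is closer to the reference. SSIM, the other measure mentioned, compares local structure and is not covered by this sketch.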
ISBN: (print) 9781538622193
Computer graphics techniques for image generation have reached a point where, day after day, the quality of the content produced impresses even the most skeptical viewer. Although this is a great advance for industries such as games and movies, it becomes a real problem when such techniques are used to produce fake images. In this paper we propose a new approach to detecting computer-generated images using a deep convolutional neural network model based on ResNet-50 and transfer learning. Unlike state-of-the-art approaches, the proposed method classifies images as computer generated or photographic directly from the raw image data, with no need for pre-processing or hand-crafted feature extraction. Experiments on a public dataset of 9700 images show an accuracy higher than 94%, which is comparable to results reported in the literature, without the drawback of a laborious, manual step of specialized feature extraction and selection.
ISBN: (print) 9781538622193
Image dehazing can be described as the problem of mapping from a hazy image to a haze-free image. Most approaches to this problem use physical models based on simplifications and priors. In this work we demonstrate that a convolutional neural network with a deep architecture and a large image database is able to learn the entire process of dehazing, without the need to adjust parameters, resulting in a much more generic method. We evaluate our approach by applying it to real scenes corrupted by haze. The results show that even though our network is trained with simulated indoor images, it is capable of dehazing real outdoor scenes, learning to treat the degradation effect itself rather than to reconstruct the scene behind it.
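The physical models mentioned above are commonly built on the atmospheric scattering model, I = J t + A (1 - t), with transmission t = exp(-beta d) for scene depth d. The sketch below applies that model to a single pixel to show how a hazy training sample could be simulated from a clean image and a depth value. Whether the authors simulated their indoor images this way is an assumption, and the parameter names and default values are hypothetical.

```python
import math


def add_haze(pixel, depth, beta=1.0, airlight=1.0):
    """Apply I = J*t + A*(1 - t) with t = exp(-beta * depth) to one intensity in [0, 1]."""
    transmission = math.exp(-beta * depth)
    return pixel * transmission + airlight * (1.0 - transmission)


# A dark pixel (0.2) seen at increasing depths drifts toward the airlight value.
near = add_haze(0.2, depth=0.0)
mid = add_haze(0.2, depth=1.0)
far = add_haze(0.2, depth=50.0)
```

At zero depth the pixel is unchanged, and at large depth it converges to the airlight. This is why prior-based methods have to estimate both the transmission and the airlight before they can invert the model.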
ISBN: (print) 9781665423540
Deep neural networks are extensively used for solving a variety of computer vision problems. However, for these networks to obtain good results, a large amount of training data is necessary. In image classification, this training data consists of images and labels that indicate the class portrayed by each image. Obtaining such a large labeled dataset is very time- and resource-consuming. Domain adaptation methods therefore allow different but semantically related datasets that are already labeled to be used during training, thus eliminating the labeling cost. In this work, the effects of embedding dimensionality reduction in a state-of-the-art domain adaptation method are analyzed. Furthermore, we experiment with a different approach that uses the available data from all domains to compute the confidence of pseudo-labeled samples. We show through experiments on commonly used datasets that the proposed modifications lead to better results in the target domain in some scenarios.