Methods for superpixel segmentation have become very popular in computer vision. Recently, a graph-based framework named ISF (Iterative Spanning Forest) was proposed to obtain connected superpixels (supervoxels in 3D) based on multiple executions of the Image Foresting Transform (IFT) algorithm from a given choice of four components: a seed sampling strategy, an adjacency relation, a connectivity function, and a seed recomputation procedure. In this paper, we extend ISF to introduce a characteristic that is unique among superpixel segmentation methods. Using the new framework, termed Recursive Iterative Spanning Forest (RISF), one can recursively generate multiple segmentation scales on region adjacency graphs (i.e., a hierarchy of superpixels) without sacrificing the efficiency and effectiveness of ISF. In addition to a hierarchical segmentation, RISF allows a more effective geodesic seed sampling strategy, with no negative impact on the efficiency of the method. For a fixed number of scales using 2D and 3D image datasets, we show that RISF can consistently outperform the most competitive ISF-based methods.
Map charts are used in diverse domains to show geographic data (e.g., climate research, oceanography, and business analysis). These charts can be found in news articles, scientific papers, and on the Web. However, many map charts are available only as bitmap images, hindering machine interpretation of the visualized data for indexing and reuse. We propose a pipeline to recover both the visual encodings and underlying data from bitmap images of geographic maps with color-encoded scalar values. We evaluate our results using map images from scientific documents, achieving high accuracy along each step of our proposal. In addition, we present two applications: data extraction and map reprojection to enable improved visual representations of map charts.
The quality of the input fingerprint has a big impact on the performance of an Automatic Fingerprint Identification System (AFIS). Fingerprint enhancement is therefore an important and necessary step to refine the quality of the images. Over the past few years, many fingerprint enhancement approaches have been proposed and tested in an attempt to find improvements. One of the most common methods in the literature is convolution with Gabor filters. By using coherent parameters and successive iterations, it is possible to clearly highlight the lines present in the images. This paper analyzes and presents improvements to a renowned algorithm that uses contextual iterative filtering. Experimental results show that the proposed upgrades developed in this research obtained gains of 21% over the baseline.
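As a rough illustration of the Gabor filtering mentioned above, the sketch below builds a single real-valued Gabor kernel: a Gaussian envelope multiplied by a cosine wave along a chosen orientation. The function name `gabor_kernel` and all parameter names are hypothetical, and this is only the textbook kernel, not the contextual iterative algorithm the paper improves on.

```python
import math


def gabor_kernel(size, theta, frequency, sigma_x, sigma_y):
    """Return a size x size real Gabor kernel as a list of rows.

    Assumptions (not from the paper): `size` is odd, `theta` is the
    orientation in radians, `frequency` is in cycles per pixel, and
    `sigma_x` / `sigma_y` are the Gaussian spreads along and across
    the wave direction.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates so the wave runs along theta.
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            y_rot = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope localizes the filter in space.
            envelope = math.exp(
                -0.5 * (x_rot ** 2 / sigma_x ** 2 + y_rot ** 2 / sigma_y ** 2)
            )
            # Cosine carrier selects ridges of the given frequency.
            row.append(envelope * math.cos(2 * math.pi * frequency * x_rot))
        kernel.append(row)
    return kernel
```

In a contextual scheme, `theta` and `frequency` would typically be taken from the local ridge orientation and ridge spacing estimated around each pixel; how the paper estimates them is not stated in the abstract.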
Understanding how a classifier partitions a high-dimensional input space and assigns labels to the parts is an important task in machine learning. Current methods for this task mainly use color-coded sample scatterplots, which do not explicitly show the actual decision boundaries or confusion zones. We propose an image-based technique to improve such visualizations. The method samples the 2D space of a dimensionality-reduction projection and color-codes relevant classifier outputs, such as the majority class label, the confusion, and the sample density, to render a dense depiction of the high-dimensional decision boundaries. Our technique is simple to implement, handles any classifier, and has only two simple-to-control free parameters. We demonstrate our proposal on several real-world high-dimensional datasets, classifiers, and two different dimensionality reduction methods.
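The per-pixel quantities named above (majority label, confusion, density) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the samples are already projected to 2D and normalized to the unit square, that each sample already carries a predicted label, and that confusion is measured as the fraction of samples in a pixel that disagree with the majority. The name `dense_map` and the `resolution` parameter are hypothetical.

```python
from collections import Counter


def dense_map(points_2d, labels, resolution):
    """Aggregate labeled 2D points into a resolution x resolution grid.

    Returns a dict mapping (row, col) to the majority label, a
    confusion value in [0, 1), and the sample count for that pixel.
    Pixels that receive no samples are simply absent from the result.
    """
    cells = {}
    for (x, y), label in zip(points_2d, labels):
        # Clamp so that a coordinate of exactly 1.0 falls in the last pixel.
        col = min(int(x * resolution), resolution - 1)
        row = min(int(y * resolution), resolution - 1)
        cells.setdefault((row, col), Counter())[label] += 1

    result = {}
    for cell, counts in cells.items():
        total = sum(counts.values())
        majority, majority_count = counts.most_common(1)[0]
        result[cell] = {
            "label": majority,                         # would drive the hue
            "confusion": 1 - majority_count / total,   # e.g. saturation
            "density": total,                          # e.g. brightness
        }
    return result
```

The mapping of these three values to hue, saturation, and brightness in the comments is one plausible color coding, offered only as an example.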
Knowing the state of the disconnect switches in a power distribution substation is important to avoid accidents, damaged equipment, and service interruptions. This information is usually provided by human operators, who can make errors because of the cluttered environment, bad weather or lighting conditions, or lack of attention. In this paper, we introduce an approach for determining the state of each switch in a substation, based on images captured by regular pan-tilt-zoom surveillance cameras. The proposed approach includes noise reduction, image registration using phase correlation, and classification using a convolutional neural network and a support vector machine fed with gradient-based descriptors. By combining information given in an initial labeling stage with image processing techniques to reduce variations in viewpoint, our approach achieved 100% accuracy in experiments performed at a real substation over multiple days. We also show how modifications to the standard phase correlation image registration algorithm can make it more robust to lighting variations, and how SIFT (Scale-Invariant Feature Transform) descriptors can be made more robust in scenarios where the relevant objects may be brighter or darker than the background.
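For readers unfamiliar with phase correlation, the sketch below shows the standard, unmodified algorithm for estimating an integer translation between two same-sized grayscale images. It does not include the lighting-robust modifications the paper describes, and the function name `phase_correlation_shift` is hypothetical. It assumes NumPy is available.

```python
import numpy as np


def phase_correlation_shift(reference, moved):
    """Estimate the (row, col) translation that maps `reference` to `moved`.

    Standard phase correlation: normalize the cross-power spectrum so
    only phase remains, invert it, and locate the correlation peak.
    Assumes both inputs are 2D arrays of the same shape and that the
    shift is circular (wrap-around), as the FFT implies.
    """
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moved)

    cross_power = f_mov * np.conj(f_ref)
    # Discard magnitude; guard against division by zero.
    cross_power /= np.maximum(np.abs(cross_power), 1e-12)

    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)

    # Peaks past the midpoint correspond to negative shifts.
    shifts = [p if p <= s // 2 else p - s for p, s in zip(peak, correlation.shape)]
    return tuple(int(s) for s in shifts)
```

Because the magnitude of the spectrum is discarded, the estimate depends on image structure rather than overall brightness, which is one reason the technique is a common starting point for registering surveillance frames.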
We propose a new approach for segmenting a document image into its page components (e.g. text, graphics and tables). Our approach consists of two main steps. In the first step, a set of scores corresponding to the output of a convolutional neural network, one for each of the possible page component categories, is assigned to each connected component in the document. The labeled connected components define a fuzzy over-segmentation of the page. In the second step, spatially close connected components that are likely to belong to the same page component are grouped together. This is done by building an attributed region adjacency graph of the connected components and modeling the problem as an edge removal problem. Edges are then kept or removed based on a pre-trained classifier. The resulting groups, defined by the connected subgraphs, correspond to the detected page components. We evaluate our method on the ICDAR2009 dataset. Results show that our method effectively segments pages and is able to detect the nine types of page components. Furthermore, as our approach is based on simple machine learning models and graph-based techniques, it should be easily adapted to the segmentation of a variety of document types.
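The second step described above (keep or remove edges, then read the groups off the connected subgraphs) can be sketched with a union-find structure. This is an illustrative sketch only: `group_components` and `keep_edge` are hypothetical names, and `keep_edge` stands in for the pre-trained classifier, whose features and model are not specified in the abstract.

```python
def group_components(num_nodes, edges, keep_edge):
    """Group graph nodes into the connected subgraphs left after edge removal.

    `edges` is an iterable of (u, v, attributes) tuples and `keep_edge`
    is any callable that returns True when an edge should be kept.
    Returns a sorted list of node groups, each a sorted list of node ids.
    """
    parent = list(range(num_nodes))

    def find(node):
        # Path halving keeps the trees shallow.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for u, v, attributes in edges:
        if keep_edge(attributes):
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_v] = root_u

    groups = {}
    for node in range(num_nodes):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())
```

For example, with edge attributes that are classifier scores, `keep_edge=lambda score: score >= 0.5` keeps only edges judged more likely than not to join parts of the same page component; the 0.5 threshold is an arbitrary choice for illustration.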
ISBN: (print) 9781538622193
In this paper, we propose a new approach for the classification of reaching targets before movement onset, during visually-guided reaching in 3D space. Our approach combines the discriminant power of two-dimensional Electroencephalography (EEG) signals (i.e., EEG images) built from short epochs with the feature extraction and classification capabilities of deep learning (DL) techniques, such as Convolutional Neural Networks (CNNs). In this work, reaching motions are performed in four directions: left, right, up and down. To allow more natural reaching movements, we explore the use of Virtual Reality (VR) to build an experimental setup that allows the subject to perform self-paced reaching in 3D space while standing. Our results show an increase in both classification performance and early detection in the majority of our experiments. To our knowledge, this is the first time that EEG images and CNNs have been combined for the classification of reaching targets before movement onset.
Data clustering is one of the main challenges when solving Data Science problems. Despite progress over almost a century of research, clustering algorithms still fail to identify groups naturally related to the semantics of the problem. Moreover, technological advances add crucial challenges through a considerable increase in data, which most techniques cannot handle. We address these issues by proposing a divide-and-conquer approach to a clustering technique that is unique in finding one group per dome of the probability density function of the data: the Optimum-Path Forest (OPF) clustering algorithm. Our approach can use all samples, or at least many samples, in the unsupervised learning process without affecting the grouping performance and is therefore less likely to lose relevant grouping information. We show that it can obtain satisfactory results when segmenting natural images into superpixels.
In this paper, we propose a dynamic-image-based approach for action recognition. Specifically, we exploit the multimodal information recorded by a Kinect sensor (RGB-D and skeleton joint data). We combine several ideas from rank pooling and skeleton optical spectra to generate dynamic images that summarize an action sequence into single flow images. We organize our dynamic images into five groups: a dynamic color group (DC), a dynamic depth group (DD), and three dynamic skeleton groups (DXY, DYZ, DXZ). As an action is composed of different postures over time, we generate N different dynamic images with the main postures for each dynamic group. Next, we apply a pre-trained flow-CNN to extract spatiotemporal features with a max-mean aggregation. The proposed method was evaluated on a public benchmark dataset, UTD-MHAD, and achieved a state-of-the-art result.
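To give a concrete sense of how a frame sequence can be collapsed into one dynamic image, the sketch below uses the simple approximate rank pooling weighting, in which frame t of T receives the weight 2t - T - 1, so later frames count positively and earlier ones negatively. This is a commonly used simplification and an assumption here; the abstract does not say which rank pooling variant the authors use. The name `approximate_dynamic_image` is hypothetical, and frames are represented as flat lists of pixel values to keep the example dependency-free.

```python
def approximate_dynamic_image(frames):
    """Collapse a sequence of frames into a single dynamic image.

    Each frame is a flat list of pixel values of equal length.
    Frame t (1-based) out of T is weighted by 2*t - T - 1, the simple
    approximate rank pooling coefficient (an assumed variant).
    """
    num_frames = len(frames)
    weights = [2 * t - num_frames - 1 for t in range(1, num_frames + 1)]
    num_pixels = len(frames[0])
    return [
        sum(weight * frame[i] for weight, frame in zip(weights, frames))
        for i in range(num_pixels)
    ]
```

Because the weights sum to zero, pixels that never change contribute nothing, so the result emphasizes where and in which temporal order motion occurred.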
We present a method for synchronizing three-dimensional (3D) point cloud from 3D scene with estimation using a 3D Lidar and an RGB camera. These 3D points sensed by the 3D Lidar are not captured at the same time, whic...