Datasets of documents in Arabic are urgently needed to promote computer vision and natural language processing research that addresses the specifics of the language. Unfortunately, publicly available Arabic datasets are limited in size and restricted to certain document domains. This paper presents the release of BE-Arabic-9K, a dataset of more than 9000 high-quality scanned images from over 700 Arabic books. Among these, 1500 images have been manually segmented into regions and labeled by their functionality. BE-Arabic-9K includes book pages with a wide variety of complex layouts and page contents, making it suitable for various document layout analysis and text recognition research tasks. The paper also presents a page layout segmentation and text extraction baseline model based on a fine-tuned Faster R-CNN structure (FFRA). This baseline model yields cross-validation results with an average accuracy of 99.4% and an F1 score of 99.1% for text versus non-text block classification on the 1500 annotated images of BE-Arabic-9K. These results are markedly better than those of the state-of-the-art Arabic book page segmentation system ECDP. FFRA also outperforms three other prior systems when tested on a competition benchmark dataset, making it a strong baseline model to challenge.
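A minimal sketch of the kind of fine-tuning pipeline the FFRA baseline implies, using the standard torchvision Faster R-CNN recipe; the class count, learning rate, and training loop are illustrative assumptions, not the authors' actual configuration.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 3  # background + {text block, non-text block}  (assumed labeling)
device = "cuda" if torch.cuda.is_available() else "cpu"

# Start from a COCO-pretrained detector and swap in a new box predictor head.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)
model.to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

def train_one_epoch(loader):
    """One pass over (image, target) pairs; each target holds 'boxes' and 'labels'."""
    model.train()
    for images, targets in loader:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss_dict = model(images, targets)   # per-component detection losses
        loss = sum(loss_dict.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```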
In this paper, we introduce a local image descriptor, DAISY, which is very efficient to compute densely. We also present an EM-based algorithm to compute dense depth and occlusion maps from wide-baseline image pairs using this descriptor. This yields much better results in wide-baseline situations than the pixel- and correlation-based algorithms commonly used in narrow-baseline stereo. Using a descriptor also makes our algorithm robust against many photometric and geometric transformations. Our descriptor is inspired by earlier ones such as SIFT and GLOH but can be computed much faster for our purposes. Unlike SURF, which can also be computed efficiently at every pixel, it does not introduce artifacts that degrade matching performance when used densely. Our approach is the first algorithm to attempt estimating dense depth maps from wide-baseline image pairs, and we show that it performs well through extensive experiments on depth estimation accuracy and occlusion detection, and by comparing it against other descriptors on laser-scanned ground-truth scenes. We also tested our approach on a variety of indoor and outdoor scenes with different photometric and geometric transformations, and our experiments support our claim of robustness to these.
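A minimal sketch of dense descriptor extraction and a per-location matching cost, using scikit-image's daisy() as a stand-in for the descriptor described above; the step size, shape parameters, and file names are illustrative assumptions, not the authors' settings.

```python
import numpy as np
from skimage import io, color
from skimage.feature import daisy

left = color.rgb2gray(io.imread("left.png"))    # placeholder file names
right = color.rgb2gray(io.imread("right.png"))

# Dense DAISY grids: one descriptor every `step` pixels.
d_left = daisy(left, step=4, radius=15, rings=3, histograms=8, orientations=8)
d_right = daisy(right, step=4, radius=15, rings=3, histograms=8, orientations=8)

def matching_cost(p, q):
    """Euclidean distance between descriptors at grid coordinates p (left) and q (right)."""
    return np.linalg.norm(d_left[p[0], p[1]] - d_right[q[0], q[1]])
```

In a full wide-baseline pipeline such costs would feed the EM-based depth and occlusion estimation rather than a simple winner-take-all match.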
The development of customized image processing applications is time consuming and requires high-level skills. This paper describes the design of an interactive application generation system oriented towards producing image processing software programs. The description focuses on two models which constitute the core of the human-computer interaction. First, the formulation model identifies and organizes the information that is assumed necessary and sufficient for developing image processing applications. This model is represented as a domain ontology which provides primitives for the formulation language. Second, the interaction model defines ways to acquire such information from end users. The result of the interaction is an application ontology from which suitable software is generated. This model emphasizes the gradual emergence of the semantics of the problem through purely symbolic representations. Based on these two models, a prototype system has been implemented to conduct experiments.
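A purely illustrative sketch of the idea of generating an image processing program from an application ontology: the dictionary below is a made-up stand-in for the symbolic description produced by the interaction model, and the generated pipeline uses OpenCV operations chosen for the example, not the system's actual primitives.

```python
import cv2

# Hypothetical application ontology (symbolic problem description).
application_ontology = {
    "objective": "detect defects on metal parts",
    "acquisition": {"modality": "grayscale", "resolution": (1024, 768)},
    "pipeline": [
        {"primitive": "denoise", "op": "gaussian_blur", "params": {"ksize": (5, 5)}},
        {"primitive": "segment", "op": "otsu_threshold", "params": {}},
    ],
}

def generate_program(ontology):
    """Turn the symbolic pipeline description into an executable function."""
    def run(image):  # expects an 8-bit grayscale image
        for step in ontology["pipeline"]:
            if step["op"] == "gaussian_blur":
                image = cv2.GaussianBlur(image, step["params"]["ksize"], 0)
            elif step["op"] == "otsu_threshold":
                _, image = cv2.threshold(image, 0, 255,
                                         cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        return image
    return run
```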
A new technique is described for the registration of edge-detected images. While an extensive literature exists on the problem of image registration, few of the current approaches include a well-defined measure of the statistical confidence associated with the solution. Such a measure is essential for many autonomous applications, where registration solutions that are dubious (involving poorly focused images or terrain that is obscured by clouds) must be distinguished from those that are reliable (based on clear images of highly structured scenes). The technique developed herein utilizes straightforward edge pixel matching to determine the "best" among a class of candidate translations. A well-established statistical procedure, the McNemar test, is then applied to identify which other candidate solutions are not significantly worse than the best solution. This allows for the construction of confidence regions in the space of the registration parameters. The approach is validated through a simulation study and examples are provided of its application in numerous challenging scenarios. While the algorithm is limited to solving for two-dimensional translations, its use in validating solutions to higher-order (rigid body, affine) transformation problems is demonstrated.
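A hedged sketch of the core test: for two candidate translations we count, over the reference edge pixels, how many match under one shift but not the other, and apply McNemar's test (continuity-corrected chi-square form) to those discordant counts. The variable names, shift convention, and wrap-around handling are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import chi2

def edge_matches(ref_edges, tgt_edges, shift):
    """Boolean mask: reference edge pixels that land on a target edge after `shift`.
    np.roll wraps at the image border; a real implementation would crop instead."""
    dy, dx = shift
    shifted = np.roll(np.roll(tgt_edges, dy, axis=0), dx, axis=1)
    return ref_edges & shifted

def mcnemar_p(ref_edges, tgt_edges, best_shift, other_shift):
    """p-value that `other_shift` is not significantly worse than `best_shift`."""
    m_best = edge_matches(ref_edges, tgt_edges, best_shift)
    m_other = edge_matches(ref_edges, tgt_edges, other_shift)
    b = int(np.sum(m_best & ~m_other))   # matched by the best shift only
    c = int(np.sum(~m_best & m_other))   # matched by the other shift only
    if b + c == 0:
        return 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)   # continuity-corrected McNemar statistic
    return chi2.sf(stat, df=1)

# Candidates whose p-value exceeds a chosen level (e.g. 0.05) form the confidence region.
```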
We propose a learning approach to tracking that explicitly minimizes the computational complexity of the tracking process subject to a user-defined probability of failure (loss-of-lock) and precision. The tracker is formed by a Number of Sequences of Learned Linear Predictors (NoSLLiP). Robustness of NoSLLiP is achieved by modeling the object as a collection of local motion predictors; object motion is estimated from the local predictions by the outlier-tolerant RANSAC algorithm. The efficiency of the NoSLLiP tracker stems (1) from the simplicity of the local predictors and (2) from the fact that all design decisions are subject to the optimization (learning) process: the number of local predictors used by the tracker, their computational complexity (i.e., the number of observations each prediction is based on), their locations, and the number of RANSAC iterations. All time-consuming operations are performed during the learning stage; tracking is reduced to only a few hundred integer multiplications per step. On a PC with a single K8 3200+ processor, one predictor evaluation requires about 30 μs. The proposed approach is verified on publicly available sequences comprising approximately 12,000 frames with ground truth. Experiments demonstrate superiority in frame rate and robustness with respect to the SIFT detector, the Lucas-Kanade tracker, and other trackers.
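A minimal sketch of one learned linear predictor in the spirit of the description above: motion is predicted as a linear function of intensities sampled at a fixed support set, and the matrix is learned offline by least squares from synthetically perturbed examples. The sampling callback, perturbation range, and training size are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def learn_linear_predictor(sample_intensities, n_train=2000, max_shift=10.0, seed=0):
    """
    sample_intensities(dx, dy) -> 1-D vector of intensities at the predictor's
    support points after shifting the reference position by (dx, dy).
    Returns H such that H @ observation approximates the motion that undoes the shift.
    """
    rng = np.random.default_rng(seed)
    shifts = rng.uniform(-max_shift, max_shift, size=(n_train, 2))
    D = np.stack([sample_intensities(dx, dy) for dx, dy in shifts])  # observations
    T = -shifts                                                      # correcting motions
    X, *_ = np.linalg.lstsq(D, T, rcond=None)   # least-squares fit of D @ X ≈ T
    return X.T                                  # H has shape (2, n_support_points)

def predict_motion(H, observation):
    """Run-time cost per predictor: a single matrix-vector product."""
    return H @ observation

# In the full tracker, many such predictors run at different object locations and
# the global object motion is estimated from their outputs with RANSAC.
```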
This paper explores the combination of inertial sensor data with vision. Visual and inertial sensing are two sensory modalities that can be exploited to give robust solutions for image segmentation and recovery of 3D structure from images, increasing the capabilities of autonomous robots and enlarging the application potential of vision systems. In biological systems, the information provided by the vestibular system is fused with vision at a very early processing stage, playing a key role in the execution of visual movements such as gaze holding and tracking, while visual cues aid spatial orientation and body equilibrium. In this paper, we set out a framework for using inertial sensor data in vision systems and describe the results obtained. The unit sphere projection camera model is used, providing a simple model for inertial data integration. Using the vertical reference provided by the inertial sensors, the image horizon line can be determined. Using just one vanishing point and the vertical, we can recover the camera's focal distance and provide an external bearing for the system's navigation frame of reference. Knowing the geometry of a stereo rig and its pose from the inertial sensors, the collineation of level planes can be recovered, providing enough restrictions to segment and reconstruct vertical features and leveled planar patches.
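A hedged sketch of one way a single vanishing point plus the inertial vertical can yield the focal length: if the vanishing point comes from horizontal (level) scene lines, its back-projected direction must be orthogonal to gravity. Square pixels and a known principal point (cx, cy) are assumed here; this illustrates the geometry, not the paper's exact formulation on the unit sphere.

```python
def focal_from_level_vanishing_point(vp, vertical_cam, cx, cy):
    """
    vp           : (u, v) image coordinates of a vanishing point of level lines.
    vertical_cam : unit gravity direction (gx, gy, gz) in the camera frame (from the IMU).
    The back-projected direction d = [(u-cx)/f, (v-cy)/f, 1] must satisfy d . g = 0,
    which is linear in 1/f and gives f = -((u-cx)*gx + (v-cy)*gy) / gz.
    """
    gx, gy, gz = vertical_cam
    u, v = vp
    return -((u - cx) * gx + (v - cy) * gy) / gz
```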
The uidA gene encodes a glucuronidase (GUS) enzyme that has been used as a biotechnological tool in recent years. When the uidA gene is fused to a gene's promoter region, the activity of that promoter can be evaluated in response to a stimulus. Arabidopsis thaliana has served as the biological platform for elucidating molecular and regulatory signaling responses in plants. Transgenic lines of A. thaliana tagged with the uidA gene have helped explain how plants modify their hormonal pathways depending on environmental conditions. Although the information extracted from microscopic images of these transgenic plants is often qualitative and, in many publications, is not quantified, in this paper we report the development of a computer vision informatics tool for processing and analyzing digital images in order to quantify the expression of the GUS signal in A. thaliana roots, which is strongly correlated with the intensity of the grayscale images. The presence of the GUS-induced color indicates where the gene has been actively expressed, as our statistical analysis demonstrates after treatment of A. thaliana DR5::GUS with naphthaleneacetic acid (0.0001 mM and 1 mM). GUSignal is a free informatics tool designed to be fast and systematic in image analysis: it executes a fixed, ordered sequence of instructions, offers segmented analysis by areas or regions of interest, and provides quantitative results for image intensity levels.
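A minimal sketch of region-of-interest intensity quantification in the spirit of GUSignal: convert the micrograph to grayscale, segment the root from the background, and report intensity statistics inside the segmented region. The thresholding choice and file name are illustrative assumptions, not the tool's actual parameters.

```python
import numpy as np
from skimage import io, color, filters

img = io.imread("root_dr5_gus.png")        # placeholder file name
gray = color.rgb2gray(img)                 # GUS staining darkens the tissue

# Segment the stained root from the brighter background with Otsu's threshold.
mask = gray < filters.threshold_otsu(gray)

roi_values = gray[mask]
print("mean intensity:", roi_values.mean())
print("median intensity:", np.median(roi_values))
print("stained area (px):", int(mask.sum()))
```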
We address the problem of autocalibration of a moving camera with unknown constant intrinsic parameters. Existing autocalibration techniques use numerical optimization algorithms whose convergence to the correct result cannot, in general, be guaranteed. To address this problem, we have developed a method in which an interval branch-and-bound method is employed for the numerical minimization. Thanks to the properties of interval analysis, this method converges to the global solution with mathematical certainty and arbitrary accuracy, and the only inputs it requires from the user are a set of point correspondences and a search interval. The cost function is based on the Huang-Faugeras constraint of the essential matrix. A recently proposed interval extension based on Bernstein polynomial forms has been investigated to speed up the search for the solution. Finally, experimental results are presented.
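A sketch of the kind of Huang-Faugeras cost such a search minimizes: a valid essential matrix has two equal non-zero singular values, so for candidate intrinsics we measure how far E = K^T F K departs from that. The simplified one-parameter K below and the relative-gap form of the residual are assumptions for illustration; the interval branch-and-bound machinery itself is not reproduced here.

```python
import numpy as np

def huang_faugeras_cost(f, fundamental_matrices, cx=0.0, cy=0.0):
    """Sum over view pairs of the relative gap between the two largest singular
    values of E = K^T F K (the gap is zero for a true essential matrix)."""
    K = np.array([[f, 0.0, cx],
                  [0.0, f, cy],
                  [0.0, 0.0, 1.0]])
    cost = 0.0
    for F in fundamental_matrices:
        E = K.T @ F @ K
        s = np.linalg.svd(E, compute_uv=False)   # singular values, descending
        cost += (s[0] - s[1]) / (s[0] + s[1])
    return cost
```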
In this paper we introduce Boosted Random Ferns (BRFs) to rapidly build discriminative classifiers for learning and detecting object categories. At the core of our approach we use standard random ferns, but we introduce four main innovations that let us bring ferns from the instance level to the category level while retaining efficiency. First, we define binary features in the histogram-of-oriented-gradients domain (as opposed to the intensity domain), allowing for a better representation of intra-class variability. Second, both the positions at which ferns are evaluated within the sliding window and the locations of the binary features within each fern are not chosen completely at random; instead, we use a boosting strategy to pick the most discriminative combination of them. This is further enhanced by our third contribution: adapting the boosting strategy to enable sharing of binary features among different ferns, yielding high recognition rates at a low computational cost. Finally, we show that training can be performed online, for sequentially arriving images. Overall, the resulting classifier can be trained very efficiently, densely evaluated for all image locations in about 0.1 seconds, and provides detection rates similar to competing approaches that require expensive and significantly slower processing. We demonstrate the effectiveness of our approach through thorough experimentation on publicly available datasets, comparing against the state of the art for both 2D detection and 3D multi-view estimation tasks.
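A hedged sketch of a single random fern evaluated on HOG features, as a building block for the boosted classifier described above: each fern is a fixed set of binary comparisons between HOG bins, indexing into a table of leaf scores. The random feature sampling and the boosting/sharing machinery are simplified assumptions here, not the paper's training procedure.

```python
import numpy as np

class HOGFern:
    def __init__(self, n_bits, hog_shape, seed=0):
        rng = np.random.default_rng(seed)
        n_cells = int(np.prod(hog_shape))
        # Each binary feature compares two HOG bins inside the detection window.
        self.pairs = rng.integers(0, n_cells, size=(n_bits, 2))
        self.leaf_scores = np.zeros(2 ** n_bits)   # filled in by (boosted) training

    def leaf_index(self, hog_window):
        h = hog_window.ravel()
        bits = (h[self.pairs[:, 0]] > h[self.pairs[:, 1]]).astype(int)
        return int(bits @ (1 << np.arange(len(bits))))   # bits -> leaf number

    def score(self, hog_window):
        return self.leaf_scores[self.leaf_index(hog_window)]

# A boosted ensemble sums fern.score(window) over all ferns and thresholds the total.
```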