This paper describes a skeletonization approach that has desirable characteristics for the analysis of static handwritten scripts. We concentrate on the situation where one is interested in recovering the parametric curve that produces the script. Using Delaunay tessellation techniques, static images are partitioned into sub-shapes, typical skeletonization artifacts are removed, and regions with a high density of line intersections are identified. An evaluation protocol that measures the efficacy of our approach is described. Although this approach is particularly useful as a pre-processing step for algorithms that estimate the pen trajectories of static signatures, it can also be applied to other static handwriting recognition techniques.
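As an illustration of the tessellation idea only (not the authors' exact algorithm), the sketch below triangulates sampled stroke-boundary points with SciPy's Delaunay routine and links the centroids of triangles that fall on ink to approximate a skeleton. The function name, contour sampling and inside test are assumptions made for this example.

```python
# Minimal sketch of a Delaunay-based skeleton; a simplification, not the paper's method.
# Assumes a binary image `stroke` (True = ink). Boundary points are tessellated and
# centroids of interior triangles are linked to approximate the skeleton.
import numpy as np
from scipy.spatial import Delaunay
from skimage import measure


def delaunay_skeleton(stroke: np.ndarray):
    # Sample sub-pixel boundary points of the stroke.
    contours = measure.find_contours(stroke.astype(float), 0.5)
    pts = np.vstack(contours)                      # (N, 2) boundary samples
    tri = Delaunay(pts)                            # tessellate the shape

    # Keep triangles whose centroid falls on ink; their centroids trace the skeleton.
    centroids = pts[tri.simplices].mean(axis=1)
    rows = np.clip(centroids[:, 0].round().astype(int), 0, stroke.shape[0] - 1)
    cols = np.clip(centroids[:, 1].round().astype(int), 0, stroke.shape[1] - 1)
    inside = stroke[rows, cols]

    # Link centroids of neighbouring interior triangles to form skeleton edges.
    edges = []
    for t, nbrs in enumerate(tri.neighbors):
        if not inside[t]:
            continue
        for n in nbrs:
            if n != -1 and inside[n] and n > t:
                edges.append((centroids[t], centroids[n]))
    return centroids[inside], edges
```

Working on triangles of the boundary rather than on a thinned raster is what makes it possible to treat sub-shapes and intersection-dense regions explicitly, which is the property the paper exploits.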
Advances in digital technologies have allowed us to generate more images than ever. Images of scanned documents are examples of these images and form a vital part of digital libraries and archives. Scanned degraded documents contain background noise and varying contrast and illumination; therefore, document image binarisation must be performed in order to separate foreground from background layers. Image binarisation is performed using either local adaptive thresholding or global thresholding, with local thresholding generally considered the more successful. This paper presents a novel method for global thresholding, where a neural network is trained on local threshold values of an image in order to determine an optimum global threshold value, which is then used to binarise the whole image. The proposed method is compared with five local thresholding methods, and the experimental results indicate that our method is computationally cost-effective and capable of binarising scanned degraded documents with superior results.
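A hedged sketch of the central idea follows, assuming per-tile Otsu thresholds as the local values and a small scikit-learn MLP as the network; the feature summary, tile size and training pairs are illustrative assumptions rather than the paper's actual design.

```python
# Learn a single global threshold from local threshold statistics (illustrative only).
import numpy as np
from skimage.filters import threshold_otsu
from sklearn.neural_network import MLPRegressor


def local_threshold_features(gray: np.ndarray, tile: int = 64) -> np.ndarray:
    """Per-tile Otsu thresholds summarised into a fixed-length feature vector."""
    locals_ = []
    for r in range(0, gray.shape[0] - tile + 1, tile):
        for c in range(0, gray.shape[1] - tile + 1, tile):
            patch = gray[r:r + tile, c:c + tile]
            if patch.std() > 1e-6:               # skip flat background tiles
                locals_.append(threshold_otsu(patch))
    locals_ = np.asarray(locals_)
    # Summary statistics stand in for whatever representation the network really uses.
    return np.array([locals_.mean(), locals_.std(),
                     np.percentile(locals_, 25), np.percentile(locals_, 75)])


def train(feature_rows, optimum_thresholds):
    # Training pairs (features, optimum global threshold) are assumed to exist.
    net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
    net.fit(np.asarray(feature_rows), np.asarray(optimum_thresholds))
    return net


def binarise(gray: np.ndarray, net) -> np.ndarray:
    t = net.predict(local_threshold_features(gray)[None, :])[0]
    return gray < t                               # ink (dark foreground) below the global threshold
```

The cost advantage claimed in the abstract comes from applying only one predicted threshold to the whole image at run time, rather than computing a threshold per pixel or per window.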
Static handwritten scripts originate as images on documents and do not, by definition, contain any dynamic information. To improve the accuracy of static handwriting recognition systems, many techniques aim to estimate dynamic information from the static scripts. Mostly, the pen trajectories of the scripts are estimated. However, the efficacy of the resulting pen trajectories is rarely evaluated quantitatively. This paper proposes a protocol for the objective evaluation of automatically determined pen trajectories. A hidden Markov model is derived from a ground-truth trajectory. An estimated trajectory is then matched to the derived model. Statistics describing substitution, insertion and deletion errors are then computed from this match. The proposed algorithm is especially useful for performance comparisons between different pen trajectory estimation algorithms. (C) 2008 Elsevier Ltd. All rights reserved.
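The paper matches the estimated trajectory against an HMM derived from the ground truth; the sketch below substitutes a plain edit-distance alignment with a spatial tolerance, purely to illustrate how substitution, insertion and deletion counts can be read off an alignment. The tolerance and point representation are assumptions.

```python
# Substitution/insertion/deletion statistics from a point-wise alignment (illustrative).
import numpy as np


def trajectory_errors(truth: np.ndarray, estimate: np.ndarray, tol: float = 2.0):
    """truth: (N, 2) ground-truth pen positions; estimate: (M, 2) estimated positions."""
    n, m = len(truth), len(estimate)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, :] = np.arange(m + 1)                 # insertions only
    cost[:, 0] = np.arange(n + 1)                 # deletions only
    back = np.zeros((n + 1, m + 1), dtype=int)    # 0 = match/substitution, 1 = insertion, 2 = deletion
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if np.linalg.norm(truth[i - 1] - estimate[j - 1]) <= tol else 1
            cands = (cost[i - 1, j - 1] + sub, cost[i, j - 1] + 1, cost[i - 1, j] + 1)
            back[i, j] = int(np.argmin(cands))
            cost[i, j] = cands[back[i, j]]
    # Trace the alignment back and count each error type.
    subs = ins = dels = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and back[i, j] == 0:
            subs += int(np.linalg.norm(truth[i - 1] - estimate[j - 1]) > tol)
            i, j = i - 1, j - 1
        elif j > 0 and (i == 0 or back[i, j] == 1):
            ins, j = ins + 1, j - 1
        else:
            dels, i = dels + 1, i - 1
    return {"substitutions": subs, "insertions": ins, "deletions": dels}
```

The HMM formulation in the paper serves the same role as this alignment, but it additionally tolerates local timing and sampling differences between the two trajectories.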
The problem of projecting multidimensional data into lower dimensions has been pursued by many researchers due to its potential application to data analyses of various kinds. This paper presents a novel multidimensional projection technique based on least square approximations. The approximations compute the coordinates of a set of projected points based on the coordinates of a reduced number of control points with defined geometry. We name the technique Least Square Projections (LSP). From an initial projection of the control points, LSP defines the positioning of their neighboring points through a numerical solution that aims at preserving a similarity relationship between the points given by a metric in mD. In order to perform the projection, a small number of distance calculations are necessary, and no repositioning of the points is required to obtain a final solution with satisfactory precision. The results show the capability of the technique to form groups of points by degree of similarity in 2D. We illustrate that capability through its application to mapping collections of textual documents from varied sources, a strategic yet difficult application. LSP is faster and more accurate than other existing high-quality methods, particularly where it was mostly tested, that is, for mapping text sets.
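A rough sketch of the LSP structure under simplifying assumptions: a handful of control points are laid out in 2D (classical MDS stands in for the initial projection), and the remaining points are placed by one least-squares solve that keeps each point near the average of its mD neighbours. Neighbour weights, control-point selection and the solver are illustrative choices, not the authors' exact formulation.

```python
# One linear solve places all points, constrained by a few projected control points.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.manifold import MDS


def lsp_project(X: np.ndarray, n_controls: int = 10, k: int = 8) -> np.ndarray:
    n = len(X)
    controls = np.random.default_rng(0).choice(n, n_controls, replace=False)
    # Initial 2-D layout of the control points (classical MDS stands in here).
    Yc = MDS(n_components=2, random_state=0).fit_transform(X[controls])

    # Neighbourhood graph in the original mD space.
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nbrs.kneighbors(X)

    # Laplacian-style rows: each point should equal the mean of its neighbours ...
    A = np.zeros((n + n_controls, n))
    b = np.zeros((n + n_controls, 2))
    for i in range(n):
        A[i, i] = 1.0
        A[i, idx[i, 1:]] = -1.0 / k
    # ... plus rows pinning the control points to their 2-D positions.
    for r, c in enumerate(controls):
        A[n + r, c] = 1.0
        b[n + r] = Yc[r]
    Y, *_ = np.linalg.lstsq(A, b, rcond=None)
    return Y                                       # (n, 2) projected coordinates
```

Only the control points require pairwise distance computations for their initial layout; every other point is positioned by the sparse linear system, which is where the speed advantage comes from.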
We present a method for creating 2 1/2D models from line drawings of opaque solid objects. As input, we use a single drawing composed of strokes indicative of surface geometry, but not of texture, color or shading. We attempt to allow the artist to draw naturally, differing from many previous approaches. Our system allows both perspective and orthographic projection to be used and we make no a priori assumptions about the type of model to be produced (i.e. planar, curved, normalon). The frontal geometry of the input drawing is reconstructed by placing constraints at the contours and solving a 2D variational system for the smoothest piecewise smooth surface. An analysis of line labelling allows us to determine what constraints are possible and/or required for each input line. However, because line labelling produces a combinatorial explosion of valid output geometries, we allow the user to guide the constraint selection and optimization with a simple user interface that abstracts the technical details away from the user. The system produces candidate reconstructions using different constraint values, from which the user selects the one that most closely approximates the model represented by the drawing. These choices allow the system to determine the constraints and reconstruct the model. The system runs at interactive speeds. (c) 2007 Published by Elsevier Ltd.
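As a toy illustration of the variational step only, the sketch below recovers a smooth scalar (depth) field over a pixel grid from sparse Dirichlet constraints by solving a discrete Laplace system; line labelling, piecewise smoothness and the interactive constraint selection are all omitted, and the constraint values are assumed to be given.

```python
# Smoothest field subject to sparse hard constraints (discrete Laplace equation).
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla


def smooth_depth(shape, constraints):
    """constraints: dict {(row, col): depth} fixing the field at some pixels."""
    h, w = shape
    n = h * w
    index = lambda r, c: r * w + c
    rows, cols, vals = [], [], []
    rhs = np.zeros(n)
    for r in range(h):
        for c in range(w):
            i = index(r, c)
            if (r, c) in constraints:             # Dirichlet constraint row
                rows.append(i); cols.append(i); vals.append(1.0)
                rhs[i] = constraints[(r, c)]
                continue
            nbrs = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= r + dr < h and 0 <= c + dc < w]
            rows.append(i); cols.append(i); vals.append(float(len(nbrs)))
            for rr, cc in nbrs:                   # discrete Laplacian = 0 elsewhere
                rows.append(i); cols.append(index(rr, cc)); vals.append(-1.0)
    A = sp.csr_matrix((vals, (rows, cols)), shape=(n, n))
    return spla.spsolve(A, rhs).reshape(h, w)
```

In the paper, the analogous system is assembled from the constraints that line labelling permits at each stroke, and the user's choices among candidate reconstructions decide which constraint values are used.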
ISBN: (Print) 9789889867140
Most biologists keep data in separate databases. These databases are not necessarily well-structured. Plant identification keys are among such data. They are data-rich descriptions containing plant identification terminology and may be used to identify various plant species. The way the data is kept often requires the species identification to be done using rules that are applied sequentially. Done manually, this is very time consuming. Information extraction (IE) is a process of selecting information such as names, terms, or phrases from natural language text documents. This information is then structured into a specified template for retrieval. This method is applied to plant identification keys kept by the biologists. Before the keys are extracted from the descriptions, they have to go through a number of processes. In this paper, we illustrate the pre-processing and processing methods with an example from a database, with emphasis on the approximate string matching algorithm used to extract the most relevant keys from the description.
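The sketch below illustrates only the approximate string matching step, scoring each known identification term against the words of a description with Levenshtein distance and keeping close matches; the term list, threshold and tokenisation are illustrative assumptions.

```python
# Approximate matching of identification terms against free-text descriptions.
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard rolling-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def extract_keys(description: str, terms: list[str], max_dist: int = 2) -> list[str]:
    """Return the terms whose closest word in the description is within max_dist edits."""
    words = description.lower().split()
    hits = []
    for term in terms:
        best = min(edit_distance(term.lower(), w) for w in words)
        if best <= max_dist:
            hits.append(term)
    return hits


# e.g. extract_keys("Leaves opposite, margin serate", ["serrate", "opposite"])
# tolerates the misspelling "serate" and returns both terms.
```

Approximate rather than exact matching matters here because the hand-maintained databases contain spelling variants and abbreviations of the same terminology.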
Many libraries, museums, and other organizations contain large collections of handwritten historical documents, for example, the papers of early presidents like George Washington at the Library of Congress. The first step in providing recognition/retrieval tools is to automatically segment handwritten pages into words. State of the art segmentation techniques like the gap metrics algorithm have been mostly developed and tested on highly constrained documents like bank checks and postal addresses. There has been little work on full handwritten pages and this work has usually involved testing on clean artificial documents created for the purpose of research. Historical manuscript images, on the other hand, contain a great deal of noise and are much more challenging. Here, a novel scale space algorithm for automatically segmenting handwritten (historical) documents into words is described. First, the page is cleaned to remove margins. This is followed by a gray-level projection profile algorithm for finding lines in images. Each line image is then filtered with an anisotropic Laplacian at several scales. This procedure produces blobs which correspond to portions of characters at small scales and to words at larger scales. Crucial to the algorithm is scale selection, that is, finding the optimum scale at which blobs correspond to words. This is done by finding the maximum over scale of the extent or area of the blobs. This scale maximum is estimated using three different approaches. The blobs recovered at the optimum scale are then bounded with a rectangular box to recover the words. A postprocessing filtering step is performed to eliminate boxes of unusual size which are unlikely to correspond to words. The approach is tested on a number of different data sets and it is shown that, on 100 sampled documents from the George Washington corpus of handwritten document images, a total error rate of 17 percent is observed. The technique outperforms a state-of-the-art gap metrics algorithm.
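A condensed sketch of the scale-selection idea, assuming an anisotropic Gaussian in place of the paper's anisotropic Laplacian and a crude intensity threshold for blob extraction: each line image is smoothed at several scales, the scale at which blob extent peaks is kept, and blob bounding boxes stand in for word boxes.

```python
# Scale-space word segmentation of a text-line image (simplified illustration).
import numpy as np
from scipy import ndimage


def word_boxes(line_img: np.ndarray, scales=(1, 2, 3, 4, 6, 8)):
    """line_img: grayscale line image with dark ink on a light background."""
    ink = line_img.max() - line_img.astype(float)           # ink = high values
    best_area, best_labels = -1.0, None
    for s in scales:
        # Stronger smoothing along the writing direction than across it.
        smooth = ndimage.gaussian_filter(ink, sigma=(0.5 * s, 2.0 * s))
        blobs = smooth > smooth.mean() + smooth.std()        # crude blob mask
        labels, n = ndimage.label(blobs)
        # Median blob area stands in for the paper's blob-extent measure.
        area = np.median(ndimage.sum(blobs, labels, range(1, n + 1))) if n else 0.0
        if area > best_area:
            best_area, best_labels = area, labels
    boxes = ndimage.find_objects(best_labels)
    # Post-filter boxes of implausible size (tiny specks), as in the paper.
    return [b for b in boxes
            if (b[0].stop - b[0].start) > 3 and (b[1].stop - b[1].start) > 3]
```

The key property is that at too small a scale the blobs fragment into character pieces and at too large a scale neighbouring words merge, so blob extent as a function of scale peaks near the word level.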
Static signatures originate as handwritten images on documents and by definition do not contain any dynamic information. This lack of information makes static signature verification systems significantly less reliable than their dynamic counterparts. This study involves extracting dynamic information from static images, specifically the pen trajectory that was followed while the signature was created. We assume that a dynamic version of the static image is available (typically obtained during an earlier registration process). We then derive a hidden Markov model from the static image and match it to the dynamic version of the image. This match results in the estimated pen trajectory of the static image.
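A highly simplified sketch of the matching step: each skeleton point of the static image acts as an HMM state with a Gaussian emission around its position, transitions connect nearby states, and Viterbi decoding over the dynamic samples yields an ordering of the static points. The state topology, emission features and parameters are assumptions made only to convey the structure.

```python
# Viterbi matching of a dynamic signature to states derived from a static skeleton.
import numpy as np


def viterbi_trajectory(skel_pts: np.ndarray, dyn: np.ndarray,
                       sigma: float = 3.0, hop: float = 10.0) -> np.ndarray:
    """skel_pts: (S, 2) static skeleton points; dyn: (T, 2) dynamic samples."""
    S, T = len(skel_pts), len(dyn)
    # Log transitions: uniform over states within `hop` pixels, negligible elsewhere.
    d = np.linalg.norm(skel_pts[:, None] - skel_pts[None, :], axis=-1)
    logA = np.where(d <= hop, 0.0, -1e3)
    logA -= np.logaddexp.reduce(logA, axis=1, keepdims=True)
    # Log emissions: isotropic Gaussian around each skeleton point.
    e = -np.linalg.norm(dyn[:, None] - skel_pts[None, :], axis=-1) ** 2 / (2 * sigma ** 2)
    # Viterbi recursion.
    delta = e[0] - np.log(S)
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + logA            # (from-state, to-state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + e[t]
    # Trace back the most likely state (skeleton point) sequence.
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return skel_pts[np.array(path[::-1])]          # estimated pen trajectory
```

Because the registered dynamic signature supplies the temporal ordering, the decoded state sequence orders the static skeleton points, which is exactly the information a static image lacks on its own.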