Clustering techniques such as K-means and Forgy as well as their improved version ISODATA group data around one seed point for each cluster. It is well known that these methods do not work well if the shape of the clu...
详细信息
Clustering techniques such as K-means and Forgy as well as their improved version ISODATA group data around one seed point for each cluster. It is well known that these methods do not work well if the shape of the cluster is elongated or nonconvex. We argue that for a elongated or nonconvex shaped cluster, more than one seed is needed, In this paper a multiseed clustering algorithm is proposed. A density based representative point selection algorithm is used to choose the initial seed points. To assign several seed points to one cluster, a minimal spanning tree guided novel technique is proposed. Also, a border point detection algorithm is proposed for the detection of shape of the cluster. This border in turn signifies whether the cluster is elongated or not. Experimental results show the efficiency of this clustering technique.
In this paper we propose a neural network model to synthesise texture images. The model is based on a continuous Hopfield-like network where each pixel of the image is occupied by a neuron that is eight-connected to i...
详细信息
In this paper we propose a neural network model to synthesise texture images. The model is based on a continuous Hopfield-like network where each pixel of the image is occupied by a neuron that is eight-connected to its neighbours. A state of the neuron denotes a certain grey level of the corresponding pixel. The firing of the neuron changes its state, and hence the grey level of the corresponding pixel. Different two-tone and grey-tone texture images can be synthesised by manipulating the connection weights and by varying the algorithm iteration number. For grey-tone texture synthesis, a Markov chain principle has been employed to decide on the multiple state transition of a neuron. The model can be employed for texture propagation with the advantage that it allows propagation without showing any blocky effect.
An NLP system for Indian languages should have a lexical subsystem that is driven by a morphological analyzer. Such an analyzer should be able to parse a word into its constituent morphemes and obtain lexical projecti...
详细信息
An NLP system for Indian languages should have a lexical subsystem that is driven by a morphological analyzer. Such an analyzer should be able to parse a word into its constituent morphemes and obtain lexical projection of the word as a unification of the projections of the constituent morphemes. Lexical projections considered here are f-structures of the Lexical Functional Grammar (LFG). A formalism has been proposed, by which the lexicon writer may specify the lexicon in four levels. The specifications are compiled into a stored lexical knowledge base on one hand and a formulation of derivational morphology called Augmented Finite State Automata (AFSA) on the other to achieve a compact lexical representation. The aspects of AFSA, especially its power of morphological parsing of words in a computationally attractive manner, has been discussed. An additional utility of the AFSA, in the form of spelling error corrector, has also been discussed. Bangla, or Bengali is considered as a case study. Implementation notes based on object-oriented programming principles has been provided.
Given a set of points in multi-dimensional space, we propose a new definition for the neighbors of an arbitrary point P. The definition tries to capture the idea that the neighbors should be as near to P and as symmet...
详细信息
Given a set of points in multi-dimensional space, we propose a new definition for the neighbors of an arbitrary point P. The definition tries to capture the idea that the neighbors should be as near to P and as symmetrically placed around P as possible. In contrast, the conventional nearest neighborhood considers only nearness as the criterion for neighborhood. We propose an iterative procedure to compute the neighbors where the first neighbor is the nearest neighbor. The second and other neighbors are chosen so that at any stage the distance between the centroid of the neighbors and P is as small as possible. The centroid criterion takes care of symmetrical placement of the neighbors. One can use median instead of centroid to define the neighbors. The new definition is free from any user-specified parameter and can be used for pattern classification, clustering and low-level description of dot patterns.
This paper considers optical character recognition (OCR) of Bangla, the second most popular script in the Indian subcontinent. A complete OCR system is described for documents of single Bangla font, where more than th...
详细信息
This paper considers optical character recognition (OCR) of Bangla, the second most popular script in the Indian subcontinent. A complete OCR system is described for documents of single Bangla font, where more than three hundred character shapes are recognized by a combination of template and feature-matching approach. Here the document image captured by a flatbed scanner is subject to tilt correction, line, word and character segmentation, simple and compound character separation, feature extraction and finally character recognition. Some character occurrence statistics have been computed to aid the recognition process. The simple character recognition is done by a feature-based tree classifier, and the compound character recognition involves a template matching approach preceded by a feature-based grouping. At present, recognition accuracy of about 96% is obtained by the system.
Extraction of some meta-information from printed documents without carrying out optical character recognition (OCR) is considered. It can be statistically verified that important terms in technical articles are mainly...
详细信息
ISITRA is a new scheme of signal decomposition and reconstruction. In ISITRA, the space of PRF sets is much larger and more well-behaved than that in the existing schemes like filter bank or wavelets. Since such a spa...
详细信息
This paper looks at the problem of searching for Indian language (IL) content on the Web. Even though the amount of IL content that is available on the Web is growing rapidly, searching through this content using the ...
详细信息
ISBN:
(纸本)9781605584164
This paper looks at the problem of searching for Indian language (IL) content on the Web. Even though the amount of IL content that is available on the Web is growing rapidly, searching through this content using the most popular websearch engines poses certain problems. Since the popular search engines do not use any stemming / orthographic normalization for Indian languages, recall levels for IL searches can be low. We provide some examples to indicate the extent of this problem, and suggest a simple and efficient solution to the problem. Copyright 2008 ACM.
Spelling error is broadly classified in two categories namely non word error and real word error. In this paper a localized real word error detection and correction method is proposed where the scores of bigrams gener...
详细信息
A few studies of online Bangla handwriting recognition such as isolated character recognition or limited vocabulary cursive word recognition are found in the literature. However, development of an end-to-end recogniti...
详细信息
暂无评论