Search results - Inner Mongolia University Library

International Conference on Document Analysis and Recognition

Authors: U. Pal, M. Mitra, B.B. Chaudhuri (Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Calcutta, India)

ISBN: (print) 0769512631

There are many documents in which text lines are not parallel to each other, i.e., the lines have different inclinations with respect to the horizontal (multi-skew documents). For OCR of such a document, the skew angle of each individual text line has to be estimated, because a single rotation cannot de-skew all text lines of the document. In this paper, we describe a robust technique for multi-skew angle detection in Indian documents containing the most popular Indian scripts, Devnagari and Bangla. Most characters in these scripts have horizontal lines at the top, called head-lines. The character head-lines usually connect to one another within a word, so the word appears as a single component. In the proposed method, the connected components are first labeled and selected. The upper envelopes of the selected components are found by column-wise scanning from the top of each component. Portions of the upper envelope that satisfy the properties of a digital straight line are detected. They are then clustered into groups belonging to single text lines. Estimates from these individual clusters give the skew angle of each text line. The proposed multi-skew detection technique has an accuracy of about 98.3%.

Keywords: Strips; Fourier transforms; Optical character recognition software; Robustness; Computer vision; Pattern recognition; Envelope detectors; Humans; Goniometers; Gray-scale
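
The envelope-scanning step described in this abstract can be illustrated with a minimal Python sketch. It assumes a connected component is given as a binary grid (rows of 0/1), and it swaps in an ordinary least-squares line fit for the digital-straight-line test and the clustering step, which the abstract does not specify in detail. The function names `upper_envelope` and `skew_angle_degrees` are hypothetical and not taken from the paper.

```python
import math


def upper_envelope(component):
    """Scan each column from the top; return the row of the first foreground pixel.

    `component` is a binary grid (list of rows of 0/1). Columns with no
    foreground pixel yield None.
    """
    height, width = len(component), len(component[0])
    envelope = []
    for col in range(width):
        top = None
        for row in range(height):
            if component[row][col]:
                top = row
                break
        envelope.append(top)
    return envelope


def skew_angle_degrees(envelope):
    """Estimate a skew angle from envelope points with a least-squares line fit.

    Assumption: a plain least-squares fit stands in for the paper's
    digital-straight-line detection. Image rows grow downward, so the
    slope is negated to make a line that rises to the right positive.
    """
    points = [(x, y) for x, y in enumerate(envelope) if y is not None]
    if len(points) < 2:
        raise ValueError("need at least two envelope points")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return math.degrees(math.atan(-sxy / sxx))


# Toy component whose top edge drops one row per column toward the right.
component = [[1 if row >= col else 0 for col in range(8)] for row in range(8)]
angle = skew_angle_degrees(upper_envelope(component))
```

In a fuller pipeline, one such estimate would be computed per cluster of envelope segments, giving one angle per text line rather than one per page.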

Automatic recognition of printed Oriya script

International Conference on Document Analysis and Recognition

Authors: B.B. Chaudhuri, U. Pal, M. Mitra (Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Calcutta, India)

ISBN: (print) 0769512631

The paper deals with an optical character recognition system for printed Oriya, a popular Indian script. The development of OCR for this script is difficult because a large number of characters have to be recognized. In the proposed system, the digitized document image is first passed through preprocessing modules like skew correction, line segmentation, zone detection, word and character segmentation, etc. These modules have been developed by combining some conventional techniques with some newly proposed ones. Next, individual characters are recognized using a combination of stroke and run-number based features, along with features obtained from the concept of a water reservoir. The feature detection methods are simple and robust. A prototype of the system has been tested on a variety of printed Oriya material, and currently achieves 96.3% character level accuracy on average.

Keywords: Character recognition; Optical character recognition software; Image segmentation; Water resources; Reservoirs; Computer vision; Robustness; Prototypes; Materials testing; System testing

Automatic identification of English, Chinese, Arabic, Devnagari and Bangla script line

International Conference on Document Analysis and Recognition

Authors: U. Pal, B.B. Chaudhuri (Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Calcutta, India)

In a general situation, a document page may contain several script forms. For optical character recognition (OCR) of such a document page, it is necessary to separate the scripts before feeding them to their individual OCR systems. An automatic technique for the identification of printed Roman, Chinese, Arabic, Devnagari and Bangla text lines from a single document is proposed. Shape-based features, statistical features and some features obtained from the concept of a water reservoir are used for script identification. The proposed scheme has an accuracy of about 97.33%.

Keywords: Water resources; Reservoirs; Optical character recognition software; Shape; Water storage; Probability; Computer vision; Pattern recognition; Optical devices; Fractals

A cascaded genetic algorithm for efficient optimization and pattern matching

2nd International Conference on Advances in Pattern Recognition, ICAPR 2001

Author: Garai, Gautam (Computer Division, Saha Institute of Nuclear Physics, 1/AF Bidhannagar, Calcutta 700064, India; Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, 203 B. T. Road, Calcutta 700035, India)

ISBN: (print) 3540417672

A modified Genetic Algorithm (GA) based search strategy is presented here that is computationally more efficient than the conventional GA. The idea is to start a GA with chromosomes of small length. Such chromosomes represent possible solutions at coarse resolution. A finite space around the position of the solution found in the first stage is subjected to the GA at the second stage. Since this space is much smaller than the original search space, chromosomes of the same length now represent a finer resolution. In this way, the search progresses from coarse to fine solutions in a cascaded manner. Since chromosomes of small size are used at each stage, the overall approach becomes computationally more efficient than a single-stage algorithm with the same degree of final resolution. Also, since the lower stage works at low resolution, the algorithm can avoid spurious local extrema. The effectiveness of the proposed GA has been demonstrated for the optimization of some synthetic functions and on pattern recognition problems, namely dot pattern matching and object matching with an edge map. © Springer-Verlag Berlin Heidelberg 2001.

Keywords: Genetic algorithms
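
The coarse-to-fine cascade described in this abstract can be sketched in Python as follows. This is only an illustration under stated assumptions: a one-dimensional minimization problem, binary chromosomes, tournament selection, one-point crossover, elitism, and a fixed window-shrink factor are all choices made here, not details taken from the paper. The names `decode`, `run_ga`, and `cascaded_ga` and every parameter value are hypothetical.

```python
import random


def decode(bits, lo, hi):
    """Map a list of 0/1 bits to a real value on an evenly spaced grid in [lo, hi]."""
    value = int("".join(map(str, bits)), 2)
    return lo + (hi - lo) * value / (2 ** len(bits) - 1)


def run_ga(f, lo, hi, n_bits, rng, pop_size=20, generations=30, p_mut=0.05):
    """One GA stage that minimizes f over [lo, hi] with short chromosomes."""
    def cost(chrom):
        return f(decode(chrom, lo, hi))

    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = min(pop, key=cost)
    for _ in range(generations):
        new_pop = [best[:]]  # elitism: carry the best chromosome forward
        while len(new_pop) < pop_size:
            # Tournament selection of two parents.
            p1 = min(rng.sample(pop, 2), key=cost)
            p2 = min(rng.sample(pop, 2), key=cost)
            cut = rng.randint(1, n_bits - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability p_mut per bit.
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            new_pop.append(child)
        pop = new_pop
        best = min(pop, key=cost)
    return decode(best, lo, hi)


def cascaded_ga(f, lo, hi, n_bits=6, stages=3, shrink=0.25, seed=0):
    """Run several GA stages, each on a smaller window around the best solution.

    The chromosome length stays fixed, so each stage resolves the smaller
    window on a finer grid than the stage before it.
    """
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(stages):
        x = run_ga(f, lo, hi, n_bits, rng)
        if f(x) < best_f:
            best_x, best_f = x, f(x)
        half = (hi - lo) * shrink / 2
        lo, hi = max(lo, best_x - half), min(hi, best_x + half)
    return best_x


# Example: minimize a simple synthetic function on [0, 10].
best = cascaded_ga(lambda x: (x - 3.2) ** 2, 0.0, 10.0)
```

With 6-bit chromosomes and a shrink factor of 0.25, three stages give a final grid spacing 16 times finer than the first stage while never evaluating a chromosome longer than 6 bits.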

Water reservoir based approach for touching numeral segmentation

International Conference on Document Analysis and Recognition

Authors: U. Pal, A. Belaid, C. Choisy (Group READ, LORIA, Campus Scientifique, France; Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, India; Group READ, LORIA, Campus Scientifique, Vandoeuvre-les-Nancy, France)

ISBN: (print) 0769512631

This paper deals with a scheme for automatic segmentation of unconstrained handwritten connected numerals. The scheme is mainly based on features obtained from a new concept, the water reservoir. A reservoir is a metaphor for the region where numerals touch. The reservoir is obtained by considering the accumulation of water poured from the top or from the bottom of the numerals. First, considering the reservoir location and size, the touching position (top, middle or bottom) is decided. Next, by analyzing the reservoir boundary, the touching position and the topological features of the touching pattern, the best cutting point is determined. Finally, in combination with morphological structural features, the cutting path for segmentation is generated.

Keywords: Water resources; Reservoirs; Water storage; Pattern analysis; Computer vision; Pattern recognition; Handwriting recognition; Joining IEEE
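
The water reservoir idea in this abstract can be illustrated with a small Python sketch. It is a deliberately simplified, one-dimensional reading of the metaphor: water is poured from the top onto a binary component, and the depth collected above each column is computed from the component's top profile. The paper's actual reservoir analysis (boundary tracing, cutting-point selection) is not reproduced here, and the function name `top_reservoir_depths` is hypothetical.

```python
def top_reservoir_depths(component):
    """Per-column depth of water held when water is poured from the top.

    `component` is a binary grid (list of rows of 0/1). Water runs off at
    the left and right edges, so it only collects in cavities that are
    walled in on both sides.
    """
    height, width = len(component), len(component[0])

    # Ink height per column, measured up from the bottom row of the grid.
    profile = []
    for col in range(width):
        top = next((r for r in range(height) if component[r][col]), height)
        profile.append(height - top)

    # Highest wall seen so far when sweeping from the left.
    left_max, running = [], 0
    for h in profile:
        running = max(running, h)
        left_max.append(running)

    # Highest wall seen so far when sweeping from the right.
    right_max, running = [0] * width, 0
    for col in range(width - 1, -1, -1):
        running = max(running, profile[col])
        right_max[col] = running

    # The water level at a column is limited by the lower of the two walls.
    return [min(l, r) - h for l, r, h in zip(left_max, right_max, profile)]


# A U-shaped toy component: two strokes joined at the bottom.
u_shape = [
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
depths = top_reservoir_depths(u_shape)
```

Columns with positive depth mark a top reservoir; its position and size are the kind of cues the abstract describes for locating where two numerals touch. A bottom reservoir could be obtained by flipping the grid vertically before calling the same function.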

Bangla pronouns - a corpus based study

Literary and Linguistic Computing, 2000, vol. 15, no. 4, pp. 433-444

Author: Dash, N.S. (Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, Calcutta 700035, India)

Bangla is the second most widely spoken language in the Indian subcontinent, yet has not been the focus of much research activity in either corpus linguistics or language engineering to date. This paper describes the automatic processing of pronouns in three and a half million words of Bangla corpus data. A corpus-based analysis of Bangla pronouns is developed, and a new approach to the analysis of Bangla pronouns is taken as a consequence. On the basis of this analysis a system is then developed to identify and analyse Bangla pronouns in corpus data. © 2000 Oxford University Press.

A syntactic approach for processing mathematical expressions in printed documents

International Conference on Pattern Recognition

Authors: U. Garain, B.B. Chaudhuri (Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Calcutta, India)

ISBN: (print) 0769507506

We propose an approach for understanding mathematical expressions in printed documents. The overall approach is divided into three main steps: (i) detection of mathematical expressions in a document, (ii) recognition of the symbols present in the expression and (iii) arrangement of the recognized symbols. The detection of mathematical expressions is done by recognizing a few of the most common symbols and exploiting some structural features of the expressions. A hybrid of feature-based and template-based techniques is used for the recognition of symbols. A two-pass approach is used for arrangement of the symbols. The first pass (scanning or lexical analysis) performs a micro-level examination of the symbols in order to identify the symbol groups occurring in them and to determine their categories or descriptors. The second pass (parsing or syntax analysis) processes the descriptors synthesized in the first pass to determine the syntactic structure of the expression. A set of predefined rules guides the activities in both passes. Experiments conducted using this approach on a large number of documents show high accuracy.

Keywords: Optical character recognition software; Computer vision; Pattern recognition; Performance analysis; Equations; Document handling; Testing; Books; White spaces

Indian language multimedia and information retrieval

3rd International Conference on Computational Intelligence and Multimedia Applications, ICCIMA 1999

Author: Chaudhuri, B.B. (Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Calcutta, India)

ISBN: (print) 0769503004

Over the last decade or so, remarkable developments in computer technology have given a major impetus to research in the field of multimedia. With the proliferation of the Internet and the increasingly widespread use of sophisticated computers, the multimedia revolution has arrived in India as well. It is therefore time to take stock of the situation: to evaluate how existing techniques can be used in the Indian context and to determine what new methods have to be developed. This paper summarizes the current state of multimedia technology in India and points to directions for further work. As more and more people in India begin to use computers and the Internet, multimedia capabilities will start playing a vital role in solving problems in many different areas. Education is probably one of the most important areas where multimedia technology can have a major impact. Already, multimedia educational systems are being developed in Indian languages. Several interactive encyclopaedia-like environments are also being marketed on CD-ROMs, and cover topics ranging from Indian classical music to Indian history, using text, images and sound. Some of the other possible applications of multimedia technology are: the development of digital libraries, news and information dissemination services, medicine, business and commerce, and the entertainment industry. Multimedia information technology is thus poised to become an exciting area for research and development activities in India. © 1999 IEEE.

Keywords: CD-ROM

Extraction of type style based meta-information from imaged documents

5th International Conference on Document Analysis and Recognition, ICDAR 1999

Authors: Garain, U.; Chaudhuri, B.B. (Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, Calcutta 700035, India)

ISBN: (print) 0769503187

Extraction of some meta-information from printed documents without an OCR approach is considered. It can be statistically verified that important terms in articles are printed in italic, bold and all-capital styles. Detection of these type styles helps in the automatic extraction of lines containing titles, authors' names, subtitles and references, as well as sentences containing important terms that occur in the text. It also helps to improve OCR performance when reading italic text. Some experimental results on the performance of the approach on good-quality as well as degraded document images are presented. © 1999 IEEE.

Segmentation of Bangla handwritten text into characters by recursive contour following

5th International Conference on Document Analysis and Recognition, ICDAR 1999

Authors: Bishnu, A.; Chaudhuri, B.B. (Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, 203 B. T. Road, Calcutta 700035, India)

ISBN: (print) 0769503187

Segmentation of handwritten words into characters is one of the important components of handwritten text OCR. In this paper we put forward a method for the segmentation of handwritten Bangla (an Indo-Bangladeshi language) text into characters. Based on certain characteristics of Bangla writing, different zones across the height of the word are detected. These zones provide structural information about the constituent characters of the word. In handwritten Bangla text, the rectangular hulls of successive characters often overlap, so the characters are seldom vertically separable. We therefore propose a method of recursive contour following in one of the zones across the height of the word to find the extents within which the main portion of each character lies. If successive characters do not touch in the zone of contour following, the algorithm gives fairly good results. © 1999 IEEE.
