Search results: Inner Mongolia University Library

Bangla pronouns - a corpus based study

Literary and Linguistic Computing, 2000, Vol. 15, No. 4, pp. 433-444

Authors: Dash, N.S. (Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, Calcutta 700035, India)

Bangla is the second most widely spoken language in the Indian subcontinent, yet it has not been the focus of much research in either corpus linguistics or language engineering to date. This paper describes the automatic processing of pronouns in three and a half million words of Bangla corpus data. A corpus-based analysis of Bangla pronouns is developed, and as a consequence a new approach to their analysis is taken. On the basis of this analysis, a system is then developed to identify and analyse Bangla pronouns in corpus data. © 2000 Oxford University Press.
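
As a rough illustration of the lexicon-lookup step a corpus-based pronoun identifier might start from, the sketch below tags a handful of Bangla pronoun forms in whitespace-tokenized text. The mini-lexicon, the case labels and the functions `tag_pronouns` and `pronoun_frequencies` are hypothetical; the paper's own analysis and system are not reproduced here.

```python
from collections import Counter

# Hypothetical mini-lexicon: surface form -> (base pronoun, case).
# Only a few common forms are listed; a real system would need the full
# paradigm of every pronoun, including honorific variants.
PRONOUN_LEXICON = {
    "আমি": ("আমি", "nominative"),   # ami, "I"
    "আমার": ("আমি", "genitive"),    # amar, "my"
    "আমাকে": ("আমি", "objective"),  # amake, "me"
    "তুমি": ("তুমি", "nominative"),  # tumi, "you" (familiar)
    "সে": ("সে", "nominative"),     # se, "he/she"
    "তার": ("সে", "genitive"),      # tar, "his/her"
    "তাকে": ("সে", "objective"),    # take, "him/her"
}

# Characters stripped from token edges; "।" is the Bangla full stop (dari).
PUNCTUATION = "।,;:!?\"'()"

def tag_pronouns(text):
    """Return (token, base, case) triples for every token found in the lexicon."""
    tagged = []
    for raw in text.split():
        token = raw.strip(PUNCTUATION)
        if token in PRONOUN_LEXICON:
            base, case = PRONOUN_LEXICON[token]
            tagged.append((token, base, case))
    return tagged

def pronoun_frequencies(text):
    """Count pronoun occurrences grouped by base form."""
    return Counter(base for _, base, _ in tag_pronouns(text))
```

For example, `tag_pronouns("আমি তাকে চিনি।")` yields two tags: আমি as nominative and তাকে as the objective form of সে. A plain lookup like this cannot resolve ambiguous forms, which is where a corpus-based analysis becomes necessary.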

A syntactic approach for processing mathematical expressions in printed documents

International Conference on Pattern Recognition

Authors: U. Garain; B.B. Chaudhuri (Indian Statistical Institute, Computer Vision and Pattern Recognition Unit, Calcutta, India)

ISBN (print): 0769507506

We propose an approach for understanding mathematical expressions in printed documents. The overall approach is divided into three main steps: (i) detection of mathematical expressions in a document, (ii) recognition of the symbols present in the expression, and (iii) arrangement of the recognized symbols. Mathematical expressions are detected by recognizing a few of the most common symbols and exploiting some structural features of the expressions. A hybrid of feature-based and template-based techniques is used to recognize the symbols. A two-pass approach is used to arrange them. The first pass (scanning, or lexical analysis) performs a micro-level examination of the symbols to identify the symbol groups they form and to determine their categories, or descriptors. The second pass (parsing, or syntax analysis) processes the descriptors synthesized in the first pass to determine the syntactic structure of the expression. A set of predefined rules guides both passes. Experiments conducted with this approach on a large number of documents show high accuracy.
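
The two-pass arrangement can be illustrated with a deliberately small, one-dimensional sketch: a first (lexical) pass merges horizontally adjacent digits into number tokens and assigns each token a category, and a second (syntax) pass checks the token sequence against a tiny grammar. The `Symbol` record, the `max_gap` threshold and the grammar `operand (OP operand)*` are assumptions made for illustration; the paper's descriptors and rules are not given in the abstract, and its arrangement also handles two-dimensional layout, which this sketch ignores.

```python
from dataclasses import dataclass

OPERATORS = {"+", "-", "="}

@dataclass
class Symbol:
    label: str  # recognized symbol, e.g. "3", "x", "+"
    x0: int     # left edge of its bounding box
    x1: int     # right edge of its bounding box

def lexical_pass(symbols, max_gap=3):
    """Pass 1: merge adjacent digits into numbers and give each token a category."""
    tokens = []
    for sym in sorted(symbols, key=lambda s: s.x0):
        prev = tokens[-1] if tokens else None
        if (prev is not None and prev["cat"] == "NUM" and sym.label.isdigit()
                and sym.x0 - prev["x1"] <= max_gap):
            prev["text"] += sym.label  # extend the number being built
            prev["x1"] = sym.x1
            continue
        if sym.label.isdigit():
            cat = "NUM"
        elif sym.label.isalpha():
            cat = "VAR"
        elif sym.label in OPERATORS:
            cat = "OP"
        else:
            cat = "OTHER"
        tokens.append({"text": sym.label, "cat": cat, "x1": sym.x1})
    return tokens

def syntax_pass(tokens):
    """Pass 2: accept only  operand (OP operand)*  where operand is NUM or VAR."""
    cats = [t["cat"] for t in tokens]
    if len(cats) % 2 == 0:  # also rejects an empty sequence
        return False
    for i, cat in enumerate(cats):
        expected = ("NUM", "VAR") if i % 2 == 0 else ("OP",)
        if cat not in expected:
            return False
    return True
```

With symbols "1", "2", "+", "x" placed left to right and a 2-pixel gap between the digits, the first pass produces the tokens "12", "+", "x" (NUM, OP, VAR) and the second pass accepts them.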

Keywords: Optical character recognition software; Computer vision; Pattern recognition; Performance analysis; Equations; Document handling; Testing; Books; White spaces

Indian language multimedia and information retrieval

3rd International Conference on Computational Intelligence and Multimedia Applications, ICCIMA 1999

Authors: Chaudhuri, B.B. (Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Calcutta, India)

ISBN (print): 0769503004

Over the last decade or so, remarkable developments in computer technology have given a major impetus to research in multimedia. With the proliferation of the Internet and the increasingly widespread use of sophisticated computers, the multimedia revolution has arrived in India as well. It is therefore time to take stock of the situation: to evaluate how existing techniques can be used in the Indian context and to determine what new methods must be developed. This paper summarizes the current state of multimedia technology in India and points to directions for further work. As more and more people in India begin to use computers and the Internet, multimedia capabilities will play a vital role in solving problems in many different areas. Education is probably one of the most important areas where multimedia technology can have a major impact. Multimedia educational systems are already being developed in Indian languages. Several interactive, encyclopaedia-like environments are also being marketed on CD-ROMs, covering topics ranging from Indian classical music to Indian history using text, images and sound. Other possible applications of multimedia technology include digital libraries, news and information dissemination services, medicine, business and commerce, and the entertainment industry. Multimedia information technology is thus poised to become an exciting area for research and development in India. © 1999 IEEE.

Keywords: CD-ROM

Extraction of type style based meta-information from imaged documents

5th International Conference on Document Analysis and Recognition, ICDAR 1999

Authors: Garain, U.; Chaudhuri, B.B. (Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, Calcutta 700 035, India)

ISBN (print): 0769503187

This paper considers the extraction of meta-information from printed documents without an OCR step. It can be statistically verified that important terms in articles are printed in italic, bold and all-capital styles. Detecting these type styles helps in automatically extracting the lines that contain titles, authors' names, subtitles and references, as well as sentences in the text that contain important terms. It also helps improve OCR performance on italic text. Experimental results on the performance of the approach for good-quality as well as degraded document images are presented. © 1999 IEEE.
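
One simple cue for bold type is stroke thickness. The sketch below estimates it as the mean length of horizontal runs of ink pixels in a binarized word image (lists of 0/1 rows) and flags a word as bold when that mean exceeds a multiple of the page median. The run-length feature, the function names and the factor of 1.5 are assumptions chosen for illustration, not the features reported in the paper; italic and all-capital detection would need separate cues.

```python
from statistics import mean, median

def horizontal_runs(img):
    """Lengths of maximal horizontal runs of ink (1) pixels in a binary image."""
    runs = []
    for row in img:
        length = 0
        for px in row:
            if px:
                length += 1
            elif length:
                runs.append(length)
                length = 0
        if length:  # run touching the right edge
            runs.append(length)
    return runs

def stroke_width(img):
    """Mean horizontal run length, used here as a rough stroke-width estimate."""
    runs = horizontal_runs(img)
    return mean(runs) if runs else 0.0

def flag_bold(word_images, factor=1.5):
    """One boolean per word: True where the stroke is much thicker than the page median."""
    widths = [stroke_width(img) for img in word_images]
    page_median = median(widths)
    return [w > factor * page_median for w in widths]
```

Using the page median as the reference keeps the decision relative to the document's own body text, so the same factor can work across scan resolutions, at the cost of misbehaving on pages where most words are bold.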

Segmentation of Bangla handwritten text into characters by recursive contour following

5th International Conference on Document Analysis and Recognition, ICDAR 1999

Authors: Bishnu, A.; Chaudhuri, B.B. (Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, 203 B. T. Road, Calcutta 700035, India)

ISBN (print): 0769503187

Segmentation of handwritten words into characters is one of the important components of handwritten-text OCR. In this paper we put forward a method for segmenting handwritten Bangla (an Indo-Bangladeshi language) text into characters. Based on certain characteristics of Bangla writing, different zones across the height of the word are detected. These zones provide structural information about the characters that make up the word. In handwritten Bangla text, the rectangular hulls of successive characters often overlap, so the characters are seldom vertically separable. We therefore propose recursive contour following in one of the zones across the height of the word to find the extent within which the main portion of each character lies. If successive characters do not touch in the zone where contour following is applied, the algorithm gives fairly good results. © 1999 IEEE.
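
The abstract does not spell out the zone detection or the contour-following procedure. As a stand-in, the sketch below locates the headline (matra) of a binarized Bangla word as the row with the most ink pixels, takes the zone below it, and reports the horizontal extent of each 8-connected ink component in that zone. Connected-component labelling is used here in place of the authors' recursive contour following; both recover the extent of a component, but this is not their algorithm, and the zone boundaries and function names are assumptions.

```python
from collections import deque

def headline_row(img):
    """Row index with the largest ink count; in Bangla this is usually the headline."""
    counts = [sum(row) for row in img]
    return counts.index(max(counts))

def component_extents(img, top, bottom):
    """Sorted (min_col, max_col) extents of 8-connected ink components in rows [top, bottom)."""
    h, w = bottom - top, len(img[0])
    seen = [[False] * w for _ in range(h)]
    extents = []
    for r in range(h):
        for c in range(w):
            if img[top + r][c] and not seen[r][c]:
                # Breadth-first flood fill over one component, tracking its columns.
                seen[r][c] = True
                queue = deque([(r, c)])
                lo = hi = c
                while queue:
                    y, x = queue.popleft()
                    lo, hi = min(lo, x), max(hi, x)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                                    and img[top + ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                extents.append((lo, hi))
    return sorted(extents)

def segment_word(img):
    """Character extents found in the zone below the headline."""
    return component_extents(img, headline_row(img) + 1, len(img))
```

As the abstract cautions for its own method, characters that touch inside the chosen zone merge here into a single extent.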

Script line separation from Indian multi-script documents

5th International Conference on Document Analysis and Recognition, ICDAR 1999

Authors: Pal, U.; Chaudhuri, B.B. (Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, 203 B. T. Road, Calcutta 35, India)

ISBN (print): 0769503187

In a multilingual country like India, a document page may contain more than one script. Under the three-language formula, a document may be printed in English, Devnagari and one of the other official Indian languages. For OCR of such a page, these three scripts must be separated before the text is fed to the OCR system for each individual script. This paper presents an automatic technique for separating the text lines using script characteristics and shape-based features. At present, the system has an overall accuracy of about 98.5%. © 1999 IEEE.
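
One structural cue relevant here is the headline (shirorekha) that joins the characters of Bangla and Devnagari words and that Roman script lacks. The sketch below, applied to a single binarized word-sized image, checks whether one row is inked across most of the word's inked span. This single feature, the 0.8 threshold and the function names are illustrative assumptions; the paper combines several script-characteristic and shape-based features, and telling Bangla from Devnagari needs further cues that this sketch does not attempt.

```python
def ink_span(img):
    """(first, last) inked column indices, or None for a blank image."""
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    return (cols[0], cols[-1]) if cols else None

def has_headline(img, threshold=0.8):
    """True if a single row is inked across at least `threshold` of the inked span."""
    span = ink_span(img)
    if span is None:
        return False
    first, last = span
    width = last - first + 1
    best = max(sum(row[first:last + 1]) for row in img)
    return best / width >= threshold

def script_group(img):
    """Coarse label for a word image: headline script (Bangla/Devnagari) or Roman."""
    return "headline script" if has_headline(img) else "Roman"
```

Measuring coverage against the inked span rather than against the inked columns matters: isolated vertical strokes would otherwise cover every inked column and look like a headline.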

Automatic separation of machine-printed and hand-written text lines

5th International Conference on Document Analysis and Recognition, ICDAR 1999

Authors: Pal, U.; Chaudhuri, B.B. (Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, 203 B. T. Road, Calcutta 700 035, India)

ISBN (print): 0769503187

There are many types of documents where machine-printed and hand-written texts appear intermixed. Since the optical character recognition (OCR) methodologies for machine-printed and hand-written texts are different, it is necessary to separate these two types of text before feeding them to the respective OCR systems. In this paper, we present such a scheme for both Bangla and Devnagari characters. The scheme is based on the structural and statistical features of the machine-printed and hand-written text lines. The classification scheme has an accuracy of about 98.3%. © 1999 IEEE.
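
The abstract names structural and statistical features without listing them. As one hypothetical statistical feature, the sketch below measures how much the top ink profile of a Bangla or Devnagari text line wanders: a machine-printed line tends to have a straight headline, so the topmost inked row varies little from column to column, while a handwritten headline undulates. The feature, the function names and the 1.0-pixel threshold are assumptions, not the classifier reported in the paper.

```python
from statistics import pstdev

def top_profile(line_img):
    """Row index of the topmost ink pixel in every column that contains ink."""
    profile = []
    for c in range(len(line_img[0])):
        for r, row in enumerate(line_img):
            if row[c]:
                profile.append(r)
                break
    return profile

def classify_line(line_img, max_std=1.0):
    """'printed' if the top profile is nearly flat, 'handwritten' otherwise."""
    profile = top_profile(line_img)
    if not profile:
        return "unknown"  # blank image: nothing to measure
    return "printed" if pstdev(profile) <= max_std else "handwritten"
```

A single threshold on one feature is only a starting point; in practice several such features would be combined, and the threshold would scale with the line height.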

Keywords: Optical character recognition

An approach for processing mathematical expressions in printed document

3rd IAPR Workshop on Document Analysis Systems, DAS 1998

Authors: Chaudhuri, B.B.; Garain, U. (Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, 203 B. T. Road, Calcutta 700 035, India)

ISBN (print): 3540665072

In this paper, we propose an approach for understanding mathematical expressions in printed documents. The system consists of three main components: (i) detection of mathematical expressions in a document, (ii) recognition of the symbols present in the expression, and (iii) meaningful arrangement of the recognized symbols. Mathematical expressions are detected through recognition of symbols, supplemented by some structural features of the expressions. A hybrid of feature-based and template-based recognition techniques is used to recognize the symbols. The bounding-box coordinates and size information of the symbols help determine the spatial relationships among them. A set of predefined grammar rules is used to form meaningful symbol groups and so arrange the symbols properly. Experiments conducted with these approaches on a large number of documents show high accuracy. © Springer-Verlag Berlin Heidelberg 1999.
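
The abstract says that bounding-box coordinates and symbol sizes determine the spatial relationships among symbols. A minimal sketch of that idea, under assumed rules, compares each symbol's box with the box of the symbol before it: a noticeably smaller symbol whose vertical centre lies in the top third of its predecessor's box is treated as a superscript, one in the bottom third as a subscript, and anything else as inline. The box convention (y grows downward), the 0.8 size ratio, the one-third bands and the `^`/`_` output are illustrative assumptions, not the grammar rules used in the paper; nesting such as exponents of exponents is ignored.

```python
from typing import NamedTuple

class Box(NamedTuple):
    x0: int
    y0: int  # top edge (image coordinates: y grows downward)
    x1: int
    y1: int  # bottom edge

def relation(prev: Box, cur: Box, size_ratio=0.8):
    """Classify `cur` relative to `prev` as 'superscript', 'subscript' or 'inline'."""
    prev_h = prev.y1 - prev.y0
    cur_h = cur.y1 - cur.y0
    centre = (cur.y0 + cur.y1) / 2
    if cur_h < size_ratio * prev_h:  # only smaller symbols can be scripts
        if centre < prev.y0 + prev_h / 3:
            return "superscript"
        if centre > prev.y1 - prev_h / 3:
            return "subscript"
    return "inline"

def to_linear(symbols):
    """Render (label, Box) pairs left to right, marking ^ and _ relations."""
    symbols = sorted(symbols, key=lambda s: s[1].x0)
    out = symbols[0][0] if symbols else ""
    for (_, prev), (label, cur) in zip(symbols, symbols[1:]):
        rel = relation(prev, cur)
        out += {"superscript": "^", "subscript": "_"}.get(rel, "") + label
    return out
```

For instance, an "x" in box (0, 10, 10, 30) followed by a small "2" in box (11, 4, 17, 14) and a full-size "y" in box (20, 10, 30, 30) is rendered as `x^2y`.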

Keywords: Computers

Extraction of type style based meta-information from imaged documents

International Conference on Document Analysis and Recognition

Authors: U. Garain; B.B. Chaudhuri (Computer Vision & Pattern Recognition Unit, Indian Statistical Institute, Calcutta, India)

Keywords: Optical character recognition software; Data mining; Sections; Computer vision; Pattern recognition; Postal services; Read only memory; Pressing; Search engines; Image converters

Segmentation of Bangla handwritten text into characters by recursive contour following

International Conference on Document Analysis and Recognition

Authors: A. Bishnu; B.B. Chaudhuri (Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Calcutta, India)

Keywords: Image segmentation; Optical character recognition software; Hidden Markov models; Pattern recognition; Image analysis; Lattices; Computer vision; Writing; Feature extraction; Linear programming
