Search results - Inner Mongolia University Library

International Conference on Electrical, Computer and Communication Engineering

Authors: Linkon Barua; Pranab Kumar Dhar; Lamia Alam; Isao Echizen. Department of CSE, Chittagong University of Engineering and Technology (CUET), Chittagong-4349, Bangladesh; Digital Content and Media Sciences Research Division, National Institute of Informatics (NII), Tokyo 101-8430, Japan

ISBN: (print) 9781509056286

Text compression algorithms perform compression at the character level. Bangla text has some distinctive features, such as no distinction between upper- and lower-case letters, consonant clusters (CC), and consonants with a dependent vowel sign (CV). The conventional Lempel-Ziv-Welch (LZW) algorithm is not well suited to compressing Bangla text. Therefore, in this paper, we propose a modified LZW (MLZW) algorithm that can compress Bangla text effectively and efficiently. In our proposed method, a dictionary with Unicode ranges from 1-90 is used for Bangla characters. The compression process starts by checking the input character. If the input character is part of a CC or CV, the CC or CV is treated as a single character and searched for in the dictionary. If the character to be encoded is already in the dictionary, it is encoded with its dictionary index. Otherwise, the character is added to the dictionary and encoded with its corresponding dictionary index. Simulation results indicate that the proposed MLZW algorithm compresses Bangla text effectively and efficiently. We observed that the proposed MLZW provides a higher compression rate than the LZW algorithm, by approximately 3% for the dictionary index and 33% for the output sequence.

Keywords: Bangla Text; ASCII code; LZW; Data Compression; Unicode; Consonant Cluster
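
The idea of treating a consonant cluster or a consonant plus vowel sign as one symbol can be illustrated with a small sketch. This is not the authors' MLZW: it is ordinary LZW run over cluster symbols instead of single code points. The function names, the grouping rule (a base character absorbs following combining marks, and a hasanta U+09CD joins the next consonant), and the choice to ship the alphabet alongside the codes are all assumptions made for illustration.

```python
import unicodedata

VIRAMA = "\u09CD"  # Bengali hasanta, which links consonants into a cluster


def clusters(text):
    """Split text into symbols: base character + combining marks + joined consonants."""
    out = []
    join_next = False
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if out and (is_mark or join_next):
            out[-1] += ch          # extend the current cluster
        else:
            out.append(ch)         # start a new symbol
        join_next = (ch == VIRAMA)
    return out


def lzw_encode(symbols):
    """Plain LZW over a sequence of symbols; returns (alphabet, codes)."""
    alphabet = sorted(set(symbols))
    table = {(s,): i for i, s in enumerate(alphabet)}
    codes, current = [], ()
    for s in symbols:
        candidate = current + (s,)
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            table[candidate] = len(table)
            current = (s,)
    if current:
        codes.append(table[current])
    return alphabet, codes


def lzw_decode(alphabet, codes):
    """Inverse of lzw_encode; rebuilds the symbol sequence."""
    if not codes:
        return []
    table = {i: (s,) for i, s in enumerate(alphabet)}
    prev = table[codes[0]]
    out = list(prev)
    for code in codes[1:]:
        # A code not yet in the table can only be "prev + first symbol of prev".
        entry = table[code] if code in table else prev + (prev[0],)
        out.extend(entry)
        table[len(table)] = prev + (entry[0],)
        prev = entry
    return out


if __name__ == "__main__":
    text = "\u0995\u09BF" * 3 + "\u0995\u09CD\u09B7" * 2  # three CV units, two CC units
    symbols = clusters(text)
    alphabet, codes = lzw_encode(symbols)
    print(len(text), "code points ->", len(symbols), "symbols ->", len(codes), "codes")
```

Grouping first shortens the input sequence before any dictionary coding happens, which is the intuition behind working on CC and CV units; whether that yields the gains reported in the abstract depends on details the abstract does not give.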

Mobile Device Keyboard Customization for a Newly Constructed Orthography of a Rural West African Language

8th International Conference on Information and Communication Technologies and Development (ICTD)

Author: Showalter, Esther H. Johns Hopkins Univ, Baltimore, MD 21218, USA

ISBN: (print) 9781450343060

In this note the author describes an initiative to create a keyboard for Android mobile devices that can type characters for a West African language called Kaansa, spoken by perhaps 10,000 Kaan people in Burkina Faso. The Kaan community has only recently established a written orthography and begun formal literacy training for adults and youths. This note examines certain currently available mobile technologies to allow texting in Kaansa and considers future efforts to measure the impact of such technologies on the literacy rate among several demographics.

Keywords: Literacy; keyboard; language; rural; West Africa; Burkina Faso; Kaan; gan; gna; Unicode; International Phonetic Alphabet; mobile; Android; Tavultesoft Keyman; diacritic; orthography

The Arabic Unicode characters: State of the art

Workshop on Signal and Document Processing, SIDOP 2012

Authors: Daagi, Manel; Haboubi, Sofiene. Signal and Document Processing Research Group, National Engineering School of Tunis, Belvedere 1002, Tunis, Tunisia

A Semi-automatic Approach to Identifying and Unifying Ambiguously Encoded Arabic-Based Characters

International Conference on Asian Language Processing (IALP)

Author: Jaf, Sardar. Univ Durham, Sch Engn & Comp Sci, Durham, England

ISBN: (print) 9781509009220

In this study, we outline a potential problem in normalising texts that are based on a modified version of the Arabic alphabet. One of the main resources available for processing resource-scarce languages is raw text collected from the Internet. Many less-resourced languages, such as Kurdish, Farsi, Urdu, Pashtu, etc., use a modified version of the Arabic writing system. Many characters in data harvested from the Internet may have exactly the same form but be encoded with different Unicode values (ambiguous characters). The existence of ambiguous characters in words leads to word duplication, so it is important to identify and unify ambiguous characters during the normalisation stage. Here, we demonstrate cases related to ambiguous Kurdish and Farsi characters and propose a semi-automatic approach to identifying and unifying them.

Keywords: Kurdish; Sorani; Unicode; lexicography
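
The word-duplication effect described in the abstract above can be shown with a short sketch. The two pairs in the table (ARABIC LETTER YEH U+064A versus ARABIC LETTER FARSI YEH U+06CC, and ARABIC LETTER KAF U+0643 versus ARABIC LETTER KEHEH U+06A9) are well-known look-alikes in some positions and fonts, but the table itself, the direction of the mapping, and the function names are assumptions; the paper's own character list is not given in the abstract.

```python
from collections import Counter

# Assumed unification table: code point found in harvested text -> preferred code point.
UNIFY = {
    0x064A: 0x06CC,  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    0x0643: 0x06A9,  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
}


def ambiguous_report(text):
    """Count the code points that the table would rewrite, for manual review."""
    return Counter(f"U+{ord(ch):04X}" for ch in text if ord(ch) in UNIFY)


def unify(text):
    """Apply the unification table to every character in text."""
    return text.translate(UNIFY)


if __name__ == "__main__":
    # The same four-letter string, typed once with KAF and once with KEHEH.
    words = ["\u0643\u062A\u06CE\u0628", "\u06A9\u062A\u06CE\u0628"]
    print(len(set(words)), "distinct forms before unification")
    print(len({unify(w) for w in words}), "distinct form after unification")
    print(ambiguous_report(" ".join(words)))
```

Keeping the report step separate from the rewrite step mirrors the semi-automatic flavour of the approach: a person can inspect which code points occur before committing to a mapping.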

A Framework for Recognition of Handwritten South Dravidian Tulu Script

Conference on Advances in Signal Processing (CASP)

Authors: Antony, P. J.; Savitha, C. K. KVGCE, Dept Comp Sci & Engn, Sullia, Dk, India

ISBN: (print) 9781509008490

Preserving old archives in a readable and editable form helps people to gain additional experience. Tulu is one of five noteworthy Dravidian languages, and numerous historical Tulu documents are available in handwritten form. Tulu script is rich in patterns, with many combinations of connected characters; hence, machine recognition is a major challenge. Until now, no strategy has been reported for recognizing the Tulu script, which is an ancient script of South India. The main aim of this paper is to introduce the salient features of the Tulu script and to list the approaches used for handwritten character recognition. It then gives future research directions for the recognition and understanding of the Tulu script.

Keywords: Tulu Script; Handwritten Character Recognition; Palm Leaf Manuscript; Unicode

Unicode Han character lookup service based on similar radicals

International Journal of Smart Home, 2012, Vol. 6, No. 3, pp. 99-106

Authors: Lin, Jeng-Wei; Lin, Feng-Sheng. Department of Information Management, Tunghai University, Taiwan; Institute of Information Science, Academia Sinica, Taiwan

Unicode 6.1 (2012) encoded more than 74,000 Han characters. This large repertoire could solve the problem of unencoded Han characters to a significant extent. However, most information systems today still only support input and display of the first 20,902 Han characters encoded in Unicode 1.0 (1991). Even in the latest systems, designed to support 32-bit Unicode and with suitable fonts installed, it is not easy to use these newly encoded Han characters. We note that many of these newly encoded Han characters are rarely used in users' everyday life. An ordinary user may be confused about their glyph shapes, pronunciations, meanings, and usages. IMEs (input method editors) for Han characters usually require users to have good knowledge of the wanted Han characters. It is not unusual for users to try but fail to input unfamiliar Han characters. In this paper, we present an auxiliary Unicode Han character lookup service by radicals. One can use any Han character IME to key in one or more radicals to look up a wanted Han character. Every Unicode Han character is decomposed into a glyph expression of radicals. The similarity between the glyph expression and the user input is estimated by a derived edit distance algorithm. The most similar Unicode Han characters are returned. As a result, the system gives users a convenient way to look up unfamiliar Unicode Han characters.

Keywords: Edit distance; Glyph expression; Han character lookup; Radicals; Unicode
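
Ranking characters by how closely their radical decomposition matches what the user typed can be sketched as below. The abstract mentions a "derived" edit distance whose details are not given, so this sketch swaps in the standard Levenshtein distance over radical sequences. The tiny `DECOMPOSITION` table and the `lookup` function are hypothetical stand-ins for the service's glyph-expression database.

```python
def edit_distance(a, b):
    """Standard Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


# Hypothetical decomposition table: character -> flat list of components.
DECOMPOSITION = {
    "林": ["木", "木"],
    "森": ["木", "木", "木"],
    "好": ["女", "子"],
    "明": ["日", "月"],
}


def lookup(radicals, table=DECOMPOSITION, limit=3):
    """Return up to `limit` characters, closest decomposition first."""
    ranked = sorted(table, key=lambda ch: (edit_distance(radicals, table[ch]), ch))
    return ranked[:limit]


if __name__ == "__main__":
    print(lookup(["木", "木"]))
```

A flat list ignores the spatial layout of components (left-right versus top-bottom), which a real glyph expression would presumably encode; that simplification is part of the sketch, not of the published system.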

A Brief Account of Unicode International Phonetic Alphabet Input Methods

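Minzu Yuwen (民族语文), 2012, No. 5, pp. 62-64

Authors: Li Long (李龙); Wang Yihua (王奕桦). Guangxi Economic Management Cadre College, 530007; Institute of Linguistics, Shanghai Normal University, 200234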

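Since the principle of one letter per sound was fixed in 1888, the International Phonetic Alphabet has gone through several rounds of additions and deletions. It now uses more than 150 letters and diacritics, far more than the number of keys on a keyboard. To enter IPA letters on a QWERTY keyboard, one therefore has to design a "multiple keys to a single character" input scheme and use certain software-level mechanisms to write a corresponding input method. For convenience, the full set of IPA letters and diacritics included in Unicode is referred to below as the IPA character set.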

Keywords: International Phonetic Alphabet; Unicode; input method; diacritics; input scheme; letters; keyboard; software
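
A "multiple keys to a single character" scheme of the kind the abstract describes can be sketched as a lookup table with longest-match conversion. The key sequences below (a semicolon prefix followed by a letter) are invented for illustration and are not the scheme proposed in the article; only the Unicode code points of the IPA letters are standard.

```python
# Hypothetical key sequences mapped to IPA letters encoded in Unicode.
KEYMAP = {
    ";s": "\u0283",   # LATIN SMALL LETTER ESH
    ";n": "\u014B",   # LATIN SMALL LETTER ENG
    ";e": "\u0259",   # LATIN SMALL LETTER SCHWA
    ";e3": "\u025B",  # LATIN SMALL LETTER OPEN E
}


def convert(keys, keymap=KEYMAP):
    """Replace key sequences with IPA letters, preferring the longest match."""
    longest = max(map(len, keymap))
    out, i = [], 0
    while i < len(keys):
        for size in range(min(longest, len(keys) - i), 0, -1):
            chunk = keys[i:i + size]
            if chunk in keymap:
                out.append(keymap[chunk])
                i += size
                break
        else:
            out.append(keys[i])  # no sequence starts here: pass the key through
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(convert("fi;s"))
```

Longest-match conversion lets a short sequence such as `;e` coexist with a longer one such as `;e3`, at the cost of needing to see the next keystroke before committing; a live input method would typically handle that with a small pending buffer.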

Speech Therapy System to Kannada Language

2nd International Conference on Cognitive Computing and Information Processing (CCIP)

Authors: Udayashankara, V.; Havalgi, Swapna. SJCE, Dept IT, Mysore, Karnataka, India

ISBN: (print) 9781509010257

This paper presents an alternative communication technique to help people suffering from speech and language difficulties for various reasons. Electronic speech synthesis is the process of generating human-like speech from any text input to emulate a human speaker. The objective of the text-to-speech system is to convert arbitrary Kannada text into its corresponding spoken waveform, using the phoneme as the basic unit for speech synthesis. A standard syllable-level speech database consisting of 525 syllables is built for synthesizing natural-sounding speech. The main advantage of this system is its real-time approach to converting entered text to the corresponding speech. The initial and final points of a speech waveform are determined using maximum energy and zero crossing rate. A unit-selection-based concatenation method is adopted for syllable concatenation, and the system is implemented using MATLAB.

Keywords: Text processing; Kannada; Maximum Energy; Zero Crossing rate; Unit selection; Concatenation; Praat; speech synthesis; Unicode
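
Finding the start and end of a speech segment from short-time energy and zero crossing rate can be sketched as follows. The paper's system is implemented in MATLAB and its exact decision rule is not stated in the abstract, so the frame size, the thresholds relative to the maximum frame energy, and the way the two features are combined here are all assumptions.

```python
import math


def frames(signal, size):
    """Split the signal into non-overlapping frames of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def energy(frame):
    """Short-time energy: sum of squared samples."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def find_endpoints(signal, size=50, energy_ratio=0.1, weak_ratio=0.01, zcr_min=0.3):
    """Return (start, end) sample indices of the active region, or None."""
    fs = frames(signal, size)
    energies = [energy(f) for f in fs]
    peak = max(energies, default=0.0)
    if peak == 0:
        return None  # silent or too short to analyse
    active = [
        i for i, f in enumerate(fs)
        # Loud frames count directly; quieter frames count only if they
        # also show many zero crossings (as unvoiced sounds tend to).
        if energies[i] >= energy_ratio * peak
        or (energies[i] >= weak_ratio * peak and zero_crossing_rate(f) >= zcr_min)
    ]
    return active[0] * size, (active[-1] + 1) * size


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * k / 20) for k in range(200)]
    signal = [0.0] * 100 + tone + [0.0] * 100
    print(find_endpoints(signal))
```

Thresholding against the maximum frame energy keeps the rule independent of recording level, which is one plausible reading of "maximum energy" in the abstract.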

Haar Features based Handwritten Character Recognition System for Tulu Script

IEEE International Conference on Recent Trends in Electronics, Information and Communication Technology (RTEICT)

Authors: Antony, P. J.; Savitha, C. K.; Ujwal, U. J. KVGCE, Dept Comp Sci & Engn, Sullia, Dk, India

ISBN: (print) 9781509007745

Automatic recognition of handwritten characters from scanned images helps to convert characters in an image into a convenient editable and readable form. Tulu is a south Indian Dravidian language with a rich set of handwritten patterns. This paper presents an approach to recognizing the Tulu script using an automatic character recognition mechanism. The recognition of handwritten Tulu characters is based on the AdaBoost algorithm using Haar features. Finally, recognized characters are mapped into an equivalent editable document of Kannada characters, which makes the text readable for the next generation through digital technology.

Keywords: Boosting; Character Recognition; Tulu; Unicode

Optical Character Recognition (OCR) System for Roman Script & English Language using Artificial Neural Network (ANN) Classifier

International Conference on Research Advances in Integrated Navigation Systems (RAINS)

Authors: Mehta, Honey; Singla, Sanjay; Mahajan, Aarti. IET Bhaddal, Ropar, Punjab, India

ISBN: (print) 9781509011117

Character recognition from scanned images is a very complex task, but for record keeping we require all the data in digital format in order to perform various manipulation operations. The main issue in character recognition is the variety of styles and fonts in which the text is written. We propose a new approach that uses an Artificial Neural Network together with a Nearest Neighbour approach for character recognition from scanned images. Three layers are used for classification. The input layer takes the input given by the segmented characters, the hidden layer consists of the neurons trained by the training network, and the output layer consists of output neurons that generate Unicode.

Keywords: Unicode; Artificial Neural Network; Scanned images
