ISBN (print): 9781424452507
In today's globalized world, storage and retrieval in local languages is essential for developing nations such as India. Because the country has many languages and only 10% of the population knows English, this linguistic diversity is a barrier to understanding and participating in the digital world. Services provided in local languages have been found to be strongly accepted and widely used. We propose a new methodology, 'Information Retrieval in Multilingual Environment', for processing and retrieving information in Indian languages. The back end is an English database, while the front end uses local languages such as Hindi, Marathi, or Gujarati. The system provides an interface for entering a keyword in a local language; the keyword is parsed, a query is formed, and the result is displayed in the local language. We developed an efficient algorithm that tolerates the different possible ways of writing a keyword, so that the exact data can still be extracted. We chose a shopping mall as the domain, since many of its users need information in their local languages.
ISBN (digital): 9783642157660
ISBN (print): 9783642157653
As various applications of wireless ad hoc networks have been proposed, security has become one of the salient research challenges and is receiving increasing attention. Recently, several security schemes for wireless ad hoc networks based on encryption have been proposed. Our proposed scheme is more secure against adaptive chosen-ciphertext attacks because of its randomness. In this paper we propose an algorithm for encrypting plain text in any of the languages supported by Unicode. We assume that the sender and receiver share a secret key and a mapping array before establishing communication.
ISBN (print): 9788026300779
chared is a system which can detect character encoding of a text document provided the language of the document is known. The system supports a wide range of languages and the most commonly used character encodings. We explain the details of the algorithm, describe the process of creating models for various languages and present results of an evaluation on a collection of Web pages.
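The abstract does not describe chared's algorithm or its language models, so the following is only a minimal sketch of the general idea of using a known language to choose among candidate encodings. It swaps in a simple heuristic: decode the bytes with each candidate and keep the encoding whose output contains the largest share of characters from the expected alphabet. The function name `detect_encoding`, the alphabet string, and the candidate list are all assumptions made for illustration.

```python
def detect_encoding(data, expected_chars, candidates):
    """Pick the candidate encoding whose decoding best matches the expected alphabet.

    Illustrative heuristic only; this is not chared's algorithm.
    """
    best, best_score = None, -1.0
    for enc in candidates:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue  # the bytes are not valid in this encoding
        visible = [c for c in text if not c.isspace()]
        if not visible:
            continue
        # Share of non-whitespace characters that belong to the language's alphabet.
        score = sum(c.lower() in expected_chars for c in visible) / len(visible)
        if score > best_score:
            best, best_score = enc, score
    return best


# Hypothetical example: Czech text stored as Windows-1250 bytes.
CZECH = set("aábcčdďeéěfghiíjklmnňoópqrřsštťuúůvwxyýzž")
raw = "Příliš žluťoučký kůň úpěl ďábelské ódy".encode("cp1250")
print(detect_encoding(raw, CZECH, ["utf-8", "cp1250", "iso-8859-2"]))
```

A real detector would need a much richer model than a bare alphabet, since closely related encodings differ in only a few byte positions.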
New Mexico State University's Computing Research Lab has participated in research in all three phases of the US Government's Tipster program. Our work on information retrieval has focused on research and development of multilingual and cross-language approaches to automatic retrieval. The work on automatic systems has been supplemented by additional research into the role of the IR system user in interactive retrieval scenarios: monolingual, multilingual and cross-language. The combined efforts suggest that "universal" text retrieval, in which a user can find, access and use documents in the face of language differences and information overload, may be possible.
Text search is a well-known problem in computer science in which the valid shifts of a pattern P in a text string T are found. This paper shows how to speed up text search by searching for P in a compressed version of T. A fast compression algorithm was designed for this purpose. The algorithm is based on the assumption that T is restricted to the letters of a single natural language. Relying on this assumption, each letter in T or P is encoded as a single byte instead of the two-byte Unicode representation, which shortens the string on which the text search algorithm works. The main disadvantage of this approach is that the alphabet of T is restricted to a single natural language; however, a wide range of text documents comply with this assumption. Another issue is the overhead required to compress P and T, but the proposed compression algorithm was found to be fast enough that its run time is paid for and text search time is still saved. Different approaches to storing the compressed T are also explored. The experimental study showed that this approach does reduce text search time.
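The single-byte encoding idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's compression algorithm: the helper names (`build_codec`, `compress`, `find_shifts`) are hypothetical, the alphabet is a small Greek sample, and Python's built-in `bytes.find` stands in for whichever text search algorithm is run on the compressed strings. Because each character becomes exactly one byte, a byte offset in the compressed text is also a valid shift in the original text.

```python
def build_codec(alphabet):
    """Assign each distinct character of a single-language alphabet one byte value."""
    unique = sorted(set(alphabet))
    if len(unique) > 256:
        raise ValueError("alphabet does not fit in one byte per character")
    return {ch: i for i, ch in enumerate(unique)}


def compress(text, codec):
    """Encode text as one byte per character (raises KeyError outside the alphabet)."""
    return bytes(codec[ch] for ch in text)


def find_shifts(pattern, text, codec):
    """Return every valid shift of pattern in text, searching the compressed forms."""
    p, t = compress(pattern, codec), compress(text, codec)
    shifts = []
    i = t.find(p)
    while i != -1:
        shifts.append(i)
        i = t.find(p, i + 1)  # allow overlapping matches
    return shifts


codec = build_codec("αβγδε ")
text = "αβγ αβδ αβ"
print(find_shifts("αβ", text, codec))
# Size of the searched string: two-byte code units versus one byte per letter.
print(len(text.encode("utf-16-le")), "->", len(compress(text, codec)))
```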
Cryptography and steganography are the foremost techniques used to ensure the security and confidentiality of secret information. In steganography, achieving high data hiding capacity together with security is a great challenge for researchers. The present scheme proposes a novel approach that combines steganography and cryptography to achieve high data hiding capacity with greater security. To accomplish this goal, three-level encryption using bit complement and bit right rotation is applied to the secret message before it is hidden. Unicode characters such as zero width non-joiner (ZWNJ), zero width joiner (ZWJ) and zero width character (ZWC) are used to conceal the secret information in English cover text. To embed secret data, the algorithm first applies the three-level encryption to the confidential data and then embeds the resulting binary into the cover text using these Unicode characters. The results show that the proposed algorithm offers higher data security, owing to its lightweight and efficient encryption mechanism, along with a cover text capacity of 2 bits per character. Moreover, there is no overhead of generating a secret key and exchanging it between source and destination. The proposed technique is easy to implement, hard to break, and attracts less attention from intruders.
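A rough sketch of the embedding idea is given below, under several assumptions that the abstract does not confirm. The mapping of 2-bit values to marks (no mark, ZWNJ, ZWJ, and U+200B standing in for the "zero width character"), the one-byte length prefix, the rotation amount, and all function names are hypothetical. The abstract does not specify how the three encryption levels are composed, so the keyless `scramble` step here applies only one bit complement and one right rotation.

```python
# Assumed mapping of 2-bit values to marks placed after a cover character.
ZW = ["", "\u200c", "\u200d", "\u200b"]  # 00: none, 01: ZWNJ, 10: ZWJ, 11: U+200B
ZW_TO_BITS = {mark: bits for bits, mark in enumerate(ZW) if mark}


def scramble(b, r=3):
    """Bit complement, then right rotation by r bits (r is an assumed value)."""
    b = ~b & 0xFF
    return ((b >> r) | (b << (8 - r))) & 0xFF


def unscramble(b, r=3):
    """Inverse of scramble: left rotation by r bits, then bit complement."""
    b = ((b << r) | (b >> (8 - r))) & 0xFF
    return ~b & 0xFF


def embed(cover, secret):
    """Hide secret bytes in cover text, 2 bits per cover character."""
    if len(secret) > 255:
        raise ValueError("secret too long for the one-byte length prefix")
    payload = bytes([len(secret)]) + bytes(scramble(b) for b in secret)
    symbols = [(byte >> s) & 0b11 for byte in payload for s in (6, 4, 2, 0)]
    if len(symbols) > len(cover):
        raise ValueError("cover text too short")
    out = []
    for i, ch in enumerate(cover):
        out.append(ch)
        if i < len(symbols):
            out.append(ZW[symbols[i]])
    return "".join(out)


def extract(stego):
    """Recover the secret; assumes the cover itself had no zero-width marks."""
    symbols, i = [], 0
    while i < len(stego):
        nxt = stego[i + 1] if i + 1 < len(stego) else ""
        if nxt in ZW_TO_BITS:
            symbols.append(ZW_TO_BITS[nxt])
            i += 2
        else:
            symbols.append(0)  # no mark after this cover character
            i += 1
    data = bytes(
        (symbols[j] << 6) | (symbols[j + 1] << 4) | (symbols[j + 2] << 2) | symbols[j + 3]
        for j in range(0, len(symbols) - 3, 4)
    )
    return bytes(unscramble(b) for b in data[1:1 + data[0]])


cover = "The quick brown fox jumps over the lazy dog."
stego = embed(cover, b"hi")
print(extract(stego))
```

At 2 bits per cover character, each hidden byte consumes four cover characters, so this 44-character cover can carry at most 10 secret bytes after the length prefix.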
ISBN (print): 9781509008490
Preserving old archives in a readable and editable form helps people gain additional knowledge. Tulu is one of five noteworthy Dravidian dialects, and numerous Tulu historical documents are available in handwritten form. Tulu script is rich in patterns, with many combinations of connected characters, so machine recognition is a major challenge. To date, no strategy has been reported for recognizing Tulu script, an ancient script of South India. The main aim of this paper is to introduce the salient features of Tulu script and to list the approaches used for handwritten character recognition. It then gives future research directions for the recognition and understanding of Tulu script.
Forensic examiners are frequently confronted with content in languages that they do not understand, and they could benefit from machine translation into their native language. But automated translation of file paths is a difficult problem because of the minimal context for translation and the frequent mixing of multiple languages within a path. This work developed a prototype implementation of a file-path translator that first identifies the language of each directory segment of a path, and then translates into English those segments that are neither already English nor artificial words. Brown's LA-Strings utility for language identification was tried, but its performance was found inadequate on short strings, and it was supplemented with clues from dictionary lookup, Unicode character distributions for languages, country of origin, and language-related keywords. To provide better data for language inference, the words used in each directory over a large corpus were aggregated for analysis. The resulting directory-language probabilities were combined with those for each path segment from dictionary lookup and character-type distributions to infer the segment's most likely language. Tests were done on a corpus of 50.1 million file paths, looking for 35 different languages. Tests showed 90.4% accuracy in identifying the languages of directories and 93.7% accuracy in identifying the languages of directory/file segments of file paths, even after excluding 44.4% of the paths as obviously English or untranslatable. Two of the seven proposed language clues were shown to impair directory-language identification. Experiments also compared three translation methods: the Systran translation tool, Google Translate, and word-for-word substitution using dictionaries. Google Translate usually performed best, but all three still made errors with European languages and a significant number of errors with Arabic and Chinese. Published by Elsevier Ltd.
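One of the clues named above, the Unicode character distribution of a path segment, can be sketched in a few lines. This is a simplified illustration and not the paper's implementation: the function names are hypothetical, and taking the first word of each character's Unicode name as its script is a shortcut. A script only narrows the set of candidate languages, which is consistent with the abstract combining this clue with dictionary lookup and directory-level statistics.

```python
import unicodedata
from collections import Counter


def script_of(ch):
    """Approximate a letter's script by the first word of its Unicode name."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else None  # e.g. LATIN, CYRILLIC, ARABIC, CJK


def segment_scripts(path):
    """Return (segment, dominant script) for each non-empty segment of a file path."""
    result = []
    for seg in path.replace("\\", "/").split("/"):
        if not seg:
            continue
        counts = Counter(s for s in map(script_of, seg) if s)
        dominant = counts.most_common(1)[0][0] if counts else None
        result.append((seg, dominant))
    return result


for seg, script in segment_scripts("Users/Документы/文件/report.txt"):
    print(seg, script)
```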
ISBN (print): 0938734849
Libraries have been able to provide support for one non-Roman character set plus English for some time. However, as libraries throughout the world become automated, it is increasingly important to offer support for multiple languages and multiple character sets. This paper discusses a number of the problems raised in developing a library system that supports multiple character sets.
ISBN (digital): 9783319258164
ISBN (print): 9783319258164; 9783319258157
YES is a simplified stroke-based method for sorting Chinese characters. It is free from stroke counting and grouping, and is thus much faster and more accurate than the traditional method. This paper presents a collation element table built in YES for a large joint Chinese character set covering (a) all 20,902 characters of Unicode CJK Unified Ideographs, (b) all 11,408 characters in the Complete List of Chinese Characters Used by the Media in 2013, and (c) all 13,000-plus characters in the latest versions of the Xinhua Dictionary (v11) and the Contemporary Chinese Dictionary (v6). Of the 20,902 Chinese characters in Unicode, 97.23% have a one-to-one relationship with their stroke order codes in YES, compared with 90.69% for the traditional method. With the addition of secondary and tertiary sorting levels based on stroke layout and Unicode value, a one-to-one relationship between characters and collation elements is guaranteed. The collation element table has been successfully applied to sorting CC-CEDICT, a Chinese-English dictionary of over 112,000 word entries.