For finding nonroman script library materials, catalogs with romanized access points alone are inadequate because they are unfamiliar to those who seek these materials. Relevant writers are surveyed. Information technology and MARC have eliminated the need to rely on card filers who knew only the order of letters in the roman alphabet. Two improvements are suggested: expand the MARC character repertoire and add rules to AACR to allow nonroman access points. Other issues are briefly described. (C) 2006 by The Haworth Press, Inc. All rights reserved.
ISBN: 9781450369480 (print)
The internationalized domain name (IDN) is a mechanism that enables us to use Unicode characters in domain names. The set of Unicode characters contains several pairs of characters that are visually identical to each other, e.g., the Latin character 'a' (U+0061) and the Cyrillic character 'а' (U+0430). Visually identical characters such as these are generally known as homoglyphs. IDN homograph attacks, which are widely known, abuse Unicode homoglyphs to create lookalike URLs. Although the threat posed by IDN homograph attacks is not new, the recent rise of IDN adoption in both domain name registries and web browsers has made these attacks increasingly widespread, leading to large-scale phishing attacks such as those targeting cryptocurrency exchange companies. In this work, we developed a framework named "ShamFinder," which is an automated scheme to detect IDN homographs. Our key contribution is the automatic construction of a homoglyph database, which can be used for direct countermeasures against the attack and to inform users about the context of an IDN homograph. Using the ShamFinder framework, we perform a large-scale measurement study that aims to understand the IDN homographs that exist in the wild. On the basis of our approach, we provide insights into an effective countermeasure against the threats caused by the IDN homograph attack.
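The homoglyph idea in this abstract can be illustrated with a minimal sketch. The table `HOMOGLYPHS` and the functions `skeleton` and `is_homograph` below are hypothetical names introduced for illustration only; the five hand-picked Cyrillic-to-Latin pairs are an assumption and stand in for, but are not, the automatically constructed ShamFinder database.

```python
# Hypothetical, hand-picked homoglyph pairs (Cyrillic -> Latin).
# This tiny table is an illustrative assumption, not ShamFinder's database.
HOMOGLYPHS = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}


def skeleton(label: str) -> str:
    """Replace every known homoglyph with its Latin lookalike."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in label)


def is_homograph(candidate: str, reference: str) -> bool:
    """True if the labels differ as code points but share a skeleton."""
    return candidate != reference and skeleton(candidate) == skeleton(reference)
```

For example, `is_homograph("\u0430pple", "apple")` compares a label whose first letter is Cyrillic U+0430 against the all-Latin reference. A production detector would need a far larger mapping and would also have to handle case folding and normalization.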
ISBN: 9781509056279 (print)
Text compression algorithms perform compression at the character level. Bangla text has some unique features, such as no distinction between upper- and lower-case letters, consonant clusters (CC), and consonants with dependent vowel signs (CV). The conventional Lempel-Ziv-Welch (LZW) algorithm is not suitable for compressing Bangla text. Therefore, in this paper, we propose a modified LZW (MLZW) algorithm which can compress Bangla text effectively and efficiently. In our proposed method, a dictionary with Unicode ranges from 1-90 is used for Bangla characters. The compression process starts by checking the input character. If the input character is part of a CC or CV, then the CC or CV is treated as a single character and looked up in the dictionary. If the character to be encoded is already in the dictionary, it is encoded with its dictionary index. Otherwise, the character is added to the dictionary and encoded with its corresponding dictionary index. Simulation results indicate that the proposed MLZW algorithm compresses Bangla text effectively and efficiently. We observed that the proposed MLZW provides a compression rate approximately 3% higher for the dictionary index and 33% higher for the output sequence compared with the LZW algorithm.
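The central idea, treating a consonant cluster or a consonant plus dependent vowel sign as one dictionary symbol, can be sketched as follows. This is not the authors' MLZW: it is textbook LZW run over cluster tokens, and the names `clusters`, `lzw_encode`, and `lzw_decode`, the use of Unicode categories to find dependent signs, and the initial dictionary built from the input's own tokens are all assumptions made for illustration.

```python
import unicodedata

VIRAMA = "\u09cd"  # Bengali hasanta, joins consonants into a cluster


def clusters(text):
    """Split text into tokens so that CC and CV units stay together."""
    out, i, n = [], 0, len(text)
    while i < n:
        j = i + 1
        while j < n:
            ch = text[j]
            if unicodedata.category(ch) in ("Mn", "Mc"):
                j += 1  # dependent vowel sign or other combining mark
            elif text[j - 1] == VIRAMA and unicodedata.category(ch) == "Lo":
                j += 1  # consonant joined by a preceding hasanta
            else:
                break
        out.append(text[i:j])
        i = j
    return out


def lzw_encode(tokens):
    """Standard LZW over tokens; returns (initial alphabet, code list)."""
    alphabet = sorted(set(tokens))
    table = {(t,): k for k, t in enumerate(alphabet)}
    codes, current = [], ()
    for tok in tokens:
        candidate = current + (tok,)
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            table[candidate] = len(table)
            current = (tok,)
    if current:
        codes.append(table[current])
    return alphabet, codes


def lzw_decode(alphabet, codes):
    """Inverse of lzw_encode, rebuilding the table on the fly."""
    if not codes:
        return []
    table = {k: (t,) for k, t in enumerate(alphabet)}
    prev = table[codes[0]]
    out = list(prev)
    for code in codes[1:]:
        # A code not yet in the table can only be prev + its own first token.
        entry = table[code] if code in table else prev + (prev[0],)
        out.extend(entry)
        table[len(table)] = prev + (entry[0],)
        prev = entry
    return out
```

Calling `lzw_encode(clusters(text))` and then `"".join(lzw_decode(alphabet, codes))` should reproduce the input. In this sketch the decoder needs the initial alphabet alongside the codes, which a real format would have to store or fix in advance.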
Recent research on information hiding concentrates mostly on linguistic steganography. In this paper, a steganography method is proposed for an Indian local language, Malayalam. The proposed method consists of a custom Unicode-based technique with embedding based on indexing, i.e., the original message is encoded into Malayalam text with custom Unicode values generated for the Malayalam text. A comparison of the proposed method against an existing method revealed that the proposed steganography method is more precise in both the encoding process and the decoding process. The method achieved a precision rate of 0.95 and a decoding rate of 0.81. (C) 2015 The Authors. Published by Elsevier B.V.
ISBN: 9781728138763 (print)
Attackers seeking to deceive web users into visiting malicious websites can exploit limitations of the tools intended to help browsers translate domain names containing non-ASCII characters, or internationalized domain names (IDNs). These attacks, called homograph phishing, involve registering Unicode domain names that are visually similar to legitimate ones but direct users to distinct servers. Tools exist to identify when domains use non-ASCII characters, which get translated by the Punycode protocol to work with the Domain Name System (DNS); however, these tools cannot automatically distinguish between benign use cases and ones with malicious intent, leading to high rates of false-positive alerts and increasing the workload of analysts looking for evidence of homograph phishing. To address this problem, we present PunyVis, a visual analytics system for exploring and identifying potential homograph attacks on large network datasets. By targeting instances of Punycode that use easily confusable ASCII characters to spoof popular websites, PunyVis quickly condenses large datasets into a small number of potentially malicious records. Using the interactive tool, analysts can evaluate potential phishing instances and view supporting information from multiple data sources, as well as gain insight about overall risk and threat regarding homograph attacks. We demonstrate how PunyVis supports analysts in a case study with domain experts, and identify divergent analysis strategies and the need for interactions that support how analysts begin exploration and pivot around hypotheses. Finally, we discuss design implications and opportunities for cyber visual analytics.
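The Punycode translation mentioned here can be made concrete with Python's built-in `punycode` codec. The sketch below is not part of PunyVis; `decode_label`, `decode_domain`, and `non_ascii_chars` are hypothetical helper names, and the handling is deliberately minimal (no IDNA validation or normalization).

```python
import unicodedata

ACE_PREFIX = "xn--"  # marks a Punycode-encoded DNS label


def decode_label(label: str) -> str:
    """Decode one DNS label if it carries the ACE prefix, else return it."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def decode_domain(domain: str) -> str:
    """Decode every label of a dotted domain name."""
    return ".".join(decode_label(part) for part in domain.split("."))


def non_ascii_chars(domain: str):
    """List (character, Unicode name) for each non-ASCII character."""
    return [
        (ch, unicodedata.name(ch, "UNKNOWN"))
        for ch in decode_domain(domain)
        if ord(ch) > 127
    ]
```

An analyst-facing tool could show the output of `non_ascii_chars` next to the raw `xn--` form, so that a label mixing scripts stands out. Deciding whether such a label is malicious still requires the context the abstract describes.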
Forensic examiners are frequently confronted with content in languages that they do not understand, and they could benefit from machine translation into their native language. But automated translation of file paths is a difficult problem because of the minimal context for translation and the frequent mixing of multiple languages within a path. This work developed a prototype implementation of a file-path translator that first identifies the language for each directory segment of a path, and then translates to English those that are neither already English nor artificial words. Brown's LA-Strings utility for language identification was tried, but its performance was found inadequate on short strings and it was supplemented with clues from dictionary lookup, Unicode character distributions for languages, country of origin, and language-related keywords. To provide better data for language inference, words used in each directory over a large corpus were aggregated for analysis. The resulting directory-language probabilities were combined with those for each path segment from dictionary lookup and character-type distributions to infer the segment's most likely language. Tests were done on a corpus of 50.1 million file paths looking for 35 different languages. Tests showed 90.4% accuracy on identifying languages of directories and 93.7% accuracy on identifying languages of directory/file segments of file paths, even after excluding 44.4% of the paths as obviously English or untranslatable. Two of seven proposed language clues were shown to impair directory-language identification. Experiments also compared three translation methods: the Systran translation tool, Google Translate, and word-for-word substitution using dictionaries. Google Translate usually performed the best, but all still made errors with European languages and a significant number of errors with Arabic and Chinese. Published by Elsevier Ltd.
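One of the clues named above, Unicode character distributions, can be sketched per path segment. This is only a rough illustration, not the paper's implementation: the helper names are hypothetical, and taking the first word of a character's Unicode name as its "script" is a crude assumption that separates scripts (Latin, Cyrillic, CJK) but cannot by itself tell apart languages sharing a script.

```python
import unicodedata
from collections import Counter


def script_of(ch):
    """Crude script proxy (an assumption): first word of the Unicode name."""
    if not ch.isalpha():
        return None  # digits, punctuation, and spaces carry no script clue
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else None


def script_histogram(segment):
    """Count alphabetic characters of a path segment by script proxy."""
    return Counter(s for s in map(script_of, segment) if s)


def dominant_script(segment):
    """Most frequent script proxy in the segment, or None if none found."""
    hist = script_histogram(segment)
    return hist.most_common(1)[0][0] if hist else None


def path_scripts(path):
    """Pair each non-empty '/'-separated segment with its dominant script."""
    return [(seg, dominant_script(seg)) for seg in path.split("/") if seg]
```

Segments that come back as `LATIN` would still need the dictionary lookup and keyword clues the abstract mentions, since the histogram does not distinguish English from other Latin-script languages.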
One of the basic features facilitating communication on the Internet in a variety of languages is the Unicode code layout. It standardizes the representation of most of the world's writing systems on digital media, thus enabling the processing and transmission of information through such technologies. Unicode is a contemporary character code, and this paper traces its evolution out of previous code layouts, starting with Morse code in telegraphy. Focusing on the adaptations of character codes to Modern Hebrew, I show how representing languages in technology is intertwined with internal and transnational regional concerns, and argue that from its beginning character code has been a locus of struggle over power and sovereignty: first between colonial regimes and resistance movements, and then between global corporations and local agents.
ISBN: 3540228012 (print)
What problems do e-documents with mathematical expressions in an Arabic presentation present? In addition to the known difficulties of handling mathematical expressions based on Latin script on the Web, Arabic mathematical expressions flow from right to left and use specific symbols with a dynamic cursivity. How might we extend the capabilities of tools such as MathML in order to structure Arabic mathematical e-documents? Those are the questions this paper will deal with. It gives a brief description of some steps toward an extension of MathML to mathematics in Arabic exposition. In order to evaluate it, this extension has been implemented in Mozilla.
This paper contemplates the concept of typographical advocacy, defined here as a variety of activities, strategies, and policies designed to increase or enhance language support in computing systems, facilitating typing and displaying texts in local languages. Focusing on the cases of Spanish and Paraguayan Guarani, the paper traces some existing efforts in the domain of typographical advocacy and strives to explain their dynamics. An attempt is made to examine both situations in which typographical advocacy is visible and explicit (as in the case of Spanish) and cases in which advocacy seems negligible (as in the case of Guarani). A wide range of enabling and hindering factors are examined, such as the relations between tech companies, nation-state institutions, consumers, and the habitus created by the use of technology. Through these examples, the rationale for advocacy is explored as well as alternative courses of action in cases where advocacy is not desired or fails to achieve its goals. A case will be made for raising the awareness of typographical issues in language planning and policy (LPP) both as part of broader multilingual awareness and as a tool for solving practical language problems in times of increased dependency on computing and mobile devices.
In the present paper, we develop a teaching methodology for economic theory. The main contribution of this paper lies in combining the interactive characteristics of spreadsheet programs such as Excel with the Unicode plain-text linear format for mathematical expressions. The advantage of the Unicode standard lies in the ease with which mathematical expressions can be written and read. In this sense, our proposal offers an easily readable and writable way to handle mathematical expressions when interactive spreadsheets are designed and used in Economics teaching. The resulting nearly plain text can be used with few or no modifications in other numerical computing programs.