We present a novel approach to the selective annotation of large corpora through the use of machine learning. Linguistic search engines used to locate potential instances of an infrequent phenomenon do not support ran...
ISBN (print): 9789897580741
Cross-lingual annotation projection methods can draw on resource-rich languages to improve the performance of natural language processing (NLP) tasks in less-resourced languages. In this research, Malay serves as the less-resourced language and English as the resource-rich language. The research aims to ease the deadlock in Malay computational linguistics caused by the shortage of Malay tools and annotated corpora by exploiting state-of-the-art English tools. This paper proposes a cross-lingual annotation projection based on word alignment of two languages with syntactic differences. A word alignment method known as MEWA (Malay-English Word Aligner), which integrates a Dice coefficient and a bigram string similarity measure, is proposed. MEWA is used to automatically induce annotations using a Malay test collection on terrorism and an identified English tool. In the POS annotation projection experiment, the algorithm achieved an accuracy of 79%.
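The two measures named in the abstract above can be illustrated with a minimal sketch. This is not the MEWA implementation: the abstract does not say how the Dice coefficient and the bigram string similarity are integrated, so the linear `weight` in `combined_score`, all function names, and the toy Malay-English sentence pairs are assumptions made for illustration only.

```python
from collections import Counter


def char_bigrams(word):
    """Return the multiset of adjacent character pairs in a lowercased word."""
    w = word.lower()
    return Counter(w[i:i + 2] for i in range(len(w) - 1))


def bigram_similarity(a, b):
    """String similarity: 2 * shared character bigrams / total bigrams."""
    ba, bb = char_bigrams(a), char_bigrams(b)
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:
        return 0.0
    shared = sum((ba & bb).values())  # multiset intersection
    return 2.0 * shared / total


def dice_cooccurrence(src_word, tgt_word, sentence_pairs):
    """Dice coefficient over sentence-level co-occurrence counts."""
    src_count = tgt_count = both = 0
    for src_tokens, tgt_tokens in sentence_pairs:
        in_src = src_word in src_tokens
        in_tgt = tgt_word in tgt_tokens
        src_count += in_src
        tgt_count += in_tgt
        both += in_src and in_tgt
    if src_count + tgt_count == 0:
        return 0.0
    return 2.0 * both / (src_count + tgt_count)


def combined_score(src_word, tgt_word, sentence_pairs, weight=0.5):
    """Hypothetical linear mix of the two measures (an assumption)."""
    dice = dice_cooccurrence(src_word, tgt_word, sentence_pairs)
    sim = bigram_similarity(src_word, tgt_word)
    return weight * dice + (1.0 - weight) * sim


# Toy parallel sentences (Malay tokens, English tokens), invented for the demo.
PAIRS = [
    (["polis", "tangkap", "suspek"], ["police", "arrest", "suspect"]),
    (["polis", "siasat"], ["police", "investigate"]),
    (["suspek", "lari"], ["suspect", "flee"]),
]

if __name__ == "__main__":
    for candidate in ("police", "suspect", "arrest"):
        print(candidate, round(combined_score("polis", candidate, PAIRS), 3))
```

In this toy setting the co-occurrence term rewards words that appear in the same sentence pairs, while the string term rewards cognates and loanwords such as "polis" and "police"; an aligner would pick the highest-scoring candidate for each source word.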
ISBN (print): 9783319209340; 9783319209333
As the number of international users on Facebook has increased, users with multilingual backgrounds have shown diversity in their language selections. Some research has studied how Facebook affects users' social capital. This study examines users' language selection behaviors and analyzes their selections by applying the concept of cultural capital. It aims to give designers information for developing cross-cultural applications or web pages on Facebook, for example, advanced translation tools or different language versions of web pages with cultural elements. Such cross-cultural design might attract more international users and improve the website's usability. By observing the status updates of 83 active Facebook users and interviewing 10 users with multilingual backgrounds, we find that audience, locality and context are three important factors that affect users' language selections. The results also show that users' language proficiency plays a role when they choose the language of their posts and comments. The characteristics of different languages also affect users' language selections when they update their status and interact with other users on Facebook. Furthermore, some users prefer to use their native languages or heritage languages other than English on Facebook because they want to show their cultural capital and keep their cultural heritage alive.
ISBN (print): 9789897581649
Dependency parsing is considered a key technology for improving information extraction tasks. Research indicates that dependency parsers spend more than 95% of their total runtime on feature computations. Based on this insight, this paper investigates the potential of improving parsing throughput by designing feature representations that are optimized for combining single features into more complex feature templates and by optimizing parser constraints. Applying these techniques to MDParser increased its throughput fourfold, yielding Syntactic Parser, a dependency parser that outperforms comparable approaches by a factor of 25 to 400.
Integrating spatial information is a crucial step in the construction of an Urban Planning Domain Ontology (UPDO), and taking spatial information from the web as the input of a self-learning method is commonly used in constru...
The essential purpose of Business Process Management (BPM) is to construct processes that yield a profit for the enterprise. In today’s business world, there is a strong need for adopting a BPM approach based on Service...
ISBN (digital): 9783319255125
ISBN (print): 9783319255125; 9783319255118
Network forensics is a method of obtaining and analyzing digital evidence from network sources. It includes data acquisition, selection, processing, analysis and presentation to investigators. Because of the high volume of transmitted data, the acquired information can be incomplete, corrupted, or disordered, which makes further reconstruction difficult. In this paper, we address the issue of advanced parsing and reconstruction of incomplete, corrupted, or disordered data packets. We introduce a technique that recovers TCP or UDP conversations so that they can be further analyzed by application parsers. The presented technique is implemented in a new network forensic tool called Netfox Detective. We also discuss current challenges in parsing web mail communication, SSL decryption and Bitcoin detection.
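To make the idea of recovering a conversation from disordered or incomplete captures concrete, here is a minimal sketch of sequence-number-based reassembly. It is not the technique implemented in Netfox Detective, which the abstract does not describe in detail. The function name `reassemble`, the `(seq, payload)` tuple layout, and the choice to report missing ranges as gaps are all assumptions, and sequence-number wraparound, SYN/FIN handling and corrupted payloads are deliberately ignored.

```python
def reassemble(segments, initial_seq):
    """Order TCP-like segments by sequence number and report missing ranges.

    segments: iterable of (seq, payload) tuples that may be out of order,
    duplicated, or overlapping.
    Returns (runs, gaps): runs is a list of (start_seq, bytes) contiguous
    stretches of data, gaps is a list of (start_seq, end_seq) missing ranges.
    """
    runs = []  # each entry is [start_seq, bytearray]
    gaps = []
    next_seq = initial_seq  # first byte not yet covered
    for seq, payload in sorted(segments, key=lambda s: s[0]):
        end = seq + len(payload)
        if end <= next_seq:
            continue  # retransmission or fully overlapped segment
        if seq > next_seq:
            gaps.append((next_seq, seq))  # bytes never captured
            next_seq = seq
        # Start a new run at the beginning or right after a gap.
        if not runs or runs[-1][0] + len(runs[-1][1]) != next_seq:
            runs.append([next_seq, bytearray()])
        # Skip the part of the payload that earlier segments already covered.
        runs[-1][1].extend(payload[next_seq - seq:])
        next_seq = end
    return [(start, bytes(data)) for start, data in runs], gaps


if __name__ == "__main__":
    captured = [
        (105, b"world"),   # arrives before the segment that precedes it
        (100, b"hello"),
        (100, b"hello"),   # duplicate
        (103, b"lowo"),    # partial overlap
        (115, b"!"),       # bytes 110..114 were lost
    ]
    runs, gaps = reassemble(captured, initial_seq=100)
    print(runs)
    print(gaps)
```

Keeping gaps explicit, rather than silently concatenating whatever arrived, lets a downstream application parser decide whether a partially recovered stream is still usable.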
ISBN (print): 9781467382878
Word Sense Disambiguation (WSD) is the process of identifying the proper sense of an ambiguous word in a particular context, that is, finding the correct sense s_i among the set of senses {s_1, s_2, ..., s_n}. The task is motivated by its use in various natural language processing (NLP) applications such as IR, MT, QA, TC, and SP. In this paper, a machine learning technique, the Naive Bayes classifier, is used for the automatic disambiguation task. Training data was prepared with sense-annotated features, drawing on a sense inventory. Currently, about 160 ambiguous words are present in the sense inventory, derived from 18K and 25K words from the Assamese Corpus and WordNet. The system is implemented in two phases. In the first phase, 2.7K sense-annotated training instances and 800 test instances were used, and an accuracy of 71% was obtained. Analysis of the results shows that accuracy improves as the training data grows and as the model learned in the previous iteration is reused. In the second phase, we manually validate the outcomes of the first phase and add the clean sense-tagged data to the previous training set. Then we train the system on the enlarged training data (3.5K), which improves accuracy. The system thus adopts iterative learning, and a further 7% in accuracy is achieved. This paper implements an Assamese WSD system with an NB classifier using lexical features, and enhancing the baseline method improves the classifier's accuracy to 78%.
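A Naive Bayes classifier over lexical context features, as described in the abstract above, can be sketched as follows. This is an illustrative sketch, not the authors' system: the class name `NaiveBayesWSD`, the add-alpha smoothing, and the tiny English "bank" examples (used here in place of Assamese data, which is not available in the abstract) are all assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesWSD:
    """Multinomial Naive Bayes over bag-of-words context features."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # add-alpha smoothing (assumed)
        self.sense_counts = Counter()               # training examples per sense
        self.feature_counts = defaultdict(Counter)  # word counts per sense
        self.vocab = set()

    def train(self, examples):
        """examples: iterable of (context_words, sense) pairs."""
        for context_words, sense in examples:
            self.sense_counts[sense] += 1
            for word in context_words:
                self.feature_counts[sense][word] += 1
                self.vocab.add(word)

    def predict(self, context_words):
        """Return the sense s_i maximizing log P(s_i) + sum of log P(w | s_i)."""
        total = sum(self.sense_counts.values())
        vocab_size = len(self.vocab)
        best_sense, best_score = None, -math.inf
        for sense, count in self.sense_counts.items():
            score = math.log(count / total)
            denom = sum(self.feature_counts[sense].values()) + self.alpha * vocab_size
            for word in context_words:
                score += math.log((self.feature_counts[sense][word] + self.alpha) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Hypothetical sense-annotated examples for the ambiguous word "bank".
TRAIN = [
    (["river", "water", "shore"], "bank/river"),
    (["money", "loan", "deposit"], "bank/finance"),
    (["water", "fishing"], "bank/river"),
    (["account", "money"], "bank/finance"),
]

if __name__ == "__main__":
    model = NaiveBayesWSD()
    model.train(TRAIN)
    print(model.predict(["money", "account"]))
    print(model.predict(["water"]))
```

The two-phase procedure in the abstract corresponds to calling `train` again with manually validated predictions from the first phase, since the counts accumulate across calls.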
Aspect-Oriented Programming (AOP) is a technology for the decomposition of software systems based on cross-cutting concerns. As shown in our previous work, cross-cutting concerns are also present in ontologies, and As...
We consider the task of finding frequent parallel episodes in parallel point processes (or event sequences), allowing for imprecise synchrony of the events constituting occurrences (temporal imprecision) as well as in...