We propose a linguistically driven approach to represent discourse relations in Chinese text as sequences. We observe that certain surface characteristics of Chinese texts, such as the order of clauses, are overt mark...
We increase the lexical coverage of FrameNet through automatic paraphrasing. We use crowdsourcing to manually filter out bad paraphrases in order to ensure a high-precision resource. Our expanded FrameNet contains an ...
We present a novel, count-based approach to obtaining inter-lingual word representations based on inverted indexing of Wikipedia. We present experiments applying these representations to 17 datasets in document classi...
ISBN: 9789897581649 (print)
The increasing accessibility and availability of online data provides a valuable knowledge source for information analysis and decision-making processes. In this paper we argue that extracting information from this data is better guided by domain knowledge of the targeted use case, and we investigate the integration of a knowledge-driven approach with Machine Learning techniques in order to improve the quality of the Relation Extraction process. Targeting the financial domain, we use Semantic Web technologies to build the domain knowledge base, which is in turn exploited to collect distant supervision training data from semantic linked datasets such as DBpedia and Freebase. We conducted a series of experiments that utilise a number of Machine Learning algorithms to report on the favourable implementations and configurations for successful Information Extraction in our targeted domain.
We present Collocation Assistant, a prototype of a collocational aid designed to promote the collocational competence of learners of Japanese as a second language (JSL). Focusing on noun-verb constructions, the tool a...
ISBN: 9781467383028 (print)
The lexicon plays a key role in Medical Language Processing (MLP) technology. Construction of a semantic lexicon has become a prerequisite for MLP research in China, where limited clinical terminology resources are available. In this study, an iterative machine learning algorithm based on Conditional Random Fields (CRF) is proposed to automatically build a symptom lexicon from a clinical corpus. A comprehensive evaluation of the algorithm was conducted under both exact and inexact matching. The algorithm achieved an F-measure of 87.23%, with precision and recall of 99.95% and 72.23%, respectively. Furthermore, a lexicon containing 22,501 symptoms was constructed using this approach.
ISBN: 9781467378635 (print)
Word Sense Disambiguation (WSD) has become a popular method for resolving ambiguous word meanings in the field of Information Retrieval (IR). In the Natural Language Processing (NLP) community, WSD is described as the task of selecting the appropriate meaning of a given word from among its ambiguous meanings. Of the three approaches to WSD (supervised, unsupervised, and knowledge-based), this paper focuses on the supervised and knowledge-based approaches, proposing a new Jaccard coefficient-based WSD algorithm to overcome the vocabulary mismatch problem. WordNet and external corpus knowledge resources are used as sense repositories and linked to the new WSD algorithm so that it can take additional semantic information into account. According to sample testing, an IR system with the new WSD algorithm achieves a total accuracy rate about 20 percent higher than that of a traditional IR system.
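The abstract does not give the details of the proposed algorithm, so the following is only a minimal sketch of the general idea of scoring candidate senses by Jaccard overlap with the context. The function names and the toy sense inventory for "bank" are hypothetical and are not taken from the paper or from WordNet.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two token collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


def disambiguate(context_tokens, sense_glosses):
    """Return the sense key whose gloss tokens best overlap the context."""
    return max(
        sense_glosses,
        key=lambda sense: jaccard(context_tokens, sense_glosses[sense]),
    )


# Hypothetical toy sense inventory (illustrative glosses only).
SENSES = {
    "bank_finance": "institution that accepts deposits and lends money".split(),
    "bank_river": "sloping land beside a river or lake".split(),
}

context = "he sat on the bank of the river near the lake".split()
best = disambiguate(context, SENSES)
print(best)
```

In a real system the glosses would come from a sense repository such as WordNet, and the context would usually be stop-word filtered and stemmed before the overlap is computed.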
ISBN: 9781941643730 (print)
Statistical models for reordering source words have been used to enhance hierarchical phrase-based statistical machine translation systems. Existing word reordering models learn the reordering either for any two source words in a sentence or only for two consecutive words. This paper proposes a series of separate sub-models that learn reorderings for word pairs at different distances. Our experiments demonstrate that reordering sub-models for word pairs whose distance is below a specific threshold help improve translation quality. Compared with previous work, our method may exploit helpful word reordering information more effectively and efficiently.
This paper reports how to build a Chinese Grammatical Error Diagnosis system based on conditional random fields (CRF). The system can find four types of grammatical errors in learners' essays. The four types of err...
ISBN: 9789897581076 (print)
In this paper we describe computing tools designed to aid the teaching of the basics of digital electronics and its applications, mainly in signal processing, in university engineering studies. We have developed two types of teaching tools: first, several small JavaScript-based simulation tools for visualizing the basic functions of digital circuits and their hardware design language models; and second, an FPGA-based FIR filter system that shows how to perform simple digital signal processing tasks with FPGAs.
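For readers unfamiliar with FIR filtering, the computation such a system performs can be sketched in software. This is a generic direct-form FIR filter, not the authors' FPGA design; the function name and the 4-tap moving-average coefficients are assumptions chosen for illustration.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum_k taps[k] * x[n - k], with x[n] = 0 for n < 0."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:  # samples before the start are treated as zero
                acc += h * samples[n - k]
        out.append(acc)
    return out


# Illustrative 4-tap moving average applied to a step input.
taps = [0.25, 0.25, 0.25, 0.25]
step = [0, 0, 1, 1, 1, 1, 1, 1]
smoothed = fir_filter(step, taps)
print(smoothed)
```

On an FPGA the inner loop would typically be unrolled into parallel multiply-accumulate units fed by a shift register of recent samples; the arithmetic is the same.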