ISBN:
(print) 0780379020
Previous works on the automatic identification or extraction of collocations from large-scale corpora generally make use of certain statistical measures to test for significance of association, yielding n-best collocation candidates for human scrutiny, optionally with linguistic preprocessing and linguistic filtering. One drawback of these approaches is that we can only take advantage of one single statistical test (optionally in association with a simple frequency threshold), even though we often calculate the values of several statistical tests. Exploring a scheme to combine two or more different tests is out of the question. This paper reports experiments with machine learning for collocation identification using a variety of statistical association measurements. In particular, we develop a new decision tree learning algorithm based on C4.5 to be used for learning tasks with unbalanced data. The experimental results are presented and briefly discussed.
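As an illustrative aside (not taken from the paper itself), two of the association measures commonly used in this line of work, pointwise mutual information and the t-score, can be computed directly from corpus counts. The counts below are invented for the sketch:

```python
import math

def association_scores(c_xy, c_x, c_y, n):
    """Association measures for a bigram (x, y) in a corpus of n tokens.

    c_xy: co-occurrence count of the pair; c_x, c_y: marginal counts.
    """
    expected = c_x * c_y / n                       # expected count under independence
    pmi = math.log2(c_xy / expected)               # pointwise mutual information
    t_score = (c_xy - expected) / math.sqrt(c_xy)  # t-test style statistic
    return pmi, t_score

# A pair co-occurring far above chance scores high on both measures.
pmi, t = association_scores(c_xy=30, c_x=100, c_y=200, n=1_000_000)
```

Feeding several such scores as features to a learner, instead of thresholding one of them, is the combination the paper argues for.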
Grammatical collocation is one of the important kinds of lexical knowledge for NLP, and acquiring it automatically is useful for many applications. In this paper we present a parser based on phrases to extract the grammatical collocations of Chinese verbs. Under the instruction of collocation templates, the parser combines base phrases into common phrases, and into grammatical collocation chunks, with a chart-parsing method. In the algorithm, we apply a new processing sequence suited to this special task, which starts from each verb and then processes to the left and right. In this way, we are able to achieve precisions of 61.4%, 55.4%, and 54.8% for the science, news, and novel corpora respectively.
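A minimal sketch of the verb-centred expansion order described above (not the authors' parser; the phrases, tags, and `max_span` limit are invented): starting from each verb, candidate chunks are grown by attaching adjacent base phrases on the left and right.

```python
def expand_from_verb(phrases, verb_index, max_span=2):
    """Grow candidate collocation chunks outward from a verb.

    phrases: list of (text, tag) base phrases in sentence order.
    Returns a list of candidate chunks, widest last.
    """
    chunks = []
    left, right = verb_index, verb_index
    for _ in range(max_span):
        if left > 0:
            left -= 1                 # extend one base phrase to the left
        if right < len(phrases) - 1:
            right += 1                # and one to the right
        chunks.append([text for text, _ in phrases[left:right + 1]])
    return chunks

phrases = [("the report", "NP"), ("carefully", "ADVP"),
           ("reviewed", "VP"), ("the data", "NP")]
cands = expand_from_verb(phrases, verb_index=2)
```

In a full system, each candidate chunk would then be checked against the collocation templates before being accepted.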
This paper presents a series of comparative experiments on using statistical classifiers for task classification. The experiments focus on three aspects: the effect of different features on the performance of the classifier, the robustness of classifiers with different features to data variability, and the effect of the size of the training data on the performance of the classifier. For Chinese input sentences, three linguistic units can be used as features: Chinese characters, Chinese words, and semantic constituents. The advantages and disadvantages of each are analyzed in detail. A controlled study using Naive Bayes classifiers is conducted to examine the impact of the different features on the performance of the classifiers. The classifiers with different features are evaluated on clean and noisy test data respectively to investigate their robustness. Learning curves of the classifiers with different features are given to show the effect of the size of the training data.
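To make the character-versus-word comparison concrete, here is a small self-contained multinomial Naive Bayes sketch (with add-one smoothing) applied to the same toy data under both feature types. The training pairs and labels are invented; this is not the paper's data or code.

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (feature_list, label). Returns priors, per-class counts, vocab."""
    priors, counts, vocab = Counter(), {}, set()
    for feats, label in docs:
        priors[label] += 1
        counts.setdefault(label, Counter()).update(feats)
        vocab.update(feats)
    return priors, counts, vocab

def classify(feats, priors, counts, vocab):
    """Pick the label maximizing log P(label) + sum of log P(feature | label)."""
    total = sum(priors.values())
    best, best_lp = None, -math.inf
    for label in priors:
        lp = math.log(priors[label] / total)
        denom = sum(counts[label].values()) + len(vocab)   # add-one smoothing
        for f in feats:
            lp += math.log((counts[label][f] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# The same two toy "sentences" featurized as words and as characters:
word_docs = [(["book", "flight"], "travel"), (["play", "music"], "media")]
char_docs = [(list("bookflight"), "travel"), (list("playmusic"), "media")]
pw, cw, vw = train(word_docs)
pc, cc, vc = train(char_docs)
```

Character features give a denser, smaller feature space (useful when segmentation is noisy), while word features are more discriminative when segmentation is reliable, which is the trade-off the experiments probe.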
Respectable Colleagues and Dear Friends, on behalf of the Program Committee of the 2003 Beijing International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE'03-Beijing, I would like to extend to you our warmest welcome and most cordial greetings. The theme of natural language processing and understanding has become one of the focuses of AI research and related fields due to the great needs and challenges arising from the worldwide trend of informatization. This explains why researchers all over the world feel a high responsibility on their shoulders and why so many colleagues and friends would like to join in
Name and address reading is an important combined application area of language processing and text-to-speech (TTS) systems. It is the cornerstone of both traditional reverse-directory telephone services and new, location-based traffic and tour-guide applications. The language processing aspects of a solution for Hungarian are described. The work was based on the analysis of a subscriber database containing about 3 million records (there are about 10 million Hungarian citizens). Categories of name and address elements were defined. A program for the automatic classification of database records was developed. Statistical parameters were derived for proper/legal names and addresses. Based on these results, text corpora for enriching the TTS acoustic database were designed. Reading strategies and related special algorithms and tables were developed for the description of complex name categories. Our results may be applied to similar tasks in other languages with comparable linguistic and statistical features.
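A record-classification program of the kind described might, at its simplest, be a cascade of pattern rules over database fields. The sketch below is purely hypothetical: the categories and the regular expressions are invented for illustration and are not the Hungarian system's actual rules.

```python
import re

# Hypothetical field-classification rules, checked in priority order.
RULES = [
    ("street_address", re.compile(r"\b(utca|út|tér)\b", re.IGNORECASE)),  # common Hungarian street words
    ("phone",          re.compile(r"^\+?\d[\d \-]{5,}$")),                # digit strings with spaces/dashes
    ("personal_name",  re.compile(
        r"^[A-ZÁÉÍÓÖŐÚÜŰ][a-záéíóöőúüű]+( [A-ZÁÉÍÓÖŐÚÜŰ][a-záéíóöőúüű]+)+$")),
]

def classify_field(text):
    """Return the first matching category for a database field, else 'other'."""
    for category, pattern in RULES:
        if pattern.search(text):
            return category
    return "other"
```

Real systems layer statistics over such rules, but the cascade shows where category definitions for name and address elements plug in.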
This paper presents an approach to semantic inference for FAQ mining. User questions are classified into ten intension categories using predefined question-stemming rules. The answers in the FAQ database are also clustered using latent semantic analysis (LSA) and the K-means algorithm. For FAQ mining, given a query, the question part and the answer part of an FAQ question-answer pair are matched with the input query, respectively. Then, the probabilities estimated from these two parts are integrated and used to choose the most likely answer for the input query. The approaches are evaluated on a medical FAQ database. The results show that the proposed approach achieved a retrieval rate of 90% and outperformed the keyword-based approach.
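The LSA-plus-K-means step can be sketched in a few lines: project an answer-by-term count matrix into a low-rank space via truncated SVD, then cluster the projected answers. This is an illustrative toy, not the paper's pipeline; the matrix, the deterministic center initialization, and the dimension counts are all invented.

```python
import numpy as np

def lsa(matrix, k):
    """Project rows (answers) into k latent dimensions via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]

def kmeans(points, k, iters=20):
    """Plain K-means with deterministic, spread-out initial centers."""
    idx = np.linspace(0, len(points) - 1, k).astype(int)
    centers = points[idx].copy()
    for _ in range(iters):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = points[labels == j].mean(axis=0)
    return labels

answers = np.array([[2., 1., 0., 0.],   # two toy answers about one topic
                    [3., 1., 0., 0.],
                    [0., 0., 2., 2.],   # two about another topic
                    [0., 0., 1., 3.]])
labels = kmeans(lsa(answers, k=2), k=2)
```

The latent projection groups answers that share no surface terms but lie in the same semantic subspace, which is what makes the answer-side matching more than keyword overlap.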
Face marks are figures of faces made up of characters, such as (^^), and are effective for expressing emotions in a text-dialogue system. People usually determine face marks from a history of emotional elements and actional elements. This paper proposes a method of learning face marks for a natural language dialogue system from chat dialogue data on the Internet. We use back-propagation error learning of a three-layer neural network to learn a model of face marks. In this neural network, the input neurons express the emotional parameters and actional categories of texts, and the output neurons express the parts of face marks: mouth, eyes, arms, and optional parts. The experimental results showed that the learning error was 0.19, and we could obtain a performance of approximately 93% of the permissible value for the learning set of dialogues and approximately 60% for the evaluation set of dialogues. They also showed that our system acquired good information on the relationship between the parts of face marks and the emotional and actional elements.
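A minimal three-layer (input, hidden, output) network trained with back-propagation can be written out directly. The toy "emotion" inputs, one-hot "face part" targets, layer sizes, and learning rate below are invented for the sketch; they are not the paper's data or architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # toy emotion/action flags
Y = np.array([[1., 0.], [0., 1.], [0., 1.], [1., 0.]])   # one-hot choice of face part

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(a):
    return np.hstack([a, np.ones((a.shape[0], 1))])

W1 = rng.normal(0.0, 1.0, (3, 8))   # 2 inputs + bias -> 8 hidden units
W2 = rng.normal(0.0, 1.0, (9, 2))   # 8 hidden + bias -> 2 output units

errors = []
for _ in range(3000):
    Xb = add_bias(X)
    H = sigmoid(Xb @ W1)                          # forward: hidden layer
    O = sigmoid(add_bias(H) @ W2)                 # forward: output layer
    err = O - Y
    errors.append((err ** 2).mean())
    dO = err * O * (1 - O)                        # output-layer delta
    dH = (dO @ W2.T)[:, :-1] * H * (1 - H)        # hidden delta (bias column dropped)
    W2 -= add_bias(H).T @ dO                      # gradient step, learning rate 1.0
    W1 -= Xb.T @ dH
```

In the paper's setting each output group would select one variant of a face-mark part (mouth, eyes, arms), but the gradient computation is the same shape.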
This paper discusses the automatic extraction of unlisted terms in the field of Information Technology based on the large-scale DCC (Dynamic Circulation Corpus), under the theory of the Dynamic Updating of language and vocabulary. It proposes the concept of the Concatenation Index to decide whether a character string is a word/phrase or not. It also presents a new approach, named "Concatenation Index + TFIDF", for extracting unlisted terms from a large-scale corpus of a certain domain. The experiment selects texts of around 17 million Chinese characters in the field of IT (Information Technology) as the object corpus, and texts of around 600 million Chinese characters in the field of common usage as the contrast corpus. As a result, a tentative workflow has been established, and the approach turned out to be efficient.
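The two ingredients can be sketched as follows: a cohesion score that asks whether a character string holds together as a unit, and a contrast score that asks whether it is specific to the object corpus relative to the contrast corpus. This is an illustrative stand-in (a PMI-style cohesion measure and a relative-frequency ratio), not the paper's actual Concatenation Index or TFIDF formulas, and the counts are invented.

```python
import math

def cohesion(count_ab, count_a, count_b, n):
    """PMI-style cohesion of a two-character string "ab" in a corpus of n characters."""
    return math.log2(count_ab * n / (count_a * count_b))

def domain_salience(tf_domain, n_domain, tf_background, n_background):
    """Relative-frequency ratio: high when a string is domain-specific.

    Add-one on the background count avoids division by zero for unseen strings.
    """
    return (tf_domain / n_domain) / ((tf_background + 1) / n_background)

# A string frequent in the 17M-character IT corpus but rare in the 600M-character
# contrast corpus scores high; a common-usage string does not.
score_it = domain_salience(tf_domain=500, n_domain=17_000_000,
                           tf_background=200, n_background=600_000_000)
score_common = domain_salience(tf_domain=500, n_domain=17_000_000,
                               tf_background=400_000, n_background=600_000_000)
```

Candidates passing both a cohesion threshold (is it a word/phrase?) and a salience threshold (is it domain-specific?) would be proposed as unlisted terms.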
This paper introduces our research on using integrated linguistic knowledge to automatically recognize Chinese scientific and technological terms, based on a careful analysis of the characteristics of this kind of terms. The system of automatic term recognition includes two phases: a learning stage and an application stage. In the learning stage, we use a series of machine learning methods to acquire various kinds of integrated knowledge for automatic term recognition from a large-scale corpus and a term bank. The knowledge includes the inner structural knowledge of terms, the statistical domain features of term components, the statistical mutual information between the components of terms, the outer environment features of terms, and the distinct text-level features of term recognition, etc. In the application stage, through an efficient model, we apply all these various types of knowledge to automatic term recognition. The experiments show that the system can give great help to experts in term standardization in discovering new terms.
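One simple way to use several knowledge sources "through an efficient model" is a weighted log-linear combination of per-source scores. The sketch below is hypothetical: the feature names, weights, and score values are invented, and the paper's actual model may combine knowledge quite differently.

```python
def term_score(features, weights):
    """Log-linear fusion of knowledge-source scores for a candidate term."""
    return sum(weights[name] * value for name, value in features.items())

# Invented weights over the knowledge types named in the abstract.
weights = {"inner_structure": 1.5, "domain_feature": 2.0,
           "mutual_information": 1.0, "context_feature": 0.5}

candidate = {"inner_structure": 0.8, "domain_feature": 0.9,
             "mutual_information": 0.7, "context_feature": 0.4}
non_term  = {"inner_structure": 0.2, "domain_feature": 0.1,
             "mutual_information": 0.3, "context_feature": 0.5}
```

Strings scoring above a tuned threshold would be surfaced to the term-standardization expert as new-term candidates.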
Authors:
Ching-Long Yeh, Yi-Chun Chen
Department of Computer Science and Engineering, Tatung University, 40 Chungshan N. Rd., 3rd Section, Taipei 104, Taiwan, China
Most traditional approaches to anaphora resolution are based on the integration of complex syntactic information and domain knowledge. However, constructing a domain knowledge base is very labor-intensive and time-consuming. In this paper, we work on the output of a part-of-speech tagger and use partial parsing instead of complex parsing to resolve zero anaphors in Chinese text. We employ centering theory and constraint rules to identify the antecedents of zero anaphors appearing in the preceding utterances. We focus on the cases of zero anaphors that occur in the topic, subject, and object positions of utterances. The results show that the precision rate of zero anaphora detection and the recall rate of zero anaphora resolution with this method are 81% and 70% respectively.
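The centering-style preference can be sketched as a salience ordering over candidate antecedents from the preceding utterance, filtered by constraint rules. This is a hypothetical illustration, not the paper's rule set: the role ranking, the candidate tuples, and the sample constraint are invented.

```python
# Salience ordering in the spirit of centering theory: topic > subject > object.
ROLE_RANK = {"topic": 0, "subject": 1, "object": 2}

def resolve_zero_anaphor(candidates, constraints=()):
    """Pick an antecedent for a zero anaphor.

    candidates: (mention, grammatical_role) pairs from the preceding utterance.
    constraints: predicates a viable mention must satisfy (e.g. animacy, number).
    """
    viable = [c for c in candidates if all(check(c[0]) for check in constraints)]
    viable.sort(key=lambda c: ROLE_RANK.get(c[1], 99))   # most salient first
    return viable[0][0] if viable else None

prev_utterance = [("the report", "object"), ("Zhang San", "subject")]
antecedent = resolve_zero_anaphor(prev_utterance,
                                  constraints=(lambda m: m != "the report",))
```

Because only grammatical roles from a partial parse are consulted, no domain knowledge base is needed, which is the trade-off the paper exploits.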