To build a corpus of Naxi-English bilingual word alignments, and with the syntactic characteristics of the Naxi language in mind, a Naxi-English bilingual word alignment method is proposed. This method uses the log-linear model and introduces feature functions based on the characteristics of Naxi, namely an English-Naxi interval switching function and a Naxi-English word position transformation function. With a manually labeled Naxi-English word alignment corpus, the parameters of the model are trained with a minimum-error criterion, and the Naxi-English bilingual words are then aligned automatically by the model. Experiments use IBM Model 3 as a benchmark and gradually add constraints reflecting the characteristics of the Naxi language on top of IBM Model 3. The final experimental results show that the Naxi-English bilingual word alignment accuracy can be improved significantly with the feature functions based on the characteristics of Naxi.
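As a rough illustration of the log-linear formulation described above, the sketch below scores candidate alignments as a weighted sum of feature functions. The two feature functions, their weights, and the toy alignments are hypothetical stand-ins, not the paper's actual English-Naxi interval switching and position transformation features.

```python
# Minimal sketch of a log-linear alignment score; feature functions and
# weights are hypothetical placeholders, not the authors' implementation.

def position_transform_feature(alignment, src_len, tgt_len):
    # Penalize large relative-position jumps between aligned words
    # (a stand-in for the Naxi-English position transformation feature).
    return -sum(abs(i / src_len - j / tgt_len) for i, j in alignment)

def interval_switch_feature(alignment):
    # Penalize direction reversals between consecutive links, a rough
    # proxy for the English-Naxi interval switching feature.
    jumps = sum(1 for (i1, j1), (i2, j2) in zip(alignment, alignment[1:])
                if (i2 - i1) * (j2 - j1) < 0)
    return -jumps

def loglinear_score(alignment, src_len, tgt_len, weights):
    feats = {
        "pos": position_transform_feature(alignment, src_len, tgt_len),
        "switch": interval_switch_feature(alignment),
    }
    return sum(weights[k] * v for k, v in feats.items())

# Usage: score two candidate alignments of a 3-word / 3-word sentence pair.
weights = {"pos": 1.0, "switch": 0.5}   # would be tuned by minimum-error training
monotone = [(0, 0), (1, 1), (2, 2)]
crossing = [(0, 2), (1, 1), (2, 0)]
print(loglinear_score(monotone, 3, 3, weights))
print(loglinear_score(crossing, 3, 3, weights))
```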
Most of the previous work on predicate-argument (PA) structure analysis utilized pipeline strategies with several subtasks. However, when the subtasks are processed sequentially, errors propagate from the preceding subtasks to the subsequent ones. In this paper, PA structure analysis is divided into three subtasks: predicate sense disambiguation, argument classification, and argument identification. A dual decomposition method is used to join the latter two subtasks, which alleviates error propagation by making the decisions of the two subtasks as consistent as possible. Furthermore, the predicate sense disambiguation task is integrated into the argument classification task by handling syntactic dependencies and semantic dependencies between a predicate and its arguments. Experiments show that our approach achieves competitive results compared to state-of-the-art systems, without special feature selection procedures.
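The following sketch illustrates the general dual decomposition idea the paper builds on: two subproblems are solved separately and Lagrange multipliers are updated until their decisions agree. The per-token scores and the simple thresholding solvers are invented for illustration and do not reproduce the paper's argument identification and classification models.

```python
# Illustrative dual decomposition loop for two subtasks that must agree on
# which tokens are arguments; the scoring functions are toy placeholders.

def solve_identification(scores, penalties):
    # Subtask 1: decide per token whether it is an argument.
    return [1 if s + u > 0 else 0 for s, u in zip(scores, penalties)]

def solve_classification(scores, penalties):
    # Subtask 2: the same decision, made by the classification model.
    return [1 if s - u > 0 else 0 for s, u in zip(scores, penalties)]

def dual_decompose(id_scores, cls_scores, iters=50, step=0.5):
    u = [0.0] * len(id_scores)           # Lagrange multipliers
    for _ in range(iters):
        y = solve_identification(id_scores, u)
        z = solve_classification(cls_scores, u)
        if y == z:                        # subtasks agree: certified solution
            return y
        # Subgradient update pushes the two solutions toward agreement.
        u = [ui - step * (yi - zi) for ui, yi, zi in zip(u, y, z)]
    return y                              # fall back to identification output

print(dual_decompose([0.6, -0.2, 0.1], [0.5, 0.3, -0.4]))
```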
This paper proposes a tag-topic model for semantic knowledge acquisition from blogs. The model extends Latent Dirichlet Allocation by adding a tag layer between the document and topic layers: it represents each document as a mixture of tags, each tag is associated with a multinomial distribution over topics, and each topic is associated with a multinomial distribution over words. After parameter estimation, the tags are regarded as concepts, the top words assigned to the top topics are selected as related words of the concepts, and PMI-IR is utilized to filter out noisy words and improve the quality of the semantic knowledge. Experimental results show that the tag-topic model can effectively capture semantic knowledge.
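A minimal sketch of the generative story implied by the tag-topic model, assuming toy distributions: a document draws a tag, the tag draws a topic, and the topic draws a word. All distributions below are made up for illustration; in the model they would be estimated from blog data.

```python
# Sketch of the document -> tag -> topic -> word generative process with
# hypothetical toy distributions (not estimated parameters).
import random

tag_dist_per_doc = {"doc1": {"travel": 0.7, "food": 0.3}}
topic_dist_per_tag = {"travel": {"t0": 0.8, "t1": 0.2},
                      "food":   {"t0": 0.1, "t1": 0.9}}
word_dist_per_topic = {"t0": {"beach": 0.6, "hotel": 0.4},
                       "t1": {"noodle": 0.5, "spicy": 0.5}}

def sample(dist):
    # Draw one item from a {item: probability} dictionary.
    r, acc = random.random(), 0.0
    for item, p in dist.items():
        acc += p
        if r <= acc:
            return item
    return item

def generate_word(doc):
    tag = sample(tag_dist_per_doc[doc])         # document -> tag
    topic = sample(topic_dist_per_tag[tag])     # tag -> topic
    return sample(word_dist_per_topic[topic])   # topic -> word

print([generate_word("doc1") for _ in range(5)])
```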
Ontologies serve as knowledge representations of the whole world or some part of it. Building ontologies is a challenging and active research area. Manually constructed ontologies often have higher quality than those created by automatic or semi-automatic approaches, but they tend to be applicable only to small domains. Automatic approaches are considered more suitable for building large-scale ontologies, where the time and effort of human experts become a bottleneck. For both paradigms, approaches to building ontologies from Vietnamese texts are still very limited. In this paper, we propose a system that automatically builds an ontology from Vietnamese texts using cascades of annotation-based grammars. Experimental results obtained on a university organizational structure domain are very promising.
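The sketch below illustrates, under strong simplifications, what a cascade of annotation-based rules can look like: an early stage annotates entities, and a later stage combines those annotations into ontology relations. The regular expressions, class names, and the diacritic-free Vietnamese sample are invented for the example and are not the authors' grammars.

```python
# Toy cascade of annotation rules: stage 1 marks entities, stage 2 builds
# on those annotations to propose an ontology relation.
import re

text = "Khoa Cong nghe thong tin thuoc Truong Dai hoc Bach Khoa."

# Stage 1: annotate organizational units (illustrative patterns only).
stage1 = [("Faculty", re.compile(r"Khoa [A-Z][\w ]+?(?= thuoc)")),
          ("University", re.compile(r"Truong Dai hoc [\w ]+?(?=\.)"))]
annotations = [(label, m.group()) for label, pat in stage1
               for m in pat.finditer(text)]

# Stage 2: combine stage-1 annotations into a part-of relation.
faculties = [t for label, t in annotations if label == "Faculty"]
universities = [t for label, t in annotations if label == "University"]
ontology = [(f, "partOf", u) for f in faculties for u in universities]

print(annotations)
print(ontology)
```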
For complex, non-factoid questions in Chinese question answering systems, such as `why' and `how' questions, we propose an answer extraction method using discourse structure features and a ranking algorithm. This method treats the problem of judging answer relevance as learning to rank answers. First, the method analyzes the question to generate a query string, and then applies rhetorical structure theory together with lexical, syntactic, and semantic natural language processing to the retrieved documents, so as to determine the inherent relationships between paragraphs or sentences and generate candidate answer paragraphs or sentences. Next, an answer ranking model is constructed: five groups of features, namely similarity features, density and frequency features, translation features, discourse structure features, and external knowledge features, are extracted to train the ranking model. Finally, the answers are re-ranked with the trained model to find the optimal answers. Experiments show that the proposed method can effectively improve the accuracy and quality of non-factoid answers.
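As a toy illustration of the ranking step, the sketch below scores candidate answers with a weighted combination of a few features standing in for the five feature groups; the features and weights are assumptions, not the trained model from the paper.

```python
# Minimal sketch of ranking answer candidates with a weighted feature sum;
# the features and weights are simplified stand-ins for the five groups.

def features(question_terms, candidate):
    terms = candidate.lower().split()
    overlap = len(set(question_terms) & set(terms))
    return {
        "similarity": overlap / max(len(question_terms), 1),
        "density": overlap / max(len(terms), 1),
        # Crude discourse cue: answer opens with a causal connective.
        "discourse": 1.0 if terms and terms[0] in ("because", "since") else 0.0,
    }

def rank(question, candidates, weights):
    q_terms = question.lower().split()
    scored = [(sum(weights[k] * v for k, v in features(q_terms, c).items()), c)
              for c in candidates]
    return [c for _, c in sorted(scored, reverse=True)]

weights = {"similarity": 1.0, "density": 0.5, "discourse": 2.0}
print(rank("why does ice float",
           ["Because ice is less dense than water it floats.",
            "Ice forms at zero degrees."],
           weights))
```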
Sentence emotion recognition allows for deeper analysis of textual emotion. Based on the finding that the emotional focus of a sentence can be expressed by some of its clauses, this work proposes to select clause emotions for sentence emotion recognition. In the first step, a maximum entropy (MaxEnt) classification model is built for word emotion recognition. In the second step, a homogeneous Markov model (HMM) classification method is used for clause emotion classification. In the third step, nine text features are selected, and a genetic algorithm (GA) is used to specify the weight of each text feature; the sentence emotion is then obtained by adding the selected clause states in the sentence. The experimental results show improvements of 9.1% and 3.6% for the two tasks, respectively, compared with the baseline. This demonstrates that clause selection can significantly improve the performance of sentence emotion recognition.
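The third step tunes feature weights with a genetic algorithm; the sketch below shows a generic GA loop over nine weights with a placeholder fitness function, since the paper's actual objective and features are not reproduced here.

```python
# Compact genetic-algorithm sketch for tuning nine feature weights; the
# fitness function is a placeholder, not the paper's objective.
import random

NUM_FEATURES = 9

def fitness(weights):
    # Placeholder objective: prefer weight vectors that sum close to 1.
    return -abs(sum(weights) - 1.0)

def evolve(pop_size=20, generations=50, mutation=0.1):
    pop = [[random.random() for _ in range(NUM_FEATURES)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]               # keep the fittest half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, NUM_FEATURES)
            child = a[:cut] + b[cut:]                 # one-point crossover
            if random.random() < mutation:            # random-reset mutation
                child[random.randrange(NUM_FEATURES)] = random.random()
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

print(evolve())
```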
Chinese word segmentation (CWS) can be implemented in a coarse-to-fine schema. In such a schema, a candidate set containing multiple segmentations of a sentence (rather than only one segmentation) is used as the output of a coarse-grained CWS model. Then a more sophisticated CWS model, or other models of downstream tasks, reconsiders all the segmentations in the candidate set to determine the best one. This paper discusses and compares three candidate generation methods, namely a boundary-level method, a word-level method, and a sentence-level method, in a unified form. The oracle F1-measures of the candidate sets produced by these methods are compared, and their performance is also compared on a joint CWS and POS-tagging task. The results show that the word-level method performs best among the three candidate generation methods. The results also show that the coarse-to-fine schema outperforms both the pipeline schema, in which only one segmentation is used for the downstream task, and the joint schema, in which all possible segmentations are used for the downstream task. Moreover, the speed of the coarse-to-fine schema is close to that of the pipeline schema and much higher than that of the joint schema.
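A small sketch of word-level candidate generation, assuming a toy dictionary: all dictionary segmentations of a sentence are enumerated and a coarse score keeps the top k, which a finer model would then rescore. The dictionary, scoring, and k are illustrative only.

```python
# Sketch of word-level candidate generation for coarse-to-fine CWS: a coarse
# dictionary model proposes several segmentations for later rescoring.

DICT = {"研究", "研究生", "生命", "命", "起源", "生"}

def segmentations(sent):
    # Enumerate every segmentation whose words are all in the dictionary.
    if not sent:
        return [[]]
    results = []
    for end in range(1, len(sent) + 1):
        word = sent[:end]
        if word in DICT:
            results += [[word] + rest for rest in segmentations(sent[end:])]
    return results

def coarse_score(seg):
    # Coarse preference for fewer, longer words.
    return -len(seg)

def candidate_set(sent, k=3):
    return sorted(segmentations(sent), key=coarse_score, reverse=True)[:k]

# The downstream (fine-grained) model would pick the best of these candidates.
print(candidate_set("研究生命起源"))
```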
Web services facilitate agile development and deployment of new services in e-Government systems. Due to their distributed nature, web services, as well as e-Government systems, must be secured against various cyber attacks, and it must be ensured that only authorized users have access to valuable and sometimes sensitive information and resources. However, correct and efficient security engineering for web service-based e-Government systems is not a trivial task. Model-driven security (MDS) is considered a promising approach to reduce complexity and increase efficiency in the design and development of secure web services for e-Government systems. Nevertheless, the practicability of MDS has not yet been fully assessed. In a recent pilot project, we applied MDS to the design and development of web services for the actual e-Government system in Austria. Our work shows that, despite extensive research, several aspects of MDS need to be adapted and further developed before one can really benefit from such an approach in practice. Our work on addressing these aspects provides a realistic assessment of, and valuable insights into, the application of MDS to web services in the real world.
In this paper, we propose a discriminative latent model to extract Chinese multiword expressions from corpus resources, as part of a larger research effort to improve machine translation (MT) systems such as Super Function Based Machine Translation (SFBMT). For existing MT systems, especially statistics-based MT (SBMT), the detection of multiword expressions (MWEs) and their accurate correspondence from source to target language remain unsolved problems. Template- or Super Function-based machine translation systems suffer less from the existence of MWEs, but MWEs are still a main obstacle for MT systems. Our initial experiments on Chinese-Japanese MT systems reveal that, where MWEs exist, the SFBMT system suffers in terms of efficiency, comprehensibility, and adequacy in finding the translation functions, and statistics-based machine translation suffers more than SFBMT. For SFBMT systems to become of further practical use, they need to be enhanced with MWE processing capability. As part of our study towards this goal, we propose a discriminative latent model, developed for sequence labeling tasks, for identifying and extracting Chinese MWEs. In our evaluation, the tool achieved precisions ranging from 71.46% to 95.87% for different types of MWEs. Such results demonstrate that it is feasible to automatically identify many Chinese MWEs using our tool, and Super Function based MT can be further improved once the Chinese MWEs have been detected.
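A simplified sketch of casting MWE extraction as BIO sequence labeling with Viterbi decoding. It omits the latent variables of the paper's discriminative latent model, and the transition and emission scores below are invented for the example.

```python
# Simplified BIO sequence labeling with Viterbi decoding; scores are
# hypothetical and the latent variables of the paper's model are omitted.

TAGS = ["B", "I", "O"]
# Disallow I after O or at sentence start by giving it a very low score.
TRANS = {("B", "I"): 0.0, ("I", "I"): 0.0, ("B", "B"): -1.0, ("I", "B"): -1.0,
         ("O", "B"): 0.0, ("B", "O"): 0.0, ("I", "O"): 0.0, ("O", "O"): 0.5,
         ("O", "I"): -100.0,
         ("<s>", "B"): 0.0, ("<s>", "I"): -100.0, ("<s>", "O"): 0.0}

def viterbi(emission_scores):
    # emission_scores: one {tag: score} dictionary per token.
    best = [{t: (TRANS[("<s>", t)] + emission_scores[0][t], ["<s>", t])
             for t in TAGS}]
    for em in emission_scores[1:]:
        layer = {}
        for t in TAGS:
            prev_t, (s, path) = max(best[-1].items(),
                                    key=lambda kv: kv[1][0] + TRANS[(kv[0], t)])
            layer[t] = (s + TRANS[(prev_t, t)] + em[t], path + [t])
        best.append(layer)
    return max(best[-1].values())[1][1:]   # drop the "<s>" start symbol

tokens = ["打", "电话", "给", "他"]
emissions = [{"B": 1.2, "I": 0.1, "O": 0.2},   # likely begins an MWE
             {"B": 0.1, "I": 1.0, "O": 0.3},   # likely continues it
             {"B": 0.1, "I": 0.1, "O": 1.0},
             {"B": 0.1, "I": 0.1, "O": 1.0}]
print(list(zip(tokens, viterbi(emissions))))
```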
With the development of FPGAs, the design of digital electronic circuits has entered a new era, which makes it possible for designers to customize their own special CPUs. It is therefore of great significance to design an economical and practical FPGA-based CPU core, both for cost reduction and for ownership of intellectual property. Based on the principles of computer organization, this paper designs and implements an 8-bit CISC CPU using a top-down approach, completing all of the functional modules with the hardware description language VHDL and EDA technology. The CPU is designed, compiled, and simulated in the Quartus II integrated development environment, and then downloaded to an FPGA experimental platform for unit and system testing. Finally, the validity of the design is verified and the simulation result of a test program for the designed CPU is presented. The experimental results show that the functional modules perform their individual functions correctly and that the CPU as a whole can correctly execute all of the instructions, thereby achieving the design goal. The research in this paper contributes to the design of more complex FPGA-based CPUs, and readers may refer to it according to their needs.
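Purely as an illustration of the fetch-decode-execute behavior that such a CPU's test programs exercise, the toy software model below steps through a few 8-bit instructions. The opcode encoding is invented for this example and is unrelated to the paper's VHDL instruction set.

```python
# Toy software model of an 8-bit fetch-decode-execute cycle; the opcodes
# are invented for illustration, not the paper's instruction set.

MEM = [0x10, 0x05,        # LDA 5  : load immediate 5 into the accumulator
       0x20, 0x03,        # ADD 3  : add immediate 3
       0x30, 0x0F,        # STA 15 : store accumulator at address 15
       0xFF] + [0] * 9    # HLT, followed by a small data area

acc, pc, running = 0, 0, True
while running:
    opcode = MEM[pc]                       # fetch
    if opcode == 0x10:                     # decode + execute
        acc = MEM[pc + 1]; pc += 2
    elif opcode == 0x20:
        acc = (acc + MEM[pc + 1]) & 0xFF; pc += 2
    elif opcode == 0x30:
        MEM[MEM[pc + 1]] = acc; pc += 2
    elif opcode == 0xFF:
        running = False
    else:
        raise ValueError(f"unknown opcode {opcode:#x}")

print(acc, MEM[15])   # both should be 8
```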