ISBN (Print): 9780769529301
Bilingual word alignment is the task of finding word-level translation correspondences between source- and target-language sentences. It is widely used in natural language processing, for example in machine translation, cross-language information retrieval, and bilingual dictionary compilation. However, bilingual word alignment is a difficult task involving many challenges, such as morphology, syntax, semantics, human translation habits, and the inherent differences between English and Chinese. In this paper, a new algorithm is proposed. By adopting a multilayered matching and disambiguation strategy, the bilingual word-alignment task is transformed into an iterative search for anchor word pairs. The positive experimental results of 93.5% precision, 77.3% recall, and 84.2% F-score show the algorithm's effectiveness.
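As a rough, hypothetical illustration of the iterative anchor word-pair idea (not the authors' implementation), the following Python sketch repeatedly promotes unambiguous dictionary matches to anchors; the bi_dict lookup and the single matching layer are simplifying assumptions.

# Hedged sketch: iterative anchor word-pair alignment (illustrative only).
def align(src_words, tgt_words, bi_dict):
    anchors = {}                              # source index -> target index
    unmatched_src = set(range(len(src_words)))
    unmatched_tgt = set(range(len(tgt_words)))
    changed = True
    while changed:                            # iterate until no new anchor is found
        changed = False
        for i in sorted(unmatched_src):
            # layer 1: dictionary match; further layers could add morphological,
            # positional, and context-based disambiguation
            candidates = [j for j in unmatched_tgt
                          if tgt_words[j] in bi_dict.get(src_words[i], set())]
            if len(candidates) == 1:          # an unambiguous pair becomes an anchor
                j = candidates[0]
                anchors[i] = j
                unmatched_src.discard(i)
                unmatched_tgt.discard(j)
                changed = True
    return anchors

pairs = align(["我", "爱", "北京"], ["I", "love", "Beijing"],
              {"我": {"I"}, "爱": {"love"}, "北京": {"Beijing"}})

Additional matching layers would then resolve the remaining ambiguous pairs between already established anchors.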
ISBN (Print): 9780769529301
This paper constructs a latent semantic text model using a genetic algorithm (GA) for web clustering. The main difficulty in applying a GA to text clustering is the feature space of thousands or even tens of thousands of dimensions. Latent semantic indexing (LSI) is a successful technique that attempts to explore the latent semantic structure in textual data; it reduces this large space to a smaller one and provides a robust space for clustering. A GA is a search technique that efficiently evolves an optimal solution to a problem. Evolved in the reduced latent semantic indexing model, the GA improves clustering accuracy and speed, which makes it suitable for real-time clustering. We use the SSTRESS criterion to analyze the dissimilarity between the original term-by-document corpus matrix and its approximate decomposition matrices at different ranks, corresponding to the performance of our algorithm evolved in the reduced space. The superiority of the GA applied in the LSI model over K-means and a conventional GA in the vector space model (VSM) is demonstrated by good clustering results on the Reuters text collection.
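For illustration only, the following Python sketch shows a rank-k LSI reduction of a term-by-document matrix via SVD and one common form of an SSTRESS-style dissimilarity between the original matrix and its rank-k approximation; the exact normalization and the GA encoding used in the paper are not given in the abstract, so these are assumptions.

# Hedged sketch: LSI reduction and an SSTRESS-style dissimilarity (illustrative).
import numpy as np

def lsi_reduce(A, k):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k approximation of A
    docs_k = np.diag(s[:k]) @ Vt[:k, :]           # documents in the k-dim LSI space
    return A_k, docs_k

def sstress(A, A_k):
    # squared-difference criterion between squared entries, normalized
    return np.sqrt(np.sum((A**2 - A_k**2)**2) / np.sum(A**4))

A = np.random.rand(1000, 200)        # toy term-by-document matrix
A_k, docs_k = lsi_reduce(A, k=50)    # GA individuals would encode clusterings
print(sstress(A, A_k))               # of the 50-dimensional document vectors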
ISBN (Print): 9780769529301
XML document usage is currently in a limbo state, probably because of too much freedom in XML tag definition and schema organization. An effort to restrict the unbounded freedom in XML document structure design may help improve the utilization of XML documents in the web environment. Abstracting common document characteristics from diverse user groups in the same application domain can help develop commonly acceptable XML document structures. We can achieve optimality in document structure by abstracting the document structure and implanting an optimal classification tree in the XML schema. The implantation is enabled by applying an ID3-based classification tree generation algorithm. In generating the classification tree for a case example, a 'situation variable and decision variable' structure is proposed to abstract the document structure for business process exception handling. The classification tree is then used to construct an XML schema that enables authoring and transmission of web documents containing business information. Since the induced classification tree is optimized by the "information gain" criterion, the classification tree based XML schema design also helps utilize XML document information on the semantic web.
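As a hypothetical illustration of the information-gain criterion underlying ID3-based tree construction (the situation variables and the toy case table below are not from the paper), a minimal Python sketch:

# Hedged sketch: information gain as used by ID3 to pick the next splitting attribute.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

cases = [  # situation variables -> decision variable (exception-handling action)
    {"delay": "high", "severity": "major", "action": "escalate"},
    {"delay": "low",  "severity": "minor", "action": "retry"},
    {"delay": "high", "severity": "minor", "action": "retry"},
]
print(information_gain(cases, "delay", "action"))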
ISBN (Print): 9780769529301
In this paper, colored Petri nets are used to model the atomic processes and the composite processes of semantic web services. After introducing the concept of the colored Petri net and the relevant OWL-S background, colored Petri nets are used to model the composite processes of semantic web services. In this model, input, output, and precondition are represented by different kinds of tokens, and effect is represented by the change in token count when a transition fires. The composition model can thus be described unambiguously, and the composite process can be analyzed and verified conveniently. An algorithm is then given to verify the syntactic correctness of the model. Finally, the composite service model is applied to a case study.
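A minimal, hypothetical Python sketch of the firing rule described above, with colored tokens standing for the inputs, preconditions, and outputs of an atomic process (the class and token names are illustrative assumptions, not the paper's model):

# Hedged sketch: a colored-Petri-net style transition over multisets of colored tokens.
from collections import Counter

class Transition:
    def __init__(self, consume, produce):
        self.consume = Counter(consume)   # colored tokens required (inputs, preconditions)
        self.produce = Counter(produce)   # colored tokens added (outputs)

    def enabled(self, marking):
        return all(marking[tok] >= n for tok, n in self.consume.items())

    def fire(self, marking):
        if not self.enabled(marking):
            raise ValueError("transition not enabled")
        # the effect is visible as the change in token counts after firing
        return marking - self.consume + self.produce

book_flight = Transition(consume={"input:itinerary": 1, "pre:card_valid": 1},
                         produce={"output:ticket": 1})
m0 = Counter({"input:itinerary": 1, "pre:card_valid": 1})
m1 = book_flight.fire(m0)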
ISBN (Print): 9780769529301
Recent semantic web research has moved in an increasingly complicated direction, so its technology does not seem likely to be applied at an early date. This paper suggests a simple ontology structure and a method to utilize this ontology in contextual advertising. Present contextual advertising systems use database systems for search-based advertising. In these systems there is just a list of keywords, with no relations among the keywords at all. This structure is not suitable for contextual advertising, which matches an advertisement by analyzing the content, so a mismatch between content and matched advertisements often occurs. In this paper, we propose an intelligent keyword management system that takes the contextual situation into account by expanding keywords. The keywords and the relations between them are expressed as an ontology. Because the proposed system reuses parts of the current contextual advertising system as it stands, only slight changes are needed to make it recommend more suitable keywords for contextual advertising.
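As a rough, assumed illustration of keyword expansion over such an ontology (the relation graph, depth parameter, and function names are hypothetical, not the paper's system), consider:

# Hedged sketch: expanding a page's keywords through related-term edges before ad matching.
def expand_keywords(keywords, ontology, depth=1):
    expanded = set(keywords)
    frontier = set(keywords)
    for _ in range(depth):
        frontier = {rel for kw in frontier for rel in ontology.get(kw, ())} - expanded
        expanded |= frontier
    return expanded

ontology = {"laptop": ["notebook", "computer"], "notebook": ["laptop"]}
print(expand_keywords({"laptop"}, ontology))   # {'laptop', 'notebook', 'computer'}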
ISBN (Print): 9780769529301
Automatic expert finding systems, which aim to identify experts from a large document repository, have attracted considerable interest in recent years. To better describe the expert-document and document-topic relationships, we introduce the concept of "Role" into our model and expand its use in expert search. We illustrate how the "Role" concept helps to model the probability of a candidate being an expert on a given topic from the documents relevant to it. We then evaluate the effectiveness of our model through a series of experiments on the TREC 2006 Enterprise Track expert search task. The results show that our role-centralized model performs considerably better than general approaches overall, and there is substantial potential for deeper investigation in future work.
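The abstract does not give the exact factorization; one plausible role-mediated decomposition of the candidate-topic probability, written here purely as an assumed illustration, is

P(ca \mid q) \;\propto\; \sum_{d}\sum_{r} P(ca \mid r, d)\, P(r \mid d)\, P(d \mid q),

where ca is a candidate, q the query topic, d a supporting document, and r the role the candidate plays in that document.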
ISBN (Print): 9780769529301
Everywhere on Earth, we can receive a GPS signal from satellites in space; indoors, we can use an indoor positioning system to obtain a position signal. The signal formats and types differ from each other, so some studies try to map the information through metadata [1,2]. However, it is hard to fix a common standard among the heterogeneous standards and to search the data through metadata transformations [13,14]. In this paper, we propose a location model (LM) that describes purpose, place, time, and person, which are the elements of the LM. Multimedia data created by a device such as a digital camera carries a variety of metadata; we assume that it includes location metadata in the form of latitude and longitude. We cluster location information based on the location model using an LM manager. Moreover, we implement an LM Searcher that helps users find the correct files. This study shows the efficient use of the location model for clustering and provides a tool for managing and searching multimedia data in a storage system.
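As a hypothetical sketch of grouping multimedia files by their latitude/longitude metadata (the distance threshold, greedy assignment, and function names are illustrative assumptions, not the LM manager's actual algorithm):

# Hedged sketch: greedy location clustering of files by their lat/lon metadata.
import math

def haversine_km(a, b):
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def cluster_by_location(files, threshold_km=1.0):
    clusters = []                           # each cluster: list of (name, lat, lon)
    for name, lat, lon in files:
        for cluster in clusters:
            if haversine_km((lat, lon), cluster[0][1:]) <= threshold_km:
                cluster.append((name, lat, lon))
                break
        else:
            clusters.append([(name, lat, lon)])
    return clusters

photos = [("a.jpg", 37.5665, 126.9780), ("b.jpg", 37.5651, 126.9895)]
print(cluster_by_location(photos))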
ISBN (Print): 9780769529301
Nowadays, spoken-style text is prevalent because a large amount of information is written in spoken style, such as Short Message Service (SMS) messages. However, spoken-style text contains more spelling errors than traditional written-style text. In this paper, we propose a rule-based spelling correction system that automatically extracts spelling correction rules from a correction corpus and applies the extracted rules to spelling errors in input sentences. In order to preserve both high precision and high recall, we devise a candidate-elimination algorithm that determines the appropriate context size of spelling correction rules based on rule accuracy. Experimental results show that the proposed system extracts 42,537 spelling correction rules and applies them to correct spelling errors in the test corpus; precision increases from 31.08% to 79.04% at the message level.
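A hypothetical Python sketch of the idea of widening a rule's context until its corpus accuracy is acceptable (the rule format, token-aligned corpus, and accuracy estimate are simplifying assumptions, not the paper's implementation):

# Hedged sketch: keep the narrowest-context correction rule whose corpus accuracy
# passes a threshold; candidates are ordered from narrowest to widest context.
def pick_rule(candidates, corpus, min_accuracy=0.9):
    # each candidate: (left_context_tokens, wrong_token, right_context_tokens, correct_token)
    for left, wrong, right, correct in candidates:
        hits = misses = 0
        for noisy, clean in corpus:          # token-aligned noisy/clean sentence pairs
            for i, tok in enumerate(noisy):
                if (tok == wrong
                        and noisy[i - len(left):i] == left
                        and noisy[i + 1:i + 1 + len(right)] == right):
                    if clean[i] == correct:
                        hits += 1
                    else:
                        misses += 1
        if hits and hits / (hits + misses) >= min_accuracy:
            return (left, wrong, right, correct)
    return None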
ISBN (Print): 0769529305
The proceedings contain 105 papers. The topics discussed include: e-mail clustering based on profile and multi-attribute values; an efficient document categorization model based on LSA and BPNN; a dynamic SOM algorithm for clustering large-scale document collections; analysis of web clustering based on a genetic algorithm with latent semantic indexing technology; kernel-based sentiment classification for Chinese sentences; leveraging world knowledge in Chinese text classification; structure analysis and computation-based Chinese question classification; Chinese text classification without automatic word segmentation; the formalization of four types of 'ZAI' viewpoint aspect sentences; a feature space expression to analyze dependency of Korean clauses with a composite kernel; a multilayered bilingual word-alignment algorithm; a template-based English-Chinese translation system using FOPA and UAMRT; and Korean spacing by improving Viterbi segmentation.
ISBN (Print): 9780769529301
The purpose of this paper is to suggest guidelines for USN (Ubiquitous Sensor Network) applications based on field studies. A USN detects, stores, processes, and integrates object and environment data via tags and sensors, and then generates situation-aware information and knowledge content, allowing anyone to freely use customized knowledge services anywhere and anytime. The forthcoming ubiquitous society can be realized earlier by building USN infrastructure, and USN technologies create a positive ripple effect on related industries through the added value of cutting-edge IT services and products that improve the quality of citizens' lifestyles. Even though USN technology development is actively in progress, practical studies remain sluggish. For these reasons, the National Information Society Agency (NIA) carried out four field studies, covering u-Marine, u-Architecture, u-Agriculture, and u-Hospital, to examine USN feasibility in 2005.