ISBN: 9780769533049 (print)
This paper proposes a self-organized genetic algorithm for document clustering based on a semantic similarity measure. Traditionally, a document is represented as a string of words, and conceptual similarity is ignored. We take advantage of a thesaurus-based ontology to overcome this problem. To investigate how ontology methods can be used effectively in document clustering, we implement a hybrid strategy that combines the thesaurus-based semantic similarity measure with the vector space model (VSM) measure to provide a more accurate assessment of similarity between documents. Considering the interplay between population diversity and selective pressure, this article puts forward an approach based on dynamic evolution operators. In our experiment, two data sets of 200 and 600 documents are excerpted from the Reuters-21578 corpus for testing. The results show that our genetic algorithm in conjunction with the hybrid semantic strategy (the combination of the thesaurus-based measure and the VSM-based measure) outperforms the same algorithm with the VSM measure alone. Our clustering algorithm also improves precision and recall in comparison with k-means in the same similarity environments.
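The idea of combining a thesaurus-based measure with a VSM measure, as described in the abstract above, can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything here is an assumption: the thesaurus is a hypothetical term-to-concept dictionary, the semantic measure is a simple Jaccard overlap of concept sets, the VSM measure is cosine similarity over raw term frequencies, and the two are blended with a hypothetical weight `alpha`.

```python
import math
from collections import Counter


def cosine_similarity(doc_a, doc_b):
    """VSM measure: cosine of raw term-frequency vectors."""
    tf_a, tf_b = Counter(doc_a), Counter(doc_b)
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def concept_similarity(doc_a, doc_b, thesaurus):
    """Assumed semantic measure: Jaccard overlap of concept sets.

    Each term is mapped to a concept through the thesaurus; terms the
    thesaurus does not know are kept as their own concept.
    """
    concepts_a = {thesaurus.get(t, t) for t in doc_a}
    concepts_b = {thesaurus.get(t, t) for t in doc_b}
    union = concepts_a | concepts_b
    if not union:
        return 0.0
    return len(concepts_a & concepts_b) / len(union)


def hybrid_similarity(doc_a, doc_b, thesaurus, alpha=0.5):
    """Weighted blend of the two measures (alpha is a hypothetical weight)."""
    semantic = concept_similarity(doc_a, doc_b, thesaurus)
    lexical = cosine_similarity(doc_a, doc_b)
    return alpha * semantic + (1 - alpha) * lexical
```

With a toy thesaurus that maps "car" and "automobile" to one concept and "engine" and "motor" to another, the documents `["car", "engine"]` and `["automobile", "motor"]` share no words, so the VSM measure alone scores them as unrelated, while the blended score reflects their conceptual overlap.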
Query expansion in a knowledge-based information retrieval system requires a knowledge base that takes semantic relations between words into account. Since the Apriori algorithm extracts association words without taking user prefe...
Web services are released, combined, and invoked through the web. With the continuous growth of web services, finding a web service that meets user requirements has become a challenging problem. The current web service dis...
Ontologies change, in the sense that they grow constantly in scientific discourse and are revised over time by different people. This refers to the fact that groups of professionals over time (e.g. in a longer term projec...
In this paper, we provide a novel semantic workflow system based on semantic functional service descriptions and a rule file. The workflow engine follows a three-step process. First, it determines, for all the resources in its knowledge base, the functionality they need to progress in the workflow. This step uses a phase-functionality rule file, which binds phases of the workflow to functionalities. In the second step, the functionalities are mapped to REST service calls using RESTdesc functional descriptions. In the third step, the engine executes the generated service calls and pushes each resource it acted on to the next phase in the workflow using a phase-transition rule file. The main advantage of this approach is that each step can be influenced by external information from the Linked Open Data cloud. It exploits the fact that Linked Open Data and RESTful web services and APIs are resource-oriented. Moreover, the workflow rule file makes the system easy to adapt and extend, whether to achieve new functionalities or to comply with changing company policies. Finally, the separation between functional descriptions and service descriptions makes it easy to manage the fast-changing services at hand.
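The three-step process in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions: plain Python dictionaries stand in for the phase-functionality rule file, the RESTdesc functional descriptions, and the phase-transition rule file; the phase names, functionality names, and `example.org` URLs are hypothetical; and the actual HTTP call is replaced by an `execute` callback supplied by the caller.

```python
# Hypothetical stand-in for the phase-functionality rule file.
PHASE_FUNCTIONALITY = {
    "ingested": "extract_metadata",
    "described": "publish",
}

# Hypothetical stand-in for functional service descriptions:
# functionality -> (HTTP method, URL template).
SERVICE_DESCRIPTIONS = {
    "extract_metadata": ("POST", "http://example.org/metadata/{id}"),
    "publish": ("PUT", "http://example.org/published/{id}"),
}

# Hypothetical stand-in for the phase-transition rule file.
PHASE_TRANSITION = {
    "ingested": "described",
    "described": "done",
}


def plan_calls(resources):
    """Steps 1 and 2: find the needed functionality, map it to a service call."""
    calls = []
    for resource in resources:
        functionality = PHASE_FUNCTIONALITY.get(resource["phase"])
        if functionality is None:
            continue  # no rule for this phase, nothing to do
        method, template = SERVICE_DESCRIPTIONS[functionality]
        calls.append((resource, method, template.format(id=resource["id"])))
    return calls


def run_once(resources, execute):
    """Step 3: execute each planned call, then advance the resource's phase."""
    for resource, method, url in plan_calls(resources):
        execute(method, url)
        resource["phase"] = PHASE_TRANSITION[resource["phase"]]
```

Calling `run_once` repeatedly moves a resource through the phases one at a time; once a resource reaches a phase with no rule (here `"done"`), no further calls are planned for it. Because the rules live in data rather than in code, changing the workflow in this sketch means editing the three dictionaries only.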
For almost 14 years, the Language Engineering Group has worked on a wide variety of Natural Language Processing (NLP) problems, and was one of the earliest groups to create and operate onomasiological dictionaries. During that time we focused on improving the dictionary search engine, but recently our aim has been a development methodology for creating specialized onomasiological dictionaries in a semi-automatic way. Automating the creation of onomasiological dictionaries necessarily implies automating the processes used to populate the dictionaries' knowledge base. Due to the nature of these dictionaries, the definitions that must be included in the knowledge base are both normative and colloquial. In this paper we present a proposal for semi-automatically populating the knowledge base of these dictionaries.
In this paper we present i) an approach for clustering authors according to their citation distributions and ii) an ontology, the Bibliometric Data Ontology, for supporting the formal representation of such clusters. This method allows the formulation of queries that take into consideration the citation behaviour of an author, and it predicts future citation behaviours with a good level of accuracy. We evaluate our approach with respect to alternative solutions and discuss the predictive abilities of the identified clusters.
ISBN: 159593443X (print)
This paper describes the development and use of MatML, the Materials Markup Language. MatML is an emerging XML standard intended primarily for the exchange of materials property information. It provides a medium of communication for users in materials science and related fields such as manufacturing and aerospace. It sets the stage for the development of Semantic Web standards to enhance knowledge discovery in materials science and related areas. MatML has been used in applications such as the development of materials digital libraries and the analysis of contaminant emissions data. Data mining applications of MatML include statistical process control and failure analysis. Challenges in promoting MatML involve satisfying a broad range of constituencies in the international engineering and materials science community and also adhering to other related standards in web data exchange. These issues are being addressed through the development of a good ontology, the automation of format conversions, and possible schema extensions. MatML aims to be a lingua franca for data exchange in materials science and its broader horizons. Copyright 2006 ACM.
The uptake of Linked Data (LD) has promoted the proliferation of datasets and their associated ontologies, which bring their semantics to the data being published. These ontologies should be evaluated at different stages, ...
ISBN: 9780769534930 (print)
In the World Wide Web context, the availability of software components increases the possibility of applying a reuse approach in software development. Thus, component retrieval is a key problem, both for the software industry and for end users, and even more so for the Open Source community, which increasingly uses component-based software engineering approaches. The OMG has defined a unified framework for describing reusable items (so-called "assets"). Even though this framework supports the description of a large variety of components, it reduces retrieval to keyword search, without considering the user's profile or the user's need within the current task. We believe that the retrieval difficulty is related to the crucial problem of interaction between component providers and users (i.e. the consumers). This interaction can be supported and even automated by increasing the expressiveness of the language used for encoding component properties and formulating queries, thereby enhancing the quality of retrieval. In this research, we propose to use common ontologies for representing users' profiles, users' needs, and semantic knowledge of the components. These ontologies also support reasoning on components and matching of provided and required components. The approach makes use of business domain ontologies and an ontology of the information system engineering domain. The paper describes how these ontologies can be used both at design time, for asset descriptions and user profile definition, and at reuse time, for matching user requirements, user profiles, and asset descriptors.