ISBN: 9781920682668 (print)
There are many external resources and heterogeneous data on the internet that an organization or user may need in order to improve the decision-making process. It is therefore critical that this information be complete, precise, and available on time. Most web sources provide data in semi-structured form. Combining semi-structured data from different sources often fails because of syntactic and semantic differences. Accessing, retrieving, and using information from different web data sources therefore requires the data to be integrated. Integrating web data is a complex process because of its heterogeneous nature, and thus requires some kind of web data integration system. Many types of heterogeneity among web sources make data integration difficult (e.g., different data models, and different syntax and semantics at the schema and data instance levels). Semantic schema heterogeneity, which refers to the misinterpretation of data at the schema level, is one major obstacle that needs to be overcome in the web data integration process. It has been identified as one of the most important problems when dealing with interoperability and cooperation among multiple data sources on the internet. In this paper, we present a system architecture for web data integration that focuses on resolving semantic schema heterogeneity between web data sources. We propose an ontology-based approach for reconciling semantic conflicts between web data at the schema level.
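As a rough illustration of schema-level reconciliation through a shared ontology, the sketch below renames source-specific attribute names to common concepts before merging records. It is not the architecture proposed in the paper: the `ONTOLOGY` dictionary, the function names, and the sample records are all hypothetical, and a real ontology would carry far richer relationships than a flat name-to-concept table.

```python
# Hypothetical toy "ontology": maps source-specific attribute names
# (lowercased) to a shared concept name. Invented for illustration only.
ONTOLOGY = {
    "price": "Price",
    "cost": "Price",
    "title": "Title",
    "name": "Title",
}


def to_shared_schema(record, ontology=ONTOLOGY):
    """Rename a record's attributes to shared concepts; unknown keys are kept as-is."""
    return {ontology.get(key.lower(), key): value for key, value in record.items()}


def integrate(sources):
    """Merge records from several sources after mapping them to the shared schema."""
    merged = []
    for records in sources:
        merged.extend(to_shared_schema(record) for record in records)
    return merged


# Two hypothetical sources that describe the same concepts with different names.
source_a = [{"title": "Web Data", "price": 30}]
source_b = [{"Name": "Ontologies", "Cost": 25}]

integrated = integrate([source_a, source_b])
```

After integration, records from both sources share the attribute names `Title` and `Price`, so they can be queried uniformly.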
ISBN: 142440133X (print)
Many studies concentrate on developing attractive web applications, but very few discuss the fundamental problems of modeling, integrating, and retrieving web hypermedia data from heterogeneous data sources based on its content and semantics. The main focus of this paper is the modeling facilities in the XHMG system for content-based representation, integration, and retrieval of heterogeneous web data. The paper shows the basic XHMG structural instruments for representing the web and web page content. The most important application of this approach is handling and integrating hypermedia information on the web based on its content and meaning. The research in this paper could have a large impact on the technologies used in information sources for e-business, e-advertising, e-commerce, e-government, e-learning, portals, digital libraries, web search engines, and online catalogs.
How to integrate heterogeneous semi-structured web records into a relational database is an important and challenging research topic. An improved conditional random field model is presented that combines learning from labeled samples with learning from unlabeled database records, in order to reduce the dependence on tediously hand-labeled training data. The proposed model is used to solve the problem of schema matching between the data source schema and the database schema. Experimental results on a large number of web pages from diverse domains show the effectiveness of the approach.
ISBN: 3540278281 (print)
The World Wide Web represents a universe of knowledge and information. Unfortunately, it is not straightforward to query and access the desired information. Languages and tools for accessing, extracting, transforming, and syndicating the desired information are required. The web should be useful not merely for human consumption but also for machine communication. Therefore, powerful and user-friendly tools are needed, based on expressive languages, for extracting and integrating information from various web sources or, more generally, from various heterogeneous sources. The tutorial gives an introduction to the web technologies required in this context and presents various approaches and techniques used in information extraction and integration. Moreover, sample applications in various domains motivate the topics discussed, and the provision of data instances for the Semantic Web is illustrated.
We present the Object-Web Mediator for querying integrated web data sources. It is composed of a retrieval component, based on an intermediate object view mechanism and search views, and an XML engine. Search views map the source capabilities to attributes defined at object classes, and parsers process retrieved documents and cache them in XML format. The XML engine queries cached documents, extracts data, and returns the extracted data for evaluation. The originality of this approach consists of a generic view mechanism for accessing data sources with limited data access and complex capabilities, and an XML engine that supports data extraction and reorganization. This approach has been developed and demonstrated as part of the multi-database system supporting queries via uniform Object Protocol Model interfaces against public web data sources of interest to biologists. (C) 2002 Elsevier Science B.V. All rights reserved.
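The flow described above (a search view maps object attributes to source fields, a parser caches retrieved documents as XML, and an XML engine queries the cache) can be sketched roughly as follows. This is a minimal sketch using Python's standard `xml.etree.ElementTree`, not the Object-Web Mediator's actual design: `SEARCH_VIEW`, `CACHE`, the function names, and the sample gene records are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical search view: object-class attribute -> field name used by the source.
SEARCH_VIEW = {"organism_name": "organism", "gene_symbol": "symbol"}

# Hypothetical in-memory cache of parsed documents, keyed by source id.
CACHE = {}


def parse_and_cache(source_id, raw_rows):
    """Parser step: convert retrieved rows to an XML tree and cache it."""
    root = ET.Element("records")
    for row in raw_rows:
        record = ET.SubElement(root, "record")
        for field, value in row.items():
            ET.SubElement(record, field).text = str(value)
    CACHE[source_id] = root
    return root


def query_cache(source_id, attribute, value):
    """XML engine step: query the cached document by an object attribute."""
    field = SEARCH_VIEW[attribute]  # translate object attribute to source field
    matches = []
    for record in CACHE[source_id].findall("record"):
        if record.findtext(field) == value:
            matches.append({child.tag: child.text for child in record})
    return matches


# Stand-in for documents retrieved from a public web source.
parse_and_cache("genes", [
    {"symbol": "BRCA1", "organism": "human"},
    {"symbol": "Trp53", "organism": "mouse"},
])
hits = query_cache("genes", "organism_name", "human")
```

Keeping the attribute-to-field mapping in one table means the query side never needs to know how a particular source names its fields.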