The World Wide Web is the largest database on earth: a huge amount of data and information primarily intended for human users. Unfortunately, data on the web requires intelligent interpretation and cannot be easily used by programs. Advanced data extraction and information integration techniques are required to process it automatically. Lixto technology addresses these issues and enables developers to interactively turn web pages into mobile services.
ISBN (print): 3540258612
In this paper, a methodology and a framework for personalized views on data available on the World Wide Web are proposed. We describe its two main ingredients, web data extraction and ontology-based personalized content presentation, and exemplify their usage with a sample application for personalized publication browsing.
ISBN (print): 3540278281
The World Wide Web represents a universe of knowledge and information. Unfortunately, it is not straightforward to query and access the desired information. Languages and tools for accessing, extracting, transforming, and syndicating this information are required. The web should be useful not merely for human consumption but also for machine communication. Therefore, powerful and user-friendly tools based on expressive languages are needed for extracting and integrating information from different web sources or, more generally, from heterogeneous sources. The tutorial gives an introduction to the web technologies required in this context and presents various approaches and techniques used in information extraction and integration. Moreover, sample applications in various domains motivate the discussed topics, and the provision of data instances for the Semantic Web is illustrated.
ISBN (print): 3540297545
This application demonstrates how to provide personalized, syndicated views on distributed web data using Semantic Web technologies. The application comprises four steps: the information gathering step, in which information from distributed, heterogeneous sources is extracted and enriched with machine-readable semantics; the operation step, which keeps the extractions timely and up to date; the reasoning step, in which rules reason over the created semantic descriptions together with additional knowledge bases such as ontologies and user profile information; and the user interface creation step, in which the RDF descriptions resulting from the reasoning step are interpreted and translated into an appropriate, personalized user interface. We have developed this application to solve the following real-world problem: we provide personalized, syndicated views on the publications of a large European research project with more than twenty geographically distributed partners and embed this information with contextual information on the project, its working groups, the authors, related publications, and so on.
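As a rough illustration of the reasoning and user-interface creation steps described in this abstract, the following sketch (not the authors' implementation) assumes the extracted publications and the user profile are already available as RDF files named publications.rdf and user_profile.rdf and uses a hypothetical project vocabulary for topics of interest; a SPARQL query selects matching publications, and a simple HTML list stands in for the personalized interface.

```python
# Minimal sketch: RDF-driven personalized view over extracted publications.
# File names, the ex: vocabulary, and the interestedIn property are assumptions.
from rdflib import Graph, Namespace

DC = Namespace("http://purl.org/dc/elements/1.1/")
EX = Namespace("http://example.org/project#")      # hypothetical project vocabulary

publications = Graph().parse("publications.rdf")   # extracted, semantically enriched data
profile = Graph().parse("user_profile.rdf")        # user's topics of interest, working groups, ...

# Collect the topics this user cares about.
topics = {str(t) for t in profile.objects(None, EX.interestedIn)}

# Reasoning step reduced to a query: select publications with their topic tags.
query = """
    SELECT ?title ?author ?topic WHERE {
        ?pub dc:title ?title ; dc:creator ?author ; ex:topic ?topic .
    }
"""
rows = publications.query(query, initNs={"dc": DC, "ex": EX})

# User interface creation step: translate the selected descriptions into HTML.
items = [f"<li>{row.title} ({row.author})</li>" for row in rows if str(row.topic) in topics]
print("<ul>\n" + "\n".join(items) + "\n</ul>")
```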
ISBN (print): 076952415X
In order to let software programs gain full benefit from semi-structured web sources, wrapper programs must be built to provide a "machine-readable" view over them. A significant problem of this approach is that, since web sources are autonomous, they may experience changes that invalidate the current wrapper. In this paper, we address this problem by introducing novel heuristics and algorithms for automatically maintaining wrappers. In our approach, the system collects some query results during normal wrapper operation and, when the source changes, uses them as input to generate a set of labeled examples for the source, which can then be used to induce a new wrapper. Our experiments show that the proposed techniques achieve high accuracy on a wide range of real-world web data extraction problems.
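A minimal sketch of the general idea, under assumed names and data shapes rather than the paper's actual system: values remembered from earlier, successful extractions are located again in the changed page, and the DOM nodes that still contain them become labeled examples from which a new wrapper could be induced.

```python
# Sketch only: relabel a changed page using results stored while the old wrapper worked.
from lxml import html

def label_examples(changed_page_html, stored_results):
    """stored_results: list of dicts such as {"title": "...", "price": "..."}
    collected during normal operation of the previous wrapper."""
    tree = html.fromstring(changed_page_html)
    examples = []
    for record in stored_results:
        labeled = {}
        for field, value in record.items():
            # Find elements whose text still matches the remembered value
            # (simplification: assumes the value contains no double quotes).
            nodes = tree.xpath(f'//*[normalize-space(text())="{value}"]')
            if nodes:
                labeled[field] = tree.getroottree().getpath(nodes[0])
        if labeled:
            examples.append(labeled)   # field -> XPath of an example occurrence
    return examples

# The labeled examples would then be handed to a wrapper induction routine
# (not shown) that generalizes these XPaths into a new extraction program.
```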
Data mining to extract information from web pages can help provide value-added services. The MDR (mining data records) system exploits web page structure and uses a string-matching algorithm to mine contiguous and noncontiguous data records.
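The comparison at the heart of such an approach can be sketched as follows; the tag-string encoding, the similarity threshold, and the use of difflib's similarity ratio in place of MDR's string edit distance are assumptions made for illustration, not the published algorithm.

```python
# Sketch: detect candidate data regions by comparing adjacent sibling subtrees.
from difflib import SequenceMatcher
from lxml import html

def tag_string(node):
    """Serialize a subtree as the sequence of its element tags."""
    return " ".join(el.tag for el in node.iter() if isinstance(el.tag, str))

def candidate_regions(parent, threshold=0.8):
    children = [c for c in parent if isinstance(c.tag, str)]
    regions, run = [], [0] if children else []
    for i in range(1, len(children)):
        sim = SequenceMatcher(None, tag_string(children[i - 1]),
                              tag_string(children[i])).ratio()
        if sim >= threshold:
            run.append(i)              # sibling looks like another record of the same kind
        else:
            if len(run) > 1:
                regions.append([children[j] for j in run])
            run = [i]
    if len(run) > 1:
        regions.append([children[j] for j in run])
    return regions

page = html.fromstring(open("listing.html").read())   # hypothetical input page
for region in candidate_regions(page.body):
    print(f"candidate data region with {len(region)} similar records")
```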