ISBN: 9783642218217 (print)
Extracting structured data (e.g. lists, record sets, and tables) from the web has traditionally been approached by considering either the underlying markup structure of a web page or its visual structure. However, empirical results show that considering the HTML structure and the visual cues of a web page independently does not generalize well. We propose a new hybrid method to extract general lists from the web. It employs both general assumptions about the visual rendering of lists and the structural representation of the items they contain. We show that our method significantly outperforms existing methods across a varied web corpus.
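As a rough illustration of what combining the two kinds of cue can look like, the sketch below groups rendered elements that share both a left edge (a visual cue) and a tag path (a structural cue). It is not the method from the paper: the `Box` record, the `find_lists` function, the exact-match grouping rule, and the `min_items` threshold are all hypothetical simplifications chosen for this example, and a real system would obtain layout boxes from a browser engine.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical stand-in for a rendered element (not a real browser API)."""
    tag_path: str  # structural cue, e.g. "html/body/ul/li"
    x: int         # left edge in pixels (visual cue)
    y: int         # top edge in pixels
    text: str


def find_lists(boxes, min_items=2):
    """Return groups of texts whose boxes share a left edge AND a tag path.

    Assumption for this sketch: list items are left-aligned and have
    identical tag paths. Either cue alone would also match unrelated
    elements; requiring both is the "hybrid" part.
    """
    groups = defaultdict(list)
    for box in boxes:
        groups[(box.x, box.tag_path)].append(box)

    lists = []
    for items in groups.values():
        if len(items) >= min_items:
            # Order items top to bottom, as they would be read on the page.
            ordered = sorted(items, key=lambda b: b.y)
            lists.append([b.text for b in ordered])
    return lists


if __name__ == "__main__":
    page = [
        Box("html/body/p", 10, 50, "Intro paragraph"),
        Box("html/body/ul/li", 10, 120, "Second item"),
        Box("html/body/ul/li", 10, 100, "First item"),
        Box("html/body/div/span", 200, 100, "Sidebar ad"),
    ]
    print(find_lists(page))
```

In this toy page the paragraph shares a left edge with the list items but not their tag path, so it is excluded; this is the kind of case where a purely visual rule would over-group.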
In this *** a unified web information model is advanced, which is made up of four domains, i.e. information structure, information semantics, information relation and information ***, based on the unified information model, web information wrapping, clustering, classifying, and construction of the local information view, the global information view and the mapping between them are done in an integrated *** taking the change of information capability as the criterion, an evaluation approach for web information integration is also ***, a multi-agent based web information integration framework is proposed to help users integrate information on the web quickly and effectively.