ISBN: 9789380544199 (print)
Web newspapers provide valuable information for readers, but readers often need only the important, summarized parts of what is available. Each newspaper page, however, contains a lot of unrelated information. To improve the quality of news pages, we developed an application that extracts a single news item from a web page. The information on web pages is semi-structured text, represented in either HTML or XML format. In a text document, the representation of text depends on the size and purpose of the document. We propose another technique for web extraction based on the structural properties of the web page. The page is represented as a tree in the form of the DOM (Document Object Model), which is constructed by parsing the web page. Four patterns have been implemented for selecting text content from a newspaper page, and the selected news is converted into a DOM tree by parsing.
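The DOM-based idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the four selection patterns, so the single rule used here (collect the text of every `<p>` element) is a stand-in assumption, and the names `Node`, `DomBuilder`, `parse_dom`, `extract_paragraphs`, and the `SAMPLE` page are all hypothetical. Only the standard-library `html.parser` module is used.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not become the "current" parent.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """One element of a simplified DOM tree (hypothetical helper type)."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []  # child Node objects and text strings, in document order

    def _raw_text(self):
        # Concatenate all descendant text without altering whitespace.
        return "".join(c if isinstance(c, str) else c._raw_text()
                       for c in self.children)

    def text(self):
        # Descendant text with runs of whitespace collapsed to single spaces.
        return " ".join(self._raw_text().split())

    def find_all(self, tag):
        # Depth-first search for descendant elements with the given tag name.
        for child in self.children:
            if isinstance(child, Node):
                if child.tag == tag:
                    yield child
                yield from child.find_all(tag)


class DomBuilder(HTMLParser):
    """Builds the tree while the parser walks through the page."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def parse_dom(html):
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def extract_paragraphs(root):
    # Stand-in selection pattern (an assumption, not one of the paper's four).
    return [p.text() for p in root.find_all("p")]


# Hypothetical page: a story surrounded by navigation and advertising.
SAMPLE = """<html><body>
<div id="nav"><a href="/">Home</a> <a href="/sports">Sports</a></div>
<div id="story"><h1>Sample headline</h1>
<p>First paragraph of the <b>story</b>.</p>
<p>Second paragraph.</p></div>
<div id="ads">Buy now</div>
</body></html>"""

paragraphs = extract_paragraphs(parse_dom(SAMPLE))
```

On the sample page, `paragraphs` holds only the two story paragraphs; the navigation links and the advertisement are left out because they are not inside `<p>` elements. A real news page would likely need more selective rules, for example restricting the search to a particular subtree.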