ISBN: 9780769548968 (print)
This paper introduces the concept of information entropy for judging web-page types. It is combined with the method put forward by Roadrunner, in which topic pages are first pre-purified and a proportional relation is then used to judge the type of each page. On a set of typical pages taken from the home pages of large websites, the average precision reaches 96.7%, which lays a foundation for further information extraction work.
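The abstract does not say which page features the entropy is computed over, so the following is only a minimal sketch of the general idea, assuming a page has already been reduced to a list of tokens (for example, tag names or block labels). The function name `shannon_entropy` and the token representation are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Return the Shannon entropy (in bits) of the empirical token distribution.

    `tokens` is a hypothetical representation of a page, e.g. a list of
    tag names; the paper's actual feature choice is not given in the abstract.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty page carries no information under this measure
    # H = -sum(p_i * log2(p_i)) over the observed token frequencies
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Illustrative inputs only: a page dominated by one token type
# versus a page with an even mix of two token types.
uniform_page = ["a", "a", "a", "a"]
mixed_page = ["a", "a", "p", "p"]
print(shannon_entropy(uniform_page), shannon_entropy(mixed_page))
```

A page whose tokens are concentrated in one type yields a low entropy, while an even mix yields a higher one; how such a value would be thresholded or combined with the proportional relation to decide a page type is left open here, since the abstract does not specify it.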