ISBN: 9781538605790 (print)
Understanding the characteristics of webrobot traffic is essential for building mechanisms that mitigate its impact on web systems. This paper presents an updated characterization of the navigational and session patterns of webrobot traffic on three web servers, located in the United States, Europe, and the Middle East, using 30 different features. The results indicate that some features can be fitted by the same heavy-tailed model across all three servers, whereas the best-fitting model for other features depends on the server. Because webrobots perform different tasks and website administrators set different security policies, some features of webrobot traffic cannot be modeled universally.
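The abstract does not state how sessions were delimited or how the models were fitted. As a hedged illustration only, the sketch below assumes sessions are split on a hypothetical 30-minute inactivity gap per client, then fits one candidate heavy-tailed model, a Pareto distribution above a chosen cutoff `x_min`, by maximum likelihood. The names `sessionize`, `pareto_mle`, and `SESSION_TIMEOUT_S` are illustrative and are not the authors' code.

```python
import math
from collections import defaultdict

# Hypothetical inactivity threshold (seconds); the abstract does not state
# how sessions were delimited.
SESSION_TIMEOUT_S = 30 * 60


def sessionize(requests, timeout=SESSION_TIMEOUT_S):
    """Split (client_id, timestamp_seconds) records into per-client sessions.

    A new session starts whenever the gap between consecutive requests from
    the same client exceeds `timeout`. Each session is a sorted list of
    timestamps.
    """
    by_client = defaultdict(list)
    for client, ts in requests:
        by_client[client].append(ts)

    sessions = []
    for times in by_client.values():
        times.sort()
        current = [times[0]]
        for prev, ts in zip(times, times[1:]):
            if ts - prev > timeout:
                sessions.append(current)  # idle gap too long: close session
                current = []
            current.append(ts)
        sessions.append(current)
    return sessions


def pareto_mle(samples, x_min):
    """Maximum-likelihood estimate of the Pareto tail index alpha.

    Uses only samples >= x_min: alpha = n / sum(ln(x_i / x_min)).
    """
    tail = [x for x in samples if x >= x_min]
    if not tail:
        raise ValueError("no samples at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail samples equal x_min; alpha is undefined")
    return len(tail) / log_sum


if __name__ == "__main__":
    log = [("bot-a", 0), ("bot-a", 10), ("bot-a", 5000), ("bot-b", 100)]
    sessions = sessionize(log)
    print([len(s) for s in sessions])  # prints [2, 1, 1]
```

Determining which model fits best on each server, as the abstract describes, would also require fitting other candidate families (for example, lognormal) and comparing goodness of fit, which this sketch omits.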
Many studies on the detection and classification of webrobots have focused mostly on text crawlers, and their empirical experiments used relatively small datasets collected at universities. In this paper, we analyzed more than one billion requests to *** over a 24-hour period. The web logs were anonymized to eliminate potential privacy concerns while preserving essential characteristics (e.g., request frequency and queries). We developed a set of effective characterization metrics, based on workload characteristics and resource types, for detecting and classifying various webrobots, including text crawlers, link checkers, and icon crawlers. As expected, webrobot behavior clearly differed from that of typical interactive users, and different types of webrobots also exhibited different characteristics. However, a comparison of webrobots of the same type, text crawlers in particular, also revealed differing characteristics, thereby enabling characterization with a reasonably high level of confidence. We divided the feature metrics into five groups; the effectiveness of each group for classification is shown in a polar diagram, in decreasing order of effectiveness in the clockwise direction. The findings can be used to classify unknown webrobots by their likely identity, and organizations can then develop appropriate measures to deal with them. Our analysis is based on recent web log data collected at one of the best-known sites offering a truly global service. Crown Copyright © 2009 Published by Elsevier Ltd. All rights reserved.
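The abstract names two concrete steps: anonymizing logs while preserving per-client frequency, and computing feature metrics from workload characteristics and resource types. The following is a minimal sketch under stated assumptions, not the authors' method. Client addresses are replaced with keyed HMAC-SHA256 pseudonyms (a common technique, not necessarily the one used in the paper). Resource types are inferred from a hypothetical file-extension table. The features (request count, share of `HEAD` requests, and share of each resource type) are illustrative stand-ins, since the abstract does not list the actual metrics. The `HEAD` ratio is included on the assumption that link checkers often issue `HEAD` rather than `GET` requests.

```python
import hashlib
import hmac
from collections import Counter, defaultdict
from urllib.parse import urlsplit

# Hypothetical extension-to-type table; the paper's resource categories
# are not given in the abstract. Extensionless paths count as text pages.
RESOURCE_TYPES = {
    "": "text", ".html": "text", ".htm": "text", ".txt": "text",
    ".png": "image", ".jpg": "image", ".jpeg": "image", ".gif": "image",
    ".ico": "icon",
}
TYPE_NAMES = ("text", "image", "icon", "other")


def pseudonymize(client_ip, key):
    """Return a stable pseudonym for an identifier using keyed HMAC-SHA256.

    The same (key, client_ip) always maps to the same pseudonym, so
    per-client request frequencies survive anonymization.
    """
    return hmac.new(key, client_ip.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def resource_type(url):
    """Classify a request URL or path by its (lowercased) file extension."""
    name = urlsplit(url).path.rsplit("/", 1)[-1]
    dot = name.rfind(".")
    ext = name[dot:].lower() if dot != -1 else ""
    return RESOURCE_TYPES.get(ext, "other")


def client_features(log, key):
    """Compute per-pseudonym features from (client_ip, method, url) records."""
    totals = Counter()
    head_counts = Counter()
    type_counts = defaultdict(Counter)
    for ip, method, url in log:
        pid = pseudonymize(ip, key)
        totals[pid] += 1
        if method.upper() == "HEAD":
            head_counts[pid] += 1
        type_counts[pid][resource_type(url)] += 1

    features = {}
    for pid, n in totals.items():
        row = {"requests": n, "head_ratio": head_counts[pid] / n}
        for t in TYPE_NAMES:
            row[f"{t}_ratio"] = type_counts[pid][t] / n
        features[pid] = row
    return features
```

Keeping `key` secret and fixed for one log period keeps pseudonyms consistent within that period, so per-client counts are preserved without storing raw addresses.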