The Open Knowledge Extraction (OKE) challenge, now in its second edition, aims to provide a reference framework for research on Knowledge Extraction from text for the Semantic Web by re-defining a number of ta...
Empirical studies have shown that individuals' behaviors are largely influenced by social conformity, including punishment. However, a coevolutionary theoretical framework that takes into account effects of confor...
Research on social contagion dynamics has not yet included a theoretical analysis of the ubiquitous local trend imitation (LTI) characteristic. We propose a social contagion model with a tent-like adoption probabilit...
Knowledge about the general graph structure of the World Wide Web is important for understanding the social mechanisms that
govern its growth, for designing ranking methods, for devising better crawling algorithms, and for creating accurate models
of its structure. In this paper, we analyze a large web graph. The graph was extracted from a large publicly accessible web
crawl that was gathered by the Common Crawl Foundation in 2012. The graph covers over 3.5 billion web pages and 128.7
billion hyperlinks. We analyze and compare, among other features, degree distributions, connectivity, average distances, and
the structure of weakly/strongly connected components. We conduct our analysis on three different levels of aggregation: page,
host, and pay-level domain (PLD) (one “dot level” above public suffixes).
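
To make the three aggregation levels concrete, here is a minimal sketch of collapsing a page-level edge list into host-level and PLD-level edge lists. It is not the authors' pipeline: `PUBLIC_SUFFIXES` is a tiny hypothetical stand-in for the real Public Suffix List, the function names are illustrative, and dropping self-loops after aggregation is an assumption made here for simplicity.

```python
from urllib.parse import urlsplit

# Hypothetical, tiny stand-in for the Public Suffix List (illustration only).
PUBLIC_SUFFIXES = {"com", "org", "co.uk"}

def host_of(url):
    """Return the host part of a URL, or an empty string if there is none."""
    return urlsplit(url).hostname or ""

def pld_of(host):
    """Return the pay-level domain: the public suffix plus one more label."""
    labels = host.split(".")
    # Scanning from the left finds the longest matching suffix first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[max(i - 1, 0):])
    return host  # no known suffix: fall back to the host itself

def aggregate(edges, key):
    """Map both endpoints of every edge through `key` and deduplicate."""
    out = set()
    for src, dst in edges:
        a, b = key(src), key(dst)
        if a != b:  # assumption: self-loops created by aggregation are dropped
            out.add((a, b))
    return out

page_edges = [
    ("http://www.example.com/a", "http://blog.example.com/b"),
    ("http://www.example.com/a", "http://news.example.co.uk/x"),
]
host_edges = aggregate(page_edges, host_of)
pld_edges = aggregate(page_edges, lambda u: pld_of(host_of(u)))
```

In this toy input the two hosts under `example.com` merge into one PLD node, so the host graph has two edges while the PLD graph has one.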
Our analysis shows that, as evidenced by previous research (Serrano et al., 2007), some of the features previously observed
by Broder et al. (2000) are very dependent on artifacts of the crawling process, whereas others appear to be more structural. We
confirm the existence of a giant strongly connected component; we however find, as observed by other researchers (Donato
et al., 2005; Boldi et al., 2002; Baeza-Yates and Poblete, 2003), very different proportions of nodes that can reach or that can be
reached from the giant component, suggesting that the “bow-tie structure” as described by Broder et al. is strongly dependent
on the crawling process, and to the best of our current knowledge is not a structural property of the web.
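
The quantities discussed above (the giant strongly connected component, the nodes that can reach it, and the nodes reachable from it) can be sketched for a small in-memory graph as follows. This is an illustrative sketch only, using Kosaraju's algorithm as a stand-in; the abstract does not say which algorithm or tooling was used, and a graph with billions of nodes would need very different machinery. All function names are hypothetical.

```python
from collections import defaultdict

def reachable(adj, start):
    """Return all nodes reachable from the set `start` along `adj`."""
    seen = set(start)
    stack = list(start)
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def largest_scc(nodes, fwd, rev):
    """Kosaraju's algorithm (iterative); returns the largest SCC as a set."""
    order, seen = [], set()
    for s in nodes:  # pass 1: record DFS finish order on the forward graph
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(fwd[s]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, iter(fwd[v])))
                    break
            else:  # all successors of u explored
                order.append(u)
                stack.pop()
    assigned, best = set(), set()
    for s in reversed(order):  # pass 2: sweep the reversed graph
        if s in assigned:
            continue
        comp, stack = {s}, [s]
        assigned.add(s)
        while stack:
            u = stack.pop()
            for v in rev[u]:
                if v not in assigned:
                    assigned.add(v)
                    comp.add(v)
                    stack.append(v)
        if len(comp) > len(best):
            best = comp
    return best

def bow_tie(edges):
    """Split a directed graph into (core, IN, OUT) around its largest SCC."""
    fwd, rev, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)
        nodes.update((u, v))
    core = largest_scc(nodes, fwd, rev)
    out_part = reachable(fwd, core) - core  # reached from the core
    in_part = reachable(rev, core) - core   # can reach the core
    return core, in_part, out_part

core, in_part, out_part = bow_tie(
    [("a", "b"), ("b", "c"), ("c", "a"), ("x", "a"), ("c", "y")]
)
```

Comparing `len(in_part)` and `len(out_part)` against `len(core)` gives the kind of proportions that differ from crawl to crawl.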
More importantly, statistical testing and visual inspection of size-rank plots show that the distributions of indegree, outdegree
and sizes of strongly connected components of the page and host graph are not power laws, contrary to what was previously
reported for much smaller crawls, although they might be heavy tailed. If we aggregate at pay-level domain, however, a power
law emerges. We also provide for the first time accurate measurement of
This year we did not use ClueWeb12 or ClueWeb12-B. Instead, we address this issue using data we crawled from ***, and we use an external structured resource, the Google Place API[1], to find all of the possible candidate places in ...