The analysis of several algorithms and data structures can be framed as a peeling process on a random hypergraph: vertices with degree less than k are removed until there are no vertices of degree less than k left. Th...
详细信息
ISBN:
(纸本)9781450328210
The analysis of several algorithms and data structures can be framed as a peeling process on a random hypergraph: vertices with degree less than k are removed until there are no vertices of degree less than k left. The remaining hypergraph is known as the k-core. In this paper, we analyze parallel peeling processes, where in each round, all vertices of degree less than k are removed. It is known that, below a specific edge density threshold, the k-core is empty with high probability. We show that, with high probability, below this threshold, only 1/log ((k-1)(r-1)) log logn + O(1) rounds of peeling are needed to obtain the empty k-core for r-uniform hypergraphs. Interestingly, we show that above this threshold, Omega(logn) rounds of peeling are required to find the non-empty k-core. Since most algorithms and data structures aim to peel to an empty kcore, this asymmetry appears fortunate. We verify the theoretical results both with simulation and with a parallel implementation using graphics processing units (GPUs). Our implementation provides insights into how to structure parallel peeling algorithms for efficiency in practice.
Finding dense bipartite subgraphs and detecting the relations among them is an important problem for affiliation networks that arise in a range of domains, such as social network analysis, word-document clustering, th...
详细信息
ISBN:
(纸本)9781450355810
Finding dense bipartite subgraphs and detecting the relations among them is an important problem for affiliation networks that arise in a range of domains, such as social network analysis, word-document clustering, the science of science, internet advertising, and bioinformatics. However, most dense subgraph discovery algorithms are designed for classic, unipartite graphs. Subsequently, studies on affiliation networks are conducted on the co-occurrence graphs (e.g., co-author and co-purchase) that project the bipartite structure to a unipartite structure by connecting two entities if they share an affiliation. Despite their convenience, co-occurrence networks come at a cost of loss of information and an explosion in graph sizes, which limit the quality and the efficiency of solutions. We study the dense subgraph discovery problem on bipartite graphs. We define a framework of bipartite subgraphs based on the butterfly motif (2,2-biclique) to model the dense regions in a hierarchical structure. We introduce efficient peeling algorithms to find the dense subgraphs and build relations among them. We can identify denser structures compared to the state-of-the-art algorithms on co-occurrence graphs in real-world data. Our analyses on an author-paper network and a user-product network yield interesting subgraphs and hierarchical relations such as the groups of collaborators in the same institution and spammers that give fake ratings.
We consider variations of set reconciliation problems where two parties, Alice and Bob, each hold a set of points in a metric space, and the goal is for Bob to conclude with a set of points that is close to Alice'...
详细信息
ISBN:
(纸本)9781450362276
We consider variations of set reconciliation problems where two parties, Alice and Bob, each hold a set of points in a metric space, and the goal is for Bob to conclude with a set of points that is close to Alice's set of points in a well-defined way. This setting has been referred to as robust set reconciliation. In one variation, the goal is for Bob to end with a set of points that is close to Alice's in earth mover's distance, and in another the goal is for Bob to have a point that is close to each of Alice's. The first problem has been studied before;while previous results achieved an O(d) approximation, where d is the dimension of the space, we achieve an O(log n) approximation, where n is the number of points. The second problem appears new, and here we find schemes that, under certain conditions, use sublinear communication. Our primary novelty is utilizing Invertible Bloom Lookup Tables in combination with locality sensitive hashing. This combination allows us to cope with the geometric setting in a communication-efficient manner.
暂无评论