Nowadays, a leading instance of big data is Web data, which has led to the notion of so-called big Web data. Extending to a large number of critical applications (e.g., Web advertisement), these data exhibit characteristics that clearly adhere to the well-known 3V properties (i.e., volume, velocity, variety). The Resource Description Framework (RDF) is a significant formalism and language for the so-called Semantic Web, since a very wide family of Web entities can be naturally modeled in a graph-shaped manner. In this context, RDF graphs play a first-class role, because they are widely used in modern Web applications and systems, including the emerging context of social networks. When RDF graphs are defined on top of big (Web) data, they lead to so-called large-scale RDF graphs, which will reasonably populate the next-generation Semantic Web. To process this kind of big data, MapReduce, a computational framework specifically tailored to big data processing, has emerged in recent years as the reference paradigm for this critical setting. In line with this trend, in this paper we present an approach for efficiently implementing traversals of large-scale RDF graphs over MapReduce, based on the Breadth-First Search (BFS) strategy for visiting (RDF) graphs, which are decomposed and processed according to the MapReduce framework. We demonstrate how this implementation speeds up the analysis of RDF graphs with respect to competitor approaches. Experimental results clearly support our contributions.
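To make the idea of a BFS traversal decomposed into MapReduce rounds concrete, the following is a minimal Python sketch that simulates one round of the computation in memory: the map function re-emits each node's state and proposes tentative distances to its neighbors, and the reduce function keeps the minimum distance seen. The function names, the (distance, neighbors) record layout, and the toy graph are illustrative assumptions, not the paper's actual implementation.

from collections import defaultdict

INF = float("inf")

def bfs_map(node, record):
    # record = (distance, neighbors); re-emit the node's own state and,
    # if the node has already been reached, propose distance + 1 to neighbors.
    distance, neighbors = record
    yield node, ("state", distance, neighbors)
    if distance != INF:
        for nbr in neighbors:
            yield nbr, ("dist", distance + 1)

def bfs_reduce(node, values):
    # Keep the adjacency list and the minimum distance proposed for the node.
    best, neighbors = INF, []
    for v in values:
        if v[0] == "state":
            neighbors = v[2]
        best = min(best, v[1])
    return node, (best, neighbors)

def bfs_round(state):
    # Simulate the shuffle phase of one MapReduce round in memory.
    shuffled = defaultdict(list)
    for node, record in state.items():
        for key, value in bfs_map(node, record):
            shuffled[key].append(value)
    return dict(bfs_reduce(k, vs) for k, vs in shuffled.items())

# Toy RDF-like graph: subject -> objects it links to (predicates dropped),
# with "alice" as the traversal source.
graph = {
    "alice": (0, ["bob"]),
    "bob": (INF, ["carol"]),
    "carol": (INF, []),
}
for _ in range(3):  # one round per BFS level, until distances stabilize
    graph = bfs_round(graph)
print(graph)  # {'alice': (0, ['bob']), 'bob': (1, ['carol']), 'carol': (2, [])}

In an actual deployment each round would be a separate MapReduce job over the partitioned edge list rather than an in-memory loop; the sketch only shows how the traversal decomposes into map and reduce steps.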
ISBN (print): 9781450335362
An effective technique for solving optimization problems over massive data sets is to partition the data into smaller pieces, solve the problem on each piece and compute a representative solution from it, and finally obtain a solution inside the union of the representative solutions for all pieces. This technique can be captured via the concept of composable core-sets, and has recently been applied to solve diversity maximization problems as well as several clustering problems [7, 15, 8]. However, for coverage and submodular maximization problems, impossibility bounds are known for this technique [15]. In this paper, we focus on the efficient construction of a randomized variant of composable core-sets, where the above idea is applied to a random clustering of the data. We employ this technique for the coverage, monotone, and non-monotone submodular maximization problems. Our results significantly improve upon the hardness results for non-randomized core-sets, and imply improved results for submodular maximization in distributed and streaming settings. The effectiveness of this technique has been confirmed empirically for several machine learning applications [22], and our proof provides a theoretical foundation for this idea. In summary, we show that a simple greedy algorithm yields a 1/3-approximate randomized composable core-set for submodular maximization under a cardinality constraint. Our result also extends to non-monotone submodular functions, and leads to the first 2-round MapReduce-based constant-factor approximation algorithm with O(n) total communication complexity for either monotone or non-monotone functions. Finally, using an improved analysis technique and a new algorithm, PseudoGreedy, we present an improved 0.545-approximation algorithm for monotone submodular maximization, which is in turn the first MapReduce-based algorithm beating factor 1/2 in a constant number of rounds.
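As a rough illustration of the randomized composable core-set construction described above, the following Python sketch instantiates it for a toy coverage objective under a cardinality constraint k: the ground set is randomly partitioned, plain greedy is run on each part to produce a core-set, and greedy is run again on the union of the core-sets. The helper names, the coverage objective, and the two-phase driver are illustrative assumptions; in particular, this is the simple greedy variant, not the paper's PseudoGreedy algorithm.

import random

def coverage(selected, sets):
    # Monotone submodular objective: number of elements covered.
    covered = set()
    for s in selected:
        covered |= sets[s]
    return len(covered)

def greedy(candidates, sets, k):
    # Standard greedy: repeatedly pick the item with the largest marginal gain.
    chosen = []
    for _ in range(min(k, len(candidates))):
        best = max(
            (c for c in candidates if c not in chosen),
            key=lambda c: coverage(chosen + [c], sets) - coverage(chosen, sets),
        )
        chosen.append(best)
    return chosen

def randomized_core_set_maximization(sets, k, num_machines=3, seed=0):
    random.seed(seed)
    items = list(sets)
    random.shuffle(items)                              # random partition of the data
    parts = [items[i::num_machines] for i in range(num_machines)]
    core_sets = [greedy(p, sets, k) for p in parts]    # round 1: per-machine core-sets
    union = [x for cs in core_sets for x in cs]
    return greedy(union, sets, k)                      # round 2: greedy on the union

# Toy instance: each item covers a small set of elements.
sets = {
    "a": {1, 2, 3}, "b": {3, 4}, "c": {5, 6}, "d": {1, 5},
    "e": {7}, "f": {2, 7, 8},
}
solution = randomized_core_set_maximization(sets, k=2)
print(solution, coverage(solution, sets))

The two greedy phases correspond to the two MapReduce rounds mentioned in the abstract: each machine only ships its k selected items (the core-set), which is what keeps the total communication linear in n.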
ISBN (print): 9781450333177
As a fundamental tool for modeling and analyzing social and information networks, large-scale graph mining is an important component of any tool set for big data analysis. Processing graphs with hundreds of billions of edges is only possible by developing distributed algorithms under distributed graph mining frameworks such as MapReduce, Pregel, Giraph, and the like. For these distributed algorithms to work well in practice, we need to take into account several metrics, such as the number of rounds of computation and the communication complexity of each round. For example, given the popularity and ease of use of the MapReduce framework, developing practical algorithms with good theoretical guarantees for basic graph problems is of great importance. In this tutorial, we first discuss how to design and implement algorithms based on the traditional MapReduce architecture. In this regard, we discuss various basic graph-theoretic problems such as computing connected components, maximum matching, MST, triangle counting, and overlapping or balanced clustering. We describe a computation model for MapReduce and the sampling, filtering, local random walk, and core-set techniques used to develop efficient algorithms in this framework. At the end, we explore the possibility of employing other distributed graph processing frameworks. In particular, we study the effect of augmenting MapReduce with a distributed hash table (DHT) service, and we also discuss the use of a new graph processing framework called ASYMP based on asynchronous message passing. We show that using ASYMP one can improve CPU usage and achieve significantly improved running times.
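As one concrete example of the basic graph problems listed in the tutorial, the Python sketch below counts triangles in two map/reduce-style rounds: the first round emits every open wedge (a pair of neighbors sharing an apex) together with the actual edges, and the second round counts the wedges that are closed by an edge. The in-memory driver and all helper names are illustrative assumptions rather than the tutorial's implementation.

from collections import defaultdict
from itertools import combinations

def wedge_map(node, neighbors):
    # Round 1: every pair of neighbors of `node` forms an open wedge.
    for u, v in combinations(sorted(neighbors), 2):
        yield (u, v), ("wedge", node)

def edge_map(u, v):
    # Round 1 (same shuffle): mark node pairs that are actual edges.
    yield tuple(sorted((u, v))), ("edge", None)

def count_reduce(pair, values):
    # Round 2: a wedge closed by an edge is a triangle.
    wedges = sum(1 for tag, _ in values if tag == "wedge")
    has_edge = any(tag == "edge" for tag, _ in values)
    return wedges if has_edge else 0

def count_triangles(adjacency):
    shuffled = defaultdict(list)
    for node, neighbors in adjacency.items():
        for key, value in wedge_map(node, neighbors):
            shuffled[key].append(value)
    seen = set()
    for u, nbrs in adjacency.items():
        for v in nbrs:
            e = tuple(sorted((u, v)))
            if e not in seen:
                seen.add(e)
                for key, value in edge_map(*e):
                    shuffled[key].append(value)
    # Each triangle is counted once per wedge apex, i.e. three times.
    return sum(count_reduce(k, vs) for k, vs in shuffled.items()) // 3

# Toy graph: the triangle 1-2-3 plus a pendant edge 3-4.
adjacency = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
print(count_triangles(adjacency))  # 1

The same pattern of emitting candidate structures in one round and verifying them in the next underlies many of the MapReduce graph algorithms covered in the tutorial; the number of rounds and the volume of shuffled data are exactly the metrics the abstract highlights.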