Search Results - Inner Mongolia University Library

16th IEEE International Conference on Cluster Computing, CLUSTER 2014

Authors: Shi, Xuanhua; Lin, Haohong; Jin, Hai; Zhou, Bing Bing; Yin, Zuoning; Di, Sheng; Wu, Song. Affiliations: Services Computing Technology and System Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, China; Centre for Distributed and High Performance Computing, School of Information Technologies, University of Sydney, Australia; GraphSQL Inc., Australia; INRIA; Argonne National Laboratory, United States

ISBN: (print) 9781479955480

The scale of cloud services keeps increasing over time, introducing significant challenges in system manageability and reliability. Designing coordination services for the cloud is the right track to solving these problems. However, existing coordination services (e.g., Chubby and ZooKeeper) perform well only in read-intensive scenarios and at small ensemble scales. To this end, we propose Giraffe, a scalable distributed coordination service. Our design makes three important contributions. (1) Giraffe organizes coordination servers using interior-node-disjoint trees for better scalability. (2) Giraffe employs a novel Paxos protocol for strong consistency and fault tolerance. (3) Giraffe supports hierarchical data organization and in-memory storage for high throughput and low latency. We evaluate Giraffe on a high-performance computing test-bed. The experimental results show that Giraffe achieves much better write performance than ZooKeeper when the server ensemble is large. Giraffe is nearly 300% faster than ZooKeeper on update operations when the ensemble size is 50 servers. Experiments also show that Giraffe reacts to and recovers from node failures more quickly than ZooKeeper. © 2014 IEEE.

Keywords: Distributed cloud

From "Think Like a Vertex" to "Think Like a Graph"

PROCEEDINGS OF THE VLDB ENDOWMENT, 2013, Vol. 7, No. 3, pp. 193-204

Authors: Tian, Yuanyuan; Balmin, Andrey; Corsten, Severin Andreas; Tatikonda, Shirish; McPherson, John. Affiliations: IBM Almaden Res Ctr, Yorktown Hts, NY 10598, USA; GraphSQL, Redwood City, CA 94065, USA; IBM Deutschland GmbH, Berlin, Germany

To meet the challenge of processing rapidly growing graph and network data created by modern applications, a number of distributed graph processing systems have emerged, such as Pregel and GraphLab. All these systems divide input graphs into partitions, and employ a "think like a vertex" programming model to support iterative graph computation. This vertex-centric model is easy to program and has proved useful for many graph algorithms. However, this model hides the partitioning information from the users, and thus prevents many algorithm-specific optimizations. This often results in longer execution time due to excessive network messages (e.g. in Pregel) or heavy scheduling overhead to ensure data consistency (e.g. in GraphLab). To address this limitation, we propose a new "think like a graph" programming paradigm. Under this graph-centric model, the partition structure is opened up to the users, and can be utilized so that communication within a partition can bypass the heavy message passing or scheduling machinery. We implemented this model in a new system, called Giraph++, based on Apache Giraph, an open source implementation of Pregel. We explore the applicability of the graph-centric model to three categories of graph algorithms, and demonstrate its flexibility and superior performance, especially on well-partitioned data. For example, on a web graph with 118 million vertices and 855 million edges, the graph-centric version of the connected component detection algorithm runs 63X faster and uses 204X fewer network messages than its vertex-centric counterpart.
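The contrast between the two models in the abstract above can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not Giraph++ code: the function names, the `part` mapping from vertex to partition, and the choice to count only cross-partition messages as "network messages" are all hypothetical. Both functions compute connected components by propagating the minimum vertex id; the graph-centric variant first propagates labels to a fixpoint inside each partition and only then exchanges labels across partition boundaries.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def vertex_centric_cc(edges, part):
    """Min-label propagation, one hop per superstep.
    Returns (labels, supersteps, cross-partition messages)."""
    adj = build_adjacency(edges)
    label = {v: v for v in adj}
    active = set(adj)
    supersteps = messages = 0
    while active:
        supersteps += 1
        inbox = defaultdict(list)
        for u in active:                      # every active vertex messages all neighbors
            for v in adj[u]:
                inbox[v].append(label[u])
                if part[u] != part[v]:
                    messages += 1             # only cross-partition traffic is counted
        active = set()
        for v, received in inbox.items():
            smallest = min(received)
            if smallest < label[v]:
                label[v] = smallest
                active.add(v)
    return label, supersteps, messages

def graph_centric_cc(edges, part):
    """Same result, but labels spread through a whole partition
    locally before anything crosses a partition boundary."""
    adj = build_adjacency(edges)
    label = {v: v for v in adj}
    changed = set(adj)
    supersteps = messages = 0
    while changed:
        supersteps += 1
        stack = list(changed)
        while stack:                          # local phase: no messages needed
            u = stack.pop()
            for v in adj[u]:
                if part[v] == part[u] and label[u] < label[v]:
                    label[v] = label[u]
                    changed.add(v)
                    stack.append(v)
        inbox = defaultdict(list)
        for u in changed:                     # communication phase: boundary edges only
            for v in adj[u]:
                if part[v] != part[u]:
                    inbox[v].append(label[u])
                    messages += 1
        changed = set()
        for v, received in inbox.items():
            smallest = min(received)
            if smallest < label[v]:
                label[v] = smallest
                changed.add(v)
    return label, supersteps, messages

# Toy input (an assumption for illustration): a path 0-1-...-7 split into two partitions.
edges = [(i, i + 1) for i in range(7)]
part = {v: 0 if v < 4 else 1 for v in range(8)}
vc_labels, vc_steps, vc_msgs = vertex_centric_cc(edges, part)
gc_labels, gc_steps, gc_msgs = graph_centric_cc(edges, part)
```

On this toy path graph both variants assign every vertex the same component label, while the graph-centric variant needs fewer supersteps and fewer cross-partition messages. The numbers say nothing about the speedups reported in the abstract, which were measured on a real system and a web-scale graph.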

Keywords: Graph theory; Iterative methods; Machinery; Message passing; Open source software; Open systems; Scheduling; Connected component; Graph algorithms; Graph computations; Modern applications; Open source implementation; Partition structures; Programming models; Programming paradigms

GIRAFFE: A scalable distributed coordination service for large-scale systems

IEEE International Conference on Cluster Computing

Authors: Xuanhua Shi; Haohong Lin; Hai Jin; Bing Bing Zhou; Zuoning Yin; Sheng Di; Song Wu. Affiliations: Services Computing Technology and System Lab, School of Computer Science and Technology, Huazhong University of Science and Technology; Centre for Distributed and High Performance Computing, School of Information Technologies, University of Sydney; GraphSQL Inc.; INRIA and Argonne National Laboratory

ISBN: (print) 9781479955497

Keywords: cloud computing; fault tolerant computing; large-scale systems; protocols; public domain software; reliability; Giraffe; Paxos protocol; ZooKeeper; cloud services; fault-tolerance; hierarchical data organization; memory storage; scalable distributed coordination service; Color; Proposals; Protocols; Scalability; Servers; Topology; Vegetation; Consistency; Coordination Service; Distributed Cloud System; Fault Tolerance

From "think like a vertex" to "think like a graph"

Proceedings of the 40th International Conference on Very Large Data Bases, VLDB 2014

Authors: Tian, Yuanyuan; Balmin, Andrey; Corsten, Severin Andreas; Tatikonda, Shirish; McPherson, John. Affiliations: IBM Almaden Research Center, United States; GraphSQL, United States; IBM Deutschland GmbH, Germany

Keywords: Message passing

FlowFlex: Malleable scheduling for flows of MapReduce jobs

14th ACM/IFIP/USENIX Middleware Conference, Middleware 2013

Authors: Nagarajan, Viswanath; Wolf, Joel; Balmin, Andrey; Hildrum, Kirsten. Affiliations: IBM T. J. Watson Research Center, United States; GraphSQL, United States

ISBN: (print) 9783642450648

We introduce FlowFlex, a highly generic and effective scheduler for flows of MapReduce jobs connected by precedence constraints. Such a flow can result, for example, from a single user-level Pig, Hive or Jaql query. Each flow is associated with an arbitrary function describing the cost incurred in completing the flow at a particular time. The overall objective is to minimize either the total cost (minisum) or the maximum cost (minimax) of the flows. Our contributions are both theoretical and practical. Theoretically, we advance the state of the art in malleable parallel scheduling with precedence constraints. We employ resource augmentation analysis to provide bicriteria approximation algorithms for both minisum and minimax objective functions. As corollaries, we obtain approximation algorithms for total weighted completion time (and thus average completion time and average stretch), and for maximum weighted completion time (and thus makespan and maximum stretch). Practically, the average case performance of the FlowFlex scheduler is excellent, significantly better than other approaches. Specifically, we demonstrate via extensive experiments the overall performance of FlowFlex relative to optimal and also relative to other, standard MapReduce scheduling schemes. All told, FlowFlex dramatically extends the capabilities of the earlier Flex scheduler for singleton MapReduce jobs while simultaneously providing a solid theoretical foundation for both. © IFIP International Federation for Information Processing 2013.
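The two objectives named in the abstract above can be written down directly. The sketch below is illustrative only and is not the FlowFlex scheduler: it merely evaluates the minisum and minimax cost of a given set of completion times, and the function names and the example weights and times are assumptions. A weighted completion time cost `w * t` is used as the example cost function, since the abstract lists it as a corollary case.

```python
from typing import Callable, Sequence

CostFn = Callable[[float], float]  # maps a flow's completion time to its cost

def minisum_objective(costs: Sequence[CostFn], completion: Sequence[float]) -> float:
    """Total cost over all flows (the quantity a minisum scheduler minimizes)."""
    return sum(cost(t) for cost, t in zip(costs, completion))

def minimax_objective(costs: Sequence[CostFn], completion: Sequence[float]) -> float:
    """Largest cost of any single flow (the quantity a minimax scheduler minimizes)."""
    return max(cost(t) for cost, t in zip(costs, completion))

def weighted_completion(weight: float) -> CostFn:
    """Cost function w * t, giving weighted completion time."""
    return lambda t: weight * t

# Hypothetical example: two flows with weights 2 and 1 finishing at times 3 and 5.
costs = [weighted_completion(2.0), weighted_completion(1.0)]
completion = [3.0, 5.0]
total_weighted = minisum_objective(costs, completion)   # 2*3 + 1*5
max_weighted = minimax_objective(costs, completion)     # max(2*3, 1*5)
makespan = minimax_objective([weighted_completion(1.0)] * 2, completion)
```

With unit weights, the minimax objective reduces to the makespan, which matches the corollary stated in the abstract.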

Keywords: MapReduce

PREDIcT: Towards Predicting the Runtime of Large Scale Iterative Analytics

PROCEEDINGS OF THE VLDB ENDOWMENT, 2013, Vol. 6, No. 14, pp. 1678-1689

Authors: Popescu, Adrian Daniel; Balmin, Andrey; Ercegovac, Vuk; Ailamaki, Anastasia. Affiliations: Ecole Polytech Fed Lausanne, CH-1015 Lausanne, Switzerland; IBM Almaden Res Ctr, San Jose, CA, USA; GraphSQL, Mountain View, CA, USA; Google Inc, Menlo Pk, CA, USA

Machine learning algorithms are widely used today for analytical tasks such as data cleaning, data categorization, or data filtering. At the same time, the rise of social media motivates recent uptake in large scale graph processing. Both categories of algorithms are dominated by iterative subtasks, i.e., processing steps which are executed repetitively until a convergence condition is met. Optimizing cluster resource allocations among multiple workloads of iterative algorithms motivates the need for estimating their runtime, which in turn requires: i) predicting the number of iterations, and ii) predicting the processing time of each iteration. As both parameters depend on the characteristics of the dataset and on the convergence function, estimating their values before execution is difficult.
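The decomposition described in the abstract above (number of iterations, and processing time of each iteration) can be sketched as a simple sum. This is an assumption about how the two predictions would be combined, not the PREDIcT method itself; the function name, the per-iteration time model, and the example numbers are hypothetical.

```python
from typing import Callable

def estimate_runtime(predicted_iterations: int,
                     iteration_time: Callable[[int], float]) -> float:
    """Sum the predicted processing time of each iteration.

    predicted_iterations: predicted number of iterations until convergence.
    iteration_time: predicted time (e.g. in seconds) of iteration i, 0-based.
    """
    return sum(iteration_time(i) for i in range(predicted_iterations))

# Hypothetical example: 4 predicted iterations at a constant 2.0 seconds each.
estimate = estimate_runtime(4, lambda i: 2.0)
```

Passing the per-iteration time as a function of the iteration index leaves room for iterations whose cost shrinks as the algorithm converges.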

Keywords: Forecasting
