ISBN: 9781450343503 (print)
Applications that need to process and manage large graph data sets have posed significant challenges for the data science community in recent times. This talk discusses the key challenges that need to be handled when implementing a next-generation graph processing and management platform. There are several key problems that need to be addressed in building such a large graph processing system. First, optimized techniques need to be followed for managing extremely large graph data. Second, new programming models and software tools need to be created for efficiently processing large graphs. This talk will discuss the approaches that need to be followed in addressing these two major issues and will highlight our vision for meeting the challenges of next-generation graph processing and management.
More and more large data collections are gathered worldwide in various IT systems. Many of them possess a networked nature and need to be processed and analysed as graph structures. Due to their size they very often require the usage of a parallel paradigm for efficient computation. Three parallel techniques have been compared in the paper: MapReduce, its map-side join extension and Bulk Synchronous Parallel (BSP). They are implemented for two different graph problems: calculation of single source shortest paths (SSSP) and collective classification of graph nodes by means of relational influence propagation (RIP). The methods and algorithms are applied to several network datasets differing in size and structural profile, originating from three domains: telecommunication, multimedia and microblog. The results revealed that iterative graph processing with the BSP implementation always and significantly outperforms MapReduce, by up to 10 times, especially for algorithms with many iterations and sparse communication. The extension of MapReduce based on map-side join is usually characterized by better efficiency compared to the original MapReduce, although not as much as BSP. Nevertheless, MapReduce still remains a good alternative for enormous networks, whose data structures do not fit in local memories. (C) 2013 The Authors. Published by Elsevier B.V. All rights reserved.
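To make the BSP side of this comparison concrete, the following is a minimal single-process sketch of SSSP organised into supersteps: each active vertex reads its incoming messages, updates its distance, and sends messages that are delivered only after a barrier. It is an illustration under stated assumptions, not the implementation evaluated in the paper; the function name `bsp_sssp`, the edge-list input format, and the non-negative edge weights are all assumptions made for this example.

```python
import math
from collections import defaultdict


def bsp_sssp(edges, source):
    """Simulate BSP-style SSSP on one machine (hypothetical helper).

    edges: iterable of (u, v, weight) directed edges, weights assumed >= 0.
    Returns (distances, number_of_supersteps).
    """
    adjacency = defaultdict(list)
    vertices = {source}
    for u, v, w in edges:
        adjacency[u].append((v, w))
        vertices.update((u, v))

    dist = {v: math.inf for v in vertices}
    inbox = {source: [0]}  # messages to be read in the next superstep
    supersteps = 0

    while inbox:  # stop when no vertex received a message
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            best = min(messages)
            if best < dist[v]:  # the vertex improves, so it notifies neighbours
                dist[v] = best
                for neighbour, w in adjacency[v]:
                    outbox[neighbour].append(best + w)
        inbox = outbox  # barrier: messages become visible only now
        supersteps += 1

    return dist, supersteps


if __name__ == "__main__":
    demo_edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5), ("c", "d", 1)]
    distances, steps = bsp_sssp(demo_edges, "a")
    print(distances, steps)
```

In this sketch only vertices that received a message do any work in a superstep, which is one way to picture why sparse communication over many iterations suits BSP; an iterative MapReduce formulation would typically re-read the graph in every round.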
We present a new distributed architecture allowing simulation of living cells in spatial structures. Each cell is represented with a Quasi-Steady State Petri Net that integrates a dynamic regulatory network expressed with a Petri net and a Genome Scale Metabolic Network (GSMN), where linear programming is used to explore the steady-state metabolic flux distributions in the whole-cell model. The combination of Petri net and GSMN has already been used in simulations of single cells, but we present an extension to the model and an architecture to simulate populations of millions of interacting cells organised in spatial structures, which can be used to model tumour growth or formation of tuberculosis lesions. The crucial element of this solution is the ability of cells to communicate by producing and detecting substances such as cytokines and chemokines. This ability is modeled by allowing cells to share tokens in places called communicators. To make the simulation of such a huge model possible we use the Spark framework and organise the computation in an agent-based "think like a vertex" fashion as in Pregel-like systems. In the cluster we introduce a special kind of per-node caching to speed up computation of the steady-state metabolic flux. We demonstrate the capabilities of the new architecture by simulating communication of liver cells through the FGF19 cytokine during the homeostatic response to a cholesterol burst. Our approach can be used for mechanistic modelling of the emergence of multicellular system behaviour from interaction between genome and environment. (C) 2017 Elsevier B.V. All rights reserved.
ISBN: 9780769549255; 9781467351645 (print)
Network structures, especially social networks, grow rapidly and produce huge datasets that are intractable to analyse. In this paper, two parallel approaches to processing large graph structures within the Hadoop environment were compared: Bulk Synchronous Parallel (BSP) and MapReduce (MR). The experimental studies were carried out for two different graph problems: collective classification by means of Relational Influence Propagation (RIP) and Single Source Shortest Path (SSSP) calculation. The appropriate BSP and MapReduce algorithms for these problems were applied to various network datasets differing in size and structural profile, originating from three domains: telecommunication, multimedia and microblog. The collected results revealed that iterative graph processing with the BSP implementation significantly outperforms the popular MapReduce, especially for algorithms with many iterations and sparse communication. However, MapReduce still remains the only alternative for enormous networks.
Graph processing is one of the important research topics in the big-data era. To build a general framework for graph processing using a DRAM-based FPGA board with a deep memory hierarchy, one reasonable method is to partition a given big graph into multiple small subgraphs, represent the graph with a two-dimensional grid, and then process the subgraphs one after another to divide and conquer the whole problem. Such a method (grid-graph processing) stores the graph data in off-chip memory devices (e.g., on-board or host DRAM) that have large storage capacities but relatively small bandwidths, and processes individual small subgraphs one after another using on-chip memory devices (e.g., FFs, BRAM, and URAM) that have small storage capacities but superior random access performance. However, directly exchanging graph (vertex and edge) data between the processing units in the FPGA chip and slow off-chip DRAMs during grid-graph processing leads to limited performance and excessive data transmission between the FPGA chip and off-chip memory devices. In this article, we show that the performance of grid-graph processing on DRAM-based FPGA hardware accelerators can be improved effectively by leveraging the flexibility and programmability of FPGAs to build application-specific caching mechanisms, which bridge the performance gaps between on-chip and off-chip memory devices and reduce the amount of data transmitted by exploiting locality in data access. We design two application-specific caching mechanisms (i.e., vertex caching and edge caching) to exploit two types of locality (i.e., vertex locality and subgraph locality) that exist in grid-graph processing, respectively. Experimental results show that with the vertex caching mechanism, our system (named Fabgraph) achieves up to 3.1x and 2.5x speedups for BFS and PageRank, respectively, over Foregraph when processing medium graphs stored in the on-board DRAM. With the edge caching mech
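The two-dimensional grid representation mentioned in the abstract above can be sketched in a few lines. This is a plain software illustration under stated assumptions, not Fabgraph's hardware design: vertices are assumed to be integers in `0..num_vertices-1`, the vertex range is split into `p` equal intervals, and an edge goes to the block indexed by its source and destination intervals. The names `grid_partition` and `in_degrees_by_block` are hypothetical.

```python
def grid_partition(edges, num_vertices, p):
    """Place each edge (src, dst) into block [src_interval][dst_interval]."""
    interval = -(-num_vertices // p)  # ceiling division: vertices per interval
    grid = [[[] for _ in range(p)] for _ in range(p)]
    for src, dst in edges:
        grid[src // interval][dst // interval].append((src, dst))
    return grid


def in_degrees_by_block(grid, num_vertices):
    """Process the blocks one after another, as grid-graph processing does.

    Within one block all sources fall in a single vertex interval and all
    destinations in another, so only two small vertex ranges are touched.
    """
    in_degree = [0] * num_vertices
    for row in grid:
        for block in row:
            for _src, dst in block:
                in_degree[dst] += 1
    return in_degree


if __name__ == "__main__":
    demo_edges = [(0, 1), (0, 3), (2, 1), (3, 2)]
    demo_grid = grid_partition(demo_edges, num_vertices=4, p=2)
    print(demo_grid)
    print(in_degrees_by_block(demo_grid, 4))
```

Because every edge in a block touches only one source interval and one destination interval, the vertex data for those intervals is what a small fast memory would need to hold while that block is processed; that is the kind of locality the caching mechanisms in the abstract are described as exploiting.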