ISBN:
(Print) 9781931971331
Traditionally, distributed graph processing systems have largely focused on scalability through optimizing inter-node communication and load balance. However, they often deliver unsatisfactory overall processing efficiency compared with shared-memory graph computing frameworks. We analyze the behavior of several graph-parallel systems and find that the overhead added for achieving scalability becomes a major limiting factor for efficiency, especially with modern multi-core processors and high-speed interconnection networks. Based on our observations, we present Gemini, a distributed graph processing system that applies multiple optimizations targeting computation performance to build scalability on top of efficiency. Gemini adopts (1) a sparse-dense signal-slot abstraction to extend the hybrid push-pull computation model from shared-memory to distributed scenarios, (2) a chunk-based partitioning scheme enabling low-overhead scale-out designs and locality-preserving vertex accesses, (3) a dual representation scheme to compress accesses to vertex indices, (4) NUMA-aware sub-partitioning for efficient intra-node memory accesses, plus (5) locality-aware chunking and fine-grained work-stealing for improving inter-node and intra-node load balance, respectively. Our evaluation on an 8-node high-performance cluster (using five widely used graph applications and five real-world graphs) shows that Gemini significantly outperforms all well-known existing distributed graph processing systems, delivering up to 39.8× (from a minimum of 8.91×) improvement over the fastest among them.
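As a rough illustration of the chunk-based partitioning idea mentioned in the abstract, the sketch below splits a contiguous vertex ID range into per-node chunks of roughly balanced cost, so each node owns a dense local slice of vertex data. This is a minimal Python sketch, not Gemini's implementation: the cost function alpha + out_degree(v) and the name chunk_partition are assumptions made for illustration only.

```python
# Minimal sketch (not Gemini's code): contiguous chunk partitioning of vertices.
# Cost per vertex is assumed to be alpha + out_degree; the real system's
# balancing heuristic may differ.

def chunk_partition(out_degrees, num_nodes, alpha=8.0):
    """Split vertices [0, V) into num_nodes contiguous chunks of similar cost."""
    costs = [alpha + d for d in out_degrees]
    target = sum(costs) / num_nodes

    boundaries = [0]   # chunk i owns vertices [boundaries[i], boundaries[i+1])
    acc = 0.0
    for v, c in enumerate(costs):
        acc += c
        # Close the current chunk once it reaches its cost target,
        # leaving room for the remaining chunks.
        if acc >= target and len(boundaries) < num_nodes and v + 1 < len(costs):
            boundaries.append(v + 1)
            acc = 0.0
    boundaries.append(len(out_degrees))
    return boundaries


if __name__ == "__main__":
    degrees = [1, 5, 2, 0, 7, 3, 3, 1]              # toy out-degree array
    print(chunk_partition(degrees, num_nodes=2))    # -> [0, 5, 8]
```

Because each chunk is a contiguous ID range, a node can store its vertex state in a plain array indexed by (vertex_id - chunk_start), which preserves access locality and keeps partition metadata to a handful of boundary values.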
ISBN:
(Print) 9781509024537
The importance of large-scale data analysis has recently increased in a wide variety of areas, such as natural language processing, sensor data analysis, and scientific computing. Such an analysis application typically reuses existing programs as components and is often required to continuously process new data with low latency while handling large-scale data on distributed computation nodes. However, existing frameworks for combining programs into a parallel data analysis pipeline (e.g., a workflow) are plagued by the following issues: (1) Most frameworks are oriented toward high-throughput batch processing, which leads to high latency. (2) A specific language is often imposed for the composition, and/or a specific structure such as a simple unidirectional dataflow among the constituent tasks. (3) A program used as a component often takes a long time to start up due to heavy initialization, which is referred to as the startup overhead. Our solution to these problems is a remote procedure call (RPC)-based composition, which is achieved by our middleware, Rapid Service Connector (RaSC). RaSC can easily wrap an ordinary program and make it accessible as an RPC service, called a RaSC service. Using such component programs as RaSC services enables us to integrate them into one program with low latency, without being restricted to a specific workflow language or dataflow structure. In addition, a RaSC service masks the startup overhead of a component program by keeping the processes of the component program alive across RPC requests. We also propose an architecture that automatically manages the number of processes to maximize throughput. Experimental results show that our approach excels in overall throughput as well as latency, despite its RPC overhead. We also show that our approach can adapt to runtime changes in the throughput requirements.
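To make the RPC-based composition concrete, the sketch below shows one way an ordinary line-oriented program could be kept alive behind an RPC endpoint so that callers pay its startup cost only once. This is a minimal Python sketch using the standard xmlrpc module, not the actual RaSC middleware; the names WrappedProgram and wrap_and_serve are hypothetical.

```python
# Minimal sketch (not RaSC itself): expose an ordinary line-oriented program
# as a long-lived RPC service, masking its per-invocation startup overhead.

import subprocess
from xmlrpc.server import SimpleXMLRPCServer

class WrappedProgram:
    """Keep one worker process alive and feed it requests line by line."""

    def __init__(self, command):
        # Start the component program once; it stays alive across RPC calls.
        self.proc = subprocess.Popen(
            command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
        )

    def process(self, line):
        # Forward one request to the component program and return its reply.
        self.proc.stdin.write(line + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")


def wrap_and_serve(command, port=8080):
    server = SimpleXMLRPCServer(("0.0.0.0", port), allow_none=True)
    server.register_instance(WrappedProgram(command))
    server.serve_forever()


if __name__ == "__main__":
    # Expose a toy line-oriented filter (here `cat`) as an RPC service.
    wrap_and_serve(["cat"], port=8080)
```

A client could then invoke it with, for example, xmlrpc.client.ServerProxy("http://host:8080").process("some input line"); scaling the number of worker processes to match the request rate, as the paper describes, is omitted from this sketch.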
ISBN:
(Print) 9781509033157
The proceedings contain 39 papers. The topics discussed include: DVS: dynamic variable-width striping RAID for shingled write disks; cooperative bandwidth sharing for 5G heterogeneous network using game theory; active burst-buffer: in-transit processing integrated into hierarchical storage; DS-Index: a distributed search solution for federated cloud; assessing advanced technology in CENATE; efficient parity update for scaling RAID-like storage systems; a stripe-oriented write performance optimization for RAID-structured storage systems; CircularCache: scalable and adaptive cache management for massive storage systems; a kind of FTL scheme which keeps the high performance and lowers the capacity of RAM occupied by mapping table; distributed slot scheduling algorithm for hybrid CSMA/TDMA MAC in wireless sensor networks; correlating hardware performance events to CPU and DRAM power consumption; dynamic power-performance adjustment on clustered multi-threading processors; a high-performance persistent identification concept; hybrid replication: optimizing network bandwidth and primary storage performance for remote replication; GPU-ABFT: optimizing algorithm-based fault tolerance for heterogeneous systems with GPUs; and improving read performance of SSDs via balanced redirected read.