ISBN:
(Print) 9781450327459
From social networks to language modeling, the growing scale and importance of graph data has driven the development of numerous new graph-parallel systems (e.g., Giraph and GraphLab). By restricting the types of computation that can be expressed and by introducing new techniques to partition and distribute graphs, these systems can efficiently execute sophisticated graph algorithms orders of magnitude faster than more general data-parallel systems. However, the same restrictions that enable graph-parallel systems to achieve substantial performance gains also limit their ability to express many of the important stages in a typical graph-analytics pipeline. Moreover, while graph-parallel systems are optimized for iterative diffusion algorithms like PageRank, they are not well suited for more basic tasks like constructing the graph, modifying its structure, or expressing computation that spans multiple graphs. While existing systems address specific stages of a typical graph-analytics pipeline, they do not address the entire pipeline, forcing the user to deal with multiple systems, complex and brittle file interfaces, and inefficient data movement and duplication. To fill the need for a holistic approach to graph analytics we introduce GraphX, which unifies graph and data-parallel computation under a single API and system. GraphX recasts advances in graph processing in the context of relational algebra and distributed join optimization, enabling more general data-parallel systems to process graphs efficiently. We evaluate the GraphX system on several real-world tasks and show that its end-to-end performance can exceed that of specialized systems. This talk describes recent work at the UC Berkeley AMPLab in collaboration with Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, and Ion Stoica.
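The abstract's central claim, that graph computation can be recast as relational operations, can be illustrated with a minimal sketch. This is not GraphX's actual API; it is a plain-Python analogy in which one PageRank step becomes a join of an edge table with a vertex-rank table followed by a group-by aggregation. The table contents and damping factor are illustrative assumptions.

```python
# Sketch: one PageRank iteration as relational operations (join on the
# source vertex, then group-by-destination sum), in the spirit of
# GraphX's recasting of graph processing as distributed joins.
# The toy graph and damping factor below are assumptions.
from collections import defaultdict

vertices = {"a": 1.0, "b": 1.0, "c": 1.0}                 # vertex id -> rank
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]  # (src, dst) table

def pagerank_step(vertices, edges, damping=0.85):
    # "Join" each edge with its source vertex's rank (a triplet view),
    # splitting each rank evenly across the source's out-edges.
    out_degree = defaultdict(int)
    for src, _ in edges:
        out_degree[src] += 1
    # "Group by" destination and sum the incoming contributions.
    contrib = defaultdict(float)
    for src, dst in edges:
        contrib[dst] += vertices[src] / out_degree[src]
    return {v: (1 - damping) + damping * contrib[v] for v in vertices}

ranks = pagerank_step(vertices, edges)
```

In a real data-parallel engine, the join and aggregation above would be distributed operators, which is where the abstract's join-optimization argument applies.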
Isolating computation and communication concerns into separate pure computation and pure coordination modules enhances modularity, understandability, and reusability of parallel and/or distributed software. MANIFOLD i...
ISBN:
(Print) 9781665439022
Software testing is an important process to evaluate whether the developed software applications meet the required specifications. There is an emerging need for testing frameworks for big data software projects to ensure the quality of big data applications and satisfy user requirements. In this study, we propose a software testing framework that can be utilized in big data projects in both e-science and e-commerce. In particular, we design the proposed framework to test big data-based recommendation applications. To show the usability of the proposed framework, we provide a reference prototype implementation and use the prototype to test a big data recommendation application. We apply the prototype implementation to test both functional and non-functional aspects of the recommendation application. The results indicate that the proposed testing framework is usable and efficient for testing recommendation systems that use big data processing techniques.
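The paper's framework is not public, so the following is only a generic sketch of the distinction it draws between functional and non-functional tests of a recommendation component. The `recommend` function is a hypothetical stand-in, and the latency budget is an arbitrary assumption.

```python
# Illustrative sketch: functional vs. non-functional tests for a
# recommendation component. recommend() is a toy stand-in, not the
# paper's application; the 1-second latency budget is an assumption.
import time

def recommend(user_history, catalog, k=3):
    # Toy recommender: suggest the first k catalog items not yet seen.
    unseen = [item for item in catalog if item not in user_history]
    return unseen[:k]

def test_functional():
    # Functional check: results exclude already-consumed items and
    # respect the requested result size.
    recs = recommend({"x"}, ["x", "y", "z", "w"], k=2)
    assert "x" not in recs and len(recs) == 2

def test_non_functional_latency():
    # Non-functional check: a latency budget on a larger catalog.
    catalog = [str(i) for i in range(100_000)]
    start = time.perf_counter()
    recommend({"0"}, catalog, k=10)
    assert time.perf_counter() - start < 1.0  # generous budget

test_functional()
test_non_functional_latency()
```

In a big data setting the same two-sided structure applies, but the non-functional tests would target throughput and scalability of the distributed job rather than single-process latency.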
Software pipelining for nested loops remains a challenging problem in embedded system design. Existing software pipelining techniques for single loops can only exploit the parallelism of the innermost loop, so the resulting timing performance is inferior. While multi-dimensional (MD) retiming can exploit outer-loop parallelism, it introduces large overheads in loop index generation and code size due to the transformation. In this paper, we use MD retiming to model the software pipelining problem for nested loops. We show that the computation time and code size of a software-pipelined loop nest are affected by the execution sequence and the retiming function. We propose SPINE (Software Pipelining for NEsted loops), an algorithm that efficiently generates fully parallelized loops with minimal overhead. The experimental results show that our technique significantly outperforms both standard software pipelining and MD retiming.
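For readers unfamiliar with software pipelining itself, here is a minimal model of the basic idea (this is generic single-loop pipelining, not the SPINE algorithm or MD retiming): a loop body with two dependent stages is rescheduled so that stage 1 of iteration i+1 sits in the same steady-state step as stage 2 of iteration i, exposing them for parallel execution. The stage functions are assumptions for illustration.

```python
# Minimal model of software pipelining (illustrative only, not SPINE):
# a loop body with two dependent stages, load() then compute(), is
# rewritten with a prologue, a steady state that overlaps adjacent
# iterations, and an epilogue. Stage bodies are arbitrary assumptions.

def load(x):
    return x * 2          # stage 1 of the loop body

def compute(y):
    return y + 1          # stage 2, consumes stage 1's result

def sequential(data):
    # Original loop: both stages of iteration i finish before i+1 starts.
    return [compute(load(x)) for x in data]

def pipelined(data):
    results = []
    y = load(data[0])                # prologue: stage 1 of iteration 0
    for i in range(1, len(data)):
        # Steady state: stage 2 of iteration i-1 and stage 1 of
        # iteration i are independent and could run in parallel.
        results.append(compute(y))
        y = load(data[i])
    results.append(compute(y))       # epilogue: drain the last iteration
    return results
```

The prologue/epilogue code growth visible even in this tiny example is exactly the code-size overhead the abstract says is magnified when MD retiming pipelines a whole loop nest.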
In this article we present CORBA Lightweight Components (CORBA-LC), a new network-centered reflective component model which allows building distributed applications by assembling binary independent components spread on t...
Operator-based programming languages provide an effective development model for large-scale stream processing applications. A stream processing application consists of many runtime-deployable software processing eleme...
ISBN:
(Print) 9783642233968
The proceedings contain 82 papers. The topics discussed include: run-time automatic performance tuning for multicore applications; exploiting cache traffic monitoring for run-time race detection; accelerating data race detection with minimal hardware support; event log mining tool for large scale HPC systems; reducing the overhead of direct application instrumentation using prior static analysis; reducing energy usage with memory and computation-aware dynamic frequency scaling; using the last-mile model as a distributed scheme for available bandwidth prediction; self-stabilization versus robust self-stabilization for clustering in ad-hoc networks; multilayer cache partitioning for multiprogram workloads; parallel inexact constraint preconditioners for saddle point problems; hardware and software tradeoffs for task synchronization on manycore architectures; and progress guarantees when composing lock-free objects.
In multiprocessor systems, crossbar scheduling networks have been widely used for system interconnection among processors or modules in SoC (System on Chip). In this paper, a parallel SAR (Synthetic Aperture Radar) im...
The computing industry has undergone several paradigm shifts in the last few decades. Fueled by the need for faster computing, larger data, and real-time processing, parallel computing has emerged as one of the dominant paradigms. Motivated by the success achieved in distributed computing models and the limitations faced by single-core processors, parallel computing is the only alternative for building faster computers. Parallel computing is one of the most challenging areas of computer science at present, and developing algorithms and optimization techniques for utilizing the processing power of a current-generation parallel computer is still a very exciting area of research. The parallel computing industry underwent a massive shift when conventional sequential computers hit the power wall. This led to the development of multicore and many-core computing chips that packed multiple sequential computing cores into a single chip. The immediate impact was the need to (re)design sequential algorithms in order to utilize the computing power of such chips. Combined with the intricate memory and cache structures, parallel algorithms require a higher degree of engineering for optimal performance. The many-core revolution started with the release of Graphics Processing Units (GPUs), which had a large number of compute cores and offered massive parallelism. With the evolution of many-core chips, GPUs found application in graphics and gaming as well as general-purpose computation. In the same time frame, Central Processing Units (CPUs) also underwent a wave of innovation and emerged as more powerful and mature computing machines. However, multicore CPUs were mostly ignored in those early days. With the advancement of accelerator platforms, CPUs and GPUs are now able to communicate more efficiently. In recent times there have been quite a few works, such as those in [79, 91, 43], that show that hybrid algorithms
ISBN:
(Print) 9781479980062
Numerous problems in science and engineering involve discretizing the problem domain as a regular structured grid and make use of domain decomposition techniques to obtain solutions faster using high performance computing. However, load imbalance of the workloads among the various processing nodes can cause severe degradation in application performance. This problem is exacerbated when the computational workload is non-uniform and the processing nodes have varying computational capabilities. In this paper, we present novel local search algorithms for regular partitioning of a structured mesh across heterogeneous compute nodes in a distributed setting. The algorithms seek to assign larger workloads to processing nodes having higher computational capability while maintaining the regular structure of the mesh in order to achieve a better load balance. We also propose a distributed memory (MPI) parallelization architecture that can be used to achieve a parallel implementation of scientific modeling software requiring structured grids on heterogeneous processing resources involving CPUs and GPUs. Our implementation can make use of the available CPU cores and multiple GPUs of the underlying platform simultaneously. Empirical evaluation on real-world flood modeling domains on a heterogeneous architecture comprising multicore CPUs and GPUs suggests that the proposed partitioning approach can provide a performance improvement of up to 8x over naive uniform partitioning.
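The core idea of capability-proportional regular partitioning can be sketched simply. This is not the paper's local search algorithm; it is only the baseline assignment such a search would refine: contiguous row strips of the grid sized in proportion to each node's relative speed, so the partition stays regular. The node speeds below are illustrative assumptions.

```python
# Sketch of capability-proportional regular partitioning (a baseline,
# not the paper's local search): split the rows of a structured grid
# into contiguous strips whose sizes are proportional to node speed.
# The example speeds are assumptions for illustration.

def partition_rows(n_rows, speeds):
    """Split n_rows contiguous rows among nodes weighted by speed."""
    total = sum(speeds)
    counts = [int(n_rows * s / total) for s in speeds]
    # Rounding down leaves a few rows unassigned; hand them to the
    # fastest nodes first (stable sort keeps ties in index order).
    leftover = n_rows - sum(counts)
    for i in sorted(range(len(speeds)), key=lambda i: -speeds[i]):
        if leftover == 0:
            break
        counts[i] += 1
        leftover -= 1
    strips, start = [], 0
    for c in counts:
        strips.append((start, start + c))  # half-open [start, end) rows
        start += c
    return strips

# Hypothetical cluster: two GPU nodes roughly 3x faster than two CPU nodes.
strips = partition_rows(100, [3, 3, 1, 1])
```

A local search such as the paper describes would start from strips like these and iteratively shift strip boundaries while measured per-node times remain imbalanced.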