Task graphs or dependence graphs are used in runtime systems to schedule tasks for parallel execution. In problem domains such as dense linear algebra and signal processing, dependence graphs can be generated from a p...
详细信息
ISBN:
(纸本)9781450328357
Task graphs or dependence graphs are used in runtime systems to schedule tasks for parallel execution. In problem domains such as dense linear algebra and signal processing, dependence graphs can be generated from a program by static analysis. However, in emerging problem domains such as graph analytics, the set of tasks and dependences between tasks in a program are complex functions of runtime values and cannot be determined statically. In this paper, we introduce a novel approach for exploiting parallelism in such programs. This approach is based on a data structure called the kinetic dependence graph (KDG), which consists of a dependence graph together with update rules that incrementally update the graph to reflect changes in the dependence structure whenever a task is completed. We have implemented a simple programming model that allows programmers to write these applications at a high level of abstraction, and a runtime within the Galois system [15] that builds the KDG automatically and executes the program in parallel. On a suite of programs that are difficult to parallelize otherwise, we have obtained speedups of up to 33 on 40 cores, out-performing third-party implementations in many cases.
Task-based execution of graph workloads allows various ordered and unordered implementations, with tasks representing dependencies between graph vertices and edges. This work explores graph algorithms in the context o...
详细信息
Task-based execution of graph workloads allows various ordered and unordered implementations, with tasks representing dependencies between graph vertices and edges. This work explores graph algorithms in the context of ordered and unordered task-based implementations, that trade-off work-efficiency with parallelism. The monotonicity of convergent graph solutions is the reason behind the trade-off between work-efficiency and parallelism. This trade-off results in variable performance-based choices within and across different machines (CPUs and GPUs), graph algorithms, implementations (ordered, relaxed, and unordered). Input graphs also augment this choice space, with this work analyzing temporally changing graphs in addition to the static graphs explored by prior works. These algorithmic and architectural choices are first explored in this work, and it is seen that different graph workload-input combinations perform optimally on diverse architectural configurations. The resulting choice space is analyzed and this work represents it in the form of characteristic variables that correlate with each choice space. Using these characteristic variables, this work proposes analytical and neural network models to correlate these choice spaces to find the best performing implementation. The variables and the prediction models proposed in this work are also integrated with a state-of-the-art performance predictor on a multiaccelerator setup, and shows geometric performance gains of 54% on a CPU, 14% on a GPU, and 31.5% in a multiaccelerator setup over baseline implementations without performance prediction.
暂无评论