this paper presents a method to design and auto-tune a new parallel 3-D FFT code using the non-blocking MPI all-to-all operation. We achieve high performance by optimizing computation-communication overlap. Our code p...
详细信息
In fork-join parallelism, a sequential program is split into a directed acyclic graph of tasks linked by directed dependency edges, and the tasks are executed, possibly in parallel, in an order consistent withtheir d...
详细信息
Multithreaded programs execute nondeterministically on conventional architectures and operating systems. this complicates many tasks, including debugging and testing. Deterministic multithreading (DMT) makes the outpu...
详细信息
We present three lock-free data structures for priority task scheduling: a priority work-stealing one, a centralized one with ρ-relaxed semantics, and a hybrid one combining both concepts. Withthe single-source shor...
详细信息
ISBN:
(纸本)9781450326568
We present three lock-free data structures for priority task scheduling: a priority work-stealing one, a centralized one with ρ-relaxed semantics, and a hybrid one combining both concepts. Withthe single-source shortest path (SSSP) problem as example, we show how the different approaches affect the prioritization and provide upper bounds on the number of examined nodes. We argue that priority task scheduling allows for an intuitive and easy way to parallelize the SSSP problem, notoriously a hard task. Experimental evidence supports the good scalability of the resulting algorithm. the larger aim of this work is to understand the trade-offs between scalability and priority guarantees in task scheduling systems. We show that ρ-relaxation is a valuable technique for improving the first, while still allowing semantic constraints to be satisfied: the lock-free, hybrid κ-priority data structure can scale as well as work-stealing, while still providing strong priority scheduling guarantees, which depend on the parameter κ. Our theoretical results open up possibilities for even more scalable data structures by adopting a weaker form of ρ-relaxation, which still enables the semantic constraints to be respected.
parallel programs consist of series of code sections with different thread-level parallelism (TLP). As a result, it is rather common that a thread in a parallel program, such as a GPU kernel in CUDA programs, still co...
详细信息
Tarjan's famous linear time, sequential algorithm for finding the strongly connected components (SCCs) of a graph relies on depth first search, which is inherently sequential. Deterministic parallel algorithms sol...
详细信息
ISBN:
(纸本)9781450326568
Tarjan's famous linear time, sequential algorithm for finding the strongly connected components (SCCs) of a graph relies on depth first search, which is inherently sequential. Deterministic parallel algorithms solve this problem in logarithmic time using matrix multiplication techniques, but matrix multiplication requires a large amount of total work. Randomized algorithms based on reachability - the ability to get from one vertex to another along a directed path - greatly improve the work bound in the average case. However, these algorithms do not always perform well;for instance, Divide-and-Conquer Strong Components (DCSC), a scalable, divide-and-conquer algorithm, has good expected theoretical limits, but can perform very poorly on graphs for which the maximum reachability of any vertex is small. A related algorithm, MultiPivot, gives very high probability guarantees on the total amount of work for all graphs, but this improvement introduces an overhead that increases the average running time. this work introduces SCCMulti, a multi-pivot improvement of DCSC that offers the same consistency as MultiPivot without the time overhead. We provide experimental results demonstrating SCCMulti's scalability;these results also show that SCCMulti is more consistent than DCSC and is always faster than MultiPivot.
the paper is introducing the principles of a new global optimization strategy, Imperialistic Strategy (IS), applied to the Continuous Global Optimization Problem (CGOP). Inspired from existing multi-population strateg...
详细信息
ISBN:
(纸本)9781479984480
the paper is introducing the principles of a new global optimization strategy, Imperialistic Strategy (IS), applied to the Continuous Global Optimization Problem (CGOP). Inspired from existing multi-population strategies, like the Island Model (IM) approaches to parallel Evolutionary Algorithms (EA) and the Imperialistic Competitive Algorithm (ICA), the proposed IS method is considered an optimization strategy for the reason that it can integrate other well-known optimization methods, which in the context are regarded as sub-methods (although in other contexts they are prominent global optimization methods). Four optimization methods were implemented and tested in the roles of sub-methods: Genetic Algorithm (GA) (a floating-point representation variant), Differential Evolution (DE), Quantum Particle Swarm Optimization (QPSO) and Artificial Bee Colony (ABC). the optimization performances of the proposed optimization methods were compared on a test bed of 9 known multimodal optimization problems by applying an appropriate testing methodology. the obtained increased success rates of IS multi-population variants compared to the success rates of the optimization sub-methods run separately, combined withthe increased computing efficiencies possible to be perceived for parallel and distributed implementations, demonstrated that IS is a promising approach to CGOP.
Scale-out programs run on multiple processes in a cluster. In scale-out systems, processes can fail. Computations using traditional libraries such as MPI fail when any component process fails. the advent of Map Reduce...
详细信息
the industry shift emerging forms of parallel and distributed computing (PDC), including multi-core CPUs, cloud computing, and general-purpose use of GPUs, have naturally led to increased presence of PDC elements unde...
详细信息
ISBN:
(纸本)9781450326056
the industry shift emerging forms of parallel and distributed computing (PDC), including multi-core CPUs, cloud computing, and general-purpose use of GPUs, have naturally led to increased presence of PDC elements undergraduate Computer Science curriculum recommendations, such as the new and substantial "PD" knowledge area in the joint acm/IEEE CS2013 recommendations[1]. How can undergraduate students grasp the extensive and complex range of PDC principles and practices, and apply that knowledge in problem solving, while PDC technologies continue to evolve rapidly? parallel design patterns are descriptions of effective solutions to recurring parallelprogramming problems in particular contexts and have emerged from long-standing industry practice. parallel patterns occur at all computational levels, ranging from low-level concurrent execution patterns (such as message passing or thread pool patterns) to high-level software design patterns suitable for organizing entire systems or their components (such as model-view-control or pipe and filter patterns). the sheer number of parallel patterns, which reflect the full breadth and complexity of PDC, can be quite daunting for a newcomer. However, the ubiquity of parallel patterns in all forms of parallel and distributed computation makes these patterns relevant and illuminating at all undergraduate levels. Knowledge of parallel patterns, being reusable elements of parallel design, guides problem-solving during the creation of parallel programs;and those enduring design patterns remain relevant and useful as new PDC infrastructure emerges in this rapidly evolving field. this panel presents four viewpoints representing various approaches for teaching parallel patterns to CS undergraduates at multiple academic levels. Moderator Dick Brown co-directs (with Adams and Shoop) the CSinparallel project (***), which produces and shares modular materials for incrementally adding parallelism to existing undergraduate comp
In this work we propose novel algorithms for storing and evaluating sparse grid functions, operating on regular (not spatially adaptive), yet potentially dimensionally adaptive grid types. Besides regular sparse grids...
详细信息
暂无评论