Functional algorithmic skeletons promise a high-level programming interface for distributed-memory clusters that free developers from concerns of task decomposition, scheduling, and communication. Unfortunately, prior...
详细信息
this paper presents a method to design and auto-tune a new parallel 3-D FFT code using the non-blocking MPI all-to-all operation. We achieve high performance by optimizing computation-communication overlap. Our code p...
详细信息
Multithreaded programs execute nondeterministically on conventional architectures and operating systems. this complicates many tasks, including debugging and testing. Deterministic multithreading (DMT) makes the outpu...
详细信息
In fork-join parallelism, a sequential program is split into a directed acyclic graph of tasks linked by directed dependency edges, and the tasks are executed, possibly in parallel, in an order consistent withtheir d...
详细信息
In many application domains, FPGAS are now promoted as a way of getting round the restrictions of specific CPU designs on system scalability. However, in the current state-of-the art, programming FPGAS remains essenti...
详细信息
We present three lock-free data structures for priority task scheduling: a priority work-stealing one, a centralized one with ρ-relaxed semantics, and a hybrid one combining both concepts. Withthe single-source shor...
详细信息
ISBN:
(纸本)9781450326568
We present three lock-free data structures for priority task scheduling: a priority work-stealing one, a centralized one with ρ-relaxed semantics, and a hybrid one combining both concepts. Withthe single-source shortest path (SSSP) problem as example, we show how the different approaches affect the prioritization and provide upper bounds on the number of examined nodes. We argue that priority task scheduling allows for an intuitive and easy way to parallelize the SSSP problem, notoriously a hard task. Experimental evidence supports the good scalability of the resulting algorithm. the larger aim of this work is to understand the trade-offs between scalability and priority guarantees in task scheduling systems. We show that ρ-relaxation is a valuable technique for improving the first, while still allowing semantic constraints to be satisfied: the lock-free, hybrid κ-priority data structure can scale as well as work-stealing, while still providing strong priority scheduling guarantees, which depend on the parameter κ. Our theoretical results open up possibilities for even more scalable data structures by adopting a weaker form of ρ-relaxation, which still enables the semantic constraints to be respected.
online matching. Antescofo is a real-time system for performance coordination between musicians and computer processes during live music concert. ATP are used to define complex events that correspond to a combination ...
详细信息
ISBN:
(纸本)9781450329477
online matching. Antescofo is a real-time system for performance coordination between musicians and computer processes during live music concert. ATP are used to define complex events that correspond to a combination of perceived events in the musical environment as well as arbitrary logical and metrical temporal conditions. the real-time recognition of such event is used to trigger arbitrary actions in the style of event-condition-action rules. the musical context, the rationales of temporal patterns and several illustrative examples are introduced to motivate the design of ATP. the semantics of ATP matching is defined to parallelthe well-known notion of regular expression and Brzozowski's derivatives but extended to handle an infinite alphabet, arbitrary predicates, elapsing time and inhibitory conditions. this approach is compared to those developed in log auditing and for the runtime verification of realtime logics. ATP are implemented by translation into a core subset of the Antescofo domain-specific language. this compilation has proven efficient enough to avoid the extension of the real-time runtime of the language and has been validated with composers in actual pieces.
parallel programs consist of series of code sections with different thread-level parallelism (TLP). As a result, it is rather common that a thread in a parallel program, such as a GPU kernel in CUDA programs, still co...
详细信息
Tarjan's famous linear time, sequential algorithm for finding the strongly connected components (SCCs) of a graph relies on depth first search, which is inherently sequential. Deterministic parallel algorithms sol...
详细信息
ISBN:
(纸本)9781450326568
Tarjan's famous linear time, sequential algorithm for finding the strongly connected components (SCCs) of a graph relies on depth first search, which is inherently sequential. Deterministic parallel algorithms solve this problem in logarithmic time using matrix multiplication techniques, but matrix multiplication requires a large amount of total work. Randomized algorithms based on reachability - the ability to get from one vertex to another along a directed path - greatly improve the work bound in the average case. However, these algorithms do not always perform well;for instance, Divide-and-Conquer Strong Components (DCSC), a scalable, divide-and-conquer algorithm, has good expected theoretical limits, but can perform very poorly on graphs for which the maximum reachability of any vertex is small. A related algorithm, MultiPivot, gives very high probability guarantees on the total amount of work for all graphs, but this improvement introduces an overhead that increases the average running time. this work introduces SCCMulti, a multi-pivot improvement of DCSC that offers the same consistency as MultiPivot without the time overhead. We provide experimental results demonstrating SCCMulti's scalability;these results also show that SCCMulti is more consistent than DCSC and is always faster than MultiPivot.
Scale-out programs run on multiple processes in a cluster. In scale-out systems, processes can fail. Computations using traditional libraries such as MPI fail when any component process fails. the advent of Map Reduce...
详细信息
暂无评论