The proceedings contain 37 papers. The topics discussed include: on triangulation of simple networks;strong-diameter decompositions of minor free graphs;approximation algorithms for multiprocessor scheduling under unc...
详细信息
ISBN:
(纸本)159593667X
The proceedings contain 37 papers. The topics discussed include: on triangulation of simple networks;strong-diameter decompositions of minor free graphs;approximation algorithms for multiprocessor scheduling under uncertainty;scheduling DAGs on asynchronous processors;scheduling to minimize gaps and power consumption;cache-oblivious streaming B-trees;an experimental comparison of cache-oblivious and cache-conscious programs;scheduling threads for constructive cache sharing on CMPs;proximity-aware directory-based coherence for multi-core processor architectures;a parallel dynamic programming algorithm on a multi-core architecture;tight bounds for distributed selection;local MST computation with short advice;distributed approximation of capacitated dominating sets;packing to angles and sectors;and the notion of a timed register and its application to indulgent synchronization.
In this paper we investigate algorithmic instruments leading to low power consumption in computing devices. While previous work on energy-efficient algorithms has mostly focused on single processor environments, in th...
详细信息
ISBN:
(纸本)9781595936677
In this paper we investigate algorithmic instruments leading to low power consumption in computing devices. While previous work on energy-efficient algorithms has mostly focused on single processor environments, in this paper we investigate multi-processor settings. We study the basic problem of scheduling a set of jobs, each specified by a release time, a deadline and a processing volume, on variable speed processors so as to minimize the total energy consumption. We first settle the complexity of speed scaling with unit size jobs. More specifically, we devise a polynomial time algorithm for agreeable deadlines and prove NP-hardness results for arbitrary release dates and deadlines. For the latter setting we also develop a polynomial time algorithm achieving a constant factor approximation guarantee that is independent of the number of processors. Additionally, we study speed scaling of jobs with arbitrary processing requirements and, again, develop constant factor approximation algorithms. We finally transform our offline algorithms into constant competitive online strategies.
As the number of cores increases on chip multiprocessors, coherence is fast becoming a central issue for multi-core performance. This is exacerbated by the fact that interconnection speeds are not scaling well with te...
详细信息
ISBN:
(纸本)9781595936677
As the number of cores increases on chip multiprocessors, coherence is fast becoming a central issue for multi-core performance. This is exacerbated by the fact that interconnection speeds are not scaling well with technology. This paper describes mechanisms to accelerate coherence for a multi-core architecture that has multiple private L2 caches and a scalable point-to-point interconnect between cores. These techniques exploit the differences in geometry between chip multiprocessors and traditional multiprocessor architectures. Directory-based protocols have been proposed as a scalable alternative to snoop-based protocols. In this paper, we discuss implementations of coherence for CMPs and propose and evaluate a novel directory-based coherence scheme to improve the performance of parallel programs on such processors. Proximity-aware coherence accelerates read and write misses by initiating cache-to-cache transfers from the spatially closest sharer. This has the dual benefit of eliminating unnecessary accesses to off-Chip memory, and minimizing the distance over which communicated data moves across the network. The proposed schemes result in speedups up to 74.9% for our workloads.
Given a set of points P subset of R-2, a conflict free coloring of P w.r.t. rectangle ranges is an assignment of colors to points of P, such that each non-empty axis-parallel rectangle T in the plane contains a point ...
详细信息
ISBN:
(纸本)9781595936677
Given a set of points P subset of R-2, a conflict free coloring of P w.r.t. rectangle ranges is an assignment of colors to points of P, such that each non-empty axis-parallel rectangle T in the plane contains a point whose color is distinct from all other points in P boolean AND T. This notion has been the subject of recent interest, and is motivated by frequency assignment in wireless cellular networks: one naturally would like to minimize the number of frequencies (colors) assigned to bases stations (points), such that within any range (for instance, rectangle), there is no interference. We show that any set of n points in R-2 can be conflict-free colored with (O) over tilde (n(beta+epsilon)) colors in expected polynomial time, for any arbitrarily small epsilon > 0 and beta = 3-root 5/2 < 0.382. This improves upon the previously known bound of O(root n log log n/ log n).
In chip multiprocessors (CMPs), limiting the number of offchip cache misses is crucial for good performance. Many multithreaded programs provide opportunities for constructive cache sharing, in which concurrently sche...
详细信息
ISBN:
(纸本)9781595936677
In chip multiprocessors (CMPs), limiting the number of offchip cache misses is crucial for good performance. Many multithreaded programs provide opportunities for constructive cache sharing, in which concurrently scheduled threads share a largely overlapping working set. In this paper, we compare the performance of two state-of-the-art schedulers proposed for fine-grained multithreaded programs: parallel Depth First (PDF), which is specifically designed for constructive cache sharing, and Work Stealing (WS), which is a more traditional design. Our experimental results indicate that PDF scheduling yields a 1.3-1.6X performance improvement relative to WS for several fine-grain parallel benchmarks on projected future CMP configurations;we also repert several issues that may limit the advantage of PDF in certain applications. These results also indicate that PDF more effectively utilizes off-chip bandwidth, making it possible to trade-off on-chip cache for a larger number of cores. Moreover, we find that task granularity plays a key role in cache performance. Therefore;we present an automatic approach for selecting effective grain sizes, based on a new working set profiling algorithm that is an order of magnitude faster than previous approaches. This is the first paper demonstrating the effectiveness of PDF on real benchmarks;providing a direct comparison between PDF and WS;revealing the limiting factors for PDF in practice, and presenting an approach for overcoming these factors.
This paper addresses the problem of scheduling a DAG of unit-length tasks on asynchronous processors, that is, processors having different and changing speeds. The objective is to minimize the makespan, that is, the t...
详细信息
ISBN:
(纸本)9781595936677
This paper addresses the problem of scheduling a DAG of unit-length tasks on asynchronous processors, that is, processors having different and changing speeds. The objective is to minimize the makespan, that is, the time to execute the entire DAG. Asynchrony is modeled by an oblivious adversary, which is assumed to determine the processor speeds at each point in time. The oblivious adversary may change processor speeds arbitrarily and arbitrarily often, but makes speed decisions independently of any random choices of the scheduling algorithm. This paper gives bounds on the makespan of two randomized online firing-squad scheduling algorithms, ALL and LEVEL. These two schedulers are shown to have good makespan even when asynchrony is arbitrarily extreme. Let W and D denote, respectively, the number of tasks and the longest path in the DAG, and let gave denote the average speed of the p processors during the execution. In ALI, each processor repeatedly chooses a random task to execute from among all ready tasks (tasks whose predecessors have been executed). Scheduler ALL is shown to have a makespan T-p = circle minus (W/p pi(ave)), when W/D >= p log p circle minus ((log p)(alpha) W/p pi(ave) + (log p)(1-alpha) D/pi(ave)), when W/D = p(log p)(1-2 alpha), for alpha is an element of [0,1] circle minus (D/pi(ave)), when W/D <= p/log p, both expected and with high probability. A family of DAGs is exhibited for which this analysis is tight. In LEVEL each of the processors repeatedly chooses a random task to execute from among all critical tasks (ready tasks at the lowest level of the DAG). This second scheduler is shown to have a makespan of T-p = circle minus (W/p pi(ave) + [log* p - log* (pD/W)] W/pi(ave)), both expected and with high probability. Thus, LEVEL is always at least as good as ALL asymptotically, and sometimes better.
The proceedings contain 43 papers. The topics discussed include: publish and perish: definition and analysis of an n-person publication impact game;exponential separation of quantum and classical online space complexi...
详细信息
ISBN:
(纸本)1595934529
The proceedings contain 43 papers. The topics discussed include: publish and perish: definition and analysis of an n-person publication impact game;exponential separation of quantum and classical online space complexity;minimizing the stretch when scheduling flows of biological requests;position paper and brief announcement: the FG programming environment - good and good for you;efficient parallelalgorithms for dead sensor diagnosis and multiple access channels;on the communication complexity of randomized broadcasting in random-like graphs;strip packing with precedence constraints and strip packing with release times;on space-stretch trade-offs: lower bounds;a performance analysis of local synchronization;the cache complexity of multithreaded cache oblivious algorithms;and deterministic load balancing and dictionaries in the parallel disk model.
暂无评论