In this paper we consider the problem of generating random permutations on small parallel machines. The machines that we have in mind are shared memory machines with a constant number of processors such as the Sequent...
详细信息
ISBN:
(纸本)0897913701
In this paper we consider the problem of generating random permutations on small parallel machines. The machines that we have in mind are shared memory machines with a constant number of processors such as the Sequent Symmetry. We describe a parallel implementation of the 'shuffling' algorithm for generating a random permutation. If the hardware operates in a fair manner, this algorithm generates a fully random permutation. However, if the machine resolves contention in a malicious manner, then the algorithm does not generate permutations uniformly. We give almost tight bounds on the degree that an adversary can reduce the randomness. We also discuss the cost of locking data in the algorithm and present a method of generating random permutations with substantially reduced locking cost.
A perfect hash function for a (multi)set X of n integers is an infective function h : X → {1,., s}, where s = O(n), that can be stored in O(n) space and evaluated in constant time by a single processor. We show that ...
详细信息
We consider the question of whether a shared-memory model can serve as an effective bridging model for parallel computation along the lines of a distributed-memory model such as the BSP. As a candidate for a shared-me...
详细信息
We consider the question of whether a shared-memory model can serve as an effective bridging model for parallel computation along the lines of a distributed-memory model such as the BSP. As a candidate for a shared-memory bridging model, we introduce the Queuing Shared Memory (QSM) model, which accounts for limited communication bandwidth while still providing a simple shared-memory abstraction. We substantiate the ability of the QSM to serve as a bridging model by providing a simple work-preserving emulation of the QSM on both the BSP, and on a related model, the (d,x)-BSP. We present evidence that the features of the QSM are essential to its effectiveness as a bridging model. In addition, we describe scenarios in which the high-level QSM is more suited for analyzing algorithms on certain machines than the more detailed BSP and LogP models. Finally, we present algorithmic results for the QSM, as well as general strategies for mapping algorithms designed for the BSP or PRAM models onto the QSM model. Our main conclusion is that shared-memory models can potentially serve as viable alternatives to existing message-passing, distributed-memory bridging models.
Some parallelalgorithms have the property that, as they are allowed to take more time, the total work that they do is reduced. This paper describes three such algorithms that find the strongly connected components of...
详细信息
In this paper we investigate algorithmic instruments leading to low power consumption in computing devices. While previous work on energy-efficient algorithms has mostly focused on single processor environments, in th...
详细信息
ISBN:
(纸本)9781595936677
In this paper we investigate algorithmic instruments leading to low power consumption in computing devices. While previous work on energy-efficient algorithms has mostly focused on single processor environments, in this paper we investigate multi-processor settings. We study the basic problem of scheduling a set of jobs, each specified by a release time, a deadline and a processing volume, on variable speed processors so as to minimize the total energy consumption. We first settle the complexity of speed scaling with unit size jobs. More specifically, we devise a polynomial time algorithm for agreeable deadlines and prove NP-hardness results for arbitrary release dates and deadlines. For the latter setting we also develop a polynomial time algorithm achieving a constant factor approximation guarantee that is independent of the number of processors. Additionally, we study speed scaling of jobs with arbitrary processing requirements and, again, develop constant factor approximation algorithms. We finally transform our offline algorithms into constant competitive online strategies.
The proceedings contain 54 papers. The topics discussed include: expediting hazard pointers with bounded RCU critical sections;Alock: asymmetric lock primitive for RDMA systems;when is parallelism fearless and zero-co...
ISBN:
(纸本)9798400704161
The proceedings contain 54 papers. The topics discussed include: expediting hazard pointers with bounded RCU critical sections;Alock: asymmetric lock primitive for RDMA systems;when is parallelism fearless and zero-cost with rust?;efficient parallel reinforcement learning framework using the reactor model;parallel best arm identification in heterogeneous environments;brief announcement: lock-free learned search data structure;brief announcement: LIT: lookup interlocked table for range queries;brief announcement: a fast scalable detectable unrolled lock-based linked list;scheduling out-trees online to optimize maximum flow;optimizing dynamic data center provisioning through speed scaling: a primal-dual perspective;scheduling jobs with work-inefficient parallel solutions;and multi bucket queues: efficient concurrent priority scheduling.
Let T be a collection of data templates of an N × N matrix, T = {row, column, forward diagonal, backward diagonal}. In the context of parallel processing, the question of whether it is possible or not for a paral...
详细信息
A parallel variant of breadth-first search for distributed computing is presented. The variant allows exhaustive enumeration of elements of a search space (implicitly defined graph) in which the representation of all ...
详细信息
ISBN:
(纸本)9780897918909
A parallel variant of breadth-first search for distributed computing is presented. The variant allows exhaustive enumeration of elements of a search space (implicitly defined graph) in which the representation of all graph nodes would otherwise require more than the total available memory. This algorithm requires the use of a tadpole data structure to partition the search space into connected subgraphs, with each subgraph stored within the memory of a single processor. Thus, the graph of the nodes are stored in a way that adds greater spatial locality, thereby reducing communication among processors. The algorithm enumerates the tadpoles in a breadth-first manner while executing depth-first search within each tadpole. The search within each tadpole is reduced to tree search. A parameter is defined that allows a linear tradeoff in which memory and communication are each reduced linearly as CPU time grows. The result appears to fill a gap in the literature, which has concentrated on parallel depth-first search, but has been relatively sparse in the case of parallel breadth-first search.
This paper considers the problem of creating message-passing protocols for parallel computers. It is assumed that the processors are connected by a network that provides guaranteed delivery of every message, provided ...
详细信息
This paper investigates the parallel time and processor complexities of several searching problems involving Monge and Monge-composite arrays. We present array-searching algorithms for concurrent-read-concurrent-write...
详细信息
ISBN:
(纸本)0897913701
This paper investigates the parallel time and processor complexities of several searching problems involving Monge and Monge-composite arrays. We present array-searching algorithms for concurrent-read-concurrent-write (CRCW) PRAMs, concurrent-read-exclusive-write (CREW) PRAMs, hypercubes, cube-connected-cycles, and shuffle-exchange networks. All these algorithms run in optimal time, and their processor-time products are all within an O(lg n) factor of the worst-case sequential bounds. Several applications of these algorithms are also given. Two applications improve previous results substantially, and the others provide novel parallelalgorithms for problems not previously considered.
暂无评论