The goal of the μDatabase project is to understand the behaviour of data structures and their algorithms, both parallel and sequential, in a memory-mapped environment. Memory mapping allows primary-memory pointer-bas...
详细信息
The goal of the μDatabase project is to understand the behaviour of data structures and their algorithms, both parallel and sequential, in a memory-mapped environment. Memory mapping allows primary-memory pointer-based data structures to be stored on secondary storage, and subsequently traversed and modified, without transforming the embedded pointers. The project incorporates a collaboration between practitioners and theoreticians and has produced a toolkit to support a parallel memory-mapped environment, extensive concurrency tools, an analytical model for the system validated by experiments, and preliminary results involving parallel database join algorithms as well as sequential B-tree and R-tree data structures. Future work includes augmenting the toolkit, developing and testing more parallelalgorithms tuned for efficiency in a memory-mapped environment, and extending the current model to cover more general algorithms and capture more low-level details of the physical machine.
We describe a very simple deterministic parallel algorithm for linear programming in fixed dimension d that takes poly(log log n) time in the common CRCW PRAM model and does optimal O(n) work. Our algorithm is based o...
详细信息
ISBN:
(纸本)9780897918091
We describe a very simple deterministic parallel algorithm for linear programming in fixed dimension d that takes poly(log log n) time in the common CRCW PRAM model and does optimal O(n) work. Our algorithm is based on multidimensional search and an effective use of approximation algorithms to speed-up the basic search in the CRCW model. Our method also yields very fast poly(log log n) algorithm for smallest enclosing sphere and approximate ham-sandwich cuts as well as an O(log n) time work-optimal algorithm for exact ham-sandwich cuts of separable point sets. For all these problems, particularly for the fixed-dimensional linear programming, o(log n) time efficient deterministic PRAM algorithms were not known until very recently.
Gang scheduling is a resource management scheme for parallel and distributed systems that combines time-sharing with space-sharing to ensure short response times for interactive tasks and high overall system throughpu...
详细信息
Gang scheduling is a resource management scheme for parallel and distributed systems that combines time-sharing with space-sharing to ensure short response times for interactive tasks and high overall system throughput. In this paper, we present and analyze a queueing theoretic model for a general gang scheduling scheme that forms the basis of a multiprogramming environment currently being developed for IBM's SP2 parallel system and for clusters of workstations. Our model and analysis can be used to tune our scheduler in order to maximize its performance on each hardware platform.
The Bulk-Synchronous parallel (BSP) model was proposed by Valiant as a model for general-purpose parallel computation. The objective of the model is to allow the design of parallel programs that can be executed effici...
详细信息
The Bulk-Synchronous parallel (BSP) model was proposed by Valiant as a model for general-purpose parallel computation. The objective of the model is to allow the design of parallel programs that can be executed efficiently on a variety of architectures. While many theoretical arguments in support of the BSP model have been presented, the degree to which the model can be efficiently utilized on existing parallel machines remains unclear. To explore this question, we implemented s small library of BSP functions, called the Green BSP library, on several parallel platforms. We also created a number of parallel applications based on this library. Here, we report on the performance of six of these applications on three different parallel platforms. Our preliminary results suggest that the BSP model can be used to develop efficient and portable programs for a range of machines and applications.
A quantitative comparison of the BSP and LogP models for parallel computation is developed. Very efficient cross simulations between the two models are derived, showing their substantial equivalence for algorithmic de...
详细信息
ISBN:
(纸本)9780897918091
A quantitative comparison of the BSP and LogP models for parallel computation is developed. Very efficient cross simulations between the two models are derived, showing their substantial equivalence for algorithmic design guided by asymptotic analysis. It is also shown that the two models can be implemented with similar performance on most point-to-point networks. In conclusion, within the limits of our analysis that is mainly of asymptotic nature, BSP and LogP can be viewed as closely related variants within the bandwidth-latency framework for modeling parallel computation. BSP seems somewhat preferable due to greater simplicity and portability, and slightly greater power.
NEighborhood MOdeling (NEMO) provides transparent parallelism to develop parallel spatial data processing. It addresses the following parallel issues: architecture and machine independence;communication bottlenecks;da...
详细信息
NEighborhood MOdeling (NEMO) provides transparent parallelism to develop parallel spatial data processing. It addresses the following parallel issues: architecture and machine independence;communication bottlenecks;data visualization;casualty errors;load balancing;and data coherence. NEMO is capable of processing three types of time consuming raster neighborhood models: cellular automata;propagation;and neighborhood analysis. NEMO achieves this flexibility by including five components to its design: the three application drivers such as the cellular automata driver, propagation driver, and neighborhood analysis automata driver;and the display manager and raster database manager.
The issue of effectiveness of private caches for processors were studied. Since time for all processors to access the shared memory simultaneously is usually much longer than the time for a processor to access its own...
详细信息
ISBN:
(纸本)9780897918091
The issue of effectiveness of private caches for processors were studied. Since time for all processors to access the shared memory simultaneously is usually much longer than the time for a processor to access its own private cache, scheduling with private caches falls into the distributed memory model where the lower bound applies. The effectiveness of private caches were shown by proving that a version of Dynamic Equi-partition Scheduling Policy (DEQ) achieves a mean response time with five times the optimal mean response time in the cache clock time for a large class of parallel jobs well accepted in the parallel scheduling community. This shows an improvement of system performance by using private caches over that of purely shared memory.
We present new BSP algorithms for deterministic sorting and randomized median finding. We sort n general keys by using a partitioning scheme that achieves the requirements of efficiency (one-optimality) and insensitiv...
详细信息
ISBN:
(纸本)9780897918091
We present new BSP algorithms for deterministic sorting and randomized median finding. We sort n general keys by using a partitioning scheme that achieves the requirements of efficiency (one-optimality) and insensitivity against data skew (the accuracy of the splitting keys depends solely on the step distance, which can be adapted to meet the worst-case requirements of our application). Although we employ sampling in order to realize efficiency, we can give a precise worst-case estimation of the maximum imbalance which might occur. We also investigate optimal randomized BSP algorithms for the problem of finding the median of n elements that require, with high-probability, 3n/(2p)+o(n/p) number of comparisons, for a wide range of values of n and p. Experimental results for the two algorithms are also presented.
We present parallelalgorithms that maintain a 2-3 tree under insertions and deletions. The algorithms are designed for an extension of Valiant's BSP model, BSP*, that and reduction of the overhead involved in com...
详细信息
ISBN:
(纸本)9780897918091
We present parallelalgorithms that maintain a 2-3 tree under insertions and deletions. The algorithms are designed for an extension of Valiant's BSP model, BSP*, that and reduction of the overhead involved in communication. The BSP*-model is introduced by Baumker et al. in [2]. Our analysis of the data structure goes beyond standard asymptotic analysis: We use Valiant's notion of c-optimality. Intuitively c-optimal algorithms tend to speedup p/c with growing input size (p denotes the number of processors), where the communication time is asymptotically smaller than the computation time. Our first approach allows 1-optimal searching and amortized c-optimal insertion and deletion for a small constant c. The second one allows 2-optimal searching, and c-optimal deletion and insertion for a small constant c. Both results hold with probability 1-o(1) for wide ranges of BSP*-parameters, where the ranges become larger with growing input sizes. The first approach allows much larger ranges. Further, both approaches are memory efficient, their total amount of memory used is proportional to the size m of the set being stored. Our results improve previous results by supporting a fully dynamic search tree rather than a static one, and by significantly reducing the communication time. Further our algorithms use blockwise communication.
The computational requirements for an adaptive solution of unsteady problems change as the simulation progresses. This causes workload imbalance among processors on a parallel machine which, in turn, requires signific...
详细信息
The computational requirements for an adaptive solution of unsteady problems change as the simulation progresses. This causes workload imbalance among processors on a parallel machine which, in turn, requires significant data movement at runtime. We present a dynamic load-balancing framework, called JOVE, that balances the workload across all processors with a global view each time the computational mesh is adapted. JOVE has been implemented on an SP2 in MPI for portability. Experimental results for two model meshes demonstrate that mesh adaption with load balancing gives more than a sixfold improvement over one without load balancing. Furthermore, JOVE gives a 24-fold speedup on 64 processors compared to sequential execution.
暂无评论