ISBN: 0769512607 (print)
In this paper we review existing loop scheduling algorithms and describe the feedback-guided dynamic loop scheduling (FGDLS) algorithm proposed in Bull et al. [2] and Bull [1]. The FGDLS algorithm uses a feedback mechanism to schedule a parallel loop within a sequential outer loop. It has been shown to perform well for scheduling problems in which the load associated with the parallel loop changes relatively slowly as the outer sequential loop executes. However, the convergence of the FGDLS algorithm has remained an open question. In this paper we establish sufficient conditions (essentially requiring that the workload does not change too rapidly with loop iteration count) for the global convergence of a continuous analogue of the feedback-guided algorithm.
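The feedback idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes real-valued block boundaries (in the spirit of the continuous analogue), a workload that is treated as constant within each block, and a hypothetical workload density used only to simulate the measured times. The names `measure_times`, `repartition`, and `imbalance` are illustrative.

```python
from typing import Callable, List


def measure_times(bounds: List[float],
                  density: Callable[[float], float],
                  samples: int = 200) -> List[float]:
    """Simulate the feedback: integrate a hypothetical workload density
    over each block with the midpoint rule."""
    times = []
    for lo, hi in zip(bounds, bounds[1:]):
        h = (hi - lo) / samples
        times.append(sum(density(lo + (i + 0.5) * h) for i in range(samples)) * h)
    return times


def repartition(bounds: List[float], times: List[float]) -> List[float]:
    """One feedback step: choose new boundaries so that every block gets an
    equal share of the work, assuming the work rate inside each old block
    was constant (measured time divided by block width)."""
    p = len(times)
    if len(bounds) != p + 1 or any(t <= 0 for t in times):
        raise ValueError("need p+1 boundaries and positive times")
    target = sum(times) / p
    new_bounds = [bounds[0]]
    k = 0        # index of the old block containing the next boundary
    acc = 0.0    # work accumulated in old blocks before block k
    for j in range(1, p):
        goal = j * target
        while k < p - 1 and acc + times[k] < goal:
            acc += times[k]
            k += 1
        rate = times[k] / (bounds[k + 1] - bounds[k])
        new_bounds.append(bounds[k] + (goal - acc) / rate)
    new_bounds.append(bounds[-1])
    return new_bounds


def imbalance(times: List[float]) -> float:
    """Ratio of the slowest to the fastest block; 1.0 is perfectly balanced."""
    return max(times) / min(times)
```

In an actual outer sequential loop, the times would come from timing each processor's block in the previous outer iteration rather than from `measure_times`, and the boundaries would be rounded to integer loop indices.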
ISBN: 3540422935 (print)
This paper presents a methodology for designing a distributed shared memory by decomposing it into two layers. An application-independent layer supplies the basic functionalities to access shared structures and optimizes these functionalities according to the underlying architecture. On top of this layer, which can be seen as an application-independent run-time support, an application-dependent layer defines the most suitable consistency model for the considered class of applications and implements the most appropriate caching and prefetching strategies for that consistency model. To exemplify this methodology, we introduce DVSA, a package that implements the application-independent layer, and SHOB, an example of the second layer. SHOB defines a release consistency model for iterative numerical algorithms and implements the corresponding caching and prefetching strategies. We present some experimental results of the methodology and discuss the performance of a uniform multigrid method developed through SHOB on a massively parallel architecture, the Meiko CS2, and on a cluster of workstations.
Static scheduling is the temporal and spatial mapping of a program to the resources of a parallel system. Scheduling algorithms use the Directed Acyclic Graph (DAG) to represent the sub-tasks and the precedence-constr...
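As an illustration of mapping a DAG of sub-tasks to processors in time and space, here is a minimal list-scheduling sketch. It is not the algorithm of this paper (the abstract is truncated): it assumes identical processors, known task costs, and zero communication cost, and the function name `list_schedule` is hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, Iterable, Tuple


def list_schedule(cost: Dict[str, float],
                  deps: Dict[str, Iterable[str]],
                  num_procs: int) -> Tuple[Dict[str, Tuple[int, float]], float]:
    """Greedy list scheduling of a task DAG.

    cost: task -> execution time
    deps: task -> predecessors that must finish first
    Returns (task -> (processor, start time), makespan).
    """
    graph = {t: tuple(deps.get(t, ())) for t in cost}
    order = TopologicalSorter(graph).static_order()  # predecessors come first
    proc_free = [0.0] * num_procs   # time at which each processor becomes idle
    finish: Dict[str, float] = {}
    placement: Dict[str, Tuple[int, float]] = {}
    for task in order:
        # A task is ready once all of its predecessors have finished.
        ready = max((finish[d] for d in graph[task]), default=0.0)
        # Spatial choice: the processor on which the task can start earliest.
        proc = min(range(num_procs), key=lambda q: max(proc_free[q], ready))
        start = max(proc_free[proc], ready)
        finish[task] = start + cost[task]
        proc_free[proc] = finish[task]
        placement[task] = (proc, start)
    return placement, max(finish.values())
```

For example, with `cost = {"a": 2.0, "b": 3.0, "c": 1.0, "d": 2.0}`, `deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}`, and two processors, tasks `b` and `c` run concurrently after `a`, and `d` starts once both have finished.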
A novel parallel architecture for estimating computationally intensive 4th-order cumulants is presented. Different from most systolic array implementations, a MIMD array processor is used to efficiently compute the cu...
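To show what is being computed, the following sketch estimates a fourth-order cumulant from samples using the standard definition for a zero-mean, real, stationary process: the fourth-order moment minus three products of second-order cumulants. It is a plain sequential illustration, not the architecture described in the abstract; the helper names `_lagged_mean` and `cumulant4` are hypothetical.

```python
from typing import Sequence


def _lagged_mean(x: Sequence[float], lags: Sequence[int]) -> float:
    """Sample mean of prod_k x[n + lags[k]] over all n with valid indices."""
    lo = max(0, -min(lags))
    hi = len(x) - max(0, max(lags))
    if hi <= lo:
        raise ValueError("lags too large for the data length")
    total = 0.0
    for n in range(lo, hi):
        prod = 1.0
        for lag in lags:
            prod *= x[n + lag]
        total += prod
    return total / (hi - lo)


def cumulant4(x: Sequence[float], t1: int, t2: int, t3: int) -> float:
    """Estimate c4(t1, t2, t3) =
    E[x(n)x(n+t1)x(n+t2)x(n+t3)]
    - c2(t1)c2(t2-t3) - c2(t2)c2(t3-t1) - c2(t3)c2(t1-t2)."""
    mean = sum(x) / len(x)
    xc = [v - mean for v in x]          # remove the sample mean first

    def c2(tau: int) -> float:
        # c2 is symmetric in the lag for a real stationary process
        return _lagged_mean(xc, (0, abs(tau)))

    m4 = _lagged_mean(xc, (0, t1, t2, t3))
    return (m4
            - c2(t1) * c2(t2 - t3)
            - c2(t2) * c2(t3 - t1)
            - c2(t3) * c2(t1 - t2))
```

Each lag triple can be evaluated independently of the others, which is one reason this kind of computation is a natural candidate for parallel hardware.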
ISBN: 0769512305 (print)
This paper addresses the problem of selecting the best object navigation strategy for a query to be run in parallel when multiple alternative strategies are available. This is done by performing experiments with three pointer-based join algorithms, running simple and complex queries in a parallel ODMG-compliant object database system. We believe our results provide insights into how the physical properties of queries affect the performance of different join operators in a parallel environment.
ISBN: 354065013X (print)
We assume w.l.o.g. that every learning algorithm with membership and equivalence queries proceeds in rounds. In each round it poses a polynomial number of queries in parallel and, after receiving the answers, performs internal computations before starting the next round. The query depth is defined as the number of rounds. In this paper we show that, assuming the existence of cryptographic one-way functions, for any fixed polynomial d(n) there exists a concept class that is efficiently and exactly learnable with membership queries in query depth d(n) + 1, but cannot be weakly predicted with membership and equivalence queries in depth d(n). Hence, with respect to query depth, efficient learning algorithms for this concept class cannot be parallelized. We also discuss applications to random-self-reductions and coherent sets. (C) 2001 Elsevier Science B.V. All rights reserved.
In this work we describe several portable sequential and parallel algorithms for solving the inverse eigenproblem for real symmetric Toeplitz matrices. The algorithms are based on Newton’s method (and some variations...
ISBN: 3540424954 (print)
The proceedings contain 130 papers. The special focus in this conference is on parallel processing. The topics include: software component technology for high performance parallel and grid computing; connecting computational requirements with computing resources; a tool for binding threads to processors; a distributed object infrastructure for interaction and steering; optimal polling for latency-throughput tradeoffs in queue-based network interfaces for clusters; performance prediction of data-dependent task parallel programs; the hardware performance monitor toolkit; VIA communication performance on a Gigabit Ethernet cluster; group-based performance analysis for multithreaded SMP cluster applications; exploiting unused time slots in list scheduling considering communication contention; an evaluation of partitioners for parallel SAMR applications; load balancing on networks with dynamically changing topology; approximation algorithms for scheduling independent malleable tasks; load redundancy elimination on executable code; using a swap instruction to coalesce loads and stores; data-parallel compiler support for multipartitioning; parallel and distributed databases, data mining and knowledge discovery; an experimental performance evaluation of join algorithms for parallel object databases; a classification of skew effects in parallel database systems; experiments in parallel clustering with DBSCAN; analysis of the cycle structure of permutations; scanning biosequence databases on a hybrid parallel architecture; experiences in using MPI-IO on top of GPFS for the IFS weather forecast code; improving conditional branch prediction on speculative multithreading architectures; performances of a dynamic threads scheduler and self-stabilizing neighborhood unique naming under unfair scheduler.
We have developed and evaluated two parallelization schemes for a tree-based k-means clustering method on shared memory machines. One scheme is to partition the pattern space across processors. We have determined that...
In this paper, we propose a parallel I/O system utilizing a parallel commodity network attached to multiple I/O processors of parallel processing systems. I/O requests from user applications are automatically distributed...