We study the parallel scheduling problem for a new modality of parallel computing: having one workstation "steal cycles" from another. We focus on a draconian mode of cycle-stealing, in which the owner of workstation B allows workstation A to take control of B's processor whenever it is idle, with the promise of relinquishing control immediately upon demand. The typically high communication overhead for supplying workstation B with work and receiving its results militates in favor of supplying B with large amounts of work at a time; the risk of losing work in progress when the owner of B reclaims the workstation militates in favor of supplying B with a sequence of small packets of work. The challenge is to balance these two pressures in a way that maximizes the amount of work accomplished. We formulate two models of cycle-stealing. The first attempts to maximize the expected work accomplished during a single episode, when one knows the probability distribution of the return of B's owner. The second attempts to match the productivity of an omniscient cycle-stealer, when one knows how much work that stealer can accomplish. We derive optimal scheduling strategies for sample scenarios within each of these models. Perhaps our most important discovery is the as-yet unexplained coincidence that two quite distinct scenarios lead to almost identical unique optimizing schedules. One scenario falls within our first model; it assumes that the probability of the return of B's owner is uniform across the lifespan of the episode; the optimizing schedule maximizes the expected amount of work accomplished during the episode. The other scenario falls within our second model; it assumes that B's owner will interrupt our cycle-stealing at most once during the lifespan of the opportunity; the optimizing schedule maximizes the amount of work that one is guaranteed to accomplish during the lifespan.
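The trade-off described in the abstract (large packets amortize communication overhead; small packets limit the work lost on interruption) can be made concrete with a small Monte Carlo sketch. The model below is an illustrative simplification, not the paper's formal model: each packet pays a fixed dispatch overhead, completed packets are saved, and the packet in progress is lost if the owner returns before it finishes; the return time is uniform over the episode lifespan. All parameter names and values are hypothetical.

```python
import random

def expected_work(packet_sizes, overhead, lifespan, trials=100_000):
    """Monte Carlo estimate of expected work accomplished when the
    owner's return time is uniform over [0, lifespan].
    Each packet costs `overhead` to dispatch; a packet's work counts
    only if the packet completes before the owner returns
    (illustrative model, not the paper's exact formulation)."""
    total = 0.0
    for _ in range(trials):
        t_return = random.uniform(0.0, lifespan)
        elapsed, done = 0.0, 0.0
        for w in packet_sizes:
            elapsed += overhead + w      # dispatch, then compute
            if elapsed > t_return:       # interrupted mid-packet: lost
                break
            done += w                    # packet completed and saved
        total += done
    return total / trials

# Same total work, two schedules: many small packets vs. a few large ones.
small = expected_work([1.0] * 10, overhead=0.2, lifespan=12.0)
large = expected_work([5.0] * 2, overhead=0.2, lifespan=12.0)
```

Varying the packet sizes in such a simulation exhibits exactly the tension the paper studies: shrinking packets raises total overhead, while growing them raises the expected loss at interruption.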
ISBN: (Print) 9781479928996
In this digital world, more than 90% of desktop and notebook computers have integrated Graphics Processing Units (GPUs) for better graphics processing. The Graphics Processing Unit is useful not only for graphics applications but also for non-graphics applications. In the past few years, the programmable graphics processor has evolved into an increasingly convincing computational resource. But the GPU sits idle if the graphics job queue is empty, which decreases the GPU's efficiency. This paper focuses on various tactics to overcome this problem and to make CPU-GPU processing more powerful and efficient. The programmable graphics processor, or Graphics Processing Unit, is especially well suited to problem sets expressed as data-parallel computations, with the same program executed on many data elements concurrently. The objective of this paper is to increase the capabilities and flexibility of recent GPU hardware combined with high-level GPU programming languages: to accelerate the building of images in a frame buffer intended for output to a display, and to provide tremendous acceleration for numerically intensive scientific applications. This paper also sheds some light on major application areas where the GPU is in use and where the GPU can be used in the future.
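The "same program executed on many data elements concurrently" pattern can be sketched without GPU hardware: the classic SAXPY kernel applies one operation to every element of an array. On a GPU, each element would map to a thread; the NumPy version below expresses the same data-parallel structure on the CPU (a minimal illustration, not taken from the paper).

```python
import numpy as np

def saxpy(a, x, y):
    """a*x + y applied elementwise: the canonical data-parallel kernel.
    On a GPU, one thread would handle one element; NumPy applies the
    same operation across the whole array in a single vectorized call."""
    return a * x + y

x = np.arange(1_000_000, dtype=np.float32)
y = np.ones_like(x)
z = saxpy(np.float32(2.0), x, y)   # one "program", a million data elements
```

The absence of cross-element dependencies is what makes such kernels embarrassingly parallel, and hence a natural fit for GPU offloading.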
Associative computation is characterized by the intertwining of search by content and data-parallel computation. This intertwining facilitates the integration of knowledge retrieval and data-parallel computation. Th...
Load balancing and data locality are the two most important factors affecting the performance of parallel programs running on distributed-memory multiprocessors. A good balancing scheme should evenly distribute the workload among the available processors and locate the tasks close to their data to reduce communication and idle time. In this paper, we study the load balancing problem of data-parallel loops with predictable neighborhood data references. The loops are characterized by variable and unpredictable execution time due to dynamic external workload. Nevertheless, the data referenced by each loop iteration exploits the spatial locality of stencil references. We combine an initial static BLOCK scheduling with a dynamic scheduling based on work stealing. Data locality is preserved by careful restrictions on the tasks that can be migrated. Experimental results on a network of workstations are reported.
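The hybrid scheme the abstract describes — a static BLOCK partition followed by restricted work stealing — can be sketched in a few lines. In this simplification, each processor starts with a contiguous block of loop iterations, and an idle processor may steal only from the far end of a victim's block, so the victim keeps the iterations adjacent to its remaining data (the stealing policy and parameters below are illustrative assumptions, not the paper's exact algorithm).

```python
from collections import deque

def make_schedule(n_iters, n_procs):
    """Initial static BLOCK partition: each processor owns a contiguous
    chunk of iterations, so stencil neighbors are mostly local."""
    size = -(-n_iters // n_procs)  # ceiling division
    return [deque(range(p * size, min((p + 1) * size, n_iters)))
            for p in range(n_procs)]

def steal(queues, thief, max_steal):
    """Dynamic phase: an idle processor steals only from the tail of the
    most loaded victim's block, preserving the victim's locality
    (illustrative locality restriction)."""
    victim = max(range(len(queues)), key=lambda p: len(queues[p]))
    if victim == thief or len(queues[victim]) <= 1:
        return
    for _ in range(min(max_steal, len(queues[victim]) // 2)):
        queues[thief].append(queues[victim].pop())  # take from the far end

queues = make_schedule(16, 4)   # blocks: 0-3, 4-7, 8-11, 12-15
queues[0].clear()               # processor 0 finished early
steal(queues, thief=0, max_steal=2)
```

Restricting steals to block boundaries is what keeps the interior of each block, where every stencil reference is local, on its original processor.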