In this paper, we introduce a framework for load balancing using mobile agent named EALBMA(Efficient and Adaptive Load Balancing based on Mobile Agent) firstly. The framework can resolve some problems in traditional l...
详细信息
In this paper, we introduce a framework for load balancing using mobile agent named EALBMA(Efficient and Adaptive Load Balancing based on Mobile Agent) firstly. The framework can resolve some problems in traditional load balancing, including structure of system, updating load information, and adjusting strategies of load balancing. Secondly, the paper analyses some traditional algorithms for updating load information and their disadvantages. Then, aiming at these disadvantages, we propose a novel algorithm for updating load information partially based on mobile agent called ULIMA. Finally, from our simulation experiment results, we draw conclusions that it is reasonable and feasible to introduce mobile agent to load balancing, and the performance of ULIMA is improved.
OpenMP is an informal industry standard for programming parallel computers with a shared memory and has during the last few years achieved considerable acceptance in both the academic world and the industry. OpenMP is...
详细信息
OpenMP is an informal industry standard for programming parallel computers with a shared memory and has during the last few years achieved considerable acceptance in both the academic world and the industry. OpenMP is a thread-level fork-join programming model and relies on a set of compiler directives. An OpenMP aware compiler uses these directives to generate a multi-threaded application. In practice, an OpenMP run-time library is also needed as OpenMP specifies a set of run-time library calls. In this paper we report on a free OpenMP compiler and run-time library infrastructure. We present an OpenMP compiler for C called OdinMP and briefly discuss the run-time library that the compiler targets. The source code to both the compiler and the run-time libraries are available and can be freely used for OpenMP research. The compilation system is evaluated using the EPCC micro-benchmark suite for OpenMP and a set of applications from the SPLASH-2 benchmarks suite ported to OpenMP. Comparisons are made to OpenMP aware compiler systems from SGI and Intel. The performance of code generated with the presented compilation system is shown to be very close to or exceeding that of commercial compilers for a wide range of benchmark applications.
As distributedsystems continue to evolve, automatic resource management is becoming more and more important. The resource management system must be able to dynamically handle large heterogeneous systems in a way that...
详细信息
As distributedsystems continue to evolve, automatic resource management is becoming more and more important. The resource management system must be able to dynamically handle large heterogeneous systems in a way that gives good performance and resource utilization. In the performance context, allocating software modules to nodes in an efficient way is of high interest. This paper considers the problem of allocating software modules to processing nodes in an automatic dynamic manner using module migration algorithms. The module allocation problem is NP-complete and many heuristics have been proposed. However, in systems where the workload changes over time, it may be infeasible to update module allocation often enough to handle changes in workload. This paper presents the Match-maker algorithm that performs load balancing by pairing overloaded nodes with under-loaded ones, initiating module migration within the pair. The paper presents a load balancing optimization problem, and uses the benchmark problem to evaluate the algorithm. In addition, the Match-maker algorithm is compared with other previously described algorithms for module migration. The Matchmaker algorithm is found to be fast and efficient in reducing load imbalance in distributedsystems, especially for large systems.
HPL is a Linpack benchmark package widely used in massive cluster system performance test. Based on indepth analysis of the blocked parallel solution algorithm of linear algebra equations and HPL implementation mechan...
详细信息
HPL is a Linpack benchmark package widely used in massive cluster system performance test. Based on indepth analysis of the blocked parallel solution algorithm of linear algebra equations and HPL implementation mechanics, the restriction factors of the HPL peak performance are probed. The influence of main parameters P×Q and NB on computing performance in LU factorization process is discussed especially. Theory analysis and experiment results indicate that, the efficiency factor, defined as the ratio of the matrix operation time and the amount of operations, is relevant to the matrix block size to a great extent, yet little correlate to the matrix size itself. According to this law, the author proposes to determine the block size NB of massively parallel test through scanning the operation efficiency of proper small-scale matrix. Thus it improves the status getting NB through trial-and-error experiment without definite aim. The author's idea has been verified by the real test results, as well as been applicable to the matrix parallel operations of other courses.
Communications and the scheduling of tasks are the two most important issues of parallel programming on clusters. Over time, various parallelcomputing programming models such as remote threads, transparent process mi...
详细信息
Communications and the scheduling of tasks are the two most important issues of parallel programming on clusters. Over time, various parallelcomputing programming models such as remote threads, transparent process migration, message passing, distributed shared memory and optimizing parallel compilers have emerged, assisting the programmer to develop applications which can work seamlessly in such environments. Considerable research has been done to optimize these models, which typically have large communications overheads, resulting in a detrimental effect on performance. In addition to this, their acceptance has varied by virtue of the fact that each has introduced new problems with reference to portability, scalability and usability. Sometimes these problems completely violate the underlying notion of such computing. To overcome these issues Pshell, which provides transparent scheduling and communication of jobs between disparate hosts, has been developed. Pshell is the 'glue' for producing high-performance parallel applications that can work securely and efficiently in heterogeneous environments. It represents a major shift in traditional parallel programming environments because it is a language using the syntax of the Bourne shell sh(l). The Bourne shell process and communications models can easily be extended to parallelcomputing environments using the concurrent programming model. This paper will examine the evolution of cluster computing and identify where deficiencies lie in current programming models, providing a justification for simple languages while providing the reader with an understanding of the Pshell programming environment. In addition to this, the paper illustrates how such a language can be used to ease the process of writing parallel applications and to overcome some of the limitations inherent in traditional programming models without sacrificing performance.
systems involving multiple processes are often most easily programmed using critical regions. Mutual exclusion is the dilemma of assurance that certain portions of program code are executed within the critical regions...
详细信息
systems involving multiple processes are often most easily programmed using critical regions. Mutual exclusion is the dilemma of assurance that certain portions of program code are executed within the critical regions, where no two programs are permitted to be in critical regions at the same time. So, arranging mutual exclusion plays a significant role in the domain of both centralized and distributedsystems. Unfortunately, all the three basic approaches - centralized, distributed and token ring proposed for achieving mutual exclusion in distributedsystems are said to be good for distributedsystems in some abstract way only. In this paper, we have presented a ring-based algorithm for arranging mutual exclusion in distributedsystems. We have compared the conventional algorithms with the algorithm presented in this paper and shown that our algorithm is more fault-tolerant than all the traditional algorithms and requires less number of message-passing with reduced amount of load on coordinator.
This paper discusses some of the salient issues involved in implementing the illusion of a shared-memory programming model across a group of distributed memory processors from a cluster through to an entire Grid. This...
详细信息
This paper discusses some of the salient issues involved in implementing the illusion of a shared-memory programming model across a group of distributed memory processors from a cluster through to an entire Grid. This illusion can be provided by a distributed shared memory (DSM) runtime system. Mechanisms that have the potential to increase the performance by minimizing high-latency intra site messages & data transfers are highlighted. Relaxed consistency models are investigated, as well as the use of a grid information system to ascertain topology information. The latter allows for hierarchy-aware management of shared data and synchronization variables. The process of incremental hybridization is also explored, where more efficient message-passing mechanisms can incrementally replace DSM actions when circumstances dictate that performance improvements can be obtained. In this paper we describe the overall design/architecture of a prototype system, SMG, which integrates DSM and message passing paradigms and may be the target of an OpenMP compiler. Our initial findings based on some trivial examples indicate some of the potential benefits that can be obtained for grid Applications.
Two new algorithms are proposed for scheduling tasks in heterogeneous distributedsystems where tasks may have arbitrary dependencies and estimates of execution time are imperfect. A new Dynamic Adaptive Random Schedu...
详细信息
Two new algorithms are proposed for scheduling tasks in heterogeneous distributedsystems where tasks may have arbitrary dependencies and estimates of execution time are imperfect. A new Dynamic Adaptive Random Scheduler (DARS) algorithm is proposed. DARS uses a randomized ordering technique to search for a near optimum schedule while monitoring the status of each previously dispatched task such that tasks are migrated according to the current estimates of task execution time. A proposed new Load Balancing (LB) algorithm distributes tasks initially by a heuristic that accounts for heterogeneity and dependencies, and while tasks are executing it migrates tasks to rebalance the load whenever a host becomes idle such that the number of migrations is minimized and each migration accounts for dependencies. The algorithms were evaluated by simulation of various scenarios and for various levels of uncertainty in execution time estimated values. The best algorithm is highly dependent on the nature of the load and the level of uncertainty in the execution time estimates.
This paper presents a design of a reconfigurable parallel prefix computation hardware on FPGAs. The design is based on a pipelined dataflow algorithm, and control logics are added to reconfigure the system for the arb...
详细信息
This paper presents a design of a reconfigurable parallel prefix computation hardware on FPGAs. The design is based on a pipelined dataflow algorithm, and control logics are added to reconfigure the system for the arbitrary parallelism degree. The design receives multiple input streams of elements in parallel, and produces output streams in parallel. The design has the great advantage of controlling the degree of parallelism explicitly at run time. Time complexity of the design is O(d + (N-d) / d), where d and N are parallelism degree and stream size, respectively. When the stream size is very big, the initial trigger time d in the time complexity can be ignored and we get O(N/d). Unlike the software parallel prefix algorithms, the design works for both known and unknown sized data. The design is implemented and tested for target devices Xilinx Spartan2S XC2S300EFT256-6Q and XC2S600EFG676-6.
Using distributedcomputing to harness idle CPU cycles from PCs across the Internet is an economically attractive solution for solving many problems. Research estimates that the average wasted idle time for Internet c...
详细信息
Using distributedcomputing to harness idle CPU cycles from PCs across the Internet is an economically attractive solution for solving many problems. Research estimates that the average wasted idle time for Internet connected PCs is over 90%. If a fraction of these wasted cycles could be harnessed and presented in a reliable and consistent manner, they could be traded as a commodity. Consumers in this market would expect Service Level Agreements (SLA) with guarantees on Quality of Service. Providers in this market would need to bring consistency and stability to this highly volatile environment. In this paper we explore this topic by presenting SLA contracts for purchasing idle CPU units from a PC Grid, describing the required QoS parameters, and demonstrating how the use of fault-tolerant probabilistic task scheduling with statistical inference could be used to bring stability and consistency to an Internet PC Grid.
暂无评论