Most parallel computing resources are controlled by batch schedulers that place requests for computation in a queue until access to compute nodes is granted. Queue waiting times are notoriously hard to predict, making it difficult for users not only to estimate when their applications may start, but also to pick, among multiple batch-scheduled resources, the one that produces the shortest turnaround time. As a result, an increasing number of users resort to "redundant requests": several requests are simultaneously submitted to multiple batch schedulers on behalf of a single job; once one of these requests is granted access to compute nodes, the others are canceled. Using simulation as well as experiments with a production batch scheduler, we investigate whether redundant requests are harmful in terms of (i) schedule performance and fairness, (ii) system load, and (iii) system predictability. We find that the two main issues with redundant requests are load on the middleware and unfairness towards users who do not use redundant requests, both of which depend on the number of users who use redundant requests and on the amount of request redundancy these users employ.
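The redundant-request mechanism described above can be illustrated with a minimal sketch. It is not the simulator or the production scheduler used in the study: the `Queue` class, its fixed `wait_time`, and the function name are hypothetical, and each queue's wait is assumed to be known in advance purely to keep the example deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class Queue:
    """Hypothetical batch queue with a fixed wait before a request starts."""
    name: str
    wait_time: float
    pending: set = field(default_factory=set)

    def submit(self, job_id: str) -> None:
        self.pending.add(job_id)

    def cancel(self, job_id: str) -> None:
        self.pending.discard(job_id)


def run_with_redundant_requests(job_id, queues):
    """Submit job_id to every queue, keep the first grant, cancel the rest.

    Returns (name of the winning queue, number of cancellations sent).
    """
    for q in queues:
        q.submit(job_id)
    # Assumption: the queue with the shortest wait grants access first.
    winner = min(queues, key=lambda q: q.wait_time)
    cancelled = 0
    for q in queues:
        if q is not winner:
            q.cancel(job_id)
            cancelled += 1
    # The granted request leaves its queue and starts running.
    winner.pending.discard(job_id)
    return winner.name, cancelled
```

Every job submitted this way produces one submission per queue and one cancellation for each queue that did not win, which is the kind of extra middleware traffic the abstract identifies as a concern.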
It is often difficult to efficiently execute a collection of jobs with complex job dependencies, owing to the temporal unpredictability of the grid. One way to mitigate the unpredictability is to schedule job execution in a manner that constantly maximizes the number of jobs that can be sent to workers. A recently developed scheduling theory provides a basis for meeting that optimization goal. Intuitively, when the number of such jobs is always large, high parallelism can be maintained even if the number of workers changes over time in an unpredictable manner. In this paper we present the design, implementation, and evaluation of a practical scheduling tool inspired by the theory. Given a DAGMan input file with interdependent jobs, the tool prioritizes the jobs. The resulting schedule significantly outperforms currently used schedules under a wide range of system parameters, as shown by simulation studies. For example, a scientific data analysis application, AIRSN, was executed at least 13% faster with 95% confidence. An implementation of the tool was integrated with the Condor high-throughput computing system.
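The goal of keeping many jobs eligible for dispatch can be illustrated with a small sketch. This is not the tool's algorithm or the scheduling theory it builds on; it is a plain greedy heuristic, assumed here only for illustration, that always runs the ready job whose completion leaves the most jobs eligible. The dependency dictionary and both function names are hypothetical, and the graph is assumed to be acyclic.

```python
def eligible(deps, done):
    """Jobs not yet run whose parents have all finished."""
    return {job for job, parents in deps.items()
            if job not in done and parents <= done}


def greedy_priority(deps):
    """Order jobs so that each step leaves as many eligible jobs as possible.

    deps maps each job name to the set of its parent jobs.
    Ties are broken by job name so the result is deterministic.
    """
    done, order = set(), []
    while len(done) < len(deps):
        ready = sorted(eligible(deps, done))
        # Pick the ready job whose completion leaves the most eligible jobs.
        best = max(ready, key=lambda j: len(eligible(deps, done | {j})))
        order.append(best)
        done.add(best)
    return order
```

For a graph in which job `a` has two children and job `b` has one, the heuristic runs `a` first, because finishing `a` makes three jobs eligible whereas finishing `b` makes only two.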
ISBN: 0769524346 (print)
Parallel production or rule-based systems, such as a parallel version of Jess, are needed for real applications. The proposed architecture for such a system is based on a wrapper that allows cooperation between several instances of Jess running on different computers. The system has been designed with the final goal of speeding up current P system simulators. Preliminary tests show its efficiency in this particular case and on classical benchmarks.
ISBN: 0769523129 (print)
Large-scale scientific computing applications frequently make use of closely coupled distributed parallel components. The performance of such scientific applications therefore depends on the component parts and their interaction at run time. This paper describes a methodology for predictive performance modelling of parallel applications composed of multiple interacting components. The fundamental steps and required operations involved in the modelling process are identified, including inter-component dataflow analysis, MxN communication performance evaluation, and composite performance model evaluation. A case study is presented to illustrate the modelling process, and the methodology is verified through experimental analysis.
ISBN: 0769523129 (print)
This paper describes a parallel algorithm for correlating or "fusing" streams of data from sensors and other sources of information. The algorithm is useful for applications where composite conditions over multiple data streams must be detected rapidly, such as intrusion detection or crisis management. The implementation of this algorithm on a multithreaded system and the performance of this implementation are also briefly described.
ISBN: 0769523129 (print)
A programmable Java distributed system, which utilises the free resources of a heterogeneous set of computers linked together by a network, has been developed. The system has been successfully deployed on over 200 computers distributed over a number of locations, and has been used to process bioinformatics, biomedical engineering, and cryptography applications. We present two bioinformatics applications: DSEARCH, which performs sensitive database searching, and DPRml, which performs distributed phylogeny reconstruction by maximum likelihood.
ISBN: 0769523129 (print)
We are interested in discovering the intrinsic dynamics of parallel applications, which are independent of the runtime environment, to aid in the development of appropriate tuning policies, especially dynamic load balancing policies. Based on the novel idea of profiling mesh-based applications at the fine granularity of individual mesh elements, this paper proposes a synthetic application simulator driven by a series of application signatures that map to the mesh structure. By integrating the ZOLTAN library into the system, our simulator provides a convenient test bed for developing and evaluating load balancing policies.
ISBN: 1424403073 (print)
Using a large HPC platform, we investigate the effectiveness of "symbiotic space-sharing", a technique that improves system throughput by executing parallel applications in combinations and configurations that alleviate pressure on shared resources. We demonstrate that relevant benchmarks commonly suffer a 10-60% penalty in runtime efficiency due to memory resource bottlenecks, and up to several orders of magnitude for I/O. We show that this penalty can often be mitigated, and sometimes virtually eliminated, by symbiotic space-sharing techniques, and we deploy a prototype scheduler that leverages these findings to improve system throughput by 20%.
ISBN: 0769523129 (print)
Scheduling is a fundamental issue in achieving high performance on metacomputers and computational grids. For the first time, the job scheduling problem for grid computing on metacomputers is studied as a combinatorial optimization problem. It is proven that the list scheduling algorithm can achieve a reasonable worst-case performance bound in grid environments supporting distributed supercomputing with large applications. It is also observed that communication heterogeneity has a significant impact on schedule lengths.
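For readers unfamiliar with list scheduling, the following is a minimal sketch of its textbook form for independent sequential jobs on identical machines. It is an illustrative simplification, not the grid variant analysed in the paper: it ignores communication costs, heterogeneity, and jobs that need several processors, and the function name and parameters are hypothetical.

```python
import heapq


def list_schedule(durations, machines):
    """Assign jobs, in list order, to whichever machine becomes free first.

    Returns (assignment, makespan), where assignment[i] is the index of
    the machine chosen for job i.
    """
    # Heap of (time at which the machine becomes free, machine index).
    free_at = [(0.0, m) for m in range(machines)]
    heapq.heapify(free_at)
    assignment = []
    makespan = 0.0
    for duration in durations:
        start, machine = heapq.heappop(free_at)
        assignment.append(machine)
        finish = start + duration
        makespan = max(makespan, finish)
        heapq.heappush(free_at, (finish, machine))
    return assignment, makespan
```

With durations `[3, 2, 2, 4]` on two machines, the jobs go to machines 0, 1, 1, and 0 in turn and the schedule finishes at time 7, whereas an optimal schedule finishes at time 6. Bounding this kind of gap is what a worst-case performance analysis addresses.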
ISBN: 0769523129 (print)
The Parallel-Horus framework, developed at the University of Amsterdam, is a unique software architecture that allows non-expert parallel programmers to develop fully sequential multimedia applications for efficient execution on homogeneous Beowulf-type commodity clusters. Previously obtained results for realistic but relatively small applications have shown the feasibility of the Parallel-Horus approach, with parallel performance consistently found to be optimal with respect to the abstraction level of message passing programs. In this paper we discuss the most serious challenge Parallel-Horus has had to deal with so far: the processing of over 184 hours of video included in the 2004 NIST TRECVID evaluation, i.e. the de facto international standard benchmark for content-based video retrieval. Our results and experiences confirm that Parallel-Horus is a very powerful support tool for state-of-the-art research and applications in multimedia processing.