ISBN: 9780889868786 (print)
Task scheduling is very important for the efficient execution of large-scale workflows in distributed computing environments. Static scheduling schemes achieve high performance when executing workflows in stable environments. However, their scheduling costs are very high for large-scale workflows, and they may perform poorly if system performance changes dynamically. Demand-driven scheduling schemes achieve high performance for independent tasks, but they may perform poorly for workflows. Dynamic rescheduling schemes combine the advantages of these two types of schemes, because tasks are rescheduled using algorithms from static scheduling schemes when the performance changes. Thus, better performance can be achieved in workflow applications, although the scheduling costs increase if system performance changes frequently. Therefore, a new dynamic scheduling scheme is proposed to reduce the number of rescheduled tasks. The rescheduling trigger of this scheme is based on task terminations. Moreover, by checking dependencies between tasks, the scheme reduces the number of rescheduled tasks compared to a scheme using a simple task termination trigger. Evaluation using an abstract simulation demonstrated that the number of rescheduled tasks was reduced to approximately 1/4 to 1/100 of that without dependency checking, with a 5% increase in execution time.
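The idea of limiting rescheduling to dependent tasks can be illustrated with a minimal sketch. It assumes the workflow is a DAG stored as a successor map and that only tasks reachable from the terminated task need to be reconsidered. The abstract does not give the actual selection rule, so the function name `tasks_to_reschedule` and the example workflow are hypothetical.

```python
from collections import deque


def tasks_to_reschedule(successors, finished_task, done):
    """Return unfinished tasks that transitively depend on finished_task.

    Assumption: only these tasks are candidates for rescheduling; the
    paper's real criterion may differ.
    """
    affected = set()
    queue = deque([finished_task])
    while queue:
        task = queue.popleft()
        for nxt in successors.get(task, ()):
            if nxt not in affected and nxt not in done:
                affected.add(nxt)
                queue.append(nxt)
    return affected


# Hypothetical workflow: a -> b -> d, a -> c, and an unrelated chain e -> f.
successors = {"a": ["b", "c"], "b": ["d"], "e": ["f"]}
pending = {"b", "c", "d", "e", "f"}

affected = tasks_to_reschedule(successors, "a", done={"a"})
print(sorted(affected))              # ['b', 'c', 'd']
print(len(affected), len(pending))   # 3 of 5 pending tasks are reconsidered
```

A simple termination trigger would reconsider all five pending tasks here, while the dependency check leaves the unrelated chain `e -> f` untouched.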
Parallel applications continue to suffer more from I/O latency as the rate of increase in computing power grows faster than that of memory and storage access performance. I/O prefetching is an effective solution to hide the latency, yet existing I/O prefetching techniques are conservative and their effectiveness is limited. A pre-execution prefetching approach, whereby a thread dedicated to read operations is executed ahead of the main thread in order to hide I/O latency, has been put forward in a recent work to solve this "I/O wall" problem. We first identify the limitation of applying the existing pre-execution prefetching approach due to read-after-write (RAW) dependencies, and then propose a method to overcome this limitation by assigning a thread to each dependent read operation. Preliminary experiments, including one using Hill encryption as a real-life application, verify the benefits of the proposed approach.
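The general idea of a dedicated read thread running ahead of the main thread can be sketched as follows. This is a plain read-ahead illustration under stated assumptions, not the paper's pre-execution mechanism, and it does not handle RAW dependencies. The names `prefetcher` and `process`, the block size, and the queue depth are hypothetical.

```python
import io
import queue
import threading


def prefetcher(stream, block_size, out):
    """Read blocks ahead of the consumer and pass them through a bounded queue."""
    while True:
        block = stream.read(block_size)
        out.put(block)      # blocks while the queue is full
        if not block:       # empty bytes marks the end of the stream
            break


def process(stream, block_size=4, depth=2):
    """Consume prefetched blocks; the sum stands in for real computation."""
    blocks = queue.Queue(maxsize=depth)
    reader = threading.Thread(
        target=prefetcher, args=(stream, block_size, blocks), daemon=True
    )
    reader.start()
    total = 0
    while True:
        block = blocks.get()
        if not block:
            break
        total += sum(block)
    reader.join()
    return total


# An in-memory stream stands in for a slow file or storage device.
print(process(io.BytesIO(bytes(range(10)))))  # 45
```

The bounded queue limits how far the reader runs ahead, so memory use stays fixed while reads overlap with computation.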
ISBN: 9780889866386 (print)
Emerging multiple display infrastructures provide users with a large number of semi-public and private displays. Selecting what information to present on which display here becomes a real issue, especially when multiple users with diverging interests have to be considered. This especially holds for dynamic ensembles of displays. We propose to cast the Display Mapping problem as an optimization task. We develop an explicit criterion for the global quality of a display mapping and then describe a distributed algorithm based on the GRASP framework that is able to approximate the global optimum through local interaction between display devices. We claim that such a distributed optimization approach, based on the definition of an explicit global quality measure, is a general concept for achieving coherent ensemble behavior.
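A minimal, centralized sketch of the GRASP pattern (greedy randomized construction followed by local search, repeated over several iterations) may help make the optimization view concrete. The quality measure, the `gain` matrix, and the one-document-per-display constraint are assumptions for illustration only. The abstract's criterion and its distributed realization are not reproduced here.

```python
import random


def quality(mapping, gain):
    """Total gain of a mapping; mapping[d] is the display showing document d."""
    return sum(gain[d][s] for d, s in enumerate(mapping))


def construct(gain, rng, rcl_size=2):
    """Greedy randomized construction: choose among the best few free displays."""
    free = list(range(len(gain[0])))
    mapping = []
    for d in range(len(gain)):
        candidates = sorted(free, key=lambda s: gain[d][s], reverse=True)[:rcl_size]
        choice = rng.choice(candidates)
        mapping.append(choice)
        free.remove(choice)
    return mapping


def local_search(mapping, gain):
    """Swap the displays of two documents while any swap improves quality."""
    improved = True
    while improved:
        improved = False
        for i in range(len(mapping)):
            for j in range(i + 1, len(mapping)):
                swapped = mapping[:]
                swapped[i], swapped[j] = swapped[j], swapped[i]
                if quality(swapped, gain) > quality(mapping, gain):
                    mapping, improved = swapped, True
    return mapping


def grasp(gain, iterations=20, seed=0):
    """Keep the best locally optimal mapping found over several restarts."""
    rng = random.Random(seed)
    best = None
    for _ in range(iterations):
        candidate = local_search(construct(gain, rng), gain)
        if best is None or quality(candidate, gain) > quality(best, gain):
            best = candidate
    return best


# Hypothetical gains: gain[d][s] is the benefit of showing document d on display s.
gain = [[5, 1, 3], [4, 2, 6], [1, 7, 2]]
best = grasp(gain)
print(best, quality(best, gain))  # [0, 2, 1] 18
```

In this small instance every starting mapping reaches the same local optimum, so the result does not depend on the seed; larger instances generally benefit from the randomized restarts.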
ISBN: 9780889866386 (print)
For search-intensive applications such as data mining and bioinformatics, a SIMD Processor Array on a Chip may be an effective architecture, and if the application is control-intensive, a Multiple SIMD (MSIMD) architecture may further increase processor utilization. In this paper, we describe the implementation of an associative MSIMD architecture on the MASC Processor. The MASC Processor, implemented using FPGAs, is easily scalable and dynamically assigns tasks to Processing Elements as the program executes.
ISBN: 9780889866386 (print)
Volunteer computing is an innovative approach to high performance computing that relies on volunteers who donate their personal computers' unused resources to a computationally intensive research project. Prominent volunteer computing projects include SETI@home, Folding@Home, and The Great Internet Mersenne Prime Search (GIMPS). Many volunteer computing projects are built upon a volunteer computing framework that abstracts functionality that is common to all volunteer computing projects, such as network communications, database access, and project management. These volunteer computing frameworks tend to be complex, limiting, and difficult to use. We have designed and implemented a new volunteer computing framework called the Simple Light-weight Infrastructure for Network computing (SLINC) that addresses the disadvantages we identified with existing frameworks. SLINC is a flexible and extensible volunteer computing framework that will enable researchers to more easily build volunteer computing projects.
ISBN: 9780889868113 (print)
A recent study characterizing failures in computer networks shows that transient single-element (node/link) failures are the dominant failures in large communication networks like the Internet. Thus, having the routing paths globally recomputed on a failure does not pay off, since the failed element recovers fairly quickly and the recomputed routing paths then need to be discarded. In this paper, we present the first distributed algorithm that computes the alternate paths required by some proactive recovery schemes for handling transient failures. Our algorithm computes paths that avoid a failed node, and provides an alternate path to a particular destination from an upstream neighbor of the failed node. With minor modifications, the algorithm can also compute alternate paths that avoid a failed link. To the best of our knowledge, all previous algorithms proposed for computing alternate paths are centralized and need complete information about the network graph as input.
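To make the notion of an alternate path concrete, here is a small centralized sketch that finds a shortest hop-count path from an upstream neighbor to a destination while skipping a failed node. It needs the whole graph as input, so it illustrates the kind of centralized computation the abstract contrasts with, not the distributed algorithm itself. The function name `path_avoiding` and the example topology are hypothetical.

```python
from collections import deque


def path_avoiding(graph, source, dest, failed):
    """Breadth-first search for a path from source to dest that skips `failed`."""
    if failed in (source, dest):
        return None
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == dest:
            # Walk the parent links back to the source to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in graph[node]:
            if nbr != failed and nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return None


# Hypothetical topology: u normally reaches d through x; a-b is a detour.
graph = {
    "u": ["x", "a"],
    "x": ["u", "d"],
    "a": ["u", "b"],
    "b": ["a", "d"],
    "d": ["x", "b"],
}

print(path_avoiding(graph, "u", "d", failed=None))  # ['u', 'x', 'd']
print(path_avoiding(graph, "u", "d", failed="x"))   # ['u', 'a', 'b', 'd']
```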
ISBN: 9780889869073 (print)
UnaGrid is an opportunistic virtual grid infrastructure that takes advantage of the idle processing capabilities of conventional desktop machines in computer labs through the use of Customizable Processing Virtual Clusters (CPVCs); these capabilities are used in the development of e-Science projects. Up to now, however, a dedicated NFS-NAS solution has been employed as the storage system. This solution does not take advantage of the idle storage capabilities of the desktop machines. This paper presents the design, implementation and assessment of a virtual storage system that allows UnaGrid to take advantage of both the storage and the processing capabilities available in tens of desktop machines. The tests executed show that the strategy used to create the virtual storage system achieves large storage capacity at low cost, and better performance than a dedicated NFS-NAS solution.
ISBN: 9780889866379 (print)
A large number of tasks in distributed systems are based on the fundamental problem of tracing the causal dependencies among the events that characterize a run of the computation. This problem is commonly solved by applying vector clocks as a means of capturing the flow of information within and among distributed processes. In this paper, a new kind of logical clock concept, the tree clock, is presented and examined. It is meant to overcome the great drawback of vector clocks: that the number of processes in the distributed system has to be constant and known in advance. Tree clocks are designed to scale naturally and efficiently with the dynamic creation and termination of processes without losing their primary functionality, such as causality tracing, event ordering, and gap detection. In most aspects, they are even more efficient than vector clocks.
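The vector clock baseline mentioned above can be sketched in a few lines. This is the standard textbook scheme, not the tree clock proposed in the paper. The class and function names are illustrative, and the fixed size `n` passed to the constructor is exactly the drawback the abstract describes.

```python
class VectorClock:
    """Classic vector clock for a fixed, known number of processes."""

    def __init__(self, n, pid):
        self.v = [0] * n    # one counter per process; n cannot grow later
        self.pid = pid

    def tick(self):
        """Local or send event: advance this process's own counter."""
        self.v[self.pid] += 1
        return list(self.v)

    def receive(self, timestamp):
        """Receive event: take the element-wise maximum, then tick."""
        self.v = [max(a, b) for a, b in zip(self.v, timestamp)]
        self.v[self.pid] += 1
        return list(self.v)


def happened_before(a, b):
    """True if timestamp a causally precedes timestamp b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


p0, p1 = VectorClock(2, 0), VectorClock(2, 1)
e1 = p0.tick()        # [1, 0]: p0 sends a message
e2 = p1.tick()        # [0, 1]: independent local event on p1
e3 = p1.receive(e1)   # [1, 2]: p1 receives p0's message

print(happened_before(e1, e3))                           # True
print(happened_before(e1, e2), happened_before(e2, e1))  # False False
```

Events `e1` and `e2` are concurrent because neither timestamp dominates the other, while `e3` causally follows both.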
ISBN: 9780889869073 (print)
Since FPGAs are more flexible than general-purpose processors, they are used for accelerating the computation of algorithms. Singular value decomposition (SVD) is a factorization of a matrix that is useful in computations for signal processing and pattern recognition. This paper deals with the hardware implementation of the singular value decomposition of a given matrix. The algorithm used is based on a compact SVD presented by A. O. Tarakanov. Compared with others, that algorithm proved to be resource-efficient and straightforward to implement in hardware. It has been programmed in VHDL and implemented in an FPGA, seeking a compromise among resources, run time, precision and parallelism. The VHDL module uses floating-point numbers whose bit width can be modified to tune the precision of the results. Conventional higher-level languages for hardware development do not allow working with floating-point numbers and yield a less optimal design, as the evaluation at the end shows.
ISBN: 9780889867741 (print)
Timeliness is an important issue for video-based surveillance and is often quantified by the delay between the time image frames become available from cameras and the completion of their processing. Most existing commercial video surveillance systems focus on the issues of efficient storage and retrieval, remote monitoring, data streaming, forensics and limited real-time analysis - but not explicitly on the timeliness of large-scale online analysis vis-a-vis resource utilization. In this paper we present a new load distribution strategy for online, large-scale video data processing clusters that are used as an aid to manual surveillance. We propose a novel approach for fine-grained load balancing, modeled as the problem of minimizing average completion time. The proposed approach is robust in the sense that it does not depend on estimates of future loads or on the worst-case execution requirements of the video processing load. Simulation results with real-life video surveillance data establish that, for a desired timeliness in processing the data, our approach reduces the number of compute nodes by more than a factor of two compared to systems without the load migration heuristics.