ISBN (print): 9780889867741
Scheduling the execution of computing tasks on heterogeneous computing systems is a widely studied problem in the field of parallel and distributed computing. Many algorithms belong to the class of list scheduling algorithms, in which tasks are scheduled sequentially in the order of their pre-assigned priorities. The determination of task priorities is typically based on problem-specific heuristics and is critical to the performance of a list scheduling algorithm. We design a list scheduling algorithm for heterogeneous computing systems in which task priorities are determined by both the completion time and the upward rank of a task. We extend the notion of upward rank used in HEFT; our method of calculating a task's upward rank improves over HEFT's by incorporating additional domain knowledge embedded in scheduling problems. As a result, the execution time of the remaining tasks can be estimated more accurately. Experimental results on benchmark task graphs show that our algorithm consistently outperforms HEFT, achieving higher execution speedups.
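The upward-rank priority that this abstract builds on can be sketched as follows. This is a minimal sketch of the classic HEFT formulation (not the authors' extended variant), where a task's rank is its average execution cost plus the most expensive communication-plus-rank path through its successors; all task names, costs, and communication times are illustrative.

```python
def upward_rank(dag, avg_cost, avg_comm):
    """dag: {task: [successors]}; avg_cost: mean execution time over
    processors; avg_comm: {(u, v): mean transfer time between u and v}."""
    rank = {}

    def rec(t):
        if t in rank:
            return rank[t]
        succs = dag.get(t, [])
        # Longest successor path: communication cost plus successor rank.
        tail = max((avg_comm.get((t, s), 0) + rec(s) for s in succs),
                   default=0)
        rank[t] = avg_cost[t] + tail
        return rank[t]

    for t in dag:
        rec(t)
    return rank

# Illustrative task graph: t1 -> t2 -> t3 and t1 -> t3.
dag = {"t1": ["t2", "t3"], "t2": ["t3"], "t3": []}
cost = {"t1": 5, "t2": 3, "t3": 4}
comm = {("t1", "t2"): 1, ("t2", "t3"): 2, ("t1", "t3"): 3}
ranks = upward_rank(dag, cost, comm)
# A list scheduler then processes tasks in decreasing rank order.
```

In the example, t3 has rank 4, t2 has rank 3 + (2 + 4) = 9, and t1 has rank 5 + max(1 + 9, 3 + 4) = 15, so the scheduling order is t1, t2, t3.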
We present a scalable software architecture for distributed direct volume rendering on HPC systems. Our approach allows components along the distributed volume rendering pipeline to be replaced generically. Renderer components range from highly specialized GPU renderers that implement state-of-the-art features to more versatile remote renderers that can use numerous distributed-memory nodes to exploit sort-last parallel rendering, with each node running a generic renderer component itself. Renderer components designed to run on CPUs or GPUs, respectively, make our software architecture particularly useful for HPC systems. Using zero-configuration networking, our system can scale at run time by introducing additional resources without resetting the whole cluster. Generic I/O subsystems allow various interprocess communication technologies to be used interchangeably, while generalizing the display phase and decoupling it from the rendering and I/O phases can be exploited to hide latency. We integrate the proposed software architecture into the freely available open-source direct volume rendering library Virvo.
ISBN (print): 9780889867741
Current visual text mining platforms are still focused on small or medium-scale datasets and sequential algorithms. However, as document collections increase in size and complexity, more computing resources are required in order to achieve the expected interactive experience. In order to address the scalability problem, this paper proposes and evaluates parallel implementations for three critical visual text mining algorithms. Experiments with the parallel solutions were conducted for varying dataset sizes and different numbers of processors. The results show a good speedup for the proposed solutions and indicate the potential benefits of exploring task parallelism in critical algorithms to improve scalability of an interactive visual text mining platform.
Parallel applications continue to suffer from I/O latency as computing power increases faster than memory and storage access performance. I/O prefetching is an effective solution for hiding this latency, yet existing I/O prefetching techniques are conservative and their effectiveness is limited. A pre-execution prefetching approach, in which a thread dedicated to read operations runs ahead of the main thread to hide I/O latency, was recently proposed to address this "I/O wall" problem. We first identify a limitation of the existing pre-execution prefetching approach caused by read-after-write (RAW) dependencies, and then propose a method to overcome it by assigning a thread to each dependent read operation. Preliminary experiments, including one using Hill encryption as a real-life application, verify the benefits of the proposed approach.
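The basic pre-execution idea can be sketched as follows: a helper thread issues the application's read sequence ahead of the main thread, warming an in-memory cache so the consumer's reads become cache hits. This is a minimal illustration of the general technique only; the file names, cache structure, and fallback path are assumptions, and the paper's handling of RAW-dependent reads is not reproduced here.

```python
import threading

def prefetch(paths, cache, lock):
    """Pre-execution thread: perform the reads ahead of the consumer."""
    for p in paths:
        with open(p, "rb") as f:
            data = f.read()          # I/O latency is paid here, in advance
        with lock:
            cache[p] = data

def main_compute(paths, cache, lock):
    """Main thread: consume prefetched data, falling back to a direct
    read on a cache miss."""
    results = []
    for p in paths:
        with lock:
            data = cache.pop(p, None)
        if data is None:             # miss: the prefetcher has not arrived yet
            with open(p, "rb") as f:
                data = f.read()
        results.append(len(data))    # stand-in for the real computation
    return results
```

In a real run the prefetch thread is started concurrently with the computation; overlap between I/O and compute is what hides the latency.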
The development of Grid environments provides an efficient way to implement parallel algorithms. Here we focus on the discovery of short recurring patterns in DNA sequences that represent binding sites for certain proteins in the process of gene regulation. We develop a parallel computing algorithm for this problem by partitioning the overall DNA sequences, which allows us to implement the algorithm on a Grid and evaluate its performance. We use the open-source software Alchemi to build the Grid environment. By using a merging-repeat approach to find these patterns, we also address related problems, including finding repetitions with insertions and deletions.
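The partitioning idea can be sketched as follows: each Grid node counts short patterns (k-mers) in its own slice of the sequence collection, and a master merges the partial counts. This is a simplified exact-match illustration; the paper's merging-repeat approach and its handling of insertions and deletions are not reproduced, and the sequential loop below merely stands in for Alchemi worker nodes.

```python
from collections import Counter

def count_kmers(sequence, k):
    """Count all length-k substrings of one DNA sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def parallel_motif_counts(sequences, k, n_parts):
    """Partition the sequence collection across n_parts 'nodes', count
    locally, then merge the partial counters on the master."""
    parts = [sequences[i::n_parts] for i in range(n_parts)]
    partials = []
    for part in parts:               # each iteration stands in for one node
        local = Counter()
        for seq in part:
            local.update(count_kmers(seq, k))
        partials.append(local)
    total = Counter()
    for c in partials:               # merge step on the master
        total.update(c)
    return total
```

Because k-mer counting over disjoint sequence slices is embarrassingly parallel, the merge is a simple counter sum, which is why partitioning maps naturally onto a Grid.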
ISBN (print): 9780889867741
In engineering applications, real-valued performance functions must often be optimized. If the performance function is discontinuous, no derivatives with respect to the parameters to be optimized exist. Thus, only undirected search methods can be used. In contrast to simple trial-and-error approaches, evolutionary strategies can extract population-based information that guides the search process and helps to improve the quality of the results. This paper describes the parallel implementation of a state-of-the-art evolutionary strategy that uses the covariance adaptation operator. It proposes a communication topology for maintaining selective pressure and a master-slave scheme for fault tolerance in distributed environments composed of volatile resources. Experimental results demonstrate how the implementation can be adapted to specific needs and how the parallel implementation behaves in the case of resource failures.
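The fault-tolerance side of a master-slave scheme can be sketched as follows: the master hands each candidate's fitness evaluation to a worker, and if that worker fails, the evaluation is simply re-queued, so no result is lost on volatile resources. This is a minimal assumed illustration with a simulated failure model; the covariance adaptation (CMA) update itself and the paper's communication topology are omitted.

```python
import random

def evaluate_population(population, fitness, workers, fail_prob=0.3,
                        rng=None):
    """Master loop: dispatch each candidate to a worker; on a simulated
    worker failure, reassign the evaluation instead of losing it."""
    rng = rng or random.Random(0)     # fixed seed keeps the sketch repeatable
    pending = list(enumerate(population))
    results = [None] * len(population)
    while pending:
        idx, cand = pending.pop(0)
        worker = workers[idx % len(workers)]  # placeholder for a remote node
        if rng.random() < fail_prob:  # simulated volatile-resource failure
            pending.append((idx, cand))       # re-queue for reassignment
            continue
        results[idx] = fitness(cand)  # stands in for evaluation on `worker`
    return results
```

With this structure, a generation's selection step only runs once every slot in `results` is filled, so intermittent worker failures slow the run down but never corrupt it.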
ISBN (print): 9780889868786
In this paper, we define the concept of an Eventually Consistent Transaction, a transaction executed under eventual consistency, and present techniques for achieving Eventually Consistent Transactions. We have implemented Eventually Consistent Transactions in a distributed key-value store. Based on our evaluation, the throughput of Eventually Consistent Transactions scales to 48 nodes.
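One common way to realize such transactions in a replicated key-value store can be sketched as follows: a transaction's writes are buffered, committed atomically on the local replica, and propagated to other replicas asynchronously with last-writer-wins timestamps. This is an assumed design for illustration only; the abstract does not detail the paper's concrete technique, and all class and function names here are hypothetical.

```python
import itertools

_clock = itertools.count(1)          # stand-in for a distributed timestamp

class Replica:
    def __init__(self):
        self.store = {}              # key -> (timestamp, value)

    def apply(self, writes, ts):
        for k, v in writes.items():
            # Last-writer-wins: keep the value with the newest timestamp.
            if k not in self.store or self.store[k][0] < ts:
                self.store[k] = (ts, v)

def commit(txn_writes, local, others):
    """Commit the buffered write set locally; return the replication work
    to be performed asynchronously (eventual consistency)."""
    ts = next(_clock)
    local.apply(txn_writes, ts)      # atomic local commit
    return [(r, txn_writes, ts) for r in others]

def replicate(pending):
    """Deliver deferred writes to the other replicas (eventual delivery)."""
    for replica, writes, ts in pending:
        replica.apply(writes, ts)
```

Between `commit` and `replicate` the replicas disagree; once every pending write set has been delivered, all replicas converge to the same state.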
ISBN (print): 9780889867741
Currently, middleware systems for Grid computing such as gLite do not integrate knowledge about data availability into the scheduling process. That is, data may reside on tertiary storage, e.g. in a hierarchical storage management system, so that first access is delayed. To this end, we investigate the gLite middleware software stack and highlight the services that could be enhanced to increase system utilization. Here, we focus on modifications of the workload management system and the computing element. As the computing element depends on a resource management system, we discuss the impact of file systems on the scheduling process. Optimizing the LHCb job workflow, we design a scheduling strategy that respects these constraints. Conceptually, this strategy co-allocates data movements concurrently with computational processes. Based on a workload trace from the Parallel Workload Archive, we present improvements in system utilization and average weighted response time.
ISBN (print): 9780889868113
A speech recognition front-end is a digital signal processing device used to transform an audio signal into feature vectors for automatic speech recognition or for storage of semantic audio information. A complete implementation of this device does not fit into the FPGA fabric of the development board used. Exploiting the inherent parallelism in the device's design, redesigning it using a developed library of algorithmic skeletons, and applying dynamic partial reconfiguration made it possible to fit the device into the FPGA.
ISBN (print): 9780889867741
To achieve good parallel efficiency, applications using structured adaptive mesh refinement (SAMR) need to repeatedly repartition and redistribute the underlying dynamic grid hierarchy. However, no single partitioner works well for all application and computer states. This paper presents the implementation and evaluation of a patch-based partitioner for SAMR grid hierarchies. The partitioner results in a good and stable load balance, with an average load imbalance of 3.1%. Space-filling curves are used to reduce the high communication volumes that are inherent in this type of partitioner. The partitioner will become part of the Meta-Partitioner, a partitioning framework that automatically selects, configures, and invokes well-performing partitioners for general SAMR applications. Access to a large number of complementary partitioners is essential for the Meta-Partitioner. The presented partitioner will help significantly decrease run times for SAMR applications where load balance is the main priority.
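The role of a space-filling curve in such a partitioner can be sketched as follows: patches are ordered along the curve so that patches close in space land close in the ordering, and contiguous chunks of the ordering are then assigned to processors, which keeps neighboring patches on the same processor and communication volume low. This is a minimal illustration using a Morton (Z-order) curve (the abstract does not specify which curve the paper uses); the 2D patch coordinates and workloads are invented for the example.

```python
def morton(x, y, bits=16):
    """Interleave the bits of (x, y) into a single Z-order index."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def partition_patches(patches, n_procs):
    """patches: list of (x, y, workload). Sort along the curve, then cut
    the ordering into n_procs contiguous chunks of roughly equal load."""
    ordered = sorted(patches, key=lambda p: morton(p[0], p[1]))
    total = sum(p[2] for p in ordered)
    target = total / n_procs
    parts, current, acc = [], [], 0
    for p in ordered:
        current.append(p)
        acc += p[2]
        if acc >= target and len(parts) < n_procs - 1:
            parts.append(current)    # close this processor's chunk
            current, acc = [], 0
    parts.append(current)
    return parts
```

Because the curve cut is one-dimensional, rebalancing after a refinement step only shifts chunk boundaries along the ordering, which is what makes the approach cheap enough to run repeatedly.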