Duplication has proved to be a vital technique for scheduling task graphs on a network of unrelated parallel machines. Few attempts have been made to model duplication in a Mixed Integer Linear Program (MILP) to reduc...
详细信息
ISBN:
(纸本)9780769548982;9781467345668
Duplication has proved to be a vital technique for scheduling task graphs on a network of unrelated parallel machines. Few attempts have been made to model duplication in a Mixed Integer Linear Program (MILP) to reduce schedule length. Other known optimal MILPs duplicate a job on all the available processing elements and this increases their complexities. This paper proposes a new REStricted Duplication (RESDMILP) approach to model duplication in a MILP. The complexity of this model increases with the increase in the amount of duplication. Experiments conducted have revealed that RESDMILP achieves better runtimes when the problem instance is solved optimally and provides better lower bound and percentage gap if it is run for a fixed amount of time. The percentage gap is defined as, (UB-LB)/UB where UB and LB are the upper and lower bounds achieved by the MILPs respectively.
作者:
Mavriplis, DJNASA
Inst Comp Applicat Sci & Engn Langley Res Ctr Hampton VA 23681 USA
The implementation and performance of a hybrid OpenMP/MPI parallel communication strategy for an unstructured mesh computational fluid dynamics code is described. The solver is cache efficient and fully vectorizable, ...
详细信息
The implementation and performance of a hybrid OpenMP/MPI parallel communication strategy for an unstructured mesh computational fluid dynamics code is described. The solver is cache efficient and fully vectorizable, and is parallelized using a two-level hybrid MPI-OpenMP implementation suitable for shared and/or distributed memory architectures, as well as clusters of shared memory machines. parallelism is obtained through domain decomposition for both communication models. Single processor computational rates as well as scalability curves are given on various architectures. For the architectures studied in this work, the OpenMP or hybrid OpenMP/MPI communication strategies achieved no appreciable performance benefit over an exclusive MPI communication strategy.
Cluster of Workstations is becoming an important kind of parallel computing platform. Two issues are essential for promoting the popularization of COW: high-performance communication mechanism and powerful programming...
详细信息
ISBN:
(纸本)0818682596
Cluster of Workstations is becoming an important kind of parallel computing platform. Two issues are essential for promoting the popularization of COW: high-performance communication mechanism and powerful programming environment. In this paper Mie present our research work on COW, which aims at giving a practical solution to the above issues. In order to build a high performance communication system, we design a reduced communication protocol in addition to making use of high-speed network hardware. Based on PVM, we implement a practical programming environment which include parallel compiling systems, parallel debugger, load balancing tool and execution fault recovery tool. Some performance results are given in this paper.
bsp is a bridging model between abstract execution and concrete parallel systems. Structure and abstraction brought by bsp allow to have portable parallel programs with scalable performance predictions, without dealin...
详细信息
bsp is a bridging model between abstract execution and concrete parallel systems. Structure and abstraction brought by bsp allow to have portable parallel programs with scalable performance predictions, without dealing with low-level details of architectures. In the past, we designed bsml for programming bsp algorithms in ml. However, the simplicity of the bsp model does not fit the complexity of today's hierarchical architectures such as clusters of machines with multiple multi-core processors. The multi-bsp model is an extension of the bsp model which brings a tree-based view of nested components of hierarchical architectures. To program multi-bsp algorithms in ml, we propose the multi-ml language as an extension of bsml where a specific kind of recursion is used to go through a hierarchy of computing nodes. We define a formal semantics of the language and present preliminary experiments which show performance improvements with respect to bsml.
A large number of reads generated by the next generation sequencing platform will contain many repetitive subsequences. Effective localizing and identifying genomic regions containing repetitive subsequences will cont...
详细信息
ISBN:
(纸本)9781665496391
A large number of reads generated by the next generation sequencing platform will contain many repetitive subsequences. Effective localizing and identifying genomic regions containing repetitive subsequences will contribute to the subsequent genomic data analysis. To accelerate the alignment between large-scale short reads and reference genome with many repetitive subsequences, this paper develops a compact de Bruijn graph based short-read alignment algorithm on distributed parallel computing platform. The algorithm uses resilient distributed data sets (RDDS) to perform calculations in memory, and executes the broadcast method to distribute short reads and reference genome to the computing nodes to reduce the data communication time on the cluster system, and the number of RDD partitions is set to optimize the performance of parallel aligning algorithm. Experimental results on real datasets show that compared with the compact de Bruijn graph based sequential short-read alignment algorithm, our implemented distributed parallel alignment algorithm achieves good acceleration on the premise of obtaining the same correct alignment percentage as a whole, and compared with existing distributed parallel alignment algorithms, the implemented parallel algorithm can more quickly complete the alignment between large-scale short reads and reference genome with highly repetitive subsequences.
Wireless data broadcast is usually used to disseminate the public information to considerable clients in wireless environment. Based on the development of mobile device, some mobile devices can simultaneously use mult...
详细信息
ISBN:
(纸本)9780769548982;9781467345668
Wireless data broadcast is usually used to disseminate the public information to considerable clients in wireless environment. Based on the development of mobile device, some mobile devices can simultaneously use multiple antennae to process some operations. In this paper, we commit to study data retriveal schdeuling problem in wireless data broadcast, which mobile device has multiple antennae and the data items are non-consecutivly broadcasted in mutiple parallel channels. In order to improve the performance of downloading requested data items, we propose a scheme of scheduling parallel data retrievals for non-consecutive broadcast to decrease access latency and energy consumption. In this scheme, we utilize a weight of channel through two balancing factors to choose the suitable channel for data retrieval. We analyze the performance of propsed scheme and the other schemes. Evaluation results prove that the proposed scheme has the better performance.
The OREGAMI project involves the design, implementation, and testing of algorithms for mapping parallel computations to message-passing parallelarchitectures. OREGAMI addresses the mapping problem by exploiting regul...
详细信息
The OREGAMI project involves the design, implementation, and testing of algorithms for mapping parallel computations to message-passing parallelarchitectures. OREGAMI addresses the mapping problem by exploiting regularity and by allowing the user to guide and evaluate mapping decisions made by OREGAMI's efficient combinatorial mapping algorithms. OREGAMI's approach to mapping is based on a new graph theoretic model of parallel computation called the Temporal Communication Graph. The OREGAMI software tools include three components: (I ) LaRCS is a graph description language which allows the user to describe regularity in the communication topology as well as the temporal communication behavior (the pattern of message-passing over time). (2) MAPPER is our library of mapping algorithms which utilize information provided by LaRCS to perform contraction, embedding, and routing. (3) METRICS is an interactive graphics tool for display and analysis of mappings. This paper gives an overview of the OREGAMI project, the software tools, and OREGAMI's mapping algorithms.
Network packet processing architectures use heterogeneous processors as accelerators to speed-up classic application domain tasks. Our platform compiles applications to bytecodes for a generalized packet processing ma...
详细信息
ISBN:
(纸本)9781538694039
Network packet processing architectures use heterogeneous processors as accelerators to speed-up classic application domain tasks. Our platform compiles applications to bytecodes for a generalized packet processing machine, then uses microcoded interpreters running in parallel to trigger accelerators as needed. To make the system effective requires helping users debug apps, which includes tracking runtime exceptions. Exception tracking is complicated when a system-thrown exception is detected on an accelerator and the current binary form is far removed from the original high-level language source or associated assembly code. We tackle this problem by (1) instrumenting the compiler and a low-level bytecode tool, (2) reporting exceptions with the interpreter, (3) creating a specialized tool to collate the higher-level program forms with the lower-level bytecode forms. This functionality provides data needed for post-mortem program analysis.
General purpose parallel computing systems come in a variety of forms. We have various kinds of distributed memory architectures, shared memory multiprocessors, and clusters of workstations. New technologies may incre...
详细信息
General purpose parallel computing systems come in a variety of forms. We have various kinds of distributed memory architectures, shared memory multiprocessors, and clusters of workstations. New technologies may increase this range still further. Can one hope to design portable and scalable parallel software in the face of such architectural diversity? In this paper we show that it is indeed possible to produce fully portable parallel software which will run with highly efficient, scalable and predictable performance on any general purpose parallel architecture. The approach we describe is based on the bulk synchronous parallel (BSP) model of computation. The BSP model provides a simple, unified framework for the design and programming of all kinds of general purpose parallel systems. Over the last few years, a number of important research activities in algorithms and architectures have been pursued as part of this new approach to scalable parallel computing. In this paper we give some simple BSP algorithms and show how they can be expressed as programs. We also briefly describe some of the BSP programming language developments which are now being pursued.
暂无评论