Graph partitioning is used to solve the problem of distributing computations to a number of processors, in order to improve the performance of time consuming applications in parallel environments. A common approach to...
详细信息
ISBN:
(纸本)9781467387767
Graph partitioning is used to solve the problem of distributing computations to a number of processors, in order to improve the performance of time consuming applications in parallel environments. A common approach to solve this problem is based on a multilevel framework, where the graph is firstly coarsened to a smaller instance and then it is partitioned in a number of parts using recursive bisection (RB) based methods. However, in applications where initial fixed vertices are used to model additional constraints of the problem, RB based methods often fail to produce partitions of good quality. In this paper, we propose a new direct k-way greedy graph growing algorithm, called KGGGP, that overcomes this issue and succeeds to produce partition with better quality than RB while respecting the constraint of fixed vertices. In the experimental section, we present results which compare KGGGP against state-of-the-art methods for graphs available from the popular DIMACS'10 collection.
Recently, a number of application-level multicast protocols have been proposed using a center based approach. These multicast trees critically depend on the position of the core for balanced tree organization. Motivat...
详细信息
ISBN:
(纸本)1932415262
Recently, a number of application-level multicast protocols have been proposed using a center based approach. These multicast trees critically depend on the position of the core for balanced tree organization. Motivated by peer-to-peer streaming media applications, this paper presents a novel approach for core migration in application-level multicast trees which aims to reduce the maximum overall delay. Our architecture is inspired by previous research that proposed migration as an approach to reduce the overall delay in the tree. Our core migration protocol uses a heuristic approach based on the tree diameter to determine the node to be elected as the core. We also present an alternative tree reorganization approach which we compare with core migration. We present simulation results with our overlay protocol YimaCast and show the feasibility of the approach for trees of different sizes. We demonstrate that core migration works well in a very dynamic environment while tree reconstruction is beneficial in more stable scenarios.
Many large scale applications, have significant I/O requirements as well as computational and memory requirements. Unfortunately, limited number of I/O nodes provided by the contemporary message passing distributed-me...
详细信息
ISBN:
(纸本)0818686510
Many large scale applications, have significant I/O requirements as well as computational and memory requirements. Unfortunately, limited number of I/O nodes provided by the contemporary message passing distributed-memory architectures such as Intel Paragon and IBM SP-2 limits the I/O performance of these applications severely. In this paper, we examine some software optimization techniques and architectural scalability and evaluate the effect of them in five I/O intensive applications from both small and large application domains. Our goals in this study are twofold: First, we want to understand the behavior of large-scale data intensive applications and the impact of I/O subsystem on their performance and vice-versa. Second, and more importantly, we strive to determine the solutions for improving the applications' performance by a mix of architectural and software solutions. Our results reveal that the different applications can benefit from different optimizations. For example, we found that some applications benefit from file layout optimizations whereas some others benefit from collective IIO. A combination of architectural and software solutions is normally needed to obtain good I/O performance. For example, we show that with limited number of I/O resources, it is possible to obtain good performance by using appropriate software optimizations. We also show that beyond a certain level, imbalance in the architecture results in performance degradation even when using optimized software, thereby indicating the necessity of increase in I/O resources.
The efficient scheduling of large mixed parallelapplications is challenging. Most existing algorithms utilize scheduling heuristics and approximation algorithms to determine a good schedule as basis for an efficient ...
详细信息
ISBN:
(纸本)9780769535449
The efficient scheduling of large mixed parallelapplications is challenging. Most existing algorithms utilize scheduling heuristics and approximation algorithms to determine a good schedule as basis for an efficient execution in large scale scientific computing. This paper concentrates on the scheduling of mixed parallelapplications represented by task graphs with parallel tasks and precedence constraints between them. Layer-based scheduling algorithms for homogeneous target platforms are improved by adding a move-blocks phase that further reduces the resulting parallel runtime. The layer-based scheduling approach is described and the move-blocks algorithm is introduced in detail. The move-blocks extension provides better scheduling results for small as well as for large problems but has only a small increase in runtime. This is shown by a comparison of the modified and the original algorithms over a wide range of test cases.
The pyramid network is one of the important architectures in parallel computing, network computing, computer vision, and image processing. The property of pancyclic is a graph or a network, which has cycles with all p...
详细信息
ISBN:
(纸本)1932415262
The pyramid network is one of the important architectures in parallel computing, network computing, computer vision, and image processing. The property of pancyclic is a graph or a network, which has cycles with all possible lengths. In this paper, we study the pancyclic property of pyramid networks. First, all possible cycles of the length in the n×n mesh are constructed, where n is even. We then propose an algorithm to construct cycles of all possible lengths in pyramid networks and show that the pyramid network is a pancyclic network. An n-node network containing a cycle of length n is a Hamiltonian network. Thus, the pyramid network is also a Hamiltonian network.
Tasks running in computational Grids may need multiple types of resources simultaneously. Traditional scheduling heuristics only schedule tasks to single type of resource - computing resource. Therefore, it is necessa...
详细信息
ISBN:
(纸本)1892512416
Tasks running in computational Grids may need multiple types of resources simultaneously. Traditional scheduling heuristics only schedule tasks to single type of resource - computing resource. Therefore, it is necessary to develop resource co-allocation heuristics for tasks running in computational Grids. In this paper, models of computational Grids and meta-task running in computational Grids are presented. Based on these models, two resource co-allocation heuristics for meta-task are investigated and the performance is studied.
Marginal analysis pervades economic and physical theory. Yet, little research in the area of parallel job scheduling has exploited this technique in order to allocate processors among parallel jobs in a multiprocessin...
详细信息
ISBN:
(纸本)1892512416
Marginal analysis pervades economic and physical theory. Yet, little research in the area of parallel job scheduling has exploited this technique in order to allocate processors among parallel jobs in a multiprocessing, multiprocessor environment. In this paper, we examine "where best" to (re)allocate each of m processors based on a comparison of its marginal benefit among a set of n (n &le m) malleable jobs. Using this strategy with full a priori information available to the job scheduler, we compare the average time-to-completion and overall time-to-completion (or throughput) with the benchmark strategy of "shortest job first". The simulation results demonstrate that marginal analysis markedly improves both system efficiency and system performance over the "shortest job first".
We have been building the Seoul Grid testbed which covers whole area of Seoul, the capital city of Korea, For it, we use Globus, an enabling software which allows a vertically integrated treatment of networked computi...
详细信息
ISBN:
(纸本)1892512416
We have been building the Seoul Grid testbed which covers whole area of Seoul, the capital city of Korea, For it, we use Globus, an enabling software which allows a vertically integrated treatment of networked computing resources. However, currently Globus does not provide all solutions in package for our Grid testbed. In order to run our Grid testbed, we need an unified resource management framework which meets our requirement. Thus, we have developed an unified resource management framework for our Seoul Grid testbed. We implemented the framework into Seoul Grid portal. In this paper, we explain the details of the framework in our Seoul Grid portal.
Todays high-end parallel clusters are architecturally very complex. Most large scale applications nowadays are utilizing multiple parallel programming paradigms to achieve the required scalability, with MPI+threads be...
详细信息
ISBN:
(纸本)9781665469586
Todays high-end parallel clusters are architecturally very complex. Most large scale applications nowadays are utilizing multiple parallel programming paradigms to achieve the required scalability, with MPI+threads being the most common approach. Yet, as of today, there is no parallel I/O library that matches this hybrid programming model. File I/O operations are typically executed by a single thread for each process. This paper explores multi-threaded optimizations for individual MPI I/O operations, an important step towards matching the execution model of modern parallelapplications. We describe the changes necessary to the internal processing in the MPI I/O library as well as to the file access phase. We demonstrate the performance improvement of the redesigned functions using multiple benchmarks and on multiple platforms for many scenarios over the original, single-threaded version.
A distributed Lock Manager (DLM) provides advisory locking services to applications such as databases and file systems that run on distributed systems. Lock management at the server is implemented using First-In-First...
详细信息
ISBN:
(纸本)0769523803
A distributed Lock Manager (DLM) provides advisory locking services to applications such as databases and file systems that run on distributed systems. Lock management at the server is implemented using First-In-First-Out (FIFO) queues. In this paper, we demonstrate a novel way of delegating the lock management to the participating lock-requesting nodes, using advanced network primitives such as Remote Direct Memory Access (RDMA) and Atomic operations. This nicely complements the original idea of DLM, where management of the lock space is distributed. Our implementation achieves better load balancing, reduction in server load and improved throughput over traditional designs.
暂无评论