Breadth-first Search (BFS) is a fundamental graph theory algorithm that is extensively used to abstract various challenging computational problems. Due to the fine-grained irregular memory accesses, parallelization of...
详细信息
ISBN:
(纸本)9780889868113
Breadth-first Search (BFS) is a fundamental graph theory algorithm that is extensively used to abstract various challenging computational problems. Due to the fine-grained irregular memory accesses, parallelization of BFS can exhibit limited performance on cache-based systems. In this paper, we study the relationship between the topology of input graphs and the performance of BFS on multicore systems. We propose a model to estimate the scalability of BFS with respect to a given graph. Using this model, we propose a topologically adaptive parallel BFS algorithm on multicore systems. The proposed algorithm estimates scalability of each iteration of BFS with respect to the input graph at runtime. An adaptive barrier is developed for this algorithm, which dynamically adjusts the number of threads participating in the BFS according to the estimated scalability. In this way, we reduce the synchronization overhead. We evaluate the proposed algorithm using various graphs on state-of-the-art multicore systems. The proposed method exhibits improved performance compared with traditional parallel BFS algorithms for which the number of threads is fixed.
distributed object-oriented middleware technologies have been adopted for ubiquitous communicating real-time and embedded systems. Although Java plays an important role in building distributed object-oriented middlewa...
详细信息
ISBN:
(纸本)9780889866386
distributed object-oriented middleware technologies have been adopted for ubiquitous communicating real-time and embedded systems. Although Java plays an important role in building distributed object-oriented middleware because of its portability and productivity, it is not popularto real-time system developers because it is slow and hard to keep real-timeness due to garbage collection. This paper presents a novel memory classification and measurement approach for modem programming language such as Java to analyze the memory utilization of middleware technologies in order to improve them for embedded systems using limited computing resource. We classified Java memory into four categories such as static, quasi-static, quasi-dynamic and dynamic, and show how to measure the size of each memory. Intensive case studies using popular Java middleware technologies such as Web Services, CORBA, RMI and HORB revealed that reducing the use of dynamic memory has contributed far more to the efficiency and performance than reducing the use of quasi-static and quasi-dynamic memory.
The Replica Placement Problem (RPP) aims at selecting the nodes for duplicating data objects in order to optimize their access. Even though a lot of work already exists on RPP, the issue of implementing the resulting ...
详细信息
ISBN:
(纸本)9780889866386
The Replica Placement Problem (RPP) aims at selecting the nodes for duplicating data objects in order to optimize their access. Even though a lot of work already exists on RPP, the issue of implementing the resulting allocation scheme is typically overlooked. In this paper we introduce the Replica Transfer Scheduling Problem (RTSP), briefly stated as: given a network of servers with limited storage capacitv, a set of data objects and two replication schemes x(old) and X-new, find a schedule of object transfers and deletions for implementing X-old based on X-old with minimum communication cost. Given that this problem is NP-complete, we introduce several different heuristics to solve it, and evaluate them via simulations.
We motivate the value of an abstract distributed service component (ADSC) with an overview of JANET, a high-performance grid computing service. We then present the architecture of an ADSC, focusing on some design issues.
ISBN:
(纸本)088986392X
We motivate the value of an abstract distributed service component (ADSC) with an overview of JANET, a high-performance grid computing service. We then present the architecture of an ADSC, focusing on some design issues.
We describe a highly scalable algorithm for secure system evolution in an infrastructure for widely distributed Byzantine fault-tolerant applications. To maintain high availability, the system and its applications evo...
详细信息
ISBN:
(纸本)9780889866386
We describe a highly scalable algorithm for secure system evolution in an infrastructure for widely distributed Byzantine fault-tolerant applications. To maintain high availability, the system and its applications evolve on-line, providing uninterrupted service during installation of upgrades. Installations are made to appear atomic with respect to other installations and application execution steps. Our algorithm guarantees safe installation despite Byzantine faulty replicas and replica groups. An initial phase prepares replica groups for an upgrade, while a second phase triggers the installation of the upgrade by gossip among groups. A simple but novel scheme using secret sharing and Byzantine quorums prevents faulty replicas and replica groups from disrupting or maliciously exploiting installations. Installation message complexity and computational complexity grow linearly with the number of replicas.
Scheduling tasks onto the processors of a parallel system is a crucial part of program parallelisation. Due to the NP-hardness of the task scheduling problem, scheduling algorithms are based on heuristics that try to ...
详细信息
ISBN:
(纸本)9780889866379
Scheduling tasks onto the processors of a parallel system is a crucial part of program parallelisation. Due to the NP-hardness of the task scheduling problem, scheduling algorithms are based on heuristics that try to produce good rather than optimal schedules. Nevertheless, in certain situations it is desirable to have optimal schedules, for example for time critical systems or to evaluate scheduling heuristics. This paper proposes a scheduling algorithm based on A* that can produce optimal schedules in reasonable time for small task graphs. A* is a best-first state space search algorithm. In comparison to a previous approach, the here presented scheduling algorithm has a significantly reduced search space due to a much improved cost function f (s) and additional pruning techniques. Experimental results reveal the relation between the runtime of the algorithm and the structure of the task graphs. Further, it is shown that the proposed algorithm significantly outperforms the previous approach.
Synchronization of data stream processing has a significant impact on performance of systems where processing of long sequences of data items needs to be done simultaneously. In earlier works on stream processing, syn...
详细信息
ISBN:
(纸本)9780889866379
Synchronization of data stream processing has a significant impact on performance of systems where processing of long sequences of data items needs to be done simultaneously. In earlier works on stream processing, synchronization has been discussed to a limited extent or has been completely overlooked. This work describes a formal model of synchronization in a data stream processing network. We use a notation of data stream processing networks to identify circumstances that necessitate synchronization and express processing of data items in a data stream processing network in terms of database transactions. A technique similar to timestamp ordering of database transactions is used to solve the problems. A solution is presented as a set of rules that govern processing of individual data items. A proof of correctness for the strategy used is also provided.
Resources in the computational Grid nowadays are usually owned by multiple administrative domains which are connected by WANs. Grid resources have their own local management policies and a Grid scheduler cannot contro...
详细信息
ISBN:
(纸本)9780889866386
Resources in the computational Grid nowadays are usually owned by multiple administrative domains which are connected by WANs. Grid resources have their own local management policies and a Grid scheduler cannot control resources beyond the domain of itself. Scheduling algorithms designed for traditional systems do not consider these new characteristics in Grids. In this paper, a two-phase scheduling approach for DAG-based applications in the Grid is proposed. In the first phase, the task graph is partitioned according to the status of selected available resource clusters by an algorithm called GPRC. In the second phase, an algorithm called HEFT is used to map task nodes to processing nodes inside a resource cluster. GPRC does not require detailed status information or control privilege on every Grid resource so that the dependence on Grid information services is reduced and, at the same time, the higher priority of local resource management policies is respected. Analysis and experiment show the effectiveness and adaptation of this new approach in the Grid scenario.
Most modem parallel computers are clusters using Myrinet or Ethernet communication networks. Several studies have been published comparing the performance of these two networks for parallelcomputing, however these fo...
详细信息
ISBN:
(纸本)9780889866379
Most modem parallel computers are clusters using Myrinet or Ethernet communication networks. Several studies have been published comparing the performance of these two networks for parallelcomputing, however these focus on average performance, and do not address the distributions of communication times, which can have long tails due to contention effects. In the case of Ethernet with TCP, retransmit timeouts (RTOs) can also occur. Slow communication events may have significant impact, particularly for applications requiring frequent synchronization, where the performance is determined by the slowest process. We have analysed the distributions of communication times for standard MPI routines on Ethernet with TCP and Myrinet with GM communications networks on the same cluster, and studied the scalability of the distributions as the number of communicating processes is increased, and the effect of RTOs for Ethernet with TCP.
Our objective is to provide location-, topology-, and administrative- transparent grid computing forMPI applications, while hiding the physical details of computing platforms and heterogeneous networks fromthe applica...
详细信息
ISBN:
(纸本)9780889868113
Our objective is to provide location-, topology-, and administrative- transparent grid computing forMPI applications, while hiding the physical details of computing platforms and heterogeneous networks fromthe application elopers and users. To achieve this objective, we introduced a new resource allocation model, workflow structures to specify MPI applications involving multiple tasks, and message relay to enable communication across different networks. We eloped the SGR framework, which integrates workflow scheduling, task grouping, and message relay services, while hiding resource allocation, heterogeneous networks, and decentralized resource management systems from application elopers and users. The SGR system has been implemented on a Globus-enabled computing grid. We created a simulation environment to investigate our model and various schedulers. Using the findings from simulation, we implemented the SGR framework and tested the model's implementation on a two-cluster grid. We observed that duplication can improve performance by more than 15%, which matches our simulation results. Moreover, we evaluated our new message relay service for cross-site message passing. The test results indicate that although the SGR's message relay service has some communication overhead, the system is scalable with respect to the number of processes and the message size.
暂无评论