ISBN: 9780889867048 (print)
Parallel applications are notoriously hard to debug for performance. Automatic performance analysis techniques, such as those used by Kojak and KappaPI, promise to ease the discovery of performance inefficiencies in parallel applications. However, as we show in this paper, the results produced by these tools can be misleading and are sometimes outright incorrect. The reason is that overhead caused by a performance inefficiency at one point in the program can causally propagate and manifest itself at other points. Current techniques perform a flat analysis, i.e., they do not account for causal propagation. In this paper, we present a method of causal analysis with which current analysis techniques can be retrofitted, so that they account for causal propagation of overhead and arrive at a more accurate description of performance bottlenecks. We also show several ways in which this technique improves the effectiveness of automatic performance analysis. In this paper, we tackle only overhead related to communication operations in MPI parallel applications. In general, however, our technique can also be used for non-communication-related overhead in any parallel programming paradigm.
This paper presents an evolutionary prototyping methodology oriented to the modeling, design and implementation of concurrent distributed systems. The methodology uses two stages: LeMSiDiC, a modeling language for concurrent distributed systems (a graphical modeling language that provides UML-like structuring capabilities and a precise syntax and semantics for automatic source code generation for these types of systems); and GeCSiDiC, a source code generator (able to construct the objects associated with a model specified in LeMSiDiC using the object-oriented paradigm). The methodology can be used either with an architecture oriented to the management of concurrent distributed systems or with concurrent distributed systems that have no specialized support.
This paper proposes a network routing algorithm, REI, which adapts autonomously to network traffic conditions. When a routing node has several different paths to a given destination, these paths can be evaluated in terms of their latency (delay time) as reported in inbound data packets. By evaluating the scores of the paths, every node works as a distributed autonomous agent for adaptive routing. Through network simulations that compare REI with conventional OSPF and enhanced variants, we show that the multiagent-based routing algorithm adapts well in avoiding congested paths and balancing network load.
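The idea of scoring alternative paths by observed latency can be illustrated with a minimal sketch. This is not the REI algorithm itself, whose scoring rule the abstract does not give; the class name `LatencyRouter`, the exponential moving average, and the smoothing weight `alpha` are all assumptions made for illustration.

```python
class LatencyRouter:
    """Hypothetical per-node table of smoothed latency scores per next hop."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha  # smoothing weight (assumed, not from the paper)
        self.scores = {}    # {destination: {next_hop: smoothed latency}}

    def observe(self, destination, next_hop, latency):
        """Update a path's score from the latency carried by an inbound packet."""
        paths = self.scores.setdefault(destination, {})
        old = paths.get(next_hop)
        if old is None:
            paths[next_hop] = latency
        else:
            # Exponential moving average: recent samples shift the score.
            paths[next_hop] = (1 - self.alpha) * old + self.alpha * latency

    def best_next_hop(self, destination):
        """Return the next hop with the lowest smoothed latency, or None."""
        paths = self.scores.get(destination)
        if not paths:
            return None
        return min(paths, key=paths.get)


router = LatencyRouter()
router.observe("D", "A", 10.0)
router.observe("D", "B", 20.0)
first_choice = router.best_next_hop("D")   # A has the lower score
router.observe("D", "A", 50.0)             # A becomes congested
second_choice = router.best_next_hop("D")  # traffic shifts to B
```

Because each node keeps only its own table and decides locally, no central coordinator is needed, which matches the abstract's description of nodes acting as autonomous agents.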
ISBN: 9781467345651 (print)
Read-Copy-Update (RCU) is a mechanism designed to increase the level of concurrency in readers-writer synchronization scenarios, vastly improving the scalability of software running on multiprocessor machines. Most existing RCU variants have been developed for and studied within the Linux kernel. Because they depend strongly on Linux internals, they cannot easily be transferred to other operating system kernels. This paper presents a novel non-intrusive variant of the RCU mechanism (AP-RCU), which depends only on basic kernel-level concepts while maintaining the scalability benefits. We have implemented AP-RCU in the Solaris kernel (UTS) and experimentally confirmed the expected benefits over traditional forms of synchronization, which are comparable with those of previous RCU implementations.
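The general read-copy-update pattern can be sketched in user space as follows. This is only an illustration of the pattern, not AP-RCU or any kernel implementation: the class name `RcuCell` is hypothetical, and Python's garbage collector stands in for grace-period detection, which is the hard part in a real kernel.

```python
import threading
from types import MappingProxyType


class RcuCell:
    """Hypothetical RCU-style cell: lock-free reads, copy-then-publish writes."""

    def __init__(self, initial):
        # The published snapshot is wrapped read-only and never mutated.
        self._current = MappingProxyType(dict(initial))
        self._writer_lock = threading.Lock()  # serializes writers only

    def read(self):
        # Readers take no lock; they pick up the currently published snapshot.
        return self._current

    def update(self, key, value):
        with self._writer_lock:
            new = dict(self._current)              # copy
            new[key] = value                       # update the private copy
            self._current = MappingProxyType(new)  # publish by reference swap
        # The old snapshot is reclaimed by the garbage collector once no
        # reader holds it; this replaces explicit grace-period tracking.


cell = RcuCell({"a": 1})
snapshot = cell.read()   # a reader that started before the update
cell.update("a", 2)
old_value = snapshot["a"]     # the earlier reader still sees a consistent view
new_value = cell.read()["a"]  # later readers see the new version
```

Readers never block writers and writers never block readers, which is where the scalability benefit on multiprocessors comes from.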
ISBN: 0889865701 (print)
In this paper we present and evaluate the performance of two different strategies for the deployment of parallel multifrontal and multiple frontal sparse linear solvers in the context of a parallel finite element code. Direct sparse linear solvers are based on a sophisticated reorganisation of the standard Gaussian elimination algorithm with the aim of exploiting matrix sparsity and reducing the amount of fill-in. Such codes can be successfully applied to very large linear systems, and are especially effective when a sparse linear system needs to be solved for multiple right-hand sides. Unfortunately, many important applications, such as finite element solutions of non-linear, transient problems, require repeated factorisation of the coefficient matrix. In such cases the only way of achieving good performance is to parallelise both the computation of the finite element matrices and the linear system solution phase. We have developed two different designs for the deployment of parallel multifrontal and multiple frontal sparse linear solvers in this context, each deploying three different strategies for the assembly of the global data. These designs are suitable for parallel and heterogeneous architectures. Experiments confirm the high efficiency, low communication cost, and reduced initial memory requirements of our deployment designs compared to a standard deployment strategy.
ISBN: 9780889867048 (print)
We have proposed and implemented a distributed asynchronous Web-based training (WBT) system. In order to improve the scalability and robustness of this system, all exercises and functions, such as scoring users' answers, are realized as mobile agents. These agents are distributed across computers and organized in a P2P network based on a modified Content-Addressable Network (CAN). In this paper, we present the exercise management scheme for the proposed WBT system. In a WBT system based on the client/server model, management of exercises is achieved simply by manipulating data, since all contents are concentrated on one server computer. In the proposed system, however, exercise management must take the distributed agents into account. To carry out exercise management operations, i.e., adding, deleting and updating exercises on the distributed WBT system, we use steps that account for the multi-agent distributed environment: as pre-operation, specifying the agent that provides an exercise and searching for its node; as post-operation, sending an agent to the expected node and notifying the other cooperating agents.
ISBN: 9781467345651; 9780769549033 (print)
Due to the prevalence of sensors such as live cameras and environmental sensors, sensor data stream delivery, which requires continuous and cyclic data delivery, is attracting great attention. For sensor data stream delivery, various communication load balancing techniques have been studied, since the load on the sensor data source becomes high when it must accommodate a large number of clients. However, these studies assume only requests that share the same collection cycle, which is not sufficient for actual applications. In this paper, we propose a sensor data stream delivery system with communication load balancing for requests with heterogeneous collection cycles. The proposed system distributes the load by re-delivering sensor data requested by other clients whose collection cycles differ but share common cycles.
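To make "different collection cycles with common cycles" concrete, the following minimal sketch computes the sample times that two clients both need, which are the points where one client's data could be re-delivered to the other instead of being fetched from the source again. The function names, integer-valued cycles, and a shared start at time 0 are assumptions for illustration; the paper's actual delivery scheme is not described in the abstract.

```python
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def shared_times(cycle_a, cycle_b, horizon):
    """Times in (0, horizon] at which both collection cycles need a sample.

    Assumes integer cycles and that both clients start collecting at time 0.
    """
    step = lcm(cycle_a, cycle_b)
    return list(range(step, horizon + 1, step))


# A client collecting every 2 s and another every 3 s coincide every 6 s.
common = shared_times(2, 3, 12)
```

Within a 12-second window these two clients share the samples at 6 s and 12 s, so those two deliveries are candidates for re-delivery between clients rather than separate transmissions from the sensor data source.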
ISBN: 9783642328206 (print)
Distributed computing is becoming ever more significant with the prevalence of technological advances that make global computing a reality in the modern world. Indeed, it is hard to imagine an application or computational activity or process that falls outside distributed computing. With the advent of distributed systems, we are faced with the real challenges of distributed computation: How do we cope with asynchrony and failures? How (and how well) do we achieve load balancing? How do we model and analyze malicious and selfish behavior? How do we address mobility, heterogeneity and the dynamic nature of participating processes? What can we achieve in the presence of disconnecting operations that cause network partitioning?
On instruction-level parallel architectures such as VLIW, performance depends on the compiler technique. In this paper, we propose an integrated optimization technique that combines register reuse, spilling and rematerialization. First, we develop a register allocation method that decides, when registers are insufficient, whether a register should be reused, spilled or rematerialized, by predicting the execution timing of the instructions in the program. We evaluate our method against a conventional compiler technique on blocks of programs. Second, spilling and rematerialization are also applied to software pipelining to improve parallelism in loops. The results show that adopting spilling and rematerialization in scheduling improves the parallelism of loop execution.
Distributed storage systems are increasingly being used by data-intensive applications for efficient and reliable data delivery. The Network Storage Manager (NSM) is a distributed storage framework with a unique architecture that maximizes applications' control over many of the storage and retrieval policies. Several applications are using NSM for efficient, tunable, and controllable performance. Data layout is one policy that is considered application-dependent, and tailored algorithms are preferred for applications with complex or irregular access patterns. Experimental results have shown dramatic performance improvements when optimized layout policies override the default NSM implementation. Layout algorithms are more effective when proper prefetching and cache replacement policies are implemented.