ISBN (print): 0769521320
Code coupling applications can be divided into communicating modules, which may be executed on different clusters in a cluster federation. Because a cluster federation comprises a large number of nodes, the probability of a node failure is high. We propose a hierarchical checkpointing protocol that combines a synchronized checkpointing technique inside clusters with a communication-induced technique between clusters. This protocol fits the characteristics of a cluster federation: a large number of nodes, and high-latency, low-bandwidth networking between clusters. A preliminary performance evaluation using a discrete event simulator shows that the protocol is suitable for code coupling applications.
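The abstract does not spell out the inter-cluster rule, so the sketch below assumes a classic index-based communication-induced scheme: each cluster keeps the index of its last cluster-wide checkpoint, piggybacks it on inter-cluster messages, and a receiver whose index is lower takes a forced checkpoint before delivering the message. The `Cluster` class and its methods are hypothetical, and the synchronized intra-cluster checkpoint is reduced to a log entry.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Cluster:
    """Hypothetical per-cluster state for an index-based CIC sketch."""
    name: str
    sn: int = 0                                   # index of the last cluster-wide checkpoint
    log: List[str] = field(default_factory=list)  # record of checkpoints taken

    def _checkpoint(self, index: int, forced: bool) -> None:
        # Placeholder for the synchronized checkpoint of all nodes in the cluster.
        self.sn = index
        kind = "forced" if forced else "periodic"
        self.log.append(f"{kind} checkpoint {index}")

    def periodic_checkpoint(self) -> None:
        self._checkpoint(self.sn + 1, forced=False)

    def send(self, payload: str) -> Tuple[int, str]:
        # Inter-cluster messages piggyback the sender's checkpoint index.
        return (self.sn, payload)

    def receive(self, message: Tuple[int, str]) -> str:
        sender_sn, payload = message
        # Communication-induced rule: a message from a cluster with a newer
        # checkpoint forces a checkpoint here *before* delivery, so that
        # checkpoints sharing an index can form a consistent recovery line.
        if sender_sn > self.sn:
            self._checkpoint(sender_sn, forced=True)
        return payload
```

For example, if cluster A has taken two periodic checkpoints and sends to a fresh cluster B, B records `forced checkpoint 2` before delivering; further messages carrying index 2 cause no extra checkpoints.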
ISBN (print): 0769522424
The proceedings contain 55 papers from the Third IEEE International Symposium on Network Computing and Applications (NCA 2004). The topics discussed include: intrusion tolerance for Internet applications; exploring the network/software boundary: a telecommunication perspective; objective-greedy algorithms for long-term web prefetching; a methodological construction of an efficient sequential consistency protocol; frequent episode rules for Internet anomaly detection; performance analysis of cryptographic protocols on handheld devices; dynamic key messaging for cluster computing; and speculative network processor for quality-of-service-aware protocol processing.
ISBN (print): 0769521320
This paper presents a generic parallel architecture for fast elliptic curve scalar multiplication over binary extension fields. We show how the parallel strategy followed in this work leads to high-performance designs. We also implemented the proposed architecture on reconfigurable hardware devices, where the predicted high-speed performance figures were actually obtained. The results show that our proposed design computes an elliptic curve scalar multiplication over GF(2^191) in 56.44 μs.
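The two arithmetic layers such an architecture accelerates can be sketched in software: multiplication in GF(2^m), with field elements stored as bit vectors of polynomial coefficients, and the left-to-right double-and-add loop that turns point doubling and addition into scalar multiplication. This is a sequential sketch, not the paper's parallel design. The abstract does not name the reduction polynomial; `P191` below assumes the trinomial x^191 + x^9 + 1 (the one used by the ANSI X9.62 curve c2tnb191v1), and the point operations passed to `scalar_mul` are left to the caller.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumed reduction polynomial for GF(2^191): x^191 + x^9 + 1.
P191 = (1 << 191) | (1 << 9) | 1


def gf2m_mul(a: int, b: int, m: int, poly: int) -> int:
    """Multiply a and b (both < 2^m) in GF(2^m) modulo `poly` (which includes x^m)."""
    result = 0
    # Scan b from its most significant coefficient (MSB-first shift-and-add).
    for i in range(m - 1, -1, -1):
        result <<= 1                  # result *= x
        if (result >> m) & 1:         # degree reached m: reduce modulo poly
            result ^= poly
        if (b >> i) & 1:
            result ^= a               # addition in characteristic 2 is XOR
    return result


def scalar_mul(k: int, point: T, add: Callable[[T, T], T],
               dbl: Callable[[T], T], identity: T) -> T:
    """Left-to-right double-and-add: computes k * point for k >= 0."""
    acc = identity
    for bit in bin(k)[2:]:
        acc = dbl(acc)
        if bit == "1":
            acc = add(acc, point)
    return acc
```

As a check, the FIPS-197 example {57}·{83} = {c1} in GF(2^8) with polynomial 0x11B is reproduced by `gf2m_mul(0x57, 0x83, 8, 0x11B)`, and `scalar_mul` can be exercised on ordinary integer addition before plugging in real curve operations.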
ISBN (print): 0769521320
Managing multi-resolution simulations is a real challenge in large-scale distributed simulation systems. Multiple levels of resolution in a simulation may result in inconsistencies, due to improper correlations between attributes of entities interacting at different levels of resolution. Aggregation/disaggregation (A/D) is a method for implementing multi-resolution simulations within a High Level Architecture (HLA) federation. This paper is a follow-up to our earlier work, in which we studied Dynamic Grid-Based vs. Region-Based Data Distribution Management (DDM) strategies in a mono-resolution distributed simulation system. In this paper, we extend that work and present a multi-resolution scheme based on the aggregation/disaggregation (A/D) paradigm. The purpose of A/D is to ensure consistency in state updates between federates simulating objects at different levels of resolution. We present extensive simulation experiments using several real-world benchmarks to evaluate the performance of both data distribution management schemes under our multi-resolution paradigm.
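The difference between the two DDM strategies named above can be illustrated with the matching step alone. A minimal sketch, assuming 2D rectangular update and subscription regions (half-open intervals) and a fixed grid cell size: region-based matching tests exact overlap, while grid-based matching only checks whether the regions touch a common cell, so it can report false positives. The function names and region representation are assumptions; dynamic grid resizing and the A/D layer are not modeled.

```python
import math
from typing import Set, Tuple

# A region is (x_lo, x_hi, y_lo, y_hi), with half-open intervals [lo, hi).
Rect = Tuple[float, float, float, float]


def region_match(u: Rect, s: Rect) -> bool:
    """Region-based DDM: exact overlap test between update and subscription regions."""
    return u[0] < s[1] and s[0] < u[1] and u[2] < s[3] and s[2] < u[3]


def grid_cells(r: Rect, cell: float) -> Set[Tuple[int, int]]:
    """Grid-based DDM: the set of grid cells a region touches."""
    xs = range(math.floor(r[0] / cell), math.ceil(r[1] / cell))
    ys = range(math.floor(r[2] / cell), math.ceil(r[3] / cell))
    return {(x, y) for x in xs for y in ys}


def grid_match(u: Rect, s: Rect, cell: float) -> bool:
    """Regions match if they share at least one grid cell (may be a false positive)."""
    return bool(grid_cells(u, cell) & grid_cells(s, cell))
```

With a cell size of 10, the regions (0, 4, 0, 4) and (6, 9, 6, 9) do not overlap, yet both fall in cell (0, 0), so grid-based matching would deliver updates that region-based matching filters out; the trade-off is that cell lookups are cheaper than pairwise overlap tests.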
ISBN (print): 0769521320
In this paper we study the problem of redistributing data in parallel between clusters interconnected by a backbone. This problem is a generalization of the well-known redistribution problem in parallel computing [9]. We suppose that at most k communications can be performed at the same time (the value of k depends on the characteristics of the platform). We use knowledge of the application to schedule the messages and control congestion ourselves. Previous results [7, 6] show that this problem is NP-complete. We propose and study two fast and efficient algorithms for this problem and prove that they are 2-approximation algorithms. Simulation results show that both algorithms perform very well compared to the optimal solution. These algorithms have been implemented using MPI. Experimental results show that both algorithms outperform a brute-force TCP-based solution in which no scheduling of the messages is performed.
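To make the scheduling constraints concrete, here is a simple greedy heuristic, not either of the paper's two 2-approximation algorithms. It assumes a step-based model: in each step every node takes part in at most one communication, at most k messages run concurrently, and a step costs a fixed setup time `beta` plus the duration of its largest message. Messages are treated as non-preemptive, and all names are hypothetical.

```python
from typing import List, Tuple

Message = Tuple[str, str, float]  # (sender node, receiver node, size or duration)


def greedy_schedule(messages: List[Message], k: int) -> List[List[Message]]:
    """Pack messages into steps: <= k messages per step, each node used once per step."""
    if k < 1:
        raise ValueError("k must be at least 1")
    remaining = sorted(messages, key=lambda m: m[2], reverse=True)  # largest first
    steps: List[List[Message]] = []
    while remaining:
        busy_src, busy_dst = set(), set()
        step: List[Message] = []
        rest: List[Message] = []
        for msg in remaining:
            src, dst, _ = msg
            if len(step) < k and src not in busy_src and dst not in busy_dst:
                step.append(msg)
                busy_src.add(src)
                busy_dst.add(dst)
            else:
                rest.append(msg)   # deferred to a later step
        steps.append(step)
        remaining = rest
    return steps


def schedule_cost(steps: List[List[Message]], beta: float) -> float:
    """Each step pays a setup cost beta plus its longest message."""
    return sum(beta + max(m[2] for m in step) for step in steps)
```

Sorting by size groups messages of similar length into the same step, which limits the time short messages spend waiting for a long one; the first remaining message always fits, so every step is non-empty and the loop terminates.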
ISBN (print): 0769521320
For I/O systems to achieve high performance in a parallel environment, they must either sacrifice client-side file caching, or keep caching and deal with complex coherency issues. The most common technique for maintaining cache coherency in multi-client file caching environments uses file locks to bypass the client-side cache. Besides effectively disabling the cache, file locking is sometimes unavailable on larger systems. The high-level abstraction layer of MPI allows us to tackle cache coherency using additional information and coordination, without file locks. By addressing cache coherency higher in the I/O stack, the underlying I/O accesses can be modified so as to guarantee access to coherent data while still satisfying the user's I/O request. We can thus exploit the benefits of a file system's client-side cache while minimizing its management costs.
ISBN (print): 0769521320
In mobile distributed systems, vital resources like battery power and wireless channel bandwidth pose significant challenges to ubiquitous information access. In this paper, we propose a novel energy- and bandwidth-efficient data caching mechanism, called GreedyDual Least Utility (GDLU), that enhances dynamic data availability while maintaining consistency. The proposed utility-based caching mechanism considers several characteristics of mobile distributed systems, such as connection/disconnection, mobility handoff, data updates and user request patterns, to achieve significant energy savings in mobile devices. Based on a utility function derived from an analytical model, we propose a cache replacement algorithm and a passive prefetching algorithm to cache and prefetch data objects. Our comprehensive simulation experiments demonstrate that the proposed mechanism achieves more than 10% energy savings and a near-optimal tradeoff between access latency and energy consumption.
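As its name suggests, GDLU builds on the GreedyDual family of replacement policies. The sketch below shows the classic GreedyDual(-Size) mechanism with a pluggable utility: each cached object carries a value H = L + utility, the object with the smallest H is evicted, and the inflation value L is raised to the evicted H so that long-unused objects age out. The paper's utility function comes from its analytical model and is not given in the abstract, so `utility` here is a caller-supplied, hypothetical callable; prefetching and consistency handling are omitted.

```python
from typing import Callable, Dict, Hashable


class GreedyDualCache:
    """Classic GreedyDual-style cache with a caller-supplied utility function."""

    def __init__(self, capacity: float, utility: Callable[[Hashable, float], float]):
        self.capacity = capacity
        self.utility = utility              # hypothetical: utility(key, size) -> float
        self.inflation = 0.0                # the GreedyDual "L" value
        self.h: Dict[Hashable, float] = {}  # current H value per cached key
        self.sizes: Dict[Hashable, float] = {}
        self.used = 0.0

    def access(self, key: Hashable, size: float) -> bool:
        """Return True on a cache hit; on a miss, insert the object if it fits."""
        if key in self.h:
            # Hit: refresh H relative to the current inflation value.
            self.h[key] = self.inflation + self.utility(key, size)
            return True
        if size > self.capacity:
            return False                    # too large to cache at all
        while self.used + size > self.capacity:
            victim = min(self.h, key=self.h.__getitem__)
            self.inflation = self.h.pop(victim)   # L <- H of the evicted object
            self.used -= self.sizes.pop(victim)
        self.h[key] = self.inflation + self.utility(key, size)
        self.sizes[key] = size
        self.used += size
        return False
```

For instance, with capacity 2, unit sizes and utilities a = 1, b = 3, c = 2, accessing a, b, c evicts a and raises L to 1, so c enters with H = 3.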
ISBN (print): 0769521320
This paper compares the performance of three programming paradigms for the parallelization of nested loop algorithms on SMP clusters. More specifically, we propose three alternative models for tiled nested loop algorithms: a pure message-passing paradigm and two hybrid ones that implement communication through both message passing and shared memory access. The hybrid models adopt an advanced hyperplane scheduling scheme that allows both minimal thread synchronization and pipelined execution with overlapping of computation and communication phases. We focus on the experimental evaluation of all three models, testing their performance against several iteration spaces and parallelization grains with a typical micro-kernel benchmark. We conclude that the hybrid models can in some cases be more beneficial than the monolithic pure message-passing model, as they better exploit the configuration characteristics of a hierarchical parallel platform such as an SMP cluster.
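The idea behind hyperplane scheduling of tiles can be shown with the plain linear schedule. A minimal sketch, assuming the hyperplane vector Π = (1, ..., 1) and tile-level dependence vectors supplied by the caller: the schedule is valid when Π·d > 0 for every dependence d, and then all tiles with the same value of Π·t form one wavefront whose tiles can run concurrently. The paper's advanced variant, with thread-level synchronization and computation/communication overlap, is not reproduced here.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Tile = Tuple[int, ...]


def hyperplane_schedule(extents: Sequence[int],
                        deps: Sequence[Tile]) -> Dict[int, List[Tile]]:
    """Group tiles into wavefronts t = sum(tile) under the hyperplane (1, ..., 1)."""
    pi = (1,) * len(extents)
    for d in deps:
        # A linear schedule is valid only if every dependence advances time.
        if sum(p * x for p, x in zip(pi, d)) <= 0:
            raise ValueError(f"dependence {d} is not respected by hyperplane {pi}")
    waves: Dict[int, List[Tile]] = {}
    for tile in product(*(range(n) for n in extents)):
        t = sum(p * x for p, x in zip(pi, tile))
        waves.setdefault(t, []).append(tile)
    return dict(sorted(waves.items()))
```

For a 3 x 3 tile space with dependences (1, 0) and (0, 1), this yields five wavefronts; the middle one is (0, 2), (1, 1), (2, 0), three tiles with no dependences among them.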
ISBN (print): 0769521320
In this paper, we propose a new distribution scheme for a parallel Strassen's matrix multiplication algorithm on heterogeneous clusters. In a heterogeneous cluster environment, appropriate data distribution is the most important factor for achieving maximum overall performance. However, Strassen's algorithm reduces the total operation count to about 7/8 per recursion level, so the recursion level affects the total operation count. Thus, we need to consider not only load balancing but also the recursion level in Strassen's algorithm. Our scheme achieves both load balancing and a reduction of the total operation count. As a result, we achieve a speedup of nearly 21.7% compared to the conventional parallel Strassen's algorithm in a heterogeneous cluster environment.
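The 7/8 factor follows from Strassen's recursion replacing 8 half-size block products with 7. The sketch below counts scalar multiplications for an n x n product with a given number of recursion levels and a classical base case (additions are not counted), and adds a simple speed-proportional row split as a baseline for the load-balancing side. The split is a generic largest-remainder rounding chosen for illustration, not the authors' distribution scheme, and both function names are hypothetical.

```python
from typing import List


def strassen_mults(n: int, levels: int) -> int:
    """Scalar multiplications for n x n Strassen with `levels` recursions, classical base."""
    if n % (2 ** levels):
        raise ValueError("n must be divisible by 2**levels")
    base = n // 2 ** levels
    return 7 ** levels * base ** 3      # equals (7/8)**levels * n**3


def proportional_rows(n: int, speeds: List[float]) -> List[int]:
    """Split n rows among nodes in proportion to their relative speeds."""
    total = sum(speeds)
    quotas = [n * s / total for s in speeds]
    rows = [int(q) for q in quotas]     # floor of each quota
    leftover = n - sum(rows)
    # Largest-remainder rounding hands out the leftover rows.
    order = sorted(range(len(speeds)), key=lambda i: quotas[i] - rows[i], reverse=True)
    for i in order[:leftover]:
        rows[i] += 1
    return rows
```

For n = 1024, one level gives 7 * 512^3 multiplications, exactly 7/8 of the classical 1024^3. Each additional level multiplies the count by another 7/8 but halves the block size, which is why the recursion depth and the data distribution have to be chosen together.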
ISBN (print): 0769522424
As Grids become more available and mature in real-world settings, users face considerations regarding the efficiency of applications and their ability to utilize additional nodes distributed over a wide area network. When both tightly coupled clusters and loosely coupled Grids are available, a cost-effective organization will schedule on Grids those applications that can execute over wide-area networks with minimal performance degradation, while reserving clusters for applications with high communication costs. In this paper we analyze the performance of the NAS Parallel Benchmarks using both MPICH-G2 and MPICH with the ch_p4 device. We compare the results of these communication devices on both tightly and loosely coupled systems, and present an analysis of how parallel applications perform in real-world environments. We make recommendations as to where, and under what conditions, applications run most efficiently.