In this paper, we propose an inter-nest cache reuse optimization method for Jacobi codes. This method is easy to apply, but effective in that it enhances cache locality of the Jacobi codes while preserving their coars...
ISBN: 1595935800 (print)
Clos networks are an important class of switching networks due to their modular structure and much lower cost compared with crossbars. For routing I/O permutations of Clos networks, sequential routing algorithms are too slow, and none of the known parallel algorithms is practical. We present the algorithm-hardware codesign of a unified fast parallel routing architecture, called the distributed pipeline routing (DPR) architecture, for rearrangeable nonblocking and strictly nonblocking Clos networks. The DPR architecture uses a linear interconnection structure and processing elements that perform only shift and logical AND operations. We show that a DPR architecture can route any permutation in rearrangeable nonblocking and strictly nonblocking Clos networks in O(N) time. The same architecture can be used to control any group of connection/disconnection requests for strictly nonblocking Clos networks in O(N) time. Several speed-up techniques are also presented. This architecture is applicable to packet and circuit switches of practical sizes. Copyright 2006 ACM.
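As background for the two network classes named in this abstract, the following minimal sketch checks the classical conditions for a symmetric three-stage Clos network and compares its crosspoint count with a crossbar of the same size. It is not the DPR architecture itself. The class name `Clos`, the parameter names `m`, `n`, `r`, and the helper `crossbar_crosspoints` are illustrative assumptions, and the conditions used are the standard unicast results (m >= 2n - 1 for strictly nonblocking, m >= n for rearrangeable).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clos:
    """Symmetric three-stage Clos network (illustrative model, not from the paper)."""
    m: int  # number of middle-stage switches, each r x r
    n: int  # inputs per input-stage switch (outputs per output-stage switch)
    r: int  # number of input-stage switches (and of output-stage switches)

    @property
    def ports(self) -> int:
        # Total number of network inputs N = n * r.
        return self.n * self.r

    def is_strictly_nonblocking(self) -> bool:
        # Classical Clos condition for unicast traffic.
        return self.m >= 2 * self.n - 1

    def is_rearrangeable(self) -> bool:
        # Classical rearrangeability condition for unicast traffic.
        return self.m >= self.n

    def crosspoints(self) -> int:
        # r input switches (n x m) + m middle switches (r x r) + r output switches (m x n).
        return 2 * self.r * self.n * self.m + self.m * self.r ** 2


def crossbar_crosspoints(ports: int) -> int:
    # A single N x N crossbar needs N^2 crosspoints.
    return ports * ports
```

For example, with n = 8 and r = 8 (64 ports), `Clos(m=8, n=8, r=8)` is rearrangeable but not strictly nonblocking, and both it and the strictly nonblocking `Clos(m=15, n=8, r=8)` use fewer crosspoints than `crossbar_crosspoints(64)`.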
ISBN: 9783540241287 (print)
In this paper, we study I/O server placement for optimizing parallel I/O performance on switch-based clusters, which typically adopt irregular network topologies to allow construction of scalable systems with incremental expansion capability. Finding an optimal solution to this problem is computationally intractable. We quantify the number of messages travelling through each network link by a workload function, and develop three heuristic algorithms that find good solutions based on the values of the workload function. The maximum-workload-based heuristic chooses the locations for I/O nodes so as to minimize the maximum value of the workload function. The distance-based heuristic aims to minimize the average distance between the compute nodes and I/O nodes, which is equivalent to minimizing the average workload on the network links. The load-balance-based heuristic balances the workload on the links based on a recursive traversal of the routing tree for the network. Our simulation results demonstrate the performance advantage of our algorithms over a number of algorithms commonly used in existing parallel systems. In particular, the load-balance-based algorithm is superior to the other algorithms in most cases, with an improvement ratio of 10% to 95% in terms of parallel I/O throughput.
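To make the distance-based objective concrete, the sketch below computes hop distances by breadth-first search and then picks the set of I/O nodes that minimizes the average hop distance between compute nodes and I/O nodes. It uses exhaustive search on a toy graph rather than the paper's heuristic, which the abstract does not spell out. The function names `hop_distances` and `place_io_nodes`, the adjacency-list input format, and the simplification that every vertex is a candidate node are all assumptions made for illustration; the graph is assumed connected and `k` smaller than the number of nodes.

```python
from collections import deque
from itertools import combinations


def hop_distances(adj, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def place_io_nodes(adj, k):
    """Return (io_nodes, avg_distance) minimizing the mean compute-to-I/O hop distance.

    Exhaustive search over all k-subsets; only feasible for small graphs.
    """
    nodes = sorted(adj)
    dist = {u: hop_distances(adj, u) for u in nodes}
    best_set, best_avg = None, float("inf")
    for io_nodes in combinations(nodes, k):
        compute = [u for u in nodes if u not in io_nodes]
        # Every compute node is assumed to talk to every I/O node.
        total = sum(dist[c][s] for c in compute for s in io_nodes)
        avg = total / (len(compute) * k)
        if avg < best_avg:
            best_set, best_avg = io_nodes, avg
    return best_set, best_avg
```

On a five-node chain `0-1-2-3-4` with `k = 1`, the search selects the middle node, since it has the smallest average distance to the other four.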
ISBN: 3540241280 (print)
Deadlock detection and resolution are among the fundamental issues in distributed systems. Although many algorithms have been proposed, these traditional message-passing-based solutions can hardly meet the challenges of the prevailing Internet computing and mobile computing. In this paper, we present a novel algorithm, namely M-Guard, for deadlock detection and resolution in distributed systems based on mobile agent technology. The proposed algorithm lies at the intersection of centralized and distributed algorithms. An agent is employed in our algorithm as a guard with a dual role: while roaming the system according to a specified itinerary algorithm, the agent collects resource request/allocation information for detecting deadlock cycles, and it also propagates the collected network and resource information among the nodes. Consequently, accurate and timely detection of deadlocks can be achieved without any network node becoming the performance bottleneck. Preliminary simulation results show that, compared with several other algorithms, the M-Guard algorithm achieves both a shorter deadlock persistence time and a smaller phantom deadlock ratio. Moreover, the overall network communication overhead is reduced as well.
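The abstract describes detecting deadlock cycles from collected request/allocation information. A minimal sketch of that detection step, assuming the collected information has already been reduced to a wait-for graph, is a depth-first search for a cycle. This is a generic textbook technique, not the M-Guard algorithm; the function name `find_deadlock_cycle` and the dictionary format (process name mapped to the processes it waits for) are illustrative assumptions.

```python
def find_deadlock_cycle(wait_for):
    """Return one cycle in the wait-for graph as a list of processes, or None.

    wait_for maps each process to the processes it is waiting on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    nodes = set(wait_for) | {q for targets in wait_for.values() for q in targets}
    colour = {p: WHITE for p in nodes}
    path = []  # processes on the current DFS path, in visiting order

    def visit(p):
        colour[p] = GREY
        path.append(p)
        for q in wait_for.get(p, ()):
            if colour[q] == GREY:
                # q is already on the current path, so path[q..] closes a cycle.
                return path[path.index(q):]
            if colour[q] == WHITE:
                cycle = visit(q)
                if cycle:
                    return cycle
        colour[p] = BLACK
        path.pop()
        return None

    for p in sorted(nodes):
        if colour[p] == WHITE:
            cycle = visit(p)
            if cycle:
                return cycle
    return None
```

Given `{"P1": ["P2"], "P2": ["P3"], "P3": ["P1"], "P4": ["P1"]}`, the function reports the cycle through `P1`, `P2`, and `P3`; `P4` is blocked behind the cycle but is not part of it.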
This paper integrates the concepts of realtime network monitoring and visualizations into a grid computing architecture on the Internet. We develop a Realtime Network Monitor (RNM) that performs realtime network monito...
A distributed Virtual Environment (DVE) system offers a computer-generated virtual world in which individuals located at different places in the physical world can interact with one another. In order to achieve real-time response for a large user base, DVE systems need to have a scalable architecture. In this paper, we present the design of a grid-enabled service oriented framework for facilitating the construction of scalable DVE systems on computing grids. A service component called "gamelet" is proposed, whose distinctive mark is its high mobility for supporting dynamic load sharing. We propose a gamelet migration protocol which can ensure the transparency and efficiency of gamelet migration, and an adaptive gamelet load-balancing (AGLB) algorithm for making gamelet redistribution decisions at runtime. The algorithm considers both the synchronization costs of the DVE system and network latencies inherent in the grid nodes. The activities of the users and the heterogeneity of grid resources are also considered in order to carry out load sharing more effectively. We evaluate the performance of the proposed mechanisms through a multiplayer online game prototype implemented using the Globus toolkit. The results show that our approach can achieve faster response times and higher throughputs than some existing approaches.
The explosive growth of distributed technologies requires frameworks to be adaptable. This paper uses design patterns as building blocks to develop an adaptive pattern-oriented framework for distributed computing appl...
ISBN: 3540241280 (print)
Although TCP is known to be inefficient over networks such as wireless links, satellite links, and long fat pipes, it is still the most widely used transport layer protocol even on these networks. In this paper, we explore an alternative strategy for designing a reliable transport layer protocol that is much more suitable for today's mobile and other types of non-conventional networks. The objective is to have a single protocol that is compatible with today's communication software and can easily be made to perform better over all types of networks. The outcome of the research is a reconfigurable, user-level, reliable transport layer protocol called RRTP (Reliable and Reconfigurable Transport Protocol) that is TCP-friendly, i.e., it asymptotically converges to fairness as in the case of LIMD (Linear Increase Multiplicative Decrease) algorithms. The protocol is implemented on top of UDP, but it can also easily be incorporated into OS kernels. The paper presents the RRTP algorithm and the key parameters that are necessary for its reconfiguration. We evaluate our protocol using the standard network simulation tool ns2. Several representative network configurations are used to benchmark the performance of our protocol against TCP in terms of network throughput and congestion loss rate. We observe that, under normal operating conditions, our protocol has a performance advantage of 30% to 700% over TCP in lossy wireless environments as well as in high-bandwidth, high-latency networks.
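The fairness property mentioned for LIMD algorithms can be illustrated with a small simulation: two flows that share a link increase their rates linearly while the link is underused and cut them multiplicatively when it is overloaded. This is a generic model, not the RRTP algorithm. The function name `simulate_limd` and the parameters `capacity`, `alpha`, `beta`, and `steps` are assumptions chosen for the example, as is the simplification that all flows see congestion at the same time.

```python
def simulate_limd(rates, capacity, alpha=1.0, beta=0.5, steps=200):
    """Simulate synchronized linear-increase, multiplicative-decrease flows.

    rates: initial sending rates, one per flow.
    capacity: link capacity; total demand above it counts as congestion.
    alpha: additive increase per step; beta: multiplicative decrease factor.
    """
    rates = list(rates)
    for _ in range(steps):
        if sum(rates) > capacity:
            # Congestion: every flow backs off by the same factor,
            # which shrinks the gap between flows.
            rates = [r * beta for r in rates]
        else:
            # No congestion: every flow grows by the same amount,
            # which leaves the gap between flows unchanged.
            rates = [r + alpha for r in rates]
    return rates
```

Starting from rates `[1.0, 50.0]` on a link of capacity `60.0`, the gap between the two flows is halved at every congestion event and never widened, so the rates approach an equal share.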
ISBN: 3540241280 (print)
The idea of virtual backbone routing has been proposed for efficient routing among a set of mobile nodes in wireless ad hoc networks. Virtual backbone routing can reduce communication overhead and speed up the routing process compared with many existing on-demand routing protocols for routing detection. In many studies, a Minimum Connected Dominating Set (MCDS) is used to approximate virtual backbones in a unit-disk graph. However, finding an MCDS is an NP-hard problem. In this paper, we propose a distributed, 3-phase protocol for calculating a CDS. Our new protocol largely reduces the number of nodes in the CDS compared with Wu and Li's method, while the message and time complexities of our approach remain almost the same as those of Wu and Li's method. We conduct extensive simulations and show that our protocol consistently outperforms Wu and Li's method. The correctness of our protocol is proved through theoretical analysis.
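To illustrate what a connected dominating set is, the sketch below checks the two defining properties (every node is in the set or adjacent to it, and the set induces a connected subgraph). It also includes the basic marking rule of Wu and Li's method as it is commonly described, in which a node joins the backbone if it has two neighbors that are not directly connected; the abstract does not spell this rule out, so treat it as an assumption. Neither function is the 3-phase protocol proposed in the paper. The names `is_connected_dominating_set` and `marking_process` are illustrative, and the graph is assumed undirected with symmetric adjacency lists.

```python
def is_connected_dominating_set(adj, backbone):
    """Check that backbone dominates the graph and induces a connected subgraph."""
    backbone = set(backbone)
    if not backbone:
        return False
    # Domination: every node outside the backbone has a neighbor inside it.
    for u in adj:
        if u not in backbone and not (set(adj[u]) & backbone):
            return False
    # Connectivity: walk the subgraph induced by the backbone from any member.
    start = next(iter(backbone))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in backbone and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == backbone


def marking_process(adj):
    """Mark every node that has two neighbors which are not adjacent to each other."""
    marked = set()
    for u, nbrs in adj.items():
        nbrs = list(nbrs)
        for i, a in enumerate(nbrs):
            # Look for a later neighbor b that is not connected to a.
            if any(b not in adj[a] for b in nbrs[i + 1:]):
                marked.add(u)
                break
    return marked
```

On the chain `0-1-2-3`, the marking rule selects the two interior nodes, and the checker confirms that they form a connected dominating set, whereas the two end nodes dominate the graph but are not connected to each other.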