Barrier synchronization is a common operation in parallel and distributed systems. A fast implementation is important because it allows fine grained parallel programs to be more efficient. It is therefore important to...
ISBN: 0780370163 (print)
Many per-flow scheduling algorithms have been proposed to provide rate and delay guarantees to flows. It is often argued that the need for maintaining per-flow state and performing per-packet classification seriously limits the scalability of routers that employ such per-flow scheduling algorithms. Consequently, design of algorithms that can provide per-flow rate and delay guarantees without requiring per-flow functionality in the network core routers has become an active area of research. We propose a methodology to transform any guaranteed rate (GR) per-flow scheduling algorithm into a version that does not require per-flow state to be maintained in the core routers. We prove that a network of such core-stateless servers provides the same delay guarantee as a corresponding network of GR servers.
This paper first studies the distribution characteristics of user behaviors, based on log data from a massive web search engine. Analysis shows that the stochastic distribution of user queries follows a power-law function and exhibits strong similarity, and that users' queries and clicked URLs present dramatic locality, which implies that a query cache and a 'hot click' cache can be employed to improve system performance. Three typical cache replacement policies are then compared: LRU, FIFO, and LFU with attenuation. In addition, the distribution characteristics of web information are analyzed, demonstrating that the link popularity and replica popularity of a URL have a positive influence on its importance. Finally, the variance between link popularity and user popularity, and between replica popularity and user popularity, is analyzed, which gives some important insight that helps improve the ranking algorithms in a search engine.
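To make the policy comparison concrete, the following is a minimal sketch of two of the named policies, LRU and FIFO, counting cache hits on a query trace. The function names, the toy trace, and the cache capacity are illustrative assumptions and do not come from the paper; LFU with attenuation is omitted because the abstract does not describe its decay rule.

```python
from collections import OrderedDict, deque


def lru_hits(trace, capacity):
    """Count hits for a least-recently-used cache (capacity >= 1 assumed)."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for query in trace:
        if query in cache:
            hits += 1
            cache.move_to_end(query)       # refresh recency on a hit
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[query] = True
    return hits


def fifo_hits(trace, capacity):
    """Count hits for a first-in-first-out cache (capacity >= 1 assumed)."""
    cache = set()
    order = deque()  # insertion order; hits do not change it
    hits = 0
    for query in trace:
        if query in cache:
            hits += 1
        else:
            if len(cache) >= capacity:
                cache.discard(order.popleft())  # evict the oldest insertion
            cache.add(query)
            order.append(query)
    return hits


# Hypothetical trace with one "hot" query, mimicking locality.
trace = ["a", "b", "a", "c", "a", "d", "a"]
print(lru_hits(trace, 2), fifo_hits(trace, 2))
```

On a trace with a recurring hot query, LRU keeps that query resident while FIFO eventually evicts it, which is one reason locality in the query stream favors recency-aware policies.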
ISBN: 0769509908 (print)
An efficient barrier implementation is desirable on parallel systems to obtain good parallel speedup and to support finer-grained computation. Some modern Network Interface Cards (NICs) have programmable processors which can be used to provide support for collective communications such as barrier. In this paper we utilize such a programmable NIC to provide an efficient barrier synchronization operation. This paper describes the design, implementation and evaluation of a NIC-based barrier operation as an addition to Myricom's GM message passing system. Our NIC-based barrier implementation achieved a barrier latency of 102.14 μs for 16 nodes, which is a 1.78 factor of improvement over the host-based barrier using the same algorithm for LANai 4.3 NIC cards. Using LANai 7.2 cards, which have a faster processor, we achieved a 1.83 factor of improvement for eight nodes. Our NIC-based barrier operation promises scalable fine-grained parallel computation over clusters of workstations. To the best of our knowledge, this is the first NIC-level barrier implementation on a cluster with Myrinet/GM.
ISBN: 0769514324 (print)
Focuses on the utilization of alternative communication paths in local and system area networks with static routing. A lot of research work has been devoted to employing such paths for fault tolerance, but the issue of utilizing them for performance enhancement has been largely neglected, especially for static routing networks. This work formally proves that the throughput of multiple paths is maximal if the traffic is uniformly distributed over them. Based on this, a procedure for destination partitioning in static routing networks is introduced. It is applicable to arbitrary multi-path topologies and traffic patterns that lend themselves to partitioning. The procedure is applied to several topologies with different degrees of equivalent-path coverage, and their performance is evaluated through simulations. The results demonstrate that network performance is significantly improved when the proposed partitioning procedure is applied.
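The intuition behind the uniform-distribution result can be illustrated with a toy model; this is a sketch under stated assumptions, not the paper's proof or its partitioning procedure. It assumes equivalent paths of equal capacity, so the most heavily loaded path saturates first, and it uses a simple greedy heuristic to assign destinations to paths. The function names and demand values are hypothetical.

```python
def max_throughput(fractions, capacity=1.0):
    """Total offered load at which the busiest path saturates.

    fractions[i] is the share of traffic sent over path i (shares sum to 1);
    every path is assumed to have the same capacity.
    """
    assert abs(sum(fractions) - 1.0) < 1e-9, "traffic shares must sum to 1"
    return capacity / max(fractions)


def partition_destinations(demand, n_paths):
    """Greedy balancing: place each destination, largest demand first,
    on the currently least-loaded path. Returns (groups, loads)."""
    loads = [0.0] * n_paths
    groups = [[] for _ in range(n_paths)]
    for dest, amount in sorted(demand.items(), key=lambda kv: -kv[1]):
        i = loads.index(min(loads))  # least-loaded path so far
        groups[i].append(dest)
        loads[i] += amount
    return groups, loads


print(max_throughput([0.5, 0.5]))    # uniform split over two paths
print(max_throughput([0.75, 0.25]))  # skewed split saturates earlier
print(partition_destinations({"d0": 4, "d1": 3, "d2": 2, "d3": 1}, 2))
```

In this model, any deviation from a uniform split raises the largest share and therefore lowers the load at which the first path saturates, so a partition that equalizes per-path demand maximizes aggregate throughput.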
Simultaneous advances in processor, network and protocol technologies have made clusters of workstations attractive vehicles for high performance computing. However, clusters are now being increasingly used in environments characterized by non-cooperating communication flows with a range of service requirements. This necessitates quality of service (QoS) mechanisms in clusters. The approaches to QoS in the wide-area networking context are not suitable for clusters because of the high overheads. Also, contention between flows at the end-nodes has not been addressed earlier. In this paper, we explore the use of "rate control" as a means for proportional bandwidth allocation in clusters. A NIC-based solution is presented, with details on implementation in Myrinet/GM. Experimental results show that rate control can handle both end-node and network contention, without adding significant overhead. Our approach is particularly attractive since it does not require hardware modifications, and can hence work with commodity systems with programmable NICs.
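As a rough illustration of rate control for proportional bandwidth allocation, the sketch below splits a link's bandwidth among flows according to weights and enforces each share with a token bucket. The token bucket is a standard technique chosen here as an assumption; the abstract does not say which mechanism the NIC firmware uses, and the class, function names, and numbers are hypothetical.

```python
def proportional_rates(weights, link_bandwidth):
    """Split link_bandwidth among flows in proportion to their weights."""
    total = sum(weights.values())
    return {flow: link_bandwidth * w / total for flow, w in weights.items()}


class TokenBucket:
    """Per-flow rate limiter with explicit timestamps (no wall clock)."""

    def __init__(self, rate, burst):
        self.rate = rate      # tokens (e.g. bytes) added per second
        self.burst = burst    # maximum tokens that can accumulate
        self.tokens = burst   # start with a full bucket
        self.last = 0.0       # time of the previous refill

    def try_send(self, size, now):
        """Return True and consume tokens if a message of `size` may go now."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False  # caller should hold the message and retry later


rates = proportional_rates({"flow_a": 1, "flow_b": 3}, 100.0)
bucket = TokenBucket(rate=rates["flow_a"], burst=25.0)
print(rates, bucket.try_send(25.0, now=0.0), bucket.try_send(1.0, now=0.0))
```

Because each flow is throttled at the sender, contention is limited before packets reach the network, which is consistent with the abstract's point that end-node contention must be handled as well as network contention.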
ISBN: 0769509908 (print)
Parallel computations of unsteady, free-surface flow applications are performed using a stabilized finite element method. The finite element formulations are written for fixed meshes and are based on the Navier-Stokes equations and an advection equation governing the motion of the interface function. To increase the accuracy of the method, an interface-sharpening/mass conservation algorithm is designed. The method has been implemented on the CRAY T3E and also the IBM SP/6000 using the MPI libraries. We show the effectiveness of the method in simulating complex 3D coastal and hydraulic applications such as flow in open channels, wave formation and wave interaction with ships in motion. Some simulations are performed on unstructured meshes with 200 million tetrahedral elements.