ISBN (print): 9780897919821
We consider the problem of scheduling the rendering component of 3D multimedia applications on a cluster of workstations connected via a local area network. Our goal is to meet a periodic real-time deadline. In abstract terms, the problem we address is how best to schedule tasks with unpredictable service times on distinct processing nodes so as to meet a real-time deadline, given that all communication among nodes entails some (possibly large) overhead. We consider two distinct classes of schemes: static, in which task reallocations are scheduled to occur at specific times, and dynamic, in which reallocations are triggered by some processor going idle. For both classes we further examine both global reassignments, in which all nodes are rescheduled at a rescheduling moment, and local reassignments, in which only a subset of the nodes engage in rescheduling at any one time. We show that global dynamic policies work best over a range of parameterizations appropriate to such systems. We introduce a new policy, Dynamic with Shadowing, which places a small number of tasks in the schedules of multiple workstations to reduce the amount of communication required to complete the schedule. This policy is shown to dominate the other alternatives considered over most of the parameter space.
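The communication-saving effect of shadowing can be sketched with a toy pull-scheduling model; the cost model, parameter values, and function names below are illustrative assumptions, not the paper's simulator. An idle worker takes the next unstarted task, paying a fetch cost unless the task was shadowed (pre-placed) in its local schedule.

```python
import random

def makespan(num_workers, service_times, shadowed, comm_cost):
    """Greedy pull scheduling: the next task goes to the worker that becomes
    idle first; fetching a task costs comm_cost unless it was shadowed
    (already present in every worker's local schedule)."""
    idle = [0.0] * num_workers            # time each worker next becomes idle
    for i, svc in enumerate(service_times):
        w = min(range(num_workers), key=idle.__getitem__)
        fetch = 0.0 if i in shadowed else comm_cost
        idle[w] += fetch + svc
    return max(idle)

rng = random.Random(7)
svc = [rng.expovariate(1.0) for _ in range(40)]   # unpredictable service times
plain = makespan(4, svc, shadowed=set(), comm_cost=0.3)
shadow = makespan(4, svc, shadowed=set(range(8)), comm_cost=0.3)
print(f"no shadowing: {plain:.2f}, 8 tasks shadowed: {shadow:.2f}")
```

With a single worker the model is exact: every non-shadowed task adds its full fetch cost to the serial completion time, which is why removing fetches for even a few tasks matters when communication overhead is large.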
ISBN (print): 9780897919821
One role for workload generation is as a means for understanding how servers and networks respond to variation in load. This enables management and capacity planning based on current and projected usage. This paper applies a number of observations of Web server usage to create a realistic Web workload generation tool which mimics a set of real users accessing a server. The tool, called Surge (Scalable URL Reference Generator) generates references matching empirical measurements of 1) server file size distribution; 2) request size distribution; 3) relative file popularity; 4) embedded file references; 5) temporal locality of reference; and 6) idle periods of individual users. This paper reviews the essential elements required in the generation of a representative Web workload. It also addresses the technical challenges to satisfying this large set of simultaneous constraints on the properties of the reference stream, the solutions we adopted, and their associated accuracy. Finally, we present evidence that Surge exercises servers in a manner significantly different from other Web server benchmarks.
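Relative file popularity on Web servers is commonly modeled as Zipf-like, and matching such an empirical distribution is one of the simultaneous constraints a generator like Surge must satisfy. A minimal inverse-CDF sampling sketch (not Surge's actual code; the exponent and catalog size are made-up values):

```python
import random

def zipf_weights(num_files, alpha=1.0):
    """Zipf-like popularity: P(rank i) proportional to 1 / i**alpha."""
    raw = [1.0 / (i ** alpha) for i in range(1, num_files + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def sample_rank(probs, rng):
    """Inverse-CDF sampling of a file rank from the popularity distribution."""
    r, acc = rng.random(), 0.0
    for rank, p in enumerate(probs):
        acc += p
        if r < acc:
            return rank
    return len(probs) - 1

rng = random.Random(42)
probs = zipf_weights(1000)
refs = [sample_rank(probs, rng) for _ in range(10_000)]
# The most popular file dominates the reference stream.
print(refs.count(0), refs.count(999))
```

A full generator layers the remaining constraints (file sizes, embedded references, think times) on top of a popularity stream like this one.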
ISBN (print): 9780897919821
We consider broadcast WDM networks operating with schedules that mask the transceiver tuning latency. We develop and analyze a queueing model of the network in order to obtain the queue-length distribution and the packet loss probability at the transmitting and receiving side of the nodes. The analysis is carried out assuming finite buffer sizes, non-uniform destination probabilities and two-state MMBP traffic sources; the latter naturally capture the notion of burstiness and correlation, two important characteristics of traffic in high-speed networks. We present results which establish that the performance of the network is a complex function of a number of system parameters, including the load balancing and scheduling algorithms, the number of available channels, and the buffer capacity. We also show that the behavior of the network in terms of packet loss probability as these parameters are varied cannot be predicted without an accurate analysis. Our work makes it possible to study the interactions among the system parameters, and to predict, explain and fine tune the performance of the network.
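A two-state MMBP source of the kind assumed in the analysis is easy to simulate directly; the parameter values below are arbitrary, chosen only to make the burstiness and correlation visible:

```python
import random

def mmbp_arrivals(num_slots, p01, p10, rate0, rate1, rng):
    """Two-state Markov-Modulated Bernoulli Process: in each slot the source
    emits a packet with the probability of its current state, then switches
    state with probability p01 (from state 0) or p10 (from state 1).
    Rare transitions plus rate1 >> rate0 give bursty, correlated arrivals."""
    state, out = 0, []
    for _ in range(num_slots):
        out.append(1 if rng.random() < (rate0 if state == 0 else rate1) else 0)
        if rng.random() < (p01 if state == 0 else p10):
            state = 1 - state
    return out

cells = mmbp_arrivals(100_000, p01=0.01, p10=0.05,
                      rate0=0.05, rate1=0.8, rng=random.Random(3))
load = sum(cells) / len(cells)
# Stationary P(state 1) = p01 / (p01 + p10) = 1/6, so the mean load is
# roughly (5/6) * 0.05 + (1/6) * 0.8, i.e. about 0.175.
print(round(load, 3))
```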
ISBN (print): 9780897919821
We demonstrate that high-level file system events exhibit self-similar behaviour, but only for short-term time scales of approximately under a day. We do so through the analysis of four sets of traces that span time scales of milliseconds through months, and that differ in the trace collection method, the filesystems being traced, and the chronological times of the tracing. Two sets of detailed, short-term file system trace data are analyzed; both are shown to have self-similar-like behaviour, with consistent Hurst parameters (a measure of self-similarity) for all file system traffic as well as for individual classes of file system events. Long-term file system trace data is then analyzed, and we discover that the traces' high variability and self-similar behaviour do not persist across time scales of days, weeks, and months. Using the short-term trace data, we show that sources of file system traffic exhibit ON/OFF source behaviour, characterized by bursts of activity of highly variable length, followed by periods of inactivity of similarly variable length. This ON/OFF behaviour is used to motivate a simple technique for synthesizing a stream of events that exhibits the same self-similar short-term behaviour as was observed in the file system traces.
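The ON/OFF synthesis idea can be sketched with Pareto-distributed burst and idle lengths; the heavy-tailed regime 1 < alpha < 2 is what yields self-similar aggregates. The rates and exponents here are illustrative placeholders, not values fitted to the traces:

```python
import random

def pareto(rng, alpha, xm=1.0):
    """Heavy-tailed Pareto variate; 1 < alpha < 2 (infinite variance) is the
    regime associated with self-similar aggregate traffic."""
    return xm / (rng.random() ** (1.0 / alpha))

def on_off_events(total_time, alpha_on=1.4, alpha_off=1.6, rate=10.0, rng=None):
    """One source alternates Pareto-length ON bursts (events emitted at
    `rate`) with Pareto-length OFF silences; returns event timestamps."""
    rng = rng or random.Random(0)
    t, events = 0.0, []
    while t < total_time:
        on_len = pareto(rng, alpha_on)
        events.extend(t + k / rate for k in range(int(on_len * rate)))
        t += on_len + pareto(rng, alpha_off)
    return [e for e in events if e < total_time]

stream = on_off_events(1000.0, rng=random.Random(11))
print(len(stream), "events")
```

Superimposing many such independent sources is the standard route from ON/OFF behaviour to a self-similar aggregate event stream.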
ISBN (print): 9780897919821
This paper presents cooperative prefetching and caching --- the use of network-wide global resources (memories, CPUs, and disks) to support prefetching and caching in the presence of hints of future demands. Cooperative prefetching and caching effectively unites disk-latency reduction techniques from three lines of research: prefetching algorithms, cluster-wide memory management, and parallel I/O. When used together, these techniques greatly increase the power of prefetching relative to a conventional (non-global-memory) system. We have designed and implemented PGMS, a cooperative prefetching and caching system, under the Digital Unix operating system running on a 1.28 Gb/sec Myrinet-connected cluster of DEC Alpha workstations. Our measurements and analysis show that by using available global resources, cooperative prefetching can obtain significant speedups for I/O-bound programs. For example, for a graphics rendering application, our system achieves a speedup of 4.9 over a non-prefetching version of the same program, and a 3.1-fold improvement over that program using local-disk prefetching alone.
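The core decision in cooperative prefetching is to service a hinted request from the cheapest location that holds the page, since another node's memory over a fast network is far cheaper than a local disk. A toy cost model (the latencies and names are invented placeholders, not PGMS measurements):

```python
# Hypothetical per-page access costs in microseconds (illustrative only).
COST_US = {"local_memory": 1, "remote_memory": 300, "local_disk": 10_000}

def prefetch_source(locations):
    """Pick the cheapest location holding a hinted page -- the reason
    cluster-wide memory can shortcut disk latency for I/O-bound programs."""
    return min(locations, key=COST_US.__getitem__)

print(prefetch_source(["local_disk", "remote_memory"]))   # remote_memory
```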
Authors: Costas Courcoubetis, Vasilios A. Siris, George D. Stamoulis
Dept. of Computer Science, University of Crete, and Institute of Computer Science (ICS), Foundation for Research and Technology - Hellas (FORTH), P.O. Box, GR 711 10 Heraklion, Crete, Greece
ISBN (print): 9780897919821
Accurate yet simple methods for traffic engineering are important for efficient dimensioning of broadband networks. The goal of this paper is to apply and evaluate large deviation techniques for traffic engineering. In particular, we employ the recently developed theory of effective bandwidths, where the effective bandwidth depends not only on the statistical characteristics of the traffic stream, but also on a link's operating point through two parameters, the space and time parameters, which are computed using the many sources asymptotic. We show that this effective bandwidth definition can accurately quantify resource usage. Furthermore, we estimate and interpret values of the space and time parameters for various mixes of real traffic, demonstrating how these values can be used to clarify the effects on link performance of the time scales of burstiness of the traffic input, of the link parameters (capacity and buffer), and of traffic control mechanisms such as traffic shaping. Our approach relies on off-line analysis of traffic traces, the granularity of which is determined by the time parameter of the link, and our experiments involve a large set of MPEG-1 compressed video and Internet Wide Area Network (WAN) traces, as well as modeled voice traffic.
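The effective-bandwidth definition in question, alpha(s, t) = (1/(s*t)) * log E[exp(s * X[0, t])], can be estimated from a trace by averaging the moment generating function over non-overlapping windows of t samples. A minimal sketch on a synthetic trace (real inputs would be the MPEG or WAN traces, and s and t would come from the link's operating point):

```python
import math

def effective_bandwidth(trace, s, t_samples):
    """Empirical alpha(s, t) = (1/(s*t)) * log E[exp(s * X[0,t])], where
    X[0,t] is the work arriving in a window of t_samples trace samples,
    averaged over consecutive non-overlapping windows."""
    windows = [sum(trace[i:i + t_samples])
               for i in range(0, len(trace) - t_samples + 1, t_samples)]
    mgf = sum(math.exp(s * x) for x in windows) / len(windows)
    return math.log(mgf) / (s * t_samples)

# A mildly bursty toy trace (units: cells per slot).
trace = ([2] * 50 + [10] * 5) * 20
mean_rate = sum(trace) / len(trace)
peak_rate = max(trace)
eb = effective_bandwidth(trace, s=0.1, t_samples=5)
# The effective bandwidth lies between the mean and peak rates.
print(mean_rate, eb, peak_rate)
```

As s tends to 0 the estimate approaches the mean rate, and for large s it approaches the peak window rate, which is how the space parameter encodes the link's operating point between those extremes.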
ISBN (print): 9780897919821
The sharing of a common buffer by TCP data segments and acknowledgments in a network or internet has been known to produce the effect of ack compression, often causing dramatic reductions in throughput. We study several schemes for improving the performance of two-way TCP traffic over asymmetric links where the bandwidths in the two directions may differ substantially, possibly by many orders of magnitude. These approaches reduce the effect of ack compression by carefully controlling the flow of data packets and acknowledgments. We first examine a scheme where acknowledgments are transmitted at a higher priority than data. By analysis and simulation, we show that prioritizing acks can lead to starvation of the low-bandwidth connection. Next, we introduce and analyze a connection-level backpressure mechanism designed to limit the maximum amount of data buffered in the outgoing IP queue of the source of the low-bandwidth connection. We show that this approach, while minimizing the queueing delay for acks, results in unfair bandwidth allocation on the slow link. Finally, our preferred solution separates the acks from data packets in the outgoing queue, and makes use of a connection-level bandwidth allocation mechanism to control their bandwidth shares. We show that this scheme overcomes the limitations of the previous approaches, provides isolation, and enables precise control of the connection throughputs. We present analytical models of the dynamic behavior of each of these approaches, derive closed-form expressions for the expected connection efficiencies in each case, and validate them with simulation results.
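The preferred scheme's per-class bandwidth control can be sketched with a simple credit scheduler that serves separate ack and data queues in proportion to configured shares. This is a generic weighted-fair sketch under invented parameters, not the paper's exact mechanism:

```python
from collections import deque

def weighted_service(ack_q, data_q, ack_share, num_slots):
    """Serve two queues in proportion to their shares: each slot every
    backlogged queue earns credit equal to its share, and the backlogged
    queue with the most credit sends one packet (spending one credit)."""
    queues = {"ack": ack_q, "data": data_q}
    share = {"ack": ack_share, "data": 1.0 - ack_share}
    credit = {"ack": 0.0, "data": 0.0}
    sent = {"ack": 0, "data": 0}
    for _ in range(num_slots):
        for k in queues:
            if queues[k]:
                credit[k] += share[k]
        backlogged = [k for k in queues if queues[k]]
        if not backlogged:
            break
        k = max(backlogged, key=credit.get)
        queues[k].popleft()
        credit[k] -= 1.0
        sent[k] += 1
    return sent

sent = weighted_service(deque(range(1000)), deque(range(1000)),
                        ack_share=0.25, num_slots=400)
print(sent)   # acks get roughly 25% of the slots, data roughly 75%
```

Because credit only accrues while a queue is backlogged, an idle class cannot starve the other, which is the isolation property the abstract emphasizes.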
ISBN (print): 9780897919821
Internet (IP) address lookup is a major bottleneck in high performance routers. IP address lookup is challenging because it requires a longest matching prefix lookup. It is compounded by increasing routing table sizes, increased traffic, higher speed links, and the migration to 128 bit IPv6 addresses. We describe how IP lookups can be made faster using a new technique called controlled prefix expansion. Controlled prefix expansion, together with optimization techniques based on dynamic programming, can be used to improve the speed of the best known IP lookup algorithms by at least a factor of two. When applied to trie search, our techniques provide a range of algorithms whose performance can be tuned. For example, with 1 MB of L2 cache, trie search of the MaeEast database with 38,000 prefixes can be done in a worst-case search time of 181 nsec, a worst-case insert/delete time of 2.5 msec, and an average insert/delete time of 4 usec. Our actual experiments used 512 KB of L2 cache to obtain a worst-case search time of 226 nsec, a worst-case insert/delete time of 2.5 msec, and an average insert/delete time of 4 usec. We also describe how our techniques can be used to improve the speed of binary search on prefix lengths to provide a scalable solution for IPv6. Our approach to algorithm design is based on measurements using the VTune tool on a Pentium to obtain dynamic clock cycle counts.
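The expansion step itself is mechanical: every prefix whose length is not a multiple of the chosen stride is rewritten into equivalent longer prefixes, with more specific originals winning collisions so longest-prefix-match is preserved. A sketch on bit-string prefixes (the route table is invented; building the multibit trie and the dynamic-programming stride selection are omitted):

```python
def expand_prefixes(routes, stride):
    """Controlled prefix expansion: pad each prefix to the next multiple of
    `stride` by enumerating the missing bits. Processing shortest prefixes
    first lets longer (more specific) routes overwrite overlaps, preserving
    longest-prefix-match semantics."""
    expanded = {}
    for prefix, nexthop in sorted(routes.items(), key=lambda kv: len(kv[0])):
        target = -(-len(prefix) // stride) * stride   # round up to a stride multiple
        extra = target - len(prefix)
        for i in range(2 ** extra):
            suffix = format(i, "b").zfill(extra) if extra else ""
            expanded[prefix + suffix] = nexthop
    return expanded

routes = {"1": "A", "101": "B"}       # bit-string prefix -> next hop
print(expand_prefixes(routes, stride=2))
# {'10': 'A', '11': 'A', '1010': 'B', '1011': 'B'}
```

After expansion every prefix length is a stride multiple, so each table maps onto one level of a multibit trie that inspects `stride` bits per memory access; the trade-off is more table entries for fewer memory references per lookup.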
The proceedings contain 26 papers from the 1997 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems. Topics discussed include: network measurement and modeling; network resource management; parallel computer systems; memory management; benchmarking; computing and switching platforms; and storage systems.
ISBN (print): 9783540631019
The proceedings contain 18 papers. The special focus of this conference is on Modeling Techniques and Tools for Computer Performance Evaluation. The topics include: a performability modeling environment tool; dependability evaluation and the optimization of performability; design and implementation of a network computing platform using Java; storage alternatives for large structured state spaces; an efficient disk-based tool for solving very large Markov models; efficient transient overload tests for real-time systems; towards an analytical tool for performance modeling of ATM networks by decomposition; an embedded network simulator to support network protocols' development; synchronized two-way voice simulation for Internet phone performance analysis and evaluation; processes as language-oriented building blocks of stochastic Petri nets; measurement tools and modeling techniques for evaluating Web server performance; workload characterization of input/output intensive parallel applications; interval-based workload characterization for distributed systems; bounding the loss rates in a multistage ATM switch; simple bounds for queues fed by Markovian sources; and on queue length moments in fork and join queuing networks with general service times.