This study characterizes the performance of a variant of UNIX SVR4 on a large shared-memory multiprocessor and analyzes the effects of possible OS and architectural changes. We use a nonintrusive cache miss monitor to...
详细信息
For a video-on-demand computer system we propose a scheme which balances the load on the disks, thereby helping to solve a performance problem crucial to achieving maximal video throughput. Our load balancing scheme c...
详细信息
ISBN:
(纸本)9780897916950
For a video-on-demand computer system we propose a scheme which balances the load on the disks, thereby helping to solve a performance problem crucial to achieving maximal video throughput. Our load balancing scheme consists of two stages. The static stage determines good assignments of videos to groups of striped disks. The dynamic phase uses these assignments, and features a DASD dancing algorithm which performs real-time disk scheduling in an effective manner. Our scheme works synergistically with disk striping. We examine the performance of the DASD dancing algorithm via simulation experiments.
Recent technological advances have made multimedia on-demand services, such as home entertainment and home-shopping, important to the consumer market. One of the most challenging aspects of this type of service is pro...
详细信息
ISBN:
(纸本)9780897916950
Recent technological advances have made multimedia on-demand services, such as home entertainment and home-shopping, important to the consumer market. One of the most challenging aspects of this type of service is providing access either instantaneously or within a small and reasonable latency upon request. In this paper, we discuss a novel approach, termed adaptive piggybacking, which can be used to provide on-demand or nearly-on-demand service and at the same time reduce the I/O demand on the multimedia storage server.
We explore processor-cache affinity scheduling of parallel network protocol processing in a setting in which protocol processing executes on a shared-memory multiprocessor concurrently with a general workload of non-p...
详细信息
ISBN:
(纸本)9780897916950
We explore processor-cache affinity scheduling of parallel network protocol processing in a setting in which protocol processing executes on a shared-memory multiprocessor concurrently with a general workload of non-protocol activity. We find that affinity scheduling can significantly reduce the communication delay associated with protocol processing, enabling the host to support a greater number of concurrent streams and to provide a higher maximum throughput to individual streams. In addition, we compare implementations of two parallelization approaches (Locking and Independent Protocol Stacks) with very different caching behaviors.
Talisman is a simulator that models the execution semantics and timing of a multicomputer. Talisman is unique in combining high semantic accuracy, high timing accuracy, portability, and good performance. This good per...
详细信息
ISBN:
(纸本)9780897916950
Talisman is a simulator that models the execution semantics and timing of a multicomputer. Talisman is unique in combining high semantic accuracy, high timing accuracy, portability, and good performance. This good performance allows users to run significant programs on large simulated multicomputers. The combination of high accuracy and good performance yields an ideal tool for evaluating architectural trade-offs. Talisman models the semantics of virtual memory, a circuit-switched internode interconnect, I/O devices, and instruction execution in both user and supervisor modes. It also models the timing of processor pipelines, caches, local memory buses, and a circuit-switched interconnect. Talisman executes the same program binary images as a hardware prototype at a cost of about 100 host instructions per simulated instruction. On a suite of accuracy benchmarks run on the hardware and the simulator, Talisman and the prototype differ in reported running times by only a few percent.
This paper describes the active memory abstraction for memory-system simulation. In this abstraction---designed specifically for on-the-fly simulation, memory references logically invoke a user-specified function depe...
详细信息
ISBN:
(纸本)9780897916950
This paper describes the active memory abstraction for memory-system simulation. In this abstraction---designed specifically for on-the-fly simulation, memory references logically invoke a user-specified function depending upon the reference's type and accessed memory block state. Active memory allows simulator writers to specify the appropriate action on each reference, including "no action" for the common case of cache hits. Because the abstraction hides implementation details, implementations can be carefully tuned for particular platforms, permitting much more efficient on-the-fly simulation than the traditional trace-driven *** SPARC implementation, Fast-Cache, executes simple data cache simulations two or three times faster than a highly-tuned trace-driven simulator and only 2 to 7 times slower than the original program. Fast-Cache implements active memory by performing a fast table look up of the memory block state, taking as few as 3 cycles on a SuperSPARC for the no-action case. modeling the effects of Fast-Cache's additional lookup instructions qualitatively shows that Fast-Cache is likely to be the most efficient simulator for miss ratios between 3% and 40%.
Redundant disk arrays are an increasingly popular way to improve I/O system performance. Past research has studied how to stripe data in non-redundant (RAID Level 0) disk arrays, but none has yet been done on how to s...
详细信息
ISBN:
(纸本)9780897916950
Redundant disk arrays are an increasingly popular way to improve I/O system performance. Past research has studied how to stripe data in non-redundant (RAID Level 0) disk arrays, but none has yet been done on how to stripe data in redundant disk arrays such as RAID Level 5, or on how the choice of striping unit varies with the number of disks. Using synthetic workloads, we derive simple design rules for striping data in RAID Level 5 disk arrays given varying amounts of workload information. We then validate the synthetically derived design rules using real workload traces to show that the design rules apply well to real *** find no difference in the optimal striping units for RAID Level 0 and 5 for read-intensive workloads. For write-intensive workloads, in contrast, the overhead of maintaining parity causes full-stripe writes (writes that span the entire error-correction group) to be more efficient than read-modify writes or reconstruct writes. This additional factor causes the optimal striping unit for RAID Level 5 to be four times smaller for write-intensive workloads than for read-intensive *** next investigate how the optimal striping unit varies with the number of disks in an array. We find that the optimal striping unit for reads in a RAID Level 5 varies inversely to the number of disks, but that the optimal striping unit for writes varies with the number of disks. Overall, we find that the optimal striping unit for workloads with an unspecified mix of reads and writes is independent of the number of ***, these trends lead us to recommend (in the absence of specific workload information) that the striping unit over a wide range of RAID Level 5 disk array sizes be equal to 1/2 * average positioning time * disk transfer rate.
Today large amounts of data are stored on tertiary storage media such as magnetic tapes and optical disks. DBMSs typically operate only on magnetic disks since they know how to maneuver disks and how to optimize acces...
详细信息
ISBN:
(纸本)9780897916950
Today large amounts of data are stored on tertiary storage media such as magnetic tapes and optical disks. DBMSs typically operate only on magnetic disks since they know how to maneuver disks and how to optimize accesses on them. Tertiary devices present a problem for DBMSs since these devices have dismountable media and have very different operational characteristics compared to magnetic disks. For instance, most tape drives offer very high capacity at low cost but are accessed sequentially, involve lengthy latencies, and deliver lower bandwidth. Typically, the scope of a DBMS's query optimizer does not include tertiary devices, and the DBMS might not even know how to control and operate upon tertiary-resident data. In a three-level hierarchy of storage devices (main memory, disk, tape), the typical solution is to elevate tape-resident data to disk devices, thus bringing such data into the DBMS' control, and then to perform the required operations on disk. This requires additional space on disk and may not give the lowest response time possible. With this challenge in mind, we studied the trade-offs between memory and disk requirements and the execution time of a join with the help of two well-known join methods. The conventional, disk-based Nested Block Join and Hybrid Hash Join were modified to operate directly on tapes. An experimental implementation of the modified algorithms gave us more insight into how the algorithms perform in practice. Our performance analysis shows that a DBMS desiring to operate on tertiary storage will benefit from special algorithms that operate directly on tape-resident data and take into account and exploit the mismatch in disk and tape characteristics.
暂无评论