This paper describes the design, implementation, and evaluation of TBBT, the first comprehensive NFS trace replay tool. Given an NFS trace, TBBT automatically detects and repairs missing operations in the trace, derives a file system image required to successfully replay the trace, ages the file system image appropriately, initializes the file server under test with that image, and finally drives the file server with a workload that is derived from replaying the trace according to user-specified parameters. TBBT can scale a trace temporally or spatially to meet the needs of a simulation run without violating dependencies among file system operations in the trace.
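As a rough illustration of the image-derivation step TBBT's abstract mentions (the trace format and names below are invented for illustration, not TBBT's actual data structures), any file that the trace reads or writes without first creating must already exist in the file system image used to seed the server:

```python
# Toy trace: (operation, path). Real NFS traces use file handles rather than paths.
trace = [
    ("CREATE", "/proj/a.c"),
    ("WRITE",  "/proj/a.c"),
    ("READ",   "/proj/b.h"),     # never created within the trace ...
]

def derive_initial_image(trace):
    """Paths accessed before (or without) being created in the trace must be
    present in the initial file system image for the replay to succeed."""
    created, preexisting = set(), set()
    for op, path in trace:
        if op == "CREATE":
            created.add(path)
        elif path not in created:
            preexisting.add(path)
    return preexisting

print(derive_initial_image(trace))   # {'/proj/b.h'} must be in the initial image
```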
We present the WEAVER codes, new families of simple, highly fault-tolerant XOR-based erasure codes for storage systems (with fault tolerance up to 12). The design features of WEAVER codes are (a) placement of data and parity blocks on the same strip, (b) constrained parity in-degree, and (c) balance and symmetry. These codes are in general not maximum distance separable (MDS) but have optimal storage efficiency among all codes with constrained parity in-degree. Though applicable to RAID controller systems, the WEAVER codes are probably best suited to dRAID systems (distributed Redundant Arrangement of Independent Devices). We discuss the advantages these codes have over many other erasure codes for storage systems.
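A minimal sketch of the XOR arithmetic behind such codes (the actual WEAVER placement and in-degree constructions are not reproduced here): each parity block is the XOR of a small, fixed number of data blocks, so an erased data block is recovered by re-XORing the surviving members of one parity group.

```python
def xor_blocks(blocks):
    """XOR a list of equal-sized byte blocks together."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

# Parity with constrained in-degree 2: each parity block covers only two data blocks.
d = [b"\x01" * 4, b"\x02" * 4, b"\x04" * 4]
p01 = xor_blocks([d[0], d[1]])          # parity over d0, d1
p12 = xor_blocks([d[1], d[2]])          # parity over d1, d2

# Recover d[0] after its loss using the surviving members of its parity group.
recovered = xor_blocks([p01, d[1]])
assert recovered == d[0]
```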
Due to their non-deterministic nature, Time of Check To Time of Use (TOCTTOU) vulnerabilities in Unix-style file systems (e.g., Linux) are difficult to find and prevent. We describe a comprehensive model of TOCTTOU vulnerabilities, enumerating 224 file system call pairs that may lead to successful TOCTTOU attacks. Based on this model, we built kernel monitoring tools that confirmed known vulnerabilities and discovered new ones (in often-used system utilities such as rpm, vi, and emacs). We evaluated the probability of successfully exploiting these newly discovered vulnerabilities and analyzed in detail the system events during such attacks. Our performance evaluation shows that the dynamic monitoring of system calls introduces non-negligible overhead in microbenchmarks of those file system calls, but its impact on application benchmarks such as Andrew and PostMark is only a few percent.
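The general shape of a check/use pair is easy to illustrate (this is a generic access-then-open race shown for illustration; the 224 pairs enumerated by the paper are not listed here):

```python
import os

path = "/tmp/report.txt"

# CHECK: the program verifies that the *current* target of `path` is readable.
if os.access(path, os.R_OK):
    # Window of vulnerability: between the check above and the use below, an
    # attacker who controls /tmp can replace `path` with a symlink to a file
    # that the victim (e.g., a privileged process) can read but the attacker
    # cannot, such as /etc/shadow.
    with open(path) as f:          # USE: opens whatever `path` points to *now*
        data = f.read()
```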
Consistency, throughput, and scalability form the backbone of a cluster-based parallel file system. With little or no information about the workloads to be supported, a file system designer often has to make a one-glove-fits-all decision regarding the consistency policies. Taking a hard stance on consistency demotes throughput and scalability to second-class status, left to make do with whatever leeway is available. Leaving the choice and granularity of consistency policies to the user at open/mount time offers an attractive way of providing the best of all worlds. We present the design and implementation of such a file store, CAPFS (Content Addressable Parallel File System), which allows the user to define consistency semantic policies at runtime. A client-side plug-in architecture based on user-written plug-ins leaves the choice of consistency policies to the end user. The parallelism exploited by the use of multiple data stores provides bandwidth and scalability. We provide extensive evaluations of our prototype file system on a concurrent read/write workload and a parallel tiled visualization code.
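A hypothetical sketch of what a client-side consistency plug-in interface might look like (the class and hook names are invented for illustration; CAPFS's actual plug-in API may differ): the user selects a policy per file, and the client consults its hooks on every access.

```python
class ConsistencyPolicy:
    """Hooks a user-written, client-side plug-in implements; chosen at open time."""
    def before_read(self, path):  pass
    def after_write(self, path):  pass

class SequentialConsistency(ConsistencyPolicy):
    def before_read(self, path):  print(f"lock {path} (shared)")   # stand-in for real locking
    def after_write(self, path):  print(f"unlock {path}")

class RelaxedConsistency(ConsistencyPolicy):
    pass   # no synchronization: best throughput, weakest guarantees

def client_read(path, policy):
    policy.before_read(path)
    return f"<data of {path}>"      # stand-in for the actual data-store fetch

client_read("/scratch/tile_0042", SequentialConsistency())
```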
Most desktop search systems maintain per-user indices to keep track of file contents. In a multi-user environment, this is not a viable solution, because the same file has to be indexed many times, once for every user that may access the file, causing both space and performance problems. Having a single system-wide index for all users, on the other hand, allows for efficient indexing but requires special security mechanisms to guarantee that the search results do not violate any file permissions. We present a security model for full-text file system search, based on the UNIX security model, and discuss two possible implementations of the model. We show that the first implementation, based on a postprocessing approach, allows an arbitrary user to obtain information about the content of files for which he does not have read permission. The second implementation does not share this problem. We give an experimental performance evaluation for both implementations and point out query optimization opportunities for the second one.
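The postprocessing approach criticized above can be sketched in a few lines (a toy index and filter, not the paper's implementation): the system-wide index is queried first, then hits the caller cannot read are dropped. The leak arises because statistics computed before filtering (e.g., ranking scores or result counts) can still reveal information about unreadable files; the sketch only shows the filtering step.

```python
import os

def search(index, terms, check_read=os.access):
    """Query a system-wide index, then filter out files the caller cannot read.
    `index` maps a term to the list of paths containing it (toy structure)."""
    hits = set(index.get(terms[0], []))
    for t in terms[1:]:
        hits &= set(index.get(t, []))
    # Postprocessing: keep only results the calling user has read permission for.
    return [p for p in hits if check_read(p, os.R_OK)]

toy_index = {"salary": ["/home/alice/budget.txt", "/home/bob/notes.txt"]}
print(search(toy_index, ["salary"]))
```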
Years of innovation in file systems have been highly successful in improving their performance and functionality, but at the cost of complicating their interaction with the disk. A variety of techniques exist to ensure consistency and integrity of file system data, but the precise set of correctness guarantees provided by each technique is often unclear, making them hard to compare and reason about. The absence of a formal framework has hampered detailed verification of file system correctness. We present a logical framework for modeling the interaction of a file system with the storage system, and show how to apply the logic to represent and prove correctness properties. We demonstrate that the logic provides three main benefits. First, it enables reasoning about existing file system mechanisms, allowing developers to employ aggressive performance optimizations without fear of compromising correctness. Second, the logic simplifies the introduction and adoption of new file system functionality by facilitating rigorous proof of its correctness. Finally, the logic helps reason about smart storage systems that track semantic information about the file system. A key aspect of the logic is that it enables incremental modeling, significantly reducing the barrier to entry in terms of its actual use by file system designers. In general, we believe that our framework transforms the hitherto esoteric and error-prone "art" of file system design into a readily understandable and formally verifiable process.
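To give a flavor of the ordering properties such a framework can state (this rule is an illustrative example of metadata journaling, not a formula quoted from the paper): if every journal record of a transaction reaches disk before the commit record that covers it, then recovery after a crash applies the transaction atomically,

\[ \big(\forall r \in T:\ \mathit{write}(r) \prec \mathit{write}(\mathit{commit}_T)\big) \;\Rightarrow\; T \text{ is applied atomically across a crash.} \]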
Proper data placement schemes based on erasure correcting codes are one of the most important components of a highly available data storage system. For such schemes, low decoding complexity for correcting (or recovering) storage node failures is essential for practical systems. In this paper, we describe a new coding scheme, which we call the STAR code, for correcting triple storage node failures (erasures). The STAR code is an extension of the double-erasure-correcting EVENODD code, and a modification of the generalized triple-erasure-correcting EVENODD code. The STAR code is an MDS code, and thus is optimal in terms of node failure recovery capability for a given data redundancy. We provide detailed decoding algorithms of the STAR code for correcting various triple node failures. We show that the decoding complexity of the STAR code is much lower than that of existing comparable codes; thus the STAR code is of great practical value for storage systems that need higher reliability.
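Concretely, the MDS property means that with k data nodes and 3 parity nodes, the content of any 3 failed nodes can be rebuilt from the surviving k, which is the best possible storage efficiency for triple fault tolerance (the worked number below is generic, not taken from the paper):

\[ \text{efficiency} = \frac{k}{k+3}, \qquad \text{e.g. } k = 13 \;\Rightarrow\; \frac{13}{16} \approx 81\%. \]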
Replaying traces is a time-honored method for benchmarking, stress-testing, and debugging systems, and, more recently, for forensic analysis. One benefit of replaying traces is the reproducibility of the exact set of operations that were captured during a specific workload. Existing trace capture and replay systems operate at different levels: network packets, disk device drivers, network file systems, or system calls. System call replayers miss memory-mapped operations and cannot replay I/O-intensive workloads at original speeds. Traces captured at other levels miss vital information that is available only at the file system level. We designed and implemented Replayfs, the first system for replaying file system traces at the VFS level. The VFS is the most appropriate level for replaying file system traces because all operations are reproduced in a manner that is most relevant to file-system developers. Thanks to the uniform VFS API, traces can be replayed transparently onto any existing file system, even one different from the file system originally traced, without modifying existing file systems. Replayfs's user-level compiler prepares a trace to be replayed efficiently in the kernel, where multiple kernel threads prefetch and schedule the replay of file system operations precisely and efficiently. These techniques allow us to replay I/O-intensive traces at different speeds, and even accelerate them on the same hardware on which the trace was originally captured.
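A much-simplified, user-space analogue of prefetch-and-schedule replay (the real Replayfs mechanism runs in the kernel at the VFS layer; the names and trace format below are invented): one thread stages upcoming operations while the replayer issues each operation at its scaled deadline.

```python
import threading, queue, time

def prefetcher(ops, q):
    """Stage upcoming operations so the replayer never waits on trace parsing."""
    for op in ops:
        q.put(op)
    q.put(None)

def replayer(q, speedup=1.0):
    start = time.monotonic()
    while (op := q.get()) is not None:
        ts, name = op
        delay = ts / speedup - (time.monotonic() - start)
        if delay > 0:
            time.sleep(delay)               # issue at the (scaled) recorded time
        print(f"replay {name}")             # stand-in for invoking the file system operation

ops = [(0.0, "open f"), (0.2, "mmap f"), (0.5, "write f")]
q = queue.Queue(maxsize=64)
threading.Thread(target=prefetcher, args=(ops, q)).start()
replayer(q, speedup=2.0)                    # accelerate the trace 2x
```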
No single encoding scheme or fault model is optimal for all data. A versatile storage system allows them to be matched to access patterns, reliability requirements, and cost goals on a per-data-item basis. Ursa Minor is a cluster-based storage system that allows data-specific selection of, and on-line changes to, encoding schemes and fault models. Thus, different data types can share a scalable storage infrastructure and still enjoy specialized choices, rather than suffering from "one size fits all." Experiments with Ursa Minor show performance benefits of 2-3× when using specialized choices as opposed to a single, more general, configuration. Experiments also show that a single cluster supporting multiple workloads simultaneously is much more efficient when the choices are specialized for each distribution rather than forced to use a "one size fits all" configuration. When using the specialized distributions, aggregate cluster throughput nearly doubled.
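A toy illustration of per-data-item choices (the encodings, parameters, and defaults are made up, not Ursa Minor's actual configuration): each object carries its own distribution, i.e., an encoding scheme plus a fault model, instead of one cluster-wide setting.

```python
# Hypothetical per-object "distribution": encoding scheme + tolerated failures.
distributions = {
    "scratch/tmp.dat":   {"encoding": "replication", "copies": 2, "crash_faults": 1},
    "archive/ledger.db": {"encoding": "erasure", "m_of_n": (5, 8), "byzantine_faults": 1},
}

def distribution_for(name):
    # Fall back to a conservative cluster-wide default when nothing is specialized.
    return distributions.get(name, {"encoding": "replication", "copies": 3, "crash_faults": 2})

print(distribution_for("archive/ledger.db"))
```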
Currently, the fields of impact analysis and policy-based management are two important storage management topics that are not being treated in an integrated manner. Policy-based storage management is being adopted by most storage vendors because it lets system administrators specify high-level policies and moves the complexity of enforcing these policies to the underlying management software. Similarly, proactive impact analysis is becoming an important aspect of storage management because system administrators want to assess the impact of making a change before actually making it. Impact analysis becomes an increasingly complex task when one is dealing with a large number of devices and workloads. Adding the policy dimension to impact analysis (that is, what policies are being violated due to a particular action) makes this problem even more complex. In this paper we describe a new framework and a set of optimization techniques that combine the fields of impact analysis and policy management. In this framework, system administrators define policies for performance, interoperability, security, and availability, and then proactively assess the impact of desired changes on both the system observables and the policies. Additionally, the proposed optimizations help to reduce the amount of data and the number of policies that need to be evaluated. This improves the response time of impact analysis operations. Finally, we also propose a new policy classification scheme that classifies policies based on the algorithms that can be used to optimize their evaluation. Such a classification is useful in order to efficiently evaluate user-defined policies. We present an experimental study that quantitatively analyzes the framework and algorithms on real-life storage area network policies. The algorithms presented in this paper can be leveraged by existing impact analysis and policy engine tools.
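A minimal sketch of policy-aware impact analysis (the system model, policies, and change format are all invented for illustration): a proposed change is applied to a copy of the system state, and the policies it would violate are reported before anything touches the real system.

```python
import copy

# Toy system model and policies; real policies cover performance, security, availability, etc.
system = {"fabric_utilization": 0.55, "zones": {"dbhost": {"array1"}}}

policies = {
    "utilization under 70%": lambda s: s["fabric_utilization"] < 0.70,
    "dbhost zoned to array1": lambda s: "array1" in s["zones"]["dbhost"],
}

def impact(system, change):
    """Return the policies that a proposed change would violate (proactive analysis)."""
    projected = copy.deepcopy(system)
    change(projected)
    return [name for name, ok in policies.items() if not ok(projected)]

# Proposed change: add a workload expected to push fabric utilization to 80%.
print(impact(system, lambda s: s.update(fabric_utilization=0.80)))
```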