ISBN: 0818605642 (print)
A technique is described for implementing k-resilient objects, i.e., distributed objects that remain available, and whose operations are guaranteed to progress to completion, despite up to k site failures. The implementation is derived from the object specification automatically, and does not require any information beyond what would be required for a nonresilient, nondistributed implementation. It is therefore unnecessary for an applications programmer to have knowledge of the complex protocols normally used to implement fault-tolerant objects. The technique is used in ISIS, a system being developed at Cornell to support resilient objects.
ISBN: 0818607378 (print)
A description is given of Bugnet, a portable Unix system designed to debug distributed programs. It gives the user information about interprocess communication, I/O events, and execution traces of each component process. Bugnet allows the programmer to detect an error situation, roll back to a time in the event sequence before the error, and replay events exactly as they occurred prior to the error. It reproduces real-time execution error sequences with an accuracy of 0.2 s. A graphics interface allows the user to manage process groups and monitor process interactions very conveniently.
Author: Minoura, Toshimi (Oregon State Univ, Dep of Computer Science, Corvallis, OR, USA)
ISBN: 0818605642 (print)
A typical database system maintains target data, which contain information useful for users, and access path data, which facilitate faster accesses to target data. Further, most large database systems support concurrent processing of multiple transactions. For a static database system model, where units of concurrency control are not dynamically created or deleted, various concurrency control methods are known. Also, many methods that allow concurrent accesses to indexing structures without invalidating their integrity are known. However, a straightforward integration of these two kinds of concurrency control methods fails because of the phantom problem. The author introduces group locks in order to solve this problem and discusses their implementation. It is shown that if the lowest-level access path data as well as the target data are two-phase locked by transactions, consistency of the logical data will be preserved.
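The closing claim of this abstract can be illustrated with a small sketch. It is not the paper's group-lock implementation, which the abstract does not detail; it only shows why locking the lowest-level access path entry together with the target records blocks a phantom insert. The `LockTable` class, the item names, and the non-blocking `try_lock` interface are all assumptions made for illustration.

```python
from collections import defaultdict


class LockTable:
    """Toy non-blocking lock table with shared ("S") and exclusive ("X") modes."""

    def __init__(self):
        self.holders = defaultdict(dict)  # item -> {txn: mode}

    def try_lock(self, txn, item, mode):
        """Grant the lock and return True, or return False on a conflict."""
        held = self.holders[item]
        others = [m for t, m in held.items() if t != txn]
        if mode == "X" and others:
            return False
        if mode == "S" and "X" in others:
            return False
        # Keep the stronger of the requested mode and any mode already held.
        held[txn] = "X" if "X" in (mode, held.get(txn)) else "S"
        return True

    def release_all(self, txn):
        """Shrinking phase of two-phase locking, done in one step at commit."""
        for held in self.holders.values():
            held.pop(txn, None)


locks = LockTable()
leaf = ("index_leaf", "salary 50-60")  # hypothetical lowest-level access path entry
rec = ("record", 17)                   # hypothetical target record

# T1 scans the range: it locks the index leaf as well as the record it reads.
t1_ok = locks.try_lock("T1", leaf, "S") and locks.try_lock("T1", rec, "S")

# T2 wants to insert a new record into the same range, so it must update the leaf.
t2_blocked = not locks.try_lock("T2", leaf, "X")

locks.release_all("T1")  # T1 commits and releases everything
t2_after_commit = locks.try_lock("T2", leaf, "X")
```

Because T1 holds the leaf until it commits, T2's insert cannot slip a new record into a range T1 has already scanned. If T1 locked only the records it read, nothing would stop the insert.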
ISBN: 0818607378 (print)
An environment is considered in which interprocess communication is only through messages, and modular redundancy is used at the level of processes to increase availability. It is assumed that failing processes do not malfunction, i.e., there is no Byzantine fault. It is also assumed that link failures do not occur. If such replicated processes exercise choice during communication, they have to be coordinated so that the replicas make the same choice. A centralized scheme is presented for this choice coordination problem in which a master makes the choice, which is then obeyed by others. The scheme incorporates protocols for recovering from master failures, and for reintroducing processes after repair. A correctness proof for the scheme is presented.
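A minimal sketch of master-based choice coordination follows, under the abstract's assumptions (processes fail by stopping, links do not fail). The abstract does not describe the actual recovery or reintroduction protocols, so the rules used here are assumptions: the live replica with the lowest id acts as master, and a repaired replica copies the master's decision log before rejoining. The `ReplicaGroup` class and the message names are hypothetical.

```python
class ReplicaGroup:
    """Toy model of replicas that must all make the same choices."""

    def __init__(self, ids):
        self.alive = sorted(ids)
        self.log = {r: [] for r in ids}  # replica id -> choices it has applied

    def master(self):
        # Assumed successor rule: the live replica with the lowest id is master.
        return self.alive[0]

    def choose(self, alternatives, pick=min):
        """The master picks one alternative; every live replica obeys it."""
        choice = pick(alternatives)
        for r in self.alive:
            self.log[r].append(choice)
        return choice

    def fail(self, r):
        """Crash-stop failure: the replica simply stops taking part."""
        self.alive.remove(r)

    def rejoin(self, r):
        """Assumed reintroduction rule: copy the master's log, then rejoin."""
        self.log[r] = list(self.log[self.master()])
        self.alive = sorted(self.alive + [r])


group = ReplicaGroup([1, 2, 3])
first = group.choose(["msg_from_A", "msg_from_B"])   # decided by master 1
group.fail(1)
master_after_failure = group.master()                # replica 2 takes over
second = group.choose(["msg_from_C", "msg_from_B"])  # decided by master 2
group.rejoin(1)
```

After the rejoin, all three replicas hold the same sequence of choices, which is the property the coordination scheme is meant to guarantee.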
ISBN: 0818605642 (print)
For performance evaluation of distributed systems and concurrency control mechanisms, several concurrent tree algorithms with dynamic, distributed concurrency control have been proposed that combine selective locking policies and 'built-in' recovery mechanisms in place of more centralized concurrency control. Claims have been made that such algorithms allow a high degree of reliability and an improved degree of concurrency over traditional approaches. Because concurrency control mechanisms regulate the potential interference between concurrent processes, they have a dramatic effect on these aspects of system performance. An evaluation methodology that has been used to test these claims is described. The methodology combines evaluation of a prototype implementation of a concurrent system with simulation of the system on a more ambitious scale using a general-purpose multiprocessor simulation system. Results for the particular system on the degree of concurrency, degree of interference, and frequency of recovery generally support the performance claims for the distributed concurrency-control mechanisms.
ISBN: 0818605642 (print)
Several protocols that ensure that a database can be recovered to a consistent state after a transaction failure or system crash are compared. The study includes a collection of simple analytic models, based on Markov processes, for these protocols and some surprising results on the relative performance of the protocols. The authors consider only two-stage transactions (all reads before writes) and ignore effects of serializing transactions. The most interesting performance result presented is that, for systems obeying the assumptions of this paper, the 'pessimistic' policy of holding write locks to commit point is considerably less efficient than the 'optimistic' policy that allows reading of uncommitted data but risks cascading aborts. A multiversion policy was also studied and found always to be nearly as good as the optimistic policy and sometimes much better.
Author: McKendry, Martin S. (Georgia Inst of Technology, Sch of Information & Computer Science, Atlanta, GA, USA)
ISBN: 0818605642 (print)
Examples to illustrate requirements for ordering mechanisms are introduced. A model of nested actions is then used as a basis for categorizing visibility requirements. These requirements go beyond those typical of database systems because often the entities managed by operating systems cannot be recovered if an action fails. Several simplifications that apply to many operating system problems are discussed. Algorithms for controlling ordering are then presented, with examples of their use. Several expediencies that result from ordering requirements are established. In many situations, recovery for nested actions can be implemented with a single backup copy of each item, a single synchronization variable can be used to control blocking, and generalized locking is not required. These savings appear to be fundamental to making the object-action approach viable for operating system construction.
ISBN: 9781665451321 (print)
Distributed deep reinforcement learning (DDRL) has been used in distributed systems to improve adaptability. However, DDRL-based systems are also inevitably under threat from Byzantine workers. There is an urgent need to enhance the resilience of DDRL-based systems against Byzantine failures. This paper proposes a resilient mechanism for mitigating the influence of Byzantine workers on DDRL-based systems. First, we formalize the DDRL-based system as a multi-armed bandit model to capture the collective effect of workers on the whole learning process, and then transform the resilient mechanism design problem into a sampling policy optimization problem. Second, we propose a self-adaptation process for filtering out the harmful data generated by Byzantine workers and give a theoretical analysis demonstrating its effectiveness under ideal conditions. Third, based on a typical DDRL-based system (i.e., Asynchronous Advantage Actor-Critic, A3C), we implement a resilient distributed A3C (ReD-A3C). With extensive experiments on the DDRL benchmark tasks, we show that ReD-A3C outperforms available Byzantine-tolerant approaches.
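The idea of treating workers as bandit arms and filtering out harmful updates can be sketched roughly as follows. This is not ReD-A3C: the abstract does not give the reward definition or the sampling policy, so every detail here is an assumption. Each worker is an arm, its reward is 1 when its update lies within a tolerance `tol` of the coordinate-wise median of all updates and 0 otherwise, and only workers whose running mean reward reaches a threshold are averaged. The `WorkerBandit` class and the sample numbers are hypothetical.

```python
from statistics import median


def coordinate_median(updates):
    """Coordinate-wise median of a list of equal-length update vectors."""
    return [median(col) for col in zip(*updates)]


class WorkerBandit:
    """Keeps a running mean reward per worker (one bandit arm per worker)."""

    def __init__(self, n_workers, tol=1.0):
        self.counts = [0] * n_workers
        self.means = [0.0] * n_workers
        self.tol = tol

    def observe(self, updates):
        """Reward each worker by how close its update is to the median."""
        ref = coordinate_median(updates)
        for w, u in enumerate(updates):
            dist = max(abs(a - b) for a, b in zip(u, ref))
            reward = 1.0 if dist <= self.tol else 0.0
            self.counts[w] += 1
            self.means[w] += (reward - self.means[w]) / self.counts[w]

    def trusted(self, threshold=0.5):
        return [w for w, m in enumerate(self.means) if m >= threshold]

    def aggregate(self, updates, threshold=0.5):
        """Average the updates of trusted workers; fall back to the median."""
        keep = self.trusted(threshold)
        if not keep:
            return coordinate_median(updates)
        cols = zip(*(updates[w] for w in keep))
        return [sum(c) / len(keep) for c in cols]


# Four honest workers and one worker (index 4) sending a wildly wrong update.
updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.0], [-100.0, 100.0]]
bandit = WorkerBandit(n_workers=5)
for _ in range(3):
    bandit.observe(updates)
trusted_workers = bandit.trusted()
combined = bandit.aggregate(updates)
```

In this toy run the outlying worker never earns a reward, so it is left out of the average. A real system would also need exploration, so that a worker that recovers can regain trust.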
ISBN: 0818607378 (print)
One way for processes residing on different machines to identify each other and establish communication is through a special connection service. A connection service for Berkeley Unix has been implemented that is reliable, available, secure, and easy to use. The connection service achieves ease of use through a simple interface based on the library routine meet. Meet allows one process to connect to another by specifying arbitrary names for itself and the other process. The connection service imposes no naming conventions of its own so it can be used with most name spaces and naming services. The service is location-transparent. It also provides a routine for posting services. Reliable and available service is provided by replicating connection servers.
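The rendezvous behind meet can be sketched as a small matching table. The abstract states only that meet takes a name for the caller and a name for the other process. The `ConnectionServer` class, the extra address argument, and the non-blocking return value are assumptions made for illustration, and the sketch models neither the replication of connection servers nor the security features.

```python
class ConnectionServer:
    """Toy in-memory rendezvous table keyed by (caller name, peer name)."""

    def __init__(self):
        self.waiting = {}  # (caller, peer) -> caller's address

    def meet(self, my_name, other_name, my_addr):
        """Return the peer's address if it is already waiting, else register.

        Names are arbitrary strings; the server imposes no naming convention.
        """
        key = (other_name, my_name)
        if key in self.waiting:
            return self.waiting.pop(key)
        self.waiting[(my_name, other_name)] = my_addr
        return None  # a real service would presumably block or retry here


server = ConnectionServer()
first_call = server.meet("client", "printer", ("hostA", 5000))
second_call = server.meet("printer", "client", ("hostB", 6000))
```

The first caller registers and waits. The second caller receives the first caller's address and can then open a connection to it, without either side knowing in advance where the other runs.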
ISBN: 0818606908 (print)
A controversial point in designing a distributed system is whether the user or the system should be responsible for taking actions as a consequence of system failures. The author proposes a dynamic configuration scheme for runtime reconfiguration of application software that is more flexible than existing proposals. A description is given of a reconfigurable scheme, implemented by the operating system, that requires only that software components be virtually connected. The authors demonstrate that the scheme increases the reliability and availability of distributed systems, and compare and contrast this scheme with other similar proposals.