ISBN: (Print) 9781931971331
FaSST is an RDMA-based system that provides distributed in-memory transactions with serializability and durability. Existing RDMA-based transaction processing systems use one-sided RDMA primitives for their ability to bypass the remote CPU. This design choice brings several drawbacks. First, the limited flexibility of one-sided RDMA reduces performance and increases software complexity when designing distributed data stores. Second, deep-rooted technical limitations of RDMA hardware limit scalability in large clusters. FaSST eschews one-sided RDMA for fast RPCs using two-sided unreliable datagrams, which we show drop packets extremely rarely on modern RDMA networks. This approach provides better performance, scalability, and simplicity, without requiring expensive reliability mechanisms in software. In comparison with published numbers, FaSST outperforms FaRM on the TATP benchmark by almost 2x while using close to half the hardware resources, and it outperforms DrTM+R on the SmallBank benchmark by around 1.7x without making data locality assumptions.
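The core idea, a two-sided request/reply over an unreliable transport with a cheap retry fallback in place of heavyweight reliability machinery, can be caricatured in plain Python. This is an illustrative toy under stated assumptions, not FaSST's RDMA implementation; all names here are hypothetical:

```python
import random

class LossyChannel:
    # Unreliable datagram transport: drops messages with a small
    # probability, mimicking the rare losses observed on modern
    # RDMA networks.
    def __init__(self, handler, loss_rate=0.01, seed=7):
        self.handler = handler
        self.rng = random.Random(seed)
        self.loss_rate = loss_rate

    def send(self, payload):
        if self.rng.random() < self.loss_rate:
            return None               # request or reply dropped
        return self.handler(payload)  # two-sided: remote CPU runs the handler

def rpc_call(channel, payload, retries=3):
    # Because losses are rare, a bare retry loop replaces the
    # expensive software reliability mechanisms that one-sided
    # designs would otherwise need.
    for _ in range(retries):
        reply = channel.send(payload)
        if reply is not None:
            return reply
    raise TimeoutError("RPC failed after retries")

store = {"k1": "v1"}
chan = LossyChannel(lambda key: store.get(key, "MISS"))
reply = rpc_call(chan, "k1")
```

The retry loop is the whole point: when loss is rare, the expected cost of occasional retransmission is far lower than maintaining reliable-connection state per peer.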
The new challenging era of scientific data management in the coming decade, named "Big Data", requires giant complexes for distributed computing and corresponding grid-cloud internet services. Known common approaches to software reliability based on probability theory, or on considering software as an open non-equilibrium dynamic system, cannot conform to advanced grid-cloud software management systems. Therefore, to provide the optimality and reliability of such sophisticated systems, we choose an imitative simulation method oriented on knowledge of the dynamics of the system's functioning. A new grid and cloud service simulation system was developed in the JINR Dubna Laboratory of Information Technologies, focused on improving the efficiency and reliability of grid-cloud systems development by using work quality indicators of a real system to design and predict its evolution. For these purposes, the simulation program is combined with the real monitoring system of the grid-cloud service through a special database. Some examples of the program's applications to simulate a sufficiently general cloud structure, which can be used for more common purposes, are given.
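As a minimal illustration of imitative (discrete-event) simulation of a service's dynamics, one can model a single cloud service node as a queue and measure waiting times. This is not the JINR system; the parameters and names are hypothetical:

```python
import random

def simulate(n_jobs, arrival_rate, service_rate, seed=1):
    # Minimal discrete-event simulation of one cloud service node
    # (M/M/1-style queue): jobs arrive at exponential intervals,
    # wait if the server is busy, then get served.
    # Returns the mean job waiting time.
    rng = random.Random(seed)
    t = 0.0                 # current simulated time
    server_free_at = 0.0    # time at which the server becomes idle
    total_wait = 0.0
    for _ in range(n_jobs):
        t += rng.expovariate(arrival_rate)     # next arrival
        start = max(t, server_free_at)         # wait if server busy
        total_wait += start - t
        server_free_at = start + rng.expovariate(service_rate)
    return total_wait / n_jobs

mean_wait = simulate(10000, arrival_rate=0.8, service_rate=1.0)
```

Feeding such a model with quality indicators observed by a real monitoring system is the general pattern the abstract describes.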
ISBN: (Print) 9781450326278
Database management systems (DBMS) are used by software applications to store, manipulate, and retrieve large sets of data. However, the requirements of current software systems pose various challenges to established DBMS. First, most software systems organize their data by means of objects rather than relations, leading to increased maintenance, redundancy, and transformation overhead when persisting objects to relational databases. Second, complex objects are separated into several objects, resulting in object schizophrenia and hard-to-persist distributed state. Last but not least, current software systems have to cope with increased complexity and change. These challenges have led to a general paradigm shift in the development of software systems. Unfortunately, classical DBMS will become intractable if they are not adapted to the new requirements imposed by these software systems. As a result, we propose an extension of DBMS with roles to represent complex objects within a relational database and support the flexibility required by current software systems. To achieve this goal, we introduce RSQL, an extension to SQL with the concept of objects playing roles when interacting with other objects. Additionally, we present a formal model for the logical representation of roles in the extended DBMS.
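The "objects playing roles" concept can be sketched on the application side in Python. RSQL itself is a SQL extension, so this is only an analogy for the role pattern it models; the class and attribute names are hypothetical:

```python
class Core:
    """Core object identity. Roles attach context-dependent state
    and behavior without splitting the object into separate
    persisted parts (avoiding object schizophrenia)."""
    def __init__(self, name):
        self.name = name
        self._roles = {}

    def play(self, role_cls, **kwargs):
        # Bind a role instance to this core; the core keeps one identity.
        self._roles[role_cls.__name__] = role_cls(self, **kwargs)
        return self

    def role(self, role_name):
        return self._roles[role_name]

class Employee:
    # A role: state that only exists while the core plays it.
    def __init__(self, core, salary):
        self.core = core
        self.salary = salary

    def describe(self):
        return f"{self.core.name} earns {self.salary}"

p = Core("Alice").play(Employee, salary=50000)
desc = p.role("Employee").describe()
```

In the relational setting, the same separation would let the DBMS store the core tuple once and the role tuples alongside it, rather than flattening everything into one rigid relation.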
ISBN: (Print) 9781450330565
Non-determinism in concurrent or distributed software systems (i.e., various possible execution orders among different distributed components) presents new challenges to the existing reliability analysis methods based on Markov chains. In this work, we present RaPiD, a toolkit for the reliability analysis of non-deterministic systems. Taking the Markov decision process as its reliability model, RaPiD can help in the analysis of three fundamental and rewarding aspects of software reliability. First, to provide reliability assurance for a system, RaPiD can synthesize the overall system reliability given the reliability values of the system's components. Second, given a requirement on the overall system reliability, RaPiD can distribute the reliability requirement to each component. Lastly, RaPiD can identify the component that affects the system reliability most significantly. RaPiD has been applied to analyze several real-world systems, including a financial stock trading system, a proton therapy control system, and an ambient assisted living room system. The toolkit is available at http://***/cfp/demos
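For intuition only: RaPiD works on Markov decision processes, but in the simplest deterministic case of independent components in series, the first and third analysis tasks reduce to a product of component reliabilities and its sensitivity. All names and values below are hypothetical:

```python
def system_reliability(component_rels):
    # Series composition: the system works only if every component
    # works, so overall reliability is the product (independence assumed).
    r = 1.0
    for rel in component_rels.values():
        r *= rel
    return r

def most_critical(component_rels):
    # In a series system the partial derivative of the product with
    # respect to a component's reliability is largest for the least
    # reliable component, so improving it helps the most.
    return min(component_rels, key=component_rels.get)

rels = {"trading": 0.99, "matching": 0.95, "audit": 0.999}
overall = system_reliability(rels)
critical = most_critical(rels)
```

The MDP setting generalizes this by letting non-deterministic scheduling choose among transition distributions, so the toolkit reports bounds over all schedulers rather than a single product.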
ISBN: (Print) 9781479955848
Software systems running continuously for a long time often confront software aging, the phenomenon of progressive degradation of the execution environment caused by latent software faults. Removal of such faults in the software development process is a crucial issue for system reliability. A known major obstacle is typically the large latency to discover the existence of software aging. We propose a systematic approach to detect software aging which has shorter test time and higher accuracy compared to traditional aging detection via stress testing and trend detection. The approach is based on a differential analysis in which a software version under test is compared against a previous version in terms of behavioral changes in resource metrics. A key instrument adopted is a divergence chart, which expresses time-dependent differences between two signals. Our experimental study focuses on memory-leak detection and evaluates divergence charts computed using multiple statistical techniques paired with application-level memory-related metrics (RSS and heap usage). The results show that the proposed method achieves good performance for memory-leak detection in comparison to techniques widely adopted in previous works (e.g., linear regression, moving average, and median).
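A minimal sketch of a divergence chart, assuming simple moving-average smoothing of a memory metric from two versions. The paper evaluates several statistical techniques; the series below are synthetic and purely illustrative:

```python
def moving_average(xs, w):
    # Trailing moving average with window w (shorter at the start).
    return [sum(xs[max(0, i - w + 1): i + 1]) / (i - max(0, i - w + 1) + 1)
            for i in range(len(xs))]

def divergence_chart(old, new, w=3):
    # Time-dependent difference between the smoothed resource
    # signals of two software versions. A sustained, growing
    # positive divergence in a memory metric suggests a leak
    # introduced in the new version.
    a, b = moving_average(old, w), moving_average(new, w)
    return [nb - na for na, nb in zip(a, b)]

old_rss = [100, 101, 100, 102, 101, 100, 101, 100]   # stable baseline
new_rss = [100, 102, 104, 107, 109, 112, 114, 117]   # steady growth
div = divergence_chart(old_rss, new_rss)
leak_suspected = div[-1] > 5 and all(d2 >= d1 for d1, d2 in zip(div, div[1:]))
```

Comparing against a previous version rather than an absolute threshold is what shortens test time: the baseline already encodes the workload's normal memory behavior.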
Today, many distributed systems are deployed in high-performance computing environments such as multi-core architectures or managed networks like data centers. As the new computing architectures require more parallelism to improve performance and responsiveness, implementing distributed applications that work consistently in parallel architectures without causing any deadlock or data-race issues has become a challenging task. Moreover, data center applications must handle fault tolerance as well, because random or correlated crash-restart failures can happen in data centers. Many approaches have been proposed independently to make data center applications concurrent, fault-tolerant, or both. Popular applications like graph computing systems or non-relational database systems have their own mechanisms to handle concurrency and failures. There are even more generic frameworks that provide both parallelism and fault tolerance in data computing frameworks, message-passing interfaces, and software transactional memory systems. However, making a data center application work in these generic frameworks may require major restructuring or learning a new paradigm. In this dissertation, we present one solution that provides parallelism, another that provides fault tolerance, and a third that provides both, in an event-driven system framework, transparently. First, we present InContext, a concurrent event execution model that runs events in parallel by associating access behaviors with the shared variables. Second, we present Ken, an uncoordinated rollback recovery protocol for event-driven systems that can mask crash-restart failures and guarantee composable reliability. We also present MaceKen, integrated with the Mace framework, which transparently provides crash-restart fault tolerance for legacy Mace applications. Finally, we propose MultiKen, a combined framework for parallelism and fault tolerance in event-driven systems.
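The flavor of crash-restart masking in an event-driven system can be caricatured with a write-ahead event log that is replayed after restart. This toy (hypothetical names, not the actual Ken protocol, which also handles message output and checkpoint truncation) shows why deterministic event handlers make recovery transparent:

```python
class EventLoop:
    # Toy event-driven node: every delivered event is appended to a
    # durable log before its effects are applied, so after a
    # crash-restart the log can be replayed to deterministically
    # rebuild the exact pre-crash state.
    def __init__(self, log=None):
        self.state = {}
        self.log = log if log is not None else []
        for ev in self.log:          # replay path after restart
            self._apply(ev)

    def deliver(self, event):
        self.log.append(event)       # log first (write-ahead)
        self._apply(event)           # then apply

    def _apply(self, event):
        # Handlers must be deterministic functions of (state, event)
        # for replay to reproduce the same state.
        key, delta = event
        self.state[key] = self.state.get(key, 0) + delta

node = EventLoop()
node.deliver(("counter", 1))
node.deliver(("counter", 2))
saved_log = list(node.log)           # what survived the "crash"
restarted = EventLoop(log=saved_log) # simulate crash-restart recovery
```

Because recovery is local and needs no coordination with other nodes, reliability composes: each node masks its own failures independently.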
ISBN: (Print) 9781479955848
We consider the problem of reliably broadcasting messages in a multi-hop network where nodes can fail in some unforeseen manner. We consider the most general failure model: the Byzantine model, where failing nodes may exhibit arbitrary behavior and actively try to harm the network. Previous approaches dealing with permanent Byzantine failures limit either the number of Byzantine nodes or their density. In dense networks, the density criterion is the allowed fraction of Byzantine neighbors per correct node. In sparse networks, density has been defined as the distance between Byzantine nodes. In this context, we first propose a new algorithm for networks whose communication graph can be decomposed into cycles: e.g., a torus can be decomposed into square cycles, a planar graph into polygonal cycles, etc. Our algorithm ensures reliable broadcast when the distance between permanent Byzantine failures is greater than twice the diameter of the largest cycle of the decomposition. Then, we refine the first protocol to make it Byzantine fault tolerant for transient faults (in addition to permanent Byzantine faults). This additional property is guaranteed by means of self-stabilization, which permits recovery from any arbitrary initial state. This arbitrary initial state can be seen as the result of every node being Byzantine faulty for a short period of time (hence the "transient" qualification). This second protocol thus tolerates permanent (constrained by density) and transient (unconstrained) Byzantine failures. When the maximum degree and cycle diameter are both bounded, both solutions perform in a time that remains proportional to the network diameter.
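The density condition, every pair of permanent Byzantine nodes farther apart than twice the largest cycle diameter, can be checked mechanically. The sketch below assumes hop distance in the communication graph and a given bound on cycle diameter; the graph and fault sets are hypothetical, and this checks only the precondition, not the broadcast protocol itself:

```python
from collections import deque

def bfs_distance(adj, src, dst):
    # Hop distance in an unweighted graph via breadth-first search.
    seen, q = {src}, deque([(src, 0)])
    while q:
        node, d = q.popleft()
        if node == dst:
            return d
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                q.append((nxt, d + 1))
    return float("inf")

def condition_holds(adj, byzantine, max_cycle_diameter):
    # Sufficient condition from the abstract: every pair of
    # Byzantine nodes must be more than twice the diameter of
    # the largest cycle of the decomposition apart.
    nodes = sorted(byzantine)
    return all(bfs_distance(adj, a, b) > 2 * max_cycle_diameter
               for i, a in enumerate(nodes) for b in nodes[i + 1:])

# 12-node ring; suppose the decomposition's largest cycle has diameter 2.
ring = {i: [(i - 1) % 12, (i + 1) % 12] for i in range(12)}
ok = condition_holds(ring, {0, 6}, max_cycle_diameter=2)   # distance 6 > 4
bad = condition_holds(ring, {0, 3}, max_cycle_diameter=2)  # distance 3 <= 4
```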
ISBN: (Print) 9781450319225
Full-system emulation has been an extremely useful tool in developing and debugging systems software like operating systems and hypervisors. However, current full-system emulators lack support for deterministic replay, which limits the reproducibility of concurrency bugs that is indispensable for analyzing and debugging essentially multi-threaded systems software. This paper analyzes the challenges in supporting deterministic replay in parallel full-system emulators and makes a comprehensive study of the sources of non-determinism. Unlike application-level replay systems, our system, called ReEmu, needs to log sources of non-determinism in both the guest software stack and the dynamic binary translator for faithful replay. To provide scalable and efficient record and replay on multicore machines, ReEmu makes several notable refinements to the CREW protocol that replays shared-memory systems. First, being aware of the performance bottlenecks in frequent lock operations in the CREW protocol, ReEmu refines the protocol with a seqlock-like design to avoid serious contention and possible starvation in instrumentation code tracking dependences among racy accesses to a shared memory object. Second, to minimize the required log files, ReEmu only logs minimal local information regarding accesses to a shared memory location, and instead relies on an offline log processing tool to derive precise shared-memory dependences for faithful replay. Third, ReEmu adopts an automatic lock clustering mechanism that clusters a set of uncontended memory objects into a bulk to reduce the frequency of lock operations, which noticeably boosts performance. Our prototype ReEmu is based on our open-source COREMU system and supports scalable and efficient record and replay of full-system environments (both x64 and ARM). Performance evaluation shows that ReEmu has very good performance scalability on an Intel multicore machine. It incurs only 68.9% performance overhead on average (ranging
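A seqlock-style reader/writer protocol, the flavor of refinement the abstract describes, can be sketched as follows. This is illustrative Python under the assumption of a single coordinated writer; real seqlocks depend on memory fences and atomic counters that Python does not expose, so this is a sketch of the protocol's logic, not ReEmu's implementation:

```python
import threading

class SeqLock:
    # Seqlock-style protocol: the writer bumps a sequence counter to
    # an odd value before writing and back to even after. Readers
    # take no lock at all; they retry whenever they observe an odd
    # counter or a counter change, so frequent readers never block
    # the writer (avoiding the contention of classic CREW locking).
    def __init__(self):
        self.seq = 0
        self.value = None
        self._wlock = threading.Lock()  # serialize writers only

    def write(self, value):
        with self._wlock:
            self.seq += 1      # odd: write in progress
            self.value = value
            self.seq += 1      # even: write complete

    def read(self):
        while True:
            start = self.seq
            if start % 2:      # writer active; retry
                continue
            value = self.value
            if self.seq == start:
                return value   # no writer intervened; snapshot valid

lock = SeqLock()
lock.write(("pc", 0x4000))
snapshot = lock.read()
```

The starvation the abstract mentions is the dual risk: a reader can loop if writes are continuous, which is why the design suits read-mostly tracking of racy accesses.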
ISBN: (Print) 9780769547848; 9781467323970
Web applications are increasingly used as portals to interact with back-end database systems and support business processes. This type of data-centric, workflow-driven web application is vulnerable to two types of security threats. The first is a request integrity attack, which stems from vulnerabilities in the implementation of business logic within web applications. The second is a guideline violation, which stems from privilege misuse in scenarios where business logic and policies are too complex to be accurately defined and enforced. Both threats can lead to sequences of web requests that deviate from typical user behaviors. The objective of this paper is to detect anomalous user behaviors based on the sequence of their requests within a web session. We first decompose web sessions into workflows based on their data objects. In doing so, the detection of anomalous sessions is reduced to the detection of anomalous workflows. Next, we apply a hidden Markov model (HMM) to characterize workflows on a per-object basis. In this model, the implicit business logic involved in the object defines the unobserved states of the Markov process, and the web requests are the observations. To derive more robust HMMs, we extend the object-specific approach to an object-cluster approach, where objects with similar workflows are clustered and HMM models are derived on a per-cluster basis. We evaluate our models using two real systems, including an open-source web application and a large web-based electronic medical record system. The results show that our approach can detect anomalous web sessions, and lend evidence to suggest that the clustering approach can achieve relatively low false positive rates while maintaining its detection accuracy.
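A per-object HMM can score a request sequence with the standard forward algorithm, flagging low-likelihood sessions as anomalous. Below is a toy two-state model; the workflow states, request vocabulary, and probabilities are all hypothetical stand-ins for parameters that would be learned from traces:

```python
import math

def forward_log_likelihood(obs, states, start_p, trans_p, emit_p):
    # Forward algorithm: log-likelihood of an observed request
    # sequence under an HMM whose hidden states model the data
    # object's business-logic workflow.
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: sum(alpha[p] * trans_p[p][s] for p in states) * emit_p[s][o]
                 for s in states}
    return math.log(sum(alpha.values()))

states = ["browse", "edit"]
start_p = {"browse": 0.8, "edit": 0.2}
trans_p = {"browse": {"browse": 0.7, "edit": 0.3},
           "edit":   {"browse": 0.4, "edit": 0.6}}
emit_p = {"browse": {"GET": 0.9, "POST": 0.1},
          "edit":   {"GET": 0.3, "POST": 0.7}}

normal = forward_log_likelihood(["GET", "GET", "POST"],
                                states, start_p, trans_p, emit_p)
odd = forward_log_likelihood(["POST", "POST", "POST"],
                             states, start_p, trans_p, emit_p)
anomalous = odd < normal   # lower likelihood -> more anomalous
```

In practice, the detector would threshold the (length-normalized) log-likelihood rather than compare two sequences directly; long sequences also need log-space scaling to avoid underflow.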
ISBN: (Print) 9781622761388
The current control technology of modern gas turbine engines is based on the centralized control architecture of Full Authority Digital Electronic Controls (FADEC). The concern with such a centralized control strategy is the high ratio of control system weight to engine weight, and the engine's life cycle cost. Alternatively, a distributed control architecture could be adopted as the control choice for future turbine engines to reduce overall system weight and cost, and to increase safety and reliability. This paper provides an attempt at solving the challenge of transforming current turbine engine control to a distributed architecture. The paper also presents a methodology used for the development and integration of networked control systems with diagnostics software in a distributed manner. We present two approaches for the design of decentralized controllers, and then present a realization of the designed controllers in the form of a distributed networked control system. The first approach considered in the design is H∞ robust controller design. In this approach, specifications such as control-loop bandwidths are imposed by selecting appropriate low-pass and high-pass filters for the controller design. The filters are represented by the sensitivity transfer matrices for disturbance rejection and the complementary sensitivity transfer matrices for minimizing the control effort for the fuel flow, nozzle area, and rear bypass door control loops. The second approach considered in the design is H2 optimal controller design. In this approach, the Linear Quadratic Gaussian (LQG) design concept is followed, where specifications are imposed by selecting appropriate parameters so that the closed-loop system has the desired response shapes. Finally, the distributed controllers are implemented in the form of a networked control system (NCS) where a shared communication medium is used as the backbone of the control system. The fundamental issues in net
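As a worked miniature of the H2/LQG flavor of design, here is a discrete-time scalar LQR gain computed by iterating the Riccati recursion to a fixed point. This assumes full-state feedback on a scalar plant, far simpler than the paper's multivariable loops, and all numbers are hypothetical:

```python
def lqr_gain(a, b, q, r, iters=200):
    # Discrete-time scalar LQR (the state-feedback half of an LQG
    # design): iterate the Riccati recursion
    #   P <- Q + A'PA - A'PB (R + B'PB)^-1 B'PA
    # to a fixed point, then form the optimal gain
    #   K = (R + B'PB)^-1 B'PA.
    p = q
    for _ in range(iters):
        k = (b * p * a) / (r + b * p * b)
        p = q + a * p * a - a * p * b * k
    return (b * p * a) / (r + b * p * b)

# Toy fuel-flow loop: unstable open-loop plant x' = 1.1 x + 0.5 u.
k = lqr_gain(a=1.1, b=0.5, q=1.0, r=0.1)
closed_loop = 1.1 - 0.5 * k   # stable iff |closed_loop| < 1
```

Shaping the response then amounts to choosing the weights q (state deviation) and r (control effort), which is the "selecting appropriate parameters" step the abstract refers to.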