Work on scheduling concurrent transactions in real-time databases must address two issues: (i) synchronizing the corresponding tasks' accesses to shared data items, and (ii) guaranteeing the timing requirements of the transactions. In this paper we first present a concurrency control protocol for real-time databases in a uniprocessor system. The protocol treats the system characteristics as dynamic, in contrast to the priority ceiling protocol and most work in scheduling theory, where the system workload is assumed to be static and predetermined. Priorities are assigned to transactions dynamically using the well-known Earliest Deadline First strategy. The protocol is proven to avoid deadlocks, and the blocking duration arising from mutual exclusion on shared resources is bounded under it. A schedulability analysis for dynamically arriving transactions is provided. We then extend the protocol to distributed databases in a shared-memory multiprocessor system and show that the extended protocol retains the properties of the uniprocessor protocol.
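To make the priority-assignment strategy concrete, here is a minimal Python sketch of Earliest Deadline First dispatching for transactions; the class and method names are illustrative assumptions, not the authors' implementation, and the real protocol adds lock management on top of this ordering.

    import heapq

    class Transaction:
        def __init__(self, tid, deadline):
            self.tid = tid            # transaction identifier
            self.deadline = deadline  # absolute deadline; earlier means higher priority

    class EDFScheduler:
        """Dispatch the ready transaction with the earliest deadline first."""

        def __init__(self):
            self._ready = []  # min-heap keyed by (deadline, tid)

        def submit(self, txn):
            heapq.heappush(self._ready, (txn.deadline, txn.tid, txn))

        def next_transaction(self):
            return heapq.heappop(self._ready)[2] if self._ready else None

    sched = EDFScheduler()
    sched.submit(Transaction("T1", deadline=100))
    sched.submit(Transaction("T2", deadline=50))
    assert sched.next_transaction().tid == "T2"  # the earlier deadline runs first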
In locking-based concurrency control algorithms for distributed databases, two basic strategies are usually used to control the allocation and deallocation of locks to transactions: centralized locking (CL) and distributed locking (DL). Recently, there has been some debate on which strategy is better. Although previous work has shown that in a failure-free environment, the CL algorithm performs better than the DL version, there are still some doubts about the performance of the CL algorithm in an environment where failures could occur. Thus, in this paper, the performance of a resilient CL algorithm is compared with that of a resilient DL algorithm in an environment where site failures could occur. The results show that the resilient CL algorithm still outperforms the resilient DL algorithm in terms of mean response time of transactions, resource utilization and communication cost. In addition, it is shown that the reliability of the resilient CL algorithm is improved considerably and is comparable to that of the resilient distributed version when an election protocol is used to rapidly elect a new central site whenever the central site fails.
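The contrast between the two strategies can be sketched in a few lines: under centralized locking every site sends lock requests to a single lock manager, and resilience requires electing a replacement when that site fails. The Python below is a toy illustration under those assumptions, not the paper's resilient algorithms.

    class CentralLockManager:
        """Single site that grants or denies all locks in the system (CL)."""

        def __init__(self):
            self.locks = {}  # data item -> transaction currently holding it

        def acquire(self, txn, item):
            if item in self.locks and self.locks[item] != txn:
                return False  # blocked: another transaction holds the lock
            self.locks[item] = txn
            return True

        def release(self, txn, item):
            if self.locks.get(item) == txn:
                del self.locks[item]

    def elect_new_central_site(live_site_ids):
        # One simple election rule: the highest-numbered live site takes over.
        return max(live_site_ids)

    manager = CentralLockManager()
    assert manager.acquire("T1", "x")
    assert not manager.acquire("T2", "x")          # T2 must wait for T1
    assert elect_new_central_site({1, 2, 4}) == 4  # after central site 3 fails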
The problem of updating materialized views in distributed database systems is discussed. An architecture and detailed procedures are presented for updating a collection of remote views with arbitrary refresh times by using a single differential file. The efficiency of the update procedure is enhanced by adopting a multiple-query optimization approach and by introducing a powerful prescreening procedure to eliminate differential tuples. It is shown that even for a single remote view there are many instances where the presented update procedure performs better, with respect to total I/O and communication costs, than existing methods.
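A minimal sketch of the refresh step may help: differential tuples are first prescreened against the view's predicate, and only the survivors are applied to the materialized rows. The function names and the insert/delete encoding are assumptions for illustration.

    def refresh_view(view_rows, differential, view_predicate):
        """Apply a differential file to a materialized view's rows."""
        # Prescreening: discard differential tuples that cannot affect the view.
        relevant = [d for d in differential if view_predicate(d["row"])]
        for d in relevant:
            if d["op"] == "insert":
                view_rows.append(d["row"])
            elif d["op"] == "delete" and d["row"] in view_rows:
                view_rows.remove(d["row"])
        return view_rows

    view = [{"dept": "sales", "id": 1}]
    diff = [
        {"op": "insert", "row": {"dept": "sales", "id": 2}},
        {"op": "insert", "row": {"dept": "hr", "id": 3}},  # prescreened out
    ]
    refresh_view(view, diff, lambda r: r["dept"] == "sales")
    assert len(view) == 2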
ISBN (print): 9781479921393
Analysing large amounts of biomedical data is the new challenge of the post-genomic era. One of the goals in gene research is to compute the similarity between diseases based on the genes they are related to. Identifying biomedical relationships between diseases can lead to the discovery of new drugs and treatments. The human disease network (Diseasome) illustrates the associations between diseases based on the genes those diseases share. A disadvantage of this network is the data itself, as Diseasome is built on a single database (OMIM). A large number of other biomedical databases exist, however, and integrating them in order to profit from all their data is an impossible task. We therefore propose a different approach: to integrate only the knowledge held in these databases. In our approach, we extend Diseasome with knowledge from other distributed databases, without needing to integrate the data itself. To compute the similarity between diseases we apply data mining techniques.
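One plausible way to score disease similarity over shared genes is the Jaccard index over each disease's gene set; the abstract does not name its exact measure, so the sketch below is an assumption for illustration only.

    def gene_similarity(genes_a, genes_b):
        """Jaccard similarity between two diseases' gene sets."""
        a, b = set(genes_a), set(genes_b)
        if not (a or b):
            return 0.0
        return len(a & b) / len(a | b)

    # Toy example: two diseases sharing one of three known genes.
    assert gene_similarity({"BRCA1", "TP53"}, {"TP53", "EGFR"}) == 1 / 3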
ISBN (digital): 9781728177281
ISBN (print): 9781728177298
As distributed databases expand in popularity, there is ever-growing research into new database architectures that are designed from the start with built-in self-tuning and self-healing features. In real-world deployments, however, migration to these entirely new systems is impractical, and the challenge is to keep massive fleets of existing databases available under constant software and hardware change. Apache Cassandra is one such existing database: it helped to popularize "scale-out" distributed databases, and it runs some of the largest deployments of any open-source distributed database. In this paper, we demonstrate the techniques needed to transform the typical, highly manual Apache Cassandra deployment into a self-healing system. We start by composing specialized agents to surface the signals needed for a self-healing deployment and to execute local actions. We then show how to combine the agents' signals into the cluster-level control planes required to safely iterate and evolve existing deployments without compromising database availability. Finally, we show how to create simulated models of the database's behavior, allowing rapid iteration with minimal risk. With these systems in place, it is possible to create a truly self-healing database system within existing large-scale Apache Cassandra deployments.
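The agent/control-plane split can be illustrated with a toy control loop: node-local agents surface health signals, and a cluster-level planner turns them into a bounded set of safe actions. Names, thresholds, and actions below are assumptions, not Apache Cassandra APIs.

    from dataclasses import dataclass

    @dataclass
    class NodeSignal:
        node: str
        heartbeat_ok: bool
        disk_usage: float  # fraction of disk in use, 0.0-1.0

    def plan_remediation(signals, max_concurrent_actions=1):
        """Combine per-node signals into cluster-level remediation actions."""
        actions = []
        for s in signals:
            if len(actions) >= max_concurrent_actions:
                break  # bound concurrency so the cluster never loses quorum
            if not s.heartbeat_ok:
                actions.append((s.node, "restart"))
            elif s.disk_usage > 0.9:
                actions.append((s.node, "compact"))
        return actions

    signals = [NodeSignal("c1", True, 0.95), NodeSignal("c2", False, 0.4)]
    assert plan_remediation(signals) == [("c1", "compact")]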
The problem of heterogeneous distributed databases is discussed, and a uniform logical interface to the end user over a collection of different environments is proposed. The logical interface allows a relational view of any database model in a distributed system. The general structure of the basic elements of the model supporting the proposed interface is given. Fault-tolerance aspects of the system are discussed, and error-detection techniques and recovery protocols are described.
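The wrapper idea behind such a uniform interface can be sketched as one adapter per native model, each answering the same relational request; the class and method names here are illustrative assumptions, not the paper's design.

    class RelationalWrapper:
        """Common relational interface implemented by per-source adapters."""

        def select(self, predicate):
            raise NotImplementedError

    class KeyValueWrapper(RelationalWrapper):
        """Presents a key/value store as rows of a single relation."""

        def __init__(self, store):
            self.store = store  # dict: key -> record dict

        def select(self, predicate):
            return [rec for rec in self.store.values() if predicate(rec)]

    kv = KeyValueWrapper({"k1": {"name": "ann", "age": 30},
                          "k2": {"name": "bob", "age": 25}})
    assert kv.select(lambda r: r["age"] > 26) == [{"name": "ann", "age": 30}]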
ISBN (print): 9781424444816
The balance between privacy and utility is a classical problem with an increasing impact on the design of modern information systems. On the one side, it is crucial to ensure that sensitive information is properly protected; on the other side, the impact of protection on the workload must be limited, as query efficiency and system performance remain a primary requirement. We address this privacy/efficiency balance with an approach that, starting from a flexible definition of confidentiality constraints on a relational schema, applies encryption to information parsimoniously and relies mostly on fragmentation to protect sensitive associations among attributes. Fragmentation is guided by workload considerations so as to minimize the cost of executing queries over fragments. We discuss the minimization problem that arises when fragmenting data and provide a heuristic approach to its solution.
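The core safety condition on a fragmentation can be stated compactly: no fragment may expose, in the clear, all attributes of any confidentiality constraint (a singleton constraint therefore forces encryption of that attribute). A minimal check under that assumption:

    def fragmentation_is_safe(fragments, constraints):
        """True iff no fragment contains every attribute of some constraint."""
        return not any(set(c) <= set(f) for c in constraints for f in fragments)

    constraints = [{"ssn"}, {"name", "illness"}]
    ok = [{"name", "dob"}, {"illness", "zip"}]    # sensitive association split
    bad = [{"name", "illness"}, {"dob", "zip"}]   # association exposed in clear
    assert fragmentation_is_safe(ok, constraints)
    assert not fragmentation_is_safe(bad, constraints)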
Association rule mining, a data mining technique, finds interesting association or correlation relationships among a large set of data items. Current association rule mining tasks can be accomplished successfully only in a distributed setting, which requires integrating the knowledge generated at multiple data sites. Most existing architectures for mining in such circumstances require massive movement of data, resulting in high communication overheads and slow response times. These challenges are heightened when the data is extremely large and spread over multiple heterogeneous sites. Moreover, most existing algorithms and architectures are only moderately suitable for real-world scenarios. There is therefore an urgent need for improved architectures that exploit the software agent paradigm to improve on existing systems. This work introduces an adaptive architectural framework that mines association rules across multiple data sites and, more importantly, adapts to changes in the updated database, giving special consideration to the incremental database with the X-Apriori algorithm. The results integration agent also adapts to changes at the results sites, considering the size of the agents, the size of intermediate results, bandwidth, and other computational resources at the data servers. The proposed system promises to reduce communication and interpretation costs and to improve the autonomy and efficiency of distributed association rule mining tasks.
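The communication saving comes from shipping compact itemset counts instead of raw data. The sketch below shows plain Apriori-style local counting and a merge at the integration agent; it illustrates the distributed counting step only and is not the X-Apriori algorithm itself.

    from collections import Counter
    from itertools import combinations

    def local_counts(transactions, k):
        """Count k-itemset occurrences at one data site."""
        counts = Counter()
        for t in transactions:
            for itemset in combinations(sorted(t), k):
                counts[itemset] += 1
        return counts

    def merge_counts(per_site_counts, min_support):
        """Integration agent: sum counts from all sites, keep frequent itemsets."""
        total = Counter()
        for c in per_site_counts:
            total.update(c)
        return {itemset: n for itemset, n in total.items() if n >= min_support}

    site1 = [{"a", "b"}, {"a", "c"}]
    site2 = [{"a", "b", "c"}]
    merged = merge_counts([local_counts(site1, 2), local_counts(site2, 2)], 2)
    assert merged[("a", "b")] == 2  # frequent across both sites combined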
Designing distributed databases involves determining how data sets are to be partitioned and spread over multiple sites in a network in order to achieve good performance. This paper presents a methodology for partitioning and allocating data while designing distributed databases. In a relational database environment, the knowledge available in the database schema, such as identifier domains (primary keys, foreign keys), defines semantic relationships within and between the relations in the schema. This semantic knowledge is used together with usage knowledge, such as user views, to form the design alternatives. These can subsequently be evaluated using performance measures such as response time or total processing cost.
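Evaluating the design alternatives can be as simple as summing access costs under each candidate allocation; the cost model below (a flat local/remote penalty weighted by access frequency) is an illustrative assumption, not the paper's measure.

    def allocation_cost(allocation, accesses, local_cost=1, remote_cost=10):
        """Total processing cost of user-view accesses under an allocation."""
        total = 0
        for site, fragment, freq in accesses:
            unit = local_cost if allocation[fragment] == site else remote_cost
            total += freq * unit
        return total

    # User views at sites S1/S2 reading fragments F1/F2 with given frequencies.
    accesses = [("S1", "F1", 100), ("S2", "F2", 40), ("S2", "F1", 10)]
    alt_a = {"F1": "S1", "F2": "S2"}
    alt_b = {"F1": "S2", "F2": "S2"}
    assert allocation_cost(alt_a, accesses) < allocation_cost(alt_b, accesses)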