Designing applications for a hybrid cloud involves several distinctive features, including dynamic virtualization management and customer request routes that are not known in advance. As a result, queries cannot be evaluated beforehand, and so neither can the optimal distribution of data. In this paper, we formulate the main design challenges and propose a simulation installation for processing.
Integrating multiple databases that are distributed among different data owners can be beneficial in numerous contexts of statistical analysis. Unfortunately, the actual sharing of data is often impeded by concerns about data confidentiality. A situation like this requires tools that can produce correct results while minimizing the risk of disclosure. Over the past ten years, a number of "secure" protocols have been proposed to solve specific statistical problems, such as linear regression and classification, in a distributed setting. In this thesis, we first explore the disclosure risks associated with several existing protocols designed for the vertically partitioned database setting. We focus on the specific case where two parties are trying to perform logistic regression without actually combining their data. Although the protocols can be considered secure in the sense that neither party's data is in danger of being fully exposed, information leaks from the intermediate computations and also from the estimated coefficients. We provide a detailed analysis of such cases. Second, we show how these previously proposed secure computation protocols can be applied to penalized regression methods, with a focus on the LARS algorithm used to perform Lasso regression. A protocol for the vertically partitioned database setting is described, along with a thorough discussion of possible disclosure risks and computation. We also provide a detailed description of how to perform model selection and of possible ways to extend our protocol to LARS-type algorithms for generalized linear models, such as logistic regression.
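To make the vertically partitioned setting concrete, the following minimal sketch shows how the linear predictor of a logistic regression splits into one partial score per party. It is only an illustration and not a secure protocol: the partial scores are exchanged in the clear, which is exactly the kind of intermediate quantity that can leak information. The function names, the toy data, and the coefficients are all hypothetical.

```python
import math

def partial_scores(rows, coefs):
    """Each party computes x_i . beta using only the columns it holds."""
    return [sum(x * b for x, b in zip(row, coefs)) for row in rows]

def predicted_probabilities(scores_a, scores_b, intercept=0.0):
    """Combine the per-party partial scores into logistic probabilities."""
    return [1.0 / (1.0 + math.exp(-(intercept + sa + sb)))
            for sa, sb in zip(scores_a, scores_b)]

# Hypothetical toy data: the same three records, split by attribute.
party_a_rows = [[1.0, 2.0], [0.0, 1.0], [3.0, 0.5]]  # columns held by party A
party_b_rows = [[0.5], [1.5], [2.0]]                  # column held by party B
beta_a, beta_b = [0.2, -0.1], [0.4]                   # assumed coefficients

scores_a = partial_scores(party_a_rows, beta_a)
scores_b = partial_scores(party_b_rows, beta_b)
probs = predicted_probabilities(scores_a, scores_b)
```

Neither party needs the other's raw columns to obtain `probs`, but each partial score is a linear function of the private attributes, so repeated exchanges can reveal more than intended.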
Information and communication systems for vehicles, such as ETC (electronic toll collection) and car navigation systems, are becoming increasingly important. In next-generation navigation systems, each vehicle can not only receive various types of information, such as maps and traffic data, but also collect traffic information around the vehicle with its sensors and send it to a navigation center. It is critical to discuss how to store the information collected by vehicles in databases and how vehicles access that information when a huge number of vehicles are on the roads. In this paper, we propose an enhanced dynamic R-tree (EDR-tree) scheme to store and retrieve traffic data collected by vehicles in dynamic distributed database systems. In distributed tree-structured indexes such as the R-tree and B-tree, the root node and the nodes at upper layers easily become performance bottlenecks and points of failure, since every query request is transferred from the root to a leaf node. In the EDR-tree, a road is realized as a sequence of road units. The geographical space of roads is separated into area units in which road units are stored. An area unit is stored in a leaf node, and a tree-structured index is built on the leaf nodes, as in the B+-tree and R-tree. Each vehicle first accesses a leaf node, not the root: the leaf that holds the information of the road unit where the vehicle is currently moving. A query request is then efficiently and reliably delivered to a target node by using not only parent-child links but also enhancing links, namely sibling and adjacent links. We evaluate the EDR-tree in terms of search time and insertion time.
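The leaf-first access idea can be sketched as follows. This is a deliberately simplified, one-dimensional illustration rather than the EDR-tree itself: leaves cover consecutive position ranges along a single road and are connected only by sibling links, and the class and function names (`Leaf`, `build_leaves`, `find_from_leaf`) are hypothetical.

```python
class Leaf:
    """A leaf node covering road-unit positions in [lo, hi)."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.prev = None    # sibling link to the leaf on the left
        self.next = None    # sibling link to the leaf on the right
        self.records = []   # traffic records stored in this area unit

def build_leaves(boundaries):
    """Create leaves for consecutive ranges and connect sibling links."""
    leaves = [Leaf(lo, hi) for lo, hi in zip(boundaries, boundaries[1:])]
    for left, right in zip(leaves, leaves[1:]):
        left.next, right.prev = right, left
    return leaves

def find_from_leaf(start, position):
    """Walk sibling links from `start` to the leaf covering `position`.

    Returns (leaf, hops); leaf is None if the position is out of range.
    """
    node, hops = start, 0
    while node is not None and not (node.lo <= position < node.hi):
        node = node.next if position >= node.hi else node.prev
        hops += 1
    return node, hops

# A vehicle currently in the second leaf asks about a position two leaves away.
leaves = build_leaves([0, 10, 20, 30, 40])
target, hops = find_from_leaf(leaves[1], 35)
```

In this sketch a nearby query never touches a root node. A fuller version would fall back to parent-child links when the target is many leaves away.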
In statistical databases and data warehousing applications it is commonly the case that aggregate views are maintained as an underlying mechanism for summarising information. Where the databases or applications are distributed, or arise from independent data collections or system developments, there may be incompatibility, heterogeneity, and data inconsistency. These challenges need to be overcome if federations of aggregated databases are to be successfully incorporated into systems for database management, querying, retrieval, and knowledge discovery. In this paper we address the issue of integrating aggregate views that have semantically heterogeneous classification schemes. In previous work we have developed a methodology that is efficient but that cannot easily handle data inconsistencies. Our previous approach is therefore not particularly well-suited to very large databases or federations of large numbers of databases. We now address these scalability issues by introducing a methodology for heterogeneous aggregate view integration that constructs a dynamic shared ontology to which each of the aggregate views can be explicitly related. A maximum likelihood technique, implemented using the EM (Expectation-Maximisation) algorithm, is used to inherently handle data inconsistencies in the computation of integrated aggregates that are described in terms of the dynamic shared ontology.
In a one-copy distributed database, each data item is stored at exactly one site. In a replicated database, some data items may be stored at multiple sites. The main motivation is improved reliability: by storing important data at multiple sites, the DBS can operate even though some sites have failed. This paper describes an algorithm for handling replicated data, which allows users to operate on data so long as one copy is "available." A copy is "available" when (i) its site is up, and (ii) the copy is not out-of-date because of an earlier failure. The algorithm handles clean, detectable site failures, but not Byzantine failures or network partitions.
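The availability rule above can be illustrated with a minimal sketch that reads from any available copy and writes to every available copy. This read-one/write-all-available policy is an assumption made for illustration. The sketch omits concurrency control and the paper's actual recovery protocol, and all names in it are hypothetical.

```python
class Copy:
    """One replica of a data item at a given site."""
    def __init__(self, site):
        self.site = site
        self.site_up = True      # condition (i): the site is up
        self.up_to_date = True   # condition (ii): the copy is not stale
        self.value = None

    def available(self):
        return self.site_up and self.up_to_date

def read(copies):
    """Return the value of the first available copy."""
    for copy in copies:
        if copy.available():
            return copy.value
    raise RuntimeError("no available copy")

def write(copies, value):
    """Write to every available copy; return how many were updated."""
    targets = [copy for copy in copies if copy.available()]
    if not targets:
        raise RuntimeError("no available copy")
    for copy in targets:
        copy.value = value
    return len(targets)

def fail(copy):
    """Simulate a clean, detectable site failure."""
    copy.site_up = False

def recover(copy):
    """Bring the site back; its copy may have missed writes, so mark it stale."""
    copy.site_up = True
    copy.up_to_date = False

def refresh(copy, copies):
    """Bring a stale copy up to date from some available copy."""
    copy.value = read(copies)
    copy.up_to_date = True

copies = [Copy("A"), Copy("B"), Copy("C")]
write(copies, 1)
fail(copies[0])
updated = write(copies, 2)   # only the copies at B and C receive this write
recover(copies[0])           # A is up again but not yet available
refresh(copies[0], copies)   # A copies the current value from B
```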
Since the early 1990s, significant progress in database technology has provided a new platform for emerging dimensions of data engineering. New models were introduced to utilize the data sets stored in the new generations of databases. These models have had a deep impact on the evolution of decision-support systems, but they suffer from a variety of practical problems when accessing real-world data sources. Specifically, a data storage model based on data distribution theory has been increasingly used in recent years by large-scale enterprises, yet it is not compatible with existing decision-support models. This data storage model stores data at the geographical sites where they are accessed most regularly. This leads to considerably less inter-site data transfer, which can reduce data security issues in some circumstances and also significantly improve the speed of data manipulation transactions. The aim of this paper is to propose a new approach for supporting proactive decision-making that utilizes a workable data source management methodology. The new model can effectively organize and use complex data sources, even when they are distributed across different sites in fragmented form. At the same time, the new model provides a very high level of intelligent management decision support by synthesizing useful knowledge from the data collections with new smart methods. The results of an empirical study evaluating the model are provided.
A hybrid cloud is an integration of resources between private and public clouds. It enables users to horizontally scale their on-premises infrastructure up to public clouds in order to improve performance and cut up-front investment cost. This application deployment model, called cloud bursting, allows data-intensive applications, especially distributed database systems, to benefit from both private and public clouds. In this work, we present an automated implementation of a hybrid cloud using (i) a robust and zero-cost Linux-based VPN to make a secure connection between private and public clouds, and (ii) Terraform as a software tool to deploy infrastructure resources based on the requirements of the hybrid cloud. We also evaluate the performance of cloud bursting for six modern distributed database systems on a hybrid cloud spanning a local OpenStack deployment and Microsoft Azure. Our results reveal that MongoDB and MySQL Cluster work efficiently in terms of throughput and operation latency if they burst into a public cloud to supply their resources. In contrast, the performance of Cassandra, Riak, Redis, and CouchDB degrades if they obtain a significant share of their required resources via cloud bursting.
A distributed database is a collection of data stored in different locations of a distributed system. The processing of queries in distributed databases is quite complex but of great importance for information management. Students who have to learn this process often have serious difficulty understanding it. In this work, we present a web platform that helps students learn the processing and optimization of queries in distributed databases. The novelty of this platform is that, as far as we know, no similar graphical tool exists. It visualizes, step by step, the different phases of distributed query processing and shows how they are formed, making these concepts easier for students to understand. Moreover, having this web platform available at any time and from anywhere indirectly affects other competences: it encourages students' autonomous work and self-learning, adapts the teaching to their particular needs, and reinforces the advantages of applying information technologies in teaching. The results of the tests carried out to validate the platform's functionality and student satisfaction were very positive.
Database consistency is one of the major issues for replicated databases in distributed database systems. The logical design of the replicated nodes and the transaction management mechanism are two aspects that seriously affect the performance and the consistency of replicated databases. This paper proposes a new model that combines Neighbor Replication on Grid (NRG), in which data is replicated to the neighbors of a site on the grid, with the Update Ordering approach. The performance comparison shows that the proposed mechanism improves the performance of the replicated database by up to two orders of magnitude while preserving data consistency.
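The neighbor-replication part can be sketched as below, assuming that sites are arranged logically in a two-dimensional grid and that "neighbors" means the sites directly above, below, left, and right of the primary site. Both the adjacency rule and the function names are assumptions made for illustration, and the Update Ordering component is not shown.

```python
def grid_neighbors(row, col, rows, cols):
    """Return the grid positions adjacent to (row, col), inside the grid only."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]

def replica_sites(primary, rows, cols):
    """Sites holding a copy of data whose primary site is `primary`."""
    return [primary] + grid_neighbors(primary[0], primary[1], rows, cols)

corner = replica_sites((0, 0), 3, 3)   # a corner site has two neighbors
center = replica_sites((1, 1), 3, 3)   # the center site has four neighbors
```

Under this assumption every data item has between three and five copies, so an update touches only a small, fixed set of sites regardless of the grid size.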
This paper analyses the problem of finding an optimal distribution of a database over a computer network so as to facilitate parallel searching for a set of database queries. Searching the multiple segments required by a query in parallel lowers the response time considerably. Procedures are proposed for finding the optimal distributions in a network that maximally exploit the parallel search capability, with or without redundancy of segment types.
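A minimal sketch of the underlying trade-off, under an assumed cost model: segments placed on different nodes are searched in parallel, segments on the same node are searched one after another, and a query's response time is the load of its busiest node. The exhaustive search below covers only the case without redundancy and is practical only for tiny inputs. It is not the procedure proposed in the paper, and all names and numbers are hypothetical.

```python
from itertools import product

def response_time(query, placement, search_time):
    """Response time of one query: the largest per-node search load."""
    load = {}
    for segment in query:
        node = placement[segment]
        load[node] = load.get(node, 0) + search_time[segment]
    return max(load.values(), default=0)

def best_placement(segments, nodes, queries, search_time):
    """Try every assignment of segments to nodes; keep the cheapest one."""
    best, best_cost = None, float("inf")
    for assignment in product(nodes, repeat=len(segments)):
        placement = dict(zip(segments, assignment))
        cost = sum(response_time(q, placement, search_time) for q in queries)
        if cost < best_cost:
            best, best_cost = placement, cost
    return best, best_cost

# Hypothetical workload: three segments, two nodes, two queries.
search_time = {"A": 4, "B": 3, "C": 2}
queries = [["A", "B"], ["B", "C"]]
placement, cost = best_placement(["A", "B", "C"], [0, 1], queries, search_time)
```

Here segment B is used by both queries, so the cheapest placement isolates it on its own node and lets each query search its two segments in parallel.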