ISBN: 9781467395878 (print)
With the arrival of the big data era, a key problem is how to speed up database queries. In query optimization for distributed databases, query speed depends on the amount of data transferred and on the join order, so cost models that minimize communication cost are the focus of research. The Parallel Genetic Algorithm-Max-Min Ant System is proposed to search for the best query execution plan; it combines the fast convergence of the Genetic Algorithm, the global search ability of the Max-Min Ant System, and the parallel nature of both. Experimental results show that the proposed algorithm is effective for multi-join query processing and plays an important role in improving the performance of distributed databases.
ISBN: 9781424431748 (print)
In a distributed database system, data replicas are placed at different locations to achieve high data availability in the presence of link failures. Under the majority voting protocol, a location survives for read/write operations if and only if it can access more than half of the replicas. The problem is to find the optimal placements for a given number of data replicas in a ring network. When the number of replicas is odd, Hu et al. conjectured that every uniform placement is optimal, which was later proved by Shekhar and Wu. However, when the number of replicas is even, Hu et al. pointed out that uniform placements are not optimal and that the optimal placement problem may be very complicated. In this paper we study the optimal placement problem in a ring network with the majority voting protocol and an even number of replicas, and give a complete characterization of optimal placements when the number of replicas is not too large compared with the number of locations.
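The survival condition above can be illustrated with a small sketch. It is not taken from the paper: the function name `surviving_locations`, the numbering of locations as 0..n-1, and the convention that link i joins location i and location (i + 1) % n are all assumptions made for illustration, and replicas are assumed to sit at distinct locations.

```python
def surviving_locations(n, replicas, failed_links):
    """Return the locations that can still reach more than half of the replicas.

    Assumed model (hypothetical, for illustration only): locations are
    numbered 0..n-1 around a ring, and link i joins location i and
    location (i + 1) % n. Replicas sit at distinct locations.
    """
    replicas = set(replicas)
    failed = sorted(set(failed_links))

    if not failed:
        # No failure: the whole ring is one connected segment.
        segments = [list(range(n))]
    else:
        segments = []
        for idx, link in enumerate(failed):
            start = (link + 1) % n                   # first node after the failed link
            end = failed[(idx + 1) % len(failed)]    # last node before the next failed link
            segment = []
            node = start
            while True:
                segment.append(node)
                if node == end:
                    break
                node = (node + 1) % n
            segments.append(segment)

    survivors = set()
    for segment in segments:
        reachable = sum(1 for node in segment if node in replicas)
        # Majority voting: strictly more than half of all replicas.
        if 2 * reachable > len(replicas):
            survivors.update(segment)
    return survivors
```

With an even number of replicas, a failure pattern can split them evenly so that no location holds a strict majority, which hints at why the even case behaves differently from the odd case.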
ISBN: 9783037853191 (print)
Based on an in-depth study of existing distributed database query processing technology, this paper proposes a distributed database query processing scheme. The scheme optimizes existing query processing by storing the results of commonly used queries according to their query frequency, so that subsequent queries can use them directly or as intermediate results. This avoids transmitting large amounts of data, which reduces query time and improves query efficiency.
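The idea of storing results by query frequency might be sketched as follows. The abstract does not specify the storage policy, so the class `FrequentQueryCache`, the `execute` callback, and the `min_frequency` threshold are hypothetical; invalidation when the underlying data changes is deliberately left out.

```python
from collections import Counter


class FrequentQueryCache:
    """Cache the results of queries that have been issued often enough.

    Hypothetical sketch: `execute` stands in for whatever actually runs
    the distributed query, and `min_frequency` is an assumed threshold.
    Cache invalidation on data updates is not handled here.
    """

    def __init__(self, execute, min_frequency=2):
        self._execute = execute
        self._min_frequency = min_frequency
        self._counts = Counter()   # how often each query text has been seen
        self._results = {}         # stored results of frequent queries

    def query(self, sql):
        self._counts[sql] += 1
        if sql in self._results:
            # Served locally: no data is transferred between sites.
            return self._results[sql]
        result = self._execute(sql)
        if self._counts[sql] >= self._min_frequency:
            self._results[sql] = result
        return result
```

A query is executed remotely until it has been seen `min_frequency` times; from then on its stored result is returned directly.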
ISBN: 9781424425020 (print)
The successful application of genetic algorithms to distributed databases depends greatly on an appropriate coding method for query optimization, because the coding of parameters directly affects the construction of the genetic operators and the performance of the algorithm. Considering a combinatorial optimization problem that is restricted by both position and condition and must encode a large amount of information, this paper devises a new tree-structured coding method based on position and value. The genetic operators, i.e. reproduction, crossover and mutation, are designed for this coding. The improved crossover is implemented in two steps, and the improved mutation consists of value mutation and position mutation. The proposed algorithm is applied to distributed database queries, and the experimental results show that it is very effective for optimization.
ISBN: 9798350349184; 9798350349191 (print)
Recently, distributed databases have achieved tremendous practical performance and have become one of the most widely used tools in social communication applications. However, existing distributed databases often contain users' sensitive information and are vulnerable to web attackers, which may cause severe privacy issues and economic loss. In this paper, we make a first attempt at a novel protocol that addresses the potential verification risks in distributed databases. In current distributed databases, the requester can steal important data without any payment. Our model therefore faces two primary challenges: guaranteeing the efficiency and security of the distributed database, and preventing the data verification procedure from leaking data. To address these problems, we use zero-knowledge proofs to handle data verification for the requester. Moreover, a secure and effective proof protocol is established so that the database can respond to requests for private data. Our extensive experimental results show that the developed framework achieves effective performance with reasonable communication costs.
ISBN: 9781665499248 (print)
Privacy-preserving data clustering is a useful method for extracting intrinsic cluster structures from distributed databases while preserving personal privacy. In previous research, a model for performing Fuzzy c-Lines clustering was proposed, in which a privacy-preserving scheme for k-means-type models was adopted with cryptographic calculation. This paper further improves the model to handle incomplete data while ignoring the influence of missing values. The element-wise clustering criterion makes it possible to derive local principal component vectors in each data source by minimizing a low-rank approximation criterion over the observed elements only. The fuzzy memberships of each object are then calculated collaboratively among organizations, where partial distances between objects and prototypes are derived within a cryptographic framework so that intra-organization information is kept secret. The characteristic features of the proposed method are demonstrated through numerical experiments.
ISBN: 9781728191843 (print)
Modern distributed database systems scale horizontally by partitioning their data across a large number of nodes. Most such systems build their transactional layers on a replication layer, employing a consensus protocol to ensure data consistency and achieve fault tolerance. Synchronization among replicated state machines thus becomes a significant overhead of transaction processing. Without careful design, synchronization can amplify the lock duration of transactions and impair the system's scalability. Speculative techniques, such as Controlled Lock Violation (CLV) and Early Lock Release (ELR), prove useful in shortening the critical path of locks and boosting transaction processing performance. Using these techniques to optimize geo-replicated distributed databases (GDDB) is an intuitive idea. This paper shows that a naive application of speculation is often unhelpful in a distributed environment. Instead, we introduce Distributed Lock Violation (DLV), a specialized speculative technique for geo-replicated distributed databases. DLV can achieve good performance without incurring severe side effects.
ISBN: 9783642227097 (digital)
ISBN: 9783642227080 (print)
Addressing security demands under fixed budgets and tight time constraints is becoming extremely challenging, time-consuming and resource-intensive. Moreover, securing a distributed database in compliance with several security guidelines makes the system more complex. Mission-critical systems and military, government and financial institutions have been under tremendous pressure to secure their databases. Such requirements mandate that each system pass a strict security scan before it is deemed suitable to go into operational mode. This paper presents a framework that embeds security capabilities into a distributed database by replicating different predefined security policies at different sites using a multilevel secure database management system.
ISBN: 3540362975 (print)
Several traditional algorithms exist for mining global frequent itemsets. Most of them adopt Apriori-like frameworks, which results in many candidate itemsets, frequent database scans and heavy communication traffic. To solve these problems, this paper proposes a fast algorithm for mining global frequent itemsets, namely the FMGFI algorithm. It can easily obtain the global frequency of any itemset from the local FP-trees, and it requires far less communication traffic thanks to its top-down and bottom-up search strategies. It effectively reduces the problems found in most existing algorithms for mining global frequent itemsets. Theoretical analysis and experimental results suggest that the FMGFI algorithm is fast and effective.
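To make the notion of a global frequent itemset concrete, the sketch below sums local support counts across sites and compares the total against a relative support threshold. This is only the standard definition computed by brute force, not the FMGFI algorithm: it uses no FP-trees and none of the paper's search strategies, and the names `global_support`, `is_globally_frequent` and `min_support` are assumptions.

```python
def global_support(itemset, sites):
    """Count the transactions, over all sites, that contain every item of `itemset`.

    `sites` is assumed to be a list of local databases, each a list of
    transactions, each transaction an iterable of items.
    """
    items = frozenset(itemset)
    return sum(
        1
        for transactions in sites
        for transaction in transactions
        if items <= set(transaction)
    )


def is_globally_frequent(itemset, sites, min_support):
    """Check the itemset against a relative threshold, e.g. min_support=0.5."""
    total = sum(len(transactions) for transactions in sites)
    return total > 0 and global_support(itemset, sites) >= min_support * total
```

In a real distributed setting each site would report only its local count rather than ship its transactions, and reducing that exchange is where the communication savings described in the abstract would matter.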
ISBN: 9789811319518; 9789811319501 (print)
In big data environments, Apache Cassandra is a distributed database that offers very high availability. It is an open-source database system designed to manage large volumes of transactional data across servers worldwide. Cassandra's main feature is a decentralized database system with high availability, very high fault tolerance, and zero downtime. Traditional relational databases (RDBMSs) have been used to store application data for many years, but changes are required because applications must now scale to levels that were previously unimaginable. Scaling is not the only concern: companies also require applications that are always available and run fast in situations where an RDBMS fails. Apache Cassandra is a fully distributed database whose architecture handles extreme data velocity with high availability and scalability and recovers from faults easily. In the Cassandra architecture, there is no master node that manages all the other nodes in the ring or network. Data is distributed among the nodes in equal proportion. Cassandra creates an environment in which an entire datacenter can be lost and the system still performs as if nothing had happened. This paper provides a brief overview of Cassandra.
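The masterless ring with roughly even data distribution can be pictured with a generic consistent-hashing sketch. This is not Cassandra's implementation: the class `TokenRing`, the `vnodes` parameter and the use of MD5 as the hash are stand-ins chosen for illustration, and replication across nodes is omitted.

```python
import bisect
import hashlib


class TokenRing:
    """Toy token ring: every node owns several tokens, and a key belongs to
    the node owning the first token at or after the key's hash.

    Illustrative only; hash function and virtual-node count are assumptions.
    """

    def __init__(self, nodes, vnodes=64):
        self._tokens = []   # sorted token positions on the ring
        self._owners = {}   # token -> node name
        for node in nodes:
            for i in range(vnodes):
                token = self._hash(f"{node}#{i}")
                self._owners[token] = node
                bisect.insort(self._tokens, token)

    @staticmethod
    def _hash(value):
        # First 8 bytes of MD5 as an unsigned integer token.
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def owner(self, key):
        token = self._hash(key)
        # Wrap around to the first token when the hash is past the last one.
        index = bisect.bisect_left(self._tokens, token) % len(self._tokens)
        return self._owners[self._tokens[index]]
```

Any node can compute `owner(key)` from the same token table, so no master is needed to route a request, and giving each node many tokens tends to even out the share of keys each one holds.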