An application processing center consists of a set of well-defined, well-designed and well-tested applications that are dynamically executed over a period of time. We assume that there is a set of candidate distributed database designs each of which is optimal for some applications. The random execution of applications on a distributed database design is modeled as a discrete Markov process, and the problem of selecting the candidate design for each execution of an application is solved by using Sequential Markovian Decision Process analysis to generate an optimal redesign policy vector. The scope of the methodology developed in this paper is applicable to environments similar to application processing centers. The viability of this methodology is illustrated by means of a case study conducted at Georgia Institute of Technology. Copyright (C) 1996 Elsevier Science Ltd
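The redesign-policy idea can be illustrated with a small value-iteration sketch. This is not the paper's formulation: the state encoding (current application, current design), the single `switch_cost`, the discount factor, and all names below are assumptions made for illustration only.

```python
from itertools import product

def redesign_policy(P, exec_cost, switch_cost, discount=0.9, sweeps=500):
    """Discounted value iteration over (application, current design) states.

    P[a][b]         -- assumed probability that application b runs after a
    exec_cost[a][d] -- assumed cost of running application a on design d
    switch_cost     -- assumed flat cost of moving to a different design
    """
    n_apps, n_designs = len(P), len(exec_cost[0])
    states = list(product(range(n_apps), range(n_designs)))
    V = {s: 0.0 for s in states}

    def q(app, cur, nxt, V):
        # Immediate cost: run the application on the chosen design,
        # plus the redesign cost if the design changes.
        immediate = exec_cost[app][nxt] + (switch_cost if nxt != cur else 0.0)
        # Expected discounted cost of whatever application runs next.
        future = sum(P[app][a2] * V[(a2, nxt)] for a2 in range(n_apps))
        return immediate + discount * future

    for _ in range(sweeps):
        V = {(a, d): min(q(a, d, n, V) for n in range(n_designs))
             for (a, d) in states}

    # Policy vector: the cheapest design to use in each state.
    policy = {(a, d): min(range(n_designs), key=lambda n: q(a, d, n, V))
              for (a, d) in states}
    return policy, V
```

With a zero switching cost the policy simply picks the cheapest design for each application; as the switching cost grows, the policy increasingly keeps the current design.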
The development of a distributed database system requires effective solutions to many complex and interrelated design optimization problems. The cost dependencies between query optimization and data allocation on distributed systems are well recognized but little understood. We investigate these dependencies by proposing and analysing an iterative heuristic which provides an integrated solution to the query optimization and data allocation problems. The optimization heuristic iterates between finding minimum-cost query strategies and minimum-cost data allocations until a local minimum for the combined problem is found. A search from convergence efficiently scans the optimization search space for lower-cost solutions. In this paper, we apply the iterative heuristic to a realistic distributed database system model and a general class of queries and obtain very significant performance benefits. Experimental results demonstrate clear improvements in performance for the iterative method over existing design methods in a general-query environment. The iterative heuristic is proposed as a framework for future research extensions to achieve distributed system optimization.
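The alternation between the two subproblems can be sketched as a simple coordinate-descent loop. The abstract does not give the cost model or the search procedure, so `cost`, `strategies`, `allocations`, and the stopping rule below are hypothetical stand-ins, and the "search from convergence" step is not modelled.

```python
def iterate_to_local_minimum(cost, strategies, allocations, start_allocation):
    """Alternate between the best query strategy and the best data allocation.

    cost(strategy, allocation) -- assumed combined cost function
    strategies, allocations    -- assumed finite candidate sets
    """
    alloc = start_allocation
    strat = min(strategies, key=lambda s: cost(s, alloc))
    best = cost(strat, alloc)
    while True:
        # Fix the query strategy, re-optimise the allocation.
        alloc = min(allocations, key=lambda a: cost(strat, a))
        # Fix the allocation, re-optimise the query strategy.
        strat = min(strategies, key=lambda s: cost(s, alloc))
        current = cost(strat, alloc)
        if current >= best:
            # No further improvement: a local minimum of the combined problem.
            return strat, alloc, best
        best = current
```

Each pass can only lower or preserve the combined cost, so the loop terminates on finite candidate sets, though only at a local minimum.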
Partitioning and allocation of relations is an important component of distributed database design. Several approaches (and algorithms) have been proposed for clustering data for pattern classification and for partitioning relations in distributed databases. Most of the approaches used for classification use the square-error criterion. In contrast, most of the approaches proposed for partitioning of relations are either ad hoc solutions or solutions for special cases (e.g., binary vertical partitioning). In this paper, we first highlight the differences between the approaches taken for pattern classification and for distributed databases. Then an objective function for vertical partitioning of relations is derived using the square-error criterion commonly used in data clustering. The objective function derived generalizes and subsumes earlier work on vertical partitioning. Furthermore, the approach proposed in this paper is shown to be useful for comparing previously developed algorithms for vertical partitioning. The objective function has also been extended to include additional information, such as transaction types, different local and remote accessing costs and replication. Finally, we discuss the implementation of a distributed database design testbed.
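As a rough illustration of the square-error criterion applied to attributes, the sketch below scores a candidate vertical partitioning against an attribute usage table. It shows only the generic clustering criterion, not the paper's derived objective function; the `usage` layout and the function name are assumptions.

```python
def square_error(usage, partition):
    """Sum of squared distances of attribute vectors to their fragment mean.

    usage[attr] -- assumed list of access frequencies, one per transaction
    partition   -- list of fragments, each a list of attribute names
    """
    total = 0.0
    for fragment in partition:
        vectors = [usage[a] for a in fragment]
        n_tx = len(vectors[0])
        # Mean access pattern of the attributes grouped in this fragment.
        mean = [sum(v[t] for v in vectors) / len(vectors) for t in range(n_tx)]
        total += sum((v[t] - mean[t]) ** 2 for v in vectors for t in range(n_tx))
    return total
```

A lower score means the attributes grouped into each fragment are accessed by the transactions in more similar ways.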
This paper is concerned with the problem of optimal assignment of data to sites in a distributed relational database. It is shown that in general the optimal allocation will require exponential time in terms of the input. Several heuristic algorithms have been developed that can be applied under various constraints and that provide feasible, near-optimal results, together with a model that determines the "best" assignment for a given input out of several optimal and near-optimal assignments. The model is shown to be efficient, to require polynomial time, to be practical in terms of feasible inputs and to achieve assignments with near-minimal global and local costs.
In a distributed database environment, query processing heavily depends upon locality of the requested data at the query site. When relations are horizontally fragmented, data locality can be improved significantly since the fragments can be replicated flexibly without replication incurring overwhelming update cost. One major issue in developing a horizontal fragmentation technique is what criteria to use to guide the fragmentation. Conventional techniques have suggested using typical user queries. In this paper we propose to use, in addition to typical user queries, particular knowledge about the data itself. Use of this knowledge allows revision of typical user queries into more precise forms. The revised query expressions produce better estimates of user reference clusters to the database than the original query expressions. The estimated user reference clusters form a basis to partition relations horizontally. In our proposed approach, an ordinary many-sorted language is extended to represent the queries and knowledge compatibly. This knowledge is identified in terms of five axiom schemata. An inference procedure is developed to apply the knowledge to the queries deductively. An example is provided to illustrate the query revision process and the use of the estimated user reference clusters for fragmentation.
In this paper, the problem of determining an optimal data allocation in a distributed database is discussed. The design depends on user-supplied transaction information, including priorities of transactions, their frequencies, and sites of origin. The paper first restricts the complexity of each transaction. The problem is then described using the zero-one goal programming model. The model also considers the replication of data. Using the characteristics of the model, we can develop an efficient algorithm to get a sub-optimal solution. Finally, a small sample database is given to show the efficiency of the algorithm.
This paper introduces a generalized model for the allocation of data in a distributed relational database. It presents a generalized data distribution model applicable to a network of computers where there are different communication, update and retrieval costs at the various sites. The benefit of allocating data to a site is computed as the difference between the cost of allocating the data to the site and the cost of not allocating the data to that site. The cost of allocating the data to a site is computed exactly while the cost of not allocating the data to a site is determined heuristically. The algorithm attempts to maximize the total benefits for the network. The distribution model will determine the data allocation in polynomial time. The model also achieves a near-optimal allocation of data through maximizing the benefits for the sites in the network.
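A benefit-driven placement of this kind might look like the following minimal sketch. The abstract does not specify the cost formulas, so the benefit estimate used here (remote-read cost avoided minus update cost incurred), the rule of keeping at least one copy, and every name are assumptions.

```python
def allocate_by_benefit(local_reads, total_updates, remote_read_cost, update_cost):
    """Place a copy of each fragment at every site with positive benefit.

    local_reads[frag][s]  -- assumed number of reads of frag issued at site s
    total_updates[frag]   -- assumed number of updates a copy must absorb
    remote_read_cost[s]   -- assumed cost per read if site s has no copy
    update_cost[s]        -- assumed cost per update applied at site s
    """
    placement = {}
    for frag, reads_per_site in local_reads.items():
        # Benefit = cost avoided by holding a copy minus cost of maintaining it.
        benefit = [
            reads_per_site[s] * remote_read_cost[s]
            - total_updates[frag] * update_cost[s]
            for s in range(len(reads_per_site))
        ]
        chosen = [s for s, b in enumerate(benefit) if b > 0]
        if not chosen:
            # Keep at least one copy: fall back to the least harmful site.
            chosen = [max(range(len(benefit)), key=benefit.__getitem__)]
        placement[frag] = chosen
    return placement
```

The loop visits each fragment and site once, so the running time is polynomial in the number of fragments and sites.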