Clustering of distributed databases facilitates knowledge discovery through learning of new concepts that characterise common features and differences between datasets. Hence, general patterns can be learned rather than restricting learning to specific databases from which rules may not be generalisable. We cluster databases that hold aggregate count data on categorical attributes that have been classified according to homogeneous or heterogeneous classification schemes. Clustering of datasets is carried out via the probability distributions that describe their respective aggregates. The homogeneous case is straightforward. For heterogeneous data we investigate a number of clustering strategies, of which the most efficient avoid the need to compute a dynamic shared ontology to homogenise the classification schemes prior to clustering. (c) 2004 Elsevier B.V. All rights reserved.
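The distribution-based comparison described in this abstract can be illustrated with a small sketch (ours, not the paper's algorithm): each database's aggregate counts over a shared (homogeneous) classification scheme are normalised to a probability distribution, and databases whose distributions lie within a Jensen-Shannon distance threshold are grouped together. The database names, counts, and threshold below are invented for illustration.

```python
import math

def js_distance(p, q):
    """Jensen-Shannon distance between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return math.sqrt((kl(p, m) + kl(q, m)) / 2)

def normalise(counts):
    total = sum(counts)
    return [c / total for c in counts]

def cluster(dbs, threshold=0.2):
    """Greedy single-link grouping of databases by distributional similarity."""
    clusters = []
    for name, counts in dbs.items():
        p = normalise(counts)
        for c in clusters:
            if any(js_distance(p, q) < threshold for _, q in c):
                c.append((name, p))
                break
        else:
            clusters.append([(name, p)])
    return [[name for name, _ in c] for c in clusters]

# Aggregate counts over the same three categories in each database.
dbs = {
    "db1": [90, 5, 5],
    "db2": [85, 10, 5],
    "db3": [10, 10, 80],
}
print(cluster(dbs))  # [['db1', 'db2'], ['db3']]
```

The heterogeneous case in the paper is harder precisely because the count vectors are not directly comparable until the classification schemes are reconciled.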
Privacy preservation in distributed databases is an active area of research. With the advancement of technology, massive amounts of data are continuously being collected and stored in distributed database applications. Temporal associations and correlations among items in the large transactional datasets of a distributed database can inform many business decision-making processes; mining frequent itemsets and computing their association rules is therefore a nontrivial issue. In a typical situation, multiple parties may wish to collaborate to extract interesting global information, such as frequent associations, without revealing their respective data to each other. This is particularly useful in applications such as retail market-basket analysis, medical research, and academia. The proposed work aims to find frequent items and to develop a global association-rule model based on a genetic algorithm (GA). The GA is used for its robustness with respect to local maxima/minima and its domain-independent large-space search, which finds exact or approximate solutions to optimization and search problems. For privacy preservation of the data, the concept of a trusted third party with two offsets is used: the data are first anonymized at the local party end, and the trusted third party then performs the aggregation and computes the global associations. The proposed algorithms address horizontal, vertical, and arbitrary partitions.
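The offset-based anonymisation can be sketched roughly as follows (a hypothetical simplification: the paper's exact two-offset protocol is not reproduced, and the party names and counts are invented). Each party masks the local support count of an itemset with a random offset before sending it, so only the trusted third party, which learns the offsets, can recover the aggregate:

```python
import random

# Hypothetical sketch of offset-based masking: each party hides its local
# itemset support count behind a random offset; the trusted third party
# removes the offsets and recovers only the global support.

def mask_count(local_count, rng):
    offset = rng.randrange(1_000_000)   # random offset hides the true count
    return local_count + offset, offset

def global_support(masked_counts, offsets):
    return sum(masked_counts) - sum(offsets)

rng = random.Random(42)
local_supports = {"party_A": 120, "party_B": 75, "party_C": 230}
masked, offsets = zip(*(mask_count(c, rng) for c in local_supports.values()))
print(global_support(masked, offsets))  # 425 = 120 + 75 + 230
```

The GA would then search the space of itemsets using such aggregated supports as its fitness signal, without any party seeing another party's raw counts.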
The problem of connecting together a number of different databases to produce an integrated information system has attracted a considerable amount of attention over the years and various approaches have been developed to handle this. However, the general problem of gathering related information from a number of existing heterogeneous databases is complex because of the differences in representation and meaning of data in different data sets. Many different approaches have been described to resolve this problem, and some prototype systems built. However, it is difficult to compare the effectiveness of different approaches and prototypes. This paper is aimed at addressing the specific issue of assessing the generality of different approaches. To this end it presents a framework for classifying the differences between data in different databases and a test-suite which can be used to evaluate and compare the extent to which different approaches handle different aspects of this heterogeneity. (C) 2000 Elsevier Science B.V. All rights reserved.
For distributed databases, checkpointing is used to ensure an efficient way to perform global reconstruction. However, the need for global reconstruction is infrequent. Most current checkpointing approaches for distributed databases are too expensive during run time. Some of them allow the checkpointing process to run in parallel with normal transactions at the cost of more data and resource contention, which in turn causes longer response time for normal transactions. Thus, an efficient way to checkpoint distributed databases is needed to avoid degrading the system performance. This paper presents a low-cost solution, called Loosely Synchronized Local Fuzzy Checkpointing (LSLFC), to these problems. LSLFC supports global reconstruction, and our performance study shows that LSLFC has little overhead during run time.
An integrated approach to concurrency control adaptively allows classical pessimistic (two-phase locking) or optimistic (using certification) approaches. The principles for a distributed integrated method controlling both locking and optimistic transactions are defined. The implementation of these principles leads to a method for constructing the serialization order of transactions, using their conflicts. This dynamic construction prevents the systematic rejection of old (long) readers, as in the multiversion methods. On the other hand, applying Thomas' rule to control the write conflicts permits the presence of old (long) writers.
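Thomas' write rule mentioned above can be sketched in a few lines (a generic timestamp-ordering illustration, not the paper's integrated method): a write arriving with a timestamp older than the item's current write timestamp is simply ignored rather than aborted, which is what lets old (long) writers survive.

```python
# Generic timestamp-ordering sketch of Thomas' write rule (illustrative only).
class Item:
    def __init__(self):
        self.value = None
        self.write_ts = 0   # timestamp of the latest applied write
        self.read_ts = 0    # timestamp of the latest read

def timestamped_write(item, ts, value):
    if ts < item.read_ts:
        return "abort"      # a later transaction already read the old value
    if ts < item.write_ts:
        return "ignored"    # Thomas' rule: obsolete write is skipped, not rejected
    item.value, item.write_ts = value, ts
    return "applied"

x = Item()
print(timestamped_write(x, 5, "a"))  # applied
print(timestamped_write(x, 3, "b"))  # ignored (old writer tolerated)
print(x.value)                       # a
```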
Intelligent routing control is defined as the process in which the network interrogates the databases containing the relationships between logical numbers, such as personal or information identifiers, and physical addresses in the transport network to find the terminal having the information required to process a user request. The routing control system presented uses distributed databases, each of which manages a switching system and all of which are connected through high-speed signalling networks separate from the transport network. If the requested physical address cannot be found in one database, search requests are distributed at the same time to all other databases. For up to 100 million subscribers, the routing control system can find a physical address within 1 s when each database uses ten memories accessed at 200 ns with an interdatabase linkage speed of 14 Mb/s.
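A back-of-envelope check of the quoted figures (our own arithmetic, not taken from the paper, and the parallel-search model is a simplifying assumption): ten memories cycling at 200 ns give one database 5 × 10⁷ accesses per second, so fanning the search out to two or more databases at once brings a 100-million-entry scan under the 1 s target.

```python
# Back-of-envelope arithmetic on the quoted figures; the parallel-scan
# model is our simplifying assumption, not the paper's exact design.
memories = 10
cycle_time_s = 200e-9                       # 200 ns per memory access
accesses_per_sec = memories / cycle_time_s
print(accesses_per_sec)                     # 50000000.0 accesses/s per database

subscribers = 100_000_000
scan_time_one_db = subscribers / accesses_per_sec
print(scan_time_one_db)                     # 2.0 s if a single database scanned all
# Distributing the search to all databases simultaneously, as the system does,
# divides this time by the number of databases searched in parallel.
```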
Distributed database performance is often unpredictable due to issues such as system complexity, network congestion, or imbalanced data distribution. These issues are difficult for users to assess in part due to the opaque mapping between declaratively specified queries and actual physical execution plans. Database developers currently must expend significant time and effort scanning log files to isolate and debug the root causes of performance issues. In response, we present Perfopticon, an interactive query profiling tool that enables rapid insight into common problems such as performance bottlenecks and data skew. Perfopticon combines interactive visualizations of (1) query plans, (2) overall query execution, (3) data flow among servers, and (4) execution traces. These views coordinate multiple levels of abstraction to enable detection, isolation, and understanding of performance issues. We evaluate our design choices through engagements with system developers, scientists, and students. We demonstrate that Perfopticon enables performance debugging for real-world tasks.
Distributed databases allow us to integrate data from different sources which have not previously been combined. In this article, we are concerned with the situation where the data sources are held in a distributed database. Integration of the data is then accomplished using the Dempster-Shafer representation of evidence. The weighted sum operator is developed and this operator is shown to provide an appropriate mechanism for the integration of such data. This representation is particularly suited to statistical samples which may include missing values and be held at different levels of aggregation. Missing values are incorporated into the representation to provide lower and upper probabilities for propositions of interest. The weighted sum operator facilitates combination of samples with different classification schemes. Such a capability is particularly useful for knowledge discovery when we are searching for rules within the concept hierarchy, defined in terms of probabilities or associations. By integrating information from different sources, we may thus be able to induce new rules or strengthen rules which have already been obtained. We develop a framework for describing such rules and show how we may then integrate rules at a high level without having to resort to the raw data, a useful facility for knowledge discovery where efficiency is of the essence. (C) 1997 John Wiley & Sons, Inc.
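The lower/upper-probability idea can be illustrated with a toy example (hypothetical counts and category names; the weighted sum operator itself is not reproduced here). Records whose category is missing contribute mass to the whole frame of discernment, so a proposition's probability is bracketed between a belief (lower) and a plausibility (upper) value:

```python
# Toy Dempster-Shafer bracketing: missing values put mass on the whole frame.
def lower_upper(counts, missing, proposition):
    """counts: observed category counts; missing: records of unknown category."""
    total = sum(counts.values()) + missing
    support = sum(n for cat, n in counts.items() if cat in proposition)
    belief = support / total                     # lower probability
    plausibility = (support + missing) / total   # missing records might match
    return belief, plausibility

counts = {"buys": 60, "does_not_buy": 30}        # invented sample
lo, hi = lower_upper(counts, missing=10, proposition={"buys"})
print(lo, hi)  # 0.6 0.7
```

A rule such as "customer buys" would then carry the interval [0.6, 0.7] rather than a point probability, and the paper's weighted sum operator combines such intervals across samples.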
Distributed databases on local area networks present additional considerations for query optimization over databases on geographically distributed, point-to-point networks. This paper surveys and evaluates the state of current research on distributed query optimization for local area networks. A classification taxonomy is presented and used to analyze the proposed query-optimization algorithms. The unique features of each algorithm are highlighted and a qualitative comparison of the algorithms is given. Future research directions are discussed.
In a one-copy distributed database, each data item is stored at exactly one site. In a replicated database, some data items may be stored at multiple sites. The main motivation is improved reliability: by storing important data at multiple sites, the DBS can operate even though some sites have failed. This paper describes an algorithm for handling replicated data, which allows users to operate on data so long as one copy is “available.” A copy is “available” when (i) its site is up, and (ii) the copy is not out-of-date because of an earlier failure. The algorithm handles clean, detectable site failures, but not Byzantine failures or network partitions.
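The availability rule can be sketched as a read-one / write-all-available discipline (a minimal illustration under our own simplifying assumptions, not the paper's full algorithm): writes go to every available copy, a copy that misses a write is marked out-of-date, and reads skip copies that are down or stale.

```python
# Minimal read-one / write-all-available sketch (illustrative assumptions only).
class Copy:
    def __init__(self):
        self.value = None
        self.up = True
        self.stale = False       # missed a write while its site was down
    @property
    def available(self):
        return self.up and not self.stale

def write(copies, value):
    if not any(c.available for c in copies):
        raise RuntimeError("no available copy")
    for c in copies:
        if c.available:
            c.value = value
        else:
            c.stale = True       # down copies miss the write and become stale

def read(copies):
    for c in copies:
        if c.available:          # skip down or out-of-date copies
            return c.value
    raise RuntimeError("no available copy")

a, b, c = Copy(), Copy(), Copy()
write([a, b, c], "v1")
b.up = False                     # clean, detectable site failure
write([a, b, c], "v2")
b.up = True                      # b recovers but is out-of-date
print(read([a, b, c]))           # v2 -- stale copy b is skipped until refreshed
```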