An important task in database integration is to resolve data conflicts at both the schema level and the semantic level; the latter is especially difficult. Some existing ontology-based approaches have been criticized for their lack of domain generality and semantic richness. With the aim of overcoming these limitations, this paper introduces a systematic approach for detecting and resolving various semantic conflicts in heterogeneous databases, comprising two important parts: a semantic conflict representation model based on our classification framework of semantic conflicts, and a methodology for detecting and resolving semantic conflicts based on this model. A system implementing the approach has been developed, and experimental evaluations indicate that the approach can resolve most semantic conflicts effectively while remaining independent of domains and integration patterns.
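The paper's own representation model is not reproduced in this listing; the following is a minimal, hypothetical sketch of how one common class of semantic conflict (a unit/scale mismatch between corresponding attributes) might be represented and resolved. All names here (SemanticConflictDemo, ScalingConflict, the attribute names) are invented for illustration.

```java
// Hypothetical sketch: one kind of semantic conflict (a unit/scale mismatch)
// represented together with the rule that resolves it.
import java.util.function.DoubleUnaryOperator;

public class SemanticConflictDemo {

    /** A detected conflict between a source attribute and a target attribute. */
    record ScalingConflict(String sourceAttr, String targetAttr,
                           DoubleUnaryOperator toTargetUnit) {
        double resolve(double sourceValue) {
            return toTargetUnit.applyAsDouble(sourceValue);
        }
    }

    public static void main(String[] args) {
        // Example: one source database stores weight in pounds,
        // while the integrated schema expects kilograms.
        ScalingConflict c = new ScalingConflict("shipment.weight_lb",
                                                "shipment.weight_kg",
                                                lb -> lb * 0.45359237);
        System.out.printf("%s -> %s: %.2f%n",
                c.sourceAttr(), c.targetAttr(), c.resolve(150.0));
    }
}
```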
The integration of distributed data sources is one of the main problems of engineering software. The data integration process for a heterogeneous legacy system is a key aspect of the development of a computerized system and of the integration of a design framework. In this research, our approach to data integration focuses on developing system-building techniques for efficient data integration queries. Keyword-based data searching is investigated and applied within a database for a design framework. A database table connector (DTC) wrapper program is implemented based on the use of data integration processes and keyword-based searching. The DTC provides data integration for various data resources from legacy programs and database management systems using SQL querying. The DTC enables designers and developers to rapidly and efficiently develop integration frameworks for different data resources. This paper also describes the implementation and deployment of the Certification and Aircraft Design integration System framework, which integrates various analysis and optimization codes, computer-aided design software and database management systems. Multiple data types are used within the framework, including databases, spreadsheets, flat files, XML files and personal data management. Several aircraft design and optimization problems are successfully solved using the developed framework.
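The abstract describes the DTC as a wrapper that turns keyword-based searches into SQL queries over JDBC-accessible sources. The sketch below illustrates that idea only; the class name DesignDataConnector, the table design_docs, and its columns are assumptions, not the paper's actual API.

```java
// A minimal sketch of a keyword-search wrapper over a JDBC data source,
// in the spirit of the database table connector (DTC) described above.
import java.sql.*;
import java.util.ArrayList;
import java.util.List;

public class DesignDataConnector {
    private final String jdbcUrl;

    public DesignDataConnector(String jdbcUrl) {
        this.jdbcUrl = jdbcUrl;
    }

    /** Keyword search over a hypothetical design-document table. */
    public List<String> searchByKeyword(String keyword) throws SQLException {
        String sql = "SELECT title FROM design_docs WHERE title LIKE ? OR body LIKE ?";
        List<String> hits = new ArrayList<>();
        try (Connection con = DriverManager.getConnection(jdbcUrl);
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, "%" + keyword + "%");
            ps.setString(2, "%" + keyword + "%");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    hits.add(rs.getString("title"));
                }
            }
        }
        return hits;
    }
}
```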
Resolving domain incompatibility among independently developed databases often involves uncertain information. DeMichiel [1] showed that uncertain information can be generated by the mapping of conflicting attributes to a common domain, based on some domain knowledge. In this paper, we show that uncertain information can also arise when the database integration process requires information that is not directly represented in the component databases but can be obtained through some summary of the data. We therefore propose an extended relational model based on the Dempster-Shafer theory of evidence [2] to incorporate such uncertain knowledge about the source databases. The extended relation uses evidence sets to represent uncertainty in information, which allow probabilities to be attached to subsets of possible domain values. We also develop a full set of extended relational operations over the extended relations. In particular, an extended union operation has been formalized to combine two extended relations using Dempster's rule of combination. The closure and boundedness properties of our proposed extended operations are formulated, and we illustrate the use of the extended operations with some query examples.
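Dempster's rule of combination, which the extended union operation relies on, merges two mass assignments over subsets of a domain and normalizes away the conflicting (empty-intersection) mass. The self-contained sketch below shows the standard rule on made-up evidence about an attribute's value; it is an illustration of the rule itself, not of the paper's extended relational operators.

```java
// Dempster's rule of combination over evidence sets: masses are assigned to
// subsets of a domain, pairwise intersections are taken, and mass falling on
// the empty set (conflict) is removed by normalization.
import java.util.*;

public class DempsterCombination {

    static Map<Set<String>, Double> combine(Map<Set<String>, Double> m1,
                                            Map<Set<String>, Double> m2) {
        Map<Set<String>, Double> combined = new HashMap<>();
        double conflict = 0.0;
        for (var e1 : m1.entrySet()) {
            for (var e2 : m2.entrySet()) {
                Set<String> inter = new HashSet<>(e1.getKey());
                inter.retainAll(e2.getKey());
                double mass = e1.getValue() * e2.getValue();
                if (inter.isEmpty()) {
                    conflict += mass;                 // mass assigned to the empty set
                } else {
                    combined.merge(inter, mass, Double::sum);
                }
            }
        }
        double norm = 1.0 - conflict;                 // Dempster's normalization factor
        combined.replaceAll((k, v) -> v / norm);
        return combined;
    }

    public static void main(String[] args) {
        // Two sources give (possibly set-valued) evidence about an employee's rank.
        Map<Set<String>, Double> m1 = Map.of(
                Set.of("senior"), 0.6,
                Set.of("senior", "junior"), 0.4);
        Map<Set<String>, Double> m2 = Map.of(
                Set.of("junior"), 0.3,
                Set.of("senior", "junior"), 0.7);
        System.out.println(combine(m1, m2));
    }
}
```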
The post-genomic era, beginning at the end of the Human Genome Project, has led to new needs and challenges in the management of clinical and -omics data. One of the main issues in this new context for biomedical data management is the integration of heterogeneous sources, enabling access to different, remote biological data sources and the interpretation and discovery of new knowledge. Many researchers and practitioners in a wide range of biomedical areas, such as those related to genomic and personalized medicine, have to access data located at numerous remote sources. Over the last decade, this new scientific context has stimulated research into developing new techniques for seamless web-based data integration and access. Some of the main challenges include integrating scattered, non-structured public databases, dealing with sensitive personal information, and managing image data. This paper presents a review of methods, techniques and tools for data integration.
The rapid expansion of biomedical knowledge, reduction in computing costs, and spread of internet access have created an ocean of electronic data. The decentralized nature of our scientific community and healthcare system, however, has resulted in a patchwork of diverse, or heterogeneous, database implementations, making access to and aggregation of data across databases very difficult. The database heterogeneity problem applies equally to clinical data describing individual patients and biological data characterizing our genome. Specifically, databases are highly heterogeneous with respect to the data models they employ, the data schemas they specify, the query languages they support, and the terminologies they recognize. Heterogeneous database systems attempt to unify disparate databases by providing uniform conceptual schemas that resolve representational heterogeneities, and by providing querying capabilities that aggregate and integrate distributed data. Research in this area has applied a variety of database and knowledge-based techniques, including semantic data modeling, ontology definition, query translation, query optimization, and terminology mapping. Existing systems have addressed heterogeneous database integration in the realms of molecular biology, hospital information systems, and application portability. (C) 2001 Elsevier Science (USA).
One of the central problems of database integration is schema matching, that is, the identification of similar data elements in two or more databases or other data sources. Existing definitions of "similarity" in this context vary greatly. As a result, schema matching has given rise to a large number of heuristic software tools. However, the empirical understanding of this process in humans is very limited, so that little guidance can be offered to the further development of heuristics and tools. This paper presents an exploratory process tracing study of the similarity judgement process in humans. The similarity judgements of 12 data integration professionals on a range of integration problems are recorded and analyzed. Implications for future empirical and applied research in this area are discussed.
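For contrast with the human judgements studied in this paper, the heuristic tools mentioned above typically score candidate attribute pairs by name similarity. The sketch below is a generic illustration of that kind of heuristic (a normalized Levenshtein distance over attribute names), not any specific tool and not the paper's method.

```java
// A simple name-based similarity heuristic for schema matching:
// 1.0 means identical attribute names, 0.0 means completely different.
public class NameSimilarity {

    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    /** Similarity in [0,1] based on edit distance over lower-cased names. */
    static double similarity(String a, String b) {
        a = a.toLowerCase();
        b = b.toLowerCase();
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    public static void main(String[] args) {
        System.out.printf("%.2f%n", similarity("cust_name", "CustomerName")); // moderate match
        System.out.printf("%.2f%n", similarity("zip", "postal_code"));        // weak match
    }
}
```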
A complete data integration solution can be viewed as an iterative process that consists of three phases, namely analysis, derivation and evolution. The entire process is similar to a software development process with the target application being the derivation rules for the integrated databases. In many cases, data integration requires several iterations of refining the local-to-global database mapping rules before a stable set of rules can be obtained. In particular, the mapping rules, as well as the data model and query model for the integrated databases, have to cope with poor data quality in local databases, ongoing local database updates and instance heterogeneities. In this paper, we therefore propose a new object-oriented global data model, known as OORA, that can accommodate attribute and relationship instance heterogeneities in the integrated databases. The OORA model has been designed to allow database integrators and end users to query both the local and resolved instance values using the same query language throughout the derivation and evolution phases of database integration. Coupled with the OORA model, we also define a set of local-to-global database mapping rules that can detect new heterogeneities among databases and resolve instance heterogeneities if the situation permits. (C) 2003 Elsevier B.V. All rights reserved.
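The key idea of querying both local and resolved instance values can be illustrated with a small container that keeps every local database's value alongside the resolved global value. This is a hedged sketch of that idea only, not the OORA model or its mapping rules; the class and database names below are invented.

```java
// A global attribute that retains conflicting local instance values and,
// once a resolution rule has fired, also exposes a single resolved value.
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class GlobalAttribute<T> {
    // local database name -> value reported by that database
    private final Map<String, T> localValues = new LinkedHashMap<>();
    private T resolvedValue;          // set once a resolution rule has been applied

    public void putLocal(String database, T value) {
        localValues.put(database, value);
    }

    public void resolve(T value) {
        this.resolvedValue = value;
    }

    public Map<String, T> locals() {
        return localValues;
    }

    public Optional<T> resolved() {
        return Optional.ofNullable(resolvedValue);
    }

    public static void main(String[] args) {
        GlobalAttribute<String> phone = new GlobalAttribute<>();
        phone.putLocal("hr_db", "555-0101");
        phone.putLocal("crm_db", "555-0199");
        // A resolution rule (e.g., "prefer the most recently updated source") picks one.
        phone.resolve("555-0199");
        System.out.println(phone.locals() + " -> " + phone.resolved());
    }
}
```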
Most research on attribute identification in database integration has focused on integrating attributes using schema and summary information derived from the attribute values. No research has attempted to fully explore the use of attribute values to perform attribute identification. We propose an attribute identification method that employs schema and summary instance information as well as properties of attributes derived from their instances. Unlike other attribute identification methods that match only single attributes, our method matches attribute groups for integration. Because our attribute identification method fully explores data instances, it can identify corresponding attributes to be integrated even when schema information is misleading. Three experiments were performed to validate our attribute identification method. In the first experiment, the heuristic rules derived for attribute classification were evaluated on 119 attributes from nine public domain data sets. The second was a controlled experiment validating the robustness of the proposed attribute identification method by introducing erroneous data. The third experiment evaluated the proposed attribute identification method on five data sets extracted from online music stores. The results demonstrated the viability of the proposed method.
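The use of instance-derived properties for attribute identification can be made concrete with a small profile of per-attribute statistics that is comparable across databases. The sketch below is an illustration of that general idea under invented statistics and weights; it is not the paper's heuristic rule set or experimental setup.

```java
// Instance-level profiling of an attribute for matching: average value length,
// distinct-value ratio and numeric-value ratio are computed from the instances
// and compared across candidate attributes.
import java.util.HashSet;
import java.util.List;

public class AttributeProfile {
    final double avgLength;
    final double distinctRatio;
    final double numericRatio;

    AttributeProfile(List<String> values) {
        int total = values.size();
        long numeric = values.stream().filter(v -> v.matches("-?\\d+(\\.\\d+)?")).count();
        avgLength = values.stream().mapToInt(String::length).average().orElse(0);
        distinctRatio = (double) new HashSet<>(values).size() / total;
        numericRatio = (double) numeric / total;
    }

    /** Crude distance between two profiles; smaller means more likely the same attribute. */
    double distance(AttributeProfile other) {
        return Math.abs(avgLength - other.avgLength) / 10.0
             + Math.abs(distinctRatio - other.distinctRatio)
             + Math.abs(numericRatio - other.numericRatio);
    }

    public static void main(String[] args) {
        AttributeProfile a = new AttributeProfile(List.of("12.99", "7.50", "3.25"));
        AttributeProfile b = new AttributeProfile(List.of("14.00", "9.99", "5.49"));
        AttributeProfile c = new AttributeProfile(List.of("Alice", "Bob", "Carol"));
        System.out.println("price vs price: " + a.distance(b));
        System.out.println("price vs name:  " + a.distance(c));
    }
}
```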
ISBN (print): 9783642239977
The paper presents a database integration method named MysqlInside, intended mainly for small and medium information systems. It describes the method's composition and structure, introduces key attributes and functions of its components, and gives example core code in Java. Building on the performance advantages of Mysql itself, MysqlInside is easy to use and is fully capable of meeting the database application requirements of small and medium information systems.
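MysqlInside's actual API is not shown in this listing. The sketch below only illustrates the kind of plain JDBC access to a local MySQL instance that such a component would wrap for a small information system; the connection URL, credentials and table name are placeholders.

```java
// Minimal JDBC access to a local MySQL database (placeholder URL, user and table).
import java.sql.*;

public class LocalMysqlAccess {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:mysql://localhost:3306/app_db";   // placeholder connection URL
        try (Connection con = DriverManager.getConnection(url, "app_user", "app_password");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM orders")) {
            if (rs.next()) {
                System.out.println("orders: " + rs.getLong(1));
            }
        }
    }
}
```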