The solar-terrestrial physics distributed database for the ICSU World Data Centers and the NCEP/NCAR climate reanalysis data have been integrated into standard Grid environments using the OGSA-DAI framework. A set of algorithms and software tools for distributed querying and mining of environmental archives, based on the concepts of the Unidata Common Data Model, has been developed. In addition, the toolkit enables querying the data using meaningful 'human linguistic' terms. Copyright (c) 2007 John Wiley & Sons, Ltd.
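The abstract does not say how 'human linguistic' terms are resolved against the data. One common technique is fuzzy set membership, and the sketch below assumes that approach: each term ("cold", "mild", "hot") is a hypothetical trapezoidal fuzzy set over temperature, and a query returns records whose membership degree meets a threshold. The `Observation` record, the vocabulary, and the breakpoints are illustrative assumptions, not the toolkit's implementation, and the Unidata Common Data Model and OGSA-DAI access layers are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

INF = float("inf")


@dataclass
class Observation:
    # Hypothetical flattened record; real archives would expose CDM variables.
    station: str
    temperature_c: float


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear in between.

    Infinite a/b or c/d give open 'shoulders' (e.g. everything below 0 is fully cold).
    """
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Hypothetical vocabulary: each linguistic term is a fuzzy set over temperature (deg C).
TERMS: Dict[str, Callable[[float], float]] = {
    "cold": lambda t: trapezoid(t, -INF, -INF, 0.0, 10.0),
    "mild": lambda t: trapezoid(t, 5.0, 12.0, 20.0, 25.0),
    "hot": lambda t: trapezoid(t, 22.0, 30.0, INF, INF),
}


def linguistic_query(
    observations: List[Observation], term: str, threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Return (station, membership degree) for records matching `term` at >= threshold."""
    mu = TERMS[term]
    scored = [(o.station, mu(o.temperature_c)) for o in observations]
    return [(station, degree) for station, degree in scored if degree >= threshold]


if __name__ == "__main__":
    obs = [Observation("A", 28.0), Observation("B", 15.0), Observation("C", 34.0)]
    print(linguistic_query(obs, "hot"))  # [('A', 0.75), ('C', 1.0)]
```

Returning the membership degree alongside each match lets a caller rank results by how strongly they fit the term, rather than treating the linguistic query as a crisp filter.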
Multidatabase systems are designed to achieve schema integration and data interoperation among distributed, heterogeneous database systems, but data model heterogeneity and schema heterogeneity make this a challenging task. First, an XML-based common data model for multidatabases, the XML-based Integration Data Model (XIDM), is introduced; it is suitable for integrating different types of schemas. Then an approach to schema mapping based on XIDM in multidatabase systems is presented. The mappings include global mappings, which handle horizontal and vertical partitioning between global schemas and export schemas, and local mappings, which handle the transformation between export schemas and local schemas. Finally, the illustration and implementation of schema mappings in the multidatabase prototype Panorama are discussed. The implementation results demonstrate that XIDM is an efficient model for managing multiple heterogeneous data sources and that the XIDM-based schema mapping approaches perform well when integrating relational databases, object-oriented databases, and file systems.
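The global mappings for horizontal and vertical partitioning can be illustrated with the classical reconstruction rules: a horizontally partitioned global relation is the union of its fragments, and a vertically partitioned one is the join of its fragments on a shared key. The minimal sketch below assumes rows are plain Python dicts rather than XIDM's XML representation; the function names and the sample export schemas are hypothetical and are not Panorama's implementation.

```python
from typing import Dict, List

Row = Dict[str, object]


def horizontal_union(*fragments: List[Row]) -> List[Row]:
    """Horizontal partitioning: each export schema holds some rows of the global
    relation with the same attributes, so the global view is their union
    (duplicates removed, first occurrence kept)."""
    result: List[Row] = []
    seen = set()
    for fragment in fragments:
        for row in fragment:
            # Attribute names are unique, so sorting items never compares values.
            signature = tuple(sorted(row.items()))
            if signature not in seen:
                seen.add(signature)
                result.append(dict(row))
    return result


def vertical_join(key: str, *fragments: List[Row]) -> List[Row]:
    """Vertical partitioning: each export schema holds some attributes plus the
    shared key, so the global view is the inner join of the fragments on that key."""
    if not fragments:
        return []
    # Keep only keys that appear in every fragment (inner-join semantics).
    common = set.intersection(*({row[key] for row in f} for f in fragments))
    merged: Dict[object, Row] = {}
    for fragment in fragments:
        for row in fragment:
            if row[key] in common:
                merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


if __name__ == "__main__":
    # Hypothetical export schemas: names split by site, salaries held separately.
    beijing = [{"id": 1, "name": "Wang"}]
    shanghai = [{"id": 2, "name": "Chen"}]
    pay = [{"id": 2, "salary": 5000}, {"id": 1, "salary": 6000}]
    names = horizontal_union(beijing, shanghai)
    print(vertical_join("id", names, pay))
```

The two mappings compose: a global schema can be vertically split first and each vertical fragment then horizontally split across sites, in which case the union is applied per fragment before the join.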
Background: The large amounts of data currently produced in the biological sciences can no longer be explored and visualized efficiently with traditional, specialized software. Instead, new capabilities are needed that offer flexibility and rapid application development, with deployment either as standalone applications or through the Web. Results: We describe a new software toolkit, the Molecular Biology Toolkit (MBT; http://***), that enables fast development of applications for protein analysis and visualization. The toolkit is written in Java, which gives it platform independence and Internet delivery capabilities. Several applications of the toolkit are introduced to illustrate the functionality that can be achieved. Conclusions: The MBT provides a well-organized assortment of core classes that provide a uniform data model for describing biological structures and automate the most common tasks in developing applications for the molecular sciences (data loading, derivation of typical structural information, and visualization of sequences and standard structural entities).
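To illustrate what a "uniform data model" for biological structures with derived structural information can look like, the sketch below defines a structure → chain → residue → atom hierarchy and derives a one-letter sequence and an atom count from it. MBT itself is written in Java and its class names are not given in the abstract; every class and method here is hypothetical and is not MBT's API.

```python
from dataclasses import dataclass, field
from typing import List

# Standard three-letter to one-letter codes for the 20 canonical amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


@dataclass
class Atom:
    name: str
    x: float
    y: float
    z: float


@dataclass
class Residue:
    name: str  # three-letter code, e.g. "GLY"
    number: int
    atoms: List[Atom] = field(default_factory=list)


@dataclass
class Chain:
    chain_id: str
    residues: List[Residue] = field(default_factory=list)

    def sequence(self) -> str:
        """Derive the one-letter sequence; non-standard residues map to 'X'."""
        return "".join(THREE_TO_ONE.get(r.name, "X") for r in self.residues)


@dataclass
class Structure:
    structure_id: str
    chains: List[Chain] = field(default_factory=list)

    def atom_count(self) -> int:
        """Total number of atoms across all chains and residues."""
        return sum(len(r.atoms) for c in self.chains for r in c.residues)


if __name__ == "__main__":
    chain = Chain("A", [
        Residue("MET", 1, [Atom("N", 0.0, 0.0, 0.0), Atom("CA", 1.5, 0.0, 0.0)]),
        Residue("GLY", 2, [Atom("CA", 3.8, 0.0, 0.0)]),
        Residue("HOH", 3),  # a water entry, not an amino acid
    ])
    structure = Structure("demo", [chain])
    print(chain.sequence(), structure.atom_count())
```

Because loaders, analyses, and viewers would all share one hierarchy, derived information such as the sequence is computed in one place instead of being re-parsed by each application.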
Background: We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure, for bioinformatics research and development. Description: The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs are available in three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications, which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene, and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end users flexible, easy, integrated access to these data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. Conclusion: The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological data sources, enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. First, Atlas stores data o...
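The loader/retrieval pattern described for Atlas (source-specific loaders writing into a relational model, and an API that hides the SQL behind retrieval calls) can be sketched with Python's built-in `sqlite3` module. Atlas's real schemas and its C++, Java, and Perl APIs are not described in the abstract, so the two tables, their columns, the function names, and the toy accessions below are all hypothetical.

```python
import sqlite3

# Hypothetical, heavily simplified schema; not Atlas's actual relational model.
SCHEMA = """
CREATE TABLE sequence (
    accession TEXT PRIMARY KEY,
    source    TEXT NOT NULL,      -- e.g. 'RefSeq', 'UniProt'
    taxon_id  INTEGER,
    residues  TEXT NOT NULL
);
CREATE TABLE interaction (
    accession_a TEXT NOT NULL REFERENCES sequence(accession),
    accession_b TEXT NOT NULL REFERENCES sequence(accession),
    source      TEXT NOT NULL,    -- e.g. 'BIND', 'DIP', 'MINT', 'IntAct'
    PRIMARY KEY (accession_a, accession_b, source)
);
"""


def create_warehouse(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def load_sequences(conn, source, records):
    """Loader: insert already-parsed (accession, taxon_id, residues) tuples from one source."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO sequence VALUES (?, ?, ?, ?)",
            [(acc, source, taxon, res) for acc, taxon, res in records],
        )


def load_interactions(conn, source, pairs):
    """Loader: insert (accession_a, accession_b) pairs reported by one interaction source."""
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO interaction VALUES (?, ?, ?)",
            [(a, b, source) for a, b in pairs],
        )


def interaction_partners(conn, accession):
    """Retrieval: (partner, source) pairs for `accession`, merged across all sources."""
    rows = conn.execute(
        """
        SELECT accession_b, source FROM interaction WHERE accession_a = ?
        UNION
        SELECT accession_a, source FROM interaction WHERE accession_b = ?
        ORDER BY 1, 2
        """,
        (accession, accession),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = create_warehouse()
    load_sequences(conn, "UniProt", [("P1", 9606, "MKT"), ("P2", 9606, "MGA"), ("P3", 10090, "MKV")])
    load_interactions(conn, "DIP", [("P1", "P2")])
    load_interactions(conn, "IntAct", [("P3", "P1"), ("P1", "P2")])
    print(interaction_partners(conn, "P1"))
```

Keeping the per-source `source` column lets a single retrieval call integrate interactions from several databases while still reporting where each one came from.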
ISBN (print): 0780378652
Knowledge discovery is the non-trivial extraction of implicit, previously unknown, and potentially useful information from data. After surveying current conceptual structures used to represent concepts embedded in data sources, we present a model of how concepts are structured within data sources. These techniques include Formal Concept Analysis (FCA), Conceptual Graphs (CG), and Structured Concepts (SC). By developing a hybrid conceptual structure, we intend to capture the key features of FCA, CG, and SC. At the end of this paper, we also present a system architecture for conceptual knowledge discovery.
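The abstract does not specify the hybrid structure, but its FCA component is well defined: given a context of objects, attributes, and an incidence relation, a formal concept is a pair (A, B) where B is exactly the set of attributes shared by all objects in A, and A is exactly the set of objects having all attributes in B. The minimal sketch below enumerates all formal concepts of a small context by closing the object intents under intersection. It is a naive, exponential-worst-case method suited only to small contexts, it covers FCA alone (not CG or SC), and the example context is hypothetical; it is not the authors' method.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Context = Dict[str, Set[str]]  # object -> the attributes it has (the incidence relation)
Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent A, intent B)


def extent(ctx: Context, attrs: FrozenSet[str]) -> FrozenSet[str]:
    """B': the objects that have every attribute in `attrs`."""
    return frozenset(g for g, ms in ctx.items() if attrs <= ms)


def formal_concepts(ctx: Context) -> List[Concept]:
    """Enumerate every formal concept (A, B) with A' = B and B' = A.

    Every intent is the intersection of some set of object intents (the empty
    intersection being the full attribute set M), so we close the object
    intents under intersection, one object at a time, then pair each intent
    with its extent.
    """
    all_attrs = frozenset().union(*ctx.values())
    intents = {all_attrs}
    for ms in ctx.values():
        # The comprehension is fully built before the in-place union is applied.
        intents |= {frozenset(ms) & b for b in intents}
    concepts = [(extent(ctx, b), b) for b in intents]
    # Order from the most specific concept (smallest extent) to the most general.
    return sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0])))


if __name__ == "__main__":
    ctx = {
        "duck": {"flies", "swims", "has_feathers"},
        "penguin": {"swims", "has_feathers"},
        "bat": {"flies"},
    }
    for a, b in formal_concepts(ctx):
        print(sorted(a), sorted(b))
```

The resulting concepts form the concept lattice that FCA uses to expose structure in the data; for this toy context there are four concepts, from ({duck}, all attributes) up to (all objects, no attributes).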