semi-structured data is a widely used text format for data interchange and storage. This paper proposes a robust watermarking scheme of data protection for semi-structured data, which uses JSON format as an example fo...
详细信息
semi-structured data is a widely used text format for data interchange and storage. This paper proposes a robust watermarking scheme of data protection for semi-structured data, which uses JSON format as an example for illustration. We first parse JSON file into a data structure of distinct pairs. Afterwards, we generate a transfer matrix to get the intermediate sequences, which are then encoded using error-correction codes and embedded into the pairs. A private key is shared by the data hider and the recipient to resist collusion attacks. On the recipient's side, data extraction can be successfully carried out even the received stego data are tampered. The imperceptibility is realized by embedding data into the less significant digits of numeric data in the cover file. The proposed scheme can be extended on several other formats. The experimental results show that the proposed scheme is robust to various kinds of typical attacks such as contextual truncating, modification, and redundancy addition.
A semistructureddata extraction method to get the useful information embedded in a group of relevant web pages and store it with OEM(Object Exchange Model) is proposed. Then, the data mining method is adopted to dis...
详细信息
A semistructureddata extraction method to get the useful information embedded in a group of relevant web pages and store it with OEM(Object Exchange Model) is proposed. Then, the data mining method is adopted to discover schema knowledge implicit in the semistructureddata. This knowledge can make users understand the information structure on the web more deeply and thourouly. At the same time, it can also provide a kind of effective schema for the querying of web information.
New business applications require flexibility in data model structure and must support the next generation of web applications and handle complex data types. The performance of processing structureddata through a rel...
详细信息
ISBN:
(纸本)9789897583773
New business applications require flexibility in data model structure and must support the next generation of web applications and handle complex data types. The performance of processing structureddata through a relational database has become incompatible with big data challenges. Nowadays, there is a need to deal with semi-structured data with a flexible schema for different applications. Not only SQL (NoSQL) has been presented to overcome the limitations of relational databases in terms of scale, performance, data model, and distribution system. Also, NoSQL supports semi-structured data and can handle a huge amount of data and provide flexibility in the data schema. But the data models of NoSQL systems are very complex, as there are no tools available to represent a scheme for NoSQL databases. In addition, there is no standard schema for data modelling of document-oriented databases. This study proposes a semi-structured data model for big data (SS-DMBD) that is compatible with a document-oriented database, and also proposes an algorithm for mapping the entity relationship (ER) model to SS-DMBD. A case study is used to evaluate the SS-DMBD and its features. The results show that this model can address most features of semi- structureddata.
The principles of functioning of the semi-structured data dynamic integration Mashup system have been described. The modern technologies of Mashup data integration realization have been considered. The basic processes...
详细信息
ISBN:
(纸本)9781509027392
The principles of functioning of the semi-structured data dynamic integration Mashup system have been described. The modern technologies of Mashup data integration realization have been considered. The basic processes of information resources processing into Mashup system have been characterized.
The data produced by various services should be stored and managed in an appropriate format for gaining valuable knowledge conveniently. This leads to the emergence of various data models, including relational, semi-s...
详细信息
The data produced by various services should be stored and managed in an appropriate format for gaining valuable knowledge conveniently. This leads to the emergence of various data models, including relational, semi-structured, and graph models, and so on. Considering the fact that the mature relational databases established on relational data models are still predominant in today's market, it has fueled interest in storing and processing semi-structured data and graph data in relational databases so that mature and powerful relational databases' capabilities can all be applied to these various data. In this survey, we review existing methods on mapping semi-structured data and graph data into relational tables, analyze their major features, and give a detailed classification of those methods. We also summarize the merits and demerits of each method, introduce open research challenges, and present future research directions. With this comprehensive investigation of existing methods and open problems, we hope this survey can motivate new mapping approaches through drawing lessons from eachmodel's mapping strategies, aswell as a newresearch topic - mapping multi-model data into relational tables.
OLAP cubes enable aggregation-centric analysis of transactional data by shaping data records into measurable facts with dimensional characteristics. A multidimensional view is obtained from the available data fields a...
详细信息
OLAP cubes enable aggregation-centric analysis of transactional data by shaping data records into measurable facts with dimensional characteristics. A multidimensional view is obtained from the available data fields and explicit relationships between them. This classical modeling approach is not feasible for scenarios dealing with semi-structured or poorly structureddata. We propose to the data warehouse design methodology with a content-driven discovery of measures and dimensions in the original dataset. Our approach is based on introducing a data enrichment layer responsible for detecting new structural elements in the data using data mining and other techniques. Discovered elements can be of type measure, dimension, or hierarchy level and may represent static or even dynamic properties of the data. This paper focuses on the challenge of generating, maintaining, and querying discovered elements in OLAP cubes. We demonstrate the power of our approach by providing OLAP to the public stream of user-generated content on the Twitter platform. We have been able to enrich the original set with dynamic characteristics, such as user activity, popularity, messaging behavior, as well as to classify messages by topic, impact, origin, method of generation, etc. Knowledge discovery techniques coupled with human expertise enable structural enrichment of the original data beyond the scope of the existing methods for obtaining multidimensional models from relational or semi-structured data. (C) 2013 Elsevier Ltd. All rights reserved.
The global food and agricultural industry has a total market value of USD 8 trillion in 2016, and decision makers in the Agri sector require appropriate tools and up-to-date information to make predictions across a ra...
详细信息
The global food and agricultural industry has a total market value of USD 8 trillion in 2016, and decision makers in the Agri sector require appropriate tools and up-to-date information to make predictions across a range of products and areas. Traditionally, these requirements are met with information processed into a data warehouse and data marts constructed for analyses. Increasingly however, data are coming from outside the enterprise and often in unprocessed forms. As these sources are outside the control of companies, they are prone to change and new sources may appear. In these cases, the process of accommodating these sources can be costly and very time consuming. To automate this process, what is required is a sufficiently robust extract-transform-load process;external sources are mapped to some form of ontology, and an integration process to merge the specific data sources. In this paper, we present an approach to automating the integration of data sources in an Agri environment, where new sources are examined before an attempt to merge them with existing data marts. Our validation uses a case study of real world Agri data to demonstrate the robustness of our approach and the efficiency of materializing data marts.
Organisations use data in different formats: Word documents, Excel spreadsheets, databases, HTML pages and so on. It is not easy to make decisions with such data due to the lack of integration between the different so...
详细信息
Organisations use data in different formats: Word documents, Excel spreadsheets, databases, HTML pages and so on. It is not easy to make decisions with such data due to the lack of integration between the different sources and built-in decision-making rules. Decisions can be reached with knowledge bases, which, unlike databases, make it possible to store not only objects, facts and attributes but also more sophisticated patterns such as rules and axioms. The article proposes an ontology-based method for knowledge base creation that allows for the simultaneous integration of semi-structured data sources and extendibility while remaining context independent. At the initial steps of the method, data specification should be performed with the data Sources Ontology developed by the authors. This ontology provides data structure description that forms supportive knowledge graph. The graph's schema should be mapped with the domain ontology to be populated. Finally, the data are inserted into the domain ontology according to the mapping rules. Manual input is needed during data specification and data-to-ontology schema mapping.
XML and other semi-structured data can be represented by a graph model. The paths in a data graph are used as a basic constructor of a query. Especially, by using patterns on paths, a user can formulate more expressiv...
详细信息
XML and other semi-structured data can be represented by a graph model. The paths in a data graph are used as a basic constructor of a query. Especially, by using patterns on paths, a user can formulate more expressive queries. Patterns in a path enlarge the search space of a data graph and current research for indexing semi-structured data focuses on reducing the search space. However, the existing indexes cannot reduce the search space when a data graph has some references. In this paper, we introduce a partitioning technique for all paths in a data graph and an index graph which can effectively find appropriate path partitions for a path query with patterns. (C) 2004 Elsevier B.V. All rights reserved.
Distributed Hash Tables (DHTs) are widely used for indexing and locating many types of resources, including semi-structured data modeled as XML documents. A common distributed strategy to process an XML query over a D...
详细信息
Distributed Hash Tables (DHTs) are widely used for indexing and locating many types of resources, including semi-structured data modeled as XML documents. A common distributed strategy to process an XML query over a DHT consists in splitting it into a set of simple path queries, and resolving each of them separately. The traffic generated by this strategy grows with the number of paths in the query. To overcome this drawback, an alternative strategy consists in resolving only the sub-query associated with the most selective path, and then submitting the original query to the nodes in the result set. A first goal of this paper is to provide an analytical and experimental study of the two strategies to assess their relative merits in different scenarios. On the basis of this study, we introduce an Adaptive Path Selection (APS) search technique that resolves an XML query in a distributed way by querying either the most selective path or the whole path set, based on the selectivity of the paths in the query. The effective use of APS requires that the querying nodes know in advance the selectivity of all the paths. Addressing this problem is another goal of the paper, which is achieved through: (i) The definition of a space-efficient data structure, the Path Selectivity Table (PST), which given any path, returns an estimate of its selectivity. (ii) The definition of an efficient strategy that builds the PST in a distributed way and propagates it to all nodes in the network with logarithmic performance bounds and without redundant messages. Experimental results show that the PST accurately estimates the path selectivity values, and that the traffic generated by the APS algorithm using PST-estimated selectivity values is comparable to that produced by APS assuming to know the real path selectivity values. (C) 2016 Elsevier Inc. All rights reserved.
暂无评论