Search Results - Inner Mongolia University Library

WebDB: a system for querying semi-structured data on the Web

JOURNAL OF VISUAL LANGUAGES AND COMPUTING, 2002, Vol. 13, No. 1, pp. 3-33

Authors: Li, WS; Shim, J; Candan, KS (NEC USA C&C Res Labs San Jose CA 95134 USA)

The World-Wide Web can be viewed as a collection of semi-structured multimedia documents in the form of Web pages connected through hyperlinks. Unlike most web search engines, which primarily focus on information retrieval functionality, WebDB aims at supporting a comprehensive database-like query functionality, including selection, aggregation, sorting, summary, grouping, and projection. WebDB allows users to access (1) document-level information, such as title, URL, length, keywords, types, and last modified date; (2) intra-document structures, such as tables, forms, and images; and (3) inter-document linkage information, such as destination URLs and anchors. With these three types of information, comprehensive queries for complex Web-based applications, such as Web mining and Web site management, can be answered. WebDB is based on object-relational concepts: object-oriented modeling and a relational query language. In this paper, we present the data model, language, and implementation of WebDB. We also present the novel visual query/browsing interface for the semi-structured Web and Web documents. Our system provides high usability compared with other existing systems. (C) 2002 Elsevier Science Ltd. All rights reserved.

Keywords: WWW; semi-structured data; Web database; Web query language; SQL3; object-relational DBMS; visual user interface
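The kind of database-like query the abstract above describes (selection, grouping, aggregation, and sorting over document-level and link information) can be illustrated with a minimal sketch. It uses Python's standard `sqlite3` module rather than WebDB itself; the `document` and `link` tables, their columns, and the example URLs are hypothetical and are not taken from the paper.

```python
import sqlite3

# Hypothetical schema (an assumption, not WebDB's data model):
# one row per Web page and one row per hyperlink between pages.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE document (url TEXT PRIMARY KEY, title TEXT, length INTEGER);
    CREATE TABLE link (src TEXT, dst TEXT, anchor TEXT);
""")
conn.executemany("INSERT INTO document VALUES (?, ?, ?)", [
    ("http://a.example/", "Home", 1200),
    ("http://a.example/docs", "Docs", 5400),
    ("http://b.example/", "Partner", 800),
])
conn.executemany("INSERT INTO link VALUES (?, ?, ?)", [
    ("http://a.example/", "http://a.example/docs", "Documentation"),
    ("http://a.example/", "http://b.example/", "Partner site"),
    ("http://a.example/docs", "http://b.example/", "See also"),
])


def outgoing_link_counts(connection, min_length):
    """Count outgoing links per page, keeping only pages of at least min_length."""
    rows = connection.execute(
        """
        SELECT d.title, COUNT(l.dst) AS n_links   -- aggregation
        FROM document AS d
        LEFT JOIN link AS l ON l.src = d.url      -- inter-document linkage
        WHERE d.length >= ?                       -- selection
        GROUP BY d.url                            -- grouping
        ORDER BY n_links DESC, d.title            -- sorting
        """,
        (min_length,),
    )
    return rows.fetchall()


result = outgoing_link_counts(conn, 1000)
```

With the sample rows above, only the two pages of length 1000 or more are kept, ordered by their number of outgoing links.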

Information and Analytical Support of the Authorities Using Semi-structured Data

9th International Conference on Theory and Practice of Electronic Governance (ICEGOV)

Authors: Mikhaylova, Ekaterina; Mityagin, Sergey; Tikhonova, Olga; Zakharov, Yuriy (ITMO Univ Kronverkskiy 49 St Petersburg Russia; St Petersburg Informat & Analyt Ctr Chernyakhovskogo 59 St Petersburg Russia)

ISBN: (print) 9781450336406

Information and analytical support of the authorities is one of the most relevant subjects in the development of decision support systems. This article describes the current process of information-analytical support of the authorities, using the example of St. Petersburg, Russia, including the analysis of social networks, as well as an analysis of the existing approaches to information and analytical support and the problems that can be solved by using semi-structured data.

Keywords: e-Government; Information-analytical Support; semi-structured data; Social Media; Decision-support System

Concurrent Processing of Increments in Online Integration of Semi-structured Data

3rd International Conference on Information and Communication Technology (ICoICT)

Authors: Handoko; Getta, Janusz R. (Univ Wollongong Sch Comp Sci & Software Engn Wollongong NSW 2522 Australia)

ISBN: (print) 9781479977529

An online integration system enables incremental computation shortly after increment data arrives at the central site. Processing increments serially ensures that all data containers are in their updated states before the next increment is computed. In general, a data container may appear as several arguments in a data integration expression. Serial processing of increments at such a data container fails to achieve its best performance because of the expensive I/O costs of materialization updates. This paper proposes an online integration system with dynamic scheduling to enable concurrent processing of data increments. The online integration system transforms a data integration expression, through a series of steps, into a single increment expression over the increments of multiple data containers, and generates a data integration plan. The dynamic scheduling system employs a monitoring system and priority scheduling, which can dynamically change the data integration plans according to the behavior of the increment data.

Keywords: data integration; dynamic scheduling; distributed database; semi-structured data

A Semi-structured Data Classification Model with Integrating Tag Sequence and Ngram

26th International Conference on Database Systems for Advanced Applications (DASFAA)

Authors: Zhang, Lijun; Li, Ning; Pan, Wei; Li, Zhanhuai (Northwestern Polytech Univ Sch Comp Sci Xian 710072 Peoples R China; Northwestern Polytech Univ Key Lab Big Data Storage & Management Minist Ind & Informat Technol Xian 710072 Peoples R China)

ISBN: (print) 9783030731960; 9783030731977

Many collaboratively built resources, such as Wikipedia, Weibo, and Quora, exist in the form of semi-structured data, and semi-structured data classification plays an important role in many data analysis applications. In addition to content information, semi-structured data also contain structural information. Thus, combining the structure and content features is a crucial issue in semi-structured data classification. In this paper, we propose a supervised semi-structured data classification approach that utilizes both the structural and content information. In this approach, generalized tag sequences are extracted from the structural information, and nGrams are extracted from the content information. Then the tag sequences and nGrams are combined into features called TSGram according to their link relation, and each semi-structured document is represented as a vector of TSGram features. Based on the TSGram features, a classification model is devised to improve the performance of semi-structured data classification. Because TSGram features retain the association between the structural and content information, they are helpful in improving the classification performance. Our experimental results on two real datasets show that the proposed approach is effective.

Keywords: semi-structured data; semi-structured data classification; XML document classification; TSGram feature; Tag sequence
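The idea of pairing structural and content features, as described in the abstract above, can be illustrated with a simplified sketch. It is not the paper's TSGram definition: here the "tag sequence" is assumed to be the plain root-to-element tag path (no generalization step), the n-grams are word bigrams, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of a text as space-joined strings."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def tsgram_like_features(xml_text, n=2):
    """Count (tag path, n-gram) pairs for every element that carries text.

    Simplifications (assumptions): the tag path stands in for the paper's
    generalized tag sequence, and tail text after child elements is ignored.
    """
    root = ET.fromstring(xml_text)
    features = Counter()

    def walk(node, path):
        path = path + (node.tag,)
        if node.text and node.text.strip():
            for gram in word_ngrams(node.text, n):
                # Keeping the pair together preserves the link between
                # where the text occurs and what it says.
                features[("/".join(path), gram)] += 1
        for child in node:
            walk(child, path)

    walk(root, ())
    return features


doc = (
    "<article><title>fast json analytics</title>"
    "<body>json analytics at scale</body></article>"
)
features = tsgram_like_features(doc)
```

In this sketch the bigram "json analytics" yields two distinct features, one under `article/title` and one under `article/body`, which is the association a content-only n-gram model would lose. The resulting counter could then be turned into a feature vector for any standard classifier.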

Multilevel Data Storage Model of Fuzzy Semi-Structured Data

18th International Conference on Soft Computing and Measurement (SCM)

Authors: Yants, V. I.; Chernov, A. V.; Butakova, M. A.; Klimanskaya, E. V. (Rostov State Univ Civil Engn Rostov Na Donu Russia)

ISBN: (print) 9781467369619

The aim is to develop a new multi-level model for storing fuzzy semi-structured data. The proposed model differs from known models in its use of extended polybasic intuitionistic sets to describe fuzzy data and in its representation of fuzzy attributes on three typing levels. The main result is a formal description of fuzzy semi-structured data storage. To verify the result, an example is shown of applying the developed data storage model in an intelligent diagnostic decision-making system for railway transport.

Keywords: semi-structured data; data model; data storage of fuzzy data

JSON Tiles: Fast Analytics on Semi-Structured Data

ACM SIGMOD International Conference on Management of Data (SIGMOD)

Authors: Durner, Dominik; Leis, Viktor; Neumann, Thomas (Tech Univ Munich Munich Germany; Friedrich Schiller Univ Jena Jena Germany)

ISBN: (print) 9781450383431

Developers often prefer flexibility over upfront schema design, making semi-structured data formats such as JSON increasingly popular. Large amounts of JSON data are therefore stored and analyzed by relational database systems. In existing systems, however, JSON's lack of a fixed schema results in slow analytics. In this paper, we present JSON tiles, which, without losing the flexibility of JSON, enables relational systems to perform analytics on JSON data at native speed. JSON tiles automatically detects the most important keys and extracts them transparently - often achieving scan performance similar to columnar storage. At the same time, JSON tiles is capable of handling heterogeneous and changing data. Furthermore, we automatically collect statistics that enable the query optimizer to find good execution plans. Our experimental evaluation compares against state-of-the-art systems and research proposals and shows that our approach is both robust and efficient.

Keywords: semi-structured data; JSON; JSONB; Storage; Analytics; OLAP; Scan
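The abstract above describes detecting the most important keys and extracting them so that scans resemble columnar storage. A minimal sketch of that general idea follows; it is not the JSON tiles algorithm. The function name `build_tile`, the frequency threshold `min_fraction`, and the restriction to top-level keys of JSON objects are all assumptions made for illustration.

```python
import json
from collections import Counter


def build_tile(json_lines, min_fraction=0.6):
    """Split a chunk of JSON objects into extracted columns plus a remainder.

    Keys present in at least min_fraction of the documents are stored as
    columns (one list per key, None where a document lacks the key);
    all other keys stay in a per-document remainder dictionary.
    """
    docs = [json.loads(line) for line in json_lines]
    key_counts = Counter(key for doc in docs for key in doc)
    frequent = sorted(
        key for key, count in key_counts.items()
        if count >= min_fraction * len(docs)
    )
    columns = {key: [doc.get(key) for doc in docs] for key in frequent}
    remainder = [
        {key: value for key, value in doc.items() if key not in columns}
        for doc in docs
    ]
    return columns, remainder


lines = [
    '{"id": 1, "user": "ann", "lang": "en"}',
    '{"id": 2, "user": "bob"}',
    '{"id": 3, "user": "cy", "geo": [1, 2]}',
]
columns, remainder = build_tile(lines)
```

A scan over `columns["id"]` touches only one list, while rarely used keys such as `lang` and `geo` remain available in `remainder`, so heterogeneous documents are still represented without a fixed schema.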

Metamodels and Category Theory in the Transformation of Semi-structured Data

8th International Conference in Software Engineering Research and Innovation (CONISOFT)

Authors: Canton-Croda, Rosa-Maria; Gibaja-Romero, Damian-Emilio (UPAEP Univ Dept Engn Puebla Mexico; UPAEP Univ Dept Math Puebla Mexico)

ISBN: (print) 9781728184500

Models' transformations involve code abstraction and program description, i.e., models' transformations (MTs) operate in a more diverse set of artifacts than program transformation. Note that MTs allow programmers to link different structures, as Category Theory does for Mathematics, through recognizing similar features and properties. In this paper, we show that Category Theory can be used to describe MTs. Specifically, we propose a categorical framework for transforming a semi-structured data model, an OEM database, into a model of structured data, in UML language. This categorical approach allows us to establish a bridge between such models and the categories of simple and directed graphs, which makes it possible to apply the features of such categories to manage databases.

Keywords: semi-structured data; Metamodel; Abstract Algebra; Category Theory

Automatic classification and taxonomy generation for semi-structured data

IEEE International Conference on Computer and Information

Authors: Nunes, Bernardo Pereira; Lopes, Giseli Rabello; Casanova, Marco Antonio (Univ Fed Estado Rio de Janeiro PUC Rio Dept Informat Dept Appl Informat Rio de Janeiro RJ Brazil; Fed Univ Rio de Janeiro UFRJ Rio De Janeiro RJ Brazil; Pontificia Univ Catolica Rio de Janeiro Dept Informat Rio de Janeiro RJ Brazil)

ISBN: (print) 9781509001545

The problem of data classification goes back to the definition of taxonomies covering knowledge areas. With the advent of the Web, the amount of data available increased several orders of magnitude, making manual data classification impossible. This work presents an approach based on the prototype theory to automatically classify semi-structured data, represented by frames, without any previous knowledge about structured classes. Our approach uses a variation of the K-Means algorithm that organizes a set of frames into classes, structured as a strict hierarchy.

Keywords: automatic classification; clustering; k-means; semi-structured data; taxonomy generator

Context-Aware Duplicate Detection in Semi-structured Data Streams

IEEE World Congress on Services (SERVICES)

Authors: Shukla, Parijat; Somani, Arun K. (Iowa State Univ Dept Elect & Comp Engn Ames IA 50011 USA)

ISBN: (print) 9781479950690

The state of the art in duplicate detection in semi-structured data obtains significant improvement by exploiting schema-related knowledge. Such schema-bound duplicate detection approaches, however, have severe limitations when dealing with multi-sourced, heterogeneous, high-velocity data streams. In this paper, we propose a novel context-aware duplicate detection system which is workload- and complexity-aware, and is adaptable to the underlying computing platform. The system operates in a schema-oblivious manner and relies upon an information-theory-based heuristic and a data shaping technique for efficient and scalable duplicate detection in multi-sourced, heterogeneous data sets. Experiments with real-world data sets show a speedup of up to 8X over state-of-the-art schemes, while maintaining up to 92 percent accuracy. In addition, our data shaping technique for GPGPU processing speeds up the duplicate detection throughput by up to two orders of magnitude.

Keywords: data streams; duplicate detection; semi-structured data; novel architectures; GPUs; data shaping

Scalable Processing of Contemporary Semi-Structured Data on Commodity Parallel Processors - A Compilation-based Approach

24th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)

Authors: Jiang, Lin; Sun, Xiaofan; Farooq, Umar; Zhao, Zhijia (Univ Calif Riverside Riverside CA 92521 USA)

ISBN: (print) 9781450362405

JSON (JavaScript Object Notation) and its derivatives are essential in the modern computing infrastructure. However, existing software often fails to process such types of data in a scalable way, mainly for two reasons: (i) the processing often requires building a memory-consuming parse tree; (ii) there exist inherent dependences in processing the data stream, preventing any data-level parallelization. Facing these challenges, developers often have to construct ad-hoc pre-parsers to split the data stream in order to reduce the memory consumption and increase the data parallelism. However, this strategy requires more programming effort. Moreover, the pre-parsing itself is non-trivial to parallelize, thus introducing a new serial bottleneck. To solve the dilemma, this work introduces a scalable yet fully automatic solution - a compilation system, namely JPStream, that compiles standard JSONPath queries into parallel executables with bounded memory footprints. First, JPStream adopts a stream processing design that combines the querying and parsing into one pass, without generating any in-memory parse tree. To achieve this, JPStream uses a novel joint compilation technique that compiles the queries and the JSON syntax together into a single automaton. Furthermore, JPStream leverages the "enumerability" of the automaton to break the dependences and reasons about the transition rules to prune infeasible cases. It also features a module that learns data constraints from the input data to enhance the pruning. Evaluation on real-world JSON datasets with standard JSONPath queries shows that JPStream can reduce the memory consumption significantly, by up to 95%, while achieving near-linear speedup on multicore and manycore processors.

Keywords: JSON; semi-structured data; querying; parsing; pushdown automata; parallelization; multicore
