Search results - Inner Mongolia University Library

Generating custom code for efficient query execution on heterogeneous processors

VLDB JOURNAL, 2018, vol. 27, no. 6, pp. 797-822

Authors: Bress, Sebastian; Koecher, Bastian; Funke, Henning; Zeuch, Steffen; Rabl, Tilmann; Markl, Volker. Affiliations: DFKI GmbH, Berlin, Germany; TU Berlin, Berlin, Germany; TU Dortmund, Dortmund, Germany

Processor manufacturers build increasingly specialized processors to mitigate the effects of the power wall in order to deliver improved performance. Currently, database engines have to be manually optimized for each processor, which is a costly and error-prone process. In this paper, we propose concepts to adapt to and to exploit the performance enhancements of modern processors automatically. Our core idea is to create processor-specific code variants and to learn a well-performing code variant for each processor. These code variants leverage various parallelization strategies and apply both generic and processor-specific code transformations. Our experimental results show that the performance of code variants may diverge by up to two orders of magnitude. In order to achieve peak performance, we generate custom code for each processor. We show that our approach finds an efficient custom code variant for multi-core CPUs, GPUs, and MICs.

Keywords: database systems; database query processing; query compilation; Heterogeneous processors; CPU; GPU; MIC; Code generation; Code variants; Variant optimization

Data Lab-A community science platform

ASTRONOMY AND COMPUTING, 2020, vol. 33

Authors: Nikutta, R.; Fitzpatrick, M.; Scott, A.; Weaver, B. A. Affiliation: NSFs Natl Opt Infrared Astron Res Lab, 950 N Cherry Ave, Tucson, AZ 85719, USA

Data Lab is an open-access science platform developed and operated by the Community and Science Data Center (CSDC) at NSF's National Optical-Infrared Astronomy Research Laboratory (NOIRLab). It serves public photometric survey datasets, provides interactive and programmatic data access, and SQL/ADQL query capabilities via TAP. Users also receive generous storage allocations with VOSpace and MyDB, co-located with our data holdings. A host of services such as cross-matching, image cutouts via SIA, file services for survey data, and a Jupyter notebook interface for analysis close to the data complement the mission statement. Launched in 2017 at the National Optical Astronomy Observatory, Data Lab supports a base of over 1,300 registered users, processes on average 15,000 queries daily, serves over 50 TB of photometric catalogs, and provides access to over 2 PB of survey image products at NOIRLab's Science Data Archive. Future development will include support for massive spectroscopic datasets and for processing of alert streams generated by e.g. ZTF and LSST. Users will also be able to create and administrate ad hoc user groups for shared data access and scientific analysis, and will enjoy containerized services and notebook spaces. (C) 2020 The Author(s). Published by Elsevier B.V.

Keywords: Surveys; Catalogs; Astronomical databases; Data analysis; Computing platforms; database query processing

Probe minimization by schedule optimization: Supporting top-k queries with expensive predicates

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2007, vol. 19, no. 5, pp. 646-662

Authors: Hwang, Seung-won; Chang, Kevin Chen-Chuan. Affiliations: Pohang Univ Sci & Technol, Dept Comp Sci & Engn, Pohang 790784, South Korea; Univ Illinois, Dept Comp Sci, Urbana, IL 61801, USA

This paper addresses the problem of evaluating ranked top-k queries with expensive predicates. As major DBMSs now all support expensive user-defined predicates for Boolean queries, we believe such support for ranked queries will be even more important: First, ranked queries often need to model user-specific concepts of preference, relevance, or similarity, which call for dynamic user-defined functions. Second, middleware systems must incorporate external predicates for integrating autonomous sources typically accessible only by per-object queries. Third, ranked queries often accompany Boolean ranking conditions, which may turn predicates into expensive ones, as the index structure on the predicate built on the base table may be no longer effective in retrieving the filtered objects in order. Fourth, fuzzy joins are inherently expensive, as they are essentially user-defined operations that dynamically associate multiple relations. These predicates, being dynamically defined or externally accessed, cannot rely on index mechanisms to provide zero-time sorted output, and must instead require per-object probe to evaluate. To enable probe minimization, we develop the problem as cost-based optimization of searching over potential probe schedules. In particular, we decouple probe scheduling into object and predicate scheduling problems and develop an analytical object scheduling optimization and a dynamic predicate scheduling optimization, which combined together form a cost-effective probe schedule.

Keywords: database query processing; distributed information systems; database systems

A graphical user-interface for knowledge discovery in databases

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 1996, vol. 9, no. 6, pp. 691-700

Authors: Wu, X; Cercone, N. Affiliation: UNIV REGINA, REGINA, SK S4S 0A2, CANADA

This paper describes a graphical user-interface for database-oriented knowledge discovery systems, DBLEARN, which has been developed for extracting knowledge rules from relational databases. The interface, designed using a query-by-example approach, provides a graphical means of specifying knowledge-discovery tasks. The interface supplies a graphical browsing facility to help users to perceive the nature of the target database structure. In order to guide users' task specification, a cooperative, menu-based guidance facility has been integrated into the interface. The interface also supplies a graphical interactive adjusting facility for helping users to refine the task specification to improve the quality of learned knowledge rules. Copyright (C) 1996 Elsevier Science Ltd

Keywords: graphical user-interfaces; knowledge discovery systems; database mining; database query processing; AI applications

Fusion Coding of Correlated Sources for Storage and Selective Retrieval

IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2010, vol. 58, no. 3, pp. 1722-1731

Authors: Ramaswamy, Sharadh; Nayak, Jayant; Rose, Kenneth. Affiliations: Mayachitra Inc, Santa Barbara, CA 93111, USA; Univ Calif Santa Barbara, Dept Elect & Comp Engn, Santa Barbara, CA 93106, USA

We focus on a new, potentially important application of source coding directed toward storage and retrieval, termed fusion coding of correlated sources. The task at hand is to efficiently store multiple correlated sources in a database so that, at any point of time in the future, data from a selective subset of sources specified by user can be efficiently retrieved. Only statistical information about future queries is available in advance. A typical application scenario would be in storage of correlated data generated by dense sensor networks, where information from specific regions is requested in the future. We propose a fusion coder (FC) for lossy storage and retrieval, wherein different queries are handled by allowing for selective (compressed) bit retrieval. We derive the properties of an optimal FC and present an iterative algorithm for its design. Since iterative design is initialization-dependent, we present initialization heuristics that help avoid poor local optima. An analysis of design complexity reveals complexity growth with query-set size. We first tackle this problem by exploiting optimality properties of FCs. We also consider quantization of the query-space with decision trees in order to adapt to new queries, unseen during FC design. Experiments conducted on real and synthetic data-sets demonstrate that the proposed FC is able to achieve significantly better tradeoffs than joint compression by vector quantization (VQ), with retrieval speedups reaching 3x and distortion gains of up to 3.5 dB possible.

Keywords: database query processing; multisensor systems; source coding; vector quantization (VQ)

Fast in-database cross-matching of high-cadence, high-density source lists with an up-to-date sky model

ASTRONOMY AND COMPUTING, 2018, vol. 23, pp. 27-39

Authors: Scheers, B.; Bloemen, S.; Muhleisen, H.; Schellart, P.; van Elteren, A.; Kersten, M.; Groot, P. J. Affiliations: CWI, POB 94079, NL-1090 GB Amsterdam, Netherlands; Radboud Univ Nijmegen, IMAPP, Dept Astrophys, NL-6500 GL Nijmegen, Netherlands; NOVA Opt InfraRed Instrumentat Grp, Oude Hoogeveensedijk 4, NL-7991 PD Dwingeloo, Netherlands; Princeton Univ, Dept Astrophys Sci, Princeton, NJ 08544, USA; Leiden Univ, Leiden Observ, POB 9513, NL-2300 RA Leiden, Netherlands

Coming high-cadence wide-field optical telescopes will image hundreds of thousands of sources per minute. Besides inspecting the near real-time data streams for transient and variability events, the accumulated data archive is a wealthy laboratory for making complementary scientific discoveries. The goal of this work is to optimise column-oriented database techniques to enable the construction of a full-source and light-curve database for large-scale surveys, that is accessible by the astronomical community. We adopted LOFAR's Transients Pipeline as the baseline and modified it to enable the processing of optical images that have much higher source densities. The pipeline adds new source lists to the archive database, while cross-matching them with the known catalogued sources in order to build a full light-curve archive. We investigated several techniques of indexing and partitioning the largest tables, allowing for faster positional source look-ups in the cross matching algorithms. We monitored all query run times in long-term pipeline runs where we processed a subset of IPHAS data that have image source density peaks over 170,000 per field of view (500,000 deg^-2). Our analysis demonstrates that horizontal table partitions of declination widths of one degree control the query run times. Usage of an index strategy where the partitions are densely sorted according to source declination yields another improvement. Most queries run in sublinear time and a few (< 20%) run in linear time, because of dependencies on input source-list and result-set size. We observed that for this logical database partitioning schema the limiting cadence the pipeline achieved with processing IPHAS data is 25 s. (C) 2018 Elsevier B.V. All rights reserved.

Keywords: Telescopes; Time-domain astrophysics; Astronomical databases; Surveys; Catalogues; database query processing

SORT VS HASH REVISITED

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1994, vol. 6, no. 6, pp. 934-944

Authors: GRAEFE, G; LINVILLE, A; SHAPIRO, LD. Affiliations: UNIV COLORADO, DEPT COMP SCI, BOULDER, CO 80309; PORTLAND STATE UNIV, DEPT COMP SCI, PORTLAND, OR 97207

Efficient algorithms for processing large volumes of data are very important both for relational and new object-oriented database systems. Many query-processing operations can be implemented using sort- or hash-based algorithms, e.g., intersection, join, and duplicate elimination. In the early relational database systems, only sort-based algorithms were employed. In the last decade, hash-based algorithms have gained acceptance and popularity, and are often considered generally superior to sort-based algorithms such as merge-join. In this article, we compare the concepts behind sort- and hash-based query-processing algorithms and conclude that 1) many dualities exist between the two types of algorithms, 2) their costs differ mostly by percentages rather than factors, 3) several special cases exist that favor one or the other choice, and 4) there is a strong reason why both hash- and sort-based algorithms should be available in a query-processing system. Our conclusions are supported by experiments performed using the Volcano query execution engine.

Keywords: database query processing; VALUE-MATCHING; PERFORMANCE; SORTING; MERGE-JOIN; HASHING; HASH JOIN; HYBRID HASH JOIN; COMPARISON; DUALITY
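
Illustrative note: the abstract above says that join can be implemented with either sort-based or hash-based algorithms. The sketch below is a minimal in-memory illustration of that point, not the authors' Volcano implementation. The function names `hash_join` and `merge_join` and the `(key, payload)` tuple layout are assumptions made for this example only.

```python
from collections import defaultdict


def hash_join(left, right):
    """Equi-join on the key: build a hash table on `left`, probe with `right`."""
    table = defaultdict(list)
    for key, payload in left:
        table[key].append(payload)
    # .get() avoids inserting empty buckets for keys that have no match.
    return [(key, lp, rp) for key, rp in right for lp in table.get(key, [])]


def merge_join(left, right):
    """Equi-join on the key: sort both inputs, then merge runs of equal keys."""
    left = sorted(left, key=lambda row: row[0])
    right = sorted(right, key=lambda row: row[0])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Locate the end of the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit the cross product of the two runs.
            for _, lp in left[i:i_end]:
                for _, rp in right[j:j_end]:
                    out.append((lk, lp, rp))
            i, j = i_end, j_end
    return out
```

Both functions return the same set of joined rows and differ only in output order and in how matching keys are brought together (hashing versus sorting).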

AGGREGATE QUERY PROCESSING FOR SEMANTIC WEB DATABASES: AN ALGEBRAIC APPROACH

INTERNATIONAL JOURNAL OF SEMANTIC COMPUTING, 2007, vol. 1, no. 4, pp. 479-495

Authors: Seid, Dawit; Mehrotra, Sharad. Affiliation: Univ Calif Irvine, Dept Comp Sci, Irvine, CA 92697, USA

As a growing number of applications represent data as semantic graphs like RDF (Resource Description Framework) and the many entity-attribute-value formats, query languages for such data are being required to support operations beyond graph pattern matching and inference queries. Specifically, the ability to express aggregate queries is an important feature which is either lacking or is implemented with little attention to the peculiarities of the data model. In this paper, we study the meaning and implementation of grouping and aggregate queries over RDF graphs. We first define grouping and aggregate operators algebraically and then show how the SPARQL query language can be extended to express grouping and aggregate queries.

Keywords: Semantic web; database query processing; aggregation

ON SORT-MERGE ALGORITHM FOR BAND JOINS

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1995, vol. 7, no. 3, pp. 508-510

Authors: LU, HJ; TAN, KL. Affiliation: Department of Information Systems and Computer Science, National University of Singapore, Singapore

This correspondence proposes two ways to improve the sort-merge based band join algorithm. The techniques proposed address issues that have not been previously discussed: choosing the right relation as the inner relation to achieve better performance, and optimally allocating and adjusting buffers to make the algorithms robust to data skew and estimation errors.

Keywords: BAND JOIN ALGORITHMS; BUFFER ALLOCATION; database query processing; DATA SKEW HANDLING
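
Illustrative note: the abstract above refers to a sort-merge based band join. As a reminder of what that operation computes, here is a minimal in-memory sketch under stated assumptions: the band condition is taken to be `r - c1 <= s <= r + c2`, and the function name `band_join` and its parameters are hypothetical. It does not model the correspondence's actual contributions (inner-relation choice and buffer allocation).

```python
def band_join(r_values, s_values, c1, c2):
    """Return all pairs (r, s) with r - c1 <= s <= r + c2 using sort-merge.

    Assumes c1 >= 0 and c2 >= 0 and that both inputs fit in memory.
    """
    r_sorted = sorted(r_values)
    s_sorted = sorted(s_values)
    out = []
    start = 0  # first s that can still match the current or any later r
    for r in r_sorted:
        # r only grows, so the lower bound r - c1 only grows as well and
        # `start` never has to move backwards.
        while start < len(s_sorted) and s_sorted[start] < r - c1:
            start += 1
        # Scan forward from `start` while s stays inside the band.
        k = start
        while k < len(s_sorted) and s_sorted[k] <= r + c2:
            out.append((r, s_sorted[k]))
            k += 1
    return out
```

Because the window start only advances, each outer value rescans just the inner values inside its own band rather than the whole inner input.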

A Query Simulation System To Illustrate Database Query Execution

39th ACM Technical Symposium on Computer Science Education

Authors: Allenstein, Brett; Yost, Andrew; Wagner, Paul; Morrison, Joline. Affiliation: Skyline Technol, Green Bay, WI 54301, USA

ISBN: (print) 9781595939470

The underlying processes that enable database query execution are fundamental to understanding database management systems. However, these processes are complex and can be difficult to explain and illustrate. To address this problem, we have developed a Java-based query simulation system that enables students to visualize the steps involved in processing DML queries. We performed a field experiment to evaluate the system, and the results suggest that the system improves student comprehension of the query execution process.

Keywords: database query processing; computer-based simulation; visualization
