ISBN (print): 9781450392495
Remote Direct Memory Access (RDMA) hardware has bridged the gap between network and main-memory speed and thus invalidated the common assumption that the network is the bottleneck in distributed data processing systems. However, high-speed networks do not provide "plug-and-play" performance (e.g., using IP-over-InfiniBand) and require a careful co-design of system and application logic. As a result, system designers need to rethink the architecture of their data management systems to benefit from RDMA acceleration. In this paper, we focus on the acceleration of stream processing engines, which is challenged by real-time constraints and state consistency guarantees. To this end, we propose Slash, a novel stream processing engine that uses high-speed networks and RDMA to efficiently execute distributed streaming computations. Slash embraces a processing model suited for RDMA acceleration and scales out without the expensive data re-partitioning of scale-out SPEs: while scale-out SPEs rely on data re-partitioning to execute a query over many nodes, Slash uses RDMA to share mutable state among nodes. Overall, Slash achieves a throughput improvement of up to two orders of magnitude over existing systems deployed on an InfiniBand network. Furthermore, it is up to a factor of 22 faster than a self-developed solution that relies on RDMA-based data re-partitioning to scale out query processing.
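To make the contrast concrete, here is a minimal Python sketch, not the paper's implementation: the record set, node count, and function names are invented, and the shared dictionary merely stands in for RDMA-accessible remote memory (a real system would use one-sided verbs and atomic updates for concurrent writers).

    # Illustrative simulation of the two scale-out models discussed above.
    # Names and structure are hypothetical; real RDMA systems use one-sided
    # operations (e.g., RDMA WRITE/READ), which we mimic here with direct
    # access to a shared dictionary.
    from collections import defaultdict

    RECORDS = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("a", 6)]
    NUM_NODES = 2

    def partition_based(records):
        """Classic scale-out SPE: every record is hashed on its key and
        shipped over the network to the node that owns that key."""
        inboxes = [[] for _ in range(NUM_NODES)]                 # per-node network queues
        for key, value in records:
            inboxes[hash(key) % NUM_NODES].append((key, value))  # shuffle step
        state = [defaultdict(int) for _ in range(NUM_NODES)]     # private per-node state
        for node, inbox in enumerate(inboxes):
            for key, value in inbox:
                state[node][key] += value
        return {k: v for s in state for k, v in s.items()}

    def shared_state_based(records):
        """Slash-style model: nodes process whatever input arrives locally
        and update shared mutable state in place, avoiding the shuffle."""
        shared = defaultdict(int)       # stands in for RDMA-accessible memory
        chunks = [records[i::NUM_NODES] for i in range(NUM_NODES)]
        for chunk in chunks:            # each node works on its local input
            for key, value in chunk:
                shared[key] += value    # a one-sided remote update in the real system
        return dict(shared)

    assert partition_based(RECORDS) == shared_state_based(RECORDS) == {"a": 10, "b": 7, "c": 4}

The assert checks that both models compute the same keyed sums; the difference lies in what crosses the network: whole records in the partitioned model versus state updates in the shared-state model.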
The blooming of diverse data stores has made polystores a major topic in the cloud and big data landscape. As the amount of data grows rapidly, it becomes critical to exploit the inherent parallel processing capabilities of the underlying data stores and data processing platforms. To fully achieve this, a polystore should: (i) preserve the expressivity of each data store's native query or scripting language and (ii) leverage a distributed architecture to enable parallel data integration, i.e., joins, on top of parallel retrieval of the underlying partitioned datasets. In this paper, we address these points by: (i) using the polyglot approach of the CloudMdsQL query language, which allows native queries to be expressed as inline scripts and combined with SQL statements for ad-hoc integration, and (ii) incorporating this approach into the LeanXcale distributed query engine, thus allowing native scripts to be processed in parallel at data store shards. In addition, (iii) efficient optimization techniques, such as bind join, can be applied to improve the performance of selective joins. We evaluate the performance benefits of exploiting parallelism in combination with high expressivity and optimization through an experimental validation.
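Since the abstract highlights bind join, here is a small, hypothetical Python sketch of that optimization; the connector functions, table, and column names are invented and do not reflect the CloudMdsQL or LeanXcale APIs.

    # A minimal sketch of a bind join: evaluate the selective outer side
    # first, then bind its join-key values into the inner query so the
    # second store only returns matching rows.

    def query_outer():
        """Selective outer side, e.g. a native-script query on store A."""
        return [{"customer_id": 1, "name": "Ada"}, {"customer_id": 7, "name": "Grace"}]

    def query_inner(keys):
        """Inner side on store B. A real engine would push the SQL below
        down to the store so only matching rows ever cross the network."""
        sql = (f"SELECT customer_id, total FROM orders "
               f"WHERE customer_id IN ({', '.join(map(str, sorted(keys)))})")
        print("pushed to store B:", sql)
        data = [{"customer_id": 1, "total": 50}, {"customer_id": 2, "total": 99},
                {"customer_id": 7, "total": 12}]
        return [row for row in data if row["customer_id"] in keys]  # simulated pushdown

    def bind_join():
        outer = query_outer()
        keys = {row["customer_id"] for row in outer}   # bind values from the outer side
        inner = {row["customer_id"]: row for row in query_inner(keys)}
        return [{**o, **inner[o["customer_id"]]} for o in outer if o["customer_id"] in inner]

    print(bind_join())  # joins Ada and Grace with their totals; row 2 never leaves store B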
ISBN (print): 9781450367356
Scale-out stream processing engines (SPEs) power large big data applications on high-velocity data streams. Industrial setups require SPEs to sustain outages, varying data rates, and low-latency processing, so SPEs need to transparently reconfigure stateful queries at runtime. However, state-of-the-art SPEs are not yet ready to handle on-the-fly reconfigurations of queries with terabytes of state, due to three problems: the network overhead of state migration, consistency, and the overhead imposed on data processing. In this paper, we propose Rhino, a library for efficient reconfigurations of running queries in the presence of very large distributed state. Rhino provides a handover protocol and a state migration protocol to consistently and efficiently migrate stream processing among servers. Overall, our evaluation shows that Rhino scales to state sizes in the terabyte range, reconfigures a running query 15 times faster than the state of the art, and reduces latency by three orders of magnitude upon a reconfiguration.
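The following is a speculative Python sketch of the general idea behind such a handover: copy state in background rounds while the old node keeps processing, and pause only for the final small delta. All class and function names are invented; Rhino's actual protocols are more involved (consistency guarantees, parallel transfer, and so on).

    # Hypothetical sketch of incremental state migration with a final
    # delta handover. Only keys dirtied since the last round are re-copied,
    # so the stop-the-world phase shrinks to the last delta.

    class Node:
        def __init__(self):
            self.state = {}      # operator state, e.g. per-key running sums
            self.dirty = set()   # keys modified since the last migration round

        def process(self, key, value):
            self.state[key] = self.state.get(key, 0) + value
            self.dirty.add(key)

    def migrate(old, new, incoming, rounds=3, batch=2):
        stream = iter(incoming)
        old.dirty = set(old.state)              # round 0: all state is unmigrated
        for _ in range(rounds):
            snapshot, old.dirty = old.dirty, set()
            for key in snapshot:                # background copy of changed entries
                new.state[key] = old.state[key]
            for key, value in (next(stream, (None, None)) for _ in range(batch)):
                if key is not None:
                    old.process(key, value)     # old node keeps serving the stream
        for key in old.dirty:                   # handover: ship only the final delta
            new.state[key] = old.state[key]
        for key, value in stream:               # new node takes over the stream
            new.process(key, value)
        return new.state

    old, new = Node(), Node()
    for key, value in [("a", 1), ("b", 2)]:
        old.process(key, value)
    rest = [("a", 3), ("c", 4), ("b", 5), ("a", 6), ("c", 7), ("b", 8)]
    assert migrate(old, new, rest) == {"a": 10, "b": 15, "c": 11}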
ISBN (print): 9780769547367
Real-time data warehouses (DWs) must handle continuous updates while ensuring 24/7 availability. To improve their performance, data is commonly distributed across clusters of shared-nothing machines using round-robin algorithms. This paper proposes a solution for distributed DW databases that ensures continuous availability and copes with frequent data loading requirements, while adding little performance overhead. We use a data striping and replication architecture to distribute portions of each fact table among pairs of slave nodes, where each slave node is an exact replica of its partner. This allows balancing query execution and replacing any defective node, ensuring the system's continuous availability. The size of each portion on a given node depends on that node's individual features, namely its performance benchmark measures and dedicated database RAM. The estimated cost of executing each query workload on each slave node is also used to balance query performance. We include experiments using the TPC-H decision support benchmark to evaluate the scalability of the proposed solution and show that it outperforms standard round-robin distributed DW setups.
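As a rough illustration of capacity-aware striping, the sketch below computes per-node portion sizes from invented benchmark scores and RAM figures; the weighting formula, node list, and pairing are all assumptions for illustration, not the paper's actual cost model.

    # Illustrative capacity-aware striping: each node's share of the fact
    # table is a blend of its normalized benchmark score and RAM, and each
    # node is paired with an exact-replica partner for availability.

    NODES = {               # node -> (benchmark score, dedicated database RAM in GB)
        "n1": (1.0, 16),
        "n2": (1.5, 32),
        "n3": (0.8, 16),
        "n4": (1.2, 32),
    }
    PAIRS = [("n1", "n2"), ("n3", "n4")]   # each node replicates its partner

    def portion_fractions(nodes, w_bench=0.5, w_ram=0.5):
        """Blend normalized benchmark score and RAM into one weight per
        node; the fractions sum to 1 by construction."""
        bench_total = sum(b for b, _ in nodes.values())
        ram_total = sum(r for _, r in nodes.values())
        return {n: w_bench * b / bench_total + w_ram * r / ram_total
                for n, (b, r) in nodes.items()}

    def assign_rows(num_rows, fractions):
        """Split fact-table rows into per-node portions, giving any
        rounding remainder to the most capable node."""
        counts = {n: int(num_rows * f) for n, f in fractions.items()}
        counts[max(fractions, key=fractions.get)] += num_rows - sum(counts.values())
        return counts

    print(assign_rows(1_000_000, portion_fractions(NODES)))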
ISBN (print): 9781605585543
The query models of the recent generation of very large scale distributed (VLSD) shared-nothing data storage systems, including our own PNUTS and others (e.g., Bigtable, Dynamo, Cassandra), are intentionally simple, focusing on simple lookups and scans and trading query expressiveness for massive scale. Indexes and views can expand the query expressiveness of such systems by materializing more complex access paths and query results. In this paper, we examine mechanisms to implement indexes and views in a massive-scale distributed database. For web applications, minimizing update latencies is critical, so we advocate deferring the work of maintaining views and indexes as much as possible. We examine the design space and conclude that two types of view implementations, called remote view tables (RVTs) and local view tables (LVTs), provide a good tradeoff between system throughput and view staleness. We describe how to construct and maintain such view tables, and how they can be used to implement indexes, group-by-aggregate views, equijoin views, and selection views. We also introduce and analyze a consistency model that makes it easier for application developers to cope with the impact of deferred view maintenance. An empirical evaluation quantifies the maintenance costs of our views and shows that they can significantly reduce the cost of evaluating complex queries.
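To illustrate deferred maintenance in the spirit of the view tables described above, here is a hypothetical Python sketch in which a base-table update returns immediately and an asynchronous applier later installs the corresponding index entry; all names and structures are invented, and the view is stale until the applier runs.

    # Hypothetical sketch of deferred index-view maintenance: the base
    # update only appends a maintenance message, and an async applier
    # installs (and removes) index entries later.
    from collections import deque

    base_table = {}           # primary key -> record
    view_table = {}           # secondary key -> set of primary keys (an index view)
    maintenance_log = deque() # deferred maintenance messages

    def update_base(pk, record, indexed_attr="city"):
        old = base_table.get(pk)
        base_table[pk] = record  # the client sees low update latency here
        # Defer the (possibly remote) view update instead of doing it synchronously.
        maintenance_log.append((pk, old and old[indexed_attr], record[indexed_attr]))

    def apply_maintenance():
        """Runs asynchronously in a real system; until it runs, the view is stale."""
        while maintenance_log:
            pk, old_key, new_key = maintenance_log.popleft()
            if old_key is not None and old_key != new_key:
                view_table[old_key].discard(pk)   # retract the outdated entry
            view_table.setdefault(new_key, set()).add(pk)

    update_base("u1", {"city": "berlin"})
    update_base("u2", {"city": "paris"})
    update_base("u1", {"city": "paris"})   # u1 moves; its old entry must be removed
    apply_maintenance()
    print(view_table)                      # {'berlin': set(), 'paris': {'u1', 'u2'}}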