As parallel database systems have large amounts of data to process, it is important to use a scalable and efficient horizontal database partitioning method. Existing partitioning methods have major drawbacks: they cause large amounts of data redundancy and yet, despite that redundancy, still require expensive shuffle operations for join queries in many cases. We elucidate the drawbacks originating from tree-based partitioning schemes and propose a novel graph-based database partitioning method called GPT that both improves query performance and reduces data redundancy. We integrate the proposed GPT method into a parallel query processing system, Spark SQL, across all the relevant layers and modules, including the query plan generator and the scan operator. Through extensive experiments using three benchmarks, TPC-DS, IMDB and BioWarehouse, we show that GPT significantly outperforms the state-of-the-art method in terms of both storage overhead and query performance. (C) 2018 Elsevier Inc. All rights reserved.
Subsurface flow simulation is vital for many geoscience applications, including geoenergy extraction and gas (energy) storage. Reservoirs are often highly heterogeneous and naturally fractured. Therefore, scalable simulation strategies are crucial to enable efficient and reliable operational strategies. One of these scalable methods, which has also recently been deployed in commercial reservoir simulators, is the algebraic multiscale (AMS) solver. AMS, like all multilevel schemes, is found to be highly sensitive to the types (geometries and sizes) of coarse grids and local basis functions. Commercial simulators rely on a graph-based partitioner, e.g., METIS, to generate the multiscale coarse grids. METIS minimizes the number of interfaces between coarse partitions while keeping the partitions of similar size, which is not necessarily the requirement for creating a coarse grid. In this work, we employ a novel approach to generate the multiscale coarse grids, using unsupervised learning methods that are based on optimizing different parameters. We specifically use the Louvain algorithm and multi-level Markov clustering. The Louvain algorithm optimizes modularity, a measure of the strength of network division, while Markov clustering simulates random walks between the cells to find clusters. It is found that AMS performance improves compared with the existing METIS-based partitioner on several field-scale test cases. This development has the potential to enable reservoir engineers to run ensembles of thousands of detailed models at a much faster rate.
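To make the modularity objective mentioned above concrete, the following is a minimal sketch of how modularity can be evaluated for a given partition of a cell-connectivity graph. It is not the paper's implementation: the function name `modularity`, the unweighted edge list, and the toy six-cell graph are all assumptions made for illustration, and a real coarsening would probably use weighted connections between cells.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges        -- list of (u, v) pairs; hypothetical cell-to-cell connections
    community_of -- dict mapping each node to its community (coarse block) label
    """
    m = len(edges)
    intra = defaultdict(int)   # edges with both endpoints in community c
    degree = defaultdict(int)  # sum of node degrees in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    # Q = sum over communities of (fraction of internal edges)
    #     minus (expected fraction under random rewiring)
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph (assumption): two triangles of cells joined by a single connection.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

two_blocks = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_block = {n: "A" for n in range(6)}

q_two = modularity(edges, two_blocks)  # 5/14, about 0.357
q_one = modularity(edges, one_block)   # 0.0
```

Splitting the graph along its single bridging connection scores higher than lumping all cells into one block. The Louvain algorithm searches for partitions that raise this score, with no constraint that the blocks be of similar size.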