Search Results - Inner Mongolia University Library

QEVIS: Multi-Grained Visualization of Distributed Query Execution

IEEE Transactions on Visualization and Computer Graphics, 2024, Vol. 30, No. 1, pp. 153-163

Authors: Shen, Qiaomu; You, Zhengxin; Yan, Xiao; Zhang, Chaozu; Xu, Ke; Zeng, Dan; Qin, Jianbin; Tang, Bo. Affiliations: Southern Univ Sci & Technol, Res Inst Trustworthy Autonomous Syst, Shenzhen, Peoples R China; Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen, Peoples R China; Huawei Technol Co Ltd, Shenzhen, Peoples R China; Shenzhen Univ, Shenzhen Inst Comp Sci, Shenzhen, Peoples R China

Distributed query processing systems such as Apache Hive and Spark are widely used in many organizations for large-scale data analytics. Analyzing and understanding the query execution process of these systems are daily routines for engineers and crucial for identifying performance problems, optimizing system configurations, and rectifying errors. However, existing visualization tools for distributed query execution are insufficient because (i) most of them (if not all) do not provide fine-grained visualization (i.e., at the atomic task level), which can be crucial for understanding query performance and reasoning about the underlying execution anomalies, and (ii) they do not support proper linkages between system status and query execution, which makes it difficult to identify the causes of execution problems. To tackle these limitations, we propose QEVIS, which visualizes the distributed query execution process with multiple views that focus on different granularities and complement each other. Specifically, we first devise a query logical plan layout algorithm to visualize the overall query execution progress compactly and clearly. We then propose two novel scoring methods to summarize the anomaly degrees of the jobs and machines during query execution, and visualize the anomaly scores intuitively, which allows users to easily identify the components that are worth paying attention to. Moreover, we devise a scatter plot-based task view to show a massive number of atomic tasks, where task distribution patterns are informative for execution problems. We also equip QEVIS with a suite of auxiliary views and interaction methods to support easy and effective cross-view exploration, which makes it convenient to track the causes of execution problems. QEVIS has been used in the production environment of our industry partner, and we present three use cases from real-world applications and user interviews to demonstrate its effectiveness. QEVIS is open-source at https://***/

Keywords: Visual analytics system; Distributed query execution; Performance analysis

Distributed query execution under access restrictions

Computers & Security, 2023, Vol. 127, No. 1

Authors: De Capitani di Vimercati, Sabrina; Foresti, Sara; Jajodia, Sushil; Livraga, Giovanni; Paraboschi, Stefano; Samarati, Pierangela. Affiliations: Univ Milan, Via Celoria 18, I-20133 Milan, MI, Italy; George Mason Univ, 10401 York River Rd, Fairfax, VA 22030, USA; Univ Bergamo, Viale Marconi 5, I-24044 Dalmine, BG, Italy

The availability of a multitude of data sources has naturally increased the need for subjects to collaborate in supporting distributed computations that combine different data collections for their elaboration and analysis. Due to the quick pace at which datasets grow, the authorities collecting and owning such datasets often resort to external third parties (e.g., cloud providers) for their storage and management. Data under the control of different authorities are autonomously encrypted (using different encryption schemes and keys) for their external storage. This makes distributed computations combining these sources difficult to support. In this paper, we propose an approach enabling collaborative computations over data encrypted in storage, selectively involving also subjects that might not be authorized to access the data in plaintext when their collaboration is considered economically convenient. We also consider the possible adoption of trusted hardware components, to enable the evaluation of operations over plaintext data at non-fully trusted computational providers. The experimental results confirm the economic benefits that can be enabled by our proposal. © 2022 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://***/licenses/by-nc-nd/4.0/)

Keywords: Distributed query execution; Controlled data sharing; Authorization model; Relation profile; Cloud computing

Protecting Data and Queries in Cloud-Based Scenarios

SN Computer Science, 2023, Vol. 4, No. 5, p. 440

Authors: De Capitani di Vimercati, Sabrina; Foresti, Sara; Samarati, Pierangela. Affiliation: Computer Science Department, Università degli Studi di Milano, Via Celoria 18, Milan 20133, Italy

The availability of cloud services offered by different providers brings several advantages to users and companies, facilitating the storage, sharing, and processing of data. At the same time, the adoption of cloud services brings new security and privacy risks and challenges. As a matter of fact, when leveraging cloud-based services for data storage and processing, data owners lose direct control over their data. Data and queries over them could then be at risk of both improper exposure, compromising their confidentiality, and tampering, compromising their integrity. In this paper, we discuss the main issues to be addressed for guaranteeing data security and privacy in cloud-based storage and processing. We illustrate the different challenges to be considered and the research directions toward their solutions. © 2023, The Author(s).

Keywords: Access confidentiality; Cloud-based scenario; Data protection; Distributed query execution; Query integrity; Querying encrypted data; Selective data sharing

Exploring Controlled RDF Distribution

8th IEEE International Conference on Cloud Computing Technology and Science (CloudCom)

Authors: Penteado, Raqueline R. M.; Schroeder, Rebeca; Hara, Carmem S. Affiliations: Univ Estadual Maringa, BR-87020900 Maringa, Parana, Brazil; Univ Estado Santa Catarina, BR-89219710 Joinville, SC, Brazil; Univ Fed Parana, BR-81531990 Curitiba, Parana, Brazil

ISBN: (print) 9781509014453

RDF datasets have grown rapidly over the last few years. In order to process SPARQL queries on these large datasets, much effort has been spent on developing horizontally scalable techniques, which involve data partitioning and parallel query processing. While distribution may provide storage scalability, it may also incur high communication costs for processing queries. In this paper, we present a parallel and distributed query processing approach that explores the existence of data allocation patterns, provided by a controlled data distribution, that determine how RDF triples should be grouped and stored on the same server. Fragments of the RDF datastore follow a given allocation pattern and also correspond to units of communication among servers. Based on this distribution model, we define two communication strategies for query processing: get-frag, which requests remote servers to send fragments that contain data required by a query, and send-result, which forwards intermediate results. These strategies are combined in a method, called 2ways, that chooses the appropriate communication strategy whenever queries traverse fragment boundaries. We provide a cost function used to determine this choice and present experimental results. They show that our proposed technique effectively reduces the communication cost and improves the response time for processing SPARQL queries on a distributed RDF datastore.

Keywords: Data distribution; Distributed query execution; RDF; SPARQL
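
The abstract above names two communication strategies (get-frag and send-result) and a cost function that chooses between them, but this record does not give the cost function itself. The sketch below only illustrates the general idea of a cost-based choice. It assumes that cost is simply the number of bytes shipped over the network. The `BoundaryCrossing` class, its fields, and `choose_strategy` are hypothetical names and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class BoundaryCrossing:
    """Sizes involved when a query step crosses a fragment boundary (hypothetical)."""
    fragment_bytes: int      # total size of the remote fragments the step needs
    intermediate_bytes: int  # size of the intermediate results held locally


def choose_strategy(crossing: BoundaryCrossing) -> str:
    """Pick the strategy that ships fewer bytes (assumed cost model, not the paper's)."""
    # get-frag: remote servers send the needed fragments to the requester.
    get_frag_cost = crossing.fragment_bytes
    # send-result: the requester forwards its intermediate results instead.
    send_result_cost = crossing.intermediate_bytes
    return "get-frag" if get_frag_cost <= send_result_cost else "send-result"


small_fragment = BoundaryCrossing(fragment_bytes=4_000, intermediate_bytes=250_000)
large_fragment = BoundaryCrossing(fragment_bytes=900_000, intermediate_bytes=12_000)
print(choose_strategy(small_fragment))
print(choose_strategy(large_fragment))
```

A realistic cost function would likely also account for factors such as the number of servers contacted and per-message latency. Those are left out here because the record does not describe them.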

DHTJoin: processing continuous join queries using DHT networks

Distributed and Parallel Databases, 2009, Vol. 26, No. 2-3, pp. 291-317

Authors: Palma, Wenceslao; Akbarinia, Reza; Pacitti, Esther; Valduriez, Patrick. Affiliations: Univ Nantes, INRIA, Nantes, France; Univ Nantes, LINA, Nantes, France; Univ Montpellier 2, INRIA, Montpellier, France; Univ Montpellier 2, LIRMM, Montpellier, France

Continuous query processing in data stream management systems (DSMS) has received considerable attention recently. Many applications share the same need for processing data streams in a continuous fashion. For most distributed streaming applications, the centralized processing of continuous queries over distributed data is simply not viable. This paper addresses the problem of computing approximate answers to continuous join queries over distributed data streams. We present a new method, called DHTJoin, which combines hash-based placement of tuples in a Distributed Hash Table (DHT) and dissemination of queries by exploiting the embedded trees in the underlying DHT, thereby incurring little overhead. DHTJoin also deals with join attribute value skew, which may hurt load balancing and result completeness. We provide a performance evaluation of DHTJoin, which shows that it can achieve significant performance gains in terms of network traffic.

Keywords: Data stream management; Continuous join queries; DHT networks; Distributed query execution; Load balancing; Result completeness
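
To make "hash-based placement of tuples" concrete, here is a minimal sketch of the general technique, not of DHTJoin itself. Tuples from two streams are routed to a node by hashing their join attribute value, so matching tuples meet at the same node and can be joined locally. The node count, the modulo mapping (a stand-in for real DHT key-space routing), and the names `node_for`, `Node`, and `publish` are all assumptions made for illustration. Sliding windows, query dissemination, and skew handling are omitted.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 8  # hypothetical number of DHT nodes


def node_for(join_value: str, num_nodes: int = NUM_NODES) -> int:
    """Map a join attribute value to a node id with a stable hash."""
    digest = hashlib.sha1(join_value.encode("utf-8")).digest()
    # Modulo is a simplification; a real DHT routes by key-space ownership.
    return int.from_bytes(digest[:8], "big") % num_nodes


class Node:
    """Stores tuples of streams R and S, indexed by join attribute value."""

    def __init__(self) -> None:
        self.stored = {"R": defaultdict(list), "S": defaultdict(list)}

    def insert(self, stream: str, join_value: str, payload: str) -> list:
        """Store the tuple and return its join results with the other stream."""
        other = "S" if stream == "R" else "R"
        self.stored[stream][join_value].append(payload)
        matches = self.stored[other][join_value]
        if stream == "R":
            return [(payload, m) for m in matches]
        return [(m, payload) for m in matches]


nodes = [Node() for _ in range(NUM_NODES)]


def publish(stream: str, join_value: str, payload: str) -> list:
    """Route a tuple to the node responsible for its join value."""
    return nodes[node_for(join_value)].insert(stream, join_value, payload)


first = publish("R", "k1", "r1")   # no S tuple with key k1 yet
second = publish("S", "k1", "s1")  # meets r1 on the same node
third = publish("S", "k2", "s2")   # no R tuple with key k2
print(first, second, third)
```

Because both streams use the same hash on the join attribute, no node needs to see the whole stream, which is the property that makes centralized processing unnecessary.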
