ISBN (print): 9781728112466
The scalability of systems such as Hive and Spark SQL that are built on top of big data platforms has enabled query processing over very large data sets. However, the per-node performance of these systems is typically low compared to traditional relational databases. Conversely, Massively Parallel Processing (MPP) databases do not scale as well as these systems. We present HRDBMS, a fully implemented distributed shared-nothing relational database developed with the goal of improving the scalability of OLAP queries. HRDBMS achieves high scalability through a principled combination of techniques from relational and big data systems with novel communication and work-distribution techniques. While we also support serializable transactions, the system has not been optimized for this use case. HRDBMS runs on a custom distributed and asynchronous execution engine that was built from the ground up to support highly parallelized operator implementations. Our experimental comparison with Hive, Spark SQL, and Greenplum confirms that HRDBMS's scalability is on par with Hive and Spark SQL (up to 96 nodes) while its per-node performance can compete with MPP databases like Greenplum.
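To make the shared-nothing work-distribution idea concrete, the following is a minimal sketch (plain Python, not HRDBMS's actual engine or API; the node count and data are made up): each node aggregates only its own hash partition of the data and a coordinator merges the partial results, so per-node work shrinks as nodes are added.

```python
# Minimal sketch of shared-nothing parallel aggregation (illustrative only,
# not HRDBMS's execution engine): hash-partition the rows, aggregate each
# partition independently, then merge the partial results at a coordinator.
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size

def partition(rows, num_nodes):
    """Assign each row to a node by hashing its grouping key."""
    parts = [[] for _ in range(num_nodes)]
    for key, value in rows:
        parts[hash(key) % num_nodes].append((key, value))
    return parts

def local_aggregate(rows):
    """Per-node partial aggregation (SUM per group)."""
    acc = defaultdict(int)
    for key, value in rows:
        acc[key] += value
    return acc

def merge(partials):
    """Coordinator merges the partial aggregates from all nodes."""
    total = defaultdict(int)
    for part in partials:
        for key, value in part.items():
            total[key] += value
    return dict(total)

rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
partials = [local_aggregate(p) for p in partition(rows, NUM_NODES)]
print(merge(partials))  # group totals: a=4, b=7, c=4 (key order may vary)
```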
ISBN (print): 9781538672327
Many emerging Big Data programming environments, such as Spark and Flink, provide powerful APIs that are inspired by functional programming. However, because of the complexity involved in developing and fine-tuning data analysis applications using the provided APIs, many programmers prefer to use declarative languages, such as Hive and Spark SQL, to code their distributed applications. Unfortunately, current data analysis query languages, which are typically based on the relational model, cannot effectively capture the rich data types and computations required for complex data analysis applications. Furthermore, these query languages are not well integrated with the host programming language, as they are based on an incompatible data model and are checked for correctness at runtime, which results in significantly longer program development time. To address these shortcomings, we introduce a new query language for data-intensive scalable computing, called DIQL, that is deeply embedded in Scala, and a query optimization framework that optimizes and translates DIQL queries to byte code at compile time. In contrast to other query languages, our query embedding eliminates impedance mismatch, as any Scala code can be seamlessly mixed with SQL-like syntax without having to add any special declaration. DIQL supports nested collections and hierarchical data and allows query nesting at any place in a query. With DIQL, programmers can express complex data analysis tasks, such as PageRank and matrix factorization, using SQL-like syntax exclusively. The DIQL query optimizer can find any possible join in a query, including joins hidden across deeply nested queries, thus unnesting any form of query nesting. Currently, DIQL can run on three Big Data platforms: Apache Spark, Apache Flink, and Twitter's Cascading/Scalding.
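As a rough illustration of the kind of unnesting such an optimizer performs (plain Python, not DIQL's Scala syntax or its actual rewrite rules), the sketch below turns a correlated nested query into a single hash join, so the inner collection is scanned once instead of once per outer element.

```python
# Illustrative sketch only: the rewrite a query unnester performs, turning a
# correlated nested aggregation into a hash join over the inner collection.
from collections import defaultdict

orders = [(1, "alice"), (2, "bob"), (3, "alice")]   # (order_id, customer)
items  = [(1, 10.0), (1, 5.0), (2, 7.5), (3, 2.5)]  # (order_id, price)

# Nested (naive) form: for every order, rescan all items -- O(n * m).
nested = [(cust, sum(p for (oid2, p) in items if oid2 == oid))
          for (oid, cust) in orders]

# Unnested form: build a hash table on the inner collection once, then probe.
totals = defaultdict(float)
for oid, price in items:
    totals[oid] += price
unnested = [(cust, totals[oid]) for (oid, cust) in orders]

assert nested == unnested
print(unnested)  # [('alice', 15.0), ('bob', 7.5), ('alice', 2.5)]
```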
ISBN (print): 9781728101323
In distributed parallel query execution, complex queries are split into partially related simple subqueries, each executed on a different node; in a shared-nothing, grid-based architecture, communication between machines is typically done by message exchange. The integrity of messages can be lost through temporary or permanent interference along the communication network. Fault tolerance strategies are used to keep the system running in the presence of faults. This is traditionally done through query restart, replication, or checkpointing, along with variations of these approaches that improve latency and restoration time and reduce the cost of execution. These processes include monitoring, detection, and tolerance. Transient faults are caused by interference in the exchange medium and may pass undetected while yielding an incorrect query result. Moreover, traditional fault tolerance introduces a strong dependency between the nodes. In this research we propose a fault tolerance model that allows self-detection and tolerates transient faults with less dependency between nodes. The model is compared with the traditional strategies in terms of detection ability, inter-node dependency, and cost of execution.
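A minimal sketch of the self-detection idea, under assumptions not taken from the paper (SHA-256 checksums over JSON-encoded subquery results): each message carries a digest, so a receiving node can detect a transiently corrupted message on its own and re-request only that message, without coordinating with other nodes.

```python
# Minimal sketch (not the paper's exact model): a node attaches a checksum to
# each subquery-result message so the receiver can self-detect transient
# corruption locally, without relying on other nodes.
import hashlib
import json

def make_message(node_id, payload):
    body = json.dumps({"node": node_id, "payload": payload}, sort_keys=True)
    digest = hashlib.sha256(body.encode()).hexdigest()
    return {"body": body, "digest": digest}

def verify_message(message):
    """Return the payload if intact, or None if a transient fault is detected."""
    expected = hashlib.sha256(message["body"].encode()).hexdigest()
    if expected != message["digest"]:
        return None  # corruption detected; caller can re-request just this message
    return json.loads(message["body"])["payload"]

msg = make_message("node-3", [42, 17, 99])
print(verify_message(msg))                       # [42, 17, 99]

msg["body"] = msg["body"].replace("42", "41")    # simulate a bit-level error
print(verify_message(msg))                       # None -> fault detected locally
```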
The goal of query optimization in query federation over linked data is to minimize the response time and the completion time. Communication time has the highest impact on both. Static query optimization can end up with inefficient execution plans due to unpredictable data arrival rates and missing statistics. This study extends the adaptive join operator, which always begins with a symmetric hash join to minimize the response time and can change the join method to a bind join to minimize the completion time. The authors extend the adaptive join operator with a bind-bloom join to further reduce the communication time and, consequently, to minimize the completion time. They compare the new operator with the symmetric hash join, bind join, bind-bloom join, and adaptive join operator with respect to the response time and the completion time. Performance evaluation shows that the extended operator provides optimal response time and further reduces the completion time. Moreover, it has the ability to adapt to different data arrival rates.
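The sketch below illustrates the bind-bloom idea under simplified assumptions (a toy Bloom filter and an in-memory "endpoint"; not the authors' operator): the local side ships a small Bloom filter of its join keys instead of the keys themselves, and the remote endpoint returns only bindings that might match, cutting communication at the cost of occasional false positives.

```python
# Sketch of a bind-bloom style join step (illustrative, not the paper's code):
# ship a Bloom filter of local join keys so the remote side prunes its answers.
import hashlib

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, item):
        for i in range(self.hashes):
            h = hashlib.md5(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

# Local intermediate results (e.g., already-arrived bindings for one variable).
local_keys = ["alice", "bob", "carol"]
bf = BloomFilter()
for k in local_keys:
    bf.add(k)

# "Remote endpoint": filters its data with the Bloom filter before sending.
remote_data = [("alice", 30), ("dave", 25), ("carol", 41), ("erin", 19)]
sent = [row for row in remote_data if bf.might_contain(row[0])]
print(sent)  # only rows whose key may join locally (false positives possible)
```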
Process management is practical in state-of-the-art Internet of Things research. However, it has become a bottleneck in recent years, since an extreme amount of heterogeneous items has to be recorded and traced with radio frequency identification (RFID) tags. In a typical process synthesis management application, each item is involved in multiple processes, and when those processes are interconnected, an extremely complex network emerges that has to be managed. Existing work on process management systems, however, is usually case-based and focuses only on specific application domains, so generally applicable process management models are rather limited. In this paper, we summarize the characteristics of RFID application domains and, by abstraction, propose a novel RFID processing model. In this model, each RFID data item is treated as an operation record, and all processing services are logically interconnected and organized as a procedure graph. In addition, we abstract and summarize the basic RFID data processes into a few types. The advantage of this design is that the basic RFID data processing logic can be preprogrammed; in a real domain, the system can then be dynamically programmed by automatically constructing procedure graph nodes from those basic processes and mapping the interconnection logic according to the topology of the graph. In the last part of this paper, we design a prototype system for infection control of medical instruments to demonstrate our approach.
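A minimal sketch of the procedure-graph idea follows (names and process types are illustrative, not the paper's API): a handful of preprogrammed basic processes are wired into a domain-specific graph, and each RFID operation record flows along the graph.

```python
# Minimal sketch (hypothetical process names, not the paper's model): basic
# RFID processes are preprogrammed once; a domain is set up by wiring them into
# a procedure graph, and each tag read flows from node to node.
basic_processes = {
    "register":  lambda rec: {**rec, "status": "registered"},
    "sterilize": lambda rec: {**rec, "sterile": True},
    "issue":     lambda rec: {**rec, "status": "in-use"},
    "return":    lambda rec: {**rec, "status": "returned", "sterile": False},
}

# Procedure graph for a hypothetical instrument-tracking domain:
# each node names a basic process and its successor (None = end of procedure).
procedure_graph = {
    "register":  "sterilize",
    "sterilize": "issue",
    "issue":     "return",
    "return":    None,
}

def run(record, start="register"):
    """Push one RFID operation record through the procedure graph."""
    node = start
    while node is not None:
        record = basic_processes[node](record)
        node = procedure_graph[node]
    return record

print(run({"tag": "RFID-0001", "item": "scalpel"}))
```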
ISBN (digital): 9783319671628
ISBN (print): 9783319671628; 9783319671611
Vehicular ad hoc networks (VANETs) have attracted great interest in recent years due to their potential utility for drivers in applications that provide information about relevant events (accidents, emergency braking, etc.), traffic conditions, or even available parking spaces. To accomplish this, the vehicles exchange data among themselves using wireless communications; the data can be obtained from different sources, such as sensors or alerts sent by other drivers. In this paper, we propose searching for parking spaces by using a mobile agent that jumps from one vehicle to another to reach the parking area and obtain the required data directly. We perform an experimental evaluation with promising results that show the feasibility of our proposal.
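The toy sketch below illustrates the agent-hopping idea under simplified assumptions (known vehicle positions and a greedy next-hop rule; not the paper's protocol): the agent repeatedly jumps to the neighbouring vehicle closest to the parking area until it reaches a vehicle that can answer the query directly.

```python
# Toy sketch (simplified assumptions, not the paper's protocol): a mobile agent
# greedily hops to the neighbouring vehicle closest to the parking area.
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def agent_route(vehicles, neighbours, start, parking, radius=1.0):
    """Return the sequence of vehicles visited by the agent."""
    current, route = start, [start]
    while dist(vehicles[current], parking) > radius:
        candidates = neighbours.get(current, [])
        if not candidates:
            break  # no vehicle in range to hop to; the query stalls here
        nxt = min(candidates, key=lambda v: dist(vehicles[v], parking))
        if dist(vehicles[nxt], parking) >= dist(vehicles[current], parking):
            break  # no progress possible from this vehicle
        current = nxt
        route.append(current)
    return route

vehicles = {"v1": (0, 0), "v2": (2, 1), "v3": (4, 3), "v4": (6, 5)}
neighbours = {"v1": ["v2"], "v2": ["v1", "v3"], "v3": ["v2", "v4"], "v4": ["v3"]}
print(agent_route(vehicles, neighbours, start="v1", parking=(6, 5)))
# ['v1', 'v2', 'v3', 'v4'] -- the agent reaches the parking area hop by hop
```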
ISBN (print): 9789897582110
We address the problem of in-network processing of k-Maximizing Range Sum (k-MaxRS) queries in Wireless Sensor Networks (WSN). The traditional, Computational Geometry version of the MaxRS problem considers the setting in which, given a set of (possibly weighted) 2D points, the goal is to determine the optimal location for a given (axis-parallel) rectangle R so that the sum of the weights (or a simple count) of the input points in R's interior is maximized. In WSN, this corresponds to finding the location of a region R such that the sum of the sensors' readings inside R is maximized. The k-MaxRS problem deals with maximizing the overall sum over k such rectangular regions. Since centralized processing (i.e., transmitting the raw readings and subsequently determining the k-MaxRS in a dedicated sink) incurs communication overheads, we devised an efficient distributed algorithm for in-network computation of k-MaxRS. Our experimental observations show that the novel algorithm provides significant energy/communication savings compared to the centralized approach.
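For reference, a centralized brute-force baseline for the single-rectangle MaxRS case (not the in-network algorithm proposed here) looks as follows; it relies on the fact that an optimal axis-parallel rectangle can be shifted so that its left edge passes through some point's x-coordinate and its bottom edge through some point's y-coordinate.

```python
# Centralized brute-force MaxRS baseline (illustrative only, O(n^3)): try every
# candidate lower-left corner formed by a point's x and a point's y.
def maxrs(points, w, h):
    """points: list of (x, y, weight); returns (best_sum, lower_left_corner)."""
    xs = [x for (x, _, _) in points]
    ys = [y for (_, y, _) in points]
    best_sum, best_corner = 0.0, None
    for cx in xs:                 # left edge aligned with some point's x
        for cy in ys:             # bottom edge aligned with some point's y
            total = sum(wt for (x, y, wt) in points
                        if cx <= x <= cx + w and cy <= y <= cy + h)
            if total > best_sum:
                best_sum, best_corner = total, (cx, cy)
    return best_sum, best_corner

readings = [(1, 1, 2.0), (1.5, 1.2, 3.0), (4, 4, 1.0), (4.2, 4.1, 5.0)]
print(maxrs(readings, w=1.0, h=1.0))   # (6.0, (4, 4))
```

A simple greedy extension toward k-MaxRS would repeat this placement k times, removing the covered points after each step; the paper's contribution is computing such answers in-network so that the raw readings never have to be shipped to the sink.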
ISBN (digital): 9789811033223
ISBN (print): 9789811033223; 9789811033216
In distributed databases, data is replicated and fragmented across multiple disparate sites spread across a computer network. Consequently, a large number of possible query plans can exist for a distributed query, and this number increases with the number of sites containing the replicated data. For large numbers of sites, computing an efficient query processing plan becomes a computationally expensive task. This necessitates devising a distributed query processing strategy capable of generating good quality query plans, from amongst all possible query plans, that minimize the total cost of processing a distributed query. This distributed query plan generation (DQPG) problem, being a combinatorial optimization problem, is addressed in this paper using a modified cuckoo search algorithm (CSA). Accordingly, a modified CSA (mCSA) based DQPG algorithm (DQPGmCSA), which aims to generate good quality Top-K query plans for a given distributed query, is proposed herein. An experimental comparison of DQPGmCSA with the existing GA based DQPG algorithm (DQPGGA) shows that the former is able to generate comparatively better quality Top-K query plans, which, in turn, would reduce the query response time and thereby enable efficient decision making.
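The sketch below shows a cuckoo-search-style plan search in miniature (the replica placement, cost model, and parameters are made up and far simpler than the paper's mCSA): a plan assigns each relation to one site holding a replica, new candidate plans replace worse nests, the worst nests are periodically abandoned, and the best K survivors are returned as Top-K plans.

```python
# Toy cuckoo-search-style plan search (illustrative, not the paper's mCSA):
# a "plan" assigns each relation to one site holding a copy of it, and the
# cost favours plans touching fewer distinct sites (a proxy for less shipping).
import random

replicas = {"R1": [0, 2], "R2": [1, 2, 3], "R3": [0, 2, 3]}   # hypothetical
relations = list(replicas)

def random_plan():
    return {r: random.choice(replicas[r]) for r in relations}

def cost(plan):
    return len(set(plan.values()))      # number of distinct sites involved

def mutate(plan):
    """A 'cuckoo' lays a new solution by re-assigning one random relation."""
    new = dict(plan)
    r = random.choice(relations)
    new[r] = random.choice(replicas[r])
    return new

def cuckoo_search(nests=10, iterations=200, abandon_fraction=0.25, top_k=3):
    population = [random_plan() for _ in range(nests)]
    for _ in range(iterations):
        cuckoo = mutate(random.choice(population))
        worst = max(range(nests), key=lambda i: cost(population[i]))
        if cost(cuckoo) < cost(population[worst]):
            population[worst] = cuckoo            # replace a worse nest
        # abandon a fraction of the worst nests and rebuild them randomly
        population.sort(key=cost)
        for i in range(int(nests * (1 - abandon_fraction)), nests):
            population[i] = random_plan()
    return sorted(population, key=cost)[:top_k]   # Top-K plans

for plan in cuckoo_search():
    print(cost(plan), plan)
```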
ISBN (print): 9789897582554
The benefit of performing Big Data computations over individuals' microdata is manifold, in the medical, energy, or transportation fields to cite only a few, and this interest is growing with the emergence of smart disclosure initiatives around the world. However, these computations often expose microdata to privacy leakages, explaining the reluctance of individuals to participate in studies despite the privacy guarantees promised by statistical institutes. This paper proposes a novel approach to push personalized privacy guarantees into the processing of database queries, so that individuals can disclose different amounts of information (i.e., data at different levels of accuracy) depending on their own perception of the risk. Moreover, we propose a decentralized computing infrastructure based on secure hardware that enforces these personalized privacy guarantees all along the query execution process. A performance analysis conducted on a real platform shows the effectiveness of the approach.
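A minimal sketch of personalized accuracy levels (illustrative only; it ignores the paper's secure-hardware infrastructure and its actual generalization scheme): each participant chooses how coarsely their value is disclosed before it enters the aggregate query.

```python
# Minimal sketch of per-individual accuracy levels (assumed bucket scheme, not
# the paper's mechanism): level 0 discloses the exact value, higher levels
# disclose only the midpoint of a coarser bucket.
def generalize(value, level, bucket_sizes=(1, 5, 10)):
    """Level 0 = exact, higher levels = coarser buckets (midpoint disclosed)."""
    size = bucket_sizes[level]
    low = (value // size) * size
    return low + size / 2 if size > 1 else value

# (age, privacy level chosen by the individual)
participants = [(23, 0), (37, 1), (41, 2), (58, 2), (29, 0)]
disclosed = [generalize(age, level) for age, level in participants]
print(disclosed)                         # mix of exact and coarsened values
print(sum(disclosed) / len(disclosed))   # aggregate computed on disclosed data
```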
Embedded electronic devices are now to be found everywhere. In general, they can be used to collect different sorts of data (e.g. on temperature, humidity, illumination and locations). In some specific domains, such as industrial automation, embedded devices are used for process control. The devices may have a programme that can respond immediately to environmental changes perceived through sensors. In the control of large sites, where there are many devices, higher level decisions are made or processed in dedicated computers far away from the sources (devices) where the initial data are collected. This article shows how it is possible to manage portions of distributed knowledge hosted in embedded devices, making it possible for each embedded device to hold and manage its own piece of knowledge. In addition, the presented approach keeps the locus of control at the embedded device level, where the embedded device can make decisions knowing the status of the rest of the world, device contributions, and their effects on the overall distributed system knowledge base.
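As an illustration of keeping the locus of control at the device (hypothetical rules and fields, not the article's system): each device owns a small local knowledge base and makes decisions from its own facts plus a coarse summary of the rest of the distributed knowledge base.

```python
# Illustrative sketch (hypothetical rules, not the article's system): a device
# holds its own slice of the knowledge base and decides locally, using a
# coarse summary of remote state rather than deferring to a central computer.
class DeviceNode:
    def __init__(self, device_id):
        self.device_id = device_id
        self.local_kb = {}          # facts owned and managed by this device

    def update(self, key, value):
        self.local_kb[key] = value

    def decide(self, remote_summary):
        """Local decision from local facts plus a summary of remote state."""
        temp = self.local_kb.get("temperature", 0)
        if temp > 30 and remote_summary.get("cooling_available", False):
            return "request_cooling"
        if temp > 30:
            return "throttle_locally"
        return "normal_operation"

node = DeviceNode("dev-42")
node.update("temperature", 33)
print(node.decide({"cooling_available": True}))    # request_cooling
print(node.decide({"cooling_available": False}))   # throttle_locally
```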