Enterprise users at different geographic locations generate large volumes of data that are stored at geographically distributed datacenters. These users may also perform big data analytics on the stored data to identify valuable information for making strategic decisions. However, it is well known that performing big data analytics on data in geographically distributed datacenters is usually time-consuming and costly. In some delay-sensitive applications, a query result may become useless if answering the query takes too long. Instead, users may sometimes be interested only in timely approximate query results rather than exact ones. For such approximate query evaluation, applications must either sacrifice timeliness to obtain more accurate evaluation results, or tolerate evaluation results with a guaranteed error bound, obtained by analyzing samples of the data, in order to meet their stringent time constraints. In this paper, we study quality-of-service (QoS)-aware data replication and placement for approximate query evaluation of big data analytics in a distributed cloud, where the original (source) data of a query is distributed across different geo-distributed datacenters. We focus on the problem of placing samples of the source data at strategic datacenters to meet the stringent query delay requirements of users, by exploring a non-trivial trade-off between the cost of query evaluation and the error bound of the evaluation result. We first propose an approximation algorithm with a provable approximation ratio for a single approximate query. We then develop an efficient heuristic algorithm for evaluating a set of approximate queries, with the aim of minimizing the evaluation cost while meeting the delay requirements of these queries. We finally demonstrate the effectiveness and efficiency of the proposed algorithms through both experimental simulations and implementations in a real test-bed, using real datasets. Experimental results show that the proposed algorithms are promisi
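The trade-off between evaluation cost and error bound mentioned in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: it assumes a simple mean query over bounded values and uses Hoeffding's inequality as the error bound, and all function names (`hoeffding_error_bound`, `required_sample_size`, `approximate_mean`) are hypothetical. It shows only that a tighter error bound requires a larger sample, and hence more data to store, move, and scan.

```python
import math
import random


def hoeffding_error_bound(sample_size, value_range, delta):
    """Half-width eps such that |sample mean - true mean| <= eps holds
    with probability at least 1 - delta (Hoeffding's inequality).
    Assumes every value lies in an interval of width `value_range`."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * sample_size))


def required_sample_size(epsilon, value_range, delta):
    """Smallest sample size whose Hoeffding bound is at most epsilon."""
    return math.ceil(value_range ** 2 * math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def approximate_mean(values, sample_size, seed=0):
    """Estimate the mean of `values` from a uniform random sample."""
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)
    return sum(sample) / sample_size


if __name__ == "__main__":
    # Synthetic data (an assumption for illustration): values in [0, 99].
    data = [float(i % 100) for i in range(100_000)]
    for n in (100, 1_000, 10_000):
        estimate = approximate_mean(data, n)
        eps = hoeffding_error_bound(n, value_range=99.0, delta=0.05)
        print(f"n={n:>6}  estimate={estimate:.2f}  error bound=+/-{eps:.2f}")
```

Because the bound shrinks only with the square root of the sample size, halving the error bound requires roughly four times as many samples, which is one way to see why sample placement involves a non-trivial cost/accuracy trade-off.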
ISBN: (print) 9781450371964
We are in the era of big data and cloud computing, in which large quantities of computing resources are urgently needed to detect invaluable information hidden in coarse big data through query evaluation. Users demand big data analytic services with various Quality of Service (QoS) requirements. However, cloud computing faces new challenges in meeting the stringent QoS requirements of users because of its remoteness from them. Edge computing has emerged as a new paradigm to address this shortcoming by bringing cloud services to the edge of the network, in proximity to users, for improved performance. To satisfy the QoS requirements of users for big data analytics in edge computing, the data replication and placement problem must be handled properly so that user requests can be answered efficiently and promptly. In this paper, we consider data replication and placement for big data analytic query evaluation. We first formulate a novel proactive data replication and placement problem for big data analytics in a two-tier edge cloud environment. We then devise an approximation algorithm with an approximation ratio for it. We finally evaluate the proposed algorithm against existing benchmarks, using both simulations and experiments in a testbed based on real datasets. The evaluation results show that the proposed algorithm is promising.
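As a rough illustration of what a delay-aware replica placement decision looks like, the sketch below uses the standard greedy heuristic for weighted set cover. This is a swapped-in textbook technique, not the approximation algorithm of the paper, and the function name, node names, costs, and delays are all hypothetical. Each candidate node (an edge site or a remote datacenter) has a replica cost, and a query counts as served if some chosen node can reach it within the query's delay bound.

```python
def greedy_replica_placement(node_cost, delay, delay_bound):
    """Pick nodes to host replicas so that each query has a replica
    within its delay bound, using the greedy weighted set-cover rule.

    node_cost:   {node: cost of placing a replica there}
    delay:       {(query, node): delay from the query's user to the node}
    delay_bound: {query: maximum tolerable delay}
    Returns (chosen nodes in pick order, set of queries left unserved).
    """
    # Queries each node can serve within the required delay.
    covers = {
        node: {
            q for q in delay_bound
            if delay.get((q, node), float("inf")) <= delay_bound[q]
        }
        for node in node_cost
    }
    uncovered = set(delay_bound)
    chosen = []
    while uncovered:
        best, best_ratio = None, float("inf")
        for node in sorted(node_cost):  # sorted for deterministic tie-breaking
            gain = len(covers[node] & uncovered)
            if gain == 0:
                continue
            ratio = node_cost[node] / gain  # cost per newly served query
            if ratio < best_ratio:
                best, best_ratio = node, ratio
        if best is None:  # remaining queries cannot be served by any node
            break
        chosen.append(best)
        uncovered -= covers[best]
    return chosen, uncovered


if __name__ == "__main__":
    # Hypothetical two-tier setting: one remote datacenter, two edge sites.
    node_cost = {"cloud_dc": 1.0, "edge_a": 3.0, "edge_b": 3.0}
    delay = {
        ("q1", "cloud_dc"): 50, ("q1", "edge_a"): 5, ("q1", "edge_b"): 40,
        ("q2", "cloud_dc"): 50, ("q2", "edge_a"): 8, ("q2", "edge_b"): 30,
        ("q3", "cloud_dc"): 20, ("q3", "edge_a"): 60, ("q3", "edge_b"): 6,
    }
    delay_bound = {"q1": 10, "q2": 10, "q3": 25}
    print(greedy_replica_placement(node_cost, delay, delay_bound))
```

In this toy instance the cheap remote datacenter suffices for the delay-tolerant query, while the two delay-sensitive queries force a replica onto an edge site, which is the kind of tension a proactive placement scheme has to resolve.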