To process data from IoT and wearable devices, analysis tasks are often offloaded to the cloud. As the amount of sensing data continues to grow, optimizing data analytics frameworks is critical to the performance of processing sensed data. A key approach to speeding up data analytics frameworks in the cloud is caching intermediate data that is used repeatedly in iterative computations. Existing analytics engines implement caching in various ways: some use run-time mechanisms with dynamic profiling, while others rely on programmers to decide which data to cache. Even though caching has long been investigated in computer systems research, recent data analytics frameworks still leave room for optimization. Because sophisticated caching must consider complex execution contexts such as cache capacity, the size of data to cache, and which victims to evict, no general solution exists for data analytics frameworks. In this paper, we propose an application-specific, cost-capacity-aware caching scheme for in-memory data analytics frameworks. We use a cost model, built from multiple representative inputs, and an execution flow analysis, extracted from the DAG schedule, to select primary candidates to cache among intermediate data. Once the caching candidates are determined, the optimal caching is selected automatically during execution, so programmers no longer need to manually decide which intermediate data to cache. We implemented our scheme in Apache Spark and evaluated it experimentally on HiBench benchmarks. Compared to the caching decisions in the original benchmarks, our scheme improves performance by 27% with sufficient cache memory and by 11% with insufficient cache memory.
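The following is a minimal sketch, in Scala, of the general idea of cost-capacity-aware cache selection on Spark RDDs. The class and method names (CacheCandidate, select), the cost figures, and the input path are hypothetical illustrations, not the paper's actual model or algorithm; the sketch only shows how profiled cost, reuse count, and size could drive an automatic persist decision instead of a manual .cache() call.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical cost-model entry for one intermediate RDD: estimated
// recomputation cost (from profiled runs), reuse count (from the DAG),
// and estimated in-memory size.
case class CacheCandidate(name: String, recomputeCostSec: Double,
                          reuseCount: Int, sizeBytes: Long)

object CostCapacityCachingSketch {
  // Greedy selection under a capacity budget: prefer candidates with the
  // highest saved recomputation time per byte. Illustration only.
  def select(cands: Seq[CacheCandidate], capacityBytes: Long): Seq[CacheCandidate] = {
    val ranked = cands.sortBy(c => -(c.recomputeCostSec * (c.reuseCount - 1)) / c.sizeBytes)
    var remaining = capacityBytes
    ranked.filter { c =>
      if (c.sizeBytes <= remaining) { remaining -= c.sizeBytes; true } else false
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("cost-capacity-caching-sketch")
      .master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    val raw = sc.textFile("hdfs:///input/points.txt")         // hypothetical input path
    val parsed = raw.map(_.split(",").map(_.toDouble))        // reused across iterations

    // Cost estimates would normally come from representative profiled runs.
    val candidates = Seq(CacheCandidate("parsed", recomputeCostSec = 12.0,
                                        reuseCount = 10, sizeBytes = 2L << 30))
    val chosen = select(candidates, capacityBytes = 4L << 30).map(_.name).toSet

    // Persist only the selected intermediate data; no manual .cache() needed.
    if (chosen.contains("parsed")) parsed.persist(StorageLevel.MEMORY_ONLY)

    // ... iterative computation over `parsed` would follow here ...
    spark.stop()
  }
}
```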
ISBN: (Print) 9781665439022
While there has been much effort in recent years to optimise big data systems like Apache Spark and Hadoop, the all-to-all transfer of data between MapReduce computation steps, i.e., the shuffle data mechanism between cluster nodes, remains a serious bottleneck. In this work, we present Cherry, an open-source distributed task-aware Caching sHuffle sErvice for seRveRless analytics. Our thorough experiments on a cloud testbed using realistic and synthetic workloads show that Cherry achieves a 23% to 39% reduction in the completion time of the reduce stage with small shuffle block sizes and a 10% reduction in execution time on real workloads, while it efficiently handles Spark execution failures with a constant task re-computation overhead compared to existing approaches.
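As a rough illustration of the kind of component such a shuffle service is built around, the sketch below implements a small in-memory shuffle-block cache keyed the way Spark addresses shuffle output (shuffle id, map task id, reduce partition id). The class and method names are assumptions for illustration, not Cherry's actual API; a real service would also persist blocks off-executor so a failed task can re-fetch them instead of recomputing the map stage.

```scala
import scala.collection.mutable

// Hypothetical key identifying one shuffle block:
// (shuffle id, map task id, reduce partition id).
case class ShuffleBlockId(shuffleId: Int, mapId: Long, reduceId: Int)

// Minimal in-memory shuffle-block cache with LRU eviction.
class ShuffleBlockCache(capacityBytes: Long) {
  private val blocks = mutable.LinkedHashMap.empty[ShuffleBlockId, Array[Byte]]
  private var usedBytes = 0L

  def put(id: ShuffleBlockId, data: Array[Byte]): Unit = synchronized {
    while (usedBytes + data.length > capacityBytes && blocks.nonEmpty) {
      val (oldest, bytes) = blocks.head
      blocks.remove(oldest); usedBytes -= bytes.length   // evict least recently used
    }
    blocks.put(id, data); usedBytes += data.length
  }

  def get(id: ShuffleBlockId): Option[Array[Byte]] = synchronized {
    blocks.remove(id).map { data => blocks.put(id, data); data }  // refresh recency
  }
}
```

In use, a reducer would first ask the cache for its blocks and fall back to fetching or recomputing them only on a miss, which is what keeps the re-computation overhead of a failed task roughly constant.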
ISBN: (Print) 9781450350280
In this work, we investigate techniques to improve the performance of big data analytics in virtualized clusters by effectively increasing the utilization of cached data and efficiently using scarce memory resources.