ISBN (print): 9781450351911
For in-memory computing frameworks such as Apache Spark [5, 6], objects (i.e., the intermediate data) can be kept in main memory to speed up execution. In this paper, we propose a cost-aware object management method for in-memory computing frameworks. When the main memory of a worker node is insufficient to accommodate a newly computed or retrieved object, we first pick appropriate objects already resident in main memory as eviction candidates, and then evict the objects that minimize the total creation cost while maximizing the total main memory space reclaimed. The experimental results show that the method achieves this goal under both the 80/20 and 50/50 principles.
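A minimal sketch of the eviction trade-off the abstract describes, not the paper's exact algorithm: a greedy policy that frees the requested amount of memory while keeping the total creation cost of evicted objects low, by evicting objects with the lowest creation cost per byte first. The `CachedObject` fields and names here are hypothetical illustrations.

```python
from dataclasses import dataclass

@dataclass
class CachedObject:
    """A cached intermediate object (hypothetical representation)."""
    name: str
    size: int             # bytes occupied in main memory
    creation_cost: float  # estimated cost to (re)create the object

def pick_victims(cache, needed_bytes):
    """Greedy cost-aware eviction sketch: free at least `needed_bytes`
    while keeping the total creation cost of evicted objects low.
    Objects with low creation cost per byte are evicted first."""
    candidates = sorted(cache, key=lambda o: o.creation_cost / o.size)
    victims, freed = [], 0
    for obj in candidates:
        if freed >= needed_bytes:
            break
        victims.append(obj)
        freed += obj.size
    return victims

# Usage: evict enough objects to make room for a new 600-byte object.
cache = [
    CachedObject("rdd_3_0", size=400, creation_cost=2.0),
    CachedObject("rdd_5_1", size=300, creation_cost=9.5),
    CachedObject("rdd_7_2", size=500, creation_cost=1.2),
]
for victim in pick_victims(cache, needed_bytes=600):
    print("evict", victim.name)  # evicts rdd_7_2, then rdd_3_0
```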
Artificial intelligence applications that rely heavily on deep learning and computer vision processing have become popular. Their strong demand for low-latency or real-time services makes Spark, an in-memory big data computing framework, the best choice to replace earlier disk-based big data computing. In an in-memory framework, sensible data placement in storage is the key factor for performance. However, existing optimizations based on cache replacement strategies and storage selection mechanisms all rely on an imprecise available-memory model and can lead to poor decisions. To address this issue, we propose an available-memory model that captures accurate information about the memory space that is about to be freed by sensing the dependencies between data. We also propose a maximum-memory-requirement model for execution prediction that excludes the redundancy caused by inactive blocks. With these two models, we build DASS, a dependency-aware storage selection mechanism for Spark that makes dynamic and fine-grained storage decisions. Our experiments show that, compared with previous methods, DASS effectively reduces the cost of garbage collection and RDD block re-computation, improving computing performance by 77.4%.
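A rough illustration of the dependency-aware available-memory idea, under assumptions of ours rather than DASS's actual implementation: a cached block whose remaining downstream consumers have all finished contributes its space to the available-memory estimate, even though it still sits in the cache. The function and variable names are hypothetical.

```python
def available_memory(cached_blocks, remaining_consumers, free_now):
    """Dependency-aware available-memory sketch: memory that is free now,
    plus the space of cached blocks that no unfinished task still depends
    on (i.e., blocks that are about to be freed), counts as available.

    cached_blocks: dict block_id -> size in bytes
    remaining_consumers: dict block_id -> number of unfinished tasks
                         that still depend on the block
    free_now: bytes of memory currently unallocated
    """
    reclaimable = sum(
        size for block_id, size in cached_blocks.items()
        if remaining_consumers.get(block_id, 0) == 0
    )
    return free_now + reclaimable

# Usage: "rdd_2_0" has no remaining consumers, so its 512 MB is counted
# as available; "rdd_4_1" still has 3 pending consumers and is not.
cached = {"rdd_2_0": 512, "rdd_4_1": 256}
consumers = {"rdd_2_0": 0, "rdd_4_1": 3}
print(available_memory(cached, consumers, free_now=1024))  # -> 1536
```

A naive model would report only `free_now` (1024 here); tracking dependencies lets the storage-selection logic see the additional 512 units as usable, which is the kind of precision the abstract argues earlier strategies lack.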