The distributed cache system contains a group of servers caching different contents based on consistent hashing. Dynamic provisioning of servers improves system efficiency and thus reduces energy cost. We first measure the cache hit rate, request batching effect, and cache warm-up time of the system through experiments, since these factors affect system performance and efficiency. We then formulate a stochastic network optimization problem that simultaneously targets queue stability, energy cost, and cache hit rate through dynamic control of server activeness and request dispatching. The problem is transformed into a per-time-slot minimization problem, which is solved by the proposed efficient online algorithm based on dynamic programming. We further improve the scheme with several practical considerations for its implementation. Finally, the proposed algorithm and the improvements are evaluated through extensive experiments.
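As an illustration only (the abstract gives no implementation details), the sketch below shows a minimal consistent-hashing ring in Python with hypothetical server names. Activating or deactivating a server, as the dynamic provisioning controller would do, remaps only the keys owned by that server's virtual nodes; the paper's online control algorithm itself is not reproduced here.

    import hashlib
    from bisect import bisect, insort

    class ConsistentHashRing:
        """Maps request keys to cache servers via virtual nodes on a hash ring."""
        def __init__(self, servers=(), vnodes=100):
            self.vnodes = vnodes
            self._keys = []      # sorted virtual-node hashes
            self._nodes = {}     # virtual-node hash -> server name
            for s in servers:
                self.add(s)

        @staticmethod
        def _hash(key):
            return int(hashlib.md5(key.encode()).hexdigest(), 16)

        def add(self, server):
            # Activate a server: place its virtual nodes on the ring.
            for i in range(self.vnodes):
                h = self._hash(f"{server}#{i}")
                self._nodes[h] = server
                insort(self._keys, h)

        def remove(self, server):
            # Deactivate a server: only its segments of the ring are remapped.
            self._keys = [h for h in self._keys if self._nodes[h] != server]
            self._nodes = {h: s for h, s in self._nodes.items() if s != server}

        def lookup(self, key):
            if not self._keys:
                raise LookupError("no active cache servers")
            idx = bisect(self._keys, self._hash(key)) % len(self._keys)
            return self._nodes[self._keys[idx]]

    # Hypothetical usage: dispatch a request, then put one server to sleep.
    ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
    print(ring.lookup("object-42"))
    ring.remove("cache-3")      # deactivated for energy savings
    print(ring.lookup("object-42"))

The virtual nodes keep the load roughly balanced, so most cached content stays warm on the remaining servers when one is powered down.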
ISBN:
(Print) 9780769550886
Although the capacities of persistent storage devices have evolved rapidly in recent years, the bandwidth between memory and persistent storage remains the bottleneck. Since loosely coupled data-sharing applications running in a cluster environment may need an enormous number of files, access to these files can become the bottleneck. With the rapid development of servers and high-speed networks, much work has been done on distributed memory caches that minimize data requests to the centralized file system. These systems have the drawback that nodes are statically coupled together to form a distributed cache, which is a difficult administrative task in changing environments such as clusters. Current high-performance computing resources support batch job submission through distributed resource management systems such as TORQUE, but how to use such a resource management system to set up a self-organizing distributed memory cache on demand has rarely been studied. In this paper, we design a framework for dynamically setting up a distributed memory cache for data-sharing applications. Shared files are stored in the distributed memory cache, which can be accessed transparently and delivers data with high bandwidth. We describe the architecture of the framework and evaluate its performance for a use-case scenario.
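A minimal sketch of how such an on-demand cache might be driven, assuming a hypothetical batch script (start_cache_daemons.sh) that launches one cache daemon per allocated node and any cache client object exposing get()/set(); this is not the authors' implementation, only an illustration of provisioning cache nodes through a TORQUE-style batch system and reading files through the cache with a fallback to the shared file system.

    import hashlib
    import subprocess

    # Hypothetical PBS/TORQUE job script that starts one cache daemon per node.
    CACHE_JOB_SCRIPT = "start_cache_daemons.sh"

    def provision_cache(num_nodes, walltime="01:00:00"):
        """Request nodes from the resource manager and launch cache daemons.
        Returns the batch job id so the cache can later be torn down (e.g. qdel)."""
        out = subprocess.run(
            ["qsub", "-l", f"nodes={num_nodes},walltime={walltime}", CACHE_JOB_SCRIPT],
            capture_output=True, text=True, check=True)
        return out.stdout.strip()   # qsub prints the job id on stdout

    class ReadThroughCache:
        """Transparent read path: try the in-memory cache first, fall back to the
        centralized file system and populate the cache on a miss (sketch only)."""
        def __init__(self, cache_client):
            self.cache = cache_client          # any object with get()/set()

        def read(self, path):
            key = hashlib.sha1(path.encode()).hexdigest()
            data = self.cache.get(key)
            if data is None:                   # miss: one trip to shared storage
                with open(path, "rb") as f:
                    data = f.read()
                self.cache.set(key, data)      # subsequent readers hit memory
            return data

Because the cache nodes are allocated and released through ordinary batch jobs, the cache exists only for the lifetime of the workload, which is the on-demand, self-organizing behaviour the framework targets.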
ISBN:
(Print) 9781509057320
The integration of the Hive, Impala, and Spark SQL platforms enables rapid data retrieval with SQL queries in a big data environment. This paper designs an optimized platform-selection scheme that substantially improves the response time of data retrieval by automatically choosing the platform that best performs a given SQL command. In addition, caching is implemented with the Memcached distributed memory storage system and the Hadoop HDFS distributed file system, so that repeated SQL commands are served with the fastest possible data retrieval.
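A hedged sketch of the caching idea using the pymemcache client: repeated SQL statements are keyed by a hash of the query text and served from Memcached, while new queries go to whichever platform has recently been fastest. The runner functions and the latency bookkeeping are illustrative assumptions, not the paper's actual selection logic.

    import hashlib
    import json
    import time
    from pymemcache.client.base import Client

    memcache = Client(("localhost", 11211))    # assumes a local Memcached instance

    # Placeholder runners; a real system would go through the Hive, Impala and
    # Spark SQL drivers. These names are illustrative only.
    def run_on_hive(sql):      return [["hive", sql]]
    def run_on_impala(sql):    return [["impala", sql]]
    def run_on_spark_sql(sql): return [["spark_sql", sql]]

    RUNNERS = {"hive": run_on_hive, "impala": run_on_impala,
               "spark_sql": run_on_spark_sql}

    def execute(sql, recent_latency):
        """Serve a repeated query from Memcached; otherwise run it on the
        platform with the lowest recently observed latency and cache the result."""
        key = "sql:" + hashlib.sha1(sql.encode()).hexdigest()
        cached = memcache.get(key)
        if cached is not None:                           # repeated SQL: cache hit
            return json.loads(cached)

        platform = min(recent_latency, key=recent_latency.get)
        start = time.time()
        rows = RUNNERS[platform](sql)
        recent_latency[platform] = time.time() - start   # feedback for next choice
        memcache.set(key, json.dumps(rows), expire=300)  # cache for repeats
        return rows

    # Hypothetical usage: latencies would come from measured query times.
    latencies = {"hive": 4.2, "impala": 0.8, "spark_sql": 1.5}
    print(execute("SELECT count(*) FROM logs", latencies))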