The traditional page-grained buffer manager in database systems has a low hit ratio when only a few tuples within a page are frequently accessed. To handle this issue, this paper proposes a new buffering scheme called the AMG-Buffer (Adaptive Multi-Grained Buffer). The AMG-Buffer organizes the whole buffer into two page buffers and a tuple buffer, allowing it to hold more hot tuples than a single page-grained buffer. Further, we notice that the tuple buffer may cause additional read I/Os when writing dirty tuples to disk. Thus, we introduce a new metric named the clustering rate to quantify the hot-tuple rate in a page. The use of the tuple buffer is determined by the clustering rate, allowing the AMG-Buffer to adapt to different workloads. We conduct experiments on various workloads to compare the AMG-Buffer with several existing schemes, including LRU, LIRS, CFLRU, CFDC, and MG-Buffer. The results show that the AMG-Buffer can significantly improve the hit ratio and reduce I/Os compared to its competitors. Moreover, the AMG-Buffer achieves the best performance on a dynamic workload as well as on a large data set, suggesting its adaptivity and scalability to changing workloads.
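A minimal sketch of how the clustering-rate decision could work; the abstract gives no formula, so the hot-tuple fraction definition and the 0.2 threshold below are assumptions for illustration:

```python
# Hypothetical sketch: clustering rate as the fraction of hot tuples in a
# page; the 0.2 threshold is an assumption, not the paper's value.

def clustering_rate(hot_tuples: int, tuples_per_page: int) -> float:
    """Fraction of tuples in a page that are frequently accessed."""
    return hot_tuples / tuples_per_page

def use_tuple_buffer(hot_tuples: int, tuples_per_page: int,
                     threshold: float = 0.2) -> bool:
    """Cache individual tuples only when the page is sparsely hot: a
    densely hot page is cheaper to keep page-grained, because flushing
    dirty tuples later requires re-reading the whole page from disk."""
    return clustering_rate(hot_tuples, tuples_per_page) < threshold

# Example: 3 hot tuples out of 100 is sparsely hot, so use the tuple buffer.
assert use_tuple_buffer(3, 100) is True
assert use_tuple_buffer(60, 100) is False
```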
Emerging non-volatile memories (NVMs) are known as promising alternatives to SRAMs in on-chip caches. However, their limited write endurance is a major challenge when NVMs are employed in these frequently written caches. Early wear-out of NVM cells makes the lifetime of the caches far too short for today's computing systems. Previous studies addressed only the lifetime of the data part of the cache. This paper first demonstrates that the age-bits field of the cache replacement algorithm is the most frequently written part of a cache block and that its lifetime is more than 27x shorter than that of the data part. Second, it investigates the effect of age-bit wear-out on cache operation and shows that performance is severely degraded after even a small portion of the age bits become non-operational. Third, a novel cache replacement algorithm, called Sleepy-LRU, is proposed to reduce the write activity of the age bits with negligible overheads. The evaluations show that Sleepy-LRU extends the lifetime of the instruction and data caches by factors of 3.63x and 3.00x, respectively, with an average performance overhead of 0.06%. In addition, Sleepy-LRU imposes no area or power consumption overhead.
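The abstract does not detail Sleepy-LRU's mechanism, so the toy model below only illustrates the general direction of cutting age-bit write traffic: skip the recency update when a hit lands on a block that is already young. The class name and the half-way split point are assumptions, not the paper's design:

```python
class WriteFrugalLRUSet:
    """Toy cache set: index 0 of `stack` is MRU; the stack stands in for
    the per-block age bits. Hits on already-young blocks skip the
    recency update, so the age field is written far less often."""

    def __init__(self, ways: int, young_fraction: float = 0.5):
        self.stack = []                      # block tags, MRU first
        self.ways = ways
        self.young = max(1, int(ways * young_fraction))
        self.age_bit_writes = 0              # counts recency-state updates

    def access(self, tag) -> bool:
        if tag in self.stack:
            if self.stack.index(tag) >= self.young:  # refresh old blocks only
                self.stack.remove(tag)
                self.stack.insert(0, tag)
                self.age_bit_writes += 1
            return True                      # hit
        if len(self.stack) >= self.ways:
            self.stack.pop()                 # evict the LRU block
        self.stack.insert(0, tag)
        self.age_bit_writes += 1
        return False                         # miss
```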
The limited memory capacity of mobile devices leads to the popular use of compressed swap schemes, which reduce the I/O operations involving the swapping in and out of infrequently accessed pages. However, most of the current compressed swap schemes indiscriminately compress and store all swap-out pages. Considering that both energy and computing power are scarce resources in mobile devices, and modern applications frequently deal with already-compressed multimedia data, this blind approach may cause adverse impacts. In addition, they focus only on anonymous pages and not on file-mapped pages, because the latter are backed by on-disk files. However, our observations revealed that, in mobile devices, file-mapped pages consume significantly more memory than anonymous pages. Last but not least, most of the current compressed swap schemes blindly follow the least-recently-used (LRU) discipline when choosing the victim pages for replacement, not considering the compression ratio or data density of the cached pages. To overcome the aforementioned problems and maximize the memory efficiency, we propose a compressed swap scheme, called enhanced zswap (erswap), for mobile devices. erswap accommodates not only anonymous pages, but also clean file-mapped pages. It estimates the compression ratio of incoming pages with their information entropy, and selectively compresses and caches the pages only with beneficial compression ratios. In addition, its admission control and cache replacement algorithms are based on a cost-benefit model that considers not only the access recency of cached pages but also their information density and expected eviction cost. The proposed scheme was implemented in the Linux kernel for Android. Our evaluation with a series of commercial applications demonstrated that it reduced the amount of flash memory read by up to 55%, thereby improving the application launch time by up to 22% in comparison to the original zswap.
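A minimal sketch of the entropy-based admission test described above; byte-granularity Shannon entropy and the 6.0 bits/byte cutoff are illustrative assumptions, and the full cost-benefit model with recency and eviction cost is omitted:

```python
import math
import os
from collections import Counter

def byte_entropy(page: bytes) -> float:
    """Shannon entropy of the page in bits per byte (0..8)."""
    if not page:
        return 0.0
    n = len(page)
    return -sum(c / n * math.log2(c / n) for c in Counter(page).values())

def admit_to_compressed_cache(page: bytes, max_entropy: float = 6.0) -> bool:
    """High-entropy pages (e.g., already-compressed media) are unlikely to
    compress well, so swap them out directly instead of caching them."""
    return byte_entropy(page) < max_entropy

# Example: a zero-filled page is admitted; random-looking data is not.
assert admit_to_compressed_cache(bytes(4096))
assert not admit_to_compressed_cache(os.urandom(4096))
```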
ISBN (print): 9781509032945
This paper describes an Efficacious Page Replacement Technique (EPRT) customized for distinct types of NAND flash memory with different cost ratios of the write operation to the read operation. In main memory, each dirty victim page is broken down into a fixed number of flash pages. EPRT assigns a weight to every candidate page and selects the page with the lowest weight as the victim. Dirty pages are of two types: hot dirty pages and cold dirty pages. A most-recent-reference criterion is used to classify the flash pages of a dirty victim page into hot and cold ones. When a dirty victim is selected, EPRT writes back only the dirty flash pages within the victim page block.
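The abstract does not give EPRT's weighting formula; the sketch below assumes a simple linear weight combining a reuse score with the write/read cost ratio times the number of dirty sub-pages, purely to make the victim-selection idea concrete:

```python
def victim_weight(reuse_score: float, dirty_subpages: int,
                  write_read_cost_ratio: float) -> float:
    """Lower weight = better victim: cold pages with few dirty flash
    sub-pages are cheap to evict when flash writes are expensive."""
    return reuse_score + write_read_cost_ratio * dirty_subpages

def pick_victim(pages):
    """pages: iterable of (page_id, reuse_score, dirty_subpages, ratio)."""
    return min(pages, key=lambda p: victim_weight(p[1], p[2], p[3]))[0]

# Example: with writes 4x the cost of reads, the mostly-clean cold page wins.
pages = [("A", 0.9, 3, 4.0), ("B", 0.1, 1, 4.0), ("C", 0.2, 0, 4.0)]
print(pick_victim(pages))  # -> "C"
```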
ISBN (print): 9781467329637
A video sharing system for mobile devices is an important mobile service that lets users share videos with others. Scalability is critical to such a system. Cache technology, which employs cache servers to offload the original server, has been proposed to address the scalability problem. Traditional cache replacement algorithms are not suitable for video sharing systems. In this paper, we propose a cache replacement algorithm called S-LRU for such systems. S-LRU takes the size differences of videos into account. Experimental results demonstrate that S-LRU outperforms the other algorithms.
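One plausible reading of a size-aware LRU cache, sketched below; the abstract does not specify exactly how S-LRU weighs video size, so this simply accounts capacity in bytes and evicts from the LRU end until a new video fits:

```python
from collections import OrderedDict

class SizeAwareLRU:
    """LRU video cache measured in bytes rather than object count."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.cache = OrderedDict()           # video_id -> size, MRU at the end

    def get(self, vid) -> bool:
        if vid in self.cache:
            self.cache.move_to_end(vid)
            return True                      # hit
        return False                         # miss

    def put(self, vid, size: int) -> None:
        if vid in self.cache:
            self.used -= self.cache.pop(vid)
        if size > self.capacity:
            return                           # never cache oversized videos
        while self.used + size > self.capacity:
            _, evicted = self.cache.popitem(last=False)  # evict LRU video
            self.used -= evicted
        self.cache[vid] = size
        self.used += size
```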
In high-end data processing systems, such as databases, the execution concurrency level has risen continuously since the introduction of multicore processors. This happens both on premises and in the cloud. For these systems, highly scalable buffer pool management plays an important role in overall system performance. The scalability of buffer pool management is largely determined by its data replacement algorithm, a major component of the buffer pool management, which can seriously degrade scalability if not designed and implemented properly. The root cause is its use of lock-protected data structures, which incur high contention under concurrent accesses. A common practice is to modify the replacement algorithm to reduce contention on the lock(s), such as approximating LRU replacement with the CLOCK algorithm or partitioning the data structures and using distributed locks. Unfortunately, the modification usually compromises the algorithm's hit ratio, a major performance goal. It may also involve significant effort to overhaul the original algorithm's design and implementation. This paper provides a general solution that improves the scalability of buffer pool management with any replacement algorithm, for data processing systems on physical on-premises machines and on virtual machines in the cloud. Instead of making a difficult trade-off between the high hit ratio of a replacement algorithm and the low lock contention of its approximation, we design a system framework, called BP-Wrapper, that eliminates almost all lock contention without requiring any changes to an existing algorithm. In BP-Wrapper, we use a dynamic batching technique and a prefetching technique to reduce lock contention and retain a high hit ratio. The implementation of BP-Wrapper in PostgreSQL adds only about 300 lines of C code. It can increase throughput by up to two-fold compared with replacement algorithms suffering lock contention when running TPC-C-like and TPC-W-like workloads.
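A minimal sketch of the dynamic batching idea: each thread logs page accesses lock-free in a private buffer and amortizes one lock acquisition over a whole batch. The wrapper class, the on_access hook on the wrapped algorithm, and the batch size are all illustrative assumptions, not BP-Wrapper's actual interface:

```python
import threading

class BatchedReplacementWrapper:
    """Wraps an unmodified replacement algorithm; threads log accesses in
    private buffers and take the shared lock once per batch."""

    def __init__(self, replacement_algo, batch_size: int = 64):
        self.algo = replacement_algo         # e.g., an LRU or LIRS object
        self.lock = threading.Lock()
        self.local = threading.local()       # per-thread access buffer
        self.batch_size = batch_size

    def record_access(self, page_id) -> None:
        buf = getattr(self.local, "buf", None)
        if buf is None:
            buf = self.local.buf = []
        buf.append(page_id)                  # lock-free fast path
        if len(buf) >= self.batch_size:
            with self.lock:                  # one acquisition per batch
                for pid in buf:
                    self.algo.on_access(pid)  # assumed hook on the algorithm
            buf.clear()
```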
This work shows how, by adapting the replacement policies in contemporary cache hierarchies, it is possible to extend the lifespan of a write-endurance-limited main memory by almost one order of magnitude. The idea stems from two observations: 1) during cache residency, blocks are modified in a bimodal way: either most of the content of the block is modified or most of it never changes, and 2) in most applications, the majority of blocks are only slightly modified. When the cache replacement algorithm takes these facts into account, it is possible to significantly reduce the number of bit flips per write-back to main memory. Our proposal favors the off-chip eviction of slightly modified blocks according to an adaptive replacement algorithm that operates in coordination across L2 and L3. In this way, system memory lifetime improves significantly with negligible performance degradation. We found that a few bits per block are enough to track changes in cache blocks with respect to the main memory content. With a slightly modified sectored LRU and a simple cache performance predictor, a simple implementation is achievable with minimal area cost and no impact on cache access time. On average, our proposal increases the memory lifetime obtained with an LRU policy by up to 10 times (10x), and by up to 15 times (15x) when combined with other memory-centric techniques. In both cases, the performance degradation can be considered negligible.
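One way to picture the policy; the hard two-class split and the one-dirty-sector cutoff below are assumptions for illustration, standing in for the paper's adaptive mechanism:

```python
def select_victim(blocks, dirty_sector_limit: int = 1):
    """blocks: list of (tag, lru_age, dirty_sectors); larger age = older.
    Prefer the oldest among the slightly modified blocks, so write-backs
    flip as few bits as possible in the endurance-limited main memory."""
    slightly = [b for b in blocks if b[2] <= dirty_sector_limit]
    pool = slightly or blocks                # fall back to plain LRU
    return max(pool, key=lambda b: b[1])[0]

# Example: block "B" is older, but "A" has fewer dirty sectors and wins.
print(select_victim([("A", 5, 0), ("B", 9, 6)]))  # -> "A"
```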
Babai and Pak demonstrated a weakness in the product replacement algorithm, a widely used heuristic algorithm intended to rapidly generate nearly uniformly distributed random elements in a finite group G. It was an open question whether the same weakness could be exhibited if one considers only finite solvable groups. We give an affirmative answer to this question. We consider the distribution of the first component in a k-tuple chosen uniformly from the set of all k-tuples generating G, and construct an infinite sequence of finite solvable groups G for which this distribution turns out to be very far from uniform.
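For reference, a compact sketch of the product replacement random walk itself; the burn-in length and the choice among left/right multiplication variants are illustrative:

```python
import random

def pra_step(tup, mul, inv):
    """One product replacement move: pick distinct i != j and replace
    tup[i] by a product of tup[i] with tup[j] or its inverse."""
    i, j = random.sample(range(len(tup)), 2)
    g = tup[j] if random.random() < 0.5 else inv(tup[j])
    new = list(tup)
    new[i] = mul(new[i], g) if random.random() < 0.5 else mul(g, new[i])
    return tuple(new)

def random_element(generators, mul, inv, burn_in=200):
    """Heuristic sampler: walk on generating k-tuples, then return the
    first component, whose distribution the paper shows can be far from
    uniform for suitable solvable groups."""
    tup = tuple(generators)
    for _ in range(burn_in):
        tup = pra_step(tup, mul, inv)
    return tup[0]

# Demo on S_4, with permutations as tuples (p maps i to p[i]).
def mul(p, q):  # composition: (p*q)(i) = p[q[i]]
    return tuple(p[q[i]] for i in range(len(p)))

def inv(p):
    out = [0] * len(p)
    for i, v in enumerate(p):
        out[v] = i
    return tuple(out)

gens = [(1, 0, 2, 3), (1, 2, 3, 0)]  # a transposition and a 4-cycle generate S_4
print(random_element(gens, mul, inv))
```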