The Miss Ratio Curve (MRC) is an important metric and effective tool for caching system performance prediction and optimization. Since the Least Recently Used (LRU) replacement policy is the de facto policy for many e...
详细信息
ISBN:
(纸本)9781450390682
The Miss Ratio Curve (MRC) is an important metric and effective tool for caching system performance prediction and optimization. Since the Least Recently Used (LRU) replacement policy is the de facto policy for many existing caching systems, most previous studies on efficient MRC construction are predominantly focused on the LRU replacement policy. Recently, the random samplingbased replacement mechanism, as opposed to replacement relying on the rigid LRU data structure, gains more popularity due to its lightweight and flexibility. To approximate LRU, at replacement times, the system randomly selects K objects and replaces the least recently used object among the sample. Redis implements this approximated LRU policy. We observe that there can exist a significant miss ratio gap between exact LRU and random sampling-based LRU under different sampling size K;therefore existing LRU MRC construction techniques cannot be directly applied to random sampling based LRU cache without loss of accuracy. In this work, we present a new probabilistic stack algorithm named KRR which can be used to accurately model random sampling based-LRU under arbitrary sampling size K. We propose two efficient stack update algorithms which reduce the expected running time of KRR from O(N * M) to O(N * lo.2M) and O(N * lo.M), respectively, where N is the workload length and M is the number of distinct objects. Furthermore, we adopt spatial sampling which further reduces the running time of KRR by several orders of magnitude, and thus enables practical, low overhead online application of KRR.
Current graphics processing units (GPUs) utilize the single instruction multiple thread (SIMT) execution model. With SIMT, a group of logical threads executes such that all threads in the group execute a single common...
详细信息
ISBN:
(纸本)9781467355872
Current graphics processing units (GPUs) utilize the single instruction multiple thread (SIMT) execution model. With SIMT, a group of logical threads executes such that all threads in the group execute a single common instruction on a particular cycle. To enable control flow to diverge within the group of threads, GPUs partially serialize execution and follow a single control flow path at a time. The execution of the threads in the group that are not on the current path is masked. Most current GPUs rely on a hardware reconvergence stack to track the multiple concurrent paths and to choose a single path for execution. Control flow paths are pushed onto the stack when they diverge and are popped off of the stack to enable threads to reconverge and keep lane utilization high. The stack algorithm guarantees optimal reconvergence for applications with structured control flow as it traverses the structured control-flow tree depth first. The downside of using the reconvergence stack is that only a single path is followed, which does not maximize available parallelism, degrading performance in some cases. We propose a change to the stack hardware in which the execution of two different paths can be interleaved. While this is a fundamental change to the stack concept, we show how dual-path execution can be implemented with only modest changes to current hardware and that parallelism is increased without sacrificing optimal (structured) control-flow reconvergence. We perform a detailed evaluation of a set of benchmarks with divergent control flow and demonstrate that the dual-path stack architecture is much more robust compared to previous approaches for increasing path parallelism. Dual-path execution either matches the performance of the baseline single-path stack architecture or outperforms single-path execution by 14.9% on average and by over 30% in some cases.
In this paper, we introduce a novel criterion to rank puncturing patterns for rate-compatible LDPC codes. Specifically, based on Gaussian approximation density evolution, a cost function is devised to characterize the...
详细信息
ISBN:
(纸本)9781424441471
In this paper, we introduce a novel criterion to rank puncturing patterns for rate-compatible LDPC codes. Specifically, based on Gaussian approximation density evolution, a cost function is devised to characterize the degree distribution of the punctured code matrices, which are derived from a mother code matrix by matrix transformation. This cost function allows us to effectively compare the expected performance of candidate puncturing patterns and to sort out good ones. Combined with well-designed search algorithms, the proposed criterion can be applied on both standardized Block-LDPC codes and generic binary LDPC codes to get good puncturing patterns with manageable complexity. Numerical simulation results verify the effectiveness of the proposed ranking criterion, and demonstrate that a series of good rate-compatible LDPC codes can be obtained by the proposed ranking criterion.
We propose complexity efficient tree search approaches to detection and estimation problems, and we consider both communication and target tracking applications in this regard. We formulate the problems of interest us...
详细信息
We propose complexity efficient tree search approaches to detection and estimation problems, and we consider both communication and target tracking applications in this regard. We formulate the problems of interest using a dynamic state space model, where we seek to estimate the system state from sensor observations. The proposed tree search approach initializes a search tree from a given initially estimated system state. With each new observation, the algorithm expands the search tree to find the best possible system state. The paths through the search tree correspond to sequences of system states. They are evaluated by their associated path metrics, which are proportional to the posterior probability of the system state. The stack algorithm and the M-algorithm, two tree search approaches that were originally used for decoding convolutionally-encoded sequences, are applied to reduce the complexity of the tree search.@pqdt@break@In wireless communication applications, we have considered blind channel equalization of time-varying channels and modulation classification of an unknown received signal. For both applications, tree search approaches are implemented to find the transmitted information sequence that maximizes the posterior probability distribution function of the system state. For blind channel equalization of time-varying channels, an exponentially decaying window, matched to the variation rate of the channel, is implemented in the path metric calculation to successfully equalize an unknown time-varying channel. For modulation classification of challenging high-density QAM (Quadrature Amplitude Modulation) schemes, the statistical properties of the mean square error are derived analytically and compared with those of the estimated sequence to identify the modulation scheme of the received signal. The proposed tree-search approaches provide superior performance in comparison with other competitive approaches.@pqdt@break@In target tracking applications, the p
A systolic array processing technique is applied to implementing the stack algorithm form of the sequential decoding algorithm. It is shown that sorting, a key function in the stack algorithm, can be efficiently reali...
详细信息
A systolic array processing technique is applied to implementing the stack algorithm form of the sequential decoding algorithm. It is shown that sorting, a key function in the stack algorithm, can be efficiently realized by a special type of systolic arrays known as systolic priority queues. Compared to the stack-bucket algorithm, this approach is shown to have the advantages that the decoding always moves along the optimal path, that it has a fast and constant decoding speed and that its simple and regular hardware architecture is suitable for VLSI implementation. Three types of systolic priority queues are discussed: random access scheme, shift register scheme and ripple register scheme. The property of the entries stored in the systolic priority queue is also investigated. The results are applicable to many other basic sorting type problems.
A ray cast algorithm utilizing a hierarchical acceleration structure needs to perform a tree traversal in the hierarchy. In its basic form, executing the traversal requires a stack that holds the nodes that are still ...
详细信息
ISBN:
(纸本)9781617827242
A ray cast algorithm utilizing a hierarchical acceleration structure needs to perform a tree traversal in the hierarchy. In its basic form, executing the traversal requires a stack that holds the nodes that are still to be processed. In some cases, such a stack can be prohibitively expensive to maintain or access, due to storage or memory bandwidth limitations. The stack can, however, be eliminated or replaced with a fixed-size buffer using so-called stackless or short stack algorithms. These require that the traversal can be restarted from root so that the already processed part of the tree is not entered again. For kd-tree ray casts, this is accomplished easily by ray shortening, but the approach does not extend to other kinds of hierarchies such as BVHs. In this paper, we introduce restart trail, a simple algorithmic method that makes restarts possible regardless of the type of hierarchy by storing one bit of data per level. This enables stackless and short stack traversal for BVH ray casts, where using a full stack or constraining the traversal order have so far been the only options.
暂无评论