Kernel density estimation is a useful method for estimating the probability distribution of data. Achieving efficient kernel density estimation is challenging, especially for large-scale, high-dimensional stream data. We propose the rotation kernel, a novel kernel function for density estimation. The rotation kernel density can be estimated quickly with a data structure named the Rotation Kernel Density Sketch (RKDS). RKDS is a time- and memory-efficient method for kernel density estimation, even over data streams and distributed systems. RKDS is applicable both to estimating density at specific points and to representing the data distribution. We provide theoretical analysis of the rotation kernel and RKDS. Furthermore, we apply RKDS to outlier detection, concept drift detection, and personalized federated learning. Experiments show that our method improves time efficiency by up to 3 × 10^3 times compared with baselines. RKDS also provides comparable detection precision and better delay on outlier detection and concept drift detection tasks.
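For readers unfamiliar with the baseline problem, the following is a minimal sketch of ordinary one-dimensional kernel density estimation with a Gaussian kernel. It is not the rotation kernel or RKDS, neither of which the abstract specifies; the function name `gaussian_kde`, the sample values, and the bandwidth are illustrative assumptions. It shows why the naive approach is costly on streams: every query touches every stored sample.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x from 1-D samples.

    Plain Gaussian KDE (hypothetical helper, not the paper's rotation kernel):
    f(x) = 1 / (n * h) * sum_i K((x - x_i) / h), with K the standard normal pdf.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if not samples:
        raise ValueError("samples must be non-empty")
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Each query is O(n): it sums one kernel term per stored sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Illustrative data: four points near 0 and one isolated point at 3.
samples = [0.0, 0.1, -0.2, 0.05, 3.0]
density = gaussian_kde(samples, bandwidth=0.5)
```

Here `density(0.0)` is larger than `density(3.0)`, since four of the five samples lie near zero. A low density at a query point is the usual signal used for outlier detection.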
In this paper, we study steady flows in data streams, that is, flows whose arrival rate is always non-zero and stays around a fixed value for several consecutive time windows. To find steady flows in real time, we propose a novel sketch-based algorithm, Steadysketch, which aims to report steady flows accurately with limited memory. To the best of our knowledge, this is the first work to define and find steady flows in data streams. The key novelty of Steadysketch is our proposed reborn technique, which reduces the memory requirement by 75%. Our theoretical proofs show that the negative impact of the reborn technique is small. Experimental results show that, compared with the two comparison schemes, Steadysketch improves the Precision Rate (PR) by around 79.5% and 82.8%, and reduces the Average Relative Error (ARE) by around 905.9× and 657.9×, respectively. Finally, we provide three concrete cases: cache prefetch, Redis, and a P4 implementation. As we will demonstrate, Steadysketch can effectively improve the cache hit ratio while achieving satisfactory performance on both Redis and Tofino switches. All related code for Steadysketch is available on GitHub.
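The definition of a steady flow can be illustrated with an exact-counting baseline. This is only a sketch of the definition, not Steadysketch or its reborn technique. The function name `find_steady_flows`, the parameters `k` and `tolerance`, and the reading of "around a fixed value" as "maximum and minimum per-window counts differ by at most `tolerance`" are all assumptions made for illustration.

```python
from collections import Counter


def find_steady_flows(windows, k, tolerance):
    """Return flows that look steady over the last k windows.

    windows: list of lists of flow IDs, one inner list per time window.
    A flow counts as steady (under the assumed definition) if its per-window
    count is non-zero in each of the last k windows and the counts differ by
    at most `tolerance`.
    """
    if k <= 0 or len(windows) < k:
        return set()
    counts = [Counter(w) for w in windows[-k:]]
    steady = set()
    # Only flows present in the first of the k windows can be non-zero in all.
    for flow in counts[0]:
        rates = [c[flow] for c in counts]  # a missing flow counts as 0
        if min(rates) > 0 and max(rates) - min(rates) <= tolerance:
            steady.add(flow)
    return steady


# Illustrative stream: "a" arrives 3, 3, 2 times; "b" arrives 1, 5, 0 times.
windows = [
    ["a", "a", "a", "b"],
    ["a", "a", "a", "b", "b", "b", "b", "b"],
    ["a", "a", "c"],
]
result = find_steady_flows(windows, k=3, tolerance=1)
```

With this input, only flow `"a"` qualifies. Exact counting keeps one counter per flow per window, which is the memory cost a sketch-based approach is meant to avoid.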
ISBN: 9781665423168 (print)
Sketch algorithms are adept at detecting Heavy Hitters accurately in real time with low memory overhead. However, conventional wisdom in network measurement concentrates mainly on two key properties, accuracy and memory efficiency, which target only the data plane. This paper aims to achieve timeliness and communication efficiency in Heavy Hitter detection by properly coordinating the interplay between the data plane and the control plane, thereby further relieving the burden on the control plane. Results show that we maintain accuracy in Heavy Hitter detection while reducing the delay from microseconds to nearly zero and the communication overhead by up to 70%.
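As background on sketch-based Heavy Hitter detection, the following is a minimal Count-Min sketch in Python. The abstract does not name the sketch it uses, so this is a generic, well-known structure chosen for illustration, and it does not model the data-plane and control-plane coordination the paper proposes. The class name, the `width` and `depth` values, the flow names, and the threshold are assumptions.

```python
import hashlib


class CountMinSketch:
    """Generic Count-Min sketch (illustrative, not the paper's design)."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # One salted hash per row; the row number serves as the salt.
        digest = hashlib.blake2b(
            key.encode(), salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, key, count=1):
        for r in range(self.depth):
            self.rows[r][self._index(key, r)] += count

    def estimate(self, key):
        # Collisions only inflate counters, so the minimum never underestimates.
        return min(
            self.rows[r][self._index(key, r)] for r in range(self.depth)
        )


# Illustrative stream: one heavy flow and ten light flows.
stream = ["heavy"] * 50 + [f"light{i}" for i in range(10)]
sketch = CountMinSketch(width=64, depth=4)
for flow in stream:
    sketch.add(flow)

threshold = 25  # hypothetical Heavy Hitter threshold
heavy_hitters = {f for f in set(stream) if sketch.estimate(f) >= threshold}
```

Because the estimate never falls below the true count, the heavy flow is always reported; light flows can appear as false positives only when hash collisions inflate every one of their counters.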