检索结果-内蒙古大学图书馆

Gecko: Efficient Sliding Window Aggregation With Granular-Based Bulk Eviction Over Big Data streams

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 2025年第2期37卷 698-709页

作者： Li, Jianjun Deng, Yuhui Huang, Jiande Yi, Zhou Yang, Qifen Min, Geyong Jinan Univ Dept Comp Sci Guangzhou 510632 Peoples R China Columbus State Univ TSYS Sch Comp Sci Columbus GA 31907 USA Univ Exeter Coll Engn Math & Phys Sci Dept Comp Sci Exeter EX4 4QF England

Sliding window aggregation, which extracts summaries from data streams, is a core operation in streaming analysis. Though existing sliding window algorithms that perform single eviction and insertion operations can achieve a worst-case time complexity of O(1) for in-order streams, real-world data streams often involve out-of-order data and exhibit burst data characteristics, which pose performance challenges to these sliding window algorithms. To address this challenging issue, we propose Gecko - a novel sliding window aggregation algorithm that supports bulk eviction. Gecko leverages a granular-based eviction strategy for various bulk sizes, enabling efficient bulk eviction while maintaining the performance close to that of in-order stream algorithms for single evictions. For large data bulks, Gecko performs coarse-grained eviction at the chunk level, followed by fine-grained eviction using leftward binary tree aggregation (LTA) as a complementary method. Moreover, Gecko partitions data based on chunks to prevent the impacts of out-of-order data on other chunks, thereby enabling efficient handling of out-of-order data streams. We conduct extensive experiments to evaluate the performance of Gecko. Experimental results demonstrate that Gecko exhibits superior performance over other solutions, which is consistent with theoretical expectations. In real-world data scenarios, Gecko improves the average throughput of the state-of-the-art algorithm b_FiBA by 1.7 times, with a maximum improvement of up to 3.5 times. Gecko also demonstrates the best latency performance among all compared schemes.

关键词： streams Windows Out of order Binary trees Throughput Aggregates Partitioning algorithms Time complexity Indexes Computer science Bulk eviction incremental computation out-of-order data streams sliding window aggregation stream algorithm

来源：评论

学校读者我要写书评

暂无评论

Fine-Grained HTTP Web Traffic Analysis Based on Large-Scale Mobile Datasets

引用

IEEE ACCESS 2016年 4卷 4364-4373页

作者： Fang, Cheng Liu, Jun Lei, Zhenming Beijing Univ Posts & Telecommun Sch Informat & Commun Engn Beijing 100876 Peoples R China

Mobile web traffic analytics helps mobile network operators to understand subscriber behaviors and network characteristics. With the increasing prevalence of mobile devices, people are prone to access Internet by mobile devices. Meanwhile, a large amount of applications and services are being moved to the mobile web. These changes have resulted in the elevated complexity of mobile web traffic. Consequently, original web traffic models require an adjustment, and network understanding methods also need to update. In this paper, we proposed a stream algorithm to identify user click requests, and reconstructed user-browser interactions by leveraging Spark streaming framework. The proposed algorithm was tested by massive real HTTP traffic records, which were captured from a cellular core network by high-performance monitoring devices. A statistical analysis was made on the reconstructed data set, and the overall characteristics of mobile web traffic were presented as well. Finally, major improvements in mobile web traffic models were obtained, and the key factors affecting web performance were found. We believe that our models and findings can be useful for mobile network operators to exhaustively understand the mobile web traffic and effectively analyze subscriber behaviors.

关键词： Mobile web web traffic stream algorithm Spark streaming traffic models

来源：评论

学校读者我要写书评

暂无评论

streaming algorithms for Halo Finders 11

Streaming Algorithms for Halo Finders

引用

IEEE International Conference On eScience

作者： Liu, Zaoxing Ivkin, Nikita Yang, Lin F. Neyrinck, Mark Lemson, Gerard Szalay, Alexander S. Braverman, Vladimir Budavari, Tamas Burns, Randal Wang, Xin Johns Hopkins Univ Dept Comp Sci Baltimore MD 21218 USA Johns Hopkins Univ Dept Phys & Astron Baltimore MD 21218 USA

ISBN: (纸本)9781467393256

Cosmological N-body simulations are essential for studies of the large-scale distribution of matter and galaxies in the Universe. This analysis often involves finding clusters of particles and retrieving their properties. Detecting such "halos" among a very large set of particles is a computationally intensive problem, usually executed on the same super-computers that produced the simulations, requiring huge amounts of memory. Recently, a new area of computer science emerged. This area, called streaming algorithms, provides new theoretical methods to compute data analytics in a scalable way using only a single pass over a data sets and logarithmic memory. The main contribution of this paper is a novel connection between the N-body simulations and the streaming algorithms. In particular, we investigate a link between halo finders and the problem of finding frequent items (heavy hitters) in a data stream, that should greatly reduce the computational resource requirements, especially the memory needs. Based on this connection, we can build a new halo finder by running efficient heavy hitter algorithms as a black-box. We implement two representatives of the family of heavy hitter algorithms, the Count-Sketch algorithm (CS) and the Pick-and-Drop sampling (PD), and evaluate their accuracy and memory usage. Comparison with other halo-finding algorithms from [1] shows that our halo finder can locate the largest haloes using significantly smaller memory space and with comparable running time. This streaming approach makes it possible to run and analyze extremely large data sets from N-body simulations on a smaller machine, rather than on supercomputers. Our findings demonstrate the connection between the halo search problem and streaming algorithms as a promising initial direction of further research.

关键词： Cosmology Halo finder N-body simulation stream algorithm

来源：评论

学校读者我要写书评

暂无评论

MONLAD: Money Laundering Agents Detection in Transaction streams 22

MONLAD: Money Laundering Agents Detection in Transaction Str...

引用

15th ACM International Conference on Web Search and Data Mining (WSDM)

作者： Sun, Xiaobing Feng, Wenjie Liu, Shenghua Xie, Yuyang Bhatia, Siddharth Hooi, Bryan Wang, Wenhan Cheng, Xueqi Chinese Acad Sci Inst Comp Technol Beijing Peoples R China Univ Chinese Acad Sci Beijing Peoples R China Natl Univ Singapore Inst Data Sci Singapore Singapore Natl Univ Singapore Singapore Singapore Tsinghua Univ Beijing Peoples R China Tencent Technol Co Ltd Shenzhen Peoples R China

ISBN: (纸本)9781450391320

Given a stream of money transactions between accounts in a bank, how can we accurately detect money laundering agent accounts and suspected behaviors in real-time? Money laundering agents try to hide the origin of illegally obtained money by dispersive multiple small transactions and evade detection by smart strategies. Therefore, it is challenging to accurately catch such fraudsters in an unsupervised manner. Existing approaches do not consider the characteristics of those agent accounts and are not suitable to the streaming settings. Therefore, we propose MONLAD and MONLAD-W to detect money laundering agent accounts in a transaction stream by keeping track of their residuals and other features;we devise ANOSCORE algorithm to find anomalies based on the robust measure of statistical deviation. Experimental results show that MONLAD outperforms the state-of-the-art baselines on real-world data and finds various suspicious behavior patterns of money laundering. Additionally, several detected suspected accounts have been manually-verified as agents in real money laundering scenario.

关键词： Anomaly detection Money laundering stream algorithm

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：