Search results - Inner Mongolia University Library

A distributed streaming framework for edge-cloud triangle counting in graph streams

KNOWLEDGE-BASED SYSTEMS, 2023, Vol. 278, No. 1

Authors: Yang, Xu; Song, Chao; Gu, Jiqing; Li, Ke; Li, Hongwei. Affiliations: Xian Univ Technol, Sch Engn & Comp Sci, Xian 710049, Shaanxi, Peoples R China; Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 610731, Sichuan, Peoples R China; Chengdu Univ Informat Technol, Sch Comp Sci, Chengdu 610225, Sichuan, Peoples R China

The triangle counting problem in graph streams has been extensively studied in social network analysis, recommendation systems, user portraits and other fields. However, cloud computing based streaming algorithms cause high bandwidth occupation and long transmission latency due to the limited bandwidth of the cloud. Recently, edge computing has shown promise for overcoming the issue of transmitting large-scale data for cloud computing. However, directly applying edge computing to streaming triangle counting reduces the accuracy of the triangle count estimation, due to the limitations of local computing at the edge network. We term the cooperation between edge computing and cloud computing for streaming triangle counting edge-cloud triangle counting in graph streams. In this paper, we first propose a streaming framework for edge-cloud triangle counting in graph streams. Then, we propose a streaming triangle counting algorithm called Trie-based Edge Compression (TbEC), which uses a binary trie at the edge network to enable lossless compression and efficient transmission to the cloud. In addition, to extend our algorithms to triangle counting in multigraphs, we present a dual deduplication strategy that collaboratively uses the trie-based data structure and a Bloom Filter. Our experiments with real-world datasets show that TbEC is (a) Accurate: yielding up to 3.35x smaller estimation error than the state-of-the-art distributed streaming algorithm, (b) Fast: running up to 10.59x faster than the state-of-the-art distributed streaming algorithm, (c) Scalable: scaling linearly with the number of edges in the input graph stream. © 2023 Elsevier B.V. All rights reserved.

Keywords: Triangle counting; Approximate algorithms; streaming graphs; distributed streaming algorithms


DTC: Real-Time and Accurate Distributed Triangle Counting in Fully Dynamic Graph Streams


43rd International Symposium on Reliable Distributed Systems

Authors: Xuan, Wei; Liang, Yan; Cao, Huawei; Lin, Ning; Ye, Xiaochun; Fan, Dongrui. Affiliations: Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China; Zhongguancun Lab, Beijing, Peoples R China; Univ Hong Kong, Hong Kong, Peoples R China

ISBN: (print) 9798331530044; 9798331530037

Triangle counting is a fundamental problem in graph mining, essential for analyzing graph streams with arbitrary edge orders. However, exact counting becomes impractical due to the massive size of real-world graph streams. To address this, approximate algorithms have been developed, but existing distributed streaming algorithms lack adaptability and struggle with edge deletions. In this article, we propose DTC, a novel family of single-pass distributed streaming algorithms for global and local triangle counting in fully dynamic graph streams. Our DTC-AR algorithm accurately estimates triangle counts without prior knowledge of graph size, leveraging multi-machine resources. Additionally, we introduce DTC-FD, an algorithm tailored for fully dynamic graph streams, incorporating edge insertions and deletions. Using Random Pairing and future edge insertion compensation, DTC-FD achieves unbiased and accurate approximations across multiple machines. Experimental results demonstrate significant improvements over baselines. DTC-AR achieves up to 2029.4x and 27.1x more accuracy, while maintaining the best trade-off between accuracy and storage space. DTC-FD reduces estimation errors by up to 32.5x and 19.3x, scaling linearly with graph stream size. These findings highlight the effectiveness of our proposed algorithms in tackling triangle counting in real-world scenarios. The source code and datasets are released and available at https://***/Anonymousview/Real-Time-and-Accurate-distributed-Triangle-Counting-in-Fully-Dynamic-Graph-Streams.

Keywords: triangle counting; real-time processing; fully dynamic graph streams; distributed streaming algorithms


Big Data Stream Learning with SAMOA


14th IEEE International Conference on Data Mining (IEEE ICDM)

Authors: Bifet, Albert; De Francisci Morales, Gianmarco. Affiliations: HUAWEI Noahs Ark Lab, Hong Kong, Hong Kong, Peoples R China; Yahoo Labs, Barcelona, Spain

ISBN: (print) 9781479942749

Big data is flowing into every area of our life, professional and personal. Big data is defined as datasets whose size is beyond the ability of typical software tools to capture, store, manage and analyze, due to the time and memory complexity. Velocity is one of the main properties of big data. In this demo, we present SAMOA (Scalable Advanced Massive Online Analysis), an open-source platform for mining big data streams. It provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks such as classification, clustering, and regression, as well as programming abstractions to develop new algorithms. It features a pluggable architecture that allows it to run on several distributed stream processing engines such as Storm, S4, and Samza. SAMOA is written in Java and is available at http://*** under the Apache Software License version 2.0.

Keywords: Big Data; data mining; learning (artificial intelligence); pattern classification; pattern clustering; regression analysis; Apache Software License version 2.0; Java; S4; SAMOA; Samza; Storm; big data stream learning; big data stream mining; classification task; clustering task; distributed stream processing engines; distributed streaming algorithms; machine learning tasks; open-source platform; pluggable architecture; programming abstractions; regression task; scalable advanced massive online analysis; Algorithm design and analysis; Digital signal processing; Engines; Machine learning algorithms; Storms; Classification; Clustering; Data Streams; Distributed Systems; Machine Learning; Regression; Toolbox; Samoan Islands; Cluster Analysis; tool cases; Java programming language; Pattern recognition; Regression (Psychology)

