Search Results - Inner Mongolia University Library

33rd ACM International Conference on Information and Knowledge Management (CIKM)

Authors: Gupta, Vipul; Chen, Xin; Huang, Ruoyun; Meng, Fanlong; Chen, Jianjun; Yan, Yujun

Affiliations: Bytedance, San Jose, CA 95110, USA; Bytedance, Seattle, WA, USA; Dartmouth Coll, Hanover, NH 03755, USA

ISBN: (print) 9798400704369

Graph Neural Networks (GNNs) have emerged as powerful tools for supervised machine learning over graph-structured data, while sampling-based node representation learning is widely utilized in unsupervised learning. However, scalability remains a major challenge in both supervised and unsupervised learning for large graphs (e.g., those with over 1 billion nodes). The scalability bottleneck largely stems from the mini-batch sampling phase in GNNs and the random walk sampling phase in unsupervised methods. These processes often require storing features or embeddings in memory. In the context of distributed training, they require frequent, inefficient random access to data stored across different workers. Such repeated inter-worker communication for each mini-batch leads to high communication overhead and computational inefficiency. We propose graphscale, a unified framework for both supervised and unsupervised learning to store and process large graph data distributedly. The key insight in our design is the separation of workers who store data and those who perform the training. This separation allows us to decouple computing and storage in graph training, thus effectively building a pipeline where data fetching and data computation can overlap asynchronously. Our experiments show that graphscale outperforms state-of-the-art methods for distributed training of both GNNs and node embeddings. We evaluate graphscale both on public and proprietary graph datasets and observe a reduction of at least 40% in end-to-end training times compared to popular distributed frameworks, without any loss in performance. While most existing methods don't support billion-node graphs for training node embeddings, graphscale is currently deployed in production at TikTok enabling efficient learning over such large graphs.

Keywords: Distributed Graph Learning; Node Embedding; billion-node graphs
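The abstract above describes a pipeline in which data fetching and computation overlap asynchronously. A minimal sketch of that general idea follows, using a bounded queue between a fetching thread and a training loop. It is not the authors' implementation: `fetch_batch`, `train_step`, `run_pipeline`, and `prefetch_depth` are hypothetical names introduced only for illustration, and both the storage side and the training step are simulated.

```python
import queue
import threading


def fetch_batch(batch_id):
    # Hypothetical stand-in for a storage worker returning the features
    # of one mini-batch (assumption: four integer features per batch).
    return [batch_id * 10 + i for i in range(4)]


def train_step(batch):
    # Hypothetical stand-in for a training step; it only sums the features.
    return sum(batch)


def run_pipeline(num_batches, prefetch_depth=2):
    # Bounded buffer: the fetcher may run at most prefetch_depth batches ahead.
    buffer = queue.Queue(maxsize=prefetch_depth)
    sentinel = object()  # marks the end of the batch stream

    def producer():
        for batch_id in range(num_batches):
            buffer.put(fetch_batch(batch_id))  # blocks while the buffer is full
        buffer.put(sentinel)

    fetcher = threading.Thread(target=producer, daemon=True)
    fetcher.start()

    results = []
    while True:
        batch = buffer.get()  # blocks until the next batch has been fetched
        if batch is sentinel:
            break
        results.append(train_step(batch))  # runs while the fetcher keeps filling the buffer
    fetcher.join()
    return results
```

In this sketch the bounded queue is the only coupling between the two roles, so the fetching side could be swapped for remote storage workers without touching the training loop.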

GStar: an efficient framework for answering top-k star queries on billion-node knowledge graphs

WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2019, Vol. 22, No. 4, pp. 1611-1638

Authors: Jin, Jiahui; Luo, Junzhou; Khemmarat, Samamon; Dong, Fang; Gao, Lixin

Affiliations: Southeast Univ, Sch Comp Sci & Engn, Nanjing, Jiangsu, Peoples R China; Univ Massachusetts, Dept Elect & Comp Engn, Amherst, MA 01003, USA

Massive knowledge graphs, such as Linked Open Data or Freebase, contain billions of labeled entities and relationships. Star queries aim to identify an entity given a set of related entities, and they are common with massive knowledge graphs. It is important to find the best way to answer star queries, and we can do this by treating it as a graph pattern-matching problem. Because knowledge graphs are noisy and incomplete in nature, we must find answers that match the star pattern closely, and extract a precise match if possible. Thus, here we propose GStar, a framework to identify the top-k best answers for a star query. GStar effectively and efficiently answers top-k star queries on billion-node graphs through a novel query model, an index-free query algorithm, and a distributed query system. We evaluate GStar through experiments on real-world knowledge graphs. Experimental results show that our query model effectively answers real-life star-pattern queries; our query algorithm can answer top-k queries in a near-real-time manner without requiring expensive graph indices; and the distributed system scales well with both the graph size and number of machines used for computation.

Keywords: Graph pattern matching; Knowledge graphs; billion-node graphs; Top-k query; Big data; Distributed system

Semi-External Memory Sparse Matrix Multiplication for billion-node graphs

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2017, Vol. 28, No. 5, pp. 1470-1483

Authors: Zheng, Da; Mhembere, Disa; Lyzinski, Vince; Vogelstein, Joshua T.; Priebe, Carey E.; Burns, Randal

Affiliations: Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218, USA; Johns Hopkins Univ, Dept Appl Math & Stat, Baltimore, MD 21218, USA; Johns Hopkins Univ, Dept Biomed Engn, Baltimore, MD 21218, USA

Sparse matrix multiplication is traditionally performed in memory and scales to large matrices using the distributed memory of multiple nodes. In contrast, we scale sparse matrix multiplication beyond memory capacity by implementing sparse matrix dense matrix multiplication (SpMM) in a semi-external memory (SEM) fashion; i.e., we keep the sparse matrix on commodity SSDs and dense matrices in memory. Our SEM-SpMM incorporates many in-memory optimizations for large power-law graphs. It outperforms the in-memory implementations of Trilinos and Intel MKL and scales to billion-node graphs, far beyond the limitations of memory. Furthermore, on a single large parallel machine, our SEM-SpMM operates as fast as the distributed implementations of Trilinos using five times as much processing power. We also run our implementation in memory (IM-SpMM) to quantify the overhead of keeping data on SSDs. SEM-SpMM achieves almost 100 percent performance of IM-SpMM on graphs when the dense matrix has more than four columns; it achieves at least 65 percent performance of IM-SpMM on all inputs. We apply our SpMM to three important data analysis tasks (PageRank, eigensolving, and non-negative matrix factorization) and show that our SEM implementations significantly advance the state of the art.

Keywords: Sparse matrix multiplication; semi-external memory; billion-node graphs; SSDs
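The semi-external memory idea in the abstract above (sparse matrix streamed from storage, dense matrices held in memory) can be illustrated with a small sketch. This is an assumption-laden toy, not the SEM-SpMM implementation: the sparse matrix is assumed to be in CSR form, `csr_row_blocks` is a hypothetical generator that stands in for reading row blocks from SSD, and `spmm` and `block_rows` are names introduced here for illustration only.

```python
def csr_row_blocks(indptr, indices, data, block_rows):
    """Yield (first_row, local_indptr, indices, data) for consecutive row blocks.

    Stands in for reading one chunk of the sparse matrix from storage at a time.
    """
    n_rows = len(indptr) - 1
    for start in range(0, n_rows, block_rows):
        stop = min(start + block_rows, n_rows)
        lo, hi = indptr[start], indptr[stop]
        # Re-base the row pointers so they index into this block's slices.
        local_indptr = [p - lo for p in indptr[start:stop + 1]]
        yield start, local_indptr, indices[lo:hi], data[lo:hi]


def spmm(indptr, indices, data, dense, block_rows=2):
    """Multiply a CSR sparse matrix by a dense matrix (list of rows)."""
    n_rows = len(indptr) - 1
    n_cols = len(dense[0]) if dense else 0
    out = [[0.0] * n_cols for _ in range(n_rows)]  # dense result stays in memory
    for first_row, local_indptr, idx, val in csr_row_blocks(
            indptr, indices, data, block_rows):
        for r in range(len(local_indptr) - 1):
            row_out = out[first_row + r]
            for k in range(local_indptr[r], local_indptr[r + 1]):
                dense_row = dense[idx[k]]  # row of the in-memory dense matrix
                for j in range(n_cols):
                    row_out[j] += val[k] * dense_row[j]
    return out
```

Only one row block of the sparse matrix is touched at a time, while the dense input and output are accessed freely, which mirrors the memory split the abstract describes.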

Querying Web-Scale Information Networks Through Bounding Matching Scores

24th International Conference on World Wide Web (WWW)

Authors: Jin, Jiahui; Khemmarat, Samamon; Gao, Lixin; Luo, Junzhou

Affiliations: Southeast Univ, Sch Comp Sci & Engn, Nanjing, Jiangsu, Peoples R China; Univ Massachusetts, Dept Elect & Comp Engn, Amherst, MA 01003, USA

ISBN: (print) 9781450334693

Web-scale information networks containing billions of entities are common nowadays. Querying these networks can be modeled as a subgraph matching problem. Since information networks are incomplete and noisy in nature, it is important to discover answers that match exactly as well as answers that are similar to queries. Existing graph matching algorithms usually use graph indices to improve the efficiency of query processing. For web-scale information networks, it may not be feasible to build the graph indices due to the amount of work and the memory/storage required. In this paper, we propose an efficient algorithm for finding the best k answers for a given query without precomputing graph indices. The quality of an answer is measured by a matching score that is computed online. To speed up query processing, we propose a novel technique for bounding the matching scores during the computation. By using bounds, we can efficiently prune low-quality answers without having to evaluate all possible answers. The bounding technique can be implemented in a distributed environment, allowing our approach to efficiently answer queries on web-scale information networks. We demonstrate the effectiveness and the efficiency of our approach through a series of experiments on real-world information networks. The results show that our bounding technique can reduce the running time by up to two orders of magnitude compared to an approach that does not use bounds.

Keywords: Subgraph Matching; Graph Similarity; billion-node graphs; Index-Free Query Processing; Distributed System
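The pruning idea in the abstract above (use bounds on matching scores to skip low-quality answers) can be sketched generically as follows. This is a simple threshold-style top-k search, not the paper's bounding technique: `top_k_with_bounds`, `upper_bound`, and `exact_score` are hypothetical names, and the sketch assumes that `upper_bound(c)` is cheap to compute and never smaller than `exact_score(c)`.

```python
import heapq


def top_k_with_bounds(candidates, upper_bound, exact_score, k):
    """Return (top-k list of (score, candidate), number of exact evaluations).

    Assumption: upper_bound(c) >= exact_score(c) for every candidate c.
    """
    if k <= 0:
        return [], 0
    # Visit candidates from the most to the least promising bound.
    ordered = sorted(candidates, key=upper_bound, reverse=True)
    heap = []  # min-heap holding the best k (score, candidate) pairs so far
    evaluated = 0
    for cand in ordered:
        # Once k answers are held and the best remaining bound cannot beat
        # the current k-th best score, no later candidate can either.
        if len(heap) == k and upper_bound(cand) <= heap[0][0]:
            break
        score = exact_score(cand)  # the expensive step that pruning avoids
        evaluated += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, cand))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, cand))
    return sorted(heap, reverse=True), evaluated
```

The second return value counts how many exact scores were computed, which makes the effect of pruning visible: the tighter the bounds, the earlier the loop stops.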
