ISBN (print): 9798350355543
We develop a distributed-memory parallel algorithm for performing batch updates on streaming graphs, where vertices and edges are continuously added or removed. Our algorithm leverages distributed sparse matrices as the core data structures, utilizing equivalent sparse matrix operations to execute graph updates. By reducing unnecessary communication among processes and employing shared-memory parallelism, we accelerate updates of distributed graphs. Additionally, we maintain a balanced load in the output matrix by permuting the resultant matrix during the update process. We demonstrate that our streaming update algorithm is at least 25 times faster than alternative linear-algebraic methods and scales linearly up to 4,096 cores (32 nodes) on a Cray EX supercomputer.
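A minimal single-node sketch of the linear-algebraic view described above, using SciPy rather than the paper's distributed data structures: a batch of edge updates is expressed as a sparse delta matrix and merged into the adjacency matrix with ordinary sparse operations. The graph size and update batch are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix

n = 6  # number of vertices (illustrative)
# Current adjacency matrix with two existing edges: (0,1) and (1,2).
A = csr_matrix((np.ones(2), (np.array([0, 1]), np.array([1, 2]))), shape=(n, n))

# One update batch: +1 entries insert edges, -1 entries delete them.
rows = np.array([2, 4, 0])
cols = np.array([3, 5, 1])
vals = np.array([1.0, 1.0, -1.0])   # insert (2,3) and (4,5); delete (0,1)
delta = csr_matrix((vals, (rows, cols)), shape=(n, n))

A = A + delta            # apply the whole batch as one sparse addition
A.data[A.data < 0] = 0   # clamp any deletion of a non-existent edge
A.eliminate_zeros()      # drop entries zeroed out by deletions
print(A.nnz, "edges after the batch update")  # -> 3
```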
ISBN (print): 9798331527495
The proceedings contain 236 papers. The topics discussed include: harnessing quantum computing for next-generation intelligent systems; bits to qubits: an overview of quantum computing; exploring the potential of artificial intelligence and machine learning in fog computing, cloud computing and edge computing: benefits and challenges; network coding for fault-tolerant transmission of biomedical data; emotion-aware speech translation: a review; density based traffic control system; distributed-memory fast Fourier transforms in the exascale era; subtitling of crowd sensitive noise and vibration data using cloud computing; intelligent posture detection system for improved ergonomics; electric vehicles: types, advantages, difficulties, and possible remedies for broad adoption: a review; and machine learning with artificial intelligence: towards a broad consensus.
ISBN (digital): 9798331504205
ISBN (print): 9798331504212
Spark is a distributed computing implementation based on the MapReduce model and retains the advantages of Hadoop MapReduce. Unlike MapReduce, however, intermediate outputs and job results can be kept in memory, eliminating the need to read and write HDFS. Spark is therefore better suited to iterative MapReduce-style algorithms such as those used in data mining and machine learning. The Spark distributed framework has been widely adopted in the Internet, advertising, finance, and other industries, but it has seen little use in the power industry. Research on the market power risk of power generators is already extensive, yet there is little work on simulation-platform algorithms for identifying market power risk. This article proposes a market power identification platform for power generation enterprises based on the Spark distributed framework, which substantially accelerates the calculation of market power indicators. Using a regional power grid as a simulation example, the monopoly position of individual power generation enterprises and of the wholesale market as a whole is analyzed, demonstrating the effectiveness of the Spark-based algorithm. As the number of grid nodes increases, the advantages of the distributed algorithm become more pronounced.
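A hypothetical PySpark fragment illustrating the in-memory caching behavior described above (not the paper's platform); the generator records and the indicator computed here are placeholders, and a real job would load data with sc.textFile from HDFS instead of inline records.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("market-power-indicators").getOrCreate()
sc = spark.sparkContext

# (generator_id, cleared_mw) pairs -- placeholder data
bids = sc.parallelize([("G1", 120.0), ("G2", 80.0), ("G1", 60.0), ("G3", 200.0)])
bids.cache()  # keep the intermediate RDD in memory across iterations

for _ in range(5):  # iterative indicator refinement over the cached data
    totals = dict(bids.reduceByKey(lambda a, b: a + b).collect())
    market_total = sum(totals.values())
    shares = {g: mw / market_total for g, mw in totals.items()}  # market shares

print(shares)
spark.stop()
```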
ISBN (print): 9798350355543
Graph-based approximate nearest neighbor algorithms have shown high neighbor structure representation quality. NN-Descent is a widely known graph-based approximate nearest neighbor (ANN) algorithm. However, graph-based approaches are memory- and compute-intensive. To address these drawbacks, we develop a scalable distributed NN-Descent. Our NEO-DNND (neighbor-checking efficiency optimized distributed NN-Descent) is built on top of MPI and designed to utilize network bandwidth efficiently. NEO-DNND reduces duplicate elements, increases intra-node data sharing, and leverages available DRAM to replicate data that may be sent. NEO-DNND showed remarkable scalability up to 256 nodes and was able to construct neighborhood graphs from billion-scale datasets. Compared to a leading shared-memory ANN library, NEO-DNND achieved competitive performance even on a single node and exhibited 41.7X better performance by scaling up to 32 nodes. Furthermore, NEO-DNND outperformed a state-of-the-art distributed NN-Descent implementation, achieving up to a 6.0X speedup.
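A single-node Python sketch of the underlying NN-Descent idea (not NEO-DNND's MPI implementation): candidate neighbors are proposed from neighbors-of-neighbors, and each point keeps the k closest candidates seen so far. The dataset, k, and iteration count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))  # synthetic points
k = 8
n = len(X)

# Start from random neighbor lists (excluding each point itself).
knn = np.array([
    rng.choice(np.delete(np.arange(n), i), size=k, replace=False)
    for i in range(n)
])

def dist(i, j):
    return float(np.linalg.norm(X[i] - X[j]))

for _ in range(5):  # a few NN-Descent refinement passes
    for i in range(n):
        candidates = set()
        for j in knn[i]:
            candidates.update(knn[j])   # neighbors of neighbors
        candidates.discard(i)
        pool = list(set(knn[i]) | candidates)
        pool.sort(key=lambda j: dist(i, j))
        knn[i] = np.array(pool[:k])     # keep the k closest candidates
```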
ISBN (digital): 9798331507695
ISBN (print): 9798331507701
Driving assist applications and connected autonomous vehicle systems are supported by AI models and algorithms that process and analyze heavy data volumes. High-performance computing units and large memory systems support these models, algorithms, and applications, resulting in additional onboard energy consumption. The current trend is also toward full electrification of vehicles and increasing connectivity in the vehicular ecosystem to support collaborative and distributed applications using vehicle-edge-cloud computing. However, with the increased focus on model performance and improving the accuracy of these models and applications, the issue of high-performance computing requirements and the resulting energy consumption is overlooked. The problem becomes more challenging and complex for resource-constrained edge devices, which are battery-dependent and have limited memory and computing power. This paper proposes components for an adaptive framework that reduces energy consumption by trading it off against model accuracy. The contributions include proposing and integrating model partition mechanisms, adaptive deployment across edge devices, and approximation strategies for the models. By integrating these components, the framework supports energy-aware development across various platforms. The approach offers a sustainable method for computing- and communication-oriented applications within the vehicular ecosystem.
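An illustrative sketch of the accuracy-versus-energy selection idea only; the model variants, accuracy figures, and energy costs below are hypothetical placeholders, not values from the paper.

```python
# Hypothetical model variants an edge device could deploy.
VARIANTS = [
    {"name": "full",        "accuracy": 0.95, "energy_j": 12.0},
    {"name": "partitioned", "accuracy": 0.93, "energy_j": 7.5},
    {"name": "approximate", "accuracy": 0.90, "energy_j": 3.0},
]

def select_variant(energy_budget_j, min_accuracy):
    # Keep only variants that fit the energy budget and the accuracy floor,
    # then prefer the most accurate of those.
    feasible = [v for v in VARIANTS
                if v["energy_j"] <= energy_budget_j and v["accuracy"] >= min_accuracy]
    return max(feasible, key=lambda v: v["accuracy"]) if feasible else None

print(select_variant(energy_budget_j=8.0, min_accuracy=0.90))  # -> "partitioned"
```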
Processing-in-memory (PIM) architectures are promising for accelerating intensive workloads due to their high internal bandwidth. This paper introduces a technique for accelerating Fully Homomorphic Encryption over th...
ISBN (digital): 9798331524937
ISBN (print): 9798331524944
Transitive closure computation is a fundamental operation in graph theory with applications in various domains. However, the increasing size and complexity of real-world graphs make traditional algorithms inefficient, especially when dealing with large datasets. This paper investigates the optimization of transitive closure algorithms for high-performance computing (HPC) applications. We implement and compare three different methods for computing the adjacency matrix of the transitive closure, based on three different Python libraries (NetworkX, PyTorch, and NumPy). Our approach is benchmarked on seven real-world datasets of varying size and density to evaluate performance and scalability. The results show that NumPy achieves the best performance for large and dense graphs. The paper concludes with a discussion of the potential benefits of algorithmic optimization in HPC and security.
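A small NumPy sketch of the adjacency-matrix approach compared above: the transitive closure is obtained by repeated Boolean matrix products until reachability stops changing. The example graph is illustrative, and the exact NumPy formulation used in the paper may differ.

```python
import numpy as np

# Directed path graph 0 -> 1 -> 2 -> 3 (illustrative).
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 0]], dtype=bool)

def transitive_closure(adj):
    reach = adj | np.eye(len(adj), dtype=bool)   # each vertex reaches itself
    while True:
        # One Boolean "squaring" step: follow paths through one more hop set.
        nxt = reach | ((reach.astype(int) @ reach.astype(int)) > 0)
        if np.array_equal(nxt, reach):
            return nxt
        reach = nxt

print(transitive_closure(A).astype(int))
```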
ISBN (print): 9798350355543
Scalable data management is essential for processing large scientific datasets on HPC platforms for distributed deep learning. In-memory distributed storage is preferred for its speed, enabling the rapid, random, and frequent data access required by stochastic optimizers. Processes use one-sided or collective communication to fetch remote data, with optimal performance depending on (i) dataset characteristics, (ii) training scale, and (iii) the interconnection network. Empirical analysis shows collective communication excels with larger mini-batch sizes and/or fewer processes, whereas one-sided communication outperforms at larger scales. We propose MDLoader, a hybrid in-memory data loader for distributed graph neural network training. MDLoader features a model-driven performance estimator that dynamically selects between one-sided and collective communication at the beginning of training using Tree of Parzen Estimators (TPE). Evaluations on NERSC Perlmutter and OLCF Summit show MDLoader outperforms single-backend loaders by up to 2.83× and predicts the suitable communication method with 96.3% (Perlmutter) and 94.3% (Summit) success rates.
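A toy cost-model sketch of the backend-selection idea described above (not MDLoader's TPE-based estimator): the choice between collective and one-sided communication is made from mini-batch size and process count, with all constants hypothetical.

```python
def pick_backend(mini_batch_size, num_processes,
                 collective_cost_per_proc=1.0, one_sided_cost_per_item=0.01):
    # Toy model: collective cost grows with the number of processes, while
    # one-sided cost grows with the number of items fetched per mini-batch.
    collective_cost = collective_cost_per_proc * num_processes
    one_sided_cost = one_sided_cost_per_item * mini_batch_size
    return "collective" if collective_cost <= one_sided_cost else "one-sided"

print(pick_backend(mini_batch_size=4096, num_processes=16))    # large batch, few procs
print(pick_backend(mini_batch_size=256,  num_processes=1024))  # many processes
```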
ISBN (digital): 9798331515683
ISBN (print): 9798331515690
Product recommendations have become integral to business development and customer engagement strategies. While traditional techniques like market basket analysis are effective in retail, they are insufficient for the financial sector due to the diversity of banking products and the lack of direct dependency among them. This research paper proposes a novel distributed framework for developing an automated recommendation system that leverages XGBoost to predict banking product purchases based on categorized customer profiles. The framework integrates distributed computing technologies such as Hadoop for data storage and PySpark for scalable data processing. The system classifies customers into distinct categories based on their transaction history, demographic attributes, and banking behaviors to deliver personalized product suggestions. Model performance is evaluated using metrics such as the ROC curve, precision, recall, and F1-score. Empirical results demonstrate the system's effectiveness in enhancing lead generation, targeting high-potential customers, and improving customer satisfaction in real-world financial markets.
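A hedged sketch of the modeling and evaluation step only (not the Hadoop/PySpark pipeline): an XGBoost classifier is fit on synthetic customer-profile features and scored with the metrics named above. The features, labels, and threshold are made up for illustration.

```python
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, precision_score, recall_score, f1_score

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 6))                   # stand-ins for transaction/demographic features
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)    # 1 = likely to purchase the product (synthetic)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = XGBClassifier(n_estimators=100, max_depth=4, eval_metric="logloss")
model.fit(X_tr, y_tr)

proba = model.predict_proba(X_te)[:, 1]
pred = (proba >= 0.5).astype(int)
print("AUC      :", roc_auc_score(y_te, proba))
print("Precision:", precision_score(y_te, pred))
print("Recall   :", recall_score(y_te, pred))
print("F1       :", f1_score(y_te, pred))
```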