Pipeline parallelism is a distributed method used to train deep neural networks and is suitable for tasks that consume large amounts of memory. However, this method entails a large overhead because of the dependency between devices performing forward and backward steps across multiple accelerator devices. Although a method that removes the forward-step dependency through an all-to-all approach has been proposed for training compute-intensive models, it incurs a large overhead when training with many devices and is inefficient with respect to weight-memory consumption. We instead propose a pipeline parallelism method that reduces network communication using a self-generation concept and lowers overhead by minimizing the weight memory used for acceleration. In a DarkNet53 training-throughput experiment using six devices, the proposed method reduces overhead and communication costs by approximately 63.7% relative to a baseline and consumes approximately 17.0% less memory.
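The inter-device dependency this abstract refers to can be quantified with the standard "pipeline bubble" analysis for synchronous (GPipe-style) schedules; this sketch is illustrative background, not the paper's method, and the microbatch count of 8 is an assumed example value.

```python
# Illustrative sketch (not the paper's method): with D pipeline stages and
# M microbatches, a synchronous schedule needs M + D - 1 time steps instead
# of the ideal M, so the idle ("bubble") fraction is (D - 1) / (M + D - 1).

def bubble_fraction(devices: int, microbatches: int) -> float:
    """Fraction of device time lost to pipeline fill and drain."""
    total_steps = microbatches + devices - 1
    return (devices - 1) / total_steps

# With 6 devices (as in the DarkNet53 experiment) and an assumed 8 microbatches:
print(round(bubble_fraction(6, 8), 3))  # → 0.385
```

Increasing the microbatch count shrinks the bubble, which is why dependency-removal schemes matter most when memory limits keep M small.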
Owing to the increasing size of real-world networks, their processing using classical techniques has become infeasible. The amount of storage and central processing unit time required for processing large networks is far beyond the capabilities of a high-end computing machine. Moreover, real-world network data are generally distributed in nature because they are collected and stored on distributed platforms. This has popularized the use of MapReduce, a distributed data processing framework, for analyzing real-world network data. Existing MapReduce-based methods for connected-component detection mainly struggle to minimize the number of MapReduce rounds and the amount of data generated and forwarded to subsequent rounds. This article presents an efficient MapReduce-based approach for finding connected components that does not forward the complete set of connected components to subsequent rounds; instead, it writes them to the Hadoop Distributed File System as soon as they are found, reducing the amount of data forwarded to subsequent rounds. It also presents an application of the proposed method in contact tracing. The proposed method is evaluated on several network data sets and compared with two state-of-the-art methods. The empirical results reveal that the proposed method performs significantly better and scales to finding connected components in large-scale networks.
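The emit-early idea can be sketched in a single-machine toy (not the paper's MapReduce implementation): each component is "written out" the moment its BFS completes, mirroring writing finished components to HDFS instead of forwarding them through later rounds.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Toy single-machine sketch of early component emission."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        # "emit": in the paper, this component would be written to HDFS
        # here and dropped from the data forwarded to later rounds.
        components.append(frozenset(comp))
    return components

print(sorted(len(c) for c in connected_components([(1, 2), (2, 3), (4, 5)])))  # → [2, 3]
```

In the distributed setting the same principle applies per round: only still-growing components travel to the next MapReduce round.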
The rise of data-intensive scalable computing systems, such as Apache Spark, has transformed data processing by enabling the efficient manipulation of large datasets across machine clusters. However, system configuration to optimize performance remains a challenge. This paper introduces an adaptive incremental transfer learning approach to predicting workload execution times. By integrating both unsupervised and supervised learning, we develop models that adapt incrementally to new workloads and configurations. To guide the optimal selection of relevant workloads, the model employs the coefficient of distance variation (CdV) and the coefficient of quality correlation (CqC), combined in the exploration-exploitation balance coefficient (EEBC). Comprehensive evaluations demonstrate the robustness and reliability of our model for performance modeling in Spark applications, with average improvements of up to 31% over state-of-the-art methods. This research contributes to efficient performance tuning systems by enabling transfer learning from historical workloads to new, previously unseen workloads. The full source code is openly available.
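The abstract names CdV, CqC, and EEBC without giving formulas. As a loudly labeled guess, the sketch below reads CdV as the coefficient of variation (standard deviation over mean) of the distances between a new workload's feature vector and each historical workload: a high CdV suggests a few historical workloads are much closer than the rest and worth exploiting.

```python
import math

def cdv(new_features, history):
    """Assumed definition: coefficient of variation of feature distances.

    This is an illustrative interpretation, not the paper's formula.
    """
    dists = [math.dist(new_features, h) for h in history]
    mean = sum(dists) / len(dists)
    var = sum((d - mean) ** 2 for d in dists) / len(dists)
    return math.sqrt(var) / mean

history = [(1.0, 2.0), (1.1, 2.1), (9.0, 9.0)]
# A workload near one historical cluster has high distance dispersion;
# one equidistant from all history has low dispersion.
print(cdv((1.0, 2.0), history) > cdv((5.0, 5.0), history))  # → True
```

Under this reading, the exploration-exploitation balance would combine such a dispersion signal with a quality-correlation term (CqC) to pick which historical workloads seed the incremental model.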
In this paper, we present GridapTopOpt, an extendable framework for level set-based topology optimisation that can be readily distributed across a personal computer or high-performance computing cluster. The package is written in Julia and uses the Gridap package ecosystem for parallel finite element assembly from arbitrary weak formulations of partial differential equations (PDEs), along with scalable solvers from the Portable, Extensible Toolkit for Scientific Computation (PETSc). The resulting user interface is intuitive and easy to use, allowing for the implementation of a wide range of topology optimisation problems with a syntax that is near one-to-one with the mathematical notation. Furthermore, we implement automatic differentiation to help mitigate the bottleneck associated with the analytic derivation of sensitivities for complex problems. GridapTopOpt is capable of solving a range of benchmark and research topology optimisation problems with large numbers of degrees of freedom. This educational article demonstrates the usability and versatility of the package by describing the formulation and step-by-step implementation of several distinct topology optimisation problems. The driver scripts for these problems are provided and the package source code is available at https://***/zjwegert/***.
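The level-set mechanics underlying this class of methods can be shown in a minimal 1-D sketch (not GridapTopOpt's actual solver, and in Python rather than Julia for brevity): the design interface is the zero set of phi, advected with a speed field V via the Hamilton-Jacobi equation phi_t + V |phi_x| = 0, here discretised with simple central differences and explicit Euler.

```python
def advect(phi, V, dx, dt):
    """One explicit Euler step of phi_t + V * |phi_x| = 0 on a 1-D grid.

    Toy discretisation (central differences, fixed endpoints); production
    level-set codes use upwind schemes and reinitialisation.
    """
    n = len(phi)
    out = list(phi)
    for i in range(1, n - 1):
        grad = (phi[i + 1] - phi[i - 1]) / (2 * dx)
        out[i] = phi[i] - dt * V[i] * abs(grad)
    return out

# Signed distance to an interval around x = 0.5 on [0, 1]; a positive
# speed shrinks the "solid" region where phi < 0.
phi = [abs(0.1 * i - 0.5) - 0.2 for i in range(11)]
phi_next = advect(phi, [1.0] * 11, 0.1, 0.05)
```

In the real package, V is derived from PDE-based sensitivities (obtained there via automatic differentiation) rather than prescribed by hand.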
We consider distributed systems of autonomous robots operating in the plane under synchronous Look-Compute-Move (LCM) cycles. Prior research on four distinct models assumes robots have unlimited energy. We remove this assumption and investigate systems where robots have limited but renewable energy, requiring inactivity for energy restoration. We analyze the computational impact of this constraint, fully characterizing the relationship between energy-restricted and unrestricted robots. Surprisingly, we show that energy constraints can enhance computational power. Additionally, we study how memory persistence and communication capabilities influence computation under energy constraints. By comparing the four models in this setting, we establish a complete characterization of their computational relationships. A key insight is that energy-limited robots can be modeled as unlimited-energy robots controlled by an adversarial activation scheduler. This provides a novel equivalence framework for analyzing energy-constrained distributed systems. (c) 2025 Elsevier Inc. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
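The closing equivalence can be made concrete in a toy model (an assumed illustration, not the paper's formal construction): a robot with energy capacity 1 that must rest one round to recharge produces exactly the activation pattern of an unlimited-energy robot whose adversarial scheduler never activates it in consecutive rounds.

```python
def energy_limited_run(rounds, capacity=1):
    """Rounds in which an energy-limited robot acts (toy model)."""
    energy, log = capacity, []
    for r in range(rounds):
        if energy > 0:
            log.append(r)        # act, spending one unit of energy
            energy -= 1
        else:
            energy = capacity    # forced inactivity restores energy
    return log

def scheduled_run(rounds, schedule):
    """Rounds in which a scheduler activates an unlimited-energy robot."""
    return [r for r in range(rounds) if schedule(r)]

# Capacity-1 robot == scheduler that activates only on even rounds.
print(energy_limited_run(6) == scheduled_run(6, lambda r: r % 2 == 0))  # → True
```

The paper's contribution is proving this style of correspondence in full generality across the four LCM models, not just for this periodic special case.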
Incentives to maximize Peer-to-Peer (P2P) power trading and the establishment of consumer-friendly distributed power markets are essential contributions to the decarbonization of the power sector. This paper presents a Connectivity and Preference Constrained Hop-Regulated Approach for Peer-to-Peer Trading (CPHPT) in sparsely connected communities with reduced infrastructure requirements. The CPHPT approach leverages graph theory to optimize P2P subscriber matching by regulating the maximum hops between the nodes in each routed path of P2P exchange. Simulations using real-world datasets in a 10-home community demonstrate that the CPHPT increases community participation by 29.49%, with P2P power exchanges comparable to full connectivity at reduced infrastructure requirements. When scaled to a 100-home community, the CPHPT approach achieves a marginal performance difference of 2.71% compared to full connectivity while lowering the connectivity infrastructure by 93.4%. The CPHPT approach has a mean runtime of 8.9 s for a 3-h window with 30-min intervals in a 100-home community, indicating its scalability and feasibility for real-time implementation.
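The hop-regulation idea can be sketched as a graph filter (an illustrative toy, not the CPHPT matching algorithm itself): a seller and buyer may trade only if a path of at most `max_hops` edges connects them in the community graph, which bounds the infrastructure a P2P exchange can traverse.

```python
from collections import deque

def hops(graph, src, dst):
    """BFS hop count between two homes; None if disconnected."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        if node == dst:
            return d
        for nbr in graph.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, d + 1))
    return None

def feasible_pairs(graph, sellers, buyers, max_hops):
    """Seller-buyer pairs whose routed path respects the hop limit."""
    return [(s, b) for s in sellers for b in buyers
            if (h := hops(graph, s, b)) is not None and h <= max_hops]

# A 4-home line community: 0 - 1 - 2 - 3 (hypothetical topology).
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(feasible_pairs(graph, sellers=[0], buyers=[2, 3], max_hops=2))  # → [(0, 2)]
```

CPHPT additionally weighs subscriber preferences and optimizes the matching itself; the hop constraint above is only the feasibility filter that enables sparse connectivity.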
With the advent of Industry 5.0, the electrical sector has been endowed with intelligent devices that are propelling high penetration of distributed energy microgeneration, VPP, smart buildings, and smart plants and imposing new challenges on the sector. This new environment requires a smarter network, including transforming the simple electricity customer into a "smart customer" who values the quality of energy and its rational use. The SPG (smart power grid) is the perfect solution for meeting these needs. It is crucial to understand energy use to guarantee quality of service and meet data security requirements. The use of simulations to map the behavior of complex infrastructures is the best strategy because it overcomes the limitations of traditional analytical solutions. This article presents the ICT laboratory structure developed within the Department of Electrical Engineering of the Polytechnic School of the Universidade de São Paulo (USP). It is based on an architecture that utilizes LTE/EPC wireless technology (4G, 5G, and B5G) to enable machine-to-machine communication (mMTC) between SPG elements using edge computing (MEC) resources and those of smart city platforms. We evaluate this proposal through simulations using data from real and emulated equipment and co-simulations shared by SPG laboratories at POLI-USP. Finally, we present the preliminary results of integration of the power laboratory, network simulation (ns-3), and a smart city platform (InterSCity) for validation and testing of the architecture.
In order to explore how blind interference alignment (BIA) schemes may take advantage of side-information in computation tasks, we study the degrees of freedom (DoF) of a K user wireless network setting that arises in full-duplex wireless MapReduce applications. In this setting the receivers are assumed to have reconfigurable antennas and channel knowledge, while the transmitters have neither, i.e., the transmitters lack channel knowledge and are only equipped with conventional antennas. The central ingredient of the problem formulation is the message structure arising out of the Shuffle phase of MapReduce, whereby each transmitter has a subset of messages that need to be delivered to various receivers, and each receiver has a subset of messages available to it in advance as side-information. We approach this problem by decomposing it into distinctive stages that help identify key ingredients of the overall solution. The novel elements that emerge from the first stage, called broadcast with groupcast messages, include an outer maximum distance separable (MDS) code structure at the transmitter, and an algorithm for iteratively determining groupcast-optimal reconfigurable antenna switching patterns at the receiver to achieve intra-message (among the symbols of the same message) alignment. The next stage, called unicast with side-information, reveals optimal inter-message (among symbols of different messages) alignment patterns to exploit side-information, and by a relabeling of messages, connects to the desired MapReduce setting.
Target search in an unknown environment is a major challenge in disaster relief, hazardous areas, finding leak sources, and surveillance. This paper proposes an Evolving Robotic Dragonfly Algorithm (ERDA) to conduct the target search using a multi-robot team. It works as the distributed control mechanism for the robots. The swarm behaviors of dragonflies in the Dragonfly Algorithm (DA) are improved to solve the multi-robot target search problem. The robot that exhibits the best fitness acts as the leader of the team. The leader robot utilizes the gradient information to evolve the search direction towards the target. ERDA employs an adaptive inertia weight to improve the diversity in the team. The enemy-eluding behavior of DA is adapted to support obstacle avoidance. These factors enhance the performance of the proposed algorithm. The ERDA is rigorously evaluated and compared with existing algorithms. Experiments are conducted in simple and cluttered environments with varying counts of obstacles. Experiments are also carried out with varying numbers of robots and different environment sizes to study the efficiency and effectiveness of the proposed method. ERDA improved the success rate by 7.41% and reduced the mean iteration count by 53.29% in the cluttered environment. The results obtained indicate that ERDA exhibits better performance than the existing methods.
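The leader mechanism described above can be sketched in a heavily simplified toy (not the full ERDA: DA's separation, alignment, cohesion, and enemy-eluding terms are collapsed into a single leader-attraction step, and the decaying weight below only stands in for the adaptive inertia weight).

```python
def search_step(positions, fitness, w):
    """Move every robot a fraction w of the way toward the current leader.

    Toy 1-D model: the robot with the best (lowest) fitness leads.
    """
    leader = min(positions, key=fitness)
    return [p + w * (leader - p) for p in positions]

# Robots on a line searching for a target at x = 0; fitness = distance to it.
pos = [4.0, -3.0, 1.0]
for t in range(20):
    w = 0.9 - 0.03 * t          # decaying step, standing in for adaptive inertia
    pos = search_step(pos, fitness=abs, w=w)
print(max(pos) - min(pos) < 1e-6)  # → True: the team has clustered
```

Each step scales all pairwise distances by (1 - w), so the team contracts around whichever robot currently reports the best fitness; the real algorithm adds gradient-guided leader motion and obstacle avoidance on top of this.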
DaCe is a framework for Python that claims to provide massive speedups, with C-like speeds, compared to existing high-performance Python frameworks (e.g. Numba or Pythran). In this work, we take a closer look at reproducing the NPBench work. We use performance results to confirm that DaCe achieves higher performance than NumPy across a variety of NPBench benchmarks, and we provide reasons why DaCe is not truly as portable as it claims to be, although with a small adjustment it can run anywhere.