Pipeline parallelism is a distributed method used to train deep neural networks and is suitable for tasks that consume large amounts of memory. However, this method entails a large overhead because of the dependency between devices performing forward and backward steps across multiple accelerator devices. Although a method that removes the forward-step dependency through an all-to-all approach has been proposed for training compute-intensive models, it incurs a large overhead when training with many devices and is inefficient with respect to weight-memory consumption. We instead propose a pipeline parallelism method that reduces network communication using a self-generation concept and lowers overhead by minimizing the weight memory used for acceleration. In a DarkNet53 training-throughput experiment using six devices, the proposed method reduces overhead and communication costs by approximately 63.7% relative to a baseline and consumes approximately 17.0% less memory.
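The inter-device dependency this abstract refers to can be quantified with the standard "pipeline bubble" analysis for synchronous (GPipe-style) schedules; this sketch is illustrative background, not the paper's method, and the microbatch count of 8 is an assumed example value.

```python
# Illustrative sketch (not the paper's method): with D pipeline stages and
# M microbatches, a synchronous schedule needs M + D - 1 time steps instead
# of the ideal M, so the idle ("bubble") fraction is (D - 1) / (M + D - 1).

def bubble_fraction(devices: int, microbatches: int) -> float:
    """Fraction of device time lost to pipeline fill and drain."""
    total_steps = microbatches + devices - 1
    return (devices - 1) / total_steps

# With 6 devices (as in the DarkNet53 experiment) and an assumed 8 microbatches:
print(round(bubble_fraction(6, 8), 3))  # → 0.385
```

Increasing the microbatch count shrinks the bubble, which is why dependency-removal schemes matter most when memory limits keep M small.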
Owing to the increasing size of real-world networks, their processing using classical techniques has become infeasible. The amount of storage and central processing unit time required for processing large networks is far beyond the capabilities of a high-end computing machine. Moreover, real-world network data are generally distributed in nature because they are collected and stored on distributed platforms. This has popularized the use of MapReduce, a distributed data processing framework, for analyzing real-world network data. Existing MapReduce-based methods for connected-component detection mainly struggle to minimize the number of MapReduce rounds and the amount of data generated and forwarded to subsequent rounds. This article presents an efficient MapReduce-based approach for finding connected components that does not forward the complete set of connected components to subsequent rounds; instead, it writes them to the Hadoop Distributed File System as soon as they are found, reducing the amount of data forwarded to subsequent rounds. It also presents an application of the proposed method in contact tracing. The proposed method is evaluated on several network data sets and compared with two state-of-the-art methods. The empirical results reveal that the proposed method performs significantly better and scales to finding connected components in large-scale networks.
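The emit-early idea can be sketched in a single-machine toy (not the paper's MapReduce implementation): each component is "written out" the moment its BFS completes, mirroring writing finished components to HDFS instead of forwarding them through later rounds.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Toy single-machine sketch of early component emission."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        # "emit": in the paper, this component would be written to HDFS
        # here and dropped from the data forwarded to later rounds.
        components.append(frozenset(comp))
    return components

print(sorted(len(c) for c in connected_components([(1, 2), (2, 3), (4, 5)])))  # → [2, 3]
```

In the distributed setting the same principle applies per round: only still-growing components travel to the next MapReduce round.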
The rise of data-intensive scalable computing systems, such as Apache Spark, has transformed data processing by enabling the efficient manipulation of large datasets across machine clusters. However, system configuration to optimize performance remains a challenge. This paper introduces an adaptive incremental transfer learning approach to predicting workload execution times. By integrating both unsupervised and supervised learning, we develop models that adapt incrementally to new workloads and configurations. To guide the optimal selection of relevant workloads, the model employs the coefficient of distance variation (CdV) and the coefficient of quality correlation (CqC), combined in the exploration-exploitation balance coefficient (EEBC). Comprehensive evaluations demonstrate the robustness and reliability of our model for performance modeling in Spark applications, with average improvements of up to 31% over state-of-the-art methods. This research contributes to efficient performance tuning systems by enabling transfer learning from historical workloads to new, previously unseen workloads. The full source code is openly available.
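The abstract names CdV, CqC, and EEBC without giving formulas. As a loudly labeled guess, the sketch below reads CdV as the coefficient of variation (standard deviation over mean) of the distances between a new workload's feature vector and each historical workload: a high CdV suggests a few historical workloads are much closer than the rest and worth exploiting.

```python
import math

def cdv(new_features, history):
    """Assumed definition: coefficient of variation of feature distances.

    This is an illustrative interpretation, not the paper's formula.
    """
    dists = [math.dist(new_features, h) for h in history]
    mean = sum(dists) / len(dists)
    var = sum((d - mean) ** 2 for d in dists) / len(dists)
    return math.sqrt(var) / mean

history = [(1.0, 2.0), (1.1, 2.1), (9.0, 9.0)]
# A workload near one historical cluster has high distance dispersion;
# one equidistant from all history has low dispersion.
print(cdv((1.0, 2.0), history) > cdv((5.0, 5.0), history))  # → True
```

Under this reading, the exploration-exploitation balance would combine such a dispersion signal with a quality-correlation term (CqC) to pick which historical workloads seed the incremental model.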
In this paper, we present GridapTopOpt, an extendable framework for level set-based topology optimisation that can be readily distributed across a personal computer or high-performance computing cluster. The package is written in Julia and uses the Gridap package ecosystem for parallel finite element assembly from arbitrary weak formulations of partial differential equations (PDEs), along with scalable solvers from the Portable, Extensible Toolkit for Scientific Computation (PETSc). The resulting user interface is intuitive and easy to use, allowing for the implementation of a wide range of topology optimisation problems with a syntax that is near one-to-one with the mathematical notation. Furthermore, we implement automatic differentiation to help mitigate the bottleneck associated with the analytic derivation of sensitivities for complex problems. GridapTopOpt is capable of solving a range of benchmark and research topology optimisation problems with large numbers of degrees of freedom. This educational article demonstrates the usability and versatility of the package by describing the formulation and step-by-step implementation of several distinct topology optimisation problems. The driver scripts for these problems are provided and the package source code is available at https://***/zjwegert/***.
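The level-set mechanics underlying this class of methods can be shown in a minimal 1-D sketch (not GridapTopOpt's actual solver, and in Python rather than Julia for brevity): the design interface is the zero set of phi, advected with a speed field V via the Hamilton-Jacobi equation phi_t + V |phi_x| = 0, here discretised with simple central differences and explicit Euler.

```python
def advect(phi, V, dx, dt):
    """One explicit Euler step of phi_t + V * |phi_x| = 0 on a 1-D grid.

    Toy discretisation (central differences, fixed endpoints); production
    level-set codes use upwind schemes and reinitialisation.
    """
    n = len(phi)
    out = list(phi)
    for i in range(1, n - 1):
        grad = (phi[i + 1] - phi[i - 1]) / (2 * dx)
        out[i] = phi[i] - dt * V[i] * abs(grad)
    return out

# Signed distance to an interval around x = 0.5 on [0, 1]; a positive
# speed shrinks the "solid" region where phi < 0.
phi = [abs(0.1 * i - 0.5) - 0.2 for i in range(11)]
phi_next = advect(phi, [1.0] * 11, 0.1, 0.05)
```

In the real package, V is derived from PDE-based sensitivities (obtained there via automatic differentiation) rather than prescribed by hand.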
We consider distributed systems of autonomous robots operating in the plane under synchronous Look-Compute-Move (LCM) cycles. Prior research on four distinct models assumes robots have unlimited energy. We remove this assumption and investigate systems where robots have limited but renewable energy, requiring inactivity for energy restoration. We analyze the computational impact of this constraint, fully characterizing the relationship between energy-restricted and unrestricted robots. Surprisingly, we show that energy constraints can enhance computational power. Additionally, we study how memory persistence and communication capabilities influence computation under energy constraints. By comparing the four models in this setting, we establish a complete characterization of their computational relationships. A key insight is that energy-limited robots can be modeled as unlimited-energy robots controlled by an adversarial activation scheduler. This provides a novel equivalence framework for analyzing energy-constrained distributed systems. (c) 2025 Elsevier Inc. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
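The closing equivalence can be made concrete in a toy model (an assumed illustration, not the paper's formal construction): a robot with energy capacity 1 that must rest one round to recharge produces exactly the activation pattern of an unlimited-energy robot whose adversarial scheduler never activates it in consecutive rounds.

```python
def energy_limited_run(rounds, capacity=1):
    """Rounds in which an energy-limited robot acts (toy model)."""
    energy, log = capacity, []
    for r in range(rounds):
        if energy > 0:
            log.append(r)        # act, spending one unit of energy
            energy -= 1
        else:
            energy = capacity    # forced inactivity restores energy
    return log

def scheduled_run(rounds, schedule):
    """Rounds in which a scheduler activates an unlimited-energy robot."""
    return [r for r in range(rounds) if schedule(r)]

# Capacity-1 robot == scheduler that activates only on even rounds.
print(energy_limited_run(6) == scheduled_run(6, lambda r: r % 2 == 0))  # → True
```

The paper's contribution is proving this style of correspondence in full generality across the four LCM models, not just for this periodic special case.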
Incentives to maximize Peer-to-Peer (P2P) power trading and the establishment of consumer-friendly distributed power markets are essential contributions to the decarbonization of the power sector. This paper presents a Connectivity and Preference Constrained Hop-Regulated Approach for Peer-to-Peer Trading (CPHPT) in sparsely connected communities with reduced infrastructure requirements. The CPHPT approach leverages graph theory to optimize P2P subscriber matching by regulating the maximum hops between the nodes in each routed path of P2P exchange. Simulations using real-world datasets in a 10-home community demonstrate that the CPHPT increases community participation by 29.49%, with P2P power exchanges comparable to full connectivity at reduced infrastructure requirements. When scaled to a 100-home community, the CPHPT approach achieves a marginal performance difference of 2.71% compared to full connectivity while lowering the connectivity infrastructure by 93.4%. The CPHPT approach has a mean runtime of 8.9 s for a 3-h window with 30-min intervals in a 100-home community, indicating its scalability and feasibility for real-time implementation.
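The hop-regulation idea can be sketched as a graph filter (an illustrative toy, not the CPHPT matching algorithm itself): a seller and buyer may trade only if a path of at most `max_hops` edges connects them in the community graph, which bounds the infrastructure a P2P exchange can traverse.

```python
from collections import deque

def hops(graph, src, dst):
    """BFS hop count between two homes; None if disconnected."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        if node == dst:
            return d
        for nbr in graph.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, d + 1))
    return None

def feasible_pairs(graph, sellers, buyers, max_hops):
    """Seller-buyer pairs whose routed path respects the hop limit."""
    return [(s, b) for s in sellers for b in buyers
            if (h := hops(graph, s, b)) is not None and h <= max_hops]

# A 4-home line community: 0 - 1 - 2 - 3 (hypothetical topology).
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(feasible_pairs(graph, sellers=[0], buyers=[2, 3], max_hops=2))  # → [(0, 2)]
```

CPHPT additionally weighs subscriber preferences and optimizes the matching itself; the hop constraint above is only the feasibility filter that enables sparse connectivity.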
With the advent of Industry 5.0, the electrical sector has been endowed with intelligent devices that are propelling high penetration of distributed energy microgeneration, VPP, smart buildings, and smart plants and imposing new challenges on the sector. This new environment requires a smarter network, including transforming the simple electricity customer into a "smart customer" who values the quality of energy and its rational use. The SPG (smart power grid) is the perfect solution for meeting these needs. It is crucial to understand energy use to guarantee quality of service and meet data security requirements. The use of simulations to map the behavior of complex infrastructures is the best strategy because it overcomes the limitations of traditional analytical solutions. This article presents the ICT laboratory structure developed within the Department of Electrical Engineering of the Polytechnic School of the Universidade de São Paulo (USP). It is based on an architecture that utilizes LTE/EPC wireless technology (4G, 5G, and B5G) to enable machine-to-machine communication (mMTC) between SPG elements using edge computing (MEC) resources and those of smart city platforms. We evaluate this proposal through simulations using data from real and emulated equipment and co-simulations shared by SPG laboratories at POLI-USP. Finally, we present the preliminary results of integration of the power laboratory, network simulation (ns-3), and a smart city platform (InterSCity) for validation and testing of the architecture.
In order to explore how blind interference alignment (BIA) schemes may take advantage of side-information in computation tasks, we study the degrees of freedom (DoF) of a K user wireless network setting that arises in full-duplex wireless MapReduce applications. In this setting the receivers are assumed to have reconfigurable antennas and channel knowledge, while the transmitters have neither, i.e., the transmitters lack channel knowledge and are only equipped with conventional antennas. The central ingredient of the problem formulation is the message structure arising out of the Shuffle phase of MapReduce, whereby each transmitter has a subset of messages that need to be delivered to various receivers, and each receiver has a subset of messages available to it in advance as side-information. We approach this problem by decomposing it into distinctive stages that help identify key ingredients of the overall solution. The novel elements that emerge from the first stage, called broadcast with groupcast messages, include an outer maximum distance separable (MDS) code structure at the transmitter, and an algorithm for iteratively determining groupcast-optimal reconfigurable antenna switching patterns at the receiver to achieve intra-message (among the symbols of the same message) alignment. The next stage, called unicast with side-information, reveals optimal inter-message (among symbols of different messages) alignment patterns to exploit side-information, and by a relabeling of messages, connects to the desired MapReduce setting.
Target search in an unknown environment is a major challenge in disaster relief, hazardous areas, finding leak sources, and surveillance. This paper proposes an Evolving Robotic Dragonfly Algorithm (ERDA) to conduct the target search using a multi-robot team. It works as the distributed control mechanism for the robots. The swarm behaviors of dragonflies in the Dragonfly Algorithm (DA) are improved to solve the multi-robot target search problem. The robot that exhibits the best fitness acts as the leader of the team. The leader robot utilizes the gradient information to evolve the search direction towards the target. ERDA employs an adaptive inertia weight to improve the diversity in the team. The enemy-eluding behavior of DA is adapted to support obstacle avoidance. These factors enhance the performance of the proposed algorithm. The ERDA is rigorously evaluated and compared with existing algorithms. Experiments are conducted in simple and cluttered environments with varying counts of obstacles. Experiments are also carried out with varying numbers of robots and different environment sizes to study the efficiency and effectiveness of the proposed method. ERDA improved the success rate by 7.41% and reduced the mean iteration count by 53.29% in the cluttered environment. The results obtained indicate that ERDA exhibits better performance than the existing methods.
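The leader mechanism described above can be sketched in a heavily simplified toy (not the full ERDA: DA's separation, alignment, cohesion, and enemy-eluding terms are collapsed into a single leader-attraction step, and the decaying weight below only stands in for the adaptive inertia weight).

```python
def search_step(positions, fitness, w):
    """Move every robot a fraction w of the way toward the current leader.

    Toy 1-D model: the robot with the best (lowest) fitness leads.
    """
    leader = min(positions, key=fitness)
    return [p + w * (leader - p) for p in positions]

# Robots on a line searching for a target at x = 0; fitness = distance to it.
pos = [4.0, -3.0, 1.0]
for t in range(20):
    w = 0.9 - 0.03 * t          # decaying step, standing in for adaptive inertia
    pos = search_step(pos, fitness=abs, w=w)
print(max(pos) - min(pos) < 1e-6)  # → True: the team has clustered
```

Each step scales all pairwise distances by (1 - w), so the team contracts around whichever robot currently reports the best fitness; the real algorithm adds gradient-guided leader motion and obstacle avoidance on top of this.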
DaCe is a framework for Python that claims to provide massive speedups, with C-like speeds, compared to existing high-performance Python frameworks (e.g. Numba or Pythran). In this work, we take a closer look at reproducing the NPBench work. We use performance results to confirm that DaCe achieves higher performance than NumPy across a variety of NPBench benchmarks, and we provide reasons why DaCe is not truly as portable as it claims to be, although with a small adjustment it can run anywhere.