The load distribution in the local time stepping (LTS) method significantly impacts its computing efficiency. This letter proposes a minimal round-trip (MRT) strategy for the LTS method to balance the communication load of the discontinuous Galerkin time-domain (DGTD) method. By finding matchings in the connected graph of computing nodes, independent communications with similar loads can be carried out in the same round trip, which minimizes waiting between nodes in nonblocking communication and thereby decreases the communication time of the DGTD-LTS technique. The numerical results indicate that the MRT strategy reduces the communication time between processors by 50% and improves parallel performance when the LTS method is implemented in the DGTD method. The MRT approach can be scaled up to 16 000 nodes (1 040 000 cores) on the supercomputer, with a parallel efficiency greater than 73.8%.
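The idea of grouping independent, similarly loaded communications into rounds can be illustrated with a small sketch. This is not the letter's MRT algorithm; it is a plain greedy heuristic, and the function name `schedule_rounds` and the edge-load dictionary are assumptions made for illustration. Each round is a matching of the communication graph, so no node takes part in two exchanges in the same round.

```python
def schedule_rounds(edge_loads):
    """Greedily split communication edges into rounds.

    edge_loads maps an undirected edge (u, v) to its communication load.
    Each returned round is a matching: no node appears in it twice.
    Edges are visited from heaviest to lightest, so exchanges of similar
    size tend to share a round (a heuristic, not an optimality guarantee).
    """
    remaining = sorted(edge_loads, key=lambda e: edge_loads[e], reverse=True)
    rounds = []
    while remaining:
        busy = set()       # nodes already communicating in this round
        this_round = []
        leftover = []
        for u, v in remaining:
            if u in busy or v in busy:
                leftover.append((u, v))
            else:
                this_round.append((u, v))
                busy.update((u, v))
        rounds.append(this_round)
        remaining = leftover
    return rounds


# Hypothetical four-node example with all-to-all neighbor exchanges.
loads = {(0, 1): 10, (2, 3): 9, (0, 2): 5, (1, 3): 4, (0, 3): 1, (1, 2): 1}
rounds = schedule_rounds(loads)
```

In this toy example the heavy exchanges (loads 10 and 9) land in one round and the light ones in another, which is the behavior the abstract describes qualitatively.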
ISBN: 9781467368575 (print)
This work proposes using distributed computing on a client-server architecture to solve high-dimensional tasks. The VLSI topological layout design problem serves as the high-dimensional task. Simulation modelling shows that the hierarchical architecture reduces the design time by an order of magnitude compared with a parallel-serial client-server architecture.
Edge computing has recently garnered significant interest in many Internet of Things (IoT) applications. However, the excessive overhead of data exchange remains an open challenge, especially for large-scale data processing tasks. This paper considers a master-aided distributed computing system with multiple edge computing nodes and a master node, where the master node helps the edge nodes compute output functions. We propose a coded scheme that reduces the communication latency by exploiting the computation and communication capabilities of all nodes and by creating coded multicast opportunities. More importantly, we prove that the proposed scheme is always optimal, i.e., it achieves the minimum communication latency, for arbitrary computing and storage abilities at the master. This extends the previous optimality results for the extreme cases (the master either computes all input files or computes nothing) to the general case. Finally, numerical results and TeraSort experiments demonstrate that our scheme can greatly reduce the communication latency compared with existing schemes.
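A coded multicast opportunity can be shown with the textbook two-receiver XOR case. This is a minimal sketch and not the paper's scheme: the values `v1` and `v2`, the helper `xor_bytes`, and the two-node setup are assumptions for illustration. Each node already holds the value the other one needs, so a single coded broadcast replaces two unicast transmissions.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical intermediate values: node 1 holds v1 and needs v2,
# node 2 holds v2 and needs v1.
v1 = b"edge-A!!"
v2 = b"edge-B??"

# One broadcast serves both receivers.
coded = xor_bytes(v1, v2)

# Each node cancels the value it already knows to recover the missing one.
decoded_at_node1 = xor_bytes(coded, v1)
decoded_at_node2 = xor_bytes(coded, v2)
```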
Modern society is experiencing a data explosion thanks to rapid IT development and the increasing intelligence of devices. The vast and complex data can be utilized to extract actionable insight using a big data processing framework. Hadoop is a popular big data processing framework on heterogeneous commodity hardware. While Hadoop offers a robust framework for large-scale, data-intensive tasks via its MapReduce paradigm, hardware heterogeneity across nodes often leads to straggler effects that degrade Hadoop cluster performance. This paper introduces the Adaptive Node-Oriented Data placement for Efficient Hadoop Execution (ANODE) method, which leverages historical job execution data to dynamically assess each node's processing capability. By employing an agent-based mechanism, ANODE optimizes block allocation within the data node, alleviating imbalances caused by Hadoop's default uniform placement strategy. Experimental results on a heterogeneous eleven-node Hadoop cluster demonstrate that ANODE reduces job completion times by up to 25%, significantly enhancing data locality and resource utilization compared to the default approach.
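Capability-aware placement can be sketched as a proportional split of blocks. This is an illustration only and not the ANODE method itself: the function `allocate_blocks`, the capability scores, and the largest-remainder rounding are assumptions, standing in for whatever the paper derives from historical job execution data.

```python
def allocate_blocks(capability, total_blocks):
    """Split total_blocks across nodes in proportion to a capability score.

    capability maps node name -> positive score (hypothetical, e.g. blocks
    processed per second in past jobs). Fractional shares are rounded with
    the largest-remainder rule so the counts sum to total_blocks.
    """
    total = sum(capability.values())
    shares = {n: total_blocks * c / total for n, c in capability.items()}
    alloc = {n: int(s) for n, s in shares.items()}
    leftover = total_blocks - sum(alloc.values())
    # Hand the remaining blocks to the nodes with the largest fractional part.
    by_remainder = sorted(shares, key=lambda n: shares[n] - alloc[n], reverse=True)
    for n in by_remainder[:leftover]:
        alloc[n] += 1
    return alloc


# Hypothetical three-node cluster in which node "b" is twice as fast.
plan = allocate_blocks({"a": 1.0, "b": 2.0, "c": 1.0}, 8)
```

A uniform placement would give each node the same number of blocks regardless of speed, which is the imbalance the abstract attributes to Hadoop's default strategy.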
The ARBITRARY PATTERN FORMATION (APF) problem is widely studied in distributed computing for swarm robots. This paper deals with the APF problem on an infinite grid under an asynchronous scheduler. In [Bose K, Adhikary R, Kundu MK, et al. Arbitrary pattern formation on infinite grid by asynchronous oblivious robots. Theor Comput Sci. 2020;815:213-227], the authors proposed an algorithm for the APF problem in the OBLOT model under an asynchronous scheduler, but that algorithm was neither time optimal nor move optimal. This work provides two algorithms that solve the APF problem under an asynchronous scheduler. The first algorithm is move optimal in the OBLOT model, and the second is both move and time optimal in the LUMI model, where each robot has one light with three distinct colours.
Distributed computing is the method of running CPU-intensive computations collectively on multiple computers in order to achieve a common objective. Common problems that can be solved on distributed systems include climate and weather modeling, earthquake simulation, and evolutionary computing problems. These types of problems may involve billions or even trillions of computations, which a single computer cannot finish in a short span of time, typically a matter of days. Distributed computation helps to solve in hours problems that could take weeks on a single computer, and it generally uses the existing resources of the organization. Traffic simulation is the process of simulating transportation systems through software on a virtual road network. It helps in analyzing city traffic at different times of a single day. Common use cases include analyzing city-wide traffic and estimating traffic demand at a particular junction. This paper discusses an approach that uses the distributed computing paradigm to optimize traffic simulations. Optimizing simulations involves running a number of traffic simulations and then estimating how close each simulation is to the real traffic data available. This real data can be obtained either by manual counting at traffic junctions or by using probes such as loop inductors and CCTV cameras. The distributed computing based approach finds the traffic simulation that best matches the real data in hand, using an evolutionary computing technique.
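The search loop described above can be sketched as a small evolutionary algorithm. Everything here is a stand-in chosen for illustration: `simulate` is a hypothetical two-parameter toy rather than a real traffic simulator, and the selection and mutation settings are assumptions, not the paper's configuration. In a distributed setting, the independent `error` evaluations within a generation are the part that would be farmed out to worker machines.

```python
import random


def simulate(params):
    """Hypothetical toy simulator: returns vehicle counts at two junctions."""
    demand, green_ratio = params
    return [demand * green_ratio, demand * (1.0 - green_ratio)]


def error(params, observed):
    """Squared distance between simulated and observed junction counts."""
    return sum((s - o) ** 2 for s, o in zip(simulate(params), observed))


def evolve(observed, generations=50, pop_size=20, seed=0):
    """Keep the best quarter of each generation and mutate it to refill."""
    rng = random.Random(seed)
    pop = [(rng.uniform(0, 200), rng.uniform(0, 1)) for _ in range(pop_size)]
    history = []  # best error seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=lambda p: error(p, observed))
        history.append(error(pop[0], observed))
        parents = pop[: pop_size // 4]  # elitism: parents survive unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            demand, ratio = rng.choice(parents)
            children.append((
                demand + rng.gauss(0, 5),
                min(1.0, max(0.0, ratio + rng.gauss(0, 0.05))),
            ))
        pop = parents + children
    best = min(pop, key=lambda p: error(p, observed))
    history.append(error(best, observed))
    return best, history


# Hypothetical observed counts at the two junctions.
best_params, history = evolve(observed=[60.0, 40.0])
```

Because the parents are carried over unchanged, the best error never gets worse from one generation to the next.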
In collaborative learning, learners coordinate to enhance each of their learning performances. From the perspective of any learner, a critical challenge is to filter out unqualified collaborators. We propose a framework named meta clustering to address the challenge. Unlike the classical problem of clustering data points, meta clustering categorizes learners. Assuming each learner performs a supervised regression on a standalone local dataset, we propose a Select-Exchange-Cluster (SEC) method to classify the learners by their underlying supervised functions. We theoretically show that the SEC can cluster learners into accurate collaboration sets. Empirical studies corroborate the theoretical analysis and demonstrate that SEC can be computationally efficient, robust against learner heterogeneity, and effective in enhancing single-learner performance. Also, we show how the proposed approach may be used to enhance data fairness. Supplementary materials for this article are available online.
In this paper, we propose a new coded computation scheme that can alleviate straggler effects in distributed computing. We consider data security and the master's privacy for matrix multiplication tasks. The proposed scheme, called fully private and secure coded matrix multiplication (FPSCMM), protects the security of the two data matrices being multiplied and the master's privacy against colluding workers. We also show that FPSCMM can reduce the storage overhead at the workers, since it suffices for them to store the encoded matrices as sub-blocks. Lastly, we compare FPSCMM with existing coded matrix multiplication schemes that preserve the master's privacy. © 2023 The Author(s). Published by Elsevier B.V. on behalf of The Korean Institute of Communications and Information Sciences. This is an open access article under the CC BY license (http://***/licenses/by/4.0/).
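How coding alleviates stragglers can be seen in a three-worker toy example. This sketch covers only straggler tolerance and provides none of the security or privacy guarantees of FPSCMM; the matrices, worker names, and helper functions are assumptions for illustration. The rows of A are split into two blocks and a third worker receives their sum, so the results of any two workers are enough to rebuild the full product.

```python
from operator import add, sub


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def elementwise(op, a, b):
    """Apply a binary operator entry by entry to two equal-shape matrices."""
    return [[op(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


A = [[1, 2], [3, 4], [5, 6], [7, 8]]
B = [[2, 1], [0, 3]]

# Split A by rows; the third task is the coded (parity) block A1 + A2.
A1, A2 = A[:2], A[2:]
tasks = {"w1": A1, "w2": A2, "w3": elementwise(add, A1, A2)}
results = {worker: matmul(block, B) for worker, block in tasks.items()}

# Suppose w2 straggles: recover A2 @ B from the other two results,
# using (A1 + A2) @ B - A1 @ B = A2 @ B.
recovered_A2B = elementwise(sub, results["w3"], results["w1"])
product = results["w1"] + recovered_A2B  # stack the row blocks
```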
The current industrial automation landscape faces considerable challenges due to the increasing growth of Industrial Internet of Things devices, cloud services, information technology (IT)/operational technology (OT) convergence, along with evolving hyperscaler technologies, such as Kubernetes and distributed computing. This article provides an in-depth review of the existing Instrumentation Society of America (ISA)-95 Model and its current role in supporting manufacturing systems and their interactions. Additionally, it examines how emerging technologies are impacting the security, design, and management of OT networks. As existing perimeter-based models, such as ISA-95, are pushed to their limits, the concepts of zero-trust architectures, and policy-based or software-defined networks are being explored as the next generation of OT network design. This article aims to provide a high-level introduction to the concepts and disruptive technologies, and introduce the potential implementation risks and challenges of these principles within traditional IT/OT converged solutions.
ISBN: 9798331531317; 9798331531300 (print)
Modern distributed systems are complex. They include hundreds of components that implement complex protocols such as scheduling, replication, and access control. These systems are expected to offer high availability and preserve their data even in the face of external environmental faults. Testing is the primary approach for improving system reliability. Testing against environmental faults such as hardware failures, memory corruption, and network problems is complicated since they can happen at any step in the protocol and affect any component. We present Slicify, a generic framework to test the network partition resilience of distributed systems. Slicify injects network partitions during unit tests to analyze system behavior in their presence. Slicify reduces the test space in an application-agnostic fashion with its novel connection tracking mechanism. We verify Slicify's capabilities by reproducing previously documented failures in two production systems. In addition, we demonstrate its effectiveness by uncovering new failures in three popular distributed systems.