Many simulators in several fields use the finite difference method and must solve the associated large sparse systems of linear equations. In particular, when a direct solution method is used because of convergence problems, it is necessary to adopt a method that greatly reduces CPU time. The Multi-Step Diakoptics (MSD) method is proposed as a parallel direct solution method based on Diakoptics, that is, a tearing-based parallel computation method for sparse linear equations. We have applied the MSD algorithm to one-, two-, and three-dimensional finite difference methods. This requires a parallel scheduler that automatically partitions the region of the object under study, assigns processor elements to the partitioned regions according to the MSD method, and controls communication among the processor elements. This paper describes a parallel scheduling scheme for the MSD method, extended from the one-dimensional case to the three-dimensional case, and evaluates the algorithm on a massively parallel distributed-memory computer (AP1000).
ISBN: (print) 9783030576752; 9783030576745
Workloads with precedence constraints due to data dependencies are common in various applications. These workloads can be represented as directed acyclic graphs (DAGs) and are often data-intensive, meaning that data loading cost is the dominant factor and thus cache misses should be minimized. We address the problem of parallel scheduling of a DAG of data-intensive tasks to minimize makespan. To do so, we propose greedy online scheduling algorithms that take load balancing, data dependencies, and data locality into account. Simulations and an experimental evaluation using an Apache Spark cluster demonstrate the advantages of our solutions.
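The kind of greedy, locality-aware DAG scheduling described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it is a generic earliest-finish heuristic, and the function name `greedy_dag_schedule`, the unbounded per-worker cache, and the `compute` and `load_cost` parameters are all assumptions made for illustration.

```python
from collections import defaultdict


def greedy_dag_schedule(tasks, deps, inputs, num_workers, compute=1.0, load_cost=2.0):
    """Greedy earliest-finish scheduling of a task DAG (illustrative sketch).

    tasks:  list of task ids
    deps:   dict task -> iterable of predecessor tasks
    inputs: dict task -> iterable of data block ids the task reads
    Assumed cost model: each task takes `compute` time plus `load_cost`
    for every input block not already cached on the chosen worker.
    """
    indegree = {t: len(deps.get(t, ())) for t in tasks}
    children = defaultdict(list)
    for t in tasks:
        for p in deps.get(t, ()):
            children[p].append(t)

    worker_free = [0.0] * num_workers            # time each worker becomes idle
    cache = [set() for _ in range(num_workers)]  # blocks held by each worker
    finish, placement = {}, {}
    ready = [t for t in tasks if indegree[t] == 0]

    while ready:
        best = None
        for t in ready:
            # A task cannot start before all of its predecessors finish.
            release = max((finish[p] for p in deps.get(t, ())), default=0.0)
            for w in range(num_workers):
                misses = len(set(inputs.get(t, ())) - cache[w])
                end = max(worker_free[w], release) + compute + load_cost * misses
                if best is None or end < best[0]:
                    best = (end, t, w)
        end, t, w = best
        ready.remove(t)
        finish[t], placement[t] = end, w
        worker_free[w] = end
        cache[w] |= set(inputs.get(t, ()))
        for c in children[t]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)

    return max(finish.values(), default=0.0), placement
```

In this sketch, picking the (task, worker) pair with the earliest finish time balances load and rewards locality at once, because a cache hit shortens the finish time on the worker that already holds the data.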
To enhance the capacity of wireless mesh networks, a widely investigated key technique is the use of multi-radio and multi-channel *** this paper, a new parallel scheduling system is proposed which exploits MAC diversities by transmitting packets on the radios *** with conventional packet transmission, which follows 'one flow one radio', the new system uses radio diversity to transmit the packets on different radios *** kernel components of this system are the selection module and the schedule module. A localized selecting algorithm is implemented in the selection module to choose the right radios based on the quality of wireless links; two distributed packet-scheduling algorithms are optional with the schedule ***, a routing metric adapting this system is *** have carried out a comprehensive performance evaluation of this system using *** results show that it can successfully harness the diversity of multi-radio and multi-channel to provide considerable improvements over a baseline multi-channel system in several situations.
The growing development of the Internet of Things (IoT) is accelerating the emergence and growth of new IoT services and applications, which will result in massive amounts of data being generated, transmitted and processed in wireless communication *** Edge Computing (MEC) is a desired paradigm to timely process the data from IoT for value *** MEC, a number of computing-capable devices are deployed at the network edge near data sources to support edge computing, such that the long network transmission delay in cloud computing paradigm could be *** an edge device might not always have sufficient resources to process the massive amount of data, computation offloading is significantly important considering the cooperation among edge ***, the dynamic traffic characteristics and heterogeneous computing capabilities of edge devices challenge the *** addition, different scheduling schemes might provide different computation delays to the offloaded ***, offloading in mobile nodes and scheduling in the MEC server are coupled to determine service *** paper seeks to guarantee low delay for computation intensive applications by jointly optimizing the offloading and scheduling in such an MEC *** propose a Delay-Greedy Computation Offloading (DGCO) algorithm to make offloading decisions for new tasks in distributed computing-enabled mobile devices. A Reinforcement Learning-based parallel scheduling (RLPS) algorithm is further designed to schedule offloaded tasks in the multi-core MEC *** an offloading delay broadcast mechanism, the DGCO and RLPS cooperate to achieve the goal of delay-guarantee-ratio ***, the simulation results show that our proposal can bound the end-to-end delay of various *** under slightly heavy task load, the delay-guarantee-ratio given by DGCO-RLPS can still approximate 95%, while that given by benchmarked algorithms is reduced to intolerable *** simulation resul
A dynamic group of pictures (GOP) structure makes parallel scheduling for video encoding challenging. To address this, balanced frame-level parallel scheduling algorithms are developed. The proposed approaches first determine the frame priority and then the thread priority assignment for scheduling. The algorithms are based on an analysis of coding complexity, temporal influence, and the temporal burden required to finish coding. To complete the scheduling with the dynamic GOP structure, a block-based abrupt and gradual scene change detection algorithm is also proposed to determine the GOP structure adaptively. The experiments show that the scheduling performance is close to optimal. In addition, batch processing is incorporated so that the required buffer can be reduced.
This paper deals with a particular scheduling problem. We consider unit-time jobs and in-tree precedence constraints while minimizing the mean flow time. This problem is denoted $P \mid p_j = 1, \text{in-tree} \mid \sum C_j$ in the three-field notation. To the best of our knowledge, its complexity is still open. Through a reduction from 3-PARTITION, we show that this problem is strongly NP-hard.
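To make the objective concrete, the following Python sketch evaluates the total completion time of one feasible schedule for unit-time jobs under in-tree precedence constraints. It is only an illustration: the highest-level-first rule used here is a classical list-scheduling heuristic usually associated with the makespan objective, and nothing in the abstract suggests it is optimal for mean flow time. The names `level_of` and `hlf_total_completion_time` and the `parent` array encoding are assumptions.

```python
def level_of(parent, j):
    """Number of jobs on the path from job j to the root of the in-tree."""
    depth = 0
    while j is not None:
        depth += 1
        j = parent[j]
    return depth


def hlf_total_completion_time(parent, m):
    """Schedule unit-time jobs on m machines, highest level first.

    parent[j] is the unique successor of job j (None for the root), so a job
    may start only after every job pointing to it has completed.
    Returns (sum of completion times, makespan) of the resulting schedule.
    """
    n = len(parent)
    level = [level_of(parent, j) for j in range(n)]
    pending = [0] * n                      # unfinished predecessors per job
    for j in range(n):
        if parent[j] is not None:
            pending[parent[j]] += 1

    ready = [j for j in range(n) if pending[j] == 0]
    t, total = 0, 0
    while ready:
        t += 1                             # jobs run during (t-1, t]
        ready.sort(key=lambda j: -level[j])
        batch, ready = ready[:m], ready[m:]
        for j in batch:
            total += t                     # C_j = t for every job in the batch
            p = parent[j]
            if p is not None:
                pending[p] -= 1
                if pending[p] == 0:
                    ready.append(p)
    return total, t
```

Dividing the returned total by the number of jobs gives the mean flow time of that particular schedule; finding the schedule that minimizes it is the problem the abstract shows to be strongly NP-hard.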
In this paper we present a study of the job arrival patterns from a parallel computing system and the impact of such arrival patterns on the performance of parallel scheduling strategies. Using workload data from the Cornell Theory Center, we develop a class of traffic models to characterize these arrival patterns. Our analysis of the job arrival data illustrates traffic patterns that exhibit heavy-tailed behavior and other characteristics which are quite different from the arrival processes used in previous studies of parallel scheduling. We then investigate the impact of these arrival traffic patterns on the performance of parallel space-sharing strategies, including the derivation of some scheduling optimality results. (C) 1999 Published by Elsevier Science B.V. All rights reserved.
Minimizing execution time, energy consumption, and network load through scheduling algorithms is challenging for multi-processor-on-chip (MPSoC) based network-on-chip (NoC) systems. MPSoC-based systems are prevalent in high performance computing. As the capabilities of computing hardware have grown, application requirements have increased manyfold, particularly for real-world scientific applications. Scheduling large scientific workflows consisting of hundreds or thousands of tasks consumes a significant amount of time and resources. In this article, energy-aware parallel scheduling techniques are presented, primarily aimed at reducing the algorithm execution time while considering network load. Experimental results reveal that the proposed parallel scheduling algorithms achieve a significant reduction in execution time.
Runtime Incremental parallel scheduling (RIPS) is an alternative strategy to the commonly used dynamic scheduling. In this scheduling strategy, the system scheduling activity alternates with the underlying computation work. RIPS uses an advanced parallel scheduling technique to produce low-overhead, high-quality load balancing and to adapt to irregular applications. This paper presents methods for scheduling a single job on a dedicated parallel machine.
The Chained-Cubic Tree (CCT) interconnection network topology was recently proposed as a continuation of the extended efforts to improve the performance of interconnection networks. This topology, which promises to exhibit the best properties of the hypercube and tree topologies, needs to be investigated in depth in order to evaluate its performance against other interconnection network topologies. This work is a complementary effort in which load balancing is investigated as one of the most important aspects of performance improvement. This paper proposes a new load balancing algorithm for CCT interconnection networks. The proposed algorithm, called the Hybrid Dynamic parallel scheduling Algorithm (HD-PSA), combines two common load balancing strategies: dynamic load balancing and parallel scheduling. The performance of the proposed algorithm is evaluated both analytically and experimentally in terms of various performance metrics, including execution time, load balancing accuracy, communication cost, number of task hops, and task locality.