ISBN (print): 9781450365239
In this work, we introduce slot selection and co-allocation algorithms for parallel jobs in distributed computing with non-dedicated and heterogeneous resources (clusters, CPU nodes equipped with multicore processors, networks, etc.). A single slot is a time span that can be assigned to a task, which is part of a parallel job. Launching a job requires the co-allocation of a specified number of slots that start and finish synchronously. The challenge is that slots associated with different heterogeneous resources of distributed computing environments may have arbitrary start and finish points and differ in performance, latency, and pricing policies. Some existing algorithms assign a job to the first set of slots matching the resource request without any optimization (first-fit type), while others are based on an exhaustive search. In this paper, algorithms for efficient slot selection are studied and compared with known approaches. The novelty of the proposed approach lies in a general algorithm that selects a set of slots efficiently according to a specified criterion.
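To make the co-allocation problem concrete, here is a toy sketch (not the paper's actual algorithm): given candidate slots as (start, finish, cost) tuples, find a synchronous window of length d hosting n tasks while minimizing total cost, the kind of criterion the abstract describes. The tuple layout and cost criterion are assumptions for illustration.

```python
def co_allocate(slots, n, d):
    """Toy co-allocation: pick n slots sharing a common window of length d,
    minimizing total cost (the optimization criterion assumed here).

    slots -- list of (start, finish, cost) tuples, one per resource.
    Returns (window_start, chosen_slot_indices) or None if infeasible.
    """
    best = None
    # Any feasible window can be left-shifted until it hits some slot start,
    # so slot starts suffice as candidate window start points.
    for t in sorted({s for s, _, _ in slots}):
        usable = [(c, i) for i, (s, f, c) in enumerate(slots)
                  if s <= t and t + d <= f]
        if len(usable) < n:
            continue
        usable.sort()                         # cheapest slots first
        cost = sum(c for c, _ in usable[:n])
        if best is None or cost < best[0]:
            best = (cost, t, [i for _, i in usable[:n]])
    return None if best is None else (best[1], best[2])
```

A first-fit variant would instead return the first feasible window; the version above scans all candidates and keeps the cheapest, matching the "general algorithm with a criterion" idea.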
ISBN (print): 9781538668719
An upcoming frontier of rapidly reconfigurable distributed analytics and services might literally save lives in future military operations. In civilian scenarios, significant efficiencies have been gained by interconnecting devices into networked services and applications that automate much of everyday life, from smart homes to intelligent transportation. The ecosystem of such applications and services is collectively called the Internet of Things (IoT). Can similar benefits be gained in a military context by developing an IoT for the battlefield? This paper describes the unique challenges of such a context as well as potential risks, mitigation strategies, and benefits.
ISBN (print): 9781538612446
A training model was constructed with a convolutional neural network. Its policy network was first trained by supervised learning; the parameters of the policy network were then continually adjusted through self-play, enhancing the program's playing strength. As a result, it surpassed existing game programs that simply use α-β pruning or Monte Carlo algorithms. By using distributed computing to update the data asynchronously, multiple game groups were allocated to different machines at the same time, allowing a number of commodity computers to be fully utilized.
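The self-play loop can be caricatured as follows. This is a deliberately tiny bandit-style stand-in for the paper's CNN policy: the moves, their hidden strengths, and the additive update are all invented for illustration; the real system adjusts network weights, not a table.

```python
import random

def self_play_train(strength, n_games=500, lr=0.5, seed=1):
    """Toy self-play: the policy keeps one weight per move. Each game two
    moves are sampled from the current policy; the move with the higher
    hidden strength "wins" and its weight is reinforced, so play gradually
    concentrates on stronger moves -- the essence of self-play adjustment.

    strength -- hidden payoff of each move (unknown to the policy).
    Returns the learned weights.
    """
    rng = random.Random(seed)
    weights = {m: 1.0 for m in strength}
    moves = list(strength)
    for _ in range(n_games):
        a, b = rng.choices(moves, weights=[weights[m] for m in moves], k=2)
        if strength[a] == strength[b]:
            continue                      # drawn game: no update
        winner = a if strength[a] > strength[b] else b
        weights[winner] += lr             # reinforce the winning move
    return weights
```

In the distributed version the abstract describes, many such game loops would run on different machines and push their updates to a shared parameter store asynchronously.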
Distributed real-time applications often require an optimal assignment of resources to improve performance. This paper proposes a methodology for optimally assigning system resources so as to minimize processing and communication time. In this study, we defined a case study partitioned into six subsystems to be simulated with four available processing units. We used undirected graphs to represent the system and then solved the resulting NP-hard problem as a mixed-integer quadratic program (MIQP). We also implemented an exhaustive search as a baseline and compared both methods using computational and global time as metrics. Numerical simulations showed that our methodology obtained both a better assignment of computational resources and a significant reduction in solution time compared with the exhaustive search. Moreover, the solution increased the rate of information shared between units during the reconciliation process. This methodology can thus be used in applications such as distributed state estimation, distributed control, or co-simulation.
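A sketch of the exhaustive-search baseline the abstract compares against: enumerate every assignment of subsystems to units and score it by processing time plus communication time between subsystems placed on different units. The quadratic cross-terms are what make the MIQP formulation quadratic; the cost model and tiny instance below are assumptions, not the paper's data.

```python
from itertools import product

def exhaustive_assign(proc, comm):
    """Exhaustive-search baseline for the assignment problem (toy instance).

    proc[s][u] -- processing time of subsystem s on unit u.
    comm[s][t] -- communication time between subsystems s and t when they
                  are placed on different units (0 if co-located).
    Returns (best_total_time, best_assignment).
    """
    n_sub, n_unit = len(proc), len(proc[0])
    best = (float("inf"), None)
    for assign in product(range(n_unit), repeat=n_sub):
        total = sum(proc[s][assign[s]] for s in range(n_sub))
        # quadratic term: pay comm cost only across unit boundaries
        total += sum(comm[s][t]
                     for s in range(n_sub) for t in range(s + 1, n_sub)
                     if assign[s] != assign[t])
        best = min(best, (total, assign))
    return best
```

The search space is units**subsystems (4**6 = 4096 for the paper's case), which is why an MIQP solver pays off as the instance grows.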
ISBN (print): 9781538647813
This work explores a distributed computing setting where K nodes are assigned fractions (subtasks) of a computational task in order to perform the computation in parallel. In this setting, a well-known bottleneck has been the inter-node communication cost required to parallelize the task: unlike the computational cost, which keeps decreasing as K increases, the communication cost remains approximately constant, thus bounding the total speedup gains from having more computing nodes. This bottleneck was substantially ameliorated by the recent introduction of coded techniques in the context of MapReduce, which allow each node, at the computational cost of preprocessing approximately t times more subtasks, to reduce its communication cost by approximately a factor of t. In practice, though, the associated speedup gains are severely limited by the requirement that larger t and K necessitate dividing the original task into an extremely large number of subtasks. In this work we show how node cooperation, along with a novel assignment of tasks, can dramatically ameliorate this limitation. The result applies to wired as well as wireless distributed computing, and it is based on the idea of having groups of nodes compute identical mapping tasks and then employing a novel D2D coded caching algorithm proposed here. In this context, the new approach achieves a virtual decomposition of the fully connected D2D setting into parallel ones, which significantly reduces the required subpacketization.
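The tradeoff the abstract describes can be stated numerically. The load expression below is the standard coded-MapReduce result from this literature (not this paper's new bound): with redundancy t, the uncoded shuffle load 1 - t/K shrinks by a further factor of t, while the classic scheme needs the task split into C(K, t) subtasks, the subpacketization blow-up that node grouping is meant to tame.

```python
from math import comb

def comm_load(K, t):
    """Normalized shuffle load with computation redundancy t: each map
    subtask runs at t of the K nodes, and coded multicasting divides the
    uncoded load (1 - t/K) by t."""
    assert 1 <= t <= K
    return (1 - t / K) / t

def subpacketization(K, t):
    """Subtasks required by the classic coded scheme, C(K, t) -- the
    growth that limits real speedups and motivates node cooperation."""
    return comb(K, t)
```

For K = 20 and t = 10, the load drops by a factor of 20 versus t = 1, but the task must already be cut into C(20, 10) = 184756 pieces, which illustrates why the gains were "severely limited" in practice.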
ISBN (print): 9781538647813
Coding for distributed computing supports low-latency computation by relieving the burden of straggling workers. While most existing works assume a simple master-worker model, we consider a hierarchical computational structure consisting of groups of workers, motivated by the need to reflect the architectures of real-world distributed computing systems. In this work, we propose a hierarchical coding scheme for this model and analyze its decoding cost and expected computation time. Specifically, we first provide upper and lower bounds on the expected computing time of the proposed scheme. We also show that our scheme enables efficient parallel decoding, reducing decoding costs by orders of magnitude over non-hierarchical schemes. When both decoding cost and computing time are considered, the proposed hierarchical coding outperforms existing schemes in many practical scenarios.
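The straggler-masking effect behind such schemes is easy to simulate. In an (n, k) MDS-coded group, the result is recoverable once the fastest k of n workers finish, so completion time is the k-th order statistic of the worker runtimes. The exponential runtime model below is a common assumption in this literature, not something taken from the paper.

```python
import random

def mds_completion_time(n, k, n_trials=2000, seed=0):
    """Monte-Carlo sketch: an (n, k) MDS-coded worker group finishes when
    its fastest k of n workers are done, so stragglers beyond the slowest
    n - k are masked. Worker runtimes are i.i.d. Exp(1) -- an assumption.
    Returns the average completion time over n_trials runs.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        times = sorted(rng.expovariate(1.0) for _ in range(n))
        total += times[k - 1]    # k-th fastest worker finishes the group
    return total / n_trials
```

For Exp(1) runtimes the exact expectation is the partial harmonic sum H_n - H_{n-k} (about 0.646 for n = 10, k = 5, versus about 2.93 when all 10 must finish), which the simulation reproduces; the paper's contribution is analyzing this jointly with decoding cost across a hierarchy of such groups.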
ISBN (print): 9781538673744
Low-cost, real-time digital signal processors (DSPs) embedded in generic Internet of Things (IoT) edge devices can make significant contributions to distributed edge computing for industrial IoT (IIoT) networks. The DSP considered in this paper is the Texas Instruments (TI) TMS320C28x DSP (C28x). At the edge of the network, the C28x is programmed in low-level Embedded C to construct the Morlet wavelet. Our implementation at this layer is the first known construction of the Morlet wavelet for the C28x DSP using Embedded C. At the fog layer, near the edge of the IoT network, where more computing resources exist, the wavelet is then convolved with healthcare (electrocardiogram) and electrical network signals using Matlab, to reduce signal noise and to identify important parts of the examined signals. Convolution results indicate that the distributed computing approach for low-cost generic devices considered in this paper is suitable for use in large IIoT networks.
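The two processing steps the abstract splits across layers, building the wavelet and convolving it with a signal, can be sketched in a few lines. This is an illustrative Python rendering, not the paper's Embedded C or Matlab code; the sample grid, the center frequency w0 = 5, and the use of only the real part are assumptions.

```python
import math

def morlet(n=64, w0=5.0):
    """Real part of a Morlet wavelet sampled at n points on [-4, 4]:
    a Gaussian envelope modulating a cosine carrier of frequency w0."""
    xs = [-4 + 8 * i / (n - 1) for i in range(n)]
    return [math.exp(-x * x / 2) * math.cos(w0 * x) for x in xs]

def convolve(signal, kernel):
    """Plain 'full' discrete convolution, the fog-layer denoising step."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out
```

Convolving a noisy ECG trace with morlet() emphasizes components near the wavelet's center frequency while the Gaussian envelope suppresses broadband noise, which is the denoising effect the abstract reports.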
ISBN (print): 9783319937014; 9783319937007
In this work, we introduce slot selection and co-allocation algorithms for parallel jobs in distributed computing with non-dedicated and heterogeneous resources (clusters, CPU nodes equipped with multicore processors, networks, etc.). A single slot is a time span that can be assigned to a task, which is part of a parallel job. Launching a job requires the co-allocation of a specified number of slots that start and finish synchronously. The challenge is that slots associated with different heterogeneous resources of distributed computing environments may have arbitrary start and finish points and different pricing policies. Some existing algorithms assign a job to the first set of slots matching the resource request without any optimization (first-fit type), while others are based on an exhaustive search. In this paper, algorithms for effective slot selection are studied and compared with known approaches. The novelty of the proposed approach lies in a general algorithm that selects a set of slots efficiently according to the specified criterion.
ISBN (print): 9783319920405; 9783319920399
Modern distributed computing frameworks for cloud computing and high performance computing typically accelerate job performance by dividing a large job into small tasks for execution parallelism. Some tasks, however, may run far behind others, which jeopardizes the job completion time. In this paper, we present Zeno, a novel system which automatically identifies and diagnoses stragglers for jobs by machine learning methods. First, the system identifies stragglers with an unsupervised clustering method which groups the tasks based on their execution time. It then uses a supervised rule learning algorithm to learn diagnosis rules inferring the stragglers from their resource assignment and usage data. Zeno is evaluated on traces from Google's Borg system and Alibaba's Fuxi system. The results demonstrate that our system generates simple, easy-to-read rules that offer both valuable insights and decent performance in predicting stragglers.
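The first, unsupervised stage can be illustrated with 2-means clustering on a one-dimensional list of task execution times: tasks falling in the slower cluster are flagged as stragglers. This is a stand-in for whatever clustering Zeno actually uses; the two-cluster choice and the sample times are assumptions.

```python
def find_stragglers(times, n_iter=20):
    """Unsupervised straggler identification sketch: 2-means clustering on
    task execution times. Returns the sorted indices of tasks assigned to
    the slower cluster; [] if all tasks ran in identical time.
    """
    lo, hi = min(times), max(times)
    if lo == hi:
        return []
    c_fast, c_slow = lo, hi            # seed centroids at the extremes
    for _ in range(n_iter):
        fast = [t for t in times if abs(t - c_fast) <= abs(t - c_slow)]
        slow = [t for t in times if abs(t - c_fast) > abs(t - c_slow)]
        if not slow:
            break
        c_fast = sum(fast) / len(fast)
        c_slow = sum(slow) / len(slow)
    return sorted(i for i, t in enumerate(times)
                  if abs(t - c_fast) > abs(t - c_slow))
```

In Zeno's pipeline, the indices returned here would become the positive labels that the supervised rule learner then explains in terms of resource assignment and usage features.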
ISBN (print): 9781538650356
Many data-analyzing applications rely heavily on timely responses from execution and are referred to as time-critical data-analyzing applications. Given the frequent appearance of gigantic amounts of data and analytical computation, running them on large-scale distributed computing environments is often advantageous. The workload of big data applications is often hybrid, i.e., it contains a combination of time-critical and regular, non-time-critical applications. Resource management for hybrid workloads in complex distributed computing environments is becoming more critical and needs more study. However, it is difficult to design rule-based approaches best suited to such complex scenarios, because many complicated characteristics need to be taken into account. Therefore, we present an innovative reinforcement learning (RL) based resource management approach for hybrid workloads in distributed computing environments. We utilize neural networks to capture the desired resource management model, use reinforcement learning with a designed value definition to gradually improve the model, and use an ε-greedy methodology to extend exploration along the reinforcement process. Extensive experiments show that the resource management solution obtained through reinforcement learning greatly surpasses the baseline rule-based models. Specifically, the model is good at reducing both missed-deadline occurrences for time-critical applications and the average job delay for all jobs in the hybrid workloads. Our reinforcement learning based approach has been demonstrated to provide an efficient resource manager for the desired scenarios.
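The exploration mechanism the abstract names (its "6-greedy" is almost certainly a garbled ε-greedy) is simple to state: with probability ε pick a random scheduling action, otherwise pick the action with the best current value estimate. The action names and value table below are illustrative, not from the paper.

```python
import random

def epsilon_greedy_scheduler(q_values, epsilon, rng):
    """ε-greedy action selection for an RL resource manager sketch.

    q_values -- dict mapping a scheduling action (e.g. which queue to
                serve next) to its current value estimate.
    epsilon  -- exploration probability in [0, 1].
    rng      -- a random.Random instance, passed in for reproducibility.
    """
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)            # explore: random action
    return max(actions, key=q_values.get)     # exploit: best-valued action
```

During training, ε is typically decayed over time so the manager explores broadly at first and then settles on the learned policy that balances deadline misses against average job delay.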