The study of information security and privacy is currently quite popular. In parallel, several computing paradigms, such as cloud and edge computing, are already creating a unique ecosystem with various designs, stora...
The hierarchical cooperative resource scheduling architecture provides a promising direction for efficient collaborative processing of edge computing under the dynamic and intricate landscape of the Industrial Interne...
ISBN:
(Print) 9781450397339
In the parameter-server-based distributed deep learning system, the workers communicate with the parameter server simultaneously to refine model parameters, which easily results in severe network contention. To solve this problem, the Asynchronous Parallel (ASP) strategy lets each worker update the parameters independently without synchronization. However, due to the inconsistency of parameters among workers, ASP suffers from accuracy loss and slow convergence. In this paper, we propose Hybrid Synchronous Parallelism (HSP), which mitigates communication contention without excessive degradation of convergence speed. Specifically, the parameter server pulls gradients from workers sequentially to eliminate network congestion and synchronizes all up-to-date parameters after each iteration. Meanwhile, HSP cautiously lets idle workers compute with out-of-date weights to maximize the utilization of computing resources. We provide a theoretical analysis of convergence efficiency and implement HSP on a popular deep learning (DL) framework. The test results show that HSP improves the convergence speed of three classical deep learning models by up to 67%.
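As a rough illustration of the scheduling pattern described above, the sketch below simulates a parameter server that pulls gradients from workers one at a time and synchronizes all parameters after each iteration; the function names, toy loss, and data are assumptions made for demonstration, not the paper's implementation.

```python
# Hypothetical sketch of the HSP idea: the server pulls gradients from workers
# one at a time (so only one worker talks to the server at any moment), and
# workers keep computing with the stale weights they already hold until the
# end-of-iteration synchronization.
import numpy as np

def compute_gradient(weights, data):
    # Toy quadratic loss 0.5 * ||weights - data||^2, so the gradient is (weights - data).
    return weights - data

def hsp_training(worker_data, num_iterations=100, lr=0.1):
    dim = worker_data[0].shape[0]
    weights = np.zeros(dim)                                  # global parameters on the server
    local_weights = [weights.copy() for _ in worker_data]    # each worker's (possibly stale) copy

    for _ in range(num_iterations):
        # The server pulls gradients sequentially, worker by worker.
        for w, data in enumerate(worker_data):
            grad = compute_gradient(local_weights[w], data)
            weights -= lr * grad                             # server applies the pulled gradient
            # Workers waiting for their turn keep their out-of-date copies here;
            # they are only refreshed at the end of the iteration.
        # Synchronize all up-to-date parameters to every worker after the iteration.
        local_weights = [weights.copy() for _ in worker_data]
    return weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = [rng.normal(loc=1.0, size=4) for _ in range(3)]   # three toy workers
    print(hsp_training(data))
```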
ISBN:
(Print) 9798400700569
Joint Nonnegative Matrix Factorization (JointNMF) is a hybrid method for mining information from datasets that contain both feature and connection information. We propose distributed-memory parallelizations of three algorithms for solving the JointNMF problem, based on Alternating Nonnegative Least Squares, Projected Gradient Descent, and Projected Gauss-Newton. We extend well-known communication-avoiding algorithms from the single-processor-grid case to our coupled case on two processor grids. We demonstrate the scalability of the algorithms on up to 960 cores (40 nodes) with 60% parallel efficiency. The more sophisticated Alternating Nonnegative Least Squares (ANLS) and Gauss-Newton variants outperform the first-order gradient descent method in reducing the objective on large-scale problems. We perform a topic modelling task on a large corpus of academic papers consisting of over 37 million paper abstracts and nearly a billion citation relationships, demonstrating the utility and scalability of the methods.
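For readers unfamiliar with JointNMF, the following is a minimal single-machine sketch assuming one common formulation, features X ≈ W H and a symmetric connection matrix S ≈ Hᵀ H with nonnegative factors, solved with projected gradient descent (one of the three solver families mentioned above). The coupling term, step size, and other details are illustrative assumptions; the paper's distributed two-processor-grid layout is not reproduced.

```python
# Sketch of JointNMF under the assumed objective
#   f(W, H) = ||X - W H||_F^2 + alpha * ||S - H^T H||_F^2,  W >= 0, H >= 0,
# minimized by projected gradient descent.
import numpy as np

def joint_nmf_pgd(X, S, k, alpha=1.0, lr=1e-3, iters=500, seed=0):
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(iters):
        R_x = W @ H - X               # residual of the feature factorization
        R_s = H.T @ H - S             # residual of the connection factorization (S symmetric)
        grad_W = 2 * R_x @ H.T
        grad_H = 2 * W.T @ R_x + 4 * alpha * H @ R_s
        W = np.maximum(W - lr * grad_W, 0.0)   # gradient step + projection onto the nonnegative orthant
        H = np.maximum(H - lr * grad_H, 0.0)
    return W, H
```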
ISBN:
(Print) 9783031488023; 9783031488030
In recent years, supercomputers have experienced significant advancements in performance and have grown in size, now comprising several thousand nodes. To unlock the full potential of these machines, efficient resource management and job scheduling (assigning parallel programs to nodes) are crucial. Traditional job scheduling approaches employ rigid jobs that use the same set of resources throughout their lifetime, resulting in significant resource under-utilization. By employing malleable jobs that can change their number of resources during execution, the performance of supercomputers has the potential to increase. However, designing algorithms for scheduling malleable jobs is challenging, since it requires complex strategies to determine when and how to reassign resources among jobs while maintaining fairness. In this work, we extend a recently proposed malleable job scheduling algorithm by introducing new strategies. Specifically, we propose three priority orders to determine which malleable job to consider for resource reassignment and the number of nodes to use when starting a job. Additionally, we propose three reassignment approaches to handle the delay between scheduling decisions and the actual transfer of resources between jobs. This results in nine algorithm variants. We then evaluate the impact of deploying malleable jobs scheduled by our nine algorithm variants; for that, we simulate the scheduling of job sets containing varying proportions of rigid and malleable jobs on a hypothetical supercomputer. The results demonstrate significant improvements across several metrics. For instance, with 20% malleable jobs, the overall completion time is reduced by 11% while maintaining high node utilization and fairness.
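The abstract does not spell out the three priority orders or the three reassignment approaches, so the sketch below only illustrates how a priority order can be used to pick which malleable job receives freed nodes, using hypothetical orderings (fewest current nodes, earliest submission, most remaining work) rather than the paper's definitions.

```python
# Illustrative only: selecting a malleable job for expansion under a chosen priority order.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    submit_time: float
    nodes: int               # nodes currently assigned
    remaining_work: float    # estimated remaining work (arbitrary units)

# Hypothetical priority orders; smaller key value means higher priority.
PRIORITY_ORDERS = {
    "fewest_nodes": lambda j: j.nodes,
    "earliest_submitted": lambda j: j.submit_time,
    "most_work_left": lambda j: -j.remaining_work,
}

def pick_job_for_expansion(malleable_jobs, order="fewest_nodes"):
    """Return the malleable job that should receive freed nodes under the chosen order."""
    return min(malleable_jobs, key=PRIORITY_ORDERS[order]) if malleable_jobs else None

if __name__ == "__main__":
    jobs = [Job("A", 0.0, 16, 120.0), Job("B", 5.0, 4, 300.0), Job("C", 2.0, 8, 60.0)]
    print(pick_job_for_expansion(jobs, "fewest_nodes").name)   # -> B
```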
ISBN:
(Print) 9781665466431
Cloud computing is an Internet-based network technology that has contributed to the rapid advancement of communication technology by serving clients with diverse requirements through online computing resources, offering hardware and software applications, as well as software development, testing, and platform applications, as services. Large-scale heterogeneous distributed computing environments promise access to a vast amount of computing resources at a comparatively low cost. To ease software development and deployment in such complex environments, high-level parallel programming languages exist that must be supported by sophisticated operating systems. The anticipated uptake of cloud computing brings numerous benefits for consumers in terms of cost and flexibility. Building on well-established research in Internet services, networks, utility computing, virtualization, and related areas, Service-Oriented Architectures and the Internet of Services (IoS) raise a wide range of technological issues, including parallel computing, load balancing, high availability, and scalability. Effective load-balancing methods are essential to solving these issues. Since the size and complexity of such systems make it impossible to concentrate job execution on a few select servers, a parallel distributed solution is required. In this article we propose a workload-balancing method called the Adaptive Task Load Model (ATLM), and on top of it we develop an adaptive parallel distributed computing model (ADPM). While preserving model integrity, ADPM employs a more flexible synchronization approach to reduce the time spent on synchronous operations. ADPM also applies the ATLM load-balancing technique, which resolves the straggler problem caused by performance disparities between nodes, to ensure model correctness. The results indicate that
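The abstract does not detail ATLM's internals, so the following is only a hedged sketch of the general idea behind straggler-aware load balancing: distribute tasks in proportion to each node's measured speed so that slower nodes receive proportionally less work.

```python
# Hedged illustration of speed-proportional task assignment (not the ATLM algorithm itself).
def proportional_split(total_tasks, node_speeds):
    """Return the number of tasks per node, proportional to each node's measured speed."""
    total_speed = sum(node_speeds)
    shares = [int(total_tasks * s / total_speed) for s in node_speeds]
    # Hand any rounding leftovers to the fastest node.
    shares[node_speeds.index(max(node_speeds))] += total_tasks - sum(shares)
    return shares

if __name__ == "__main__":
    print(proportional_split(100, [1.0, 2.0, 5.0]))   # -> [12, 25, 63]
```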
In today's urban transportation landscape, the accurate and timely prediction of traffic conditions holds immense significance for optimizing traffic flow and resource allocation. However, conventional centralized...
We show that a constant number of self-attention layers can efficiently simulate, and be simulated by, a constant number of communication rounds of Massively Parallel Computation, a popular model of distributed computing with wide-ranging algorithmic results. As a consequence, we show that logarithmic depth is sufficient for transformers to solve basic computational tasks that cannot be efficiently solved by several other neural sequence models and sub-quadratic transformer approximations. We thus establish parallelism as a key distinguishing property of transformers. Copyright 2024 by the author(s).
ISBN:
(Print) 9798400701559
Floating-point exceptions occurring during numerical computations can be a serious threat to the validity of the computed results if they are not caught and diagnosed. Unfortunately, on NVIDIA GPUs, today's most widely used type of GPU and one that lacks hardware exception traps, this task must be carried out in software. Given the prevalence of closed-source kernels, efficient binary-level exception tracking is essential. It is also important to know how exceptions flow through the code, whether they alter the code's behavior, and whether these exceptions can be detected at the program outputs or are killed along program flow paths. In this paper, we introduce GPU-FPX, a tool that has low overhead and provides a deep understanding of the origin and flow of exceptions, as well as how exceptions are modified by code optimizations. We measure GPU-FPX's performance over 151 widely used GPU programs, detecting 26 serious exceptions that were previously not reported. Our results show that GPU-FPX is 16x faster in geometric-mean runtime than the only comparable prior tool, while also helping debug a larger class of codes more effectively. GPU-FPX and its benchmarks have been released.
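GPU-FPX itself instruments GPU binaries, which is not reproduced here; as a small CPU-side illustration of the exceptional values it tracks (NaN and Inf) and of checking whether they remain visible at program outputs, consider the following sketch.

```python
# Toy check for exceptional floating-point values in an output array
# (illustration only; GPU-FPX performs binary-level tracking on the GPU).
import numpy as np

def report_exceptional_values(name, arr):
    nan_count = int(np.isnan(arr).sum())
    inf_count = int(np.isinf(arr).sum())
    if nan_count or inf_count:
        print(f"{name}: {nan_count} NaN(s), {inf_count} Inf(s) detected")
    return nan_count, inf_count

if __name__ == "__main__":
    with np.errstate(divide="ignore", invalid="ignore"):
        x = np.array([1.0, 0.0, -1.0]) / np.array([0.0, 0.0, 0.0])  # Inf, NaN, -Inf
    report_exceptional_values("toy kernel output", x)               # -> 1 NaN(s), 2 Inf(s)
```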
Mobile edge computing can effectively meet the resource requirements of complex vehicle applications with high computing power and low delay. When a vehicle terminal task is offloaded to the edge servers, most works ...