ISBN: (print) 9781728162515
To cope with the rapid growth in available data, the efficiency of data analysis and machine learning libraries has recently received increased attention. Although great advancements have been made in traditional array-based computations, most are limited by the resources available on a single computation node. Consequently, novel approaches are needed to exploit distributed resources, e.g. distributed memory architectures. To this end, we introduce HeAT, an array-based numerical programming framework for large-scale parallel processing with an easy-to-use NumPy-like API. HeAT utilizes PyTorch as a node-local eager execution engine and distributes the workload on arbitrarily large high-performance computing systems via MPI. It provides both low-level array computations and assorted higher-level algorithms. With HeAT, a NumPy user can take full advantage of their available resources, significantly lowering the barrier to distributed data analysis. When compared to similar frameworks, HeAT achieves speedups of up to two orders of magnitude.
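To make the idea of a distributed array concrete, the following is a minimal single-process sketch of splitting data into near-equal contiguous blocks, reducing each block locally, and combining the partial results. It is not HeAT's API and uses neither PyTorch nor MPI; the names `split_blocks` and `distributed_sum` are hypothetical and exist only for illustration.

```python
def split_blocks(data, n_ranks):
    """Split data into n_ranks contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(len(data), n_ranks)
    blocks, start = [], 0
    for rank in range(n_ranks):
        # The first `extra` ranks receive one additional element.
        size = base + (1 if rank < extra else 0)
        blocks.append(data[start:start + size])
        start += size
    return blocks


def distributed_sum(data, n_ranks):
    """Emulate a global sum: each 'rank' reduces its own block, then partials are combined."""
    partials = [sum(block) for block in split_blocks(data, n_ranks)]
    return sum(partials)
```

In a real distributed setting each block would live in the memory of a different process and the final combination would be a collective reduction; here both steps run in one process purely to show the data layout.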
ISBN: (print) 9781613997475
Although it is possible to apply traditional optimization algorithms to determine the Pareto front of a multiobjective optimization problem, the computational cost is extremely high when the objective function evaluation requires solving a complex reservoir simulation problem and the optimization cannot benefit from adjoint-based gradients. This paper proposes a novel workflow to solve bi-objective optimization problems using the distributed quasi-Newton (DQN) method, which is a well-parallelized and derivative-free optimization (DFO) method. Numerical tests confirm that the DQN method performs efficiently and robustly. The efficiency of the DQN optimizer stems from a distributed computing mechanism which effectively shares the available information discovered in prior iterations. Rather than performing multiple quasi-Newton optimization tasks in isolation, simulation results are shared among distinct DQN optimization tasks or threads. In this paper, the DQN method is applied to the optimization of a weighted average of two objectives, using different weighting factors for different optimization threads. In each iteration, the DQN optimizer generates an ensemble of search points (or simulation cases) in parallel and a set of non-dominated points is updated accordingly. Different DQN optimization threads, which use the same set of simulation results but different weighting factors in their objective functions, converge to different optima of the weighted average objective function. The non-dominated points found in the last iteration form a set of Pareto optimal solutions. The robustness as well as the efficiency of the DQN optimizer originates from its reliance on a large, shared set of intermediate search points. On the one hand, this set of search points is (much) smaller than the combined sets that would be needed if all optimizations with different weighting factors were executed separately; on the other hand, the size of this set produces a high fault tolerance. Even if some simulati
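The interplay between a shared pool of evaluated points, a non-dominated set, and several weighted-average objectives can be sketched as follows. This is only an illustration under stated assumptions, not the DQN method itself: both objectives are assumed to be minimized, the quasi-Newton search step is omitted entirely, and the function names `dominates`, `non_dominated` and `best_weighted` are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one.

    Assumes minimization of all objectives.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points):
    """Keep only the points that no other point in the shared pool dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def best_weighted(points, w):
    """Pick the pool point minimizing w * f1 + (1 - w) * f2 for one weighting factor w."""
    return min(points, key=lambda p: w * p[0] + (1.0 - w) * p[1])


# One shared pool of (f1, f2) results, reused by every weighting factor.
pool = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (5.0, 5.0)]
front = non_dominated(pool)
```

Calling `best_weighted(pool, w)` with different values of `w` selects different points from the same pool, which mirrors how threads with different weighting factors can reuse one set of simulation results.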
ISBN: (print) 9783030483401; 9783030483395
In the aerospace sciences we produce huge amounts of data. This data must be arranged in a meaningful order so that we can analyze or visualize it. In this paper we focus on data that is distributed among computer processes and then needs to be sorted by a single root process for further analysis. We assume that the memory on the root process is too small to hold all sorted data at once, so that we have to sort and process the data chunk-wise. We show the efficiency of our approach in weak scaling tests, where we achieve a near-constant bandwidth. Additionally, we obtain a considerable speedup compared to the standard parallel external sort. We also demonstrate the usefulness of our algorithm in a real-life aviation application.
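The constraint that the root can hold only one chunk of sorted data at a time can be illustrated with a lazy k-way merge. This is a stand-in technique built on the standard library, not the algorithm from the paper: it assumes every process has already sorted its local data, it models the processes as plain lists, and the name `merge_in_chunks` is hypothetical.

```python
import heapq
from itertools import islice


def merge_in_chunks(sorted_runs, chunk_size):
    """Yield globally sorted data as lists of at most chunk_size items.

    sorted_runs: one already-sorted sequence per (simulated) process.
    Only one output chunk is materialized at a time.
    """
    merged = heapq.merge(*sorted_runs)  # lazy k-way merge over all runs
    while True:
        chunk = list(islice(merged, chunk_size))
        if not chunk:
            return
        yield chunk


runs = [[1, 4, 9], [2, 3, 10], [5, 6, 7, 8]]
chunks = list(merge_in_chunks(runs, 4))
```

Each yielded chunk could be analyzed or written out and then discarded before the next one is requested, so the consumer's memory use is bounded by `chunk_size` plus one pending element per run.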
Routing is one of the most time-consuming stages in the FPGA design flow. Parallelization can accelerate the routing process but suffers from load imbalance, which results in low scalability. In this paper, we propose a load balance-centric parallel router in a distributed computing environment. First, we explore regular and irregular region partitioning so that routing tasks are assigned to different cores for static load balance before parallel routing. Second, we explore message propagation and task migration between underloaded and overloaded cores so that load balance can be dynamically maintained at parallel routing runtime. Finally, we demonstrate the effectiveness of the parallel router using large-scale Titan designs. Experimental results show that our parallel router achieves about 17× speedup on average using 32 cores, compared with the VTR 8 router.
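As a rough illustration of what static load balance means, the sketch below assigns tasks with known cost estimates to cores using the greedy longest-processing-time heuristic. This is a generic, swapped-in technique and not the region partitioning or task migration scheme of the paper; the cost values and the name `assign_tasks` are hypothetical.

```python
import heapq


def assign_tasks(costs, n_cores):
    """Greedy LPT: hand each task, largest cost first, to the least-loaded core.

    Returns (assignment, loads), where assignment[c] lists the task indices
    given to core c and loads[c] is the total cost on core c.
    """
    heap = [(0, core) for core in range(n_cores)]  # (current load, core id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_cores)]
    for task, cost in sorted(enumerate(costs), key=lambda tc: -tc[1]):
        load, core = heapq.heappop(heap)
        assignment[core].append(task)
        heapq.heappush(heap, (load + cost, core))
    loads = [sum(costs[t] for t in tasks) for tasks in assignment]
    return assignment, loads


assignment, loads = assign_tasks([7, 5, 4, 3, 1], 2)
```

A static assignment like this depends on how accurate the cost estimates are, which is one reason a runtime mechanism such as the task migration described above can still be needed.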
ISBN: (digital) 9781665455411
ISBN: (print) 9781665455428
We design a simulated electric energy meter for a power consumption information acquisition system. The system uses communication, computer and automatic control technologies to monitor and manage the power load comprehensively, and to collect, process and monitor in real time the power consumption information of power users. It provides automatic collection of electricity-use information, monitoring of abnormal metering, power quality monitoring, power consumption analysis and management, release of related information, distributed energy monitoring, information exchange with intelligent electrical equipment, and other functions. The system user interface integrates application logic through services, which improves the system's data coordination ability, effectively reduces the data access load, and improves the system's expandability through load balancing.
The Large Hadron Collider (LHC) experiments will soon enter the Run 3 data-taking period, with an increased data rate and high pileup that require an excellent working computing infrastructure. In the future High-Luminosity LHC (HL-LHC) data-taking period, the compute, storage and network facilities will have to be extended further by large factors, and flexible and sophisticated computing models are essential. New techniques from state-of-the-art methods in physics analysis and data science, such as Deep Learning and Big Data tools, are crucial for handling high-dimensional and more complex problems. Besides flexible cloud computing technologies, the use of High-Performance Computing (HPC) at the LHC experiments is being explored. In this presentation, I will discuss the computing technologies for LHC Run 3 and the future HL-LHC runs, and the use of modern physics analysis and data science methods to meet the growing and complex demands of large-scale scientific computing.
At present, the large number of distributed energy storages (DESs) connected to the distribution network lack effective scheduling methods. A centralized control strategy of DESs with random access and output can be util...
ISBN: (print) 9781450377515
In this paper, we present GRAPHTM, an efficient and scalable framework for processing transactions in a distributed environment. The distributed environment is modeled as a graph where each node of the graph is a processing node that issues transactions. The objects that transactions use to execute are also on the graph nodes (the initial placement may be arbitrary). Following the data-flow model of computation, the transactions execute on the nodes that issue them after collecting all the objects that they need. This collection is done by issuing the requests for the objects as soon as the transaction starts and waiting until all objects required by the transaction arrive at the requesting node. The challenge is how to schedule the transactions so that two crucial performance metrics, namely (i) the total execution time to commit all the transactions, and (ii) the total communication cost involved in moving the objects to the requesting nodes, are minimized. We implemented GRAPHTM in Java and assessed its performance through 3 micro-benchmarks and 5 complex benchmarks from the STAMP benchmark suite on 5 different network topologies, namely clique, line, grid, cluster, and star, which form the underlying communication network for a representative set of distributed systems commonly used in practice. The results show the efficiency and scalability of our approach.
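The communication-cost metric can be made concrete with a small sketch that measures how far objects must travel to reach the node that requests them. It assumes, purely for illustration, that cost is counted in hops over unit-weight edges; the abstract does not state the cost model, the sketch does no scheduling, and the names `hop_distances` and `collection_cost` are hypothetical. Python is used here for brevity even though GRAPHTM itself is implemented in Java.

```python
from collections import deque


def hop_distances(adjacency, source):
    """Breadth-first search: number of hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def collection_cost(adjacency, requester, object_locations):
    """Total hops needed to move every required object to the requesting node."""
    dist = hop_distances(adjacency, requester)
    return sum(dist[location] for location in object_locations.values())


# Two of the topologies named above, each with four nodes.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
```

For example, a transaction issued at node 0 of the line that needs objects stored at nodes 3 and 1 pays 3 + 1 = 4 hops under this assumed cost model.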
ISBN: (print) 9781728116648
Parallel and distributed operation of a pulsed power network with the potential gradient (PG) method is confirmed using a moderately large-scale simulation model. The pulsed power network has already been proposed for seamless integration of distributed generators. The PG method makes the network scalable. To confirm the scalability of this power grid and the autonomous clustering around the generators, computer simulations are executed.
ISBN: (print) 9781665454452
Today's supercomputers offer massive computation resources to execute a large number of user jobs. Effectively managing such large-scale hardware parallelism and workloads is essential for supercomputers. However, existing HPC resource management (RM) systems fail to capitalize on the hardware parallelism by following a centralized design used decades ago. They give poor scalability and inefficient performance on today's supercomputers, which will worsen in exascale computing. We present ESlurm, a better RM for supercomputers. As a departure from existing HPC RMs, ESlurm implements a distributed communication structure. It employs a new communication tree strategy and uses job runtime estimation to improve communications and job scheduling efficiency. ESlurm is deployed into production in a real supercomputer. We evaluate ESlurm on up to 20K nodes. Compared to state-of-the-art RM solutions, ESlurm exhibits better scalability, significantly reducing the resource usage of master nodes and improving data transfer and job scheduling efficiency by a large margin.