ISBN (print): 9798400713965
In recent years, Neural Networks (NNs) have become one of the most prevalent topics in computer science, both in research and in industry. NNs are used for data analysis, natural language processing, autonomous driving and more. As such, NNs also see increasing application and use in High-Performance Computing (HPC). At the same time, energy efficiency has become an increasingly critical topic. NNs consume large amounts of energy during operation, which in turn results in large amounts of CO2 emissions. This work presents a comprehensive evaluation of current NN inference soft- and hardware configurations within HPC environments, with a focus on both performance metrics and energy consumption. NN quantization and accelerators such as FPGAs allow for increased inference efficiency, both in terms of throughput and energy. Therefore, this work focuses on FINN, an efficient NN inference framework for FPGAs, highlighting its current lack of support for HPC systems. We provide an in-depth analysis of FINN in order to implement extensions that optimize end-to-end execution for use in the HPC environment. We thoroughly evaluate the performance and energy-efficiency gains of the newly implemented optimizations and compare them against existing NN accelerators for HPC. With our extensions of FINN, we were able to achieve a 1847× higher throughput, while also reducing the latency on average to 0.9978× and the EDP to 0.9979× on an Alveo U55C FPGA. Dataflow-based NN inference accelerators on an FPGA should be used if the performance and energy footprint of the inference process is crucial and the batch sizes are small to medium. For extremely large batch sizes and a very limited network-to-accelerator time (less than a few days), using GPUs is still the way to go. Our results show that with the newly developed driver, we outperform a high-end Nvidia A100 GPU by up to 7.81× in throughput, while having a 0.87× lower latency and 0.88× lower energy de...
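For context, a minimal sketch of the kind of low-bit quantized model the FINN flow typically consumes, assuming the usual FINN front end where networks are defined with PyTorch plus the Brevitas quantization library; the layer sizes and the 2-bit width below are illustrative assumptions, not values from the paper:

```python
# Illustrative quantized MLP in the style FINN ingests; all sizes and
# bit widths here are assumptions for the sketch, not the paper's setup.
import torch.nn as nn
from brevitas.nn import QuantIdentity, QuantLinear, QuantReLU

class QuantMLP(nn.Module):
    def __init__(self, in_feats=784, hidden=64, classes=10, bits=2):
        super().__init__()
        self.net = nn.Sequential(
            QuantIdentity(bit_width=bits),   # quantize input activations
            QuantLinear(in_feats, hidden, bias=True, weight_bit_width=bits),
            QuantReLU(bit_width=bits),       # quantized activation function
            QuantLinear(hidden, classes, bias=True, weight_bit_width=bits),
        )

    def forward(self, x):
        return self.net(x)
```

Such a model would then be exported to an ONNX-based intermediate representation and compiled by FINN into a dataflow accelerator; the paper's extensions target the end-to-end execution of that accelerator on HPC systems.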
Utilizing a collection of workstations and supercomputers in a metacomputing environment does not only offer an enormous amount of computing power, but also raises new problems. The true potential of WAN-based distributed computing can only be exploited if the application-to-architecture mapping reflects the different processor speeds, network performances and the application's communication characteristics. In this paper, we present the Metacomputer Adaptive Runtime System (MARS), a framework for minimizing the execution time of distributed applications on a WAN metacomputer. Workload balancing and task migration are based on dynamic information on the processor load and network performance. Moreover, MARS uses accumulated statistical data on previous execution runs of the same application to derive an improved task-to-process mapping. Migration decisions are based on: (1) the current system load; (2) the network load; and (3) previously obtained application-specific characteristics. Our current implementation supports C applications with MPI message-passing calls, but the general framework is also applicable to other programming environments like PVM, PARMACS and Express.
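To make the shape of such a migration decision concrete, here is a hypothetical sketch in the spirit of criteria (1)-(3) above; the cost model and all names are illustrative assumptions, not the actual MARS algorithm:

```python
# Hypothetical load/network-aware migration heuristic: migrate only if the
# candidate node finishes the remaining work sooner even after paying the
# cost of shipping the task state over the (possibly WAN) link.
def should_migrate(remaining_work, cur_speed, cur_load,
                   cand_speed, cand_load,
                   task_state_mb, net_bandwidth_mbps):
    # Effective speed degrades with the load already on each node.
    t_stay = remaining_work / (cur_speed / (1.0 + cur_load))
    t_cand = remaining_work / (cand_speed / (1.0 + cand_load))
    # Migration cost: transferring the task state across the network.
    t_migrate = task_state_mb * 8.0 / net_bandwidth_mbps
    return t_cand + t_migrate < t_stay

# Example: a lightly loaded remote node wins despite the transfer cost.
print(should_migrate(remaining_work=1e6, cur_speed=100.0, cur_load=3.0,
                     cand_speed=120.0, cand_load=0.2,
                     task_state_mb=64, net_bandwidth_mbps=10.0))
```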
Computing resources which are transparently available to the user via networked environments are commonly called a metacomputer. In this sense, a metacomputer is a network of heterogeneous computational resources linked by software in such a way that they can be used as easily as a single computational unit. During the last few years, our work has concentrated on developing methods and tools that provide a transparent and vendor-independent hardware management system to the users. Solving this problem up to a high abstraction level will bring the idea of metacomputing a large step closer to its fruition. After reviewing the metacomputing approaches in Europe and the United States, we break the overall task down into almost independent units. One of these, the resource access and allocation problem, is the focus of the Computing Center Software (CCS) project. This paper takes a closer look at CCS. Its underlying model, which uses abstract views for specifying system components, and the general-purpose Resource Description Language will be sketched. We explain how it is possible to support Wide-Area Network access and unstable connection lines. Afterwards, we present the system- and vendor-independent batch processing facility, usable for arbitrary programming environments. Ongoing activities and an enhancement of the CCS methodology to solve a core problem in wide-area metacomputing conclude this paper.
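To illustrate the idea of abstract views of system components (this is a hypothetical Python model of the concept only, not the actual CCS Resource Description Language, whose syntax is not shown here), a resource description can be thought of as a small hierarchy of typed components with attributes:

```python
# Hypothetical model of an "abstract view": typed components with
# attributes, composable into a hierarchy; names and attributes are
# invented for illustration and are not CCS/RDL syntax.
from dataclasses import dataclass, field

@dataclass
class Component:
    name: str
    kind: str                        # e.g. "machine", "partition", "network"
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

machine = Component("site0", "machine", {"vendor_independent": True}, [
    Component("partition0", "partition", {"nodes": 64}),
    Component("link0", "network", {"topology": "2D-grid"}),
])

def find(comp, kind):
    """Yield all sub-components of a given kind, depth-first."""
    if comp.kind == kind:
        yield comp
    for child in comp.children:
        yield from find(child, kind)

print([c.name for c in find(machine, "partition")])
```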
ISBN (print): 1601320647
GO is a very popular board game, especially in the Asian world. In contrast to chess programs, which are able to compete with human top players, GO programs are still rather weak. Game theory classifies GO and chess as deterministic two-person zero-sum games with perfect information, which allows them to be addressed with game-tree search techniques such as the α/β algorithm. In principle, these games can be solved exactly. In practice, the high number of possible moves and the depth of the search tree prohibit exact solutions and require us to resort to a partial analysis of the search tree, leading to runtime-consuming heuristic position evaluations. This paper presents the GOmputer project, which aims at accelerating GO through aggressively parallelized game-tree search combined with FPGA-based position evaluation. We first briefly discuss the algorithmic approach for playing GO, and then focus on FPGA accelerators for several position evaluation functions. The game board is mapped as a cellular automaton directly into hardware; position evaluation functions are turned into cellular algorithms. We show the hardware implementation of several functions and report on the achieved speedups. Finally, we discuss the current state of the GOmputer project.
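For reference, the α/β algorithm mentioned above prunes subtrees that cannot influence the minimax value, which is what makes partial analysis of large game trees feasible. The following game-agnostic sketch assumes hypothetical `moves`, `apply`, and `evaluate` helpers standing in for the GO-specific (or FPGA-accelerated) move generation and position evaluation:

```python
# Standard alpha-beta search; `moves`, `apply` and `evaluate` are
# hypothetical helpers, e.g. `evaluate` could call out to an
# FPGA-based position evaluation function.
def alphabeta(pos, depth, alpha, beta, maximizing, moves, apply, evaluate):
    if depth == 0 or not moves(pos):
        return evaluate(pos)          # heuristic leaf evaluation
    if maximizing:
        value = float("-inf")
        for m in moves(pos):
            value = max(value, alphabeta(apply(pos, m), depth - 1,
                                         alpha, beta, False,
                                         moves, apply, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:         # beta cutoff: opponent avoids this line
                break
    else:
        value = float("inf")
        for m in moves(pos):
            value = min(value, alphabeta(apply(pos, m), depth - 1,
                                         alpha, beta, True,
                                         moves, apply, evaluate))
            beta = min(beta, value)
            if alpha >= beta:         # alpha cutoff
                break
    return value
```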
SLAs were developed in order to guarantee the customer's desired Quality of Service. To preserve SLAs even in the case of system failures, migrating the job to an alternative resource is a well-known fault-toler...
Risk management (RM) processes are used in various application fields, since possible threats often have to be identified, evaluated, and avoided. In the Grid, resource failures are common and likely threats which slow d...
In large-scale distributed systems, information is typically generated in a decentralized fashion. However, for many applications it is desirable to have a unified view on this knowledge, allowing it to be queried without regarding the...
We present a scheme that derives task migration decisions for a WAN-metacomputer environment based on previously acquired information on a program's runtime behavior and the current network and computing load. Our...