ISBN: 9783319729701 (print)
The proceedings contain 13 papers. The special focus in this conference is on performance modeling, benchmarking and simulation of high performance computing systems. The topics include: Performance and energy usage of workloads on KNL and Haswell architectures; A survey of application memory usage on a national supercomputer: An analysis of memory requirements on ARCHER; Comparison of parallelisation approaches, languages, and compilers for unstructured mesh algorithms on GPUs; Periodic I/O scheduling for super-computers; A performance study of Quantum ESPRESSO’s PWscf code on multi-core and GPU systems; Modeling large compute nodes with heterogeneous memories with cache-aware roofline model; A scalable analytical memory model for CPU performance prediction; Modeling UGAL on the Dragonfly topology; Resilient N-body tree computations with algorithm-based focused recovery: Model and performance analysis; Multi-fidelity surrogate modeling for application/architecture co-design.
ISBN: 9783319749471; 9783319749464 (print)
Now a ubiquitous part of a huge number of applications, image processing algorithms and their underlying architectures have to meet many different requirements. Some have real-time performance constraints combined with demands for efficient implementation on limited or varied hardware resources. This poses particular challenges for the design, implementation, and evaluation of efficient image processing systems. In this paper, we present a model-based approach to address these issues using our framework SimTAny. Founded on the standard modeling language UML, we propose the UML Image Processing Language (UIPL) to facilitate expressing image processing algorithms directly in UML, which is especially beneficial for rapid modeling. With the help of SimTAny, such design models can be simulated in order to investigate the performance of a modeled system, to determine optimal design solutions, and to validate the required properties. We extend SimTAny to enable the generation of efficient implementation code for image processing algorithms on different target architectures. The generated code is then integrated directly into the simulation environment to increase the accuracy of our performance evaluations.
ISBN: 9781728144849 (digital); 9781728144856 (print)
Practical deep learning applications require more and more computing power. New computing architectures designed specifically for artificial intelligence applications are emerging, including the IBM Power System AC922. In this paper we benchmark an AC922 (8335-GTG) server equipped with 4 NVIDIA Volta V100 GPUs on selected deep neural network training applications, including four convolutional models and one recurrent model. We report performance results depending on batch size and GPU selection and compare them with the results from another contemporary workstation based on the same set of GPUs, the NVIDIA® DGX Station™. The results show that the AC922 performs better in all tested configurations, achieving improvements of up to 10.3%. Profiling indicates that the improvement is due to the efficient I/O pipeline. The performance differences depend on the specific model, rather than on the model class (RNN/CNN). Both systems offer good scalability up to 4 GPUs. In certain cases there is a significant difference in performance depending on exactly which GPUs are used for computations.
ISBN: 9781728144849 (digital); 9781728144856 (print)
CPU-GPU platforms have the potential to enhance application performance by exploiting the distinct capabilities of both CPU and GPU devices. As a result, design space exploration of CPU/GPU systems for various applications is now considerably more challenging on these heterogeneous platforms. In this paper, we present a heuristic algorithm for partitioning the computation of applications between a CPU and a GPU while satisfying user-defined constraints. Our methodology leverages the SIMD-related computing and hierarchical memory model of GPUs to optimize application mapping and allocation to CPU-GPU systems. The algorithm partitions the application, which is specified as a Directed Acyclic Graph (DAG), for a CPU-GPU platform to meet the objectives specified by the user. The effectiveness of our methodology is demonstrated by efficiently partitioning and executing an MJPEG decoder and benchmark applications on a CPU-GPU system.
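The abstract does not give the details of the heuristic, so the following is only a minimal sketch of what a greedy DAG partitioner might look like, not the authors' algorithm. It assumes each task has a known execution time on each device and that every edge crossing between devices costs a fixed transfer penalty; the `Task` type, the `partition` function, the `transfer_ms` parameter, and the stage names and timings are all hypothetical, and user-defined constraints are not modeled.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass(frozen=True)
class Task:
    name: str
    cpu_ms: float  # assumed execution time on the CPU (hypothetical figure)
    gpu_ms: float  # assumed execution time on the GPU (hypothetical figure)


def partition(tasks, edges, transfer_ms):
    """Greedy sketch: visit tasks in topological order and place each one on
    the device with the lower cost, where cost is the execution time plus a
    fixed penalty for every predecessor placed on the other device."""
    by_name = {t.name: t for t in tasks}
    preds = {t.name: set() for t in tasks}
    for src, dst in edges:
        preds[dst].add(src)

    placement = {}
    for name in TopologicalSorter(preds).static_order():
        task = by_name[name]
        cost = {}
        for device, run_ms in (("cpu", task.cpu_ms), ("gpu", task.gpu_ms)):
            crossings = sum(1 for p in preds[name] if placement[p] != device)
            cost[device] = run_ms + transfer_ms * crossings
        # On a tie, min() keeps the first key, so the CPU is preferred.
        placement[name] = min(cost, key=cost.get)
    return placement


# Hypothetical four-stage pipeline, loosely in the spirit of a decoder.
tasks = [
    Task("parse", cpu_ms=2, gpu_ms=10),
    Task("idct", cpu_ms=20, gpu_ms=3),
    Task("color", cpu_ms=8, gpu_ms=2),
    Task("write", cpu_ms=1, gpu_ms=6),
]
edges = [("parse", "idct"), ("idct", "color"), ("color", "write")]
placement = partition(tasks, edges, transfer_ms=4)
```

With these made-up numbers the data-parallel middle stages land on the GPU and the serial end stages stay on the CPU. A greedy pass like this is cheap but myopic; it never revisits an earlier placement, which is one reason a real heuristic would likely be more elaborate.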
Computer systems have been a source of high power dissipation, whether from microprocessors, memory and support chips, or mass storage. A high thermal density data center using these computer systems has resulted in a v...
Problems of market equilibrium in complex market environments have traditionally been analyzed using mathematical models and game theory. These methods are based on the economic structural equations themselves and ignore the interactions between economic subjects, and their assumption of homogeneous subjects has no counterpart in the real world. In contrast, this paper proposes a multi-agent simulation model built from a microscopic point of view. In this simulation, agents interact with each other, and decisions are made by an AI system embedded in each agent, the Q-network. Therefore, there is no need to elaborate a behavioral rule for each agent or to set up many assumptions manually. This paper assumes a hypothetical simulated market with two types of economic entities, namely banks and enterprises. Lending between banks and enterprises creates a symbiotic relationship between them, while business-to-business transactions make the enterprises compete symbiotically with each other. In the experiment, the observed behavior of each agent can be reasonably explained. Agents endogenously generate intelligent behavioral patterns compatible with the environment. Therefore, this AI-based method can replace manually specified decision-making strategies in market simulations, thus facilitating related economic research.
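The abstract names a Q-network but gives no architecture or training details. As a rough illustration of the underlying idea only, the sketch below uses plain tabular Q-learning, a simpler stand-in for a neural Q-network; the function name `q_update`, the state and action labels, and the values of `alpha` and `gamma` are all assumptions made for the example.

```python
def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    Unseen (state, action) pairs are treated as having value 0."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


# Hypothetical enterprise agent deciding whether to borrow from a bank.
actions = ["borrow", "hold"]
q_table = {}
first = q_update(q_table, "low_cash", "borrow", 1.0, "solvent", actions)
second = q_update(q_table, "low_cash", "borrow", 1.0, "solvent", actions)
```

Because each agent learns its own value estimates from rewards, no behavioral rule has to be written by hand, which is the property the abstract emphasizes.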
MPI reductions are widely used in many scientific applications and often become the scaling performance bottleneck. When performing reductions on vectors, different algorithms have been developed to balance messaging ...
ISBN: 9781538634370 (print)
This paper gives (i) an overview of the reconfiguration capabilities of modern hardware devices and (ii) a summary and comparison of hardware architectures that use random reconfiguration as a countermeasure against side-channel attacks. We categorize the architectures according to their suitability for specific hardware platforms. Further, we compare the reconfiguration methods and the attacks that the countermeasures protect against. Although the presented randomization countermeasures can usually be applied to a broad range of algorithms, evaluation results are presented for specific cryptographic algorithms. In most cases, randomization countermeasures can be combined with other countermeasures that are tailored to specific algorithms.
ISBN: 9781728158235 (digital); 9781728158242 (print)
SIGNAL is a high-level synchronous data-flow language for the design and implementation of safety-critical embedded systems. It provides a unified framework for specification, modeling, formal analysis, and automatic code generation for different general-purpose languages such as Java, C, and C++. However, a fully implemented and verified open source tool for code generation from SIGNAL to a Hardware Description Language (HDL) is not available. This paper describes the formal verification of Verilog code generated from the SIGNAL language. Proving the correctness of generated code is very important when it targets safety-critical embedded systems. We use the translation validation technique to verify the correctness of the generated code. In this approach, the Polychrony Toolset builds the models of source SIGNAL programs with its associated model checker SIGALI. The open source tool Yosys generates models for target Verilog programs in the SMT-LIB standard format. We transform the model generated by Yosys into the model accepted by the SIGALI model checker. Finally, we use the SIGALI model checker to validate the translation by symbolic simulation between the source and target program models. The target program may have fewer behaviors than the source program; therefore, if the model of the target program implies the model of the source program, the target program preserves the semantics of the source program, and the translation is correct.
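The implication criterion at the end of the abstract can be illustrated on a toy scale. The sketch below is not SIGALI's symbolic method; it is a brute-force stand-in that assumes each model is a Boolean predicate over a small set of named variables and checks that every assignment accepted by the target is also accepted by the source. The function `implies` and the example models are hypothetical.

```python
from itertools import product


def implies(target, source, variables):
    """Return True if every assignment that satisfies `target` also
    satisfies `source`, i.e. the target allows no behavior that the
    source forbids. Enumerates all 2**len(variables) assignments."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if target(env) and not source(env):
            return False
    return True


variables = ["a", "b", "out"]

# Hypothetical source model: `out` may only be raised when `a` is raised.
def source_model(env):
    return (not env["out"]) or env["a"]

# Hypothetical target model: `out` is exactly `a and b`, a stricter behavior.
def target_model(env):
    return env["out"] == (env["a"] and env["b"])


translation_ok = implies(target_model, source_model, variables)
reverse_holds = implies(source_model, target_model, variables)
```

Here the target implies the source but not the other way round, which matches the abstract's remark that the target may have fewer behaviors than the source. Exhaustive enumeration only works for a handful of variables, which is why real tools rely on symbolic techniques.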
ISBN: 9781728144849 (digital); 9781728144856 (print)
POSTER PAPER Data-centers are used by the most important cloud providers worldwide to provide storage and computing resources and, based on these resources, advanced IT services and applications. With the expected explosion of data in the next few years, Data-centers will require new architectures to cope with the new requirements of applications and users. One of the crucial subsystems within the Data-center architecture that must evolve according to the new requirements is the interconnection network, or Data-center network (DCN). The DCN performance (basically, high communication bandwidth and low latency) must be guaranteed; otherwise the DCN becomes the system bottleneck. There are several key issues that DCN designers must make decisions on, such as the network topology, routing algorithm, and congestion management. An important aspect that impacts the DCN design is the set of network communication patterns generated by applications and services. In that sense, accurate modeling of these traffic workloads would help network designers to make better decisions. In this paper, we present an analysis of the few available studies on traffic modeling for DCNs, in order to gather a set of parameters that define the behaviour of common traffic workloads. Based on these parameters, we have implemented a synthetic DCN traffic generator, which has been included in our simulation framework in order to feed the network with the inferred traffic workloads. We have conducted extensive simulations to test the impact of parameter variation on network performance. From the obtained results, we can conclude that the destination distribution is crucial for network performance. Higher oversubscription of destinations generates incast scenarios that lead to congestion and head-of-line blocking, affecting other flows that do not contribute to the incast and so degrading network performance.
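The abstract does not list the generator's parameters, so the sketch below only illustrates the one effect it highlights: how the destination distribution concentrates load. It assumes flows are simple (source, destination) pairs and that a hot-spot pattern sends a fixed fraction of flows to a single node; `generate_flows`, `max_destination_load`, `hot_node`, and `hot_fraction` are hypothetical names, not part of the authors' framework.

```python
import random
from collections import Counter


def generate_flows(num_nodes, num_flows, hot_node=None, hot_fraction=0.0, seed=0):
    """Generate (src, dst) pairs. Each flow goes to `hot_node` with
    probability `hot_fraction` (unless the source is the hot node itself);
    otherwise the destination is uniform over all nodes except the source."""
    rng = random.Random(seed)
    flows = []
    for _ in range(num_flows):
        src = rng.randrange(num_nodes)
        if hot_node is not None and src != hot_node and rng.random() < hot_fraction:
            dst = hot_node
        else:
            dst = rng.choice([n for n in range(num_nodes) if n != src])
        flows.append((src, dst))
    return flows


def max_destination_load(flows):
    """Number of flows converging on the busiest destination."""
    return max(Counter(dst for _, dst in flows).values())


uniform = generate_flows(num_nodes=16, num_flows=1000)
incast = generate_flows(num_nodes=16, num_flows=1000, hot_node=0, hot_fraction=1.0)
uniform_peak = max_destination_load(uniform)
incast_peak = max_destination_load(incast)
```

Comparing `uniform_peak` with `incast_peak` gives a crude measure of destination oversubscription. It says nothing about congestion or head-of-line blocking, which would need a network simulator of the kind the abstract describes.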