ISBN:
(Print) 9781479959754
Many medical control devices used for critically ill patients contain model predictive controllers (MPCs). An MPC estimates the drug level in parts of the patient's body based on a model of human physiology, either to alert the medical authority or to change the drug infusion rate. This model prediction has to be completed before the drug infusion rate is changed, i.e., every few seconds. Instead of mathematical models such as pharmacokinetic models, more accurate models such as spatio-temporal drug diffusion can be used to improve the prediction and to prevent drug overshoot and undershoot. However, these models require the high computation capability of platforms such as recent many-core GPUs, the Intel Xeon Phi (MIC), or the Intel Core i7. This work explores thread-level and data-level parallelism and the computation versus communication times of such model predictive applications used for multi-patient monitoring in hospital data centers, exploiting many-core platforms to maximize throughput (i.e., the number of patients monitored simultaneously). We also study the energy and performance of these applications to evaluate their architectural suitability. We show that, given a set of MPC applications, mapping them onto heterogeneous platforms can yield performance improvements and energy savings.
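To make the prediction step concrete, below is a minimal Python sketch of a spatio-temporal diffusion predictor driving an infusion-rate decision, in the spirit of the MPC loop described above. The finite-difference scheme, parameter values, and function names are illustrative assumptions, not the paper's model.

```python
import numpy as np

def diffusion_step(c, D, dx, dt, infusion):
    """One explicit Euler step of dc/dt = D * d2c/dx2 + infusion."""
    lap = (np.roll(c, 1) - 2 * c + np.roll(c, -1)) / dx**2
    lap[0] = lap[-1] = 0.0                # crude zero-flux boundaries
    return c + dt * (D * lap + infusion)

def predict_and_control(c, D, dx, dt, rate, horizon, c_max):
    """Simulate `horizon` steps ahead; halve the rate if an overshoot is predicted."""
    infusion = np.zeros_like(c)
    infusion[0] = rate                    # drug enters the tissue at one site
    pred = c.copy()
    for _ in range(horizon):
        pred = diffusion_step(pred, D, dx, dt, infusion)
    return rate * 0.5 if pred.max() > c_max else rate

c0 = np.zeros(64)                         # concentration along a 1-D tissue slice
print(predict_and_control(c0, D=1e-3, dx=0.1, dt=0.01,
                          rate=5.0, horizon=100, c_max=2.0))
```

The deadline in the abstract corresponds to the `horizon` loop: all of it must finish within the few-second control interval, which is what motivates the many-core platforms.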
ISBN:
(Digital) 9781665453363
ISBN:
(Print) 9781665453363
Many-core CPUs and GPUs, the present mainstream architectures for HPC, are facing difficulty in maintaining the same rate of performance improvement because of the recent slow-down in semiconductor scaling, the dark silicon problem, and the wasteful mechanisms required to accelerate general-purpose computing, such as branch predictors and out-of-order execution. Moreover, the power efficiency of HPC systems is critically important for achieving higher performance.
ISBN:
(Print) 0818621508
Four hypercube architectures that are designed to use hardware resources more efficiently and that produce computers with high throughput and high reliability are evaluated. Spare nodes in three of the architectures are configured so that the entire computer has the topology of an incomplete hypercube. Here, the nodes of an incomplete hypercube are capable of providing different levels of fault detection, hardware reconfiguration, and routing. In the other architecture, the hypercube topology uses conventional switches capable only of establishing connections. End-of-mission dependability models and performance simulation models were developed. Results of performance degradation studies of the four architectures under reconfiguration in terms of throughput, response time, and communication utilization are presented for three workloads. The evaluations addressed performance-related dependability based on hardware failures and reconfiguration using hardware.
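As background for the routing the abstract mentions, the sketch below shows classic dimension-ordered (e-cube) routing, the baseline that incomplete-hypercube schemes extend with fault detection and reconfiguration. The node labels are illustrative; the papers' fault-skipping extensions are not reproduced here.

```python
def ecube_route(src: int, dst: int, n_dims: int):
    """Dimension-ordered routing: fix one differing address bit per hop."""
    path, cur = [src], src
    for d in range(n_dims):            # correct dimensions lowest-first
        if (cur ^ dst) & (1 << d):
            cur ^= 1 << d              # flip bit d: one hop along dimension d
            path.append(cur)
    return path

# In a 4-cube (16 nodes), route from node 0b0101 to node 0b1010:
print(ecube_route(0b0101, 0b1010, 4))  # [5, 4, 6, 2, 10]
```

An incomplete hypercube must additionally route around missing or spare nodes, which is where the four architectures above differ.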
ISBN:
(Print) 9781728195865
Because edge clusters are composed of resource-constrained devices, live migration is a necessary feature for migrating the state of an entire machine in case of machine failures and network partitions without disrupting the continued availability of services. While most prior work in this area has provided solutions for live migration on clusters of resource-rich servers or fog servers with high computing power, there is a general lack of research on live migration for the low-end ARM devices that make up edge clusters. To that end, we propose a lightweight algorithm for performing live migration on resource-constrained edge clusters. We provide an open-source implementation of this algorithm for migrating containers in Linux. We demonstrate that our algorithm outperforms state-of-the-art live migration algorithms on resource-constrained edge clusters with network partitions and device failures.
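For readers unfamiliar with the mechanism, the toy sketch below shows the classic pre-copy loop that container live-migration schemes typically adapt; the `Machine` class, page counts, and thresholds are hypothetical stand-ins and do not reflect the paper's open-source implementation.

```python
import random

class Machine:
    """Toy source machine whose workload keeps dirtying memory pages."""
    def __init__(self, n_pages=256):
        self.pages = {i: 0 for i in range(n_pages)}
        self.dirty = set(self.pages)        # initially everything is dirty

    def run_a_bit(self):
        for i in random.sample(sorted(self.pages), 8):
            self.pages[i] += 1              # workload writes a few pages
            self.dirty.add(i)

def precopy_migrate(src: Machine, dst: dict, max_rounds=10, stop_threshold=16):
    for _ in range(max_rounds):
        batch, src.dirty = src.dirty, set()
        dst.update({i: src.pages[i] for i in batch})  # copy this round's pages
        src.run_a_bit()                               # machine keeps running
        if len(src.dirty) <= stop_threshold:
            break
    # stop-and-copy: pause the source and ship the small residual dirty set
    dst.update({i: src.pages[i] for i in src.dirty})

src, dst = Machine(), {}
precopy_migrate(src, dst)
assert dst == src.pages                     # target now holds the full state
```

On resource-constrained ARM devices, the interesting trade-offs are precisely the round count and the stop threshold, since both memory bandwidth and network capacity are scarce.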
Discusses a method of handling the configuration of kernels and ramdisk images for compute nodes in an OSCAR-integrated way. Providing a kernel and ramdisk that differ from the server's allows handling specific hardw...
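The abstract above is truncated, but the general mechanism it describes (serving a per-node-group kernel and ramdisk instead of the head node's own) is typically realized through network-boot entries. Below is a generic, hedged sketch of generating PXELINUX entries; the paths and group names are hypothetical and do not reflect OSCAR's actual layout.

```python
# Hypothetical kernel/initrd choices per node group; not OSCAR's file layout.
KERNELS = {
    "default":  ("vmlinuz-5.10",     "initrd-5.10.img"),
    "gpu-node": ("vmlinuz-5.10-gpu", "initrd-5.10-gpu.img"),
}

def pxelinux_entry(group: str) -> str:
    """Render a PXELINUX boot entry selecting the group's kernel and ramdisk."""
    kernel, initrd = KERNELS.get(group, KERNELS["default"])
    return (f"LABEL {group}\n"
            f"  KERNEL {kernel}\n"
            f"  APPEND initrd={initrd} root=/dev/nfs rw\n")

print(pxelinux_entry("gpu-node"))
```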
ISBN:
(Print) 9780769549699; 9781467360050
A design approach is proposed to automatically identify and exploit run-time reconfiguration opportunities while optimising resource utilisation. We introduce the Reconfiguration Data Flow Graph, a hierarchical graph structure that enables reconfigurable designs to be synthesised in three steps: function analysis, configuration organisation, and run-time solution generation. Three applications, based on barrier option pricing, particle filtering, and reverse time migration, are used to evaluate the proposed approach. The run-time solutions approach the theoretical performance by eliminating idle functions, and are 1.31 to 2.19 times faster than optimised static designs. FPGA designs developed with the proposed approach are up to 28.8 times faster than optimised CPU reference designs and 1.55 times faster than optimised GPU designs.
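To illustrate the flavour of the three synthesis steps, the toy sketch below groups live functions into configurations and merges adjacent phases when their combined area still fits, so idle functions never occupy the fabric. The classes, the area model, and the greedy merge are illustrative assumptions, not the paper's algorithm.

```python
from dataclasses import dataclass, field

@dataclass
class Function:
    name: str
    area: float                  # fraction of the FPGA the function occupies
    phases: set = field(default_factory=set)   # phases in which it is live

def build_configurations(functions, n_phases, capacity=1.0):
    """One configuration per phase, keeping only live functions, then
    merge neighbouring phases whose union still fits on the device."""
    configs = [[f for f in functions if p in f.phases] for p in range(n_phases)]
    merged = [configs[0]]
    for cfg in configs[1:]:
        union = list({f.name: f for f in merged[-1] + cfg}.values())
        if sum(f.area for f in union) <= capacity:
            merged[-1] = union   # merging avoids one run-time reconfiguration
        else:
            merged.append(cfg)
    return merged

funcs = [Function("forward_pass", 0.6, {0}), Function("filter", 0.5, {1}),
         Function("reduce", 0.3, {0, 1})]
for i, cfg in enumerate(build_configurations(funcs, 2)):
    print(f"configuration {i}: {[f.name for f in cfg]}")
```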
ISBN:
(Print) 9781728141947
In this paper, we seek to guide optimization and tuning strategies by identifying an application's I/O access pattern. We evaluate three machine learning techniques to automatically detect the I/O access pattern of HPC applications at runtime: decision trees, random forests, and neural networks. We focus on detection using metrics from file-level accesses as seen by the clients, the I/O nodes, and the parallel file system servers. We evaluated these detection strategies in a case study in which accurate detection of the current access pattern is fundamental to adjusting a parameter of an I/O scheduling algorithm. We demonstrate that such approaches correctly classify the access pattern, regarding file layout and spatiality of accesses, into the most common patterns used by the community and by I/O benchmarking tools to test new I/O optimizations, with up to 99% precision. Furthermore, when applied to our case study, the classification guides a tuning mechanism to achieve 99% of the performance of an Oracle solution.
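A minimal sketch of the detection idea, assuming scikit-learn: train a random forest on per-window access metrics and classify the pattern at runtime. The feature set, labels, and synthetic data below are assumptions for illustration; the paper uses real metrics observed at the clients, I/O nodes, and file system servers.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy features per observation window:
# [avg request size (KiB), fraction of sequential offsets, files touched]
def synth(label, size, seq, nfiles, n=200):
    X = rng.normal([size, seq, nfiles], [5.0, 0.05, 1.0], size=(n, 3))
    return X, [label] * n

Xa, ya = synth("contiguous",      1024, 0.95, 1)
Xb, yb = synth("strided",           64, 0.10, 1)
Xc, yc = synth("file-per-process", 256, 0.90, 32)

X, y = np.vstack([Xa, Xb, Xc]), ya + yb + yc
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# At runtime, the I/O scheduler would query the model once per window:
print(clf.predict([[900.0, 0.9, 1.0]]))   # -> ['contiguous']
```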
ISBN:
(Print) 9781728141947
Extreme-scale computing systems are required to solve some of the grand challenges in science and technology. From astrophysics to molecular biology, supercomputers are an essential tool for accelerating scientific discovery. However, large computing systems are prone to failures due to their complexity. It is crucial to develop an understanding of how these systems fail in order to design reliable supercomputing platforms for the future. This paper examines a five-year failure and workload record of a leadership-class supercomputer. To the best of our knowledge, five years represents the vast majority of the lifespan of a supercomputer, and this is the first time such an analysis has been performed on a top-10 modern supercomputer. We performed a failure categorization and found that: i) most errors are GPU-related, with roughly 37% of them being double-bit errors on the cards; ii) failures are not evenly spread across the physical machine, with room temperature presumably playing a major role; and iii) software errors of the system bring down several nodes concurrently. Our failure rate analysis unveils that: i) the system consistently degrades, being at least twice as reliable at the beginning of the period as at the end; ii) a Weibull distribution closely fits the mean-time-between-failure data; and iii) hardware and software errors show markedly different patterns. Finally, we correlated failure and workload records to reveal that: i) failure and workload records are weakly correlated, except for certain types of failures when segmented by hour of the day; ii) several categories of failures make jobs crash within the first minutes of execution; and iii) a significant fraction of failed jobs exhaust the requested time regardless of when the failure occurred during execution.
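As a small worked example of the Weibull claim, assuming SciPy: fit a two-parameter Weibull to time-between-failure samples and compare its mean with the empirical MTBF. The synthetic failure log below is an assumption; the paper fits the machine's real five-year record.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
tbf_hours = rng.weibull(1.3, size=2000) * 30.0   # synthetic inter-failure times

# Fit shape (c) and scale, pinning the location at zero as usual for TBF data.
c, loc, scale = stats.weibull_min.fit(tbf_hours, floc=0)

print(f"shape={c:.2f}  scale={scale:.1f} h")
print(f"empirical MTBF: {tbf_hours.mean():.1f} h, "
      f"Weibull mean: {stats.weibull_min.mean(c, loc, scale):.1f} h")
# A fitted shape > 1 would indicate a hazard that grows with time since the
# last failure; comparing fits across years can expose the degradation trend.
```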
ISBN:
(Print) 9798400705977
The proceedings contain 45 papers. The topics discussed include: Denseflex: a low rank factorization methodology for adaptable dense layers in DNNs; a lightweight architecture for real-time neuronal-spike classification; PEARL: enabling portable, productive, and high-performance deep reinforcement learning using heterogeneous platforms; mini-batching with fused training and testing for data streams processing on the edge; energy-aware IoT deployment planning; register blocking: an analytical modelling approach for affine loop kernels; hardware support for balanced co-execution in heterogeneous processors; HLS taking flight: toward using high-level synthesis techniques in a space-borne instrument; an ANN-guided multi-objective framework for power-performance balancing in HPC systems; and clustering and allocation of spiking neural networks on crossbar-based neuromorphic architecture.
ISBN:
(Print) 9781538648193
The exponential increase in, and global access to, read/write memory states in quantum computing simulation limit both the number of qubits and the quantum transformations that can currently be simulated. Although quantum computing simulation is parallel by nature, spatial and temporal complexity are major performance hazards, making this an important application for HPC. A new methodology employing reduction and decomposition optimizations has shown great results, but its GPU implementation could be further improved. In this work, we present a new implementation for in-situ GPU simulation that better exploits the GPU's resources without requiring further HPC hardware. Shor's and Grover's algorithms are simulated and compared to the previous version and to the LIQUi|> simulator, showing better results with relative speedups of up to 15.5x and 765.76x, respectively.
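To see why memory is the limiting resource, consider a plain state-vector simulator: n qubits require 2**n amplitudes, and every gate sweeps all of them. The NumPy kernel below is an illustrative assumption, not the GPU implementation evaluated in the paper.

```python
import numpy as np

def apply_1q_gate(state, gate, target, n_qubits):
    """Apply a 2x2 gate to qubit `target`, touching all 2**n amplitudes."""
    psi = state.reshape([2] * n_qubits)
    psi = np.moveaxis(psi, n_qubits - 1 - target, 0)  # target axis to the front
    psi[...] = np.tensordot(gate, psi, axes=([1], [0]))
    return state

n = 20                               # 2**20 complex128 amplitudes = 16 MiB
state = np.zeros(2**n, dtype=np.complex128)
state[0] = 1.0
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
for q in range(n):                   # uniform superposition, as Grover starts
    state = apply_1q_gate(state, H, q, n)
print(abs(state[0])**2)              # 1 / 2**n, here ~9.5e-07
```

Each added qubit doubles both the memory footprint and the work per gate, which is exactly the spatial and temporal complexity hazard noted above.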