ISBN:
(Print) 9781538655641
In this paper, we evaluate a partitioning and placement technique for mapping concurrent applications onto a globally asynchronous locally synchronous (GALS) multi-core architecture designed for simulating a spiking neural network (SNN) in real time. We designed a task placement pipeline capable of analysing the network of neurons and producing a placement configuration that reduces communication between computational nodes. The neuron-to-core mapping problem is formalised as a two-phase problem: Partitioning and Placement. The Partitioning phase groups together the most strongly connected network components, maximising the number of self-connections within each identified group. For this purpose we used a multilevel k-way graph partitioning strategy to generate the network partitions. The Placement phase places the groups of neurons over the chip mesh, minimising the communication between computational nodes. For this step, we designed and evaluated the performance of three placement variants. Our results highlight the importance of using a partitioning algorithm for the SNN graph. Using the simulated annealing placement technique, we achieved a 19% increase in self-connections and a 29% improvement in the final overall post-placement synaptic elongation, compared to 22% obtained without partitioning.
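The simulated annealing placement variant can be sketched as follows. This is a minimal illustration, not the paper's implementation: the traffic-weighted Manhattan cost function, the mesh encoding, and the cooling schedule are all illustrative assumptions.

```python
import math
import random

def comm_cost(placement, traffic, mesh_w):
    """Total traffic-weighted Manhattan distance between cores on the mesh."""
    cost = 0
    for (a, b), w in traffic.items():
        ax, ay = placement[a] % mesh_w, placement[a] // mesh_w
        bx, by = placement[b] % mesh_w, placement[b] // mesh_w
        cost += w * (abs(ax - bx) + abs(ay - by))
    return cost

def anneal_placement(groups, traffic, mesh_w, mesh_h, steps=20000, seed=0):
    """Simulated annealing: swap the cores of two groups and keep the swap
    if it lowers cost, or with a temperature-dependent probability otherwise."""
    rng = random.Random(seed)
    cores = list(range(mesh_w * mesh_h))
    rng.shuffle(cores)
    placement = dict(zip(groups, cores))
    cost = comm_cost(placement, traffic, mesh_w)
    temp = 1.0
    for _ in range(steps):
        a, b = rng.sample(groups, 2)
        placement[a], placement[b] = placement[b], placement[a]
        new_cost = comm_cost(placement, traffic, mesh_w)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / max(temp, 1e-9)):
            cost = new_cost
        else:
            # reject: undo the swap
            placement[a], placement[b] = placement[b], placement[a]
        temp *= 0.9995
    return placement, cost
```

On a 2x2 mesh with four groups connected in a chain, the annealer converges to the obvious snake-shaped placement where every connected pair sits on adjacent cores.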
ISBN:
(Print) 9781538649756
In this paper we investigate the relation between an energy-efficiency model and the type of workload executed on modern embedded architectures. From the energy-efficiency model obtained in our previous work, we select a few configuration points to verify that the prediction in terms of relative energy efficiency holds across different workload scenarios. A configuration point is defined as a set of platform-tunable metrics, such as DVFS point, DPM level, and utilization rate. As workloads, we use a combination of synthetic generators and real-world applications from the embedded domain. In our experiments we use two different architectures, providing examples of real systems, to test the model's generality. First, we compare the efficiency achieved by the two architecturally different chips (ARM and Intel) at different configuration points and under different workload scenarios. Second, we explain the differing results in terms of the thermal management performed by the two chips. Finally, we show that the results from the two architectures converge only for workloads dominated by integer instructions, which demonstrates the need for a specific model trained with integer operations.
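The notion of a configuration point described above lends itself to a small sketch. The field names and the performance-per-watt efficiency definition below are assumptions for illustration; the paper's actual model is not reproduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConfigPoint:
    """A platform configuration point as defined in the abstract:
    a DVFS operating point, a DPM (idle/sleep) level, and a target
    utilization rate. Field names are illustrative."""
    dvfs_mhz: int
    dpm_level: int
    utilization: float

def relative_efficiency(perf, power_w):
    """Energy efficiency as performance per watt (a common definition;
    the paper's exact efficiency model may differ)."""
    return perf / power_w
```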
ISBN:
(Digital) 9781728132839
ISBN:
(Print) 9781728120522
Model-based and simulation-supported engineering based on the formalism of synchronous block diagrams is among the best practices in software development for embedded and real-time systems. As the complexity of such models and the associated computational demands for their simulation steadily increase, efficient execution strategies are needed. Although there is an inherent concurrency in most models, tools are not always capable of taking advantage of multi-core architectures of simulation host computers to simulate blocks in parallel. In this paper, we outline the conceptual obstacles in general and discuss them specifically for the widely used simulation environment Simulink. We present an execution mechanism that harnesses multi-core hosts for accelerating individual simulation runs through parallelization. The approach is based on a model transformation. It does not require any changes in the simulation engine, but introduces minimal data propagation delays in the simulated signal chains. We demonstrate its applicability in an automotive case study.
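The core idea of the model transformation, decoupling consecutive blocks with unit delays so they could execute concurrently within a step, can be illustrated with a minimal sketch. This is plain Python rather than Simulink, and the block functions are placeholders: each block in the transformed model reads only the previous step's output of its predecessor, which is exactly the data propagation delay the abstract mentions.

```python
def simulate_serial(blocks, u, steps):
    """Reference execution: blocks run in order within one simulation step."""
    trace = []
    for k in range(steps):
        x = u(k)
        for f in blocks:
            x = f(x)
        trace.append(x)
    return trace

def simulate_decoupled(blocks, u, steps):
    """Transformed model: a unit delay is inserted between consecutive
    blocks, so each block only needs the *previous* step's value of its
    predecessor and all blocks could run concurrently within a step."""
    n = len(blocks)
    state = [0] * n                      # delayed outputs, one per block
    trace = []
    for k in range(steps):
        inputs = [u(k)] + state[:-1]     # each block reads last step's data
        state = [f(x) for f, x in zip(blocks, inputs)]
        trace.append(state[-1])
    return trace
```

For stateless blocks, the decoupled trace equals the serial trace shifted by one step per inserted delay, which is the "minimal data propagation delay" trade-off.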
ISBN:
(Print) 9781538658789
The Unified Modeling Language (UML) has been widely adopted for modeling different sorts of applications. Despite offering several kinds of diagrams, UML was not designed for verifying the execution of real-time embedded systems with time and energy constraints. There are UML profiles that capture this information, but it is then necessary to rely on a separate validation framework. The main approach to filling this gap is to translate the UML models into representations such as Petri nets. However, existing works have little support for addressing energy and time constraints at the same time. This paper presents a technique for transforming UML sequence diagrams with energy and time constraints into timed Petri net models. These Petri net models are then used as input to software verification tools such as Tina and GTT.
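A minimal timed Petri net interpreter gives a feel for the kind of target model such a translation produces. The single-server firing semantics and data layout here are simplifying assumptions, not the paper's translation rules: each transition has input places, output places, and a firing delay, and tokens are consumed at once and produced after the delay.

```python
import heapq

def simulate_tpn(marking, transitions, horizon):
    """Minimal timed Petri net: transitions is a dict
    name -> (input_places, output_places, delay)."""
    marking = dict(marking)
    clock = 0.0
    pending = []          # (completion time, transition name, outputs)
    log = []              # firing log: (time, transition name)
    while clock <= horizon:
        fired = False
        for name, (ins, outs, delay) in transitions.items():
            if all(marking.get(p, 0) >= 1 for p in ins):
                for p in ins:                         # consume input tokens
                    marking[p] -= 1
                heapq.heappush(pending, (clock + delay, name, outs))
                fired = True
                break
        if not fired:
            if not pending:
                break                                 # deadlock: nothing left
            clock, name, outs = heapq.heappop(pending)
            for p in outs:                            # produce output tokens
                marking[p] = marking.get(p, 0) + 1
            log.append((clock, name))
    return marking, log
```

Time constraints from the sequence diagram would map onto the transition delays; energy annotations would need an additional cost attached to each firing.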
ISBN:
(Print) 9781728101828
The proceedings contain 13 papers. The topics discussed include: a metric for evaluating supercomputer performance in the era of extreme heterogeneity; evaluating SLURM simulator with real-machine SLURM and vice versa; automated instruction stream throughput prediction for Intel and AMD microarchitectures; deep learning at scale on NVIDIA V100 accelerators; algorithm selection of MPI collectives using machine learning techniques; miniVite: a graph analytics benchmarking tool for massively parallel systems; and improving MPI reduction performance for manycore architectures with OpenMP and data compression.
ISBN:
(Print) 9781538653012
Discrete manufacturing systems are complex cyber-physical systems (CPS), and their availability, performance, and quality have a big impact on the economy. Smart manufacturing promises to improve these aspects. One key approach being pursued in this context is the creation of centralized software-defined control (SDC) architectures and strategies that use diverse sensors and data sources to make manufacturing more adaptive, resilient, and programmable. In this paper, we present SDCWorks, a modeling and simulation framework for SDC. It consists of the semantic structures for creating models, a baseline controller, and an open-source implementation of a discrete event simulator for SDCWorks models. We provide the semantics of such a manufacturing system in terms of a discrete transition system, which sets up the platform for future research on a new class of problems in formal verification, synthesis, and monitoring. We illustrate the expressive power of SDCWorks by modeling the realistic SMART manufacturing testbed of the University of Michigan. We show how our open-source SDCWorks simulator can be used to evaluate relevant metrics (throughput, latency, and load) for example manufacturing systems.
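A serial production line, the simplest kind of system such a framework models, can be simulated with a short queueing recurrence that yields the latency and throughput metrics mentioned above. This sketch is not the SDCWorks simulator; station service times and the FIFO single-job-per-station discipline are illustrative assumptions.

```python
def simulate_line(stations, jobs, interarrival):
    """Discrete simulation of a serial manufacturing line: each station
    processes one job at a time; jobs flow from station to station in
    FIFO order. Returns per-job latency and the makespan (from which
    throughput = jobs / makespan)."""
    free_at = [0.0] * len(stations)   # when each station is next free
    latency = []
    makespan = 0.0
    for j in range(jobs):
        arrival = j * interarrival
        t = arrival
        for i, service in enumerate(stations):
            start = max(t, free_at[i])  # wait for the station to free up
            t = start + service
            free_at[i] = t
        latency.append(t - arrival)
        makespan = t
    return latency, makespan
```

With a 2.0-time-unit bottleneck station fed every 1.0 time unit, latency grows by one unit per job, showing queueing load building up at the bottleneck.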
ISBN:
(Digital) 9781728144849
ISBN:
(Print) 9781728144856
Vector extensions are a popular means to exploit data parallelism in applications. Over recent years, the most commonly used extensions have grown in vector length and in the number of vector instructions. However, code portability remains a problem in the context of a compute continuum. Hence, vector length agnostic (VLA) architectures have been proposed for future generations of ARM and RISC-V processors. With these architectures, code is vectorized independently of the vector length of the target hardware platform. It is therefore possible to tune software to a generic vector length. To understand the performance impact of VLA code compared to vector-length-specific code, we analyze the current capabilities of code generation for ARM's SVE architecture. Our experiments show that VLA code reaches about 90% of the performance of vector-length-specific code, i.e. a 10% overhead is incurred due to global predication of instructions. Furthermore, we show that code performance does not increase proportionally with increasing vector length, due to higher memory demands.
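The VLA loop shape, in which the final partial iteration is handled by a predicate mask instead of a scalar remainder loop, can be sketched in scalar Python. The mask mimics the behaviour of SVE's while-less-than predicate, and `vl` stands in for the hardware vector length the code never hardcodes; the saxpy kernel itself is just a convenient example.

```python
def vla_saxpy(a, x, y, vl):
    """Vector-length-agnostic loop shape for out = a*x + y: the same code
    works for any vector length vl, with the tail masked off by a
    predicate rather than peeled into a scalar remainder loop."""
    n = len(x)
    out = [0.0] * n
    i = 0
    while i < n:
        # predicate: lane j is active while i + j < n
        pred = [i + j < n for j in range(vl)]
        for j in range(vl):
            if pred[j]:                      # inactive lanes do nothing
                out[i + j] = a * x[i + j] + y[i + j]
        i += vl
    return out
```

Because the predicate absorbs the tail, the result is identical for any `vl`, which is precisely what lets one binary run across hardware with different vector lengths.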
Human supervisory control (HSC) is a widely used knowledge-based control scheme, in which human operators are in charge of planning and making high-level decisions for systems with embedded autonomy. With the variability of operators' behaviors in such systems, the stability of an operator modeling technique, i.e., that a modeling approach produces similar results across repeated applications, is critical to the extensibility and utility of such a model. Using an unmanned vehicle simulation testbed where such vehicles can be hacked, we compared two operator behavioral models from two different experiments using a hidden Markov modeling (HMM) approach. The resulting HMM models revealed operators' dominant strategies when conducting hacking detection tasks. The similarity between these two models was measured via multiple aspects, including model structure, state distribution, divergence distance, and co-emission probability distance. The similarity measure results demonstrate the stability of modeling human operators in HSC scenarios using HMM models. These results indicate that even when operators perform differently on specific tasks, such an approach can reliably detect whether strategies change across different experiments.
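One of the similarity aspects mentioned, divergence distance, can be sketched as a symmetrised KL divergence between corresponding state-transition rows of two models. This is only an illustrative instance of that style of comparison; the paper's exact metric, and its handling of emission probabilities, may differ.

```python
import math

def transition_divergence(A, B):
    """Symmetrised KL divergence averaged over matching transition rows
    of two models' transition matrices (rows must be proper probability
    distributions with matching state order)."""
    def kl(p, q):
        # KL(p || q); terms with p_i = 0 contribute nothing
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    n = len(A)
    return sum(kl(A[i], B[i]) + kl(B[i], A[i]) for i in range(n)) / (2 * n)
```

The distance is zero for identical models and grows as operators' dominant transition patterns diverge, which is how such a measure supports the stability claim above.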
An accurate prediction of scheduling and execution of instruction streams is a necessary prerequisite for predicting the in-core performance behavior of throughput-bound loop kernels on out-of-order processor architec...