In this paper we present a Multithreaded programming methodology for multi-core systems that utilizes Data-Flow concurrency. The programmer augments the program with macros that define threads and their data dependenc...
ISBN: (print) 9783030609399; 9783030609382
With the Internet of Things revolution, embedded devices are in charge of an ever-increasing number of tasks, ranging from sensing up to Artificial Intelligence (AI) functions. In particular, AI is gaining importance since it can dramatically improve the QoS perceived by the final user and makes it possible to cope with problems whose algorithmic solution is hard to find. However, the associated computational requirements, mostly floating-point processing, impose a careful design and tuning of the computing platforms. In this scenario, there is a need for a set of benchmarks that is representative of emerging AI applications and useful for comparing the efficiency of different architectural solutions and computing platforms. In this paper we present a suite of benchmarks encompassing computer graphics, computer vision and machine learning applications, which are widely used in many AI scenarios. Unlike those in other suites, these benchmarks are kernels tailored to execute effectively on bare metal, and they specifically stress the floating-point support offered by the computing platform.
ISBN: (print) 3540364102
Model-Driven Development (MDD) is a software development paradigm that promotes the use of models at different levels of abstraction, and of transformations between them, to derive one or more concrete application implementations. In this paper we analyze the current status of MDD regarding its applicability to the development of real-time embedded software (RTES). We discuss different modeling framework approaches used to specify the various models, and compare OMG/MDA-based approaches (MOF, UML Profiles and executable UML) with a generic MDD-based approach (GME). Finally, we identify the key challenges for future MDD research in order to successfully apply MDD within RTES development. These challenges lie mainly in the modeling and standardization of abstraction levels, model transformations and code generation, traceability, and the integration of existing software within the MDD development process.
ISBN: (print) 9781509030767
Simulation tools are indispensable to computer architects. Detailed execution-driven CPU models offer high accuracy, but at the cost of simulation speed. Trace-driven simulation is widely adopted to alleviate this problem, especially for studies focusing on memory-system exploration. Ideally, trace-driven core models will mimic out-of-order processors executing full-system workloads to enable computer architects to evaluate modern systems. Additionally, to be useful to the broader community, the tracing and replay models should be publicly available. However, existing trace-driven approaches are limited in their applicability and availability. We propose elastic traces, in which we accurately capture data and load/store order dependencies by instrumenting a detailed out-of-order processor model. In contrast to existing work, we do not rely on offline analysis of timestamps, and instead use accurate dependency information tracked inside the processor pipeline. We thereby account for the effects of speculation and branch misprediction, resulting in a more accurate trace playback. We provide a trace player that honours the dependencies and thus adapts its execution time to memory-system changes, as would the actual CPU. Compared to the detailed CPU, our trace player achieves a speed-up of 6-8 times. When modifying the memory-system parameters, the average error in absolute execution time is 7% for SPEC 2006 benchmarks on a bare-metal system and 17% for HPC benchmarks on Linux. Relative performance is predicted with less than 3% error, achieving fast and accurate system performance exploration. We make this functionality available to the broader community via a widely used open-source full-system simulator.
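The idea of a trace player that honours dependencies, so that its execution time stretches or shrinks with memory latency, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `TraceRecord` format, the `replay` function and the single fixed `mem_latency` parameter are hypothetical and are not the trace format or player described in the abstract. The sketch also ignores resource limits such as window size and outstanding-request caps.

```python
from dataclasses import dataclass


@dataclass
class TraceRecord:
    """One traced instruction (hypothetical format, for illustration only)."""
    seq: int             # position in program order
    kind: str            # "comp", "load" or "store"
    comp_delay: int = 1  # cycles of non-memory work before completion
    deps: tuple = ()     # seq numbers of earlier records this one waits for


def replay(trace, mem_latency):
    """Return the cycle at which the last record completes.

    Each record starts once all of its dependencies have completed.
    Memory records additionally pay `mem_latency` cycles, so the total
    time adapts to the memory system instead of following fixed timestamps.
    """
    done = {}  # seq -> completion cycle
    for rec in sorted(trace, key=lambda r: r.seq):
        ready = max((done[d] for d in rec.deps), default=0)
        latency = mem_latency if rec.kind in ("load", "store") else 0
        done[rec.seq] = ready + rec.comp_delay + latency
    return max(done.values(), default=0)


# Two independent loads feed a computation whose result is stored.
trace = [
    TraceRecord(0, "load"),
    TraceRecord(1, "load"),
    TraceRecord(2, "comp", comp_delay=2, deps=(0, 1)),
    TraceRecord(3, "store", deps=(2,)),
]

fast = replay(trace, mem_latency=10)   # 24 cycles
slow = replay(trace, mem_latency=100)  # 204 cycles
```

Because the two loads do not depend on each other, their latencies overlap, and only two memory latencies lie on the critical path. A replay driven by recorded timestamps alone could not reflect this when the latency changes.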
Geometric Algebra (GA), a generalization of quaternions, is a very powerful form for intuitively expressing and manipulating complex geometric relationships common to engineering problems. The actual evaluation of GA ...
ISBN: (print) 9783030275624; 9783030275617
A common problem when developing signal processing applications is to expose and exploit parallelism in order to improve both throughput and latency. Many programming paradigms and models have been introduced to serve this purpose, such as the Synchronous DataFlow (SDF) Model of Computation (MoC). SDF is used especially to model signal processing applications. However, the main difficulty when using SDF is to choose an appropriate granularity of the application representation, for example when translating imperative functions into SDF actors. In this paper, we propose a method to model the parallelism of perfectly nested for loops with any bounds and explicit parallelism, using SDF. This method makes it possible to easily adapt the granularity of the expressed parallelism, thanks to the introduced concept of SDF iterators. The usage of SDF iterators is then demonstrated on the Scale Invariant Feature Transform (SIFT) image processing application.
ISBN: (print) 9781450364942
Traditional software testing methods are inefficient in cases where data inputs alone do not determine the outcome of a program's execution. In order to verify such software, testing is often complemented by analysis of the execution trace. For monitoring the execution trace, most approaches today insert additional instructions at the binary level, making the monitoring intrusive. Binary instrumentation operates at a low level, making it difficult to properly modify a program's states and to quantify its code coverage. In this paper, we present a framework for testing complex embedded multithreaded software on the logical level. Testing software on this level avoids dependency on concrete compilers and relates the execution to the source code, thus enabling coverage. Our non-intrusive execution monitoring and control is implemented using the LLVM interpreter compiler infrastructure. Instead of forcing thread interleaving, we suggest simulating interleaving effects through non-intrusive changes of shared variables. This makes it possible to test a single thread without executing the full software stack, which is especially useful in situations where the full software stack is not available (e.g., pre-integration testing). We complement existing approaches with new features such as dynamic configuration of monitoring and execution roll-back to checkpoints. Our approach introduces acceptable overhead without any complex setup.
ISBN: (print) 3540364102
This paper presents a low-power implementation of the asynchronous 8051 processor, called A8051, which employs a new data encoding method, RT/NRT encoding, to reduce switching activity. The paper focuses on the power analysis of the proposed data encoding, based on the experimental design of A8051. The proposed data encoding method is devised to meet the DI assumption using ternary logic. This method reduces not only the number of wires but also the switching activity. In terms of switching activity, the proposed ternary encoding achieves a 26% reduction compared with conventional ternary encoding. A8051 using RT/NRT encoding shows a 24% higher instructions-per-energy metric than A8051 using dual-rail encoding.
In this work, we present a minimalistic, energy-efficient implementation of an instruction buffer. We use loop detection and execution trace analysis to find the most commonly executed loops in already scheduled application ...
ISBN: (print) 9783030275624; 9783030275617
The Modular Microserver DataCenter (M2DC) project provides low-energy, configurable, heterogeneous servers for applications that focus on the processing of large data sets and can take advantage of the performance enhancement provided by transparent acceleration techniques. In this paper, we exemplify the M2DC approach through one of the project's use cases, namely automotive Internet of Things analytics. We present the main goals of the use case and show how an appropriate M2DC microserver can be used to accelerate the application without significant modifications to its code.