In this work, we present a minimalistic, energy-efficient implementation of an instruction buffer. We use loop detection and execution trace analysis to find the most commonly executed loops in already scheduled application ...
ISBN (print): 9783030275624; 9783030275617
A common problem when developing signal processing applications is to expose and exploit parallelism in order to improve both throughput and latency. Many programming paradigms and models have been introduced to serve this purpose, such as the Synchronous DataFlow (SDF) Model of Computation (MoC). SDF is used especially to model signal processing applications. However, the main difficulty when using SDF is to choose an appropriate granularity of the application representation, for example when translating imperative functions into SDF actors. In this paper, we propose a method to model the parallelism of perfectly nested for loops with any bounds and explicit parallelism, using SDF. This method makes it possible to easily adapt the granularity of the expressed parallelism, thanks to the introduced concept of SDF iterators. The usage of SDF iterators is then demonstrated on the Scale Invariant Feature Transform (SIFT) image processing application.
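As background for readers new to SDF, the sketch below shows the standard balance-equation computation of a repetition vector, which tells how many times each actor fires per graph iteration. It is general SDF material, not the paper's SDF-iterator method. The function name `repetition_vector` and the `fork`/`body`/`join` actors (a four-iteration loop body exposed as four parallel firings) are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def _lcm(a, b):
    return a * b // gcd(a, b)


def repetition_vector(actors, edges):
    """Solve the SDF balance equations q[src] * prod == q[dst] * cons.

    actors: list of actor names (assumed to form one connected graph).
    edges:  list of (src, prod_rate, dst, cons_rate) tuples.
    Returns the smallest positive integer firing counts, or raises
    ValueError if the rates are inconsistent or the graph is disconnected.
    """
    rate = {actors[0]: Fraction(1)}  # fix one actor, derive the others
    pending = list(edges)
    progress = True
    while pending and progress:
        progress = False
        remaining = []
        for src, prod, dst, cons in pending:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                progress = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                progress = True
            elif src in rate and dst in rate:
                # Both ends known: the edge must already be balanced.
                if rate[src] * prod != rate[dst] * cons:
                    raise ValueError("inconsistent SDF rates")
                progress = True
            else:
                remaining.append((src, prod, dst, cons))
        pending = remaining
    if pending or len(rate) != len(actors):
        raise ValueError("graph is not connected")
    # Scale the rational solution to the smallest integer one.
    scale = reduce(_lcm, (r.denominator for r in rate.values()), 1)
    counts = {a: int(r * scale) for a, r in rate.items()}
    common = reduce(gcd, counts.values())
    return {a: c // common for a, c in counts.items()}


# Hypothetical example: "fork" emits 4 tokens, "body" handles one token
# per firing, "join" collects 4 results.
print(repetition_vector(
    ["fork", "body", "join"],
    [("fork", 4, "body", 1), ("body", 1, "join", 4)],
))  # {'fork': 1, 'body': 4, 'join': 1}
```

Changing the production rate of `fork` changes how many `body` firings are exposed, which is one simple way to think about granularity in an SDF representation.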
This paper investigates the relationship between two ways of analyzing streaming systems: trace analysis for dataflow programs with firing, and network calculus for network flows. While the former focuses on the struc...
ISBN (print): 9781424402717
The growth in embedded systems applications and their sophistication has increased the need for rapid development and modeling of embedded processors. Embedded processors are usually application specific. This creates a strong need for modeling environments that can be used for rapid generation of detailed micro-architecture processor simulators. However, existing simulation tools in this category are far less mature and mostly commercial. This paper presents a generic cycle-accurate micro-architecture simulation framework for embedded processors. The framework is designed to generate an RTL (Register Transfer Level) cycle-accurate simulator. The framework is built in Java to provide extensibility, ease of modification, and platform independence. It provides these features while being as fast as most known available frameworks. The paper uses the ARM1022E as an example embedded processor because of its wide range of applications, such as modems, cellular phones, and automobiles. It simulates the processor's two instruction set architectures (ISAs): ARM (32-bit ISA) and THUMB (16-bit ISA). The paper verifies the framework by comparing the ARM simulator with ARMulator (from ARM Ltd.). It also compares the current simulation speed with that of known available frameworks. Lastly, the paper provides a study of ADPCM (Adaptive Differential Pulse Code Modulation) decode performance on the ARM1022E processor using the framework.
ISBN (print): 3540364102
Model-Driven Development (MDD) is a software development paradigm that promotes the use of models at different levels of abstraction, and of transformations between them, to derive one or more concrete application implementations. In this paper we analyze the current status of MDD regarding its applicability to the development of Real-Time Embedded Software (RTES). We discuss different modeling framework approaches used to specify the various models, and compare OMG/MDA-based approaches (MOF, UML Profiles and executable UML) with a generic MDD-based approach (GME). Finally, we identify the key challenges for future MDD research in order to successfully apply MDD within RTES development. These challenges are mainly situated in the fields of modeling and standardization of abstraction levels, model transformations and code generation, traceability, and integration of existing software within the MDD development process.
ISBN (print): 9783030609399; 9783030609382
With the Internet-of-Things revolution, embedded devices are in charge of an ever-increasing number of tasks, ranging from sensing up to Artificial Intelligence (AI) functions. In particular, AI is gaining importance since it can dramatically improve the QoS perceived by the final user and makes it possible to cope with problems whose algorithmic solution is hard to find. However, the associated computational requirements, mostly made of floating-point processing, impose a careful design and tuning of the computing platforms. In this scenario, there is a need for a set of benchmarks that is representative of the emerging AI applications and useful for comparing the efficiency of different architectural solutions and computing platforms. In this paper we present a suite of benchmarks encompassing Computer Graphics, Computer Vision and Machine Learning applications, which are widely used in many AI scenarios. Unlike other suites, these benchmarks are kernels tailored to be effectively executed in bare-metal environments, and they specifically stress the floating-point support offered by the computing platform.
ISBN (print): 1424401550
The cache memory plays a crucial role in the performance of any processor. The cache memory (SRAM), especially the on-chip cache, is 3-4 times faster than the main memory (DRAM). It can vastly improve the processor performance and speed. The cache also consumes much less energy than the main memory. That leads to a huge power saving, which is very important for embedded applications. In today's processors, although the cache memory reduces the energy consumption of the processor, the on-chip cache accounts for almost 40% of the processor's total energy consumption. In this paper, we propose a cache architecture for the instruction cache that is a modification of the hotspot architecture. Our proposed architecture consists of a small filter cache in parallel with the hotspot cache, between the L1 cache and the main memory. The small filter cache holds the code that was not captured by the hotspot cache. We also propose a prediction mechanism to steer each memory access to the hotspot cache, the filter cache, or the L1 cache. Our design has both a faster access time and lower energy consumption than both the filter cache and the hotspot cache architectures. We use the MiBench and MediaBench benchmarks, together with the SimpleScalar simulator, to evaluate the performance of our proposed architecture and compare it with the filter cache and the hotspot cache architectures. The simulation results show that our design outperforms both the filter cache and the hotspot cache in both average memory access time and energy consumption.
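The average memory access time used as a metric above is conventionally computed as hit time plus miss rate times miss penalty, applied level by level. The sketch below illustrates only that textbook formula for a simple serial hierarchy; it does not model the paper's parallel hotspot/filter organization or its predictor, and the function name and all latencies and miss rates are assumed values for illustration.

```python
def average_access_time(levels, memory_latency):
    """Average memory access time for a serial cache hierarchy.

    levels: list of (hit_time, miss_rate) pairs, ordered from the cache
            closest to the processor outward.
    memory_latency: main-memory access time, in the same unit as hit_time.
    """
    penalty = memory_latency
    # Work from the outermost level inward: each level's miss penalty is
    # the average access time of everything behind it.
    for hit_time, miss_rate in reversed(levels):
        penalty = hit_time + miss_rate * penalty
    return penalty


# Assumed numbers, in cycles: an L1 cache alone, then the same L1 behind
# a small 1-cycle cache that misses 20% of the time.
l1_only = average_access_time([(2, 0.05)], memory_latency=100)
with_small_cache = average_access_time([(1, 0.20), (2, 0.05)], memory_latency=100)
print(l1_only, with_small_cache)
```

With these assumed numbers the small front cache lowers the average from about 7 cycles to about 2.4 cycles, which shows why a tiny, fast structure can pay off when it captures most instruction fetches.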
ISBN (print): 354026969X
The emergence of programmable logic devices as processing platforms for digital signal processing applications poses challenges concerning rapid implementation and high-level optimization of algorithms on these platforms. This paper describes Abhainn, a rapid implementation methodology and toolsuite for translating an algorithmic expression of the system to a working implementation on a heterogeneous multiprocessor/field programmable gate array platform, or a standalone system on programmable chip solution. Two particular focuses for Abhainn are the automated but configurable realisation of inter-processor communication fabrics, and the establishment of novel dedicated hardware component design methodologies allowing algorithm-level transformation for system optimization. This paper outlines the approaches employed in both these particular instances.
Geometric Algebra (GA), a generalization of quaternions, is a very powerful form for intuitively expressing and manipulating complex geometric relationships common to engineering problems. The actual evaluation of GA ...
ISBN (print): 354026969X
A vector algorithm for computing the two-dimensional Discrete Cosine Transform (2D-VDCT) is presented. The formulation of 2D-VDCT is stated under the framework provided by elements of multilinear algebra. This algebraic framework not only provides a formalism for describing the 2D-VDCT, but also enables the derivation, by pure algebraic manipulations, of an algorithm that is well suited to implementation on SIMD-vector signal processors with a scalable level of parallelism. The 2D-VDCT algorithm can be implemented in a matrix-oriented language, and a suitable compiler generates code for our family of STA (Synchronous Transfer Architecture) vector architectures with different amounts of SIMD parallelism. We show in this paper how significant speedup factors are achieved by this methodology.
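To make the matrix view of the transform concrete, the sketch below computes a 2D DCT with the textbook separable formulation Y = C X C^T, where C is the orthonormal DCT-II matrix. This is the standard row-column form, not the 2D-VDCT algorithm derived in the paper, and the helper names are assumptions made for this example. It assumes a square input block.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix: C[k][i] = a(k) * cos(pi * (2i + 1) * k / (2n))."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    # Plain dense matrix product; each output row is an independent
    # vector operation, which is what SIMD hardware would exploit.
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def dct_2d(block):
    """2D DCT-II of a square block as Y = C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


# A constant 4x4 block has all its energy in the DC coefficient.
coeffs = dct_2d([[1.0] * 4 for _ in range(4)])
print(round(coeffs[0][0], 6))  # 4.0
```

Writing the transform as two matrix products is what makes it natural to express in a matrix-oriented language and to map onto vector units of different widths.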