ISBN:
(print) 9781728188089; 9781728188096
One of the most computationally intensive parts of modern recognition systems is the inference of deep neural networks used for image classification, segmentation, enhancement, and recognition. The growing popularity of edge computing motivates the search for ways to reduce inference time on mobile and embedded devices. One way to decrease neural network inference time is to modify the neuron model so that it is more efficient to compute on a specific device. An example of such a model is the bipolar morphological neuron, which is based on the idea of replacing multiplication with addition and maximum operations. This model has been demonstrated for simple image classification with LeNet-like architectures [1]. In this paper, we introduce a bipolar morphological ResNet (BM-ResNet) model obtained from a much more complex ResNet architecture by converting its layers to bipolar morphological ones. We apply BM-ResNet to image classification on the MNIST and CIFAR-10 datasets with only a moderate accuracy decrease, from 99.3% to 99.1% and from 85.3% to 85.1%, respectively. We also estimate the computational complexity of the resulting model. We show that for the majority of ResNet layers, the considered model requires 2.1-2.9 times fewer logic gates for implementation and has 15-30% lower latency.
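The idea of trading multiplications for additions and maxima can be illustrated with a small sketch. It is not taken from the paper: it assumes the commonly described formulation in which inputs and weights are split into positive and negative parts, moved to the log domain, and the sum of products in each of the four sign paths is approximated by its largest term. The function name `bm_neuron` and the helper `path` are hypothetical.

```python
import numpy as np

def bm_neuron(x, w):
    """Approximate dot(x, w) with add/max in the log domain (illustrative sketch).

    Assumption: each of the four sign paths replaces sum_j a_j * b_j
    by max_j a_j * b_j = exp(max_j(ln a_j + ln b_j)).
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    # Split inputs and weights into non-negative positive and negative parts.
    x_pos, x_neg = np.maximum(x, 0.0), np.maximum(-x, 0.0)
    w_pos, w_neg = np.maximum(w, 0.0), np.maximum(-w, 0.0)

    def path(a, b):
        # ln(0) = -inf, so zero entries can never win the maximum.
        with np.errstate(divide="ignore"):
            s = np.log(a) + np.log(b)  # addition replaces multiplication
        return np.exp(np.max(s))       # maximum replaces summation

    return (path(x_pos, w_pos) - path(x_pos, w_neg)
            - path(x_neg, w_pos) + path(x_neg, w_neg))
```

When each sign path has a single dominant term the result is close to the ordinary dot product; when several terms are of similar size the approximation underestimates the sum, which is consistent with the accuracy decrease the abstract reports.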
ISBN:
(digital) 9781728166469
ISBN:
(print) 9781728166476
This paper applies Model-Based Design methodologies to develop a high-performance device driver model suitable for advanced smart solutions. Such modeling techniques are mandatory when designing advanced smart applications based on embedded devices. The paper proposes an innovative approach that accurately models the operations of the STM32 analog-to-digital converter (ADC) peripheral, both for simulation and for generating optimized code that integrates with the Low-Level STM32 firmware driver. A new Simulink toolbox has been developed to replicate the ADC functions. Finally, an example is presented to show the effectiveness of this new modeling approach.
ISBN:
(print) 9781728124377
The study of program behavior on unreliable hardware infrastructure via fault injection is of great interest to researchers. By modeling the effect of faults on program behavior, programs can be tuned to be more fault resilient via fault-tolerant approaches, or tuned to save energy via approximate computing techniques. Existing frameworks, however, do not adequately address the need for fault injection/detection simulation. To fill this gap, this paper proposes ComFIDet, a low-overhead and comprehensive instruction-level fault injection/detection framework. ComFIDet allows for the identification of both reliable and unreliable regions of a program via the simulation of a variety of hardware faults, and can further be utilized in tuning a reliable design or an approximate computing technique through run-time fault detection and effortless experiment replay. ComFIDet offers an easy-to-reconfigure process and iterative fault injection experiments, and requires zero modifications to the program under test. ComFIDet is evaluated through a comprehensive set of tests, exploring the effects of one possible configuration on program behavior.
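The general shape of a fault injection experiment with replay can be sketched in a few lines. This is a toy illustration only and does not reflect ComFIDet's interface or implementation: the "program" is assumed to be a list of integer operations, the fault model is a single bit flip after one step, detection is a comparison against a fault-free (golden) run, and all names (`flip_bit`, `run`, `campaign`) are hypothetical.

```python
import random

def flip_bit(value, bit, width=32):
    """Flip one bit of a width-bit unsigned value (single bit-flip fault model)."""
    if not 0 <= bit < width:
        raise ValueError("bit index out of range")
    return (value ^ (1 << bit)) & ((1 << width) - 1)

def run(program, x, fault=None, width=32):
    """Apply each operation in order; optionally corrupt the result of one step.

    fault is a (step, bit) pair, or None for a fault-free run.
    """
    mask = (1 << width) - 1
    for step, op in enumerate(program):
        x = op(x) & mask
        if fault is not None and fault[0] == step:
            x = flip_bit(x, fault[1], width)
    return x

def campaign(program, x, trials, seed, width=32):
    """Run seeded random injections and flag those that change the output."""
    rng = random.Random(seed)  # a fixed seed makes the experiment replayable
    golden = run(program, x, width=width)
    results = []
    for _ in range(trials):
        fault = (rng.randrange(len(program)), rng.randrange(width))
        corrupted = run(program, x, fault, width)
        results.append((fault, corrupted != golden))
    return golden, results
```

Faults whose flag is `False` were masked by later operations, which is one simple way to separate resilient regions of a program from vulnerable ones. Rerunning `campaign` with the same seed reproduces the same injections.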
ISBN:
(digital) 9781728159379
ISBN:
(print) 9781728159386
Reconfiguration capability in today's embedded systems, such as Reconfigurable Computing (RC) systems, makes application execution more efficient. However, the reconfiguration overhead in the mapping process of application compilation degrades the performance of these systems. In this paper, a novel distributed application graph mapping is proposed to reduce the heavy computations of the mapping problem analytically. For this purpose, matrix modifications are used to derive a distance model in the resource graph. Using this model, it is possible to remove heavy-weight values from the solution search space and reach a low-cost solution faster. The model classifies the distance matrix of the resource graph into independent regions to transform the mapping problem into suboptimal problems. Simulation results show that the proposed approach to application graph mapping outperforms state-of-the-art methods in terms of complexity and time overhead, especially for large-scale application graphs.
General circulation models are essential tools in weather and hydrodynamic simulation. They solve discretized, complex physical equations in order to compute evolutionary states of dynamical systems, such as the hydro...
Reinforcement learning (RL) has achieved some impressive recent successes in various computer games and simulations. Most of these successes rely on large numbers of episodes from which the agent can learn. In typical robotic applications, however, the number of feasible attempts is very limited. In this paper, we present a sample-efficient RL algorithm applied to the example of a table tennis robot. In table tennis, every stroke is different, with varying placement, speed, and spin. An accurate return therefore has to be found depending on a high-dimensional continuous state space. To make learning in few trials possible, the method is embedded into our robot system. In this way, we can use a one-step environment. The state space depends on the ball at hitting time (position, velocity, spin), and the action is the racket state (orientation, velocity) at hitting. An actor-critic based deterministic policy gradient algorithm was developed for accelerated learning. Our approach performs competitively both in simulation and on the real robot in a number of challenging scenarios. Accurate results are obtained without pre-training in under 200 episodes of training. The video presenting our experiments is available at https://***/uRAtdoL6Wpw.
ISBN:
(digital) 9781728180588
ISBN:
(print) 9781728180595
Increased uncertainties in design parameters undermine the accuracy of the mapping of embedded applications to Network-on-Chip (NoC) based manycore architectures. In this paper, we attempt for the first time to apply info-gap theory to uncertainty modeling in the context of embedded systems design. We first propose a novel info-gap based uncertainty-aware reliability model for NoC based manycore platforms. We then develop an uncertainty-aware solution to the problem of mapping in embedded systems. The solution is implemented as a computer program that can generate robust Pareto frontiers. Simulation results indicate that the proposed info-gap based uncertainty-aware mapping generates Pareto frontiers that differ significantly from those obtained with a traditional deterministic approach. Identifying and quantifying these differences is an important first step towards the development of better mapping optimization processes that arrive at optimal rather than suboptimal solutions.
ISBN:
(print) 9781457708008
The proceedings contain 51 papers. The topics discussed include: methods for design and implementation of dynamic signal processing systems; supercomputing: past, present, and a possible future; on STM concurrency control for multicore embedded real-time software; accelerating collective communication in message passing on manycore system-on-chip; on the impact of dynamic task scheduling in heterogeneous MPSoCs; skeleton-based automatic parallelization of image processing algorithms for GPUs; power adaptive computing system design in energy harvesting environment; Smart Cache: a self adaptive cache architecture for energy efficiency; power proportional characteristics of an energy manager for web clusters; thermal optimization for micro-architectures through selective block replication; and design metrics and visualization techniques for analyzing the performance of MOEAs in DSE.
PXI FPGA Peripheral Modules by National Instruments are meant to be used in LabVIEW even without any previous knowledge of Hardware Description Languages (HDL) and let users hardware-accelerate their own test and m...
ISBN:
(print) 9781728159775
gem5 has been extensively used in computer architecture simulations and in the evaluation of new architectures for HPC (high performance computing) systems. Previous work has validated gem5 against ARM platforms. However, gem5 still shows high inaccuracy when modeling x86-based processors. In this work, we focus on the simulation of a single-node high performance system and study the sources of inaccuracy in gem5. We then validate the gem5 simulator against an Intel Core-i7 processor (Haswell microarchitecture). We configured gem5 to match the Core-i7 Haswell microarchitecture configuration as closely as possible, and changed the simulator by adding some features, modifying existing code, and tuning built-in configurations. As a result, we validated the simulator by fixing many sources of error, matching real hardware results with less than 6% mean error rate across different control, memory, dependency, and execution microbenchmarks.