ISBN (print): 9781509030767
Next-generation deep neural networks for classification hosted on embedded platforms will rely on fast, efficient, and accurate learning algorithms. The initialization of weights in learning networks has a great impact on classification accuracy. In this paper we focus on deriving good initial weights by modeling the error function of a deep neural network as a high-dimensional landscape. We observe that, due to the inherent complexity of its algebraic structure, such an error function may conform to general results on the statistics of large systems. To this end we apply some results from Random Matrix Theory to analyse these functions. We model the error function in terms of a Hamiltonian in N dimensions and derive some theoretical results about its general behavior. These results are then used to make better initial guesses of the weights for the learning algorithm.
ISBN (print): 9783540736226
This paper discusses efficiency measures for evaluating high-performance multimedia systems on a chip (SoC), considering throughput rate R, chip size A, power dissipation P, and a flexibility criterion F. Based on an analysis of recently published multimedia chips, the paper shows equivalences between the ratio of R over AP, a weighted sum of 1/R, A, and P, and a fuzzy multicriteria analysis of R, A, and P. The paper presents the fuzzy multicriteria analysis as a generalization of the other efficiency measures that can easily be applied to multiple cost and performance criteria. Because it applies fuzzy set theory, the multicriteria approach supports quantitative criteria with a physical background as well as qualitative criteria expressed by linguistic variables.
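As an illustration of the simplest measure named in this abstract, the ratio of R over AP, the sketch below ranks two chips by R / (A * P). The chip names, values, and units are hypothetical and are not taken from the paper; the weighted-sum and fuzzy multicriteria measures are not shown.

```python
def efficiency(throughput, area, power):
    """Return the ratio R / (A * P) for one chip.

    Units are an assumption for this sketch (e.g. Mpixel/s, mm^2, W);
    the ratio is only meaningful when all chips use the same units.
    """
    if area <= 0 or power <= 0:
        raise ValueError("area and power must be positive")
    return throughput / (area * power)


# Hypothetical chips: (name, R, A, P). These are made-up numbers.
chips = [
    ("chip_a", 100.0, 50.0, 2.0),
    ("chip_b", 80.0, 20.0, 1.0),
]

# Rank from most to least efficient under the R / (A * P) measure.
ranked = sorted(chips, key=lambda c: efficiency(c[1], c[2], c[3]), reverse=True)
for name, r, a, p in ranked:
    print(name, efficiency(r, a, p))
```

In this made-up data the chip with the lower throughput ranks first, because its area and power are much smaller.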
ISBN (print): 9781479901036
Recently introduced processors such as Tilera's Tile Gx100 and Intel's 48-core SCC have delivered on the promise of high performance per watt in manycore processors, making these architectures ostensibly as attractive for low-power embedded processors as for cloud services. However, these architectures space-multiplex the microarchitectural resources among many threads to increase utilization, which leads to potentially large and varying levels of interference. This decorrelates CPU-time from actual application progress and reduces the ability of traditional software to accurately track and finely control application progress, hindering the adoption of manycore processors in embedded computing. In this paper we propose Progress Time as the counterpart of CPU-time in space-multiplexed systems and show how it can be used to track application progress. We also introduce TimeCube, a manycore embedded processor that uses dynamic execution isolation and shadow performance modeling to provide an accurate online measurement of each application's Progress Time. Our evaluation shows that a 32-core TimeCube processor can track application progress with less than 1% error even in the presence of a 6x average worst-case slowdown. TimeCube also uses Progress Times to perform online architectural resource management that leads to a 36% improvement in throughput compared to existing microarchitectural resource allocation schemes. Overall, the results argue for adding the requisite microarchitectural structures to support Progress Time in manycore chips for embedded systems.
ISBN (print): 9781479901036
In this paper we evaluate the promise held by low-power GPUs for non-graphics workloads that arise in embedded systems. To this end, we map and implement five benchmarks, which find utility in very different application domains, on an embedded GPU. Our results show that, apart from accelerated performance, embedded GPUs are also promising because of their energy efficiency, which is an important design goal for battery-driven mobile devices. We show that adopting the same optimization strategies as those used for programming high-end GPUs might lead to worse performance on embedded GPUs. This is due to restricted features of embedded GPUs, such as limited or no user-defined memory, a small instruction set, and a limited number of registers, among others. We propose techniques to overcome such challenges, e.g., by distributing the workload between GPUs and multi-core CPUs, in the spirit of heterogeneous computation.
ISBN (print): 9781467322973; 9781467322966
This paper introduces a methodology for prototyping forward error correction (FEC) architectures, oriented to system verification and characterization. A complete design flow is described, which satisfies the requirements for error-free hardware design and acceleration of FEC simulations. FPGA devices give the designer the ability to observe rare events, owing to the tremendous speed-up of FEC operations. A Matlab-based system assists in investigating the impact of very rare decoding failure events on FEC system performance and in finding solutions that aim at parameter optimization and BER performance improvement of LDPC codes in the error floor region. Furthermore, the development of an embedded system that offers remote access to the system under test and automation of the verification process is explored. The prototyping approach presented here exploits the high processing speed of FPGA-based emulators and the observability and usability of software-based models.
ISBN (print): 9783031150746; 9783031150739
Attacks on embedded devices using the electromagnetic (EM) side channel have proliferated. Predicting software vulnerability to such attacks requires an ability to simulate EM fields during software development rather than relying on expensive lab-based measurements. We propose a modeling approach capable of synthesizing instruction-level EM traces for arbitrary software, using a one-time pre-characterization of a processor. Reducing the cost of dictionary construction is a major contribution of this paper. Results on a set of benchmarks show that synthesized traces are accurate in estimating EM emanations with less than 5% mean absolute percentage error (MAPE) compared to measurements. Furthermore, synthesized traces predict control flow leakage with an accuracy of 87% or more based on the side-channel vulnerability factor (SVF) metric.
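For reference, the mean absolute percentage error (MAPE) quoted in this abstract can be computed as in the following minimal sketch. The sample values are invented for illustration and are not measurements from the paper; the sketch assumes that no measured sample is zero, since MAPE is undefined there.

```python
def mape(measured, synthesized):
    """Mean absolute percentage error of synthesized vs. measured samples.

    Assumes both sequences have the same non-zero length and that no
    measured sample is zero (MAPE divides by the measured value).
    """
    if len(measured) != len(synthesized) or len(measured) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    if any(m == 0 for m in measured):
        raise ValueError("measured samples must be non-zero")
    total = sum(abs((m - s) / m) for m, s in zip(measured, synthesized))
    return 100.0 * total / len(measured)


# Invented example traces (arbitrary units), not data from the paper.
measured_trace = [1.0, 2.0, 4.0]
synthesized_trace = [1.1, 1.9, 4.0]
print(mape(measured_trace, synthesized_trace))
```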
In this article we implement a stochastic modeling technique for simulating the communication between processors and the arbitration among buses for an embedded SoC. The stochastic models, implemented with queues, have been used to estimate, through simulation of different arbitration policies, the power consumption and delays, as well as the average or worst-case scenarios that could occur with different architectures and arbitration policies. This idea could then be extended to writing probabilistic test benches to analyze the performance of different architectures, as well as to devise and test arbitration policies that attempt to optimize the power consumption and buffer lengths under constraints on the average delay. (c) 2006 Elsevier B.V. All rights reserved.
ISBN (print): 9781467322973; 9781467322966
Due to the energy efficiency requirements of modern embedded systems, chip vendors are inclined towards multicore architectures with different types of processing engines and non-uniform interconnect fabrics. At the same time, multiple applications are intended to run concurrently on devices with such heterogeneous architectures. This rapid growth in the complexity of the hardware and its use cases imposes new challenges on software development tools. To overcome this complexity, approaches based on models of computation are becoming increasingly promising. Synchronous Data Flow (SDF) is a popular specification formalism for streaming applications that are inherently concurrent. However, the parallelism expressed in the original representation is often not sufficient to fully exploit the potential of multicore platforms. In this paper we present a holistic methodology for improving the throughput of streaming applications while mapping them onto heterogeneous architectures. The approach uses transformations that adapt the parallelism in SDF according to the available platform resources. We use a genetic algorithm to explore SDF instances with the objective of maximizing throughput on a target platform. Our model supports architecture heterogeneity and multi-application scenarios. The experiments indicate that our approach outperforms other techniques for exploiting parallelism in a single application in most of the test cases and enables the optimization of concurrent applications.
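As background on the SDF formalism mentioned in this abstract, the sketch below computes the repetition vector of a small SDF graph from its balance equations: for every edge, the firings of the producer times its production rate must equal the firings of the consumer times its consumption rate. This is a standard SDF property, not the paper's transformation or genetic-algorithm method; the function name, graph, and rates are hypothetical.

```python
from collections import deque
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, edges):
    """Smallest positive integer firing counts satisfying the balance equations.

    actors: list of actor names (assumed to form one connected graph).
    edges:  list of (src, dst, produced, consumed) token rates per firing.
    """
    # Each edge fixes the ratio between the firing counts of its endpoints.
    adj = {a: [] for a in actors}
    for src, dst, produced, consumed in edges:
        adj[src].append((dst, Fraction(produced, consumed)))
        adj[dst].append((src, Fraction(consumed, produced)))

    # Propagate relative firing rates from the first actor (rate 1).
    rate = {actors[0]: Fraction(1)}
    queue = deque([actors[0]])
    while queue:
        a = queue.popleft()
        for b, ratio in adj[a]:
            candidate = rate[a] * ratio
            if b not in rate:
                rate[b] = candidate
                queue.append(b)
            elif rate[b] != candidate:
                raise ValueError("inconsistent rates: no valid repetition vector")
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")

    # Scale by the least common multiple of the denominators to get integers.
    scale = reduce(lambda x, y: x * y // gcd(x, y),
                   (r.denominator for r in rate.values()), 1)
    return {a: int(r * scale) for a, r in rate.items()}


# Hypothetical three-actor pipeline: A produces 2, B consumes 3; B produces 1, C consumes 2.
print(repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)]))
```

For this example graph, firing A three times, B twice, and C once returns every edge buffer to its initial token count.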
ISBN (print): 9783540736226
The computational demand of signal processing algorithms is rising continuously. Heterogeneous embedded multiprocessor systems-on-chip (MP-SoCs) are one solution to tackle this demand. To take advantage of the benefits of these systems, however, new strategies are required for mapping applications to such a system and for evaluating the system's performance at a very early design stage. We present a static, analytical, bottom-up methodology, based on packing, for the temporal and spatial mapping of applications to MP-SoCs. Furthermore, we demonstrate how the result can be used for performance evaluation and system improvement without the need for simulations.
ISBN (print): 9783031045806; 9783031045790
In this paper, we present a new approach for mapping LLVM IR to binary machine code in order to overcome the current limitations that compiler optimizations impose on host-based simulations of performance-critical embedded software. Our novel, fully automated mapping approach copes even with aggressive compiler optimizations without requiring any modification to the compiler or any expert supervision. Experimental results show that accurate mappings are produced even when compiling with the highest level of optimization (average error below 2%). The proposed simulation methodology provides a speedup of at least a factor of 26 compared to the widely used gem5 simulator.