ISBN: (print) 9781467373111
As modern processors become increasingly complex, fast and accurate performance prediction is crucial during the early phases of hardware and software co-development. Accurately and efficiently predicting the performance of a given software workload is, however, a challenging problem. Traditional cycle-accurate simulation is often too slow, while analytical models are not sufficiently accurate or still require target-specific execution statistics that may be slow or difficult to obtain. In this paper, we propose a novel learning-based approach for synthesizing analytical models that can accurately predict the performance of a workload on a target platform from various performance statistics obtained directly on a host platform using built-in hardware counters. Our learning approach relies on a one-time training phase using a cycle-accurate reference of the chosen target processor. We train our models on over 15,000 program instances from the ACM-ICPC programming contest database, and demonstrate the prediction accuracy on standard benchmark suites. Results show that our approach achieves on average more than 90% accuracy at 160x the speed of a cycle-accurate reference simulation.
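The idea of training once against a reference and then predicting from host counters can be illustrated with a small sketch. The abstract does not state the model form, so ordinary least squares is used here purely as a stand-in; the counter names (instructions, cache misses) and all numbers are hypothetical.

```python
def fit_linear(features, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    rows = [[1.0] + list(f) for f in features]  # prepend a bias term
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict(weights, feature):
    """Predicted target cycles for one vector of host counter values."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature))


# Hypothetical training data: (host instructions, host cache misses) paired
# with cycle counts from a cycle-accurate reference of the target.
host_counters = [(1000, 10), (2000, 50), (1500, 5), (3000, 80)]
target_cycles = [100 + 1.5 * i + 20 * m for i, m in host_counters]

weights = fit_linear(host_counters, target_cycles)
estimate = predict(weights, (2500, 30))
```

After the one-time fit, each prediction costs only a few multiplications, which is where the speed advantage over repeated cycle-accurate simulation would come from in a scheme of this kind.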
ISBN: (print) 9781467373111
Image processing algorithms that work only on a local neighbourhood are used in nearly every image processing application. Very often several iterations are performed on a fixed neighbourhood, which leads to the description of stencil codes. A promising approach in embedded systems is to use the massively parallel computation power of an FPGA for this kind of algorithm. If the FPGA is placed directly inside the image acquisition unit, forming a smart camera, this not only speeds up processing but also reduces or even eliminates the PC-based hardware, which saves space and power. However, most designers begin from scratch when they have to implement stencil computations in smart cameras. This leads to an underutilized FPGA, because efficient use of the given resources is secondary to functional correctness. Therefore, in this paper we present a framework for stencil code applications which immediately delivers the best architecture with respect to prominent resource criteria. An analytical model is used to find an optimized parameter set (degree of parallelism, usage of buffers, etc.) for a highly flexible FPGA implementation. A graphical tool allows the effects of certain parameters to be evaluated further. Our results show that we are able to create an optimized hardware architecture for this application domain.
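For readers unfamiliar with the term, a stencil code can be sketched in software as follows. This is only an illustration of the computation pattern (a fixed 3x3 neighbourhood applied over several iterations), not the FPGA architecture of the paper; the border handling (borders copied unchanged) is an assumption made to keep the example short.

```python
def apply_stencil(image, kernel, iterations=1):
    """Apply a 3x3 stencil repeatedly; border pixels are copied unchanged."""
    height, width = len(image), len(image[0])
    current = [row[:] for row in image]
    for _ in range(iterations):
        result = [row[:] for row in current]
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                # Each output pixel depends only on its fixed neighbourhood,
                # which is why all pixels could be computed in parallel.
                result[y][x] = sum(
                    kernel[dy + 1][dx + 1] * current[y + dy][x + dx]
                    for dy in (-1, 0, 1)
                    for dx in (-1, 0, 1))
        current = result
    return current


ones_kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
image = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
once = apply_stencil(image, ones_kernel, iterations=1)
twice = apply_stencil(image, ones_kernel, iterations=2)
```

The inner double loop reads a sliding window of three image rows, which is the access pattern that line buffers and parallel processing elements typically exploit in hardware.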
ISBN: (print) 9781509046171
Current robot software architectures use the publish/subscribe messaging protocol to enable communication between components. The messages published by a component have to meet the specifications of the components subscribing to these messages. Because of this, components often cannot be used together directly and either have to be modified first or need to be wrapped using connector components. This increases the amount of work required to develop robot software. In this paper we propose Complex Events Processing (CEP) with Procedural Parameters as an alternative solution. CEP allows a developer to use various operators besides subscribe to define the communication between components. These operators allow, for example, mapping, filtering and sampling of messages. To be able to provide a generic set of operators which can be used in any robot application, we allow developers to define procedures as parameters to the operators. The procedures act as a strategy for the computation to be performed: the operator defines what should be done and the procedural parameter defines how to do it. Through an example we show that CEP can be used for creating robot behaviors.
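The split between operator (what) and procedural parameter (how) can be sketched as below. The `Stream` class and its method names are hypothetical and do not reproduce the paper's API; the sketch only shows generic `map`, `filter` and `sample` operators that take procedures as parameters.

```python
class Stream:
    """A minimal publish/subscribe stream with CEP-style operators."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)
        return self

    def publish(self, message):
        for callback in self._subscribers:
            callback(message)

    def map(self, procedure):
        """Operator: transform each message; the procedure says how."""
        out = Stream()
        self.subscribe(lambda message: out.publish(procedure(message)))
        return out

    def filter(self, predicate):
        """Operator: drop messages; the predicate says which to keep."""
        out = Stream()

        def forward(message):
            if predicate(message):
                out.publish(message)

        self.subscribe(forward)
        return out

    def sample(self, every):
        """Operator: forward only every `every`-th message."""
        out = Stream()
        count = [0]

        def forward(message):
            count[0] += 1
            if count[0] % every == 0:
                out.publish(message)

        self.subscribe(forward)
        return out


# Hypothetical use: a range sensor publishes millimetres, while the
# consumer expects metres, only for nearby obstacles, at half the rate.
ranges_mm = Stream()
received = []
(ranges_mm.map(lambda mm: mm / 1000)
          .filter(lambda metres: metres < 1.0)
          .sample(2)
          .subscribe(received.append))
for reading in (500, 1500, 800, 250, 3000, 900):
    ranges_mm.publish(reading)
```

In this arrangement neither the publishing nor the consuming component is modified; the adaptation lives entirely in the operator chain, which is the role a hand-written connector component would otherwise play.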
Further development of high performance computing hardware and software is focused on energy efficiency and parallel efficiency, both of which are crucial for the future exascale level of supercomputer performance. Tests with real applications as well as small-scale benchmarks of new architectures are important for choosing the best development strategies. ARM CPUs and NVIDIA GPUs are among the most energy-efficient hardware. Recently both architectures have been combined in the NVIDIA Tegra systems-on-chip. In this work we benchmark development boards with Tegra K1 and X1, both with the Roofline model toolkit and with different classical molecular dynamics algorithms implemented in LAMMPS. We consider the utilization of the peak single- and double-precision floating-point performance and the power and energy consumption of the corresponding Cortex-A15, Cortex-A57, Kepler and Maxwell cores.
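The Roofline model mentioned above bounds attainable performance by the lower of the compute peak and the product of memory bandwidth and arithmetic intensity. A minimal sketch follows; the peak and bandwidth values are placeholders chosen for easy arithmetic, not measurements of any Tegra board.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, intensity_flop_per_byte):
    """Roofline bound: min(compute peak, bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gb_s * intensity_flop_per_byte)


def ridge_point(peak_gflops, bandwidth_gb_s):
    """Arithmetic intensity at which a kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gb_s


# Placeholder machine: 100 GFLOP/s peak, 10 GB/s memory bandwidth.
PEAK, BANDWIDTH = 100, 10
memory_bound = attainable_gflops(PEAK, BANDWIDTH, 2)    # limited by bandwidth
compute_bound = attainable_gflops(PEAK, BANDWIDTH, 50)  # limited by peak
```

Comparing a measured kernel rate against this bound gives the utilization of peak performance that the abstract refers to.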
ISBN: (print) 9781509015047
Decades of technology scaling have brought the threat of soft errors to modern embedded processors. Though several methods have been proposed to protect systems from soft errors, their effectiveness in ensuring error-free computing cannot be guaranteed without accurate and quantitative estimation of system reliability. The vulnerability metric, which defines the likelihood of device failure by accurately evaluating the time it is exposed to soft errors, provides the most effective means to perform early design space explorations to estimate system reliability in the presence of transient soft errors. In this paper, we present gemV - the first accurate and comprehensive vulnerability estimation toolset, which is configurable and extendible to analyse future/novel architecture and microarchitecture designs. Some of the key features of gemV are: (1) all possible microarchitecture components that store bits, even temporarily, are modeled for their vulnerability in the gem5 cycle-accurate simulation platform, (2) its models have been validated (
ISBN: (print) 9781509061914
Micro-grid systems can support the distribution network in avoiding insufficient electricity supply by effectively integrating renewable energy sources and energy storage systems. This paper studies the modeling of a micro-grid system using SimPowerSystems in the Matlab/Simulink environment. The micro-grid consists of ten Electric Vehicle Service Equipment (EVSE) units, a Photovoltaic (PV) farm, an Energy Storage System (ESS), and a commercial building. To minimize charging cost as well as limit the micro-grid peak load, the Non-Integer Generic Algorithm (NIGA) optimization method is used to obtain an optimized Plug-in Electric Vehicle (PEV) charging/discharging schedule with a time-varying charging rate. The time-of-use (TOU) price and a discharge incentive are applied to implement the cost minimization. The simulation results show that the total load is flattened in line with the TOU price structures. The optimization that considers both the discharge incentive and the micro-grid load limit can generate a cost-power win-win result.
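The two quantities that such an optimization trades off, charging cost under a TOU tariff and the resulting peak load, can be evaluated for a candidate schedule as sketched below. The abstract does not give the exact objective, so the cost form (energy bought at the TOU price, energy discharged credited at a flat incentive) and all figures are assumptions; this is the evaluation step only, not the NIGA optimizer.

```python
def schedule_cost(power_kw, tou_price, discharge_incentive, step_hours=1.0):
    """Net cost of a charging schedule (positive power = charging).

    Charged energy is billed at the TOU price of its time step; discharged
    energy (negative power) is credited at a flat incentive. Prices are in
    cents per kWh here, an arbitrary choice for the example.
    """
    cost = 0.0
    for power, price in zip(power_kw, tou_price):
        energy_kwh = power * step_hours
        if energy_kwh >= 0:
            cost += energy_kwh * price
        else:
            cost += energy_kwh * discharge_incentive  # negative, so a credit
    return cost


def peak_load(base_load_kw, power_kw):
    """Highest total micro-grid load once the schedule is added."""
    return max(base + power for base, power in zip(base_load_kw, power_kw))


# Hypothetical three-step horizon: off-peak, peak, off-peak.
schedule = [6, -3, 6]          # kW per step; discharge during the peak
prices = [10, 30, 10]          # cents/kWh
base_load = [50, 80, 40]       # kW, load without the vehicles
cost_cents = schedule_cost(schedule, prices, discharge_incentive=20)
peak_kw = peak_load(base_load, schedule)
```

An optimizer would search over `schedule`, rejecting candidates whose `peak_load` exceeds the micro-grid limit and preferring those with lower `schedule_cost`.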
ISBN: (print) 9781509030996
The paper describes a method for formally modeling object event handling as it is implemented in UML. The resulting Petri net allows UML model properties to be checked not only by simulation but also formally. To achieve the closest possible correspondence between the UML and Petri net models, an event queue is defined. Each state machine assigned to an object has its own event queue, which is available as long as the machine exists. This makes it possible to model not only simple message passing but also cases in which a state machine cannot handle an event. The higher priority of a submachine's event queue is also taken into consideration. The presented solutions are part of a larger algorithm for converting UML models to Petri nets. However, the paper is intended to describe the issue in enough detail that it can be used outside the whole algorithm.
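The behaviour being modeled, a per-machine event queue and an event that the current state cannot handle, can be sketched in plain code. This is not the Petri net construction of the paper; it assumes the default UML rule that an event with no enabled transition (and no deferral) is discarded, and all state and event names are hypothetical.

```python
from collections import deque


class StateMachine:
    """A flat state machine that owns its event queue."""

    def __init__(self, initial, transitions):
        self.state = initial
        self.transitions = transitions  # {(state, event): next_state}
        self.queue = deque()            # lives as long as the machine does
        self.discarded = []             # events the machine could not handle

    def send(self, event):
        """Message passing: the sender only enqueues the event."""
        self.queue.append(event)

    def step(self):
        """Dispatch one queued event; return False when the queue is empty."""
        if not self.queue:
            return False
        event = self.queue.popleft()
        key = (self.state, event)
        if key in self.transitions:
            self.state = self.transitions[key]
        else:
            self.discarded.append(event)  # assumed UML default: discard
        return True

    def run(self):
        while self.step():
            pass


machine = StateMachine("idle", {("idle", "start"): "running",
                                ("running", "stop"): "idle"})
for event in ("stop", "start", "stop"):
    machine.send(event)
machine.run()
```

Here the first `stop` arrives while the machine is in `idle`, where no transition accepts it, so it is recorded as discarded; the remaining two events are handled normally.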
ISBN: (print) 9781467373111
In many cases, applications are not optimized for the hardware on which they run. Several reasons contribute to this unsatisfying situation, including legacy code, commercial code distributed in binary form, or deployment on compute farms. In fact, backward compatibility of the ISA guarantees only the functionality, not the best exploitation of the hardware. In this work, we focus on maximizing CPU efficiency for the SIMD extensions and propose to convert automatically, and at runtime, loops vectorized for an older version of the SIMD extension to a newer one. We propose a lightweight mechanism that does not include a vectorizer, but instead leverages what a static vectorizer previously did. We show that many loops compiled for x86 SSE can be dynamically converted to the more recent and more powerful AVX, and how correctness is maintained with regard to challenges such as data dependences and reductions. We obtain speedups in line with those of a native compiler targeting AVX. The re-vectorizer is implemented inside a dynamic optimization platform; it is completely transparent to the user, does not require rewriting binaries, and operates during program execution.
Hybrid integration of opto-electronic integrated circuits (OEICs) with CMOS electronics requires the modeling and characterization of thermal interactions. The thermal gradient generated by the electronic layer causes modulation of the refractive index of the optical layers. This requires a thermal characterization for thermal-aware integration of OEICs. Full-scale, coupled, 3D electromagnetic and heat transfer simulation of the optical design is computationally infeasible. This paper describes a methodology and an abstraction model that, given external temperature hot-spots, computes a thermal gradient across the optical layout which can be used to estimate the deviation in operation of OEICs. The relative sparseness of optical layouts is exploited to employ a thermal resistance model. Our abstractions enable a compact 2.5D model for thermal computations, and techniques such as the Alternating Direction Implicit (ADI) method can be employed for numerical computations. The approach is applied to a fabricated silicon (Si) photonic optical logic chip.
ISBN: (print) 9781467373111
The FlexTiles Platform has been developed within a Seventh Framework Programme project which is co-funded by the European Union, with ten participants from five countries. It aims to create a self-adaptive heterogeneous many-core architecture which is able to dynamically manage load balancing, power consumption and faulty modules. Its focus is to make the architecture efficient and to keep programming effort low. Therefore, the concept contains a dedicated automated tool-flow for creating both the hardware and the software, a simulation platform that can execute the same binaries as the FPGA prototype, and a virtualization layer to manage the final heterogeneous many-core architecture for run-time adaptability. With this approach software development productivity can be increased, and thus the time-to-market and development costs can be decreased. In this paper we present the FlexTiles Development Platform with a many-core architecture demonstration. The steps to implement, validate and integrate two use-cases are discussed.