ISBN: 9781467322645 (print)
Exascale computing systems will require sufficient resilience to tolerate numerous types of hardware faults while still assuring correct program execution. Such extreme-scale machines are expected to be dominated by processors driven at lower voltages (near the minimum 0.5 volts for current transistors). At these voltage levels, the rate of transient errors increases dramatically due to the sensitivity to transient and geographically localized voltage drops on parts of the processor chip. To achieve power efficiency, these processors are likely to be streamlined and minimal, and thus they cannot be expected to handle transient errors entirely in hardware. Here we present an open, compiler-based framework to automate the armoring of High Performance Computing (HPC) software to protect it from these types of transient processor errors. We develop an open infrastructure to support research work in this area, and we define tools that, in the future, may provide more complete automated and/or semi-automated solutions to support software resiliency on future exascale architectures. Results demonstrate that our approach is feasible, pragmatic in how it can be separated from the software development process, and reasonably efficient (0% to 30% overhead for the Jacobi iteration on common hardware; and 20%, 40%, 26%, and 2% overhead for a randomly selected subset of benchmarks from the Livermore Loops [1]).
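The abstract does not spell out the transformation the framework applies. As a rough illustration only, one common form of software armoring is redundant execution with comparison: each step is computed twice and retried when the two results disagree. The sketch below applies that idea by hand to a Jacobi sweep; the names `jacobi_sweep`, `armored`, and `solve` are hypothetical and are not taken from the paper.

```python
def jacobi_sweep(a, b, x):
    """One Jacobi iteration for the linear system a * x = b (nested lists)."""
    n = len(b)
    return [
        (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
        for i in range(n)
    ]


def armored(fn, max_retries=3):
    """Wrap fn so that every call runs twice and the two results are compared.

    A mismatch is treated as a transient fault and the call is repeated.
    Exact comparison is valid here because both runs are deterministic.
    """
    def wrapper(*args):
        for _ in range(max_retries):
            first = fn(*args)
            second = fn(*args)
            if first == second:
                return first
        raise RuntimeError("results disagreed on every attempt")
    return wrapper


def solve(a, b, sweeps=50, step=jacobi_sweep):
    """Run a fixed number of armored Jacobi sweeps starting from zero."""
    x = [0.0] * len(b)
    safe_step = armored(step)
    for _ in range(sweeps):
        x = safe_step(a, b, x)
    return x
```

Duplicating every call roughly doubles the work of the protected region, so this hand-written version is far more expensive than the overheads reported above; a compiler-based approach can be more selective about what it duplicates.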
The E/E (Electrical/Electronic) Architecture is the key enabler for new innovative user functions in the automotive domain. The E/E Architecture needs to manage the complexity of the E/E System in a cost-efficient manner. The same holds for all domains developing mass-produced distributed systems containing embedded software. However, many companies in these domains are missing a clear description of the development process for E/E Architectures. In addition, the relation between development of E/E Architectures and the development of E/E Systems is not clear. This paper proposes a development process for E/E Architectures in the automotive domain. Furthermore, it shows the relation between development of E/E Architectures and development of E/E Systems. The development process is based on post-mortem analysis of E/E System development projects conducted at an automotive company during the period 1998-2009, and it was validated in an E/E Architecture development project conducted during the period 2010-2011. The contribution of this paper is a detailed E/E System development process describing how to create and maintain an E/E Architecture and how to refine this into a Product-specific Architecture in Product development projects. Furthermore, the paper reports on experiences from working with RA development both as a small stand-alone company with few different products, and as part of a large global company with several different products.
When performing narrow-width computations, power gating of unused arithmetic circuit portions can significantly reduce leakage power. We deploy coarse-grain power gating in 32-bit integer arithmetic circuits that will frequently operate on narrow-width data. Our contributions include a design framework that automatically implements coarse-grain power-gated arithmetic circuits considering a narrow-width input data mode, and an analysis of the impact of circuit architecture on the efficiency of this data-width-driven power gating scheme. As an example, with a performance penalty of 6.7%, coarse-grain power gating of a 45-nm 32-bit multiplier is demonstrated to yield an 11.6× static leakage energy reduction per 8×8-bit operation.
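To make the narrow-width condition concrete, the sketch below models in software the decision a gating controller would have to make: whether both operands of a 32-bit multiply fit in the low 8 bits, so that the upper portion of the circuit is unused. It assumes unsigned operands and an 8-bit narrow mode; `fits_in` and `multiply_mode` are hypothetical names, and the actual detection logic in the paper may differ.

```python
def fits_in(value, width):
    """True if a 32-bit unsigned value has no bits set at or above `width`."""
    return (value & 0xFFFFFFFF) >> width == 0


def multiply_mode(a, b, narrow_width=8):
    """Select the multiplier mode for one operation (illustrative model only)."""
    if fits_in(a, narrow_width) and fits_in(b, narrow_width):
        return "narrow"  # upper circuit portion is unused and could be gated
    return "full"        # at least one operand needs the full 32-bit datapath
```

For instance, `multiply_mode(200, 17)` selects the narrow mode, while `multiply_mode(256, 3)` does not, because 256 needs a ninth bit.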
We present a novel architecture for a lightweight Viterbi accelerator that can be tightly integrated inside an embedded processor datapath. We investigate the accelerator's impact on processor performance by using the EEMBC Viterbi benchmark and the in-house Viterbi Branch Metric kernel. Our evaluation based on the EEMBC benchmark shows that an accelerated 65-nm 2.7-ns processor datapath is 20% larger but 90% more cycle efficient than a datapath lacking the Viterbi accelerator, leading to an 87% overall energy reduction and a data throughput of 3.52 Mbit/s.
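For readers unfamiliar with the kernels being accelerated, the following sketch shows a branch metric computation and one add-compare-select (ACS) update, the inner operations of a Viterbi decoder. It assumes hard-decision decoding with a Hamming-distance metric, which the abstract does not specify; the function names are illustrative and do not describe the accelerator's interface.

```python
def branch_metric(received, expected):
    """Hamming distance between two equal-length tuples of bits."""
    return sum(r != e for r, e in zip(received, expected))


def add_compare_select(path_metrics, predecessors, received):
    """One ACS update for a single trellis state.

    predecessors is a list of (previous_state, expected_output_bits) pairs.
    Returns (best_metric, surviving_previous_state).
    """
    candidates = [
        (path_metrics[prev] + branch_metric(received, expected), prev)
        for prev, expected in predecessors
    ]
    # The smallest accumulated metric survives; ties go to the lower state index.
    return min(candidates)
```

A decoder repeats this update for every state at every received symbol, which is why moving it into the datapath can reduce cycle counts substantially.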
Transistor geometries are well into the nanometer regime, in keeping with Moore's Law. With this scaling in geometry, problems not significant in the larger geometries have come to the fore. These problems, collectively termed variability, stem from second-order effects due to the small geometries themselves and engineering limitations in creating the small geometries. The engineering obstacles have a few solutions which are yet to be widely adopted due to cost limitations in deploying them. Addressing and mitigating variability due to second-order effects comes largely under the purview of device engineers and, to a lesser extent, of design practices. Passive layout measures that ease these manufacturing limitations by regularizing the different layout pitches have been explored in the past. However, the question of the best design practice to combat systematic variations is still open. In this work we explore considerations for the regular layout of the exclusive-OR gate, the half-adder and full-adder cells implemented with varying degrees of regularity. Tradeoffs such as complete interconnect unidirectionality and the inevitable introduction of vias are qualitatively analyzed, and some factors affecting the analysis are presented. Finally, results from the Calibre Critical Feature Analysis (CFA) of the cells are used to evaluate the qualitative analysis.
Business and design decisions regarding software development should be based on data, not opinions among developers, domain experts or managers. The company running the most and fastest experiments among the customer base at the lowest cost per experiment outcompetes others by having the data to engineer products with outstanding qualities such as power consumption and user experience. Innovation experiment systems for mass-produced devices with embedded software are an evolution of current R&D practices, going from where innovations are internally evaluated by the original equipment manufacturer to where they are tried by real users at a scale relevant to the full customer base. The turnaround time from developing and deploying an embedded product to getting customer feedback is decreased to weeks, the limit being the speed of the software development teams. The paper presents an embedded architecture for realising such a novel innovation experiment system based on a set of scenarios of what to evaluate in the experiments. A case is presented implementing an architecture in a prototype in-vehicle infotainment system where comparative testing between two software alternatives was performed.
The Lovász θ function of a graph, a fundamental tool in combinatorial optimization and approximation algorithms, is computed by solving an SDP. In this paper we establish that the Lovász θ function is equivalent to a kernel learning problem related to one-class SVM. This interesting connection opens up many opportunities bridging graph theoretic algorithms and machine learning. We show that there exist graphs, which we call SVM – θ graphs, on which the Lovász θ function can be approximated well by a one-class SVM. This leads to novel use of SVM techniques for solving algorithmic problems in large graphs, e.g. identifying a planted clique of size Θ(√n) in a random graph G(n, 1/2). A classic approach for this problem involves computing the θ function; however, it is not scalable due to SDP computation. We show that the random graph with a planted clique is an example of an SVM – θ graph. As a consequence, an SVM-based approach easily identifies the clique in large graphs and is competitive with the state of the art. We introduce the notion of common orthogonal labelling and show that it can be computed by solving a Multiple Kernel learning problem. It is further shown that such a labelling is extremely useful in identifying a large common dense subgraph in multiple graphs, which is known to be a computationally difficult problem. The proposed algorithm achieves an order-of-magnitude improvement in scalability over state-of-the-art methods.
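The planted clique instance mentioned above can be constructed as follows. This sketch only generates the test graph, G(n, 1/2) with a clique forced onto a random vertex subset; it does not implement the SVM-based recovery. The default clique size of 2·⌊√n⌋ is an arbitrary choice of constant within Θ(√n), and `planted_clique_graph` is a hypothetical helper name.

```python
import math
import random


def planted_clique_graph(n, clique_size=None, seed=0):
    """Return (adjacency sets, clique vertices) for G(n, 1/2) plus a planted clique."""
    rng = random.Random(seed)
    if clique_size is None:
        clique_size = 2 * math.isqrt(n)  # assumed constant; any Theta(sqrt(n)) size works
    clique = set(rng.sample(range(n), clique_size))
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            # Edges inside the clique are forced; all others appear with probability 1/2.
            if (u in clique and v in clique) or rng.random() < 0.5:
                adj[u].add(v)
                adj[v].add(u)
    return adj, clique
```

A recovery method is then judged by how many of the vertices in `clique` it identifies from `adj` alone.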
Model-based testing presents new challenges in how to perform software testing because models offer testing on several abstraction levels. This is a largely unexplored area. We propose a pattern to construct test actors which can be used to test Platform-Independent Models. In addition, these test actors can also be automatically transformed to Platform-Specific Model level to test the implementation deployed on the target. Our work is one step in the direction of permitting early testing without any waste, since the test models can be reused at a lower level of abstraction.
Technology scaling of integrated circuits is making transistors increasingly sensitive to process variations, wear-out effects and ionizing particles. This may lead to an increasing rate of transient and intermittent ...
An Improved Maximum A Posteriori (IMAP) decoder for turbo-coded decode-and-forward (DF) relay channels, which takes into account the decoding errors at the relay, is analyzed. This decoder is implemented in an iterative manner similar to the traditional iterative decoder (TID) used for turbo codes. The performance of this IMAP decoder is compared by simulation with that of the iterative decoder normally used in the literature, which does not take into account the decoding errors at the relay. The comparison shows that although the proposed IMAP decoder provides better performance, especially when many decoding errors occur at the relay, the improvement is not significant. Then, another heuristic modification for the iterative decoder is proposed. In spite of the lack of theoretical analysis, the numerical results show that the proposed heuristically modified iterative decoder (HMID) gives significantly better performance than the traditional one.