A transient hardware fault occurs when an energetic particle strikes a transistor, causing it to change state. Although transient faults do not permanently damage the hardware, they may corrupt computations by alterin...
详细信息
A transient hardware fault occurs when an energetic particle strikes a transistor, causing it to change state. Although transient faults do not permanently damage the hardware, they may corrupt computations by altering stored values and signal transfers. In this paper, we propose a new scheme for provably safe and reliable computing in the presence of transient hardware faults. In our scheme, software computations are replicated to provide redundancy while special instructions compare the independently computed results to detect errors before writing critical data. In stark contrast to any previous efforts in this area, we have analyzed our fault tolerance scheme from a formal, theoretical perspective. To be specific, first, we provide an operational semantics for our assembly language, which includes a precise formal definition of our fault model. Second, we develop an assembly-level type system designed to detect reliability problems in compiled code. Third, we provide a formal specification for program fault tolerance under the given fault model and prove that all well-typed programs are indeed fault tolerant. In addition to the formal analysis, we evaluate our detection scheme and show that it only takes 34% longer to execute than the unreliable version.
Future mainstream microprocessors will likely integrate specialized accelerators, such as GPUs, onto a single die to achieve better performance and power efficiency. However, it remains a keen challenge to program suc...
详细信息
Future mainstream microprocessors will likely integrate specialized accelerators, such as GPUs, onto a single die to achieve better performance and power efficiency. However, it remains a keen challenge to program such a heterogeneous multi-core platform, since these specialized accelerators feature ISAs and functionality that are significantly different from the general purpose CPU cores. In this paper, we present EXOCHI: (1) Exoskeleton Sequencer (EXO), an architecture to represent heterogeneous accelerators as ISA-based MIMD architecture resources, and a shared virtual memory heterogeneous multithreaded program execution model that tightly couples specialized accelerator cores with general purpose CPU cores, and (2) C for Heterogeneous Integration (CHI), an integrated C/C++ programming environment that supports accelerator-specific inline assembly and domain-specific languages. The CHI compiler extends the OpenMP pragma for heterogeneous multithreading programming, and produces a single fat binary with code sections corresponding to different instruction sets. The runtime can judiciously spread parallel computation across the heterogeneous cores to optimize performance and power. We have prototyped the EXO architecture on a physical heterogeneous platform consisting of an Intel (R) Core (TM) 2 Duo processor and an 8-core 32-thread Intel (R) Graphics Media Accelerator X3000. In addition, we have implemented the CHI integrated programming environment with the Intel (R) C++ Compiler, runtime toolset, and debugger. On the EXO prototype system, we have enhanced a suite of production-quality media kernels for video and image processing to utilize the accelerator through the CHI programming interface, achieving significant speedup (1.41X to 10.97X) over execution on the IA32 CPU alone.
The proceedings contain 29 papers. The topic discussed include: a serializability violation detector for shared-memory server programs;path slicing;checking type safety of foreign function calls;permission-based owner...
详细信息
The proceedings contain 29 papers. The topic discussed include: a serializability violation detector for shared-memory server programs;path slicing;checking type safety of foreign function calls;permission-based ownership: encapsulating state in higher-order typed languages;code placement for improving dynamic branch prediction accuracy;garbage collection without paging;register allocation for software pipelined multi-dimensional loops;automatically partitioning packet processing applications for pipelined architectures;programming ad-hoc networks of mobile and resource-constrained devices;composing security policies with polymer;and formal loop merging for signal transforms.
We present a novel technique for static race detection in Java programs, comprised of a series of stages that employ a combination of static analyses to successively reduce the pairs of memory accesses potentially inv...
详细信息
Atomos is the first programminglanguage with implicit transactions, strong atomicity, and a scalable multiprocessor implementation. Atomos is derived from Java, but replaces its synchronization and conditional wailin...
详细信息
Program termination is central to the process of ensuring that systems code can always react. We describe a new program termination prover that performs a path-sensitive and context-sensitive program analysis and prov...
详细信息
Dependence graphs and memoization can be used to efficiently update the output of a program as the input changes dynamically. Recent work has studied techniques for combining these approaches to effectively dynamize a...
详细信息
Atomos is the first programminglanguage with implicit transactions, strong atomicity, and a scalable multiprocessor implementation. Atomos is derived from Java, but replaces its synchronization and conditional waitin...
详细信息
Tree parsing as supported by code generator generators like BEG, burg, iburg, lburg and ml-burg is a popular instruction selection method. There are two existing approaches for implementing tree parsing: dynamic progr...
详细信息
暂无评论