Managing the buffering of data along arcs is a critical part of compiling a synchronous dataflow (SDF) program. This paper shows how dataflow properties can be analyzed at compile-time to make buffering more efficient. Since the target code corresponding to each node of an SDF graph is normally obtained from a hand-optimized library of predefined blocks, the efficiency of data transfer between blocks is often the limiting factor in how closely an SDF compiler can approximate meticulous manual coding. Furthermore, in the presence of large sample-rate changes, straightforward buffering techniques can quickly exhaust limited on-chip data memory, necessitating the use of slower external memory. The techniques presented in this paper address both of these problems in a unified manner.
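The Python below is a minimal sketch of the kind of compile-time reasoning involved. It is not the buffering technique of this paper. It uses the standard SDF balance equations and a simple token-count simulation to compute two things for a chain-structured SDF graph: the repetition vector, and the peak buffer occupancy on each arc for a given firing order. The chain representation and the names `repetitions` and `max_buffer` are illustrative assumptions. `math.lcm` with several arguments requires Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm


def repetitions(chain):
    """Smallest positive integer firing counts for a chain A0 -> A1 -> ... -> An.

    `chain[k]` is (tokens produced by A_k, tokens consumed by A_{k+1}) on arc k.
    Solves the balance equations q[k] * produce == q[k + 1] * consume.
    """
    q = [Fraction(1)]
    for produce, consume in chain:
        q.append(q[-1] * produce / consume)
    # Scale by the LCM of the denominators; since q[0] == 1 this is already minimal.
    scale = lcm(*(f.denominator for f in q))
    return [int(f * scale) for f in q]


def max_buffer(chain, schedule):
    """Simulate a firing sequence of actor indices; return peak tokens per arc."""
    tokens = [0] * len(chain)
    peak = [0] * len(chain)
    for actor in schedule:
        if actor > 0:  # consume from the input arc
            consume = chain[actor - 1][1]
            if tokens[actor - 1] < consume:
                raise ValueError(f"actor {actor} fired without enough input tokens")
            tokens[actor - 1] -= consume
        if actor < len(chain):  # produce onto the output arc
            tokens[actor] += chain[actor][0]
            peak[actor] = max(peak[actor], tokens[actor])
    return peak


print(repetitions([(2, 3)]))                  # [3, 2]
print(max_buffer([(2, 3)], [0, 0, 0, 1, 1]))  # [6]
print(max_buffer([(2, 3)], [0, 0, 1, 0, 1]))  # [4]
```

Consider an arc where actor 0 produces 2 tokens and actor 1 consumes 3. Firing all producers first needs a 6-token buffer. The interleaved order needs only 4. This gap widens quickly as sample-rate changes grow.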
The growing complexity and high efficiency requirements of embedded systems call for new code optimization techniques and architecture exploration, using retargetable C and C++ compilers.
The disk subsystem is known to be a major contributor to the overall power consumption of high-end parallel systems. Past research proposed several architectural-level techniques to reduce disk power by taking advantage of idle periods experienced by disks. Although such techniques have been known to be effective in certain cases, they share a common drawback: they operate in a reactive manner, i.e., they control disk power by observing past disk activity (for example, idle and active periods) and estimating future activity. Consequently, they can miss opportunities for saving power and incur significant performance penalties due to inaccuracies in predicting idle and active times. Motivated by this observation, this paper proposes and evaluates a compiler-driven approach to reducing the disk power consumption of array-based scientific applications executing on parallel architectures. The proposed approach exposes disk layout information to the compiler, allowing it to derive the disk access pattern, i.e., the order in which parallel disks are accessed. This paper demonstrates two uses of this information. First, we can implement proactive disk power management, i.e., we can select the most appropriate power-saving strategy and disk-preactivation strategy based on the compiler-predicted future idle and active periods of parallel disks. Second, we can restructure the application code to increase the length of idle disk periods, which leads to better exploitation of available power-saving capabilities. We implemented both of these approaches within an optimizing compiler and tested their effectiveness using a set of benchmark codes from the SPEC 2000 suite and a disk power simulator. Our results show that compiler-driven disk power management is very promising. The experimental results also reveal that, although proactive disk power management is very effective, code restructuring for disk power achieves additional energy savings across all the benchmarks tested, and these savings a
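The first use, proactive power management, can be illustrated with a minimal sketch. It is not the paper's algorithm, and it rests on three assumptions:

- The compiler has already produced a sorted list of predicted access times for one disk. Deriving that list is the hard part, and the sketch does not attempt it.
- Each low-power mode is characterized only by a break-even time and a wake-up latency.
- The `PowerMode` type, the mode names, and all timings are hypothetical.

What separates this from a reactive, timeout-based policy is that the wake-up (pre-activation) is scheduled ahead of the next predicted access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerMode:
    name: str
    break_even_s: float  # idle time beyond which entering this mode saves energy
    wake_s: float        # latency to return to the active state


def plan_power(access_times, modes):
    """Plan per-gap power modes for one disk from predicted access times.

    `access_times` must be sorted (seconds). For every idle gap, pick the
    deepest mode whose break-even time the gap exceeds, and schedule the
    wake-up (pre-activation) so the disk is active when the next access arrives.
    Returns (mode name, enter_at, wake_at) tuples.
    """
    deepest_first = sorted(modes, key=lambda m: m.break_even_s, reverse=True)
    plan = []
    for prev, nxt in zip(access_times, access_times[1:]):
        idle = nxt - prev
        for mode in deepest_first:
            if idle > mode.break_even_s and idle > mode.wake_s:
                plan.append((mode.name, prev, nxt - mode.wake_s))
                break
    return plan


# Hypothetical modes and timings, for illustration only.
MODES = [PowerMode("low_rpm", 2.0, 0.5), PowerMode("standby", 10.0, 3.0)]
print(plan_power([0, 1, 5, 30], MODES))
# [('low_rpm', 1, 4.5), ('standby', 5, 27.0)]
```

The second use, code restructuring, would act on the input to this function. Clustering accesses to the same disk lengthens the gaps and lets deeper modes qualify.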
Important features and capabilities of the 80960 are briefly examined, and an overview of its architecture is given. A detailed discussion is presented of the register model, core instruction set, register operations, memory operations, control operations, instruction cache, user-supervisor protection, interrupts, faults, and debug support.
We have developed a framework for analyzing the behavior and relations of various sequential and parallel control constructs, which we can nest in a very general way. A simple yet powerful scheme defines the order of data accesses in a program, and provides a well-founded semantic structure for nested constructs. When defining parallel languages or extensions to current languages, designers can use this framework to define how each new feature interacts with the language's other features. Because our approach is based on well-known dependence analysis techniques, it is practical for compiler implementation. It determines which behavior the compiler and system must preserve while allowing aggressive automatic optimization. Instead of being confined to a single programming paradigm, programmers can use the most appropriate constructs for the application, and the compiler can transform and optimize the program for different parallel or sequential architectures.
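The approach rests on well-known dependence analysis. The sketch below shows one concrete instance of such a test: the textbook GCD test, not the authors' framework. It decides whether two accesses `X[a1*i + b1]` and `X[a2*j + b2]` to the same array can ever touch the same element. The function name and the restriction to one-dimensional affine subscripts are assumptions made for illustration.

```python
from math import gcd


def may_depend(a1, b1, a2, b2):
    """GCD test for accesses X[a1*i + b1] and X[a2*j + b2] (integer coefficients).

    A common element requires an integer solution of a1*i - a2*j = b2 - b1,
    which exists only if gcd(a1, a2) divides b2 - b1. False therefore proves
    independence; True only means a dependence *may* exist, because the
    test ignores loop bounds.
    """
    g = gcd(a1, a2)
    if g == 0:
        # Both subscripts are loop-invariant: same element iff same offset.
        return b1 == b2
    return (b2 - b1) % g == 0


print(may_depend(2, 0, 2, 1))  # False: even vs. odd elements never overlap
print(may_depend(2, 0, 4, 2))  # True: e.g. i = 1, j = 0 both touch X[2]
```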
With recent developments in compilation technology and architectural design, the line between traditional hardware and software roles has become increasingly blurred. The compiler can now see the processor's inner structure, which lets architects exploit sophisticated program analysis techniques to hide branch and memory access delays, for example. Processors can now implement register renaming and dynamic instruction-scheduling algorithms directly in the hardware, something that was once exclusively the compiler's job. A similar shift is occurring in optimizing compilers for parallel machines. To parallelize a larger class of applications, compiler writers are moving beyond static transformations and exploring techniques that rely on runtime decisions or hardware support. This increased blurring of compile-time and runtime optimizations opens many new research opportunities, particularly for program optimization, a task typically performed entirely at compile time. This article describes an optimization continuum and shows how different classes of optimizations fall within it.
A formal, high-level representation of programs is typically needed for static and dynamic analyses performed by compilers. However, the source code of target applications is not always available in an analyzable form, e.g., to protect intellectual property. To reason about such applications, it becomes necessary to build models from observations of their execution. This paper presents an algebraic approach which, taking as input the trace of memory addresses accessed by a single memory reference, synthesizes an affine loop with a single perfectly nested statement that generates the original trace. This approach is extended to support the synthesis of unions of affine loops, useful for minimally modeling traces generated by automatic transformations of polyhedral programs, such as tiling. The resulting system is capable of processing hundreds of gigabytes of trace data in minutes, minimally reconstructing 100 percent of the static control parts in PolyBench/C applications and 99.9 percent in the Pluto-tiled versions of these benchmarks.
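The core idea of recovering a loop nest from an address trace can be illustrated with a much simpler, greedy procedure than the algebraic method described above. The procedure works level by level:

1. Detect the innermost constant-stride run.
2. Collapse each run to its starting address.
3. Repeat for the next level out.
4. Regenerate the trace to verify the model.

This sketch is an assumption-laden stand-in. It handles only a single perfectly nested reference with constant strides, and it does not synthesize unions of loops (as needed for tiled code). It may also reject traces that a complete method would model.

```python
from itertools import product


def regenerate(base, levels):
    """Addresses produced by a perfect nest; `levels` is [(trip_count, stride)], outermost first."""
    strides = [s for _, s in levels]
    return [base + sum(i * s for i, s in zip(idx, strides))
            for idx in product(*(range(n) for n, _ in levels))]


def synthesize(trace):
    """Greedily fit `trace` to (base, levels); return None if no model is found."""
    if not trace:
        return None
    levels = []          # collected innermost first
    starts = list(trace)
    while len(starts) > 1:
        stride = starts[1] - starts[0]
        n = 2            # length of the leading constant-stride run
        while n < len(starts) and starts[n] - starts[n - 1] == stride:
            n += 1
        if len(starts) % n:
            return None  # runs cannot all have the same length
        levels.append((n, stride))
        starts = starts[::n]  # first address of each run = next level's trace
    levels.reverse()
    base = trace[0]
    # Verification step: accept the model only if it reproduces the trace exactly.
    return (base, levels) if regenerate(base, levels) == list(trace) else None


# Column-order walk over a row-major 3x4 array of 8-byte elements at address 1000.
trace = regenerate(1000, [(4, 8), (3, 32)])
print(synthesize(trace))         # (1000, [(4, 8), (3, 32)])
print(synthesize([0, 1, 3, 6]))  # None: not affine
```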
We report the results of dynamic measurements that evaluate the effectiveness of an application-specific microprocessor for fuzzy control and fuzzy information processing. We propose to specialize a microprocessor architecture for fuzzy-theoretic operations using quantitative techniques developed by designers of reduced instruction set computers (RISC). In particular, the introduction of specialized instructions is considered. Experimental results show that introducing two instructions, min and max, can speed up a fuzzy control program by as much as a factor of 2.5.
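To show why min and max dominate such programs, here is a minimal sketch of generic Mamdani-style fuzzy inference. This is an assumption: the abstract does not state which inference scheme the benchmark uses. The steps are:

- Rule strength is the MIN of the antecedent memberships.
- Each output set is clipped with MIN.
- The clipped sets are combined with MAX.
- A centroid over a discretized output universe gives the crisp control value.

The triangular membership functions, rule base, and universe are hypothetical.

```python
from functools import partial


def tri(x, a, b, c):
    """Triangular membership function: 0 outside (a, c), peak 1 at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical fuzzy sets over a normalized range.
NEG = partial(tri, a=-2, b=-1, c=0)
ZERO = partial(tri, a=-1, b=0, c=1)
POS = partial(tri, a=0, b=1, c=2)

# Rules: IF error is A AND delta_error is B THEN output is C.
RULES = [(NEG, ZERO, NEG), (ZERO, ZERO, ZERO), (POS, ZERO, POS)]
UNIVERSE = [k / 2 for k in range(-4, 5)]  # -2.0, -1.5, ..., 2.0


def infer(error, delta_error, rules=RULES, universe=UNIVERSE):
    aggregated = [0.0] * len(universe)
    for mf_e, mf_de, mf_out in rules:
        strength = min(mf_e(error), mf_de(delta_error))  # AND -> MIN
        for k, y in enumerate(universe):
            clipped = min(strength, mf_out(y))           # implication -> MIN
            aggregated[k] = max(aggregated[k], clipped)  # aggregation -> MAX
    total = sum(aggregated)
    # Centroid defuzzification; return 0.0 if no rule fired.
    return sum(y * m for y, m in zip(universe, aggregated)) / total if total else 0.0


print(infer(0.0, 0.0))  # 0.0
print(infer(0.5, 0.0))  # 0.5
```

The inner loop performs one MIN and one MAX per rule per output point. A single-instruction min/max therefore pays off directly.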
This paper describes SKOL, a system for the synthesis of combinational logic using a library of cells, with emphasis on the technology mapping algorithms. It combines current multilevel optimization techniques with a new approach to the technology mapping problem. This approach is characterized by the use of a numerical string for representing the Boolean expressions and the library cells, which allows a fast selection process. Technology mapping is performed directly on the factored Boolean network, without decomposing it into primitive gates. A dynamic programming approach is used for mapping the whole Boolean network based on the possible matches for each node. Results from benchmark examples show that this approach is effective in reducing the final cell count. Comparisons with existing systems are presented.
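The dynamic-programming step can be illustrated with the classic tree-covering formulation of technology mapping. This sketch differs from SKOL in two deliberate ways:

- It maps a tree of primitive AND/OR/NOT nodes, which is exactly the decomposition SKOL avoids.
- It matches patterns structurally rather than with SKOL's numerical-string encoding.

The cell library, its pattern trees, and the unit cost per cell are hypothetical. The unit cost mirrors the cell-count metric used in the results.

```python
from functools import lru_cache

# Hypothetical library: cell name -> pattern tree; "_" marks a cell input.
LIBRARY = {
    "INV": ("NOT", "_"),
    "AND2": ("AND", "_", "_"),
    "OR2": ("OR", "_", "_"),
    "NAND2": ("NOT", ("AND", "_", "_")),
    "NOR2": ("NOT", ("OR", "_", "_")),
    "AOI21": ("NOT", ("OR", ("AND", "_", "_"), "_")),
}


def match(pattern, node):
    """Return the subject subtrees bound to the pattern's inputs, or None."""
    if pattern == "_":
        return [node]
    if isinstance(node, str) or pattern[0] != node[0] or len(pattern) != len(node):
        return None
    bound = []
    for p, n in zip(pattern[1:], node[1:]):
        sub = match(p, n)
        if sub is None:
            return None
        bound.extend(sub)
    return bound


@lru_cache(maxsize=None)
def best_cover(node):
    """(cell count, cells used) for the cheapest cover of `node`; inputs cost 0."""
    if isinstance(node, str):
        return 0, ()
    best = None
    for cell, pattern in LIBRARY.items():
        inputs = match(pattern, node)
        if inputs is None:
            continue
        # Optimal substructure: each cell input is covered optimally on its own.
        cost = 1 + sum(best_cover(sub)[0] for sub in inputs)
        if best is None or cost < best[0]:
            used = (cell,) + tuple(c for sub in inputs for c in best_cover(sub)[1])
            best = (cost, used)
    if best is None:
        raise ValueError(f"no library cell matches {node!r}")
    return best


# f = NOT((a AND b) OR c)
f = ("NOT", ("OR", ("AND", "a", "b"), "c"))
print(best_cover(f))  # (1, ('AOI21',))
```

For f, the cheapest cover is a single AOI21 cell. The alternatives cost more: NOR2 plus AND2 is 2 cells, and INV plus OR2 plus AND2 is 3.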