HoME is a version of Smalltalk which can be efficiently executed on a multiprocessor and can be executed in parallel by combining a Smalltalk process with a Mach thread and executing the process on the thread. HoME is...
详细信息
ISBN:
(纸本)0897914759
HoME is a version of Smalltalk which can be efficiently executed on a multiprocessor and can be executed in parallel by combining a Smalltalk process with a Mach thread and executing the process on the thread. HoME is nearly the same as ordinary Smalltalk except that multiple processes may execute in parallel. Thus, almost all applications running on ordinary Smalltalk can be executed on HoME without changes in their code. HoME was designed and implemented based on the following fundamental policies: (1) theoretically, an infinite number of processes can become active;(2) the moment a process is scheduled, it becomes active;(3) no process switching occurs;(4) HoME is equivalent to ordinary Smalltalk except for the previous three policies. The performance of the current implementation of HoME running on OMRON LUNA-88K, which had four processors, was measured by benchmarks which execute in parallel with multiple processes. In all benchmarks, the results showed that HoME's performance is much better than HPS on the same workstation.
Future mainstream microprocessors will likely integrate specialized accelerators, such as GPUs, onto a single die to achieve better performance and power efficiency. However, it remains a keen challenge to program suc...
详细信息
ISBN:
(纸本)9781595936332
Future mainstream microprocessors will likely integrate specialized accelerators, such as GPUs, onto a single die to achieve better performance and power efficiency. However, it remains a keen challenge to program such a heterogeneous multi-core platform, since these specialized accelerators feature ISAs and functionality that are significantly different from the general purpose CPU cores. In this paper. we present EXOCHI: (1) Exoskeleton Sequencer (EXO), an architecture to represent heterogeneous accelerators as ISA-based MIMD architecture resources, and a shared virtual memory heterogeneous multithreaded program execution model that tightly couples specialized accelerator cores with general purpose CPU cores, and (2) C for Heterogeneous Integration (CHI), an integrated C/C++ programming environment that supports accelerator-specific inline assembly and domain-specific languages. The CHI compiler extends the OpenMP pragma for heterogeneous multithreading programming, and produces a single fat binary with code sections corresponding to different instruction sets. The runtime can judiciously spread parallel Computation across the heterogenous cores to optimize performance and power. We have prototyped the EXO architecture on a physical heterogeneous platform consisting of an Intel (R) Core (TM) 2 Duo Processor and an 8-core 32-thread Intel (R) Graphics Media Accelerator X3000. In addition. we have implemented the CHI integrated programming environment with the Intel (R) C++ Compiler, runtime toolset. and debugger. On the EXO prototype system, we have enhanced a suite of production-quality media kernels for video and image processing to utilize the accelerator through the CHI programming interface, achieving significant speedup (1.41x to 10.97x) over execution on the IA32 CPU alone.
We describe an approach to implementing a wide-range of concurrency paradigms in high-level (symbolic) programminglanguages. The focus of our discussion is STING, a dialect of Scheme, that supports lightweight thread...
详细信息
ISBN:
(纸本)0897914759
We describe an approach to implementing a wide-range of concurrency paradigms in high-level (symbolic) programminglanguages. The focus of our discussion is STING, a dialect of Scheme, that supports lightweight threads of control and virtual processors as first-class objects. Given the significant degree to which the behavior of these objects may be customized, we can easily express a variety of concurrency paradigms and linguistic structures within a common framework without loss of efficiency. Unlike parallel systems that rely on operating system services for managing concurrency, STING implements concurrency management entirely in terms of Scheme objects and procedures. It, therefore, permits users to optimize the runtime behavior of their applications without requiring knowledge of the underlying runtime system. This paper concentrates on (a) the implications of the design for building asynchronous concurrency structures, (b) organizing large-scale concurrent computations, and (c) implementing robust programming environments for symbolic computing.
We describe a Pattern-Driven Analysis and design (PDA) method for developing software systems. PDA promotes the use of patterns throughout the different phases of software development from analysis to implementation. ...
详细信息
The past decade of experience has demonstrated that the generic programming methodology is highly effective for the design, implementation, and use of large-scale software libraries. The fundamental principle of gener...
详细信息
ISBN:
(纸本)3540291385
The past decade of experience has demonstrated that the generic programming methodology is highly effective for the design, implementation, and use of large-scale software libraries. The fundamental principle of generic programming is the realization of interfaces for entire sets of components, based on their essential syntactic and semantic requirements, rather than for any particular components. Many programminglanguages have features for describing interfaces between software components, but none completely support the approach used in generic programming. We have recently developed G, a languagedesigned to provide first-class language support for generic programming and large-scale libraries. In this paper, we present an overview of g and analyze the interdependence between language features and library design in light of a complete implementation of the Standard Template Library using G. In addition, we discuss important issues related to modularity and encapsulation in large-scale libraries and how language support for validation of components in isolation can prevent many common problems in component integration.
We describe a framework for adding type qualifiers to a language. Type qualifiers encode a simple but highly useful form of subtyping. Our framework extends standard type rules to model the flow of qualifiers through ...
详细信息
We describe a framework for adding type qualifiers to a language. Type qualifiers encode a simple but highly useful form of subtyping. Our framework extends standard type rules to model the flow of qualifiers through a program, where each qualifier or set of qualifiers comes with additional rules that capture its semantics. Our framework allows types to be polymorphic in the type qualifiers. We present a const-inference system for C as an example application of the framework. We show that for a set of real C programs, many more consts can be used than are actually present in the original code.
Empirical program optimizers estimate the values of key optimization parameters by generating different program versions and running them on the actual hardware to determine which values give the best performance. In ...
详细信息
ISBN:
(纸本)9781581136623
Empirical program optimizers estimate the values of key optimization parameters by generating different program versions and running them on the actual hardware to determine which values give the best performance. In contrast, conventional compilers use models of programs and machines to choose these parameters. It is widely believed that model-driven optimization does not compete with empirical optimization, but few quantitative comparisons have been done to date. To make such a comparison, we replaced the empirical optimization engine in ATLAS (a system for generating a dense numerical linear algebra library called the BLAS) with a model-driven optimization engine that used detailed models to estimate values for optimization parameters, and then measured the relative performance of the two systems on three different hardware platforms. Our experiments show that model-driven optimization can be surprisingly effective, and can generate code whose performance is comparable to that of code generated by empirical optimizers for the BLAS.
Some modern superscalar microprocessors provide only imprecise exceptions. That is, they do not guarantee to report the same exception that would be encountered by a straightforward sequential execution of the program...
详细信息
Some modern superscalar microprocessors provide only imprecise exceptions. That is, they do not guarantee to report the same exception that would be encountered by a straightforward sequential execution of the program. In exchange, they offer increased performance or decreased chip area (which amount to much the same thing). This performance/precision tradeoff has not so far been much explored at the programminglanguage level. In this paper we propose a design for imprecise exceptions in the lazy functional programminglanguage Haskell. We discuss several designs, and conclude that imprecision is essential if the language is still to enjoy its current rich algebra of transformations. We sketch a precise semantics for the language extended with exceptions. The paper shows how to extend Haskell with exceptions without crippling the language or its compilers. We do not yet have enough experience of using the new mechanism to know whether it strikes an appropriate balance between expressiveness and performance.
Ruby-an Object-Oriented scripting language-is used world-wide because of its ease of use. However, the current interpreter is slow. To solve this problem, some virtual machines were developed, but none with adequate p...
详细信息
The challenge of exploiting high degrees of instruction-level parallelism is often hampered by frequent branching. Both exposed branch latency and low branch throughput can restrict parallelism. Control critical path ...
详细信息
The challenge of exploiting high degrees of instruction-level parallelism is often hampered by frequent branching. Both exposed branch latency and low branch throughput can restrict parallelism. Control critical path reduction (control CPR) is a compilation technique to address these problems. Control CPR can reduce the dependence height of critical paths through branch operations as well as decrease the number of executed branches. In this paper, we present an approach to control CPR that recognizes sequences of branches using profiling statistics. The control CPR transformation is applied to the predominant path through this sequence. Our approach, its implementation, and experimental results are presented. This work demonstrates that control CPR enhances instruction-level parallelism for a variety of application programs and improves their performance across a range of processors.
暂无评论