ISBN (print): 0897916972
The proceedings contain 28 papers. The topics discussed include: efficient building and placing of gating functions; avoiding conditional branches by code replication; accurate static branch prediction by value range propagation; improving balanced scheduling with compiler optimizations that increase instruction-level parallelism; selective specialization for object-oriented languages; corpus-based static branch prediction; flow-sensitive interprocedural constant propagation; efficient context-sensitive pointer analysis for C programs; simple and effective link-time optimization of Modula-3 programs; APT: a data structure for optimal control dependence computation; implementation of the data-flow synchronous language SIGNAL; scheduling and mapping: software pipelining in the presence of structural hazards; register allocation using lazy saves, eager restores, and greedy shuffling; context-insensitive alias analysis reconsidered; a type-based compiler for Standard ML; unifying data and control transformations for distributed shared-memory machines; storage assignment to decrease code size; optimizing parallel programs with explicit synchronization; the LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization; and the power of assignment motion.
Supporting both task and data parallelism in one programming system is useful, since many applications need both types of parallelism. We present a programming model that integrates task and data parallelism using shared objects. The model is a generalization of shared objects in Orca. Orca is a task parallel language that uses shared objects for communication between processes and for storing shared (possibly replicated) data. Our new model also uses shared objects for partitioning of shared data and for distribution of work in a data parallel way. Data parallelism is introduced by executing operations on a partitioned object in parallel. The paper describes the design of the new model, its implementation, and its usage for parallel applications that use mixed task and data parallelism.
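To make "executing operations on a partitioned object in parallel" concrete, here is a minimal Python sketch. It is not Orca syntax and not the authors' implementation: the class `PartitionedObject` and its methods are hypothetical, and a thread pool only stands in for the processors that would each own one partition.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


class PartitionedObject:
    """Hypothetical shared object whose data is split into partitions.

    In a real system each partition could be owned by a different
    processor; here partitions are plain lists and a thread pool
    stands in for the processors.
    """

    def __init__(self, data: List[float], num_partitions: int) -> None:
        if num_partitions < 1:
            raise ValueError("num_partitions must be at least 1")
        size = max(1, -(-len(data) // num_partitions))  # ceiling division
        self.partitions: List[List[float]] = [
            data[i:i + size] for i in range(0, len(data), size)
        ]

    def apply_parallel(self, op: Callable[[float], float]) -> None:
        """Data-parallel operation: apply `op` to every partition concurrently."""
        def work(part: List[float]) -> List[float]:
            return [op(x) for x in part]

        workers = max(1, len(self.partitions))
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # pool.map preserves partition order, so the layout is unchanged.
            self.partitions = list(pool.map(work, self.partitions))

    def gather(self) -> List[float]:
        """Reassemble the partitions into a single sequence."""
        return [x for part in self.partitions for x in part]
```

For example, `PartitionedObject([1.0, 2.0, 3.0, 4.0, 5.0], num_partitions=2)` holds the partitions `[1.0, 2.0, 3.0]` and `[4.0, 5.0]`; after `apply_parallel(lambda x: x * 10)`, `gather()` yields `[10.0, 20.0, 30.0, 40.0, 50.0]`. Task parallelism (independent processes sharing ordinary objects) is not shown.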
We present an object-oriented design model and its mapping to Ada 95 in order to build Active Information Systems. This represents part of a methodology based, for each step of the software life cycle, on a model and ...
While much recent research has focussed on extending databases beyond the traditional relational model, relatively little has been done to develop database tools for querying data organized in (multidimensional) arrays. The scientific computing community has made little use of available database technology. Instead, multidimensional scientific data is typically stored in local files conforming to various data exchange formats and queried via specialized access libraries tied to general-purpose programming languages. To allow such data to be queried using known database techniques, we design and implement a query language for multidimensional arrays. Our main design decision is to treat arrays as functions from index sets to values rather than as collection types. This leads to clean syntax and semantics as well as simple but powerful optimization rules. We present a calculus for arrays that extends standard calculi for complex objects. We derive a higher-level comprehension-style query language based on this calculus and describe its implementation, including a data driver for the NetCDF data exchange format. Next, we explore some optimization rules obtained from the equational laws of our core calculus. Finally, we study the expressiveness of our calculus and prove that it essentially corresponds to adding ranking to a query language for complex objects.
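The key design decision, arrays as functions from index sets to values, can be illustrated with a small Python sketch. This is not the paper's calculus or query syntax; `FunArray`, `tabulate`, `amap`, `transpose` and `to_lists` are hypothetical names. It shows how reindexing and pointwise maps become plain function composition, which is the kind of equational law that justifies fusing operations without building intermediate arrays.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterator, List, Tuple

Index = Tuple[int, ...]


@dataclass(frozen=True)
class FunArray:
    """An array viewed as a function from a finite index set to values."""
    shape: Tuple[int, ...]
    fn: Callable[[Index], float]

    def __getitem__(self, idx: Index) -> float:
        # The index set is the box 0 <= idx[k] < shape[k]; reject anything outside it.
        if len(idx) != len(self.shape) or not all(
            0 <= i < n for i, n in zip(idx, self.shape)
        ):
            raise IndexError(idx)
        return self.fn(idx)

    def indices(self) -> Iterator[Index]:
        """Enumerate the index set in row-major order."""
        return product(*(range(n) for n in self.shape))


def tabulate(shape: Tuple[int, ...], fn: Callable[[Index], float]) -> FunArray:
    """Array comprehension: build an array from its index function."""
    return FunArray(tuple(shape), fn)


def amap(f: Callable[[float], float], a: FunArray) -> FunArray:
    """Pointwise map. amap(g, amap(f, a)) behaves like amap(g∘f, a) because
    both only compose functions; no intermediate array is materialised."""
    return FunArray(a.shape, lambda idx: f(a[idx]))


def transpose(a: FunArray) -> FunArray:
    """2-D transpose as a reindexing of the underlying function."""
    rows, cols = a.shape
    return FunArray((cols, rows), lambda idx: a[(idx[1], idx[0])])


def to_lists(a: FunArray) -> List[List[float]]:
    """Materialise a 2-D array as nested lists (row-major)."""
    rows, cols = a.shape
    return [[a[(r, c)] for c in range(cols)] for r in range(rows)]
```

With `a = tabulate((2, 3), lambda idx: idx[0] * 3 + idx[1])`, `to_lists(a)` is `[[0, 1, 2], [3, 4, 5]]`, and `to_lists(transpose(amap(lambda x: x * 2, a)))` is `[[0, 6], [2, 8], [4, 10]]`. Nothing is computed until an element is requested.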
The implementation of a safety-critical real-time embedded executive requires the application of software engineering principles that result in a safe, efficient, modular product. How, specifically, does one obtain suc...
ISBN (print): 9780897917957
GUM is a portable, parallel implementation of the Haskell functional language. Despite sustained research interest in parallel functional programming, GUM is one of the first such systems to be made publicly available. GUM is message-based, and portability is facilitated by using the PVM communications harness that is available on many multiprocessors. As a result, GUM is available for both shared-memory (Sun SPARCserver multiprocessors) and distributed-memory (networks of workstations) architectures. The high message latency of distributed machines is ameliorated by sending messages asynchronously, and by sending large packets of related data in each message. Performance figures demonstrate absolute speedups relative to the best sequential compiler technology. To improve the performance of a parallel Haskell program, GUM provides tools for monitoring and visualising the behaviour of threads and of processors during execution.
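The two latency-hiding ideas in the abstract, asynchronous sends and bulk packets of related data, can be sketched in Python as follows. This is an illustration only, not GUM's runtime: the breadth-first packing order, the node-count limit, and the names `pack_subgraph` and `send_async` are assumptions made for the example, and a `queue.Queue` stands in for a PVM channel.

```python
import queue
from collections import deque
from typing import Dict, List


def pack_subgraph(graph: Dict[str, List[str]], root: str, max_nodes: int) -> List[str]:
    """Collect up to max_nodes graph nodes reachable from root, breadth-first,
    so that related data travels in one packet instead of many messages."""
    packet: List[str] = []
    seen = {root}
    frontier = deque([root])
    while frontier and len(packet) < max_nodes:
        node = frontier.popleft()
        packet.append(node)
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                frontier.append(child)
    return packet


def send_async(outbox: "queue.Queue[List[str]]", packet: List[str]) -> None:
    """Enqueue a packet and return immediately; the sender does not block
    waiting for the receiver (an unbounded queue never raises queue.Full)."""
    outbox.put_nowait(packet)
```

For the graph `{"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": [], "e": ["f"]}`, `pack_subgraph(graph, "a", 4)` returns `["a", "b", "c", "d"]`: one message carries four nodes that would otherwise each cost a round trip.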
ISBN (print): 9780897917957
Recent shared-memory parallel computer systems offer the exciting possibility of customizing memory coherence protocols to fit an application's semantics and sharing patterns. Custom protocols have been used to achieve message-passing performance (while retaining the convenient programming model of a global address space) and to implement high-level language constructs. Unfortunately, coherence protocols written in a conventional language such as C are difficult to write, debug, understand, or modify. This paper describes Teapot, a small, domain-specific language for writing coherence protocols. Teapot uses continuations to help reduce the complexity of writing protocols. Simple static analysis in the Teapot compiler eliminates much of the overhead of continuations and results in protocols that run nearly as fast as hand-written C code. A Teapot specification can be compiled both to an executable coherence protocol and to input for a model checking system, which permits the specification to be verified. We report our experiences coding and verifying several protocols written in Teapot, along with measurements of the overhead incurred by writing a protocol in a higher-level language.
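Why continuations help: a protocol handler that must wait for a reply can save "what to do next" instead of encoding every intermediate wait as an explicit protocol state. The Python sketch below shows this for a read miss. It is not Teapot syntax; the class `CacheNode`, the message names `GET_SHARED` and `DATA`, and the tuple message format are all hypothetical, and the continuation-elimination analysis in the Teapot compiler is not modelled.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, int, object]  # (kind, block address, payload)


class CacheNode:
    """Hypothetical cache-side protocol handler written with continuations."""

    def __init__(self, send: Callable[[Message], None]) -> None:
        self.send = send
        self.cache: Dict[int, object] = {}
        # Suspended continuations, keyed by the block they are waiting for.
        self.pending: Dict[int, List[Callable[[object], None]]] = {}

    def read(self, addr: int, on_done: Callable[[object], None]) -> None:
        if addr in self.cache:
            on_done(self.cache[addr])  # hit: continue immediately
            return
        waiters = self.pending.setdefault(addr, [])
        waiters.append(on_done)  # miss: save the continuation and suspend
        if len(waiters) == 1:  # first miss on this block: ask the home node
            self.send(("GET_SHARED", addr, None))

    def handle(self, msg: Message) -> None:
        kind, addr, payload = msg
        if kind != "DATA":
            raise ValueError(f"unexpected message {kind!r}")
        self.cache[addr] = payload
        for resume in self.pending.pop(addr, []):
            resume(payload)  # resume every continuation waiting on this block
```

A first `read` of a block sends one `GET_SHARED` request and returns without a value; when `handle` receives the matching `DATA` message, the saved continuation runs, and later reads of that block hit in the cache.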
ISBN (print): 9780897917704
This paper describes the distributed memory implementation of a shared memory parallel functional language. The language is Id, an implicitly parallel, mostly functional language that is currently evolving into a dialect of Haskell. The target is a distributed memory machine, because we expect these to be the most widely available parallel platforms in the future. The difficult problem is to bridge the gap between the shared memory language model and the distributed memory machine model. The language model assumes that all data is uniformly accessible, whereas the machine has a severe memory hierarchy: a processor's access to remote memory (using explicit communication) is orders of magnitude slower than its access to local memory. Thus, avoiding communication is crucial for good performance. The Id language and its general dataflow-inspired compilation to multithreaded code are described elsewhere. In this paper, we focus on our new parallel runtime system and its features for avoiding communication and for tolerating its latency when necessary: multithreading, scheduling and load balancing; the distributed heap model and distributed coherent caching; and parallel garbage collection. We have completed the first implementation, and we present some preliminary performance measurements.
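The gap between a uniformly addressed heap and a machine where remote access costs a message can be sketched with global addresses of the form (owning processor, local offset) plus a per-processor cache of remote values. This Python sketch is only an illustration and not the paper's runtime system: the classes and the message counter are hypothetical, and it assumes write-once (I-structure style) slots, so a cached copy of a written value can never go stale. Deferred reads of empty slots are not modelled; `None` simply means "not yet written".

```python
from typing import Dict, List, Optional, Tuple

GlobalAddr = Tuple[int, int]  # (owning processor, local offset)


class Processor:
    def __init__(self, pe: int, machine: "Machine") -> None:
        self.pe = pe
        self.machine = machine
        self.memory: Dict[int, object] = {}  # locally owned heap slots
        self.cache: Dict[GlobalAddr, object] = {}  # copies of remote values

    def write_once(self, offset: int, value: object) -> None:
        """Write-once slot: a second write is an error (assumed semantics)."""
        if offset in self.memory:
            raise RuntimeError("write-once slot already written")
        self.memory[offset] = value

    def read(self, addr: GlobalAddr) -> Optional[object]:
        owner, offset = addr
        if owner == self.pe:
            return self.memory.get(offset)  # local access: no communication
        if addr in self.cache:
            return self.cache[addr]  # remote value already cached locally
        value = self.machine.remote_read(addr)
        if value is not None:  # only cache slots that have been written
            self.cache[addr] = value
        return value


class Machine:
    def __init__(self, num_pes: int) -> None:
        self.pes: List[Processor] = [Processor(i, self) for i in range(num_pes)]
        self.messages = 0  # counts remote request/reply round trips

    def remote_read(self, addr: GlobalAddr) -> Optional[object]:
        self.messages += 1
        owner, offset = addr
        return self.pes[owner].memory.get(offset)
```

After `m = Machine(2)` and `m.pes[1].write_once(0, 42)`, two calls to `m.pes[0].read((1, 0))` both return `42` but cost only one message, because the second read is served from processor 0's cache.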
ISBN (print): 9780897917957
Many analysis problems can be cast in the form of evaluating minimal models of a logic program. Although such formulations are appealing due to their simplicity and declarativeness, they have not been widely used in practice because either existing logic programming systems do not guarantee completeness, or those that do have been viewed as too inefficient for integration into a compiler. The objective of this paper is to re-examine this issue in the context of recent advances in implementation technologies of logic programming. We find that such declarative formulations can indeed be used in practical systems when combined with the appropriate tool for evaluation. We use existing formulations of analysis problems (groundness analysis of logic programs and strictness analysis of functional programs) in this case study, and the XSB system, a table-based logic programming system, as the evaluation tool of choice. We give experimental evidence that the resulting groundness and strictness analysis systems are practical in terms of both time and space. In terms of implementation effort, the analyzers took less than 2 man-weeks in total to develop, optimize and evaluate. The analyzer itself consists of about 100 lines of tabled Prolog code, and the entire system, including the components to read and preprocess input programs and to collect the analysis results, consists of about 500 lines of code.
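The completeness issue can be seen on a small stand-in analysis (not one of the paper's): procedure reachability written as `reach(main).` and `reach(Q) :- reach(P), calls(P, Q).` Plain Prolog loops on this left-recursive rule, while XSB answers it once the predicate is declared with `:- table reach/1.`. The Python sketch below computes the same minimal model by semi-naive bottom-up iteration instead; that is a swapped-in evaluation strategy chosen for illustration, not XSB's tabled top-down evaluation, and `least_model_reach` is a hypothetical name.

```python
from typing import Set, Tuple


def least_model_reach(calls: Set[Tuple[str, str]], entry: str) -> Set[str]:
    """Least model of the program
         reach(entry).
         reach(Q) :- reach(P), calls(P, Q).
    computed by semi-naive bottom-up iteration to a fixpoint."""
    reach = {entry}
    delta = {entry}  # facts first derived in the previous round
    while delta:
        # Fire the recursive rule only on newly derived facts.
        new = {q for (p, q) in calls if p in delta} - reach
        reach |= new
        delta = new
    return reach
```

With the call graph `{("main", "f"), ("f", "g"), ("g", "f"), ("h", "main")}`, the result is `{"main", "f", "g"}`: the cycle between `f` and `g` does not prevent termination, and the unreachable `h` is excluded.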
The implementation of dynamically bound object-oriented programming languages requires the ability to quickly bind a logical reference to a method and to quickly allocate a context for the invoked method. In this pap...