Some applications are most easily expressed in a programming language that supports concurrency, notably interactive and distributed systems. We propose extensions to the purely-functional language Haskell that allow it to express explicitly concurrent applications; we call the resulting language Concurrent Haskell. The resulting system appears to be both expressive and efficient, and we give a number of examples of useful abstractions that can be built from our primitives. We have developed a freely-available implementation of Concurrent Haskell, and are now using it as a substrate for a graphical user interface toolkit.
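The paper's primitives, forkIO for thread creation and MVars for synchronised communication, survive essentially unchanged in modern GHC. A minimal sketch of the style of abstraction they support, written against today's Control.Concurrent module names (an assumption about the interface, not code from the paper):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- A worker thread computes a value and hands it back through an MVar;
-- the main thread blocks on takeMVar until the result is available.
main :: IO ()
main = do
  result <- newEmptyMVar
  _ <- forkIO (putMVar result (sum [1 .. 1000000 :: Integer]))
  total <- takeMVar result
  print total
```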
In this paper we prove time and space bounds for the implementation of the programming language NESL on various parallel machine models. NESL is a sugared typed λ-calculus with a set of array primitives and an explicit parallel map over arrays. Our results extend previous work on provable implementation bounds for functional languages by considering space and by including arrays. For modeling the cost of NESL we augment a standard call-by-value operational semantics to return two cost measures: a DAG representing the sequential dependences in the computation, and a measure of the space taken by a sequential implementation. We show that a NESL program with w work (nodes in the DAG), d depth (levels in the DAG), and s sequential space can be implemented on a p-processor butterfly network, hypercube, or CRCW PRAM using O(w/p + d log p) time and O(s + dp log p) reachable space. For programs with sufficient parallelism these bounds are optimal in that they give linear speedup and use space within a constant factor of the sequential space.
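The work/depth cost model in this abstract can be made concrete with a tiny cost algebra. The sketch below is illustrative only (the datatype and function names are invented, not the paper's cost semantics); it shows how work accumulates under both sequential and parallel composition while depth accumulates sequentially but takes a maximum across a parallel map:

```haskell
-- Illustrative work/depth cost algebra in the spirit of the NESL cost model;
-- Cost, seqC and parMapC are assumed names, not the paper's definitions.
data Cost = Cost { work :: Int, depth :: Int } deriving Show

-- Sequential composition: both work and depth accumulate.
seqC :: Cost -> Cost -> Cost
seqC (Cost w1 d1) (Cost w2 d2) = Cost (w1 + w2) (d1 + d2)

-- Parallel map: work sums over all elements, depth is the deepest element.
parMapC :: [Cost] -> Cost
parMapC cs = Cost (1 + sum (map work cs)) (1 + maximum (0 : map depth cs))

-- Mapping a constant-time function over n elements costs O(n) work and O(1)
-- depth, so the paper's bound gives O(n/p + log p) time on p processors.
example :: Int -> Cost
example n = parMapC (replicate n (Cost 1 1))
```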
Speculative evaluation, including leniency and futures, is often used to produce high degrees of parallelism. Existing speculative implementations, however, may serialize computation because of their implementation of queues of suspended threads. We give a provably efficient parallel implementation of a speculative functional language on various machine models. The implementation includes proper parallelization of the necessary queuing operations on suspended threads. Our target machine models are a butterfly network, hypercube, and PRAM. To prove the efficiency of our implementation, we provide a cost model using a profiling semantics and relate the cost model to implementations on the parallel machine models.
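The abstract does not show the implementation, but the shape of a future, a computation started speculatively whose consumers block until its value arrives, can be sketched with ordinary GHC concurrency primitives. The sketch below is an illustration of the idea only; it does not parallelize the queue of suspended readers, which is exactly the serialization the paper addresses:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, readMVar)

-- Start a computation speculatively; the returned action blocks until the
-- result has been written.  Readers that arrive early are queued on the MVar.
future :: IO a -> IO (IO a)
future act = do
  box <- newEmptyMVar
  _   <- forkIO (act >>= putMVar box)
  return (readMVar box)
```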
ISBN (print): 0897916972
The proceedings contain 28 papers. The topics discussed include: efficient building and placing of gating functions; avoiding conditional branches by code replication; accurate static branch prediction by value range propagation; improving balanced scheduling with compiler optimizations that increase instruction-level parallelism; selective specialization for object-oriented languages; corpus-based static branch prediction; flow-sensitive interprocedural constant propagation; efficient context-sensitive pointer analysis for C programs; simple and effective link-time optimization of Modula-3 programs; APT: a data structure for optimal control dependence computation; implementation of the data-flow synchronous language SIGNAL; scheduling and mapping: software pipelining in the presence of structural hazards; register allocation using lazy saves, eager restores, and greedy shuffling; context-insensitive alias analysis reconsidered; a type-based compiler for Standard ML; unifying data and control transformations for distributed shared-memory machines; storage assignment to decrease code size; optimizing parallel programs with explicit synchronization; the LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization; and the power of assignment motion.
ISBN (print): 9780897917957
GUM is a portable, parallel implementation of the Haskell functional language. Despite sustained research interest in parallel functional programming, GUM is one of the first such systems to be made publicly available. GUM is message-based, and portability is facilitated by using the PVM communications harness that is available on many multi-processors. As a result, GUM is available for both shared-memory (Sun SPARCserver multiprocessors) and distributed-memory (networks of workstations) architectures. The high message-latency of distributed machines is ameliorated by sending messages asynchronously, and by sending large packets of related data in each message. Initial performance figures demonstrate absolute speedups relative to the best sequential compiler technology. To improve the performance of a parallel Haskell program, GUM provides tools for monitoring and visualising the behaviour of threads and of processors during execution.
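The abstract describes the runtime rather than the programming model, but the parallel Haskell programs GUM runs are typically written with spark annotations. The sketch below uses today's Control.Parallel combinators as a stand-in (an assumption, not code from the paper) to give the flavour of the thread behaviour the monitoring tools visualise:

```haskell
import Control.Parallel (par, pseq)

-- Sketch only: `par` sparks work that another processor may pick up, and
-- `pseq` forces the local half first so both halves can proceed in parallel.
parSum :: [Integer] -> Integer
parSum xs = left `par` (right `pseq` (left + right))
  where
    (as, bs) = splitAt (length xs `div` 2) xs
    left     = sum as
    right    = sum bs
```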
ISBN (print): 9780897917957
Recent shared-memory parallel computer systems offer the exciting possibility of customizing memory coherence protocols to fit an application's semantics and sharing patterns. Custom protocols have been used to achieve message-passing performance---while retaining the convenient programming model of a global address space---and to implement high-level language constructs. Unfortunately, coherence protocols written in a conventional language such as C are difficult to write, debug, understand, or modify. This paper describes Teapot, a small, domain-specific language for writing coherence protocols. Teapot uses continuations to help reduce the complexity of writing protocols. Simple static analysis in the Teapot compiler eliminates much of the overhead of continuations and results in protocols that run nearly as fast as hand-written C code. A Teapot specification can be compiled both to an executable coherence protocol and to input for a model checking system, which permits the specification to be verified. We report our experiences coding and verifying several protocols written in Teapot, along with measurements of the overhead incurred by writing a protocol in a higher-level language.
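The continuation idea can be illustrated independently of the Teapot language itself. In the sketch below (Haskell with invented names, not Teapot syntax or the paper's protocols), a handler that must wait for a later acknowledgement suspends by returning a continuation instead of the protocol author hand-coding an extra intermediate state:

```haskell
-- Illustrative only: Msg, Outcome and onGetX are invented for this sketch.
data Msg = GetX | InvAck

data Outcome
  = Done String             -- the protocol action has finished
  | Await (Msg -> Outcome)  -- suspended; resumes when the next message arrives

-- Granting exclusive access: invalidations have been sent, so wait for the
-- acknowledgement before completing, expressed as a continuation.
onGetX :: Outcome
onGetX = Await step
  where
    step InvAck = Done "grant exclusive access"
    step _      = Await step   -- ignore unrelated messages and keep waiting
```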
ISBN (print): 9780897917704
This paper describes the distributed memory implementation of a shared memory parallel functional language. The language is Id, an implicitly parallel, mostly functional language that is currently evolving into a dialect of Haskell. The target is a distributed memory machine, because we expect these to be the most widely available parallel platforms in the future. The difficult problem is to bridge the gap between the shared memory language model and the distributed memory machine model. The language model assumes that all data is uniformly accessible, whereas the machine has a severe memory hierarchy: a processor's access to remote memory (using explicit communication) is orders of magnitude slower than its access to local memory. Thus, avoiding communication is crucial for good performance. The Id language, and its general dataflow-inspired compilation to multithreaded code, are described elsewhere. In this paper, we focus on our new parallel runtime system and its features for avoiding communication and for tolerating its latency when necessary: multithreading, scheduling and load balancing; the distributed heap model and distributed coherent caching; and parallel garbage collection. We have completed the first implementation, and we present some preliminary performance measurements.
ISBN (print): 9780897917957
Many analysis problems can be cast in the form of evaluating minimal models of a logic program. Although such formulations are appealing due to their simplicity and declarativeness, they have not been widely used in practice because either existing logic programming systems do not guarantee completeness, or those that do have been viewed as too inefficient for integration into a compiler. The objective of this paper is to re-examine this issue in the context of recent advances in implementation technologies of logic programming systems. We find that such declarative formulations can indeed be used in practical systems, when combined with the appropriate tool for evaluation. We use existing formulations of analysis problems --- groundness analysis of logic programs, and strictness analysis of functional programs --- in this case study, and the XSB system, a table-based logic programming system, as the evaluation tool of choice. We give experimental evidence that the resultant groundness and strictness analysis systems are practical in terms of both time and space. In terms of implementation effort, the analyzers took less than two man-weeks in total to develop, optimize, and evaluate. The analyzer itself consists of about 100 lines of tabled Prolog code, and the entire system, including the components to read and preprocess input programs and to collect the analysis results, consists of about 500 lines of code.
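For readers unfamiliar with the second case study, strictness analysis can be phrased as evaluation over a two-point abstract domain, and the least-fixpoint flavour of that evaluation is what makes a tabled logic programming encoding natural. The sketch below is a stand-alone illustration in Haskell (the expression language and names are invented; it is not the paper's Prolog encoding):

```haskell
-- Two-point strictness analysis over a toy expression language.
-- False abstracts "definitely bottom", True abstracts "may be defined".
data Expr = Var Int | Lit Int | Add Expr Expr | If Expr Expr Expr

type Abs = Bool

eval :: [Abs] -> Expr -> Abs
eval env (Var i)    = env !! i
eval _   (Lit _)    = True
eval env (Add a b)  = eval env a && eval env b          -- (+) is strict in both
eval env (If c t e) = eval env c && (eval env t || eval env e)

-- A body of the given arity is strict in argument i if feeding bottom there
-- (and defined values elsewhere) forces the result to bottom.
strictIn :: Int -> Int -> Expr -> Bool
strictIn arity i body = not (eval [ j /= i | j <- [0 .. arity - 1] ] body)
```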
This paper presents the techniques used for the compilation of the data-flow, synchronous language SIGNAL. The key feature of the compiler is that it performs formal calculus on systems of boolean equations. The origi...
ISBN (print): 9780897916974
Recent work on alias analysis in the presence of pointers has concentrated on context-sensitive interprocedural analyses, which treat multiple calls to a single procedure independently rather than constructing a single approximation to a procedure's effect on all of its callers. While context-sensitive modeling offers the potential for greater precision by considering only realizable call-return paths, its empirical benefits have yet to be measured. This paper compares the precision of a simple, efficient, context-insensitive points-to analysis for the C programming language with that of a maximally context-sensitive version of the same analysis. We demonstrate that, for a number of pointer-intensive benchmark programs, context-insensitivity exerts little to no precision penalty. We also describe techniques for using the output of context-insensitive analysis to improve the efficiency of context-sensitive analysis without affecting precision.
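The precision question can be made concrete with a toy example: a procedure that simply returns its pointer argument, called from two sites. A context-insensitive analysis keeps one points-to set per variable and therefore merges the two call sites, while a context-sensitive one keeps them apart. The sketch below models only the resulting points-to sets (names and representation are invented for illustration and are not the paper's algorithm):

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

-- Points-to sets for:  id(p) { return p; }   q1 = id(&a);   q2 = id(&b);
type PointsTo = M.Map String (S.Set String)

-- One summary of id for all callers: p may point to a or b, and that merged
-- set flows back to both q1 and q2 (the imprecision context-sensitivity avoids).
insensitive :: PointsTo
insensitive =
  let p = S.fromList ["a", "b"]
  in M.fromList [("p", p), ("q1", p), ("q2", p)]

-- Analysing each call site separately keeps the results distinct.
sensitive :: PointsTo
sensitive = M.fromList [("q1", S.singleton "a"), ("q2", S.singleton "b")]
```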