This article presents a technique to simplify Remote Procedure Calls (RPC) in 32-bit Windows 95/NT applications. The programs demonstrate how to build simple RPC clients and servers using Microsoft Visual C++ 4.x. They should compile with other compilers or earlier 32-bit versions of Visual C++ as well, but will need the Win32 SDK to obtain the RPC development tools.
This paper describes several loop splitting methods for exploiting parallelism from single loops, and also proposes a generalized and optimal loop transformation technique for exploiting parallelism from single loops with nonuniform dependencies. The proposed algorithm partitions a serial loop according to the size of the dependence distance, which varies between different instances of the dependence. It outperforms the two methods proposed by Polychronopoulos.
An unsteady computer model controlled by data flow is proposed. It allows estimation of the execution time of an individual program, the required number of processors, and device loading. The concept of A-convergency for the asymptotics of the ordinary differential equations that simulate the unsteady condition is used to determine whether the computational process completes successfully. Numerical experiments reveal how the computer system behaves under different relations between processor speed and required storage capacity.
We propose reference distance as a metric for data locality. Reference distance is the number of referenced memory blocks between two successive references to the same memory block. The effectiveness of program transformations for data locality can be evaluated by using reference distance. We demonstrate this claim by showing the change in data locality and the speedup of matrix multiplication due to loop interchange, loop tiling, and loop unrolling. We also present the results of an experiment using loop distribution and unrolling on two subroutines of the Perfect benchmark programs. Reference distance would serve as a useful tool for developing program transformation schemes for data locality optimization.
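The definition above can be illustrated with a minimal Python sketch. It is not the authors' tool: the function names, the toy row-major matrix layout, and the choice to count distinct blocks between two references (the abstract does not say whether repeated blocks count once) are all assumptions made for illustration.

```python
from typing import Hashable, Iterable, List, Optional


def reference_distances(trace: Iterable[Hashable]) -> List[Optional[int]]:
    """For each reference in the trace, return the number of distinct
    blocks referenced since the previous reference to the same block.
    The first reference to a block has no distance and yields None.
    Counting *distinct* blocks is an assumption of this sketch."""
    last_seen = {}   # block -> position of its most recent reference
    blocks = []      # the trace seen so far
    result = []
    for pos, block in enumerate(trace):
        blocks.append(block)
        prev = last_seen.get(block)
        if prev is None:
            result.append(None)
        else:
            result.append(len(set(blocks[prev + 1:pos])))
        last_seen[block] = pos
    return result


def matrix_block_trace(n: int, block_size: int, row_order: bool) -> List[int]:
    """Block ids touched when traversing an n x n matrix stored row-major
    (hypothetical layout), in row order or in column order."""
    trace = []
    for a in range(n):
        for b in range(n):
            i, j = (a, b) if row_order else (b, a)
            trace.append((i * n + j) // block_size)
    return trace


def mean_distance(trace: Iterable[Hashable]) -> float:
    """Average reference distance over all reuses in the trace."""
    reuses = [d for d in reference_distances(trace) if d is not None]
    return sum(reuses) / len(reuses)


if __name__ == "__main__":
    print(reference_distances(["A", "B", "C", "A", "B", "B"]))
    print(mean_distance(matrix_block_trace(4, 4, row_order=True)))
    print(mean_distance(matrix_block_trace(4, 4, row_order=False)))
```

In this toy setting, traversing the matrix in row order reuses each block immediately, while column order touches every other row's block between reuses, so the mean distance rises. That is the kind of difference a loop interchange would be expected to change.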
High Performance Fortran (HPF) was developed to support data parallel programming for single-instruction multiple-data (SIMD) and multiple-instruction multiple-data (MIMD) machines with distributed memory. The programmer is provided a familiar uniform logical address space and specifies the data distribution by directives. The compiler then exploits these directives to allocate arrays in the local memories, to assign computations to elementary processors, and to migrate data between processors when required. We show here that linear algebra is a powerful framework to encode HPF directives and to synthesize distributed code with space-efficient array allocation, tight loop bounds, and vectorized communications for INDEPENDENT loops. The generated code includes traditional optimizations such as guard elimination, message vectorization and aggregation, and overlap analysis. The systematic use of an affine framework makes it possible to prove the compilation scheme correct.
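As a rough illustration of what a data distribution directive determines, the following Python sketch maps a global array index to an owning processor and a local index for BLOCK and CYCLIC distributions. It is not the compilation scheme of the paper: the function names are hypothetical, and indices are 0-based here whereas Fortran arrays are 1-based by default.

```python
from math import ceil
from typing import Tuple


def block_owner(i: int, n: int, p: int) -> Tuple[int, int]:
    """(processor, local index) of global index i when n elements are
    distributed BLOCK-wise over p processors, block size ceil(n / p)."""
    b = ceil(n / p)
    return i // b, i % b


def cyclic_owner(i: int, p: int) -> Tuple[int, int]:
    """(processor, local index) of global index i under a CYCLIC
    distribution over p processors (elements dealt out round-robin)."""
    return i % p, i // p


if __name__ == "__main__":
    n, p = 10, 4
    print([block_owner(i, n, p)[0] for i in range(n)])
    print([cyclic_owner(i, p)[0] for i in range(n)])
```

Both mappings satisfy a linear relation between global index, owner, and local index (for BLOCK, `i == b * owner + local` with `0 <= local < b`). Relations of this form can be written as affine constraints over integer variables.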
We present a dataflow language for programming communicating process networks. We define a small-step reduction semantics that corresponds closely to our operational intuitions about communicating process networks. Because the language is able to describe infinite streams of data, function applications are evaluated in a lazy way and applications annotated with the symbol are evaluated in an angelic 'parallel' way. We do this by assigning to each branch a finite amount of evaluation resources, allocated by a fair scheduler. This operational semantics is asynchronous and parallel instead of lazy and sequential as in Lucid or synchronous and parallel as in Lustre. One may then imagine that each branch of a parallel expression is associated with a communicating interpreter. This method avoids the main drawbacks of parallel implementations of the usual functional languages, automatic detection of too small parallel tasks, and evaluation of data that do not satisfy the property of locality.
This paper considers the programming of passive (cold and warm standbys) and active replicated systems in Ada 95. We show that it is relatively easy to develop systems which act as standbys using the facilities provided by the language and the Distributed Systems Annex. Arguably, active replication in Ada 95 can be supported in a manner which is transparent to the application. However, this is implementation-dependent, requires a complex distributed consensus algorithm (or a carefully chosen subset of the language to be used) and has little flexibility. We therefore consider two extensions to the Distributed Systems Annex to help give the application programmer more control. The first is via a new categorization pragma which specifies that an RCI package can be replicated in more than one partition. The second is through the introduction of a coordinated type which has a single primitive operation. Objects which are created from extensions to coordinated types can be freely replicated across the distributed system. When the primitive operation is called, the call is posted to all sites where a replica resides, effectively providing a broadcast (multicast) facility. We also consider extensions to the partition communication subsystem which implement these new features.
The article describes a pattern language that helps select synchronization primitives for parallel computer programs, avoiding primitives that interact with a given program's locking design. A lock-based parallel program uses synchronization primitives to guard critical sections of code in which only one CPU or thread may execute at a time. A poor choice of locking primitive can result in excessive overhead and poor performance under heavy load. The pattern language in this article helps select one of a few straightforward test-and-set, queued, and reader/writer locks. These locks handle most situations. The article presents the implementation-level counterpart to a locking design pattern language. Although design and implementation are often treated as separate activities, they are almost always deeply intertwined. The test-and-set lock and the queued lock provide simple mutual exclusion. The two types of queued lock tolerate extreme contention without imposing excessive memory bandwidth loads on the system bus. The three types of reader/writer lock allow readers to proceed in parallel. The distributed reader/writer lock operates efficiently in the face of high read-side contention and fine-grained parallelism.
A notion of type assignment on Curryfied Term Rewriting Systems is introduced that uses Intersection Types of Rank 2, and in which all function symbols are assumed to have a type. Type assignment will consist of specifying derivation rules that describe how types can be assigned to terms, using the types of function symbols. Using a modified unification procedure, for each term the principal pair (of basis and type) will be defined in the following sense: from these all admissible pairs can be generated by chains of operations on pairs, consisting of the operations substitution, copying, and weakening. In general, given an arbitrary typeable Curryfied TRS, the subject reduction property does not hold. Using the principal type for the left-hand side of a rewrite rule, a sufficient and decidable condition will be formulated that typeable rewrite rules should satisfy in order to obtain this property.
Deriving formal specifications from informal requirements is extremely difficult since one has to overcome the conceptual gap between an application domain and the domain of formal specification methods. To reduce this gap we introduce application-specific specification languages, i.e., graphical and textual notations that can be unambiguously mapped to formal specifications in a logic language. We describe a number of realised approaches based on this idea, and evaluate them with respect to their domain specificity vs. generality.