ISBN (Print): 0818677937
Interactive program steering is a promising technique for improving the performance of parallel and distributed applications. Steering decisions are typically based on visual presentations of some subset of the computation's current state, a historical view of the computation's behavior, or views of metrics based on the program's performance. As in any endeavor, good decisions require accurate information. However, the distributed nature of the collection process may result in distortions in the portrayal of the program's execution. These distortions stem from merging streams of information from distributed collection points into a single stream without enforcing the ordering relationships that held among the program components that produced the information. An ordering filter placed at the point where the streams are merged can ensure a valid ordering, leading to more accurate visualizations and better-informed steering decisions. In this paper we describe the implementation of such filters in the Falcon interactive steering toolkit, and present a methodology for specifying them so that they can be generated automatically.
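The abstract does not show Falcon's filter specification language, so the following is only a minimal sketch of the ordering idea it describes, under two assumptions not stated in the source: events carry Lamport-style logical timestamps, and each per-process stream is already locally ordered. The event representation (timestamp, payload) is hypothetical.

```python
import heapq

def ordered_merge(streams):
    """Merge per-process event streams, each locally sorted by logical
    timestamp, into one stream ordered by (timestamp, process id).
    This stands in for the 'ordering filter' at the merge point: no
    event is released before any event that logically precedes it."""
    iters = [iter(s) for s in streams]
    heap = []
    for pid, it in enumerate(iters):
        ev = next(it, None)
        if ev is not None:
            heapq.heappush(heap, (ev[0], pid, ev))   # (timestamp, pid, event)
    merged = []
    while heap:
        ts, pid, ev = heapq.heappop(heap)
        merged.append(ev)
        nxt = next(iters[pid], None)                 # refill from same stream
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], pid, nxt))
    return merged

# Two collection points; naive concatenation would emit (4, "d") before
# (2, "b") and (3, "c"); the filter restores a valid global order.
events = ordered_merge([[(1, "a"), (4, "d")], [(2, "b"), (3, "c")]])
```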
ISBN (Print): 0818677937
This paper describes a framework for providing the ability to use multiple specialized data parallel libraries and/or languages within a single application. The ability to use multiple libraries is required in many application areas, such as multidisciplinary complex physical simulations and remote sensing image database applications. An application can consist of one program or multiple programs that use different libraries to parallelize operations on distributed data structures. The framework is embodied in a runtime library called Meta-Chaos that has been used to exchange data between data parallel programs written using High Performance Fortran, the Chaos and Multiblock Parti libraries developed at Maryland for handling various types of unstructured problems, and the runtime library for pC++, a data parallel version of C++ from Indiana University. Experimental results show that Meta-Chaos is able to move data between libraries efficiently, and that Meta-Chaos provides effective support for complex applications.
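Meta-Chaos's actual API is not given in the abstract. As a hedged illustration of the underlying idea — mapping two different data-parallel layouts through a common global index space to derive a communication schedule — the sketch below assumes a block layout on the source side and a cyclic layout on the destination side; all function names are hypothetical.

```python
def block_owner(i, n, p):
    """Owner of global index i when n items are block-distributed over p procs."""
    block = (n + p - 1) // p
    return i // block

def cyclic_owner(i, n, p):
    """Owner of global index i under a cyclic distribution over p procs."""
    return i % p

def exchange_schedule(n, p_src, p_dst):
    """Walk the shared global index space and record, for each
    (source proc, destination proc) pair, the indices they must
    exchange when copying a block-distributed array into a
    cyclic-distributed one."""
    plan = {}
    for i in range(n):
        pair = (block_owner(i, n, p_src), cyclic_owner(i, n, p_dst))
        plan.setdefault(pair, []).append(i)
    return plan

# 4 elements, 2 source procs (block), 2 destination procs (cyclic):
plan = exchange_schedule(4, 2, 2)
```

The real library would then aggregate each index list into one message per processor pair; the point of the global linearization is that neither side needs to understand the other's distribution format.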
ISBN (Print): 0818681187
The high complexity of building parallel applications is often cited as one of the major impediments to the mainstream adoption of parallel computing. To deal with the complexity of software development, abstractions such as macros, functions, abstract data types, and objects are commonly employed by sequential as well as parallel programming models. This paper describes the concept of a design pattern for the development of parallel applications. A design pattern in our case describes a recurring parallel programming problem and a reusable solution to that problem. A design pattern is implemented as a reusable code skeleton for quick and reliable development of parallel applications. A parallel programming system, called DPnDP (Design Patterns and Distributed Processes), that employs such design patterns is described. In the past, parallel programming systems have allowed fast prototyping of parallel applications based on commonly occurring communication and synchronization structures. The uniqueness of our approach lies in the use of a standard structure and interface for a design pattern. This has several important implications. First, design patterns can be defined and added to the system's library in an incremental manner without requiring any major modification of the system (extensibility). Second, customization of a parallel application is possible by mixing design patterns with low-level parallel code, resulting in a flexible and efficient parallel programming tool (flexibility). Also, a parallel design pattern can be parameterized to provide some variations in terms of structure and behavior.
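DPnDP's concrete interface is not reproduced in the abstract. The sketch below only illustrates the general notion of a design pattern as a parameterized, reusable code skeleton, using a master-worker pattern as the recurring structure; the class and parameter names are invented for illustration, and threads stand in for distributed processes.

```python
from concurrent.futures import ThreadPoolExecutor

class MasterWorker:
    """A 'design pattern as reusable skeleton': the recurring structure
    (distribute tasks, gather results in order) is fixed once, and the
    application supplies only the per-task computation."""

    def __init__(self, worker_fn, n_workers=4):
        self.worker_fn = worker_fn      # application-specific part
        self.n_workers = n_workers      # pattern parameter

    def run(self, tasks):
        with ThreadPoolExecutor(max_workers=self.n_workers) as pool:
            # map() preserves task order, so gathering is deterministic
            return list(pool.map(self.worker_fn, tasks))

# Customization means swapping worker_fn, never touching the
# coordination structure — the extensibility point the paper stresses.
squares = MasterWorker(lambda x: x * x).run(range(5))
```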
ISBN (Print): 0818680679
Decision support systems use On-Line Analytical Processing (OLAP) to analyze data by posing complex queries that require different views of data. Traditionally, a relational approach (ROLAP) has been taken to build such systems. More recently, multi-dimensional database techniques (MOLAP) have been applied to decision-support applications. Data is stored in multidimensional arrays, which is a natural way to express the multi-dimensionality of the enterprise and is better suited for analysis. Precomputed aggregate calculations in a data cube can provide efficient query processing for OLAP applications. In this paper we present algorithms and results for in-memory data cube construction on distributed memory machines.
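The paper's distributed-memory algorithms are not described in the abstract; the sketch below only shows what a data cube precomputes — one SUM aggregate table (cuboid) for every subset of the dimensions — on a toy in-memory dataset. The row/dimension names are illustrative, not from the paper.

```python
from itertools import combinations
from collections import defaultdict

def data_cube(rows, dims, measure):
    """Precompute SUM aggregates for every subset of `dims` — the 2^d
    cuboids of the data cube. Each cuboid maps a tuple of dimension
    values to the summed measure; the empty subset is the grand total."""
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            agg = defaultdict(int)
            for r in rows:
                agg[tuple(r[d] for d in group)] += r[measure]
            cube[group] = dict(agg)
    return cube

rows = [
    {"store": "A", "item": "x", "sales": 3},
    {"store": "A", "item": "y", "sales": 5},
    {"store": "B", "item": "x", "sales": 2},
]
cube = data_cube(rows, ("store", "item"), "sales")
# cube[()] holds the grand total; cube[("store",)] the per-store totals.
```

An OLAP query such as "total sales per store" then becomes a dictionary lookup instead of a scan, which is the point of precomputation.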
ISBN (Print): 0818681187
An important development in cluster computing is the availability of multiprocessor workstations. These are able to provide additional computational power to the cluster without increasing network overhead, and allow multiparadigm parallelism, which we define to be the simultaneous application of both distributed- and shared-memory parallel processing techniques to a single problem. In this paper we compare execution times and speedup of parallel programs written in a pure message-passing paradigm with those that combine message passing and shared-memory primitives in the same application. We consider three basic applications that are common building blocks for many scientific and engineering problems: numerical integration, matrix multiplication, and Jacobi iteration. Our results indicate that the added complexity of combining shared- and distributed-memory programming methods in the same program does not contribute sufficiently to performance to justify the added programming complexity.
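The paper's codes are not shown; as a rough sketch of the decomposition behind its numerical integration benchmark, the example below splits the domain among workers, each integrating its sub-range over shared memory. In the paper's hybrid scheme the outer split would go to separate workstations via message passing; here threads stand in for the shared-memory level purely for illustration (with CPython's GIL they add no real speedup).

```python
from concurrent.futures import ThreadPoolExecutor

def integrate_chunk(f, a, b, n):
    """Composite midpoint rule over [a, b] with n subintervals."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

def integrate_parallel(f, a, b, n=100_000, workers=4):
    """Domain decomposition: each worker integrates one sub-range;
    partial sums are combined at the end (the reduction step that a
    message-passing version would perform with explicit sends)."""
    w = (b - a) / workers
    chunks = [(a + i * w, a + (i + 1) * w) for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda c: integrate_chunk(f, c[0], c[1], n // workers),
                         chunks)
    return sum(parts)

approx = integrate_parallel(lambda x: x * x, 0.0, 1.0)  # exact value is 1/3
```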
Speed of computation, important in real-time applications, can be improved by reducing the number of multiplications and/or by parallel processing. In this paper, we propose a parallel distributed model to implement the 6×6-point DFT. This scheme has the advantage of being a parallel and distributed computation in which each operation is a simple real addition, except at the last layer, where the result is converted to complex form. Since all the computations in a layer can be done in parallel, the time taken to compute one cell value is the same as the time required for that layer; hence the speed of computation is very high. Due to the hierarchical nature of the model, less memory is required. The advantage grows as the order N increases. The model can easily be modified to implement the 2-D DFT for any even value of N.
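The layered addition scheme itself is not specified in enough detail in the abstract to reproduce. For reference, the direct N-point DFT it computes is shown below; this is the multiplication-heavy baseline such a scheme would be compared against, not the paper's method.

```python
import cmath

def dft(x):
    """Direct DFT: X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N).
    Each output needs N complex multiply-adds, i.e. O(N^2) total —
    the cost that addition-based and parallel schemes try to cut."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N))
            for k in range(N)]

impulse = dft([1, 0, 0, 0])   # impulse in, flat spectrum out
constant = dft([1, 1, 1, 1])  # constant in, energy only in bin 0
```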
ISBN (Print): 3540631380
The proceedings contain 23 papers. The special focus of this conference is on discrete algorithms, programming environments, and implementations. The topics include: parallel mesh generation; efficient massively parallel quicksort; practical parallel list ranking; on computing all maximal cliques distributedly; a probabilistic model for best-first search B&B algorithms; programming irregular parallel applications in Cilk; a variant of the biconjugate gradient method suitable for massively parallel computing; efficient implementation of the improved quasi-minimal residual method on massively distributed memory computers; programming with shared data abstractions; supporting run-time parallelization of DO-ACROSS loops on general networks of workstations; engineering diffusive load balancing algorithms using experiments; comparative study of static scheduling with task duplication for distributed systems; a new approximation algorithm for the register allocation problem; improving cache performance through tiling and data alignment; a support for non-uniform parallel loops and its application to a flame simulation code; performance optimization of combined variable-cost computations and I/O; parallel software caches; communication efficient parallel searching; parallel sparse Cholesky factorization; and unstructured graph partitioning for sparse linear system solving.
Software distributed shared memory (DSM) techniques, while effective on applications with coarse-grained sharing, yield poor performance for the fine-grained sharing encountered in applications increasingly relying on sophisticated adaptive and hierarchical algorithms. Such applications exhibit irregular communication patterns unsynchronized with computation, incurring large overheads for synchronous (request-reply) DSM protocols that require responsive processing of coherence messages. We describe a new DSM framework, View Caching, that addresses this problem by utilizing application knowledge of data access semantics to enable the construction of low-overhead, asynchronous coherence protocols. Experiments on the Cray T3D show that view caching enables efficient execution of fine-grained irregular applications, reducing both coherence overheads and idle time to improve performance by up to 35% over a weakly-consistent DSM implementation.
ISBN (Print): 0818678836
The MILLIPEDE system is a small yet powerful interface of a Virtual Parallel Machine (VPM) on top of distributed computing environments. MILLIPEDE is thus a convenient environment for porting various existing parallel programming languages, for the design of new parallel programming languages, and for the development of parallel applications. MILLIPEDE is fully implemented at the Technion on a cluster of PCs running Windows NT. In this paper we briefly describe the MILLIPEDE interface and discuss the implementation issues of several parallel languages.
Current parallelizing compilers for message-passing machines support only a limited class of data-parallel applications. One method for eliminating this restriction is to combine powerful shared-memory parallelizing compilers with software distributed-shared-memory (DSM) systems. We demonstrate such a system by combining the SUIF parallelizing compiler and the CVM software DSM. Innovations of the system include compiler-directed techniques that: 1) combine synchronization and parallelism information communication on parallel task invocation, 2) employ customized routines for evaluating reduction operations, and 3) select a hybrid update protocol that pre-sends data by flushing updates at barriers. For applications with sufficient granularity of parallelism, these optimizations yield very good eight-processor speedups on an IBM SP-2 and a DEC Alpha cluster, usually matching or exceeding the speedup of equivalent HPF and message-passing versions of each program. Flushing updates, in particular, eliminates almost all nonlocal memory misses and improves performance by 13% on average.
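CVM's actual protocol machinery is not described here; the toy model below only illustrates the flush-at-barrier idea in the abstract — writers log their updates, and the barrier pushes (pre-sends) them to every other process's local copy, so post-barrier reads hit locally instead of triggering request-reply misses. The class and its methods are invented for illustration, and real DSM coherence (page faults, diffs, consistency) is elided.

```python
class FlushingDSM:
    """Toy model of a hybrid update protocol that flushes dirty data
    at barriers, replacing synchronous request-reply misses with
    one-way pushes at synchronization points."""

    def __init__(self, n_procs, shared):
        self.copies = [dict(shared) for _ in range(n_procs)]  # per-proc local copy
        self.dirty = [dict() for _ in range(n_procs)]         # per-proc update log

    def write(self, pid, key, value):
        self.copies[pid][key] = value
        self.dirty[pid][key] = value        # log the update for the next flush

    def barrier(self):
        # Pre-send: push every logged update into all other local copies.
        for pid, updates in enumerate(self.dirty):
            for other in range(len(self.copies)):
                if other != pid:
                    self.copies[other].update(updates)
            updates.clear()

    def read(self, pid, key):
        return self.copies[pid][key]        # after a barrier, always a local hit

dsm = FlushingDSM(2, {"x": 0})
dsm.write(0, "x", 7)
stale = dsm.read(1, "x")   # before the barrier, proc 1 still sees 0
dsm.barrier()              # updates flushed to all copies
fresh = dsm.read(1, "x")   # local hit, no request-reply round trip
```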