We propose a high-level approach to programming distributed applications; it is based on the annotation future, by which the programmer specifies which expressions may be evaluated remotely in parallel. We present the CEKDS...
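As a rough illustration of the future idea described above, the sketch below uses Python's standard `concurrent.futures` module. It is not the paper's language or its CEKDS machine: the helper names `fib` and `parallel_sum_of_fibs` are hypothetical, and the work runs on local threads rather than on remote nodes. The point is only the programming model: an expression is handed off for possibly parallel evaluation, and its value is demanded later.

```python
from concurrent.futures import ThreadPoolExecutor


def fib(n):
    """Naive Fibonacci, used only as a stand-in for an expensive expression."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def parallel_sum_of_fibs(ns):
    """Evaluate fib(n) for each n as a future, then combine the results."""
    with ThreadPoolExecutor() as pool:
        # submit() returns immediately with a future for each expression.
        futures = [pool.submit(fib, n) for n in ns]
        # result() blocks until the corresponding value is available.
        return sum(f.result() for f in futures)


if __name__ == "__main__":
    print(parallel_sum_of_fibs([10, 15, 20]))
```

The caller's code has the same shape as the sequential version; only the points where values are created and demanded are marked.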
ISBN: (print) 0818672552
Blocking locks are commonly used in parallel programs to improve application performance and system throughput. However, most implementations of such locks suffer from two major problems - latency and scalability. In this paper, we propose an implementation of blocking locks using scheduler adaptation which exploits the interaction between thread schedulers and locks. By experimentation using well-known multiprocessor applications on a KSR2 multiprocessor, we demonstrate how such an implementation considerably reduces the latency and improves the scalability of blocking locks.
The number of parallel and distributed programming languages available is enormous. This means that users face a difficult task in selecting the language that suits their application. Most of the facilities supported by new languages could, however, be introduced as simple extensions of existing languages. In object-oriented languages, where reusability and extensibility are natural parts of the language, extensions can easily be made by creating frameworks that provide the abstraction level required by the user. In this paper, we present a framework that supports the dynamic reconfiguration of processes. C++CL is a framework developed in C++ and layered above PVM. This means that dynamically reconfigurable systems are easily built in a C++ fashion following the CL approach. At the same time, by using PVM, we improve the portability of the framework over heterogeneous networks and can also use all tools available for tracing and debugging PVM parallel processes. The paper then shows how the framework can be extended to provide special classes of programs.
One of the major impediments to the widespread use of large-scale, distributed memory multiprocessors is the difficulty of efficiently partitioning and mapping application algorithms onto these machines so as to extract a large portion of the machines' peak performance. In this paper, we present the preliminary accomplishments of an ongoing effort aimed at automating the complex tasks of software partitioning and mapping during the system definition phase of application development for distributed memory multiprocessors. We describe a technique called the Augmented Task Dependency Graph (ATDG) for representing the high-level design of the application software. The ATDG allows one to express functional parallelism as well as data parallelism in a manner that facilitates automated partitioning and mapping. We propose a new strategy for searching the space of possible design choices for partitioning and mapping. The proposed approach, called hierarchical hybrid search, organizes the search space as a hierarchy of sub-spaces and permits different search techniques to be used in different sub-spaces. Examples of search techniques that could be employed in the proposed approach include hill-climbing, simulated annealing, and genetic algorithms.
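To make one of the search techniques named above concrete, here is a minimal hill-climbing sketch for mapping tasks onto processors. It is not the ATDG representation or the hierarchical hybrid search from the paper. The cost model (largest processor load plus the total weight of communication edges that cross processors) and the names `mapping_cost`, `hill_climb`, `work`, and `comm` are assumptions made purely for illustration.

```python
def mapping_cost(assignment, work, comm, num_procs):
    """Assumed cost: heaviest processor load plus weight of cut edges."""
    load = [0] * num_procs
    for task, proc in enumerate(assignment):
        load[proc] += work[task]
    cut = sum(w for (a, b), w in comm.items()
              if assignment[a] != assignment[b])
    return max(load) + cut


def hill_climb(work, comm, num_procs):
    """Move one task at a time while any single move lowers the cost."""
    assignment = [0] * len(work)          # start with everything on processor 0
    best = mapping_cost(assignment, work, comm, num_procs)
    improved = True
    while improved:
        improved = False
        for task in range(len(work)):
            keep = assignment[task]       # best placement found so far for this task
            for proc in range(num_procs):
                if proc == keep:
                    continue
                assignment[task] = proc
                cost = mapping_cost(assignment, work, comm, num_procs)
                if cost < best:
                    best, keep, improved = cost, proc, True
            assignment[task] = keep       # commit the best placement
    return assignment, best


if __name__ == "__main__":
    work = [4, 4, 4, 4]                   # compute weight per task
    comm = {(0, 1): 1, (2, 3): 1}         # communication weight per task pair
    print(hill_climb(work, comm, num_procs=2))
```

Hill-climbing stops at the first local minimum. Simulated annealing and genetic algorithms trade extra search time for a chance of escaping such minima, which is one reason to use different techniques in different sub-spaces.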
ISBN: (print) 1864352094
This paper presents an approach to texture segmentation by thresholding, using compactness measures of fuzzy sets to determine the thresholds of an ill-defined image. Extending fuzziness to the texture feature space provides more meaningful results than considering fuzziness in the gray-scale domain. The effectiveness of the algorithm is demonstrated by comparison with traditional non-fuzzy methods and with the controversial fuzzy method in gray scale alone. In addition, the efficiency of our algorithm is further improved by a parallel implementation on distributed shared memory workstations.
The distance transform (DT) and the medial axis transform (MAT) are two important image operations. Both are used to extract information about the shape and position of the foreground pixels relative to each other. These transforms have many applications in image processing and computer vision, such as expanding, shrinking, thinning, and computing shape factors. Each of the two transforms is essentially a global operation, and unless the digital image is very small, global operations are prohibitively costly. Efficient computation of these transforms therefore calls for parallel algorithms. In this paper, we provide the fastest parallel algorithms to compute the chessboard distance transform (CDT), which is a DT based on the chessboard metric, and the medial axis transform (MAT). Each of the transforms of a 2-D binary image array of size N×N can be computed in O(1) time on the 2-D 2N×2N RAP.
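For readers unfamiliar with the chessboard distance transform, the sketch below computes it with the classic sequential two-pass raster scan. This is not the O(1) parallel RAP algorithm of the paper. It only shows what the transform produces, assuming that pixels with value 0 are background and that each output value is the chessboard distance to the nearest background pixel. The function name is hypothetical.

```python
def chessboard_distance_transform(image):
    """Chessboard distance from every pixel to the nearest 0 (background) pixel.

    Sequential two-pass scan. If the image has no background pixel, every
    entry stays at the sentinel value rows + cols.
    """
    rows, cols = len(image), len(image[0])
    inf = rows + cols  # exceeds any chessboard distance inside the image
    dist = [[0 if image[r][c] == 0 else inf for c in range(cols)]
            for r in range(rows)]

    def relax(r, c, offsets):
        # Each of the 8 neighbours is one chessboard step away.
        for dr, dc in offsets:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                dist[r][c] = min(dist[r][c], dist[nr][nc] + 1)

    # Forward pass: neighbours already visited in raster order.
    for r in range(rows):
        for c in range(cols):
            relax(r, c, [(-1, -1), (-1, 0), (-1, 1), (0, -1)])

    # Backward pass: neighbours already visited in reverse raster order.
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            relax(r, c, [(1, 1), (1, 0), (1, -1), (0, 1)])

    return dist


if __name__ == "__main__":
    image = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    for row in chessboard_distance_transform(image):
        print(row)
```

The two scans touch every pixel, so the sequential cost grows with the image area. That cost is what parallel algorithms for this transform aim to avoid.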
PEC (Packed Exponential Connections) is a scalable interconnection network for parallel systems that meets many requirements for a large range of system sizes, i.e. from 16 to over 1,000,000 processors. A scalable network architecture must meet the following criteria: have a low average and maximum diameter to avoid communication latency; minimize routing contention; have a constant number of ports per node and a simple wire layout to allow for expansion; be inherently fault tolerant; be subdividable for disjoint multi-user applications; and be able to handle a large range of algorithm implementations without adding undue overhead. Our initial research has shown that PEC can meet each of the above criteria within acceptable limits. Further investigation is required to confirm these preliminary results.
Isolating computation and communication concerns into separate pure computation and pure coordination modules enhances modularity, understandability, and reusability of parallel and/or distributed software. MANIFOLD i...
The discrete cosine transform (DCT) is a key step in many image and video-coding applications, and its efficient implementation has been extensively studied for software implementations and for custom VLSI. In this paper, we discuss how the partial distributed arithmetic algorithm can be applied to the efficient implementation of the DCT in reconfigurable logic circuits, which form the core of a custom computer. Early calculations show a performance improvement of about 5 times for DCT-based implementations on an FPGA compared to conventional arithmetic.
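The abstract does not spell out the partial distributed arithmetic algorithm, so the sketch below shows only plain distributed arithmetic, the basic idea it builds on: an inner product with fixed coefficients, such as one DCT output, is computed without multipliers by looking up precomputed coefficient sums, one input bit-plane at a time. The integer coefficients, the 8-bit two's-complement input width, and the names `build_lut` and `da_inner_product` are assumptions for illustration, not details taken from the paper.

```python
def build_lut(coeffs):
    """lut[mask] = sum of coeffs[k] over all k whose bit is set in mask."""
    lut = [0] * (1 << len(coeffs))
    for mask in range(1, len(lut)):
        low = mask & -mask                 # lowest set bit of mask
        k = low.bit_length() - 1           # index of that bit
        lut[mask] = lut[mask ^ low] + coeffs[k]
    return lut


def da_inner_product(lut, xs, bits):
    """Compute sum(c_k * x_k) for two's-complement inputs of width `bits`."""
    acc = 0
    for j in range(bits):
        # Gather bit j of every input into one table address.
        mask = 0
        for k, x in enumerate(xs):
            mask |= ((x >> j) & 1) << k
        term = lut[mask] * (1 << j)        # a shift in hardware
        # The sign bit of a two's-complement number carries negative weight.
        acc += -term if j == bits - 1 else term
    return acc


if __name__ == "__main__":
    coeffs = [3, -5, 7, 2]                 # hypothetical fixed-point coefficients
    xs = [10, -3, 0, -128]                 # 8-bit signed inputs
    lut = build_lut(coeffs)
    print(da_inner_product(lut, xs, bits=8))
    print(sum(c * x for c, x in zip(coeffs, xs)))
```

The table has 2^K entries for K coefficients. This maps naturally onto FPGA lookup tables, and the growth of the table with K is a common reason to split a long inner product across several smaller tables.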
A performance prediction method is presented, which accurately predicts the expected program execution time on massively parallel systems. We consider distributed-memory architectures with SMD nodes and a fast communi...