distributed C++ (DC++) is a language for writing parallel applications on loosely coupled distributed systems in C++. Its key idea is to extend the C++ class into 3 categories: gateway classes which act as communicati...
详细信息
distributed C++ (DC++) is a language for writing parallel applications on loosely coupled distributed systems in C++. Its key idea is to extend the C++ class into 3 categories: gateway classes which act as communication and synchronization entry points between abstract processors, classes whose instances may be passed by value between abstract processors via gateways, and vanilla C++ classes. DC++ code is compiled to C++ code with calls to the DC++ runtime system. the DC++ compiler wraps gateway classes with handle classes so that remote procedure calls are transparent. It adds static variables to value classes and produces code which is used to marshal and unmarshal arguments when these value classes are used in remote procedure calls. Value classes are deep copied and preserve structure sharing. this paper shows DC++ compilation and performance.< >
A new approach for dynamic submesh allocation in mesh-connected multiprocessor system, which supports a multi-user environment, is proposed. the proposed strategy effectively prunes the search space by searching for f...
详细信息
A new approach for dynamic submesh allocation in mesh-connected multiprocessor system, which supports a multi-user environment, is proposed. the proposed strategy effectively prunes the search space by searching for free submeshes on the corners of allocated submeshes along withthe corners of the mesh system. A submesh is selected withthe potential of causing the least amount of fragmentation in the system. the proposed strategy possesses complete submesh recognition capability; it is a best-fit strategy, as well. Existing strategies do not provide this combination of capabilities. the deallocation time and memory overhead is shown to be constant in that it does not grow withthe size of the mesh. Simulation results indicate that the proposed strategy outperforms the existing ones in terms of parameters such as average delay in honoring a request, standard deviation of the delays, average allocation time, average deallocation time, and the amount of memory required.< >
In this paper, we propose a new parallel genetic algorithm (GA), called extended distributed genetic algorithm (EDGA) for channel routing problem. the EDGA combines the advantages of previous parallel GA models, viz.,...
详细信息
In this paper, we propose a new parallel genetic algorithm (GA), called extended distributed genetic algorithm (EDGA) for channel routing problem. the EDGA combines the advantages of previous parallel GA models, viz., master/slave GA model and distributed GA model. In EDGA, the root processor executes the conventional genetic algorithm with global selection on total population and the remaining processors execute conventional genetic algorithm with local selection on subpopulations. After certain number of generations, the total population on the root processor and the subpopulations on the remaining processors are unchanged, and the process is repeated till terminating conditions are reached. this incorporates features of both global and local selection in the proposed EDGA. the EDGA is designed to obtain good speedup, global optimal solution, and full utilization of the parallel system. We have implemented master/slave GA, distributed GA, and the proposed EDGA in C on a transputer-based parallel MIMD machine and compared their performance. It is found that the EDGA achieves higher speedup than both master/slave GA, and distributed GA.< >
this paper presents efficient techniques for identifying opportunities to increase parallelism when the asynchronous remote procedure call model of parallel execution is applied to programs constructed from abstract d...
详细信息
this paper presents efficient techniques for identifying opportunities to increase parallelism when the asynchronous remote procedure call model of parallel execution is applied to programs constructed from abstract data type (ADT) modules. Frequently, an ADT module is needed simultaneously by multiple clients, thus causing contention. To decrease queueing delays to execute module operations, cloning of modules is often useful. the main contribution of this work is a heuristic to efficiently determine, for each module used in a program, an upper bound on the number of clones of that module that can be used concurrently. the results include transformation rules for conditionals, so that the clone analysis algorithm avoids considering an exponential number of paths through the program. Additionally, a technique is presented to determine the number of clones required each time a loop is unrolled. Furthermore, analysis is given to consider the effects of intermodule dependence relationships on the sharing of clones.< >
Because most of the recognizable queries in deductive databases can be transformed into transitive-closure (TC) problem, the development of efficient algorithms to process the different forms of TC problems within the...
详细信息
We present parallel techniques on hypercubes for solving optimally a class of polygon problems. We thus obtain optimal O(log n)-time, n-processor hypercube algorithms for the problems of computing the portions of an n...
详细信息
We present parallel techniques on hypercubes for solving optimally a class of polygon problems. We thus obtain optimal O(log n)-time, n-processor hypercube algorithms for the problems of computing the portions of an n-vertex simple polygonal chain C that are visible from a given source point, computing the convex hull of C, testing an n-vertex simple polygon P for monotonicity, and other related problems as well. Previously it was not known how to achieve these complexity bounds on hypercubes, one of the main difficulties being that there is no known optimal sorting hypercube algorithm that achieves these bounds. In fact these are the first optimal geometric hypercube algorithms that do not assume that the input is given already sorted by x or y coordinates. the hypercube model we use is the standard one, with O(1) local memory per processor, and with one-port communication.< >
Recursive Diagonal Torus (RDT), a class of interconnection network is proposed for massively parallel computers with up to 2/sup 16/ nodes. By adding remote links to the diagonal directions of the torus network recurs...
详细信息
Recursive Diagonal Torus (RDT), a class of interconnection network is proposed for massively parallel computers with up to 2/sup 16/ nodes. By adding remote links to the diagonal directions of the torus network recursively, the RDT can realize a smaller diameter (e.g., it is 11 for 2/sup 16/ nodes) with smaller number of links per node (i.e., 8 links per node) than that of the hypercube. A simple routing algorithm called vector routing, which is near-optima and easy to implement is also proposed. the RDT comprises the mesh structure, and emulates hypercube and tree structures easily. FFT and the bitonic sorting algorithm are also easy to implement.< >
A variation of the hypercube, called the incomplete hypercube, allows any sized construction, providing for better incremental flexibility. In this paper, traffic density over links in an incomplete hypercube with arb...
详细信息
A variation of the hypercube, called the incomplete hypercube, allows any sized construction, providing for better incremental flexibility. In this paper, traffic density over links in an incomplete hypercube with arbitrary size is analyzed for the first time. Despite its structural non-homogeneity, an incomplete hypercube is shown to exhibit bounded link traffic density (i.e., /spl les/ 2 messages per link per cycle) under the uniform message distribution, independent of the system size. As a result, it is easily achievable to build an incomplete hypercube with sufficient link communication capability that avoids any potential points of congestion, ensuring high performance. the incomplete hypercube thus appears to be an attractive and practical candidate topology for interconnecting parallel systems, since it shares every advantage of complete hypercubes while eliminating the restriction on system sizes.< >
this paper presents an approach to parallelprocessing of test generation for logic circuits in a loosely-coupled distributed network of general purpose computers. We first analyze the relation between the number of p...
详细信息
暂无评论