The computation of the autocorrelation matrix is used heavily in several areas, including signal and image processing, where parallel and application-specific architectures are increasingly used. An efficient scheme for computing the autocorrelation matrix on parallel architectures is therefore highly beneficial. In this paper, a parallel algorithm for computing the autocorrelation matrix on a 2-D mesh is presented. The computation requirements for the elements of the autocorrelation matrix are highly skewed, and the proposed algorithm attempts to balance the computation load without requiring an external load-balancing algorithm or processor. In this sense, the load balancing is embedded within the algorithm. The exact number of computation steps is derived. The time complexity of the proposed algorithm is shown to be within twice the optimal (lower bound). It is also shown to have twice the speedup of a straightforward parallel algorithm.
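The skew in per-element work can be illustrated with a short sequential sketch. It assumes the common unnormalized definition r[k] = sum over n of x[n] * x[n+k], which the abstract does not state, and the function names are hypothetical. The pairing of lag k with lag N-1-k is only one simple way to even out the work; it is not the mesh algorithm proposed in the paper.

```python
def autocorrelation_lags(x):
    """Return r[k] = sum_{i=0}^{N-1-k} x[i] * x[i+k] for k = 0..N-1 (assumed definition)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(n)]


def autocorrelation_matrix(x):
    """Build the symmetric Toeplitz matrix R[i][j] = r[|i - j|]."""
    r = autocorrelation_lags(x)
    n = len(x)
    return [[r[abs(i - j)] for j in range(n)] for i in range(n)]


def multiplications_per_lag(n):
    """Lag k needs n - k multiplications, so the work per element is skewed."""
    return [n - k for k in range(n)]


def paired_workloads(n):
    """Illustrative balancing: pair lag k with lag n-1-k (each pair costs n + 1)."""
    work = multiplications_per_lag(n)
    return [work[k] + work[n - 1 - k] for k in range(n // 2)]
```

For a signal of length 4, the per-lag costs are 4, 3, 2, 1 multiplications, while the paired costs are equal, which is the kind of imbalance an embedded load-balancing scheme has to absorb.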
ISBN: 0818682596 (print)
We propose an adaptive processor allocation strategy for large mesh-connected systems, based on manipulating the shape of the required submesh. When an incoming job requests a rectangular submesh, our strategy first tries to allocate a conventional rectangular submesh, using 90-degree rotation and folding techniques. If that fails, our strategy then tries to allocate a more flexible and robust L-shaped submesh instead of signaling an allocation failure. Thus, our strategy accommodates incoming jobs earlier than conventional strategies. Simulation results indicate that our strategy performs more efficiently than other strategies in terms of external fragmentation, job response time, and system utilization. Our strategy is transparent to application programmers and does not require additional hardware support. Moreover, with our L-shaped submesh allocation strategy, application programmers using the mesh-connected system need no longer limit their requests to rectangular submeshes. They can request an L-shaped submesh whose processor count is much closer to what the job actually needs.
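The first stage described above, trying a rectangular submesh and then its 90-degree rotation, can be sketched as a first-fit search over a busy map. This is a minimal illustration under assumptions: the first-fit scan order, the boolean grid representation, and the function names are all hypothetical, and the folding and L-shaped fallback stages of the paper are not modeled.

```python
def find_free_submesh(busy, width, height):
    """Return (row, col) of the first free height x width block, or None."""
    rows, cols = len(busy), len(busy[0])
    for r in range(rows - height + 1):
        for c in range(cols - width + 1):
            if all(not busy[r + dr][c + dc]
                   for dr in range(height) for dc in range(width)):
                return (r, c)
    return None


def allocate(busy, width, height):
    """Try the requested orientation first, then the 90-degree rotation.

    On success, marks the processors busy and returns (row, col, width, height).
    Returns None if neither orientation fits.
    """
    for w, h in ((width, height), (height, width)):
        origin = find_free_submesh(busy, w, h)
        if origin is not None:
            r, c = origin
            for dr in range(h):
                for dc in range(w):
                    busy[r + dr][c + dc] = True
            return (r, c, w, h)
    return None
```

On a 2 x 4 mesh, a request for a 1-wide, 3-tall submesh cannot fit as given, but its rotation (3 wide, 1 tall) does, so the request is served rather than rejected.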
In this paper we consider the determination of allocation functions as part of the design of massively parallel processor arrays for algorithms that can be represented as systems of uniform recurrence equations. The objective is to find allocation functions that minimize the chip area required for a hardware implementation of the processor array. We propose an algorithm that approximately minimizes the number of processors while taking into account the chip area needed to implement the processors of the array. The resulting optimization problems can be solved using integer linear programming.
ISBN: 0818682596 (print)
The MINCUT problem for graphs is to find a linear arrangement with minimum cut. The problem is NP-complete for general graphs but solvable in polynomial time for trees. The PLANAR MINCUT problem does not allow edge crossings in arrangements. We present a parallel algorithm for the PLANAR MINCUT problem for trees with n vertices, which takes O(log^2 n) time and O(n) processors on the EREW PRAM.
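To make the objective concrete, the sketch below evaluates the cut of a given linear arrangement, assuming the usual definition: the maximum number of edges crossing any gap between consecutive positions. The brute-force search is only a reference for tiny inputs; it is not the parallel algorithm of the paper and does not enforce the planarity (no edge crossings) constraint. Function names are hypothetical.

```python
from itertools import permutations


def cut_width(edges, order):
    """Maximum number of edges crossing any gap between consecutive vertices."""
    position = {v: i for i, v in enumerate(order)}
    best = 0
    for gap in range(len(order) - 1):
        # An edge crosses the gap if its endpoints lie on opposite sides of it.
        crossing = sum(
            1 for u, v in edges
            if min(position[u], position[v]) <= gap < max(position[u], position[v])
        )
        best = max(best, crossing)
    return best


def min_cut_bruteforce(vertices, edges):
    """Exhaustive minimum over all arrangements (exponential; tiny inputs only)."""
    return min(cut_width(edges, order) for order in permutations(vertices))
```

For a star with center `c` and three leaves, placing the center at one end yields a cut of 3, while placing it between the leaves yields the minimum of 2.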
The mpC language is an ANSI C superset supporting modular parallel programming for distributed memory machines. It allows the user to specify an application topology dynamically, and the mpC programming environment uses this information at run time to provide the most efficient execution of the program on any particular distributed memory machine. The paper describes the features of mpC and its programming environment that make them suitable for developing libraries of parallel programs.
In this paper we are concerned with the parallel implementation of row-oriented Gram-Schmidt orthogonalization. For data partitioning, four types of columnwise partitioning schemes are considered: column (1-col), block, cyclic, and block-cyclic (b-c) partitioning. Analytical models of the parallel execution time required by these implementations are derived and compared with numerical results. The best partitioning scheme is identified both theoretically and by numerical results.
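The difference between the schemes comes down to which processor owns each column. The sketch below uses the standard textbook mappings for block, cyclic, and block-cyclic distributions; the function names and the block size `b` are assumptions, and the 1-col scheme is left out because the abstract does not define it.

```python
def owner_block(j, n, p):
    """Block: contiguous chunks of ceil(n / p) columns per processor."""
    size = -(-n // p)  # ceiling division
    return j // size


def owner_cyclic(j, p):
    """Cyclic: column j goes to processor j mod p."""
    return j % p


def owner_block_cyclic(j, p, b):
    """Block-cyclic: blocks of b columns are dealt out round-robin."""
    return (j // b) % p


def layout(n, owner):
    """Owner of every column 0..n-1 under the given mapping."""
    return [owner(j) for j in range(n)]
```

With 8 columns and 2 processors, `layout(8, lambda j: owner_block(j, 8, 2))` assigns the first four columns to processor 0, whereas the cyclic mapping alternates owners. This matters for Gram-Schmidt because columns that have already been processed stop generating work, so the mapping affects how evenly the remaining work is spread.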
Techniques for customizing and extending operating systems (OSs) have a growing impact on system architectures in the field of distributed computing and parallel programming. Although traditional methods of adaptation have been limited to the user level, modern OSs cannot do without kernel support. Hence, the concepts and structures of microkernel architectures must be redefined to meet the requirements of today's and future applications. We propose a new customizable low-level OS architecture: the Dycos kernel. We discuss customization demands on microkernels and describe the basic kernel concept. Dycos is an object-based approach providing a toolbox of operations to build user-definable compositions of kernel structures. The Dycos approach has been evaluated on a Solaris 2.5.1 platform.
Through a specific example, the author addresses the question of the overhead incurred by using higher abstraction levels for parallel programming. He develops a simple molecular dynamics application in ALWAN and in MPI, and compares their execution performance on an Intel Paragon machine.
Task scheduling is essential for the proper functioning of parallel processor systems. Scheduling tasks onto networks of parallel processors is an interesting problem that is well defined and documented in the literature. However, most of the available techniques are based on heuristics that solve certain instances of the scheduling problem very efficiently and in reasonable amounts of time. This paper investigates an alternative paradigm, based on genetic algorithms, that can be used to solve the scheduling problem efficiently without the restrictive, problem-specific assumptions that heuristics require. The conditions under which a genetic algorithm performs best are also highlighted, accompanied by a number of examples and case studies.
In this paper we consider the broadcast and gossiping problems in circulant networks. Circulant graphs have been studied extensively as reliable interconnection networks for multiprocessor systems. We consider gossiping in the store-and-forward, full-duplex, and shouting model for the case when communicating nodes can exchange up to a fixed number p of packets in each round of gossiping (p-gossiping). A general method for evaluating lower bounds for p-gossiping in circulant graphs is established. Parallel, decentralized, node-invariant broadcast and p-gossiping algorithms are proposed that provide minimum execution time and minimum message load on the network during message passing.