Author:
Fyodorov, VBRAS
Inst High Performance Comp Syst Moscow 117872 Russia
ISBN:
(print) 0818682596
Three types of optoelectronic architectures are considered for N x N high-performance switching fabrics with multibit, word-parallel data transmission through connected pairs of free-space optical channels. The fabrics differ in their functional capability to realize self-routing, strictly nonblocking, conflict-free networking under arbitrary call requests. The possibility of building such networks from laser and photodetector arrays, smart-pixel structures, free-space optics, lenslet or selfoc lens arrays, and electronic control circuits is discussed.
ISBN:
(print) 0818682596
Various parallel algorithms for efficiently computing the fast Fourier transform (FFT), and implementations of them on multiprocessors and multicomputers, have been developed. In general, a local interconnection network is faster than a global one, but its capability depends on the network architecture. A global interconnection network, on the other hand, is slower, but it does not depend on the network architecture and provides a flexible communication interface to the programmer. In this paper, we discuss parallel radix-R FFT algorithms on a multiprocessor or multicomputer system with a global interconnection network. We propose two algorithms: a stage-by-stage method and a multi-stage method. We also estimate the communication time and show that it is very sensitive to the data exchange strategy. Finally, we implement these algorithms on two commercial massively parallel computers (nCUBE/2 and CM5) and measure their communication times.
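The following is a minimal Python sketch, not the paper's stage-by-stage or multi-stage method. It assumes radix R = 2, N = 2^m points, and a block distribution of N/P consecutive points per processor (all assumptions), computes an iterative decimation-in-frequency FFT one butterfly stage at a time, and reports which stages would need inter-processor exchange. The names `dif_fft`, `bit_reverse` and `communication_stages` are illustrative.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def dif_fft(x):
    """Radix-2 decimation-in-frequency FFT, computed stage by stage (len(x) a power of two)."""
    a = [complex(v) for v in x]
    n = len(a)
    stride = n // 2
    while stride >= 1:
        # one butterfly stage: element i is combined with element i + stride
        for start in range(0, n, 2 * stride):
            for j in range(stride):
                w = cmath.exp(-2j * cmath.pi * j / (2 * stride))
                u, v = a[start + j], a[start + j + stride]
                a[start + j] = u + v
                a[start + j + stride] = (u - v) * w
        stride //= 2
    # DIF leaves the spectrum in bit-reversed order; undo that permutation
    bits = n.bit_length() - 1
    return [a[bit_reverse(i, bits)] for i in range(n)]


def communication_stages(n, p):
    """For each stage of dif_fft, report whether some butterfly pairs points
    held by different processors (assumed: p divides n, block distribution)."""
    block = n // p
    flags = []
    stride = n // 2
    while stride >= 1:
        flags.append(any(i // block != (i ^ stride) // block for i in range(n)))
        stride //= 2
    return flags
```

For N = 16 and P = 4, `communication_stages(16, 4)` gives `[True, True, False, False]`: in this layout only the first log2 P stages cross processor boundaries, and the remaining stages are purely local.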
ISBN:
(print) 0818682596
Task scheduling is essential for the proper functioning of parallel processor systems. Scheduling tasks onto networks of parallel processors is an interesting problem that is well defined and documented in the literature. However, most available techniques are based on heuristics that solve certain instances of the scheduling problem very efficiently and in a reasonable amount of time. This paper investigates an alternative paradigm, based on genetic algorithms, that can solve the scheduling problem efficiently without the restrictive, problem-specific assumptions that heuristics require. The conditions under which a genetic algorithm performs best are also highlighted, accompanied by a number of examples and case studies.
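As a rough illustration of the genetic-algorithm paradigm, the Python sketch below evolves assignments of independent tasks to processors and minimizes the makespan. It is not the paper's algorithm: precedence constraints and communication costs are deliberately left out, and the encoding (one processor index per task), binary tournament selection, one-point crossover, mutation rate, and elitism are all illustrative assumptions.

```python
import random


def makespan(assignment, durations, num_procs):
    """Finish time of the busiest processor when task t runs on processor assignment[t]."""
    loads = [0] * num_procs
    for task, proc in enumerate(assignment):
        loads[proc] += durations[task]
    return max(loads)


def ga_schedule(durations, num_procs, pop_size=30, generations=200,
                mutation_rate=0.1, seed=0):
    """Evolve task-to-processor assignments that minimize makespan (independent tasks only)."""
    rng = random.Random(seed)
    n = len(durations)

    def cost(ind):
        return makespan(ind, durations, num_procs)

    def tournament(pop):
        # binary tournament: the cheaper of two random individuals wins
        a, b = rng.sample(pop, 2)
        return a if cost(a) <= cost(b) else b

    population = [[rng.randrange(num_procs) for _ in range(n)] for _ in range(pop_size)]
    best = min(population, key=cost)
    for _ in range(generations):
        children = [best[:]]  # elitism: keep a copy of the best schedule found so far
        while len(children) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            # one-point crossover
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p1[:cut] + p2[cut:]
            for t in range(n):
                if rng.random() < mutation_rate:
                    child[t] = rng.randrange(num_procs)  # move task t to a random processor
            children.append(child)
        population = children
        best = min(population, key=cost)
    return best, cost(best)
```

Because the best individual is copied into each new generation, the reported makespan never gets worse from one generation to the next.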
parallel applications with inconstant usage patterns presents a big challenge to programmers in that the spawning of tasks and the communication between them may be conditional (named "conditional parallel progra...
ISBN:
(print) 0818682596
This paper proposes a parallel branch-and-bound algorithm for solving the Vehicle Routing Problem (VRP) on NOWs (Networks of Workstations). Our objective is to minimize execution time by using a parallel implementation to find an exact solution to the VRP in real time. Our experimental studies show that the proposed parallel branch-and-bound algorithm can achieve superlinear speedup for large problem sizes. Dynamic load-balancing techniques for solving the VRP on NOWs are also discussed.
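The abstract does not give the VRP formulation or bounding function, so as a hedged sketch of the general idea, the Python code below solves the traveling salesman problem (the single-vehicle, uncapacitated special case of the VRP) by branch and bound, with worker threads drawing search-tree nodes from one shared pool as a simple form of dynamic load balancing. The bound (every city still to be left costs at least its cheapest outgoing edge) and all names are illustrative assumptions; because of CPython's global interpreter lock, the threads show the structure rather than a real speedup.

```python
import math
import queue
import threading


def parallel_bnb_tsp(dist, num_workers=4):
    """Cost of an optimal tour starting and ending at city 0 (dist[i][j] = cost i -> j)."""
    n = len(dist)
    full = (1 << n) - 1
    # cheapest edge leaving each city, used for an admissible lower bound
    min_out = [min((dist[i][j] for j in range(n) if j != i), default=0) for i in range(n)]
    best = [math.inf]  # incumbent tour cost shared by all workers
    lock = threading.Lock()
    pool = queue.Queue()  # shared pool of search-tree nodes: (cost, current city, visited mask)

    def bound(cost, city, visited):
        # the current city and every unvisited city must still be left exactly once
        return cost + min_out[city] + sum(min_out[c] for c in range(n) if not visited >> c & 1)

    def worker():
        while True:
            node = pool.get()
            if node is None:  # stop signal
                pool.task_done()
                return
            cost, city, visited = node
            if visited == full:
                with lock:
                    best[0] = min(best[0], cost + dist[city][0])  # close the tour
            else:
                for nxt in range(n):
                    if not visited >> nxt & 1:
                        child = (cost + dist[city][nxt], nxt, visited | 1 << nxt)
                        with lock:
                            incumbent = best[0]
                        if bound(*child) < incumbent:  # prune hopeless subtrees
                            pool.put(child)
            pool.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    pool.put((0, 0, 1))  # root node: tour starts at city 0
    pool.join()          # returns once every queued node has been processed
    for _ in threads:
        pool.put(None)   # one stop signal per worker
    for t in threads:
        t.join()
    return best[0]
```

Children are queued before their parent is marked done, so `pool.join()` cannot return while any subtree is still unexplored; a stale incumbent read by a worker only weakens pruning and never discards the optimum.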
ISBN:
(print) 0818682596
In this paper we propose a parallel and distributed genetic algorithm (PDGA) for multiprocessor systems with a fixed network topology, in which each processing element carries out genetic operations on its own chromosome set and communicates only with its neighbors (an exchange we call chromosome migration). To investigate the effects of chromosome migration, we run the proposed method on multiprocessor systems with ring, torus, and hypercube topologies for benchmark problem instances. The results show that the ring topology is the most suitable for the proposed parallel and distributed execution, since its topological features help it avoid premature convergence. We show its effectiveness by experimental evaluation.
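The migration step can be pictured with a small Python sketch. The neighbor functions give each processing element's neighbors in ring, 2-D torus, and hypercube topologies, and `migrate` performs one synchronous migration in which every island sends a copy of its best chromosome to each neighbor, where it replaces one of the worst. The replacement policy, the OneMax-style fitness used in the example (sum of bits), and all names are assumptions for illustration, not details from the paper.

```python
def ring_neighbors(i, p):
    """Left and right neighbors of island i on a ring of p islands."""
    return [(i - 1) % p, (i + 1) % p]


def torus_neighbors(i, rows, cols):
    """North, south, west and east neighbors of island i on a rows x cols torus."""
    r, c = divmod(i, cols)
    return [((r - 1) % rows) * cols + c, ((r + 1) % rows) * cols + c,
            r * cols + (c - 1) % cols, r * cols + (c + 1) % cols]


def hypercube_neighbors(i, dim):
    """Islands whose index differs from i in exactly one bit."""
    return [i ^ (1 << d) for d in range(dim)]


def migrate(islands, neighbors, fitness):
    """One synchronous migration step; islands is a list of populations, modified in place."""
    # choose all emigrants before any population changes
    emigrants = [max(pop, key=fitness) for pop in islands]
    for dest, pop in enumerate(islands):
        pop.sort(key=fitness)  # worst chromosomes first
        for slot, src in enumerate(neighbors(dest)):
            if slot < len(pop):
                pop[slot] = list(emigrants[src])  # a copy replaces a worst chromosome
```

Each island should hold more chromosomes than it has neighbors; otherwise incoming migrants can overwrite its own best chromosome.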
ISBN:
(print) 0818682596
In this paper we are concerned with the parallel implementation of row-oriented Gram-Schmidt orthogonalization. Four types of columnwise data partitioning were considered: column (1-col), block, cyclic, and block-cyclic (b-c) partitioning. Analytical models of the parallel execution time of these implementations are derived and compared with numerical results. The best partitioning scheme is identified both theoretically and from the numerical results.
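The four columnwise layouts can be written as owner functions that map column j to a processor. In this hedged Python sketch, 1-col partitioning is taken to mean one column per processor (an assumption), and the helper counts how many columns each processor still has to update at step k, assuming a modified Gram-Schmidt sweep in which step k updates only the columns to the right of column k. All names are illustrative.

```python
def one_col_owner(j):
    """1-col (assumed meaning): column j lives on processor j, so p must equal n."""
    return j


def block_owner(j, n, p):
    """Block: ceil(n / p) consecutive columns per processor."""
    size = -(-n // p)
    return j // size


def cyclic_owner(j, p):
    """Cyclic: columns dealt out round-robin, one at a time."""
    return j % p


def block_cyclic_owner(j, p, b):
    """Block-cyclic: blocks of b columns dealt out round-robin."""
    return (j // b) % p


def active_columns(owner, n, p, k):
    """Columns k+1 .. n-1 still updated at step k, counted per processor."""
    counts = [0] * p
    for j in range(k + 1, n):
        counts[owner(j)] += 1
    return counts
```

With n = 8 and p = 4, at step k = 3 the block layout gives `[0, 0, 2, 2]` while the cyclic layout gives `[1, 1, 1, 1]`, which shows how differently the layouts spread the shrinking work of later steps.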
ISBN:
(print) 0818682596
Consider a pyramid with $n$ levels and a $k$-dimensional hypercube, $0 \le k \le 2n-2$. This paper presents a parallel algorithm for embedding large pyramids into smaller hypercubes with load balancing. When $k$ is even, our algorithm embeds the pyramid into the hypercube with dilation 4, congestion at most $2^{n-k/2}+4$, and load $\lceil 2^{2n-k}/3 \rceil$; when $k$ is odd, it achieves the same dilation and load with congestion $2^{n-(k+1)/2+1}+6$. The algorithm can be performed in $O(k)$-bit time.
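Where the load figure comes from can be seen by counting nodes, assuming the usual pyramid in which level i (i = 0 at the apex) is a 2^i x 2^i mesh; this is a standard definition rather than one stated in the abstract. The base then has 2^{2n-2} nodes, which matches the upper limit k <= 2n - 2. Reading the load in the abstract as ⌈2^{2n-k}/3⌉:

```latex
\begin{aligned}
|V_{\mathrm{pyr}}| &= \sum_{i=0}^{n-1} 4^{i} = \frac{4^{n}-1}{3}, \qquad |V_{\mathrm{cube}}| = 2^{k},\\
\max_{v \in V_{\mathrm{cube}}} \mathrm{load}(v) &\ge \left\lceil \frac{4^{n}-1}{3\cdot 2^{k}} \right\rceil,
\qquad \frac{4^{n}-1}{3\cdot 2^{k}} < \frac{2^{2n-k}}{3}.
\end{aligned}
```

Since 2^{2n-k}/3 is never an integer, ⌈2^{2n-k}/3⌉ exceeds this counting lower bound by at most one.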
ISBN:
(print) 0818677937
In this work we analyze four scheduling algorithms running on top of the nano-threads programming model in a dynamic processor allocation environment. Three of them are well known: uniform-sized chunking, guided self-scheduling, and trapezoid self-scheduling. The fourth, adaptable size chunking, is our proposal. In this environment, applications are automatically decomposed into tasks by a parallelizing compiler that uses the Hierarchical Task Graph to represent the source application. The parallel code is an executable representation of this graph, supported by a user-level library (the nano-threads library). The execution environment includes a user-level process (the CPU manager) that controls the allocation of processors to applications. The analysis of the scheduling algorithms shows that it is possible to provide the library with enough information to adapt quickly to dynamic changes in the processors allocated to the application.
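The three well-known self-scheduling rules differ only in how large a chunk of loop iterations an idle processor takes next. The Python sketch below generates the chunk-size sequences for N iterations and P processors using textbook formulations: fixed chunks of size k, guided self-scheduling taking ⌈R/P⌉ of the R remaining iterations, and trapezoid self-scheduling shrinking linearly from a first chunk of ⌈N/(2P)⌉ down to 1. Rounding details vary between descriptions of these rules, the function names are illustrative, and the paper's adaptable size chunking is not reproduced here.

```python
import math


def uniform_chunks(n, k):
    """Uniform-sized chunking: every chunk has k iterations (the last may be shorter)."""
    return [min(k, n - start) for start in range(0, n, k)]


def guided_self_scheduling(n, p):
    """GSS: each request takes ceil(remaining / p) iterations."""
    chunks, remaining = [], n
    while remaining > 0:
        chunk = -(-remaining // p)  # ceil(remaining / p)
        chunks.append(chunk)
        remaining -= chunk
    return chunks


def trapezoid_self_scheduling(n, p, first=None, last=1):
    """TSS: chunk sizes decrease linearly from `first` to `last`."""
    if first is None:
        first = max(1, math.ceil(n / (2 * p)))
    num_chunks = math.ceil(2 * n / (first + last))
    delta = (first - last) / (num_chunks - 1) if num_chunks > 1 else 0
    chunks, remaining, size = [], n, float(first)
    while remaining > 0:
        chunk = min(remaining, max(last, round(size)))
        chunks.append(chunk)
        remaining -= chunk
        size -= delta  # linear decrement between consecutive chunks
    return chunks
```

For N = 100 and P = 4, guided self-scheduling starts with chunks of 25, 19, 14 and 11 iterations, while trapezoid self-scheduling starts at 13 and shrinks by a constant step.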
ISBN:
(print) 0818682596
The architectural performance gain of microprocessors is approaching saturation because instruction-level parallelism yields only small gains. In this paper, we discuss the design points and some tentative solutions for overcoming this bottleneck, and we propose a processor architecture called Very Large Data Path. This architecture broadens the instruction analysis window to extract ten times the parallel gain of conventional superscalar processors. The paper also discusses the system elements and presents some preliminary evaluation results.