Our main contribution is to present the first known general-case, time- and VLSI-optimal algorithm for convex hull computation on meshes with multiple broadcasting. Specifically, we show that for every choice of a positive constant ε, the convex hull of an arbitrary set of m points in the plane, where n^(1/2+ε) ≤ m ≤ n, input in the first ⌈m/√n⌉ columns of a mesh with multiple broadcasting of size √n × √n, can be computed in Θ(m/√n) time.
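For reference, the problem being solved can be stated with a sequential algorithm. The sketch below is Andrew's monotone-chain algorithm, which runs in O(m log m) time on a single processor; it is not the mesh-with-multiple-broadcasting algorithm of the abstract, and the function names are illustrative only.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (Andrew's monotone chain).

    Collinear points on hull edges are discarded.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:  # build the lower chain left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):  # build the upper chain right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]
```

The mesh algorithm distributes the m points over ⌈m/√n⌉ columns and exploits row and column broadcasts; the sequential version only fixes what the output should be.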
We compare two methods for solving banded linear systems on a hypercube multiprocessor. Both methods are based on Gaussian elimination; they differ in the allocation scheme used to distribute the data among the nodes. We implemented both methods on the Intel iPSC/2 hypercube and discuss the timing and efficiency results obtained on this multiprocessor.
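Both methods share the same sequential kernel. The minimal sketch below shows banded Gaussian elimination without pivoting, assuming a matrix (for example, a diagonally dominant one) for which pivoting is unnecessary; `ml` and `mu` are the lower and upper bandwidths. The function name and the dense list-of-lists storage are assumptions for illustration, and the hypercube allocation schemes compared in the paper are not modeled.

```python
def solve_banded(A, b, ml, mu):
    """Solve A x = b by Gaussian elimination without pivoting.

    A is a dense n x n list of lists whose nonzeros lie within ml sub-diagonals
    and mu super-diagonals. All loops are restricted to the band, so the cost is
    O(n * ml * mu) instead of O(n^3); without pivoting no fill-in leaves the band.
    """
    n = len(A)
    A = [row[:] for row in A]  # work on copies so the inputs are not modified
    b = b[:]
    # Forward elimination: only rows k+1 .. k+ml have nonzeros below the pivot.
    for k in range(n):
        for i in range(k + 1, min(k + ml + 1, n)):
            factor = A[i][k] / A[k][k]
            for j in range(k, min(k + mu + 1, n)):
                A[i][j] -= factor * A[k][j]
            b[i] -= factor * b[k]
    # Back substitution: row i has nonzeros only in columns i .. i+mu.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * x[j] for j in range(i + 1, min(i + mu + 1, n)))
        x[i] = (b[i] - s) / A[i][i]
    return x
```

For a tridiagonal system (`ml = mu = 1`) each elimination step touches only one row below the pivot, which is the kind of locality a row or block allocation across nodes tries to exploit.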
We present a garbage-collection algorithm suitable for loosely-coupled multiprocessor systems, in which the processing elements (PEs) share only the communication medium. The algorithm is global, i.e., it involves all the PEs in the system. It supports space compaction and uses a system-wide marking phase to mark all accessible objects; the object graphs are traced with a combination of parallel breadth-first and depth-first strategies, governed by a decentralized credit mechanism that regulates the number of garbage-collection messages in the system. The credit mechanism is crucial for determining the space requirement of the garbage-collection messages. A variation of this algorithm is also presented for systems with high locality of reference: each PE first performs its own local garbage collection and invokes the global garbage collection only when the space freed by the local collector is insufficient.
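The abstract does not specify how the credits are used. One well-known way to attach credit to marking messages is weight throwing (credit recovery): a total credit of 1 is split among outstanding messages and returned when a message produces no further work, so the marking phase is known to be over exactly when all credit has come back. The single-process simulation below illustrates that assumed scheme with exact fractions; it traces breadth-first only, it does not model PEs, message space bounds, or compaction, and the function name is hypothetical.

```python
from collections import deque
from fractions import Fraction


def mark_with_credit(graph, roots):
    """Simulate a credit-controlled marking (tracing) phase over an object graph.

    graph maps each object id to the ids it references. A total credit of 1 is
    split among the root messages. A message that marks a new object with
    references splits its credit among the messages it sends; any other message
    returns its credit to the collector. Marking is complete exactly when all
    credit has been returned. Returns (marked objects, messages sent).
    """
    if not roots:
        return set(), 0
    marked = set()
    returned = Fraction(0)
    share = Fraction(1, len(roots))
    pending = deque((r, share) for r in roots)  # in-flight marking messages
    sent = len(roots)
    # Invariant: credit in pending messages + returned credit == 1.
    while returned < 1:  # termination decided by credit, not by a global scan
        obj, credit = pending.popleft()
        children = [] if obj in marked else graph.get(obj, [])
        marked.add(obj)
        if children:
            part = credit / len(children)
            pending.extend((child, part) for child in children)
            sent += len(children)
        else:
            returned += credit
    return marked, sent
```

Objects never reached (here any object not in `marked`) are garbage. Exact `Fraction` arithmetic avoids the underflow that repeated halving of floating-point credit would eventually cause.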
Designing efficient parallel algorithms for a message-based parallel computer requires considering both time-space tradeoffs and computation-communication tradeoffs. To balance these tradeoffs and achieve optimal performance of an algorithm, one has to consider design parameters such as the number of processors required and the size of the partitions. In this paper, we demonstrate that, for certain data-parallel algorithms, these design parameters can be determined analytically. As a basis for the discussion that follows, a simple model of the NCUBE hypercube computer is introduced. Using this model, we take two examples, array summation and matrix multiplication, and show how their performance can be modeled. By optimizing the resulting performance expressions, one can determine the design parameters that lead to efficient execution. Experiments on a 64-node NCUBE verified the accuracy of the analytic results and further support the discussion.
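The abstract does not give the NCUBE model itself. As a hedged illustration, assume a simple cost model for array summation: each of p = 2^d nodes adds its ⌈n/p⌉ local values, then partial sums are combined along the d cube dimensions, each step costing a message start-up `t_startup`, a one-word transfer `t_word`, and one addition `t_add`. All parameter names and values below are hypothetical, not NCUBE measurements.

```python
import math


def summation_time(n, p, t_add, t_startup, t_word):
    """Modeled time to sum n numbers on p = 2**d hypercube nodes (hypothetical model).

    Local phase: ceil(n/p) - 1 additions per node.
    Combine phase: d = log2(p) steps, each one message plus one addition.
    """
    d = p.bit_length() - 1  # log2(p) for a power of two
    local = (math.ceil(n / p) - 1) * t_add
    combine = d * (t_startup + t_word + t_add)
    return local + combine


def best_processor_count(n, max_dim, **costs):
    """Return the p = 2**d (0 <= d <= max_dim) that minimizes the modeled time."""
    return min((2 ** d for d in range(max_dim + 1)),
               key=lambda p: summation_time(n, p, **costs))
```

Treating p as continuous, T(p) ≈ n·t_add/p + log2(p)·C with C = t_startup + t_word + t_add is minimized at p ≈ n·t_add·ln 2 / C. With n = 1024, t_add = 1, t_startup = 100, t_word = 1 this gives p ≈ 7, and the search over a 64-node cube returns 8: using more nodes would only add communication cost.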
In this paper, an asynchronous gossip algorithm with a restart strategy is proposed for approximate distributed minimax optimization. The restart strategy controls the step length of a subgradient method and resets a ...
The goal of this paper is to develop a grid-characteristic method for high-performance computer systems, implemented on unstructured hierarchical tetrahedral meshes with multiple time stepping and high-order interpolation, including interpolation with a limiter, piecewise parabolic interpolation, and monotone interpolation. The method is designed for simulating complex three-dimensional dynamical processes in heterogeneous media. It incorporates accurately stated contact conditions and produces physically correct solutions to problems in seismology and seismic exploration. Hierarchical meshes make it possible to take numerous inhomogeneous inclusions (cracks, cavities, etc.) into account and to solve problems in a realistic formulation. Because the grid-characteristic method permits multiple time stepping, the computation time is considerably reduced and the efficiency of the method is increased. The method is parallelized on a computer cluster with optimal use of system resources.
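The core idea of a grid-characteristic method, transporting values along characteristics and interpolating the old solution at the foot point, can be shown in one dimension. The sketch below is a simplified stand-in, not the paper's scheme: it treats only the scalar advection equation u_t + a·u_x = 0 with a > 0 on a uniform periodic grid, uses quadratic interpolation, and applies a min/max clipping limiter as one simple way to keep the interpolation monotone. The function name is hypothetical.

```python
def characteristic_step(u, a, dx, dt):
    """One grid-characteristic step for u_t + a*u_x = 0 (a >= 0) on a periodic grid.

    The solution is constant along characteristics x - a*t = const, so the new
    value at node i equals the old solution at the foot point x_i - a*dt. That
    value is obtained by quadratic interpolation through nodes i-2, i-1, i and
    then clipped to the range of the two nodes bracketing the foot point
    (a simple monotonicity limiter). Requires c = a*dt/dx in [0, 1].
    """
    c = a * dt / dx
    if not 0 <= c <= 1:
        raise ValueError("Courant number must lie in [0, 1]")
    # Quadratic Lagrange weights for nodes at offsets -2, -1, 0 evaluated at -c.
    w_m2 = -c * (1 - c) / 2
    w_m1 = c * (2 - c)
    w_0 = (2 - c) * (1 - c) / 2
    n = len(u)
    new = []
    for i in range(n):
        um2, um1, u0 = u[(i - 2) % n], u[(i - 1) % n], u[i]
        value = w_m2 * um2 + w_m1 * um1 + w_0 * u0
        lo, hi = min(um1, u0), max(um1, u0)
        new.append(min(max(value, lo), hi))  # limiter: no new extrema
    return new
```

With c = 1 the foot point lands exactly on node i-1 and the step is an exact shift; for a step profile the unlimited quadratic would overshoot to 1.125, which the limiter clips back to 1.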
Dictionary compression belongs to the class of lossless compression methods and is mainly used for compressing text files. The best-known examples of this technique are the algorithms of the LZ coding family, whose common feature is the use of an adaptive dictionary that is adjusted dynamically during execution. In this paper, we present a parallel algorithm for one of these coding algorithms, namely LZ77 coding, also known as sliding-window coding. We also present a parallel algorithm for the corresponding LZ77 decoding. Although PRAM algorithms exist for various dictionary compression methods, their rather irregular structure has discouraged their implementation on practical interconnection networks such as the mesh and the hypercube. In the case of LZ77 coding and decoding, however, we show how to exploit specific properties of the algorithm to achieve an efficient implementation on the hypercube. Specifically, we show how to encode an N-character string on an N-node hypercube in only O(log² N) time. In contrast, a naive simulation of a PRAM algorithm for LZ77 coding on the hypercube would have O(log³ N) complexity. In addition, we further improve the performance of our parallel algorithms by using some known heuristics from the field of text compression.
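As a sequential reference for what the parallel algorithms compute, the sketch below shows a greedy LZ77 encoder that emits (offset, length, next character) triples and the matching decoder. The triple format, window size, and function names are assumptions for illustration; the brute-force match search is O(N · window) and is meant for clarity, and the hypercube parallelization is not shown.

```python
def lz77_encode(data, window=4096, max_len=255):
    """Greedy sliding-window LZ77: return a list of (offset, length, next_char)."""
    i, n, out = 0, len(data), []
    while i < n:
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # Matches may run past position i (overlapping copies are allowed);
            # stop one short of the end so a next character always exists.
            while (length < max_len and i + length < n - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz77_decode(triples):
    """Rebuild the string; copying one character at a time handles overlaps."""
    buf = []
    for off, length, ch in triples:
        start = len(buf) - off
        for k in range(length):
            buf.append(buf[start + k])
        buf.append(ch)
    return "".join(buf)
```

For example, "abababab" encodes as [(0, 0, 'a'), (0, 0, 'b'), (2, 5, 'b')]: the third triple copies five characters from two positions back, overlapping the text it is producing, which is the sequential dependency a parallel decoder has to break.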
We investigate the layer undulations that appear in smectic A liquid crystals when a magnetic field is applied in the direction parallel to the smectic layers. In an earlier work (Garcia-Cervera and Joo in J Comput Theor Nanosci 7:795-801, 2010), the authors characterized the critical field using the Landau-de Gennes model for smectic A liquid crystals. In this paper, we obtain an asymptotic expression for the unstable modes using Γ-convergence theory, together with a sharp estimate of the critical field. Under the assumption that the layers are fixed at the boundaries, the maximum layer undulation occurs in the middle of the cell and the displacement amplitude decreases near the boundaries. Our estimate of the critical field is consistent with the Helfrich-Hurault theory. When natural boundary conditions are considered, the displacement amplitude does not diminish near the boundary, in sharp contrast with the Dirichlet case, and the critical field is lower than the one calculated in the classical theory. This is consistent with the experiments carried out by Ishikawa and Lavrentovich (Phys Rev E 63:030501(R), 2001). Furthermore, we prove the existence and stability of solutions to the nonlinear system of the Landau-de Gennes model using bifurcation theory. Numerical simulations are used to illustrate the predictions of the analysis.
The application of associative processors to the solution of the maximal flow problem is investigated. To take maximum advantage of the capabilities of associative processors, a new algorithm based on a matrix representation is developed. The new algorithm is then compared with the associative version of the Ford and Fulkerson labeling method. The comparison is based on the total associative memory access time each algorithm requires to solve the problem when running on an associative processor. Results show that the ratio of the labeling algorithm's access time to that of the new algorithm is about 3 for a dense network with 5 nodes. This ratio increases as the number of nodes increases and decreases as the density of the network decreases.
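For reference, the baseline in this comparison can be sketched sequentially. The code below is a minimal Ford and Fulkerson labeling method on an n × n capacity matrix: each pass labels the nodes reachable from the source in the residual network and augments along the labeled path to the sink. Labeling in breadth-first order (the Edmonds-Karp choice) is an assumption here, the function name is hypothetical, and neither associative-processor version is modeled.

```python
from collections import deque


def max_flow_labeling(cap, s, t):
    """Ford-Fulkerson labeling method on a capacity matrix cap (requires s != t)."""
    assert s != t
    n = len(cap)
    flow = [[0] * n for _ in range(n)]  # skew-symmetric: flow[v][u] == -flow[u][v]
    total = 0
    while True:
        # Labeling phase: label[v] is the predecessor of v on an augmenting path.
        label = [None] * n
        label[s] = s
        queue = deque([s])
        while queue and label[t] is None:
            u = queue.popleft()
            for v in range(n):
                if label[v] is None and cap[u][v] - flow[u][v] > 0:
                    label[v] = u
                    queue.append(v)
        if label[t] is None:
            return total  # sink cannot be labeled: the flow is maximal
        # Bottleneck residual capacity along the labeled path.
        delta, v = float("inf"), t
        while v != s:
            u = label[v]
            delta = min(delta, cap[u][v] - flow[u][v])
            v = u
        # Augment; the reverse update creates residual back-edges.
        v = t
        while v != s:
            u = label[v]
            flow[u][v] += delta
            flow[v][u] -= delta
            v = u
        total += delta
```

The matrix form makes each labeling pass a scan over rows of `cap` and `flow`, which is the kind of regular access pattern an associative memory can search in parallel.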
In this paper, we discuss the implementation of Bitz and Kung's path planning algorithm on a ring of general-purpose processors. We show that Bitz and Kung's algorithm, originally designed for the Warp machine, is not efficient in this context because of the intensive inter-processor communication it requires. We design a modified version that performs much better. The new version updates a segment of k positions within a step and allocates blocks of r consecutive rows of the map to the processors in a wraparound fashion. Bitz and Kung's algorithm corresponds to the case (k, r) = (1, 1). We analytically determine the optimal values of the parameters (k, r) that minimize the parallel execution time as a function of the problem size n and the number of processors p. The theoretical results are well corroborated by numerical experiments on a ring of 32 transputers.
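Allocating blocks of r consecutive map rows to p processors in wraparound fashion is a block-cyclic distribution. The minimal sketch below shows the row-to-processor mapping; the function names are hypothetical, and the segment size k and the update schedule are not modeled. Setting r = 1 gives the plain row-cyclic distribution of the (k, r) = (1, 1) case.

```python
def owner(row, r, p):
    """Processor that owns a map row when blocks of r rows are dealt out cyclically."""
    return (row // r) % p


def rows_of(proc, n, r, p):
    """All rows of an n-row map assigned to processor proc."""
    return [row for row in range(n) if owner(row, r, p) == proc]
```

Larger r keeps more neighboring rows on the same processor, so fewer row boundaries require communication, at the cost of a longer pipeline start-up around the ring; the paper's analysis chooses r (and k) to balance these effects.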