Parallel algorithms for the solution of dense systems of nonlinear equations on a message-passing multiprocessor computer are developed. Specifically, a distributed finite-difference Newton method, a multiple secant method, and a rank-1 secant method are proposed. Experimental results, obtained on an Intel hypercube, indicate that these methods exhibit good parallelism.
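The finite-difference Newton idea can be illustrated with a minimal serial sketch. It is not the paper's distributed implementation: the function names (`fd_jacobian_column`, `solve_linear`, `fd_newton`), the step size `h`, and the tolerance are assumptions chosen for illustration. The point it shows is that each Jacobian column needs one extra function evaluation and is independent of the others, which is the part that could be spread across processors.

```python
def fd_jacobian_column(f, x, fx, j, h=1e-7):
    """Forward-difference approximation of column j of the Jacobian of f at x."""
    xp = list(x)
    xp[j] += h
    fp = f(xp)
    return [(fp[i] - fx[i]) / h for i in range(len(fx))]


def solve_linear(a, b):
    """Dense Gaussian elimination with partial pivoting (assumes a is nonsingular)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[p] = m[p], m[k]
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= factor * m[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        s = sum(m[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (m[k][n] - s) / m[k][k]
    return x


def fd_newton(f, x0, tol=1e-10, max_iter=50):
    """Newton iteration with a finite-difference Jacobian (hypothetical helper)."""
    x = list(x0)
    n = len(x)
    for _ in range(max_iter):
        fx = f(x)
        if max(abs(v) for v in fx) < tol:
            break
        # Each column is independent of the others; in a distributed setting
        # the columns could be assigned to different processors.
        cols = [fd_jacobian_column(f, x, fx, j) for j in range(n)]
        jac = [[cols[j][i] for j in range(n)] for i in range(n)]
        step = solve_linear(jac, [-v for v in fx])
        x = [xi + si for xi, si in zip(x, step)]
    return x


def example_system(x):
    # Intersection of the circle x0^2 + x1^2 = 4 with the line x0 = x1.
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]]


root = fd_newton(example_system, [1.0, 2.0])
```

For this small example the iteration should settle near the point where both coordinates equal the square root of 2.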
A novel linearly implicit predictor-corrector scheme is developed for the numerical solution of reaction-diffusion equations. Iterative processes are avoided by treating the nonlinear reaction terms explicitly, while maintaining superior accuracy and stability properties compared to the well-known theta methods and linearly implicit Runge-Kutta methods. The proposed method allows the opportunity of solving large systems of reaction-diffusion equations by alleviating the necessity of solving the accompanying large linear systems of algebraic equations due to the natural parallelism which surfaces across the system. Numerical results confirm the enhanced stability, accuracy and efficiency of the method when applied to reaction-diffusion equations arising in biochemistry and population ecology. (C) 1999 Elsevier Science Ltd. All rights reserved.
MapReduce has emerged as a popular tool for distributed processing of massive data. However, it is not efficient when handling skewed data, and it often leads to reducer load imbalance. In this paper, we address the problem of how to efficiently partition intermediate keys to balance the workload of all reducers when processing skewed data. We present a sampling scheme to compute the approximate distribution of key frequency, estimate the overall distribution, and then make a partition scheme in advance. Then, we apply it to the map phase of the executing MapReduce job. This work not only provides a load-balanced partition strategy, but also preserves the high performance of the synchronous mode of MapReduce. We also propose two partition methods based on sampling results: cluster combination and cluster split combination. The experimental results show that our methods achieve better execution time and load balancing. (C) 2013 Elsevier Ltd. All rights reserved.
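The sample-then-partition idea can be sketched as follows. This is not the paper's cluster combination or cluster split combination method: the greedy heaviest-key-first assignment is a simple stand-in, and all names (`sample_key_frequencies`, `build_partition`, `partition`) and the sampling rate are assumptions made for illustration.

```python
import random
from collections import Counter


def sample_key_frequencies(keys, sample_rate, seed=0):
    """Estimate key frequencies from a random sample of the intermediate keys."""
    rng = random.Random(seed)
    return Counter(k for k in keys if rng.random() < sample_rate)


def build_partition(freqs, num_reducers):
    """Greedy stand-in: assign the heaviest key to the least-loaded reducer."""
    loads = [0] * num_reducers
    assignment = {}
    ordered = sorted(freqs.items(), key=lambda kv: (-kv[1], str(kv[0])))
    for key, count in ordered:
        target = min(range(num_reducers), key=lambda r: loads[r])
        assignment[key] = target
        loads[target] += count
    return assignment, loads


def partition(key, assignment, num_reducers):
    """Partitioner for the map phase; unseen keys fall back to hashing."""
    return assignment.get(key, hash(key) % num_reducers)


# Skewed example frequencies (made-up numbers for illustration only).
example_freqs = {"a": 50, "b": 30, "c": 20, "d": 10, "e": 10}
example_assignment, example_loads = build_partition(example_freqs, 2)
```

Because the assignment table is built before the job runs, the map phase only needs a dictionary lookup per key, which is consistent with the abstract's point that the partition scheme is made in advance.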
Author: Bertsekas, Dimitri P.
Affiliation: MIT, Lab Informat & Decis Syst, 77 Massachusetts Ave, Cambridge, MA 02139, USA
A new and simple algorithm for finding shortest paths in a directed graph is proposed. In the single origin-single destination case, the algorithm maintains a single path starting at the origin, which is extended or contracted by a single node at each iteration. Simultaneously, at most one dual variable is adjusted at each iteration so as to either improve or maintain the value of a dual function. For the case of multiple origins, the algorithm is well suited for parallel computation. It maintains multiple paths that can be extended or contracted in parallel by several processors that share the results of their computations. Based on experiments with randomly generated problems on a serial machine, the algorithm substantially outperforms its closest competitors for problems with few origins and a single destination. It also seems better suited for parallel computation than other shortest path algorithms.
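The path extension and contraction mechanism described above can be sketched for the single origin, single destination case. The abstract does not give the update rule, so the rule below (raise the dual variable, here called a price, of the path's terminal node and contract, or else extend along the minimizing arc) is an assumption based on the commonly described auction-style shortest path scheme. The sketch also assumes nonnegative arc lengths, no zero-length cycles, and zero initial prices; the function name and graph encoding are hypothetical.

```python
import math


def auction_shortest_path(arcs, origin, dest, max_iter=100_000):
    """arcs maps node -> list of (successor, length). Returns (path, length)."""
    price = {}          # dual variables; nodes not present have price 0
    path = [origin]     # the single path maintained by the algorithm
    for _ in range(max_iter):
        i = path[-1]
        if i == dest:
            return path, price.get(origin, 0.0) - price.get(dest, 0.0)
        # Find the outgoing arc minimizing length + price of successor.
        best_j, best = None, math.inf
        for j, length in arcs.get(i, []):
            cand = length + price.get(j, 0.0)
            if cand < best:
                best_j, best = j, cand
        if price.get(i, 0.0) < best:
            # Adjust exactly one dual variable, then contract the path.
            price[i] = best
            if i != origin:
                path.pop()
            elif best == math.inf:
                return None, math.inf   # destination unreachable
        else:
            # Extend the path by a single node.
            path.append(best_j)
    raise RuntimeError("iteration limit reached")


example_arcs = {
    1: [(2, 1), (3, 4)],
    2: [(3, 1), (4, 5)],
    3: [(4, 1)],
    4: [],
}
example_path, example_length = auction_shortest_path(example_arcs, 1, 4)
```

On this small graph the path grows and shrinks several times before reaching node 4 through nodes 2 and 3, with total length 3.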
The breadth-first search procedure is an algorithm that traverses the vertices of a graph, determining the distance from the initial vertex to each vertex. The distance is infinite for a vertex that is not reachable from the starting vertex. Despite having an efficient serial version, this important algorithm is irregular, making its effective parallel implementation a daunting task. This paper shows the results of an OpenMP-based implementation of the breadth-first search procedure using the bag data structure, written in the C++ programming language. It reimplements an existing proposal coded in the Cilk++ programming language. The experiments relied on 32 strongly connected graphs and 31 disconnected graphs in executions performed on two machines. The first machine contained 28 cores and two threads per core. The second machine comprised 48 processing cores, with hyperthreading disabled. Relative to the serial version, the parallel implementation yielded a speedup of up to 20x when using 28 processing cores and up to 25x when using 56 threads in tests performed on a machine with the first generation of Intel (R) Xeon (R) Scalable processors. Furthermore, the new parallel implementation yielded speedups of up to 45x when using 48 cores in experiments performed on a machine with the second generation of Intel (R) Xeon (R) Scalable processors.
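A level-by-level formulation makes the distance computation concrete. The sketch below is serial Python, not the paper's OpenMP and C++ code, and a plain list stands in for the bag data structure; the function name and the adjacency-dictionary encoding are assumptions. In a parallel version, the vertices of one frontier would be processed concurrently.

```python
import math


def bfs_distances(adj, source):
    """adj maps every vertex to a list of neighbors. Returns vertex -> distance."""
    dist = {v: math.inf for v in adj}   # unreachable vertices stay at infinity
    dist[source] = 0
    frontier = [source]                 # stand-in for the bag of the current level
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        for u in frontier:              # this loop is the parallelizable part
            for v in adj[u]:
                if dist[v] == math.inf:
                    dist[v] = level
                    next_frontier.append(v)
        frontier = next_frontier
    return dist


example_graph = {0: [1, 2], 1: [3], 2: [3], 3: [], 4: [0]}
example_dist = bfs_distances(example_graph, 0)
```

Vertex 4 has an arc into the source but cannot be reached from it, so its distance remains infinite, matching the convention stated in the abstract.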
This paper presents part of the work being carried out to obtain parallel versions of the main SLICOT routines for model reduction. It focuses on the parallel solution of standard Lyapunov equations to obtain the Cholesky factors of the controllability and observability Grammians. This operation is an important basis for model reduction methods. Routines from the standard libraries BLAS, LAPACK, SLICOT, PBLAS and ScaLAPACK have been used whenever possible in the parallelisation process. However, it has been necessary to develop some new routines. Experimental results obtained using a cluster of PCs are shown.
This paper presents a parallel sparse Cholesky factorization algorithm for shared-memory MIMD multiprocessors. The algorithm is particularly well suited for vector supercomputers with multiple processors, such as the Cray Y-MP. The new algorithm is a straightforward parallelization of the left-looking supernodal sparse Cholesky factorization algorithm. Like its sequential predecessor, it improves performance by reducing indirect addressing and memory traffic. Experimental results on a Cray Y-MP demonstrate the effectiveness of the new algorithm. On eight processors of a Cray Y-MP, the new routine performs the factorization at rates exceeding one Gflop for several test problems from the Harwell-Boeing sparse matrix collection.
Software testing is one of the most important phases of the software development life cycle. However, software testing is traditionally seen as a difficult and time-consuming activity that is hard to embed in the software development process. Software and hardware testers concentrate on how to minimize the testing time while ensuring that the system is tested well and made acceptable. Basic combinatorial interaction (i.e. pairwise or 2-way) testing has been one of the commonly used methods for achieving this goal, detecting 50-97 percent of errors. However, empirical evidence has shown that 2-way interaction testing is a poor strategy for testing highly interactive systems. Therefore, there is a need to go beyond pairwise testing to uncover the remaining errors. To speed up the process of solving problems, researchers have applied parallel algorithms to various large, computationally expensive optimization problems and have succeeded in solving these problems in an acceptable time. Therefore, in this paper we have enhanced our previous strategy “A tree based strategy for test data generation and cost calculation for pairwise combinatorial interaction testing” to work effectively in parallel and to support 3-way interaction testing. The correctness of the strategy has been proved, and the performance evaluation shows the efficiency of the strategy in reducing test size.
The philosophy of multisplitting methods is the replacement of a large-scale linear or nonlinear problem by a set of smaller subproblems, each of which can be solved locally and independently in parallel by taking advantage of well-tested sequential algorithms. Because of this formulation, most compute-intensive operations can be calculated independently and the algorithms are highly parallel. In continuation of our earlier work, we utilize a new parameter-free formulation of linearly constrained convex minimization problems to obtain a parallel algorithm of multisplitting type. Numerical results, both serial and parallel, are reported; they demonstrate its efficiency and show that it compares favorably to our earlier parameter-dependent approach.
This paper describes the parallelisation of a state estimator with confidence limit analysis. State estimation involves the optimal fitting of an overdetermined set of measurements to the corresponding values calculated from the mathematical model of the system. The inaccuracies associated with measurements lead to discrepancies within the state estimate. Consequently, for the state estimation algorithm to be of practical use, it needs to quantify the effect of these discrepancies in the form of state confidence limits [2]. However, the quasi-quadratic numerical complexity of the state estimation algorithms suggests a need for parallel implementation of the probabilistic state estimation, so that real-time performance may be maintained for large-scale systems as well. The algorithm is based on the idea of ‘tearing’ the original system into subsystems and then coordinating the resulting subsystem solutions. The algorithm has been tested in the context of water distribution systems state estimation.