This paper analyses and evaluates parallel strategies applied to the Advanced Multidimensional Interval analysis Global Optimization (AMIGO) algorithm. We investigate two parallel versions of AMIGO (PAMIGO), called Global-PAMIGO and Local-PAMIGO. The idea behind our study is that, to exploit the potential parallelism of algorithms, researchers need to adapt them to the target computer architectures. Our PAMIGO algorithms are designed for shared-memory architectures and are based on a threaded programming model, which is suitable for current personal computers with multicore processors. Our first experimental results show a promising speed-up for up to four processing units. We analyse the loss of efficiency when the number of processing units is greater than four by profiling the algorithm executions. Second, we experiment with a local memory allocator per thread, which increases efficiency by reducing the number of lock conflicts caused by the standard system memory allocator. Our experimental results for both PAMIGO versions, using up to 15 processing units, show good performance for hard-to-solve problems on unicore and multicore processors. Notably, both versions of PAMIGO obtain similar performance. Our experiments may be useful for researchers who use parallel branch-and-bound (BB) algorithms.
If there exist any two vertices in G whose distance becomes longer when a vertex u is removed, then u is defined as a hinge vertex. Finding the set of hinge vertices in a graph can be used to identify critical nodes in an actual network. A number of studies concerning hinge vertices have been made in recent years. In general, it is known that more efficient sequential or parallel algorithms can be developed by restricting classes of graphs. For instance, Chang et al. presented an O(n + m) time algorithm for finding all hinge vertices of a strongly chordal graph [1]. Ho et al. presented a linear time algorithm for all hinge vertices of a permutation graph [3]. In this paper, we shall propose a parallel algorithm which runs in O(log n) time with O(n) processors on CREW PRAM for finding all hinge vertices of an interval graph [4].
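The hinge-vertex definition above can be illustrated with a brute-force check. This is only a sketch of the definition, not the O(log n) CREW PRAM algorithm proposed in the abstract. It assumes an undirected, unweighted graph given as an adjacency dictionary, and it assumes that a pair of vertices that becomes disconnected counts as having a longer (infinite) distance. The function names `bfs_distances` and `hinge_vertices` are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source, removed=None):
    """Shortest-path distances from source, ignoring the vertex `removed`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w != removed and w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def hinge_vertices(adj):
    """Return every vertex u whose removal lengthens some distance d(x, y).

    Brute force: one BFS per (u, x) pair. Assumption: a pair that becomes
    disconnected is treated as having infinite distance.
    """
    vertices = list(adj)
    base = {v: bfs_distances(adj, v) for v in vertices}
    hinges = set()
    for u in vertices:
        others = [v for v in vertices if v != u]
        for x in others:
            reduced = bfs_distances(adj, x, removed=u)
            if any(
                y in base[x] and reduced.get(y, float("inf")) > base[x][y]
                for y in others
            ):
                hinges.add(u)
                break
    return hinges


# A 5-cycle: removing any vertex forces its two neighbours to go the long way.
cycle5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
print(hinge_vertices(cycle5))
```

In a 4-cycle, by contrast, no vertex is a hinge vertex, because every pair at distance 2 has a second shortest path around the other side.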
ISBN: 0819430129 (print)
The modified signed-digit (MSD) number system offers parallel addition and subtraction of any two numbers, with carry propagation confined to two adjacent digits. Based on MSD addition, we develop parallel algorithms for high-speed multiplication (MSD multiplication) in this paper. The simultaneous generation of all the partial products and the pairwise addition of the partial products are unique features of this MSD multiplier. The optoelectronic butterfly interconnection (OEBI) network matches well with the multiplier presented here.
ISBN: 0769515797 (print)
Interconnection networks of various topologies are used in parallel computing. It is important to study the graph theoretical/combinatorial properties of the underlying networks in order to better understand them and develop more efficient parallel algorithms as well as fault-tolerant communication/routing algorithms. In this paper, we approach this problem from a new angle by looking into the spectra (eigenvalues and their multiplicities) of these networks. Eigenvalues of the adjacency matrix of a graph can reveal certain properties of the graph since they are closely related to some of its combinatorial invariants. Specifically, for some of the popular interconnection networks, we study their eigenvalues and multiplicities by (1) summarizing the currently available results; (2) deriving some of these results in a more straightforward way; (3) obtaining new results; and (4) presenting experimental results on several interconnection networks. In addition, we briefly survey the results that relate spectra of graphs to their structural properties. Although much work remains to be done, by looking into the spectra of interconnection networks, we hope to bring about a more unified approach to studying their topological properties.
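As a small illustration of what the spectrum of an interconnection network looks like, the sketch below checks the adjacency spectrum of the n-dimensional hypercube. The hypercube is chosen here as an assumed example; the abstract does not name the networks it studies. The code relies on the standard fact that, for each bit mask s, the vector v_s(x) = (-1)^popcount(x AND s) is an eigenvector with eigenvalue n - 2·popcount(s), and it verifies that fact entry by entry instead of calling a numerical eigensolver. The function name `hypercube_spectrum` is hypothetical.

```python
from collections import Counter


def hypercube_spectrum(n):
    """Return {eigenvalue: multiplicity} for the adjacency matrix of Q_n.

    Vertices are the integers 0 .. 2**n - 1; two vertices are adjacent
    when they differ in exactly one bit. Each candidate eigenpair is
    verified directly against the definition A v = lambda v.
    """
    size = 1 << n
    spectrum = Counter()
    for s in range(size):
        # Candidate eigenvector indexed by vertex x.
        vec = [(-1) ** bin(x & s).count("1") for x in range(size)]
        eig = n - 2 * bin(s).count("1")
        for x in range(size):
            # (A v)(x) is the sum of v over the n neighbours of x.
            neighbour_sum = sum(vec[x ^ (1 << i)] for i in range(n))
            assert neighbour_sum == eig * vec[x]
        spectrum[eig] += 1
    return dict(spectrum)


print(hypercube_spectrum(3))  # {3: 1, 1: 3, -1: 3, -3: 1}
```

The multiplicities are binomial coefficients: eigenvalue n - 2k appears C(n, k) times, so the 2^n eigenvalues are fully described by n + 1 distinct values.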
We study the problem of executing parallel programs, in particular Cilk programs, on a collection of processors of different speeds. We consider a model in which each processor maintains an estimate of its own speed, where communication between processors has a cost, and where all scheduling must be online. This problem has been considered previously in the fields of asynchronous parallel computing and scheduling theory. Our model is a bridge between the assumptions in these fields. We provide a new, more accurate analysis of an old scheduling algorithm called the maximum utilization scheduler. Based on this analysis, we generalize this scheduling policy and define the high utilization scheduler. We next focus on the Cilk platform and introduce a new algorithm for scheduling Cilk multithreaded parallel programs on heterogeneous processors. This scheduler is inspired by the high utilization scheduler and is modified to fit in a Cilk context. A crucial aspect of our algorithm is that it keeps the original spirit of the Cilk scheduler. In fact, when our new algorithm runs on homogeneous processors, it exactly mimics the dynamics of the original Cilk scheduler.
This paper describes the Operator Distribution Method for Parallel Planning (ODMP), a parallelization method for efficient heuristic planning. The method innovates in that it parallelizes the application of the available operators to the current state and the evaluation of the successor states using the heuristic function. To achieve better load balancing and improve the scalability of the algorithm, the operator set is initially enlarged by grounding the first argument of each operator. Additional load balancing is achieved by reordering the operator set according to the expected amount of imposed work. ODMP is effective for heuristic planners, but it can be applied to planners that embody other search strategies as well. It has been applied to GRT, a domain-independent heuristic planner, and CL, a heuristic planner for simple logistics problems, and has been thoroughly tested on a set of logistics problems adopted from the AIPS-98 planning competition, giving quite promising results.
ISBN: 9783319214047; 9783319214030 (print)
Given an n x n array A of integers, with at least one positive value, the maximum subarray sum problem consists in finding the maximum sum among the sums of all rectangular subarrays of A. The maximum subarray problem appears in several scientific applications, particularly in Computer Vision. The algorithms that solve this problem have been used to help identify the brightest regions of the images used in astronomy and medical diagnosis. The best known sequential algorithm that solves this problem has O(n^3) time complexity. In this work we revisit the BSP/CGM parallel algorithm that solves this problem and we present BSP/CGM algorithms for the following related problems: the maximum largest subarray sum, the maximum smallest subarray sum, the number of subarrays of maximum sum, the selection of the subarray with k-maximum sum and the location of the subarray with the maximum relative density sum. To the best of our knowledge there are no parallel BSP/CGM algorithms for these related problems. Our algorithms use p processors and require O(n^3/p) parallel time with a constant number of communication rounds. To show the applicability of our algorithms, we have implemented them on a cluster of computers using MPI and on a machine with GPGPU using CUDA and OpenMP. We have obtained good speedup results in both environments. We also tested the maximum relative density sum algorithm with an image from The Cancer Imaging Archive.
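For readers unfamiliar with the problem, a sequential O(n^3) solution can be sketched as follows. This is the well-known approach of fixing a pair of rows, collapsing the rows between them into column sums, and running Kadane's one-dimensional scan on those sums. It is not the BSP/CGM algorithm of the abstract, and the function name `max_subarray_sum` is hypothetical.

```python
def max_subarray_sum(a):
    """Maximum sum over all rectangular subarrays of the 2-D list `a`.

    For each pair of rows (top, bottom), col_sums[j] holds the sum of
    column j between those rows, so a 1-D Kadane scan over col_sums
    finds the best rectangle spanning exactly those rows.
    """
    n_rows, n_cols = len(a), len(a[0])
    best = a[0][0]
    for top in range(n_rows):
        col_sums = [0] * n_cols
        for bottom in range(top, n_rows):
            for j in range(n_cols):
                col_sums[j] += a[bottom][j]
            # Kadane's scan: best sum of a contiguous run of columns.
            running = col_sums[0]
            best_here = running
            for value in col_sums[1:]:
                running = max(value, running + value)
                best_here = max(best_here, running)
            best = max(best, best_here)
    return best


grid = [
    [1, -5, 2],
    [2, -5, 3],
    [3, -5, 4],
]
print(max_subarray_sum(grid))  # 9, the whole right-hand column
```

The outer loop over `top` is a natural unit of independent work, which is one reason the problem lends itself to the O(n^3/p) parallel time quoted above.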
ISBN: 9781728101248 (print)
This paper presents an acceleration framework for packing linear programming problems where the amount of data available is limited, i.e., where the number of constraints m is small compared to the variable dimension n. The framework can be used as a black box to speed up linear programming solvers dramatically, by two orders of magnitude in our experiments. We present worst-case guarantees on the quality of the solution and the speedup provided by the algorithm, showing that the framework provides an approximately optimal solution while running the original solver on a much smaller problem. The framework can be used to accelerate exact solvers, approximate solvers, and parallel/distributed solvers. Further, it can be used for both linear programs and integer linear programs.
ISBN: 0818672676 (print)
An analytical model is presented for estimating the parallel efficiency of the domain decomposition method used to parallelize implicit finite element code. Serial and parallel finite element codes with domain decomposition and direct LDU solution of equation systems are developed. Dependencies of parallel efficiency on problem size are obtained for an IBM SP2 with 4, 6 and 8 processor nodes. It is shown that interprocessor load balancing during the assembly-decomposition phase, which can be achieved by partitioning into unequal subdomains, increases parallel efficiency considerably. Predicted and measured values of parallel efficiency are in reasonable agreement.
ISBN: 9783319137315; 9783319137308 (print)
Software applications are driving the growth of the cloud computing era. The security of data over networks is essential to enable cloud computing applications, and it is ensured through various types of encryption algorithms. In the current cloud computing era, multicore processors allow security applications to run simultaneously at both the client and the server end. The encryption and decryption processes of security algorithms are compute-intensive and can benefit significantly from parallel implementations that run on these multicore processors. Moreover, these algorithms consume more energy on uniprocessor systems because of the massive calculations they perform, since there is a non-linear relationship between the frequency of a core and power supply. This paper introduces a parallel version of the Blowfish algorithm using the Single Instruction Multiple Data model, named PBlock, and presents its implementation on a Symmetric Multi Processor machine along with the performance gains obtained on a number of benchmark examples.