A recent work shows how we can optimize a tree-based mode of operation for a hash function where the sizes of input message blocks and digest are the same, subject to the constraint that the involved tree structure has all its leaves at the same depth. In this work, we show that we can further optimize the running time of such a mode by using a tree having leaves at all its levels. We make the assumption that the input message block has a size that is a multiple of that of the digest, and denote by d the ratio of block size to digest size. The running time is evaluated in terms of the number of operations performed by the hash function, i.e. the number of calls to its underlying function. It turns out that a digest can be computed in ⌈log_{d+1}(l/2)⌉ + 2 evaluations of the underlying function using ⌈l/2⌉ processors, where l is the number of blocks of the message. Other results of interest are discussed, such as the optimization of the parallel running time for a tree of restricted height. (C) 2019 Elsevier Inc. All rights reserved.
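For illustration, here is a small sketch (our own, not from the paper) that evaluates the stated bound ⌈log_{d+1}(l/2)⌉ + 2 for a few hypothetical values of the block-to-digest ratio d and the number of message blocks l; the name parallelHashTime is purely illustrative.

```cpp
// Illustrative only: evaluates the parallel running-time bound
// ceil(log_{d+1}(l/2)) + 2 quoted in the abstract for sample parameters.
#include <cmath>
#include <cstdio>

// Number of calls to the underlying function when l message blocks are
// hashed with ceil(l/2) processors and block size = d * digest size.
long parallelHashTime(long l, long d) {
    double rounds = std::ceil(std::log(l / 2.0) / std::log(d + 1.0));
    return static_cast<long>(rounds) + 2;
}

int main() {
    for (long d : {1L, 2L, 4L}) {
        for (long l : {8L, 64L, 1024L}) {
            std::printf("d=%ld l=%ld -> %ld calls, ceil(l/2)=%ld processors\n",
                        d, l, parallelHashTime(l, d), (l + 1) / 2);
        }
    }
    return 0;
}
```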
ISBN (print): 9781479942763
The high intensity of research and modeling in the fields of mathematics, physics, biology and chemistry requires new computing resources. Because of the large computational complexity of such tasks, computing time is long and costly. The most effective way to increase efficiency is to adopt parallel principles. The purpose of this paper is to present the issue of parallel computing, with emphasis on the analysis of parallel systems and the impact of communication delays on their efficiency and on overall execution time. The paper focuses on finite algorithms for solving systems of linear equations, namely matrix manipulation (the Gauss elimination method, GEM). The algorithms are designed for architectures with shared memory (OpenMP), distributed memory (MPI), and their combination (MPI+OpenMP). The properties of the algorithms were determined analytically and verified experimentally. Conclusions are drawn for theory and practice.
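A minimal shared-memory sketch of the kind of GEM parallelization the paper studies, assuming a dense system and no pivoting; the OpenMP pragma and the helper name gaussianElimination are ours, not taken from the paper.

```cpp
// A sketch of shared-memory Gaussian elimination (GEM): the update of the rows
// below the pivot row is independent row by row, so it is parallelized with
// OpenMP (compile with -fopenmp; without it the pragma is simply ignored).
// No pivoting is done here; a practical solver would add partial pivoting.
#include <vector>
#include <cstdio>

void gaussianElimination(std::vector<std::vector<double>>& a, std::vector<double>& b) {
    const int n = static_cast<int>(a.size());
    for (int k = 0; k < n; ++k) {
        // Rows below the pivot row are independent, so eliminate them in parallel.
        #pragma omp parallel for
        for (int i = k + 1; i < n; ++i) {
            const double factor = a[i][k] / a[k][k];
            for (int j = k; j < n; ++j) a[i][j] -= factor * a[k][j];
            b[i] -= factor * b[k];
        }
    }
    // Back substitution (sequential: each step depends on the previous one).
    for (int i = n - 1; i >= 0; --i) {
        for (int j = i + 1; j < n; ++j) b[i] -= a[i][j] * b[j];
        b[i] /= a[i][i];
    }
}

int main() {
    std::vector<std::vector<double>> a = {{4, 1}, {1, 3}};
    std::vector<double> b = {1, 2};
    gaussianElimination(a, b);
    std::printf("x = (%.4f, %.4f)\n", b[0], b[1]);  // expected ~ (0.0909, 0.6364)
}
```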
We provide a tight analysis that settles the round complexity of the well-studied parallel randomized greedy MIS algorithm, thus answering the main open question of Blelloch, Fineman, and Shun [SPAA'12]. The parallel/distributed randomized greedy Maximal Independent Set (MIS) algorithm works as follows. An order of the vertices is chosen uniformly at random. Then, in each round, all vertices that appear before their neighbors in the order are added to the independent set and removed from the graph along with their neighbors. The main question of interest is the number of rounds it takes until the graph is empty. This algorithm has been studied since 1987, initiated by Coppersmith, Raghavan, and Tompa [FOCS'87], and the previously best known bounds were O(log n) rounds in expectation for Erdős-Rényi random graphs by Calkin and Frieze [Random Struc. Alg.'90] and O(log^2 n) rounds with high probability for general graphs by Blelloch, Fineman, and Shun [SPAA'12]. We prove a high probability upper bound of O(log n) on the round complexity of this algorithm in general graphs and that this bound is tight. This also shows that parallel randomized greedy MIS is as fast as the celebrated algorithm of Luby [STOC'85, JALG'86].
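A small sequential simulation (ours, for intuition only) of the process analyzed above: shuffle the vertices, then in each round add every remaining vertex that precedes all of its remaining neighbours, delete it together with its neighbours, and count the rounds until the graph is empty.

```cpp
// Simulates the parallel randomized greedy MIS process on a small graph and
// reports how many rounds it takes; graph and seed are arbitrary examples.
#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

int greedyMisRounds(const std::vector<std::vector<int>>& adj, std::mt19937& rng) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> order(n);
    for (int v = 0; v < n; ++v) order[v] = v;
    std::shuffle(order.begin(), order.end(), rng);   // uniformly random order
    std::vector<int> rank(n);
    for (int i = 0; i < n; ++i) rank[order[i]] = i;

    std::vector<bool> alive(n, true);
    int remaining = n, rounds = 0;
    while (remaining > 0) {
        ++rounds;
        std::vector<int> chosen;
        for (int v = 0; v < n; ++v) {                // vertices before all live neighbours
            if (!alive[v]) continue;
            bool isLocalMin = true;
            for (int u : adj[v])
                if (alive[u] && rank[u] < rank[v]) { isLocalMin = false; break; }
            if (isLocalMin) chosen.push_back(v);
        }
        for (int v : chosen) {                       // remove MIS vertices and neighbours
            if (alive[v]) { alive[v] = false; --remaining; }
            for (int u : adj[v])
                if (alive[u]) { alive[u] = false; --remaining; }
        }
    }
    return rounds;
}

int main() {
    // 5-cycle: 0-1-2-3-4-0
    std::vector<std::vector<int>> adj = {{1, 4}, {0, 2}, {1, 3}, {2, 4}, {3, 0}};
    std::mt19937 rng(42);
    std::printf("rounds: %d\n", greedyMisRounds(adj, rng));
}
```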
As the global economy continues to grow, the business scope of enterprises expands further, their environment becomes more complex, the risks they face increase, and uncertainty gradually grows. ...
Superparamagnetic clustering (SPC) is an unsupervised classification technique in which clusters are self-organised based on data density and mutual interaction energy. The traditional SPC algorithm uses the Swendsen-Wang Monte Carlo approximation technique to significantly reduce the search space for reasonable clustering. However, the Swendsen-Wang approximation is a Markov process, which limits the conventional superparamagnetic technique to processing data clustering in a sequential manner. Here the authors propose a parallel approach that replaces the conventional approximation and allows the algorithm to perform clustering in parallel. One synthetic and one open-source dataset were used to validate the accuracy of this parallel approach, with clustering results comparable to those of the conventional implementation. The parallel method increases clustering speed by at least 8.7 times over the conventional approach, and the larger the sample size, the greater the observed speed-up. This can be explained by the higher degree of parallelism utilised for the increased number of data points. In addition, a hardware architecture was proposed to implement the parallel superparamagnetic algorithm using digital electronic technologies suitable for rapid or real-time neural spike sorting.
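As a hedged illustration of why per-point work in SPC parallelizes well, the sketch below fills a matrix of pairwise interaction strengths J_ij = exp(-d_ij^2 / (2 a^2)) with fully independent iterations; the dense all-pairs neighbourhood, the scale parameter a, and the name interactionMatrix are simplifying assumptions of ours, not the authors' implementation.

```cpp
// Computes SPC-style mutual interaction strengths between data points; every
// (i, j) entry is independent, so the outer loop is parallelized with OpenMP
// (compile with -fopenmp; without it the pragma is ignored).
#include <cmath>
#include <cstdio>
#include <vector>

std::vector<std::vector<double>> interactionMatrix(
        const std::vector<std::vector<double>>& points, double a) {
    const int n = static_cast<int>(points.size());
    std::vector<std::vector<double>> J(n, std::vector<double>(n, 0.0));
    #pragma omp parallel for                    // each row of J is independent
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            double d2 = 0.0;                    // squared Euclidean distance
            for (std::size_t k = 0; k < points[i].size(); ++k) {
                const double diff = points[i][k] - points[j][k];
                d2 += diff * diff;
            }
            J[i][j] = std::exp(-d2 / (2.0 * a * a));
        }
    }
    return J;
}

int main() {
    std::vector<std::vector<double>> points = {{0.0, 0.0}, {0.1, 0.0}, {5.0, 5.0}};
    auto J = interactionMatrix(points, 1.0);
    std::printf("J(0,1)=%.3f  J(0,2)=%.3f\n", J[0][1], J[0][2]);  // near vs far pair
}
```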
An agreement protocol enables a system of n nodes in a distributed network to agree on a common input value. In the implicit version of the problem, only a subset of the nodes are required to decide the common value. ...
A cut tree is a combinatorial structure that represents the edge-connectivity between all pairs of nodes of an undirected graph. Cut trees have multiple applications in dependability, as they represent how much it takes to disconnect every pair of network nodes. They have been used for solving connectivity problems, routing, and in the analysis of complex networks, among several other applications. This work presents a parallel version of the classical Gomory-Hu cut tree algorithm. The algorithm is heavily based on tasks that compute the minimum cut on contracted graphs. The main contribution is an efficient strategy to compute the contracted graphs, which allows processes to take advantage of previously contracted graph instances instead of always computing all contractions from the original input graph. The proposed algorithm was implemented using MPI, and experimental results are presented for several families of graphs, showing significant performance gains.
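A minimal sketch of the contraction step such algorithms rely on: all vertices on one side of a previously found minimum cut are merged into a single super-vertex and parallel edges are accumulated. The map-of-maps representation and the name contract are illustrative assumptions, not the paper's data structures.

```cpp
// Contracts a set of vertices of a weighted undirected graph into one
// representative vertex, summing the capacities of edges that become parallel.
#include <cstdio>
#include <map>
#include <set>

using Graph = std::map<int, std::map<int, double>>;  // node -> (neighbour -> capacity)

// Merge every node in `group` into the representative node `rep`.
Graph contract(const Graph& g, const std::set<int>& group, int rep) {
    auto id = [&](int v) { return group.count(v) ? rep : v; };
    Graph h;
    for (const auto& [u, nbrs] : g)
        for (const auto& [v, cap] : nbrs) {
            int a = id(u), b = id(v);
            if (a != b) h[a][b] += cap;               // parallel edges are accumulated
        }
    return h;
}

int main() {
    Graph g = {{0, {{1, 3}, {2, 1}}}, {1, {{0, 3}, {2, 2}}}, {2, {{0, 1}, {1, 2}}}};
    Graph h = contract(g, {1, 2}, 1);                 // merge nodes 1 and 2
    std::printf("capacity between 0 and the super-node: %.1f\n", h[0][1]);  // 3 + 1 = 4
}
```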
We present a parallelized geometric multigrid (GMG) method, based on the cell-based Vanka smoother, for higher order space-time finite element methods (STFEM) for the incompressible Navier-Stokes equations. The STFEM is implemented as a time marching scheme. The GMG solver is applied as a preconditioner for generalized minimal residual iterations. Its performance properties are demonstrated for 2D and 3D benchmarks of flow around a cylinder. The key ingredients of the GMG approach are the construction of the local Vanka smoother over all degrees of freedom in time of the respective subinterval and its efficient application. For this, data structures that store pre-computed cell inverses of the Jacobian for all hierarchical levels and require only a reasonable amount of memory overhead are generated. The GMG method is built for the *** finite element library. The concepts are flexible and can be transferred to similar software platforms.
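For orientation only, here is a generic geometric multigrid V-cycle for a 1D Poisson model problem with a damped-Jacobi smoother; it shows the smooth / restrict / coarse-solve / prolong / correct structure of a GMG preconditioner, not the paper's cell-based Vanka smoother or its space-time finite element setting. All names and parameters are our own assumptions.

```cpp
// Recursive geometric multigrid V-cycle for -u'' = f on (0,1), u(0) = u(1) = 0,
// discretized with n = 2^k - 1 interior points; Jacobi stands in for the
// far more involved Vanka smoother used in the paper.
#include <cstdio>
#include <vector>

using Vec = std::vector<double>;

// Apply A = (1/h^2) * tridiag(-1, 2, -1) with homogeneous Dirichlet boundaries.
Vec applyA(const Vec& v, double h) {
    int n = static_cast<int>(v.size());
    Vec out(n);
    for (int i = 0; i < n; ++i) {
        double left = (i > 0) ? v[i - 1] : 0.0;
        double right = (i + 1 < n) ? v[i + 1] : 0.0;
        out[i] = (2.0 * v[i] - left - right) / (h * h);
    }
    return out;
}

void jacobiSmooth(Vec& u, const Vec& f, double h, int sweeps) {
    int n = static_cast<int>(u.size());
    const double omega = 2.0 / 3.0, diag = 2.0 / (h * h);
    for (int s = 0; s < sweeps; ++s) {
        Vec Au = applyA(u, h);
        for (int i = 0; i < n; ++i) u[i] += omega * (f[i] - Au[i]) / diag;
    }
}

void vcycle(Vec& u, const Vec& f, double h) {
    int n = static_cast<int>(u.size());
    jacobiSmooth(u, f, h, 3);                            // pre-smoothing
    if (n <= 3) { jacobiSmooth(u, f, h, 50); return; }   // coarsest-level "solve"
    Vec r = applyA(u, h);
    for (int i = 0; i < n; ++i) r[i] = f[i] - r[i];      // residual
    int nc = (n - 1) / 2;                                // coarse grid size
    Vec rc(nc), ec(nc, 0.0);
    for (int i = 0; i < nc; ++i)                         // full-weighting restriction
        rc[i] = 0.25 * r[2 * i] + 0.5 * r[2 * i + 1] + 0.25 * r[2 * i + 2];
    vcycle(ec, rc, 2.0 * h);                             // coarse-grid correction
    for (int i = 0; i < nc; ++i) {                       // linear prolongation + correction
        u[2 * i + 1] += ec[i];
        u[2 * i] += 0.5 * ec[i];
        u[2 * i + 2] += 0.5 * ec[i];
    }
    jacobiSmooth(u, f, h, 3);                            // post-smoothing
}

int main() {
    int n = 63;                                          // 2^6 - 1 interior points
    double h = 1.0 / (n + 1);
    Vec u(n, 0.0), f(n, 1.0);                            // -u'' = 1
    for (int k = 0; k < 10; ++k) vcycle(u, f, h);
    std::printf("u at midpoint: %.5f (exact 0.125)\n", u[n / 2]);
}
```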
Morphological operations are among the most popular classic image filters. The filter takes the maximum or minimum value within a window and is often used for light object thickening and thinning operations, which are important components of various workflows, such as object recognition and stylization. Circular windows are preferred over rectangular windows for obtaining isotropic filter results. However, the existing efficient algorithms focus on rectangular windows or binary input images. Efficient morphological operations with circular windows for grayscale images remain challenging. In this study, we present a fast heuristic algorithm for grayscale morphology that decomposes circular windows using the convex hull of circles. We significantly accelerate traditional methods based on Minkowski addition by introducing new decomposition rules specialized for circular windows. As our morphological operation using a convex hull can be computed independently for each pixel, the algorithm is efficient on modern multithreaded hardware.
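A brute-force reference (ours) for the operation being accelerated: grayscale dilation with a circular window, where every pixel independently takes the maximum over a disc of radius r and can therefore be processed in parallel. The paper's contribution is a far faster decomposition of that window; this sketch only states the baseline operation.

```cpp
// Naive grayscale dilation with a circular structuring element; each output
// pixel is independent, so the outer loop is parallelized with OpenMP
// (compile with -fopenmp; without it the pragma is ignored).
#include <algorithm>
#include <cstdio>
#include <vector>

std::vector<float> dilateDisc(const std::vector<float>& img, int w, int h, int r) {
    std::vector<float> out(img.size());
    #pragma omp parallel for                      // every pixel is independent
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float best = img[y * w + x];
            for (int dy = -r; dy <= r; ++dy)
                for (int dx = -r; dx <= r; ++dx) {
                    if (dx * dx + dy * dy > r * r) continue;     // outside the disc
                    int nx = x + dx, ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                    best = std::max(best, img[ny * w + nx]);
                }
            out[y * w + x] = best;
        }
    return out;
}

int main() {
    std::vector<float> img(9 * 9, 0.0f);
    img[4 * 9 + 4] = 1.0f;                        // single bright pixel
    auto out = dilateDisc(img, 9, 9, 2);
    std::printf("pixel (4,2) after dilation: %.1f\n", out[2 * 9 + 4]);  // inside the disc -> 1.0
}
```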
A novel algorithm for computing the action of a matrix exponential on a vector is proposed. The algorithm is based on a multilevel Monte Carlo method, and the vector solution is computed probabilistically by generating random paths that evolve through the indices of the matrix according to a suitable probability law. The computational complexity is proved in this paper to be significantly better than that of the classical Monte Carlo method, which allows the computation of much more accurate solutions. Furthermore, the favourable features of the algorithm in terms of parallelism were exploited in practice to develop a highly scalable implementation capable of solving some test problems very efficiently using high performance supercomputers equipped with a large number of cores. For the specific case of shared memory architectures, the performance of the algorithm was compared with the results obtained using an available Krylov-based algorithm, outperforming the latter in all benchmarks analyzed so far. (C) 2020 Elsevier Ltd. All rights reserved.
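A hedged toy version of the single-level idea described above: (exp(A) v)_i is estimated by averaging over random paths through the indices of A, with the path length drawn from a Poisson(1) law to absorb the 1/k! series weights and each step drawn proportionally to |A_jm|, the weight being corrected accordingly. This is our own plain estimator, not the multilevel scheme or the HPC implementation of the paper; the name expActionEntry is illustrative.

```cpp
// Monte Carlo estimate of one entry of exp(A) v via random paths over indices:
// exp(A)v = sum_k (A^k v)/k!, the 1/k! is absorbed by sampling k ~ Poisson(1)
// (hence the factor e), and each matrix power is sampled by an importance-
// weighted random walk with transition probabilities proportional to |A_jm|.
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

using Mat = std::vector<std::vector<double>>;

double expActionEntry(const Mat& A, const std::vector<double>& v, int i,
                      int samples, std::mt19937& rng) {
    const int n = static_cast<int>(A.size());
    std::poisson_distribution<int> pathLength(1.0);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    double sum = 0.0;
    for (int s = 0; s < samples; ++s) {
        int j = i;
        double weight = std::exp(1.0);             // corrects for the Poisson(1) sampling
        int k = pathLength(rng);
        for (int step = 0; step < k; ++step) {
            double rowSum = 0.0;
            for (int m = 0; m < n; ++m) rowSum += std::fabs(A[j][m]);
            double u = unif(rng) * rowSum, acc = 0.0;
            int next = n - 1;
            for (int m = 0; m < n; ++m) {          // pick next index with prob ~ |A[j][m]|
                acc += std::fabs(A[j][m]);
                if (u <= acc) { next = m; break; }
            }
            weight *= (A[j][next] >= 0 ? rowSum : -rowSum);  // A[j][next] / p(next)
            j = next;
        }
        sum += weight * v[j];
    }
    return sum / samples;
}

int main() {
    Mat A = {{0.0, 0.5}, {0.5, 0.0}};
    std::vector<double> v = {1.0, 0.0};
    std::mt19937 rng(7);
    // exp(A) v = (cosh(0.5), sinh(0.5)) ~ (1.1276, 0.5211)
    std::printf("estimate of (exp(A) v)_0: %.4f\n", expActionEntry(A, v, 0, 200000, rng));
}
```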