This paper presents an extensive software performance analysis of dedicated hash functions, concentrating on the Pentium III, which is currently a dominant processor. The targeted hash functions are MD5, RIPEMD-128/-160, SHA-1/-256/-512 and Whirlpool, which fully cover the hashing algorithms in current use and those promising for future use. We optimize hashing speed not only by carefully arranging pipeline scheduling but also by processing two or even three message blocks in parallel using MMX registers for 32-bit oriented hash functions. Moreover, we make thorough use of 64-bit MMX instructions to maximize the performance of the 64-bit oriented hash functions, SHA-512 and Whirlpool. To the best of our knowledge, this paper gives the first detailed measured performance analysis of SHA-256, SHA-512 and Whirlpool.
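The idea of hashing two message blocks in parallel in one 64-bit register can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the paper's assembly code: the helper names (`pack`, `unpack`, `paddd`, `rotl_lanes`, `md5_step_lanes`) are hypothetical, a Python integer stands in for an MMX register holding two independent 32-bit lanes, and only a single MD5 round-1 step is modelled. MMX offers a packed 32-bit add (PADDD) but no rotate, so the rotation is emulated with two shifts and an OR.

```python
MASK32 = 0xFFFFFFFF

def pack(lo, hi):
    """Place two 32-bit values in one 64-bit word (one lane per message)."""
    return ((hi & MASK32) << 32) | (lo & MASK32)

def unpack(w):
    """Split a 64-bit word back into its (low, high) 32-bit lanes."""
    return w & MASK32, (w >> 32) & MASK32

def paddd(a, b):
    """Lane-wise 32-bit addition with no carry between lanes."""
    lo = (a + b) & MASK32
    hi = ((a >> 32) + (b >> 32)) & MASK32
    return (hi << 32) | lo

def rotl_lanes(w, r):
    """Rotate each 32-bit lane left by r bits (0 < r < 32) via shifts and OR."""
    lo, hi = unpack(w)
    rot = lambda v: ((v << r) | (v >> (32 - r))) & MASK32
    return pack(rot(lo), rot(hi))

def md5_step_lanes(a, b, c, d, m, t, s):
    """One MD5 round-1 step applied to both lanes at once.

    Bitwise AND/OR/NOT act on all 64 bits, so they need no lane handling.
    """
    f = (b & c) | (~b & d)                      # F(b, c, d) on both lanes
    tmp = paddd(paddd(paddd(a, f), m), pack(t, t))
    return paddd(b, rotl_lanes(tmp, s))

def md5_step_scalar(a, b, c, d, m, t, s):
    """Reference single-lane version of the same step."""
    tmp = (a + ((b & c) | (~b & d)) + m + t) & MASK32
    return (b + (((tmp << s) | (tmp >> (32 - s))) & MASK32)) & MASK32
```

Each lane carries the state of a different message, so the two computations never interact; the packed version does the work of two scalar steps with the same number of add and logic operations, at the cost of the emulated rotate.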
The box counting algorithm is a well-known method for computing the fractal dimension of an image. It is often implemented using a recursive subdivision of the image into a set of regular tiles or boxes. Parallel implementations often try to map the boxes to different compute units and combine the results to get the total number of boxes intersecting a shape. This paper presents a novel and highly efficient method using Open Computing Language (OpenCL) kernels to perform the computation on a per-pixel basis. The mapping and reduction stages are performed in a single pass and therefore require the enqueuing of only a single kernel. Each instance of the kernel updates the information pertaining to all the boxes containing the pixel and simultaneously increments the box counters at multiple levels, thereby eliminating the need for another pass to perform the summation. The complete implementation and coding details of the proposed method are outlined. The performance of the method on different processors is analyzed with respect to varying image sizes.
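The per-pixel formulation can be sketched sequentially in Python. This is a minimal illustration, not the paper's OpenCL kernel: the function names are hypothetical, the image is assumed to be a square binary grid with a power-of-two side, and Python sets stand in for the per-level box flags and counters that a kernel would update atomically.

```python
import math

def box_counts(image):
    """Count occupied boxes at every scale in a single pass over the pixels.

    image: square list of lists of 0/1 whose side is a power of two.
    Returns {box_side: number_of_boxes_containing_a_set_pixel}.
    """
    n = len(image)
    levels = n.bit_length() - 1               # log2(n) for power-of-two n
    occupied = [set() for _ in range(levels + 1)]
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if not value:
                continue
            # One pixel marks its enclosing box at every level at once.
            for k in range(levels + 1):
                occupied[k].add((x >> k, y >> k))
    return {1 << k: len(occupied[k]) for k in range(levels + 1)}

def fractal_dimension(counts):
    """Least-squares slope of log N(s) against log(1/s).

    Needs at least two scales and at least one set pixel.
    """
    xs = [math.log(1.0 / s) for s in counts]
    ys = [math.log(c) for c in counts.values()]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

The body of the pixel loop corresponds to the work of one kernel instance; because every level is updated from the pixel itself, no separate pass is needed to sum child boxes into parent boxes.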
In this paper, we present an enhancement of Particle Swarm Optimization (PSO) performance that uses CUDA and a tree reduction algorithm. PSO is a widely used metaheuristic algorithm that has been adapted into a CUDA version known as CPSO. The tree reduction algorithm is employed to compute the global best position efficiently. To evaluate our approach, we compared the speedup achieved by our CUDA version against the standard version of PSO, observing a maximum speedup of 37x. Additionally, we identified a linear relationship between the number of swarm particles and execution time: as the number of particles increases, so does the computational load, which highlights the efficiency of parallel implementations in reducing execution time. Our proposed parallel PSOs have demonstrated significant reductions in execution time along with improvements in convergence speed and local optimization performance, which is particularly beneficial for solving large-scale problems with high computational loads.
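A tree reduction for the global best can be sketched in Python as follows. This is a sequential model under stated assumptions, not the paper's CUDA kernel: the function name is hypothetical, the problem is assumed to be a minimization, and each pass of the inner loop stands for the comparisons that GPU threads would perform concurrently, with a synchronization barrier between strides.

```python
def tree_reduce_best(fitness):
    """Return (best_value, particle_index) using pairwise tree reduction.

    fitness: non-empty sequence of per-particle fitness values (lower is better).
    """
    vals = list(fitness)
    idx = list(range(len(vals)))
    n = len(vals)
    if n == 0:
        raise ValueError("fitness must not be empty")
    stride = 1
    while stride < n:
        # All comparisons at one stride are independent of each other,
        # so on a GPU each would be handled by its own thread.
        for i in range(0, n - stride, 2 * stride):
            j = i + stride
            if vals[j] < vals[i]:
                vals[i], idx[i] = vals[j], idx[j]
        stride *= 2
    return vals[0], idx[0]
```

With n particles the loop over strides runs about log2(n) times, compared with n - 1 dependent comparisons in a linear scan; the returned index identifies the particle whose position becomes the global best.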
This paper deals with the numerical determination of the stress and displacement distribution in a solid body subjected to the applied external force. The tackled solid mechanics problem is governed by the Navier-Cauc...
The dynamic relaxation method has been widely used for the design and analysis of cable-membrane structures. The method iteratively determines a static solution, therefore the parallelization of the method can speed u...
In this work we present parallel algorithms based on the use of two-stage methods for solving the PageRank problem as a linear system. Different parallel versions of these methods are explored and their convergence pr...
The convex cones approach to linear programming is illustrated. Two methods are introduced. The first, called primal, is based on a tangency condition for the nonnegative orthant and an affine set. The second, called dual...