Search results - Inner Mongolia University Library

Performance analysis and parallel implementation of dedicated hash functions on Pentium III

IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2003, Vol. E86A, No. 1, pp. 54-63

Authors: Nakajima, J; Matsui, M (Mitsubishi Electr Corp, Kamakura, Kanagawa 2478501, Japan)

This paper shows an extensive software performance analysis of dedicated hash functions, concentrating particularly on the Pentium III, a currently dominant processor. The targeted hash functions are MD5, RIPEMD-128/-160, SHA-1/-256/-512 and Whirlpool, which fully cover currently used and promising future hashing algorithms. We try to optimize hashing speed not only by carefully arranging pipeline scheduling but also by processing two or even three message blocks in parallel using MMX registers for 32-bit oriented hash functions. Moreover, we thoroughly utilize 64-bit MMX instructions to maximize the performance of the 64-bit oriented hash functions, SHA-512 and Whirlpool. To the best of our knowledge, this paper gives the first detailed measured performance analysis of SHA-256, SHA-512 and Whirlpool.

Keywords: dedicated hash functions; parallel implementations; Pentium III

PARALLEL IMPLEMENTATION OF THE BOX COUNTING ALGORITHM IN OPENCL

FRACTALS-COMPLEX GEOMETRY PATTERNS AND SCALING IN NATURE AND SOCIETY, 2015, Vol. 23, No. 3

Author: Mukundan, Ramakrishnan (Univ Canterbury, Dept Comp Sci & Software Engn, Christchurch 1, New Zealand)

The box counting algorithm is a well-known method for the computation of the fractal dimension of an image. It is often implemented using a recursive subdivision of the image into a set of regular tiles or boxes. Parallel implementations often try to map the boxes to different compute units, and combine the results to get the total number of boxes intersecting a shape. This paper presents a novel and highly efficient method using Open Computing Language (OpenCL) kernels to perform the computation on a per-pixel basis. The mapping and reduction stages are performed in a single pass, and therefore require the enqueuing of only a single kernel. Each instance of the kernel updates the information pertaining to all the boxes containing the pixel, and simultaneously increments the box counters at multiple levels, thereby eliminating the need for another pass to perform the summation. The complete implementation and coding details of the proposed method are outlined. The performance of the method on different processors is analyzed with respect to varying image sizes.

Keywords: Fractal Dimension; OpenCL Kernels; Box Counting Algorithm; Multifractal Analysis; Parallel implementations
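
The per-pixel idea in the abstract above can be illustrated with a small serial sketch. This is not the author's OpenCL kernel: it is a plain Python emulation that assumes a square binary image whose side is a power of two, and the names `box_counts` and `fractal_dimension` are hypothetical. Each foreground pixel marks the box containing it at every level in one visit, which is the role a single kernel instance would play (with atomic updates in place of the Python sets).

```python
import math


def box_counts(image):
    """Count occupied boxes for box sizes 1, 2, 4, ..., n in one pass over the pixels.

    Assumes `image` is a square list of rows whose side n is a power of two,
    with truthy values marking foreground pixels.
    """
    n = len(image)
    levels = n.bit_length()  # level k corresponds to box size 2**k
    occupied = [set() for _ in range(levels)]
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if not value:
                continue
            # One "kernel instance": update every box that contains this pixel.
            for level in range(levels):
                occupied[level].add((x >> level, y >> level))
    return {1 << level: len(boxes) for level, boxes in enumerate(occupied)}


def fractal_dimension(counts):
    """Least-squares slope of log(count) against log(1/size).

    Assumes every count is positive, i.e. the image has at least one
    foreground pixel.
    """
    xs = [math.log(1.0 / size) for size in counts]
    ys = [math.log(count) for count in counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den
```

A filled square and a diagonal line are convenient sanity checks, since their box-counting dimensions should come out close to 2 and 1 respectively.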

Enhancing Particle Swarm Optimization Performance Through CUDA and Tree Reduction Algorithm

INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, Vol. 15, No. 4, pp. 206-213

Authors: Younis, Hussein; Eleyat, Mujahed (Arab Amer Univ, Dept Comp Syst Engn, Jenin, Palestine)

In this paper, we present an enhancement for Particle Swarm Optimization (PSO) performance by utilizing CUDA and a Tree Reduction Algorithm. PSO is a widely used metaheuristic algorithm that has been adapted into a CUDA version known as CPSO. The tree reduction algorithm is employed to efficiently compute the global best position. To evaluate our approach, we compared the speedup achieved by our CUDA version against the standard version of PSO, observing a maximum speedup of 37x. Additionally, we identified a linear relationship between the number of swarm particles and execution time: as the number of particles increases, so does the computational load, which highlights the efficiency of parallel implementations in reducing execution time. Our proposed parallel PSOs have demonstrated significant reductions in execution time along with improvements in convergence speed and local optimization performance, which is particularly beneficial for solving large-scale problems with high computational loads.

Keywords: Particle swarm optimization; tree reduction algorithm; parallel implementations; CUDA; GPU
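
The tree reduction mentioned in the abstract above can be sketched as follows. This is a serial Python emulation of the usual pairwise reduction pattern, not the authors' CPSO code; it assumes a minimization problem, and the function name `tree_reduce_best` is hypothetical. On a GPU, each `tid` iteration of the inner loop would be one thread and a barrier would separate successive strides, so the global best is found in about log2(n) parallel steps.

```python
import math


def tree_reduce_best(fitness):
    """Return (best_fitness, particle_index) by pairwise tree reduction.

    Assumes lower fitness is better. Ties go to the lower index.
    """
    if not fitness:
        raise ValueError("fitness must not be empty")
    pairs = [(value, index) for index, value in enumerate(fitness)]
    # Pad to a power of two with sentinels that can never win the comparison.
    size = 1
    while size < len(pairs):
        size *= 2
    pairs += [(math.inf, len(fitness))] * (size - len(pairs))
    stride = size // 2
    while stride > 0:
        # Each tid is independent within one stride (one GPU thread each).
        for tid in range(stride):
            if pairs[tid + stride] < pairs[tid]:
                pairs[tid] = pairs[tid + stride]
        stride //= 2
    return pairs[0]
```

Carrying the particle index alongside the fitness value lets the caller look up the corresponding position vector once the reduction has finished.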

A massively-parallel multicore acceleration of a point contact solid mechanics simulation

Civil-Comp Proceedings, 2017, Vol. 111

Authors: Kolman, M.; Kosec, G. (Parallel and Distributed Systems Laboratory, Jožef Stefan Institute, Ljubljana, Slovenia)

This paper deals with the numerical determination of the stress and displacement distribution in a solid body subjected to an applied external force. The tackled solid mechanics problem is governed by the Navier-Cauchy equation that describes the deformation within the solid body through the displacement vector field. To obtain the solution, a coupled system of non-linear Partial Differential Equations (PDE) of second order has to be solved. In this paper, the problem is approached by a strong form Moving Least Squares (MLS) based numerical discretization, also referred to as a Meshless Local Strong Form Method (MLSM). A generic C++ implementation of an MLSM is used to demonstrate the parallel solution of a Point Contact problem on an Intel® Xeon Phi™ multicore accelerator. All tests are executed on either the host machine with two Intel® Xeon® E5-2620 v3 6-core processors or offloaded to its 60-core Intel® Xeon Phi™ SE10/7120 series. The shared memory parallelization is implemented through the OpenMP API. © Civil-Comp Press, 2017.

Keywords: Point contacts; Application programming interfaces (API); Least squares approximations; Numerical methods; Partial differential equations; Displacement vector fields; Meshless; MLSM; Nonlinear partial differential equations; OpenMP; Parallel implementations; Shared memory parallelization; Stress and displacement distribution

Parallel implementation of dynamic relaxation with CUDA

Civil-Comp Proceedings, 2017, Vol. 111

Author: Ivanyi, P. (Faculty of Engineering and Information Technology, University of Pécs, Hungary)

The dynamic relaxation method has been widely used for the design and analysis of cable-membrane structures. The method iteratively determines a static solution, so parallelizing it can speed up the analysis process. The method has already been parallelized with the MPI environment. This paper discusses a new parallelization approach, which is programmed with the NVIDIA CUDA API and executed on a GPU system. Since a GPU system has a large number of cores and memory that is separate from the host computer's, the original dynamic relaxation method has to be reorganized. The paper also discusses some performance measurements of the dynamic relaxation method on GPU systems. © Civil-Comp Press, 2017.

Keywords: Iterative methods; Asynchronous parallel; Cable membrane structures; CUDA; Design and analysis; Dynamic relaxation; Dynamic relaxation method; Parallel implementations; Performance measurements

Parallel computation of page rank using two-stage methods

Civil-Comp Proceedings, 2017, Vol. 111

Authors: Migallon, H.; Migallon, V.; Penades, J. (Department of Physics and Computer Architectures, University Miguel Hernández, Alicante, Spain; Department of Computer Science and Artificial Intelligence, University of Alicante, Alicante, Spain)

In this work we present parallel algorithms based on the use of two-stage methods for solving the PageRank problem as a linear system. Different parallel versions of these methods are explored and their convergence properties are analyzed. The parallel implementation has been developed using a mixed MPI/OpenMP model to exploit parallelism beyond a single level. In order to investigate and analyze the proposed parallel algorithms, we have used several realistic large datasets. The numerical results show that the proposed algorithms can significantly speed up the convergence time with respect to the parallel Power algorithm and behave better than other well-known techniques. © Civil-Comp Press, 2017.

Keywords: Parallel algorithms; Linear systems; Memory architecture; Convergence properties; Distributed Memory; Investigate and analyze; PageRank; Parallel Computation; Parallel implementations; Shared memory; Two stage methods
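
As a rough illustration of what a two-stage method for the PageRank linear system looks like, the sketch below solves (I - αP)x = (1 - α)v with an outer block-Jacobi splitting and a fixed number of inner point-Jacobi sweeps. Everything specific here is an assumption rather than the authors' scheme: the choice of splittings, the block size, the number of inner sweeps, the uniform vector v, the dense column-stochastic matrix `P`, and the function name `two_stage_pagerank`. In a hybrid setting one could imagine each diagonal block being owned by one process, with the rows inside a block shared among threads.

```python
def two_stage_pagerank(P, alpha=0.85, block=2, inner=3, tol=1e-10, max_outer=1000):
    """Two-stage iteration for (I - alpha*P) x = (1 - alpha)/n * 1.

    Assumes P is a dense, column-stochastic list of rows (no dangling nodes).
    Outer splitting: block-diagonal part of A. Inner splitting: point Jacobi.
    """
    n = len(P)
    A = [[(1.0 if i == j else 0.0) - alpha * P[i][j] for j in range(n)]
         for i in range(n)]
    b = [(1.0 - alpha) / n] * n
    # Index range [lo, hi) of the diagonal block that owns each row.
    bounds = [((i // block) * block, min((i // block) * block + block, n))
              for i in range(n)]
    x = [1.0 / n] * n
    for _ in range(max_outer):
        x_old = x
        # Outer step: move the off-block couplings to the right-hand side.
        rhs = [b[i] - sum(A[i][j] * x_old[j] for j in range(n)
                          if not bounds[i][0] <= j < bounds[i][1])
               for i in range(n)]
        # Inner steps: a few Jacobi sweeps on the block-diagonal system.
        y = list(x_old)
        for _ in range(inner):
            y = [(rhs[i] - sum(A[i][j] * y[j]
                               for j in range(bounds[i][0], bounds[i][1])
                               if j != i)) / A[i][i]
                 for i in range(n)]
        x = y
        if sum(abs(new - old) for new, old in zip(x, x_old)) < tol:
            break
    total = sum(x)
    return [value / total for value in x]
```

The inner solves are deliberately inexact: the point of a two-stage method is that a few cheap inner sweeps per outer step can be enough, and the number of sweeps becomes a tuning parameter.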

Techniques of linear programming based on the theory of convex cones

Optimization, 1989, Vol. 20, No. 6, pp. 761-777

Authors: D’Alessandro, P.; Mora, Manuel Dalla; De Santis, Elena (University of L’Aquila, Department of Electrical Engineering, Italy)

The convex cones approach to linear programming is illustrated. Two methods are introduced. The first, called primal, is based on a tangency condition for the nonnegative orthant and an affine set. The second, called dual, uses the extreme rays of a pointed polyhedral cone, given by the intersection of the nonnegative orthant with a subspace, to compute the maximum and then proceeds in the same way as the primal to derive the solution. A complete theory for the extreme rays of any cone of this form is given. Another highlight of the paper is the theory of the strictly tangent relaxation, which is independent of the particular method used, and allows the reduction of the problem to a minimal form. A numerical example is included to give a practical feeling for the dual method. Finally, features of the methods, such as the possibility of highly parallel implementations, open problems and perspectives are discussed. © 1989, Taylor & Francis Group, LLC. All rights reserved.

Keywords: Extreme rays; Parallel implementations; Polyhedral cones; Reduction of dimension of linear programming problems; Solution of linear programming problems
