ISBN:
(Print) 9781509026555
This paper presents parallel scalar multiplication techniques for elliptic curve cryptography using a q-based addition-subtraction k-chain, which can also effectively resist side-channel attacks. Many techniques have been proposed to improve scalar multiplication, for example double-and-add, NAF, w-NAF, addition chains and addition-subtraction chains. However, these techniques cannot resist side-channel attacks. Montgomery ladder, random w-NAF and uniform-operation techniques are also widely used to prevent side-channel attacks, but their operations are not efficient enough compared to those without side-channel attack prevention. We have found a new way to use k-chains for this purpose. In this paper, we extend the definition of the k-chain to the q-based addition-subtraction k-chain and modify an algorithm proposed by Jarvinen et al. to generate it. We show upper and lower bounds on its length, which lead to the computation time of the new chain techniques. The chain techniques are used to reduce the cost of scalar multiplication in parallel. Compared to w-NAF, which is faster than double-and-add and the Montgomery ladder technique, the maximum computation time of our q-based addition-subtraction k-chain techniques incurs up to 25.92% lower addition cost using only 3 parallel computing cores. We also discuss the optimization of multiple-operand point addition using the hybrid-double multiplier proposed by Azarderakhsh and Reyhani-Masoleh. The proposed parallel chain techniques also resist side-channel attacks efficiently.
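As background for the chain-based recoding this abstract builds on, here is a minimal sketch of NAF (non-adjacent form) recoding, which illustrates why addition-subtraction representations reduce the number of point additions. The function names and the operation-count comparison are illustrative, not taken from the paper:

```python
def naf(k):
    """Recode k into non-adjacent form: digits in {-1, 0, 1},
    least-significant first, with no two adjacent nonzero digits."""
    digits = []
    while k > 0:
        if k % 2:
            d = 2 - (k % 4)   # +1 if k = 1 (mod 4), -1 if k = 3 (mod 4)
            k -= d
        else:
            d = 0
        digits.append(d)
        k //= 2
    return digits

def count_adds(k):
    """Point additions needed by plain double-and-add vs. NAF
    (one addition/subtraction per nonzero digit beyond the first)."""
    binary_adds = bin(k).count('1') - 1
    naf_adds = sum(1 for d in naf(k) if d) - 1
    return binary_adds, naf_adds
```

For k = 255 (binary 11111111), double-and-add needs 7 point additions, while NAF recodes it as 2^8 - 1, needing a single subtraction; this density gap is what addition-subtraction chain techniques exploit.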
ISBN:
(Print) 9781450341219
Modern applications such as graph and data analytics, when operating on real-world data, have working sets much larger than cache capacity and are bottlenecked by DRAM. To make matters worse, DRAM bandwidth is increasing much more slowly than CPU core counts, while DRAM latency has been virtually stagnant. Parallel applications that are bound by memory bandwidth fail to scale, while applications bound by memory latency draw only a small fraction of much-needed bandwidth. While expert programmers may be able to tune important applications by hand through heroic effort, traditional compiler cache optimizations have not been sufficiently aggressive to overcome the growing DRAM gap. In this paper, we introduce MILK, a C/C++ language extension that allows programmers to annotate memory-bound loops concisely. Using optimized intermediate data structures, random indirect memory references are transformed into batches of efficient sequential DRAM accesses. A simple semantic model enhances programmer productivity for efficient parallelization with OpenMP. We evaluate the MILK compiler on parallel implementations of traditional graph applications, demonstrating performance gains of up to 3x.
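MILK's actual annotation syntax is not reproduced here; the following Python sketch (hypothetical function names) only illustrates the underlying transformation the abstract describes, turning random indirect updates into batches applied in address order:

```python
from collections import Counter

def scatter_naive(acc, indices):
    """Random indirect updates: each write may touch a different cache line."""
    for i in indices:
        acc[i] += 1

def scatter_batched(acc, indices):
    """The MILK-style idea: group updates by target index first, then
    apply them in ascending address order as sequential accesses."""
    for i, cnt in sorted(Counter(indices).items()):
        acc[i] += cnt
```

Both variants compute the same result; the batched form trades a grouping pass for DRAM-friendly access order, which is the trade the compiler's intermediate data structures make at scale.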
ISBN:
(Print) 9783319556680; 9783319556697
In the present paper, an approach to solving global optimization problems using a nested optimization scheme is developed. The use of different algorithms at different nesting levels is the novel element: a complex serial algorithm (on the CPU) is used at the upper level, and a simple parallel algorithm (on the GPU) is used at the lower level. This computational scheme has been implemented in the ExaMin parallel solver. The results of computational experiments demonstrating the speedup achieved when solving a series of test problems are presented.
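A minimal sketch of the nested scheme, with hypothetical names and a serial inner loop standing in for the paper's GPU-parallel lower level (the inner evaluations over y are independent of each other, so they parallelize trivially):

```python
def nested_minimize(f, xs, ys):
    """Minimize f(x, y) over a grid: the outer loop over x plays the role
    of the complex serial upper-level algorithm; the inner evaluations
    over y are independent (the parallel lower level in the paper)."""
    best = None
    for x in xs:
        # Inner sweep: each f(x, y) is independent and could run on a GPU core.
        inner_val, inner_y = min((f(x, y), y) for y in ys)
        cand = (inner_val, x, inner_y)
        if best is None or cand < best:
            best = cand
    return best  # (value, x, y)
```

The sketch uses grid scans at both levels purely for brevity; the paper's point is that the two levels may run entirely different algorithms.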
ISBN:
(Print) 9783319322438; 9783319322421
Particle Swarm Optimization (PSO) is a heuristic technique that has been used to solve problems where many events occur simultaneously and small pieces of the problem can collaborate to reach a solution. Among its advantages are fast convergence, large exploration coverage, and adequate global optimization; however, to address the premature convergence problem, modifications to the basic model have been developed, such as Aging Leader and Challengers (ALC) PSO and Bio-inspired Aging (BAM) PSO. Since these algorithms are parallel in nature, some authors have attempted different approaches to apply PSO using MPI and GPUs. Nevertheless, ALC-PSO and BAM-PSO have not previously been implemented in parallel. For this study, we implement PSO, ALC-PSO and BAM-PSO with MPI and on GPUs using the High Performance Computing Cluster (HPCC) Agave. The results suggest that ALC-PSO and BAM-PSO reduce premature convergence, improving global precision, while BAM-PSO achieves better optima at the expense of significantly increasing the algorithm's computational complexity.
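A minimal sketch of the basic PSO model that ALC-PSO and BAM-PSO extend (standard inertia-weight velocity/position update; the parameter values are illustrative defaults, not from the paper):

```python
import random

def pso(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over R^dim with the classic inertia-weight PSO update."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal bests
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

The per-particle updates inside one iteration are independent, which is why the algorithm maps naturally onto MPI ranks or GPU threads; the aging mechanisms of ALC-PSO and BAM-PSO replace the global-best leader logic and are not sketched here.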
ISBN:
(Print) 9780791857496
In recent years, significant research effort has been invested in the development of mesh-free methods for different types of continuum problems. Prominent among these methods are the element-free Galerkin (EFG) method, RKPM, and the meshless local Petrov-Galerkin (MLPG) method. Most of these methods employ a set of nodes for discretization of the problem domain and use a moving least squares (MLS) approximation to generate shape functions. Of these methods, the MLPG method is seen as a pure meshless method since it does not require any background mesh. The accuracy and flexibility of the MLPG method are well established for a variety of continuum problems. However, most applications have been limited to small-scale problems solvable on serial machines. Very few attempts have been made to apply it to large-scale problems, which typically involve many millions (or even billions) of nodes and would require parallel algorithms based on domain decomposition. Such parallel techniques are well established in the context of mesh-based methods; extending them to the MLPG method requires considerable further research. The objective of this paper is to spell out the challenges which need urgent attention to enable the application of meshless methods to large-scale problems. We specifically address the solution of large-scale linear systems, which necessarily requires iterative solvers, and focus on the application of the BiCGSTAB method and an appropriate set of preconditioners for the solution of the MLPG system.
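As a reference point for the solver discussion, here is a minimal unpreconditioned BiCGSTAB sketch in pure Python (dense matrix, no MLPG assembly or preconditioning; all names are illustrative):

```python
def bicgstab(A, b, tol=1e-10, max_iter=100):
    """Unpreconditioned BiCGSTAB for a dense system Ax = b (A as list of rows)."""
    n = len(b)
    matvec = lambda u: [sum(A[i][j] * u[j] for j in range(n)) for i in range(n)]
    dot = lambda u, w: sum(ui * wi for ui, wi in zip(u, w))
    x = [0.0] * n
    r = [bi - ri for bi, ri in zip(b, matvec(x))]
    r_hat = r[:]                       # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    for _ in range(max_iter):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        rho = rho_new
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = matvec(p)
        alpha = rho / dot(r_hat, v)
        h = [xi + alpha * pi for xi, pi in zip(x, p)]
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if dot(s, s) ** 0.5 < tol:     # early exit: h is already accurate
            return h
        t = matvec(s)
        omega = dot(t, s) / dot(t, t)
        x = [hi + omega * si for hi, si in zip(h, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if dot(r, r) ** 0.5 < tol:
            break
    return x
```

In the large-scale MLPG setting the matrix-vector product would act on a distributed sparse matrix and the method would be wrapped with a preconditioner; this sketch only fixes the iteration structure being discussed.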
ISBN:
(Print) 9783319449142; 9783319449135
We consider a routing problem with constraints. To solve this problem, we employ a variant of the dynamic programming method in which the significant part of the Bellman function (that is, the part that matters in view of the precedence constraints) is calculated by means of an independent calculation scheme. We propose a parallel implementation of the algorithm for a supercomputer, where the construction of position-space layers for the hypothetical processors is carried out using the apparatus of discrete dynamical systems.
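A minimal serial sketch of bitmask dynamic programming for a routing problem with precedence constraints (hypothetical names; the paper's layered parallel scheme and independent-calculation structure are not reproduced, but the mask layers below are exactly what such a scheme would distribute):

```python
def route_dp(dist, prec):
    """Min-cost route starting at city 0 and visiting all cities,
    where city j may only be visited after every city in prec[j].
    dist[i][j]: travel cost; prec: dict mapping city -> set of predecessors."""
    n = len(dist)
    INF = float('inf')
    # dp[(mask, last)] = min cost to visit the cities in `mask`, ending at `last`
    dp = {(1, 0): 0.0}
    for mask in range(1, 1 << n):
        if not mask & 1:               # city 0 is always the start
            continue
        for last in range(n):
            if (mask, last) not in dp:
                continue
            for nxt in range(n):
                if mask >> nxt & 1:
                    continue
                # precedence: all predecessors of nxt must already be visited
                if any(not mask >> p & 1 for p in prec.get(nxt, ())):
                    continue
                key = (mask | 1 << nxt, nxt)
                cost = dp[(mask, last)] + dist[last][nxt]
                if cost < dp.get(key, INF):
                    dp[key] = cost
    full = (1 << n) - 1
    return min(dp[(full, last)] for last in range(n) if (full, last) in dp)
```

Precedence constraints prune entire regions of the state space, which is why only the "significant part" of the Bellman function needs to be computed; states within one mask layer are independent of each other, making the layer the natural unit of parallel work.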
ISBN:
(Print) 9781509032280
A parallel implementation of a surface reconstruction algorithm is presented. This algorithm uses the vector field surface representation and was adapted in previous work by the authors to handle large-scale environment reconstruction. Two parallel implementations with different memory requirements and processing speeds are described and compared. These parallel implementations increase the vector field computation speed by up to a factor of 31 relative to a purely serial implementation. The method is demonstrated on different datasets captured at Hydro-Quebec sites using a variety of sensors: LiDAR, sonar, and the WireScan, an underwater laser scanner designed at our laboratory.
ISBN:
(Print) 9781509036820
The processing of graphs is of increasing importance in many applications, and the size of such graphs is growing rapidly. As with scientific computing, there is a growing need to understand the relationship between system architectures and graph algorithms, especially as both the scale of the system and the size of the graph increase. To date, only one graph benchmark, Breadth-First Search, has several hundred comparative reports available; over the last few years it has fueled new algorithms that have improved typical performance very significantly. This paper suggests an additional benchmark based on the computation of neighborhoods and Jaccard coefficients, which is of a different intrinsic complexity and can be recast in multiple ways that may be suitable for different classes of real-world applications.
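A minimal sketch of the proposed benchmark's core computation, the Jaccard coefficient of the neighborhoods at the two endpoints of each edge (function name and adjacency representation are illustrative):

```python
def jaccard_coefficients(adj):
    """For each undirected edge (u, v) in a graph given as {node: set of
    neighbors}, compute |N(u) & N(v)| / |N(u) | N(v)|."""
    out = {}
    for u, nu in adj.items():
        for v in nu:
            if u < v:                      # each undirected edge once
                inter = len(nu & adj[v])   # shared neighbors
                union = len(nu | adj[v])
                out[(u, v)] = inter / union
    return out
```

Unlike BFS, whose cost is dominated by traversal, this kernel is dominated by per-edge set intersections, which is the "different intrinsic complexity" the abstract refers to.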
ISBN:
(Print) 9783319446363; 9783319446356
In this research, a parallel version of two existing algorithms that implement the Maximum Likelihood Scale Invariant Map (MLHL-SIM) and the Scale Invariant Map (SIM) is proposed. By using OpenMP to distribute the independent iterations of for-loops among the available threads, a significant reduction in computation time is achieved in all experiments. The larger the considered map, the greater the reduction in computation time achieved by the parallel algorithm. For two given datasets, measured times drop to 29.45% and 36.21% of the sequential time for the MLHL-SIM algorithm. For the SIM algorithm, the parallel version likewise reduces computation time, to 42.09% and 36.72% of the sequential time for the two datasets, respectively. The results demonstrate the speedup of the parallel version.
Due to the recent increase in the volume of generated data, organizing this data has become one of the biggest problems in computer science. Among the different strategies proposed to deal with this efficiently and effectively, we highlight those related to clustering, more specifically density-based clustering strategies, which stand out for their ability to define clusters of arbitrary shape and their robustness in the presence of noise, such as DBSCAN and OPTICS. However, these algorithms are still computationally challenging since they are distance-based proposals. In this work we present a new approach that makes OPTICS feasible, based on a data indexing strategy. Despite the simplicity with which the data are indexed, using graphs, this structure allows various parallelization opportunities to be explored, which we did using a graphics processing unit (GPU). Based on this structure, the complexity of OPTICS is reduced to O(E * log V) in the worst case, making it very fast. In our evaluation we show that our proposal can be over 200x faster than its sequential CPU version.
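The graph-indexed, GPU-parallel OPTICS of this paper is not reproduced here; a minimal serial DBSCAN sketch (hypothetical names, brute-force neighborhood queries) illustrates the kind of density-based clustering being accelerated:

```python
def dbscan(points, eps, min_pts):
    """Label each point with a cluster id >= 0, or -1 for noise.
    A point is a core point if its eps-neighborhood (self included)
    contains at least min_pts points."""
    n = len(points)
    d = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    # Brute-force O(n^2) neighborhoods: this is exactly the distance-based
    # cost that the paper's graph index and GPU parallelism attack.
    neighbors = [[j for j in range(n) if d(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cid = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1                 # noise (may be relabeled as border)
            continue
        labels[i] = cid                    # start a new cluster from a core point
        stack = list(neighbors[i])
        while stack:
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cid            # border point: claim, don't expand
            if labels[j] is not None:
                continue
            labels[j] = cid
            if len(neighbors[j]) >= min_pts:
                stack.extend(neighbors[j])  # core point: expand the cluster
        cid += 1
    return labels
```

OPTICS replaces the flat eps threshold with an ordering by reachability distance, but its dominant cost is the same neighborhood computation shown in the brute-force step above.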