Since silicon technology entered the many-core era, new computing platforms have been exploiting higher and higher levels of parallelism. Thanks to scalable, clustered architectures, embedded systems and high-performanc...
ISBN (digital): 9783642551956
ISBN (print): 9783642551956
In this paper we present a parallel preconditioner for the standard Finite Volume (FV) discretization of elliptic problems, using the standard continuous piecewise linear Finite Element (FE) function space. The proposed preconditioner is constructed using the abstract framework of the Additive Schwarz Method and is fully parallel. The convergence rate of the Generalized Minimal Residual (GMRES) method with this preconditioner is shown to be almost optimal, i.e., it depends poly-logarithmically on the mesh sizes.
Author: Jeljeli, Hamza (Univ Lorraine, CARAMEL Project Team, LORIA, INRIA, CNRS, Campus Sci, BP 239, F-54506 Vandoeuvre Les Nancy, France)
ISBN (digital): 9783319098739
ISBN (print): 9783319098739; 9783319098722
In cryptanalysis, solving the discrete logarithm problem (DLP) is key to assessing the security of many public-key cryptosystems. Index-calculus methods, which attack the DLP in multiplicative subgroups of finite fields, require solving large sparse systems of linear equations modulo large primes. This article deals with how to run this computation on GPU- and multi-core-based clusters featuring InfiniBand networking. More specifically, we present the sparse linear algebra algorithms that are proposed in the literature, in particular the block Wiedemann algorithm. We discuss the parallelization of the central matrix-vector product operation from both algorithmic and practical points of view, and illustrate how our approach has contributed to the recent record-sized DLP computation in GF(2^809).
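The matrix-vector product named above as the central operation can be illustrated with a minimal sketch. It assumes a CSR (compressed sparse row) layout and a toy prime; the function name `csr_matvec_mod` and the 3x3 matrix are hypothetical and are not taken from the article, whose actual data structures are not described in this abstract.

```python
def csr_matvec_mod(indptr, indices, data, x, p):
    """Return y = A @ x (mod p) for a sparse matrix A stored in CSR form.

    indptr[i]:indptr[i+1] delimits the nonzeros of row i; indices holds
    their column numbers and data their values.
    """
    y = []
    for row in range(len(indptr) - 1):
        acc = 0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        # Reduce once per row; Python integers cannot overflow, so the
        # delayed reduction is safe here.
        y.append(acc % p)
    return y


# Hypothetical toy system (assumption, for illustration only):
# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [1, 2, 3, 4, 5]
p = 7
x = [3, 4, 6]
y = csr_matvec_mod(indptr, indices, data, x, p)
```

Each output row depends only on its own nonzeros and on `x`, which is why this product can be split across threads or cluster nodes by rows; an iterative solver such as Wiedemann's then applies it repeatedly to build its sequence of vectors.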
ISBN (digital): 9783642551956
ISBN (print): 9783642551956
This paper concerns a new approach to evaluating option price sensitivities using Monte Carlo simulation, based on the parallel GPU architecture and Automatic Differentiation methods. Interval arithmetic is used to study rounding errors. The considerations are based on two implementations of the algorithm, a sequential one and a parallel one. For efficient differentiation, the adjoint method is employed. Computational experiments include analysis of performance, uncertainty error and rounding error, and consider the Black-Scholes and Heston models.
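To make the idea of a Monte Carlo sensitivity concrete, here is a minimal sequential sketch for the Black-Scholes case. It estimates the delta of a European call with a hand-derived pathwise derivative, which is the quantity that differentiating the simulated payoff with respect to the spot price yields for this simple payoff. It is not the authors' implementation: it uses no automatic differentiation tool, no GPU and no interval arithmetic, and all function names and parameter values are assumptions chosen for illustration.

```python
import math
import random


def bs_call_delta_mc(s0, k, r, sigma, t, n_paths, seed=0):
    """Monte Carlo price and pathwise delta of a European call (Black-Scholes)."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma * sigma) * t
    vol = sigma * math.sqrt(t)
    disc = math.exp(-r * t)
    price_sum = 0.0
    delta_sum = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        s_t = s0 * math.exp(drift + vol * z)
        if s_t > k:
            price_sum += s_t - k
            # Derivative of the payoff max(s_t - k, 0) with respect to s0
            # along this path: d(s_t)/d(s0) = s_t / s0.
            delta_sum += s_t / s0
    return disc * price_sum / n_paths, disc * delta_sum / n_paths


def bs_call_delta_exact(s0, k, r, sigma, t):
    """Closed-form Black-Scholes delta N(d1), used only as a reference."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    return 0.5 * (1.0 + math.erf(d1 / math.sqrt(2.0)))


# Hypothetical parameters (assumptions): at-the-money call, one year to expiry.
price, delta = bs_call_delta_mc(100.0, 100.0, 0.05, 0.2, 1.0, 100_000)
exact = bs_call_delta_exact(100.0, 100.0, 0.05, 0.2, 1.0)
```

The paths are independent, so the loop body maps naturally to one GPU thread per path followed by a reduction; comparing `delta` with `exact` gives a rough view of the sampling (uncertainty) error, which is separate from the rounding error the paper studies with intervals.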
ISBN (digital): 9783319111940
ISBN (print): 9783319111940; 9783319111933
Matrix eigenvalue theory has become an important analysis tool in scientific computing. Often only the maximum eigenvalue is needed rather than all eigenvalues. Existing algorithms for finding the maximum eigenvalue of a matrix are implemented sequentially. As the order of the matrices increases, the computational workload grows, so traditional sequential methods cannot meet the need for fast calculation on large matrices. This paper proposes a parallel algorithm named PA-ST that finds the maximum eigenvalue of positive matrices by similarity transformation, implemented with CUDA (Compute Unified Device Architecture) on a GPU (Graphics Processing Unit). To the best of our knowledge, this is the first CUDA-based parallel algorithm for calculating the maximum eigenvalue of matrices. To improve performance, several optimization techniques are applied: using shared memory rather than global memory to speed up computation, avoiding bank conflicts by setting the span index, satisfying the principle of coalesced memory access, and using single-precision floating-point arithmetic and pinned memory to reduce copy operations and obtain higher data transfer bandwidth between the host and the GPU device. The experimental results show that the similarity transformation technique significantly shortens the running time compared to the sequential algorithm, and that the speedup ratio is nearly stable as the number of iterations increases. As the matrix order increases, the running times of the sequential algorithm and of PA-ST increase correspondingly. Experiments also show that the speedup ratio of PA-ST is between 2.85 and 35.028.
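The abstract does not spell out the similarity transformation used by PA-ST. As a hedged illustration of the general idea, the sketch below uses one well-known scheme for positive matrices: the largest eigenvalue lies between the smallest and largest row sums, and the diagonal similarity B <- D^{-1} B D with D = diag(row sums) preserves the eigenvalues while drawing the row sums together. The function name and test matrix are assumptions, and this sequential code should not be read as the authors' algorithm.

```python
import math


def perron_root_bounds(a, tol=1e-10, max_iter=1000):
    """Bracket the largest eigenvalue of a positive square matrix.

    Returns (lo, hi), the min and max row sums of a matrix that is
    diagonally similar to `a`; both bound the largest eigenvalue.
    """
    n = len(a)
    b = [row[:] for row in a]
    lo = hi = 0.0
    for _ in range(max_iter):
        r = [sum(row) for row in b]          # one independent sum per row
        lo, hi = min(r), max(r)
        if hi - lo <= tol * hi:
            break
        # Similarity transform with D = diag(r):
        # entry (i, j) of D^{-1} B D is b[i][j] * r[j] / r[i].
        b = [[b[i][j] * r[j] / r[i] for j in range(n)] for i in range(n)]
    return lo, hi


# Hypothetical 2x2 positive matrix; its largest eigenvalue is (5 + sqrt(33)) / 2.
lo, hi = perron_root_bounds([[1.0, 2.0], [3.0, 4.0]])
reference = (5.0 + math.sqrt(33.0)) / 2.0
```

Both the row sums and the rescaling touch each row independently, which is the kind of structure that maps to one GPU thread (or thread block) per row with a final min/max reduction.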
ISBN (digital): 9783642551956
ISBN (print): 9783642551956
In one of the most important methods in Density Functional Theory, the Full-Potential Linearized Augmented Plane Wave (FLAPW) method, dense generalized eigenproblems are organized in long sequences. Moreover, each eigenproblem is strongly correlated to the next one in the sequence. We propose a novel approach that exploits this correlation through an eigensolver based on subspace iteration and accelerated with Chebyshev polynomials. The resulting solver, parallelized using the Elemental library framework, achieves excellent scalability and is competitive with current dense parallel eigensolvers.
ISBN (digital): 9783642551956
ISBN (print): 9783642551956
Computations in Fluid Dynamics require minimising the time in which a result can be obtained. While parallel techniques allow large problems to be handled, it is adaptivity that ensures computational effort is focused on interesting regions in time and space. Parallel efficiency, in a domain decomposition based approach, strongly depends on partitioning quality. For adaptive simulations, partitioning quality is lost due to the dynamic modification of the computational mesh. Maintaining high parallel efficiency requires rebalancing of the numerical load. This paper presents performance results of an adaptive and dynamically balanced in-house flow solver. The results indicate that the rebalancing technique might be used to remedy the adverse effects of adaptivity on overall parallel performance.
ISBN (digital): 9783642551956
ISBN (print): 9783642551956
In this paper we present CUDA kernels that compute an interval matrix product. Starting from a naive implementation, we investigate possible speedups using commonly known techniques from standard matrix multiplication. We also evaluate the achieved speedup when our kernels are used to accelerate a variant of an existing algorithm that finds an enclosure for the solution of a linear system. Moreover, the quality of our enclosure is discussed.
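A naive interval matrix product of the kind the abstract starts from can be sketched as follows. This is a sequential illustration, not the paper's CUDA kernels: intervals are represented as `(lo, hi)` tuples, and directed rounding is emulated by widening each rounded result by one unit in the last place with `math.nextafter` (Python 3.9 or later). All names are assumptions made for this example.

```python
import math


def _down(x):
    """Next float below x: a safe lower bound after round-to-nearest."""
    return math.nextafter(x, -math.inf)


def _up(x):
    """Next float above x: a safe upper bound after round-to-nearest."""
    return math.nextafter(x, math.inf)


def imul(a, b):
    """Enclosure of the product of intervals a and b."""
    ps = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (_down(min(ps)), _up(max(ps)))


def iadd(a, b):
    """Enclosure of the sum of intervals a and b."""
    return (_down(a[0] + b[0]), _up(a[1] + b[1]))


def interval_matmul(A, B):
    """Naive triple-loop product of two interval matrices."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[(0.0, 0.0)] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            acc = (0.0, 0.0)
            for k in range(m):
                acc = iadd(acc, imul(A[i][k], B[k][j]))
            C[i][j] = acc
    return C


# Hypothetical 1x2 times 2x1 example: the exact result is the interval [1, 10].
A = [[(1.0, 2.0), (-1.0, 1.0)]]
B = [[(3.0, 4.0)], [(2.0, 2.0)]]
C = interval_matmul(A, B)
```

The one-ulp widening is slightly pessimistic compared with true directed rounding, so the enclosure stays valid but is a little wider than necessary; each entry of `C` is independent, which is what makes the product a natural fit for one GPU thread per output element.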
ISBN (print): 9781631900259
Inspired by the proliferation of content-centric applications in the Internet, Information-Centric Networking (ICN) has emerged as a promising networking paradigm. Focusing on the delivery of content instead of pairwise communication between end-hosts, ICN inherently supports location-independent content and information distribution through in-network caching and multicast, as well as mobile computing. However, the vast majority of ICN research efforts have so far focused on the design of sound and scalable architectures and protocols for the current Internet application landscape. In this paper, we revisit ICN in the context of a radically different application environment, that of smart grids, and in particular the case of smart charging of electric vehicles. Based on a thorough description of the currently forming application environment in the Netherlands, we highlight the inefficiencies resulting from a host-centric model. We then show how ICN can address these limitations and ultimately support quality and security in such an application environment. Besides qualitative benefits, our preliminary analysis also demonstrates that ICN can substantially reduce communication and security complexity, thus fostering the development and widespread adoption of the smart charging application.
ISBN (print): 9783642552243
A new simplified definition of time-domain parallelism is introduced for explicit time evolution calculations, and is implemented on parallel machines with bucket-brigade type communications. By using an identity operator instead of introducing an approximate solver, the recurrence formula of the parareal-in-time algorithm is much simplified. In spite of such a simple definition, it is applicable to many explicit time-evolution calculations. In addition, this approach overcomes several known drawbacks of the original parareal-in-time method. To implement this algorithm on parallel machines, a parallel bucket-brigade interface is introduced, which reduces programming and tuning costs for complicated space-time parallel programs.
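One way to read the simplification described above is to take the standard parareal update, U[n+1] <- G(U_new[n]) + F(U_old[n]) - G(U_old[n]), and replace the approximate (coarse) solver G by the identity, which gives U[n+1] <- U_new[n] + F(U_old[n]) - U_old[n]. The sketch below applies that reading to a toy problem. The exact recurrence used in the paper is not given in this abstract, so the formula, the test equation du/dt = lam * u, and all names here are assumptions.

```python
def fine_step(u, dt=0.1, lam=-1.0, substeps=10):
    """Fine propagator F: explicit Euler for du/dt = lam * u over one time slice."""
    h = dt / substeps
    for _ in range(substeps):
        u = u + h * lam * u
    return u


def sequential(u0, n_slices):
    """Reference solution: apply F slice after slice."""
    us = [u0]
    for _ in range(n_slices):
        us.append(fine_step(us[-1]))
    return us


def parareal_identity(u0, n_slices, n_iters):
    """Parareal iteration with the coarse propagator replaced by the identity."""
    # With G = identity, the initial guess at every slice boundary is u0.
    u = [u0] * (n_slices + 1)
    for _ in range(n_iters):
        # These fine solves are independent across slices; on a parallel
        # machine each slice would run on its own processor.
        f = [fine_step(u[n]) for n in range(n_slices)]
        new = [u0]
        for n in range(n_slices):
            # Correction sweep: G(new[n]) + F(u[n]) - G(u[n]) with G = identity.
            new.append(new[n] + f[n] - u[n])
        u = new
    return u


n_slices = 8
ref = sequential(1.0, n_slices)
par = parareal_identity(1.0, n_slices, n_iters=n_slices)
```

In this sketch the correction sweep passes only a single value from each slice to the next, a communication pattern that resembles a bucket brigade; after as many iterations as there are slices, the result agrees with the sequential reference up to rounding.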