ISBN (print): 9783319495835; 9783319495828
In this paper we evaluate a new coalesced data and kernel scheme used to reduce the execution costs of cardiac simulations that run on multi-GPU environments. The new scheme was tested on an important part of the simulator, the solution of the systems of Ordinary Differential Equations (ODEs). The results show that the proposed scheme is very effective: the execution time to solve the systems of ODEs on the multi-GPU environment was halved compared to a scheme that does not implement the proposed data and kernel coalescing. As a result, cardiac simulations ran 25% faster overall.
ISBN (print): 9783319495835; 9783319495828
The Gauss-Huard algorithm (GHA) is a specialized version of Gauss-Jordan elimination for the solution of linear systems that, enhanced with column pivoting, exhibits numerical stability and computational cost close to those of the conventional solver based on the LU factorization with row pivoting. Furthermore, the GHA can be formulated as a procedure rich in matrix multiplications, so that high performance can be expected on current architectures with multi-layered memories. Unfortunately, the GHA does not in principle admit look-ahead, a technique that has proved useful for improving the performance of the LU factorization on multi-threaded platforms with high levels of hardware concurrency. In this paper we analyze the effect of this drawback on the implementation of the GHA on systems accelerated with graphics processing units (GPUs), exposing the roles of the CPU-to-GPU and single-precision-to-double-precision performance ratios, as well as the contribution of the operations on the algorithm's critical path.
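For readers unfamiliar with the method, the following is a minimal, unblocked sketch of Gauss-Huard elimination with column pivoting in plain Python. It is an illustration written for this listing, not the paper's GPU implementation; the function name `gauss_huard_solve` and the dense list-of-lists representation are assumptions made here.

```python
def gauss_huard_solve(matrix, rhs):
    """Solve matrix * x = rhs with unblocked Gauss-Huard elimination.

    Illustrative sketch only: dense lists, no blocking, no GPU offload.
    """
    n = len(matrix)
    a = [[float(v) for v in row] for row in matrix]  # work on copies
    b = [float(v) for v in rhs]
    perm = list(range(n))  # perm[j] = original unknown held in column j

    for k in range(n):
        # 1. Row elimination: apply the k previously processed rows to row k.
        for j in range(k, n):
            a[k][j] -= sum(a[k][i] * a[i][j] for i in range(k))
        b[k] -= sum(a[k][i] * b[i] for i in range(k))

        # 2. Column pivoting: pick the largest entry of row k in columns k..n-1.
        p = max(range(k, n), key=lambda j: abs(a[k][j]))
        if a[k][p] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            for row in a:
                row[k], row[p] = row[p], row[k]
            perm[k], perm[p] = perm[p], perm[k]

        # 3. Row scaling: make the diagonal entry equal to one.
        pivot = a[k][k]
        for j in range(k + 1, n):
            a[k][j] /= pivot
        b[k] /= pivot

        # 4. Column elimination: clear column k in the rows above row k.
        for i in range(k):
            factor = a[i][k]
            for j in range(k + 1, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]

    # Undo the column permutation to recover the unknowns in original order.
    x = [0.0] * n
    for j in range(n):
        x[perm[j]] = b[j]
    return x
```

Note that each step touches only row k and the rows above it, in contrast to LU factorization, which updates the trailing submatrix below the pivot row.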
ISBN (print): 9783319495835; 9783319495828
The increasing use of mobile social networks has lately transformed news media. Real-world events are nowadays reported in social networks much faster than in traditional channels. As a result, the autonomous detection of events from networks like Twitter has gained a lot of interest in both research and media groups. DBSCAN-like algorithms constitute a well-known clustering approach to retrospective event detection. However, scaling such algorithms to geographically large regions and temporally long periods presents two major shortcomings. First, detecting real-world events from the vast number of tweets can no longer be performed on a single machine. Second, tweeting activity varies a lot within these broad space-time regions, limiting the use of global parameters. Against this background, we propose to scale DBSCAN-like event detection techniques by parallelizing and distributing them through a novel density-aware MapReduce scheme. The proposed scheme partitions tweet data according to its spatial and temporal features and tailors local DBSCAN parameters to local tweet densities. We implement the scheme in Apache Spark and evaluate its performance on a dataset composed of geo-located tweets in the Iberian Peninsula during the course of several football matches. The results point to the benefits of our proposal over other state-of-the-art techniques in terms of speed-up and detection accuracy.
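The partition-then-tune idea can be sketched in a few lines of plain Python. The abstract does not specify how partitions are formed or how parameters are derived, so everything below is an assumption made for illustration: the fixed grid, the names `partition_tweets` and `local_min_pts`, and the rule that scales a global minPts by each cell's tweet count relative to the mean. The clustering step itself and the Spark machinery are omitted.

```python
import math
from collections import defaultdict


def partition_tweets(tweets, cell_deg, window_s):
    """Group (lat, lon, timestamp) tuples by a space-time grid cell.

    Hypothetical map step: a fixed grid of cell_deg degrees and
    window_s seconds stands in for whatever partitioning the paper uses.
    """
    cells = defaultdict(list)
    for lat, lon, ts in tweets:
        key = (math.floor(lat / cell_deg),
               math.floor(lon / cell_deg),
               math.floor(ts / window_s))
        cells[key].append((lat, lon, ts))
    return dict(cells)


def local_min_pts(cells, base_min_pts):
    """Derive a per-cell minPts from a global one (assumed heuristic).

    Denser cells get a proportionally larger threshold; the floor of 2
    keeps sparse cells from treating every tweet as a cluster.
    """
    mean_count = sum(len(v) for v in cells.values()) / len(cells)
    return {key: max(2, math.ceil(base_min_pts * len(v) / mean_count))
            for key, v in cells.items()}
```

Each cell could then be clustered independently with its own parameters, which is what makes the work distributable across machines.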
Current parallel programming frameworks greatly help developers implement applications that exploit parallel hardware resources. Nevertheless, developers require additional expertise to properly use...
ISBN (digital): 9783319111940
ISBN (print): 9783319111940; 9783319111933
The Kirchhoff pre-stack depth migration (KPSDM) algorithm, one of the most widely used migration algorithms, plays an important part in obtaining a true image of the earth's subsurface. However, the program takes considerable time because of its high computational cost; hence the working efficiency of the oil industry is affected. The general-purpose Graphics Processing Unit (GPU) and the Compute Unified Device Architecture (CUDA) developed by NVIDIA provide a new solution to this problem. In this study, we propose a parallel algorithm for Kirchhoff pre-stack depth migration and an optimization strategy based on CUDA technology. Our experiments indicate that for large data computations, the accelerated algorithm achieves a speedup of 8 to 15 times compared with NVIDIA GPU.
We show that developing an optimal parallelization of the two-list algorithm is much easier than we once thought. All it takes is to observe that the steps of the search phase of the two-list algorithm are closely rel...
To make parallel programming as widespread as parallel architectures, more structured parallel programming paradigms are necessary. One possible approach is algorithmic skeletons. They can be seen as higher ...
In this paper, we propose an implementation of a parallel two-dimensional fast Fourier transform (FFT) using Intel Advanced Vector Extensions (AVX) instructions on multi-core processors. The combination of vectorizati...
Most cryptographic systems are based on modular exponentiation, which is performed using successive modular multiplications. One way of improving the throughput of a cryptographic system implementation is reducing the...
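As background, the way modular exponentiation reduces to successive modular multiplications can be shown with the textbook right-to-left square-and-multiply method. This is a generic sketch, not the technique proposed in the paper (whose description is truncated above); the name `mod_pow` is chosen here for illustration.

```python
def mod_pow(base, exponent, modulus):
    """Compute (base ** exponent) % modulus by square-and-multiply.

    Every loop iteration performs one modular squaring, plus one extra
    modular multiplication when the current exponent bit is set.
    """
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    if exponent < 0:
        raise ValueError("exponent must be non-negative")

    result = 1 % modulus  # handles modulus == 1 correctly
    base %= modulus
    while exponent > 0:
        if exponent & 1:                  # low bit set: multiply into result
            result = (result * base) % modulus
        base = (base * base) % modulus    # square for the next bit
        exponent >>= 1
    return result
```

The number of modular multiplications grows with the bit length of the exponent rather than with its value, which is why the cost of a single modular multiplication dominates overall throughput.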
ISBN (digital): 9783642246692
ISBN (print): 9783642246685
This two-volume set, LNCS 7016 and LNCS 7017, constitutes the refereed proceedings of the 11th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2011, held in Melbourne, Australia, in October 2011. The second volume includes 37 papers from one symposium and three workshops held together with the ICA3PP 2011 main conference. These are 16 papers from the 2011 International Symposium on Advances of Distributed Computing and Networking (ADCN 2011), 10 papers from the 4th IEEE International Workshop on Internet and Distributed Computing Systems (IDCS 2011), 7 papers from the III International Workshop on Multicore and Multithreaded Architectures and Algorithms (M2A2 2011), and 4 papers from the 1st IEEE International Workshop on Parallel Architectures for Bioinformatics Systems (HardBio 2011).