ISBN (digital): 9783319111940
ISBN (print): 9783319111940; 9783319111933
Matrix eigenvalue theory has become an important analysis tool in scientific computing. In many applications only the maximum eigenvalue is needed rather than the full spectrum. Existing algorithms for finding the maximum eigenvalue of a matrix are implemented sequentially. As the order of the matrices increases, the computational workload grows, so traditional sequential methods cannot meet the need for fast calculation on large matrices. This paper proposes a parallel algorithm named PA-ST that finds the maximum eigenvalue of positive matrices by similarity transformation, implemented with CUDA (Compute Unified Device Architecture) on a GPU (Graphics Processing Unit). To the best of our knowledge, this is the first CUDA-based parallel algorithm for calculating the maximum eigenvalue of matrices. To improve performance, several optimization techniques are applied: using shared memory rather than global memory to speed up computation, avoiding bank conflicts by setting the span index, satisfying the principle of coalesced memory access, and using single-precision floating-point arithmetic and pinned memory to reduce copy operations and obtain higher data transfer bandwidth between the host and the GPU device. The experimental results show that the similarity transformation technique can significantly shorten the running time compared to the sequential algorithm, and that the speedup ratio is nearly stable as the number of iterations increases. As the matrix order increases, the running times of both the sequential algorithm and PA-ST increase correspondingly. Experiments also show that the speedup ratio of PA-ST is between 2.85 and 35.028.
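The abstract does not spell out the steps of PA-ST, so the following is only a minimal sequential sketch of one well-known diagonal similarity scheme for positive matrices, not the authors' implementation. It assumes the transformation B <- D^{-1} B D with D = diag(row sums of B): the eigenvalues are preserved, and the smallest and largest row sums bracket the maximum eigenvalue and close in on it. The function name `max_eigenvalue_positive` and the parameters `tol` and `max_iter` are hypothetical.

```python
def max_eigenvalue_positive(a, tol=1e-10, max_iter=1000):
    """Estimate the maximum eigenvalue of a square matrix whose entries are all > 0.

    Assumed scheme (not taken from the paper): repeat B <- D^{-1} B D with
    D = diag(row sums of B). Similarity keeps the eigenvalues unchanged, and
    min(row sums) <= max eigenvalue <= max(row sums) holds at every step.
    """
    n = len(a)
    b = [row[:] for row in a]  # work on a copy so the input is left untouched
    for _ in range(max_iter):
        r = [sum(row) for row in b]  # row sums; each one is independent of the others
        if max(r) - min(r) <= tol * max(r):
            break  # the bounds have (relatively) met
        # Entry (i, j) of D^{-1} B D is b[i][j] * r[j] / r[i]; every entry can be
        # updated independently, which is the part a GPU kernel could parallelize.
        b = [[b[i][j] * r[j] / r[i] for j in range(n)] for i in range(n)]
    r = [sum(row) for row in b]
    return 0.5 * (min(r) + max(r))  # midpoint of the final lower and upper bounds


if __name__ == "__main__":
    print(max_eigenvalue_positive([[1.0, 2.0], [3.0, 4.0]]))
```

For the 2-by-2 example in the `__main__` guard, the exact maximum eigenvalue is (5 + sqrt(33)) / 2, which gives a quick sanity check. The row-sum reductions and the element-wise update are the natural candidates for the shared-memory and coalesced-access optimizations the abstract mentions, though how PA-ST maps them to CUDA threads is not stated there.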
In this paper, we address the problem of defining semantic indexing techniques based on RDF triples. In particular, we define algorithms for: i) defining clustering techniques of semantically similar RDF triplets; ii...
This paper introduces a heuristic-based scheduler to optimise the throughput and latency of stream programs with dynamic network structure. The novelty is the utilisation of positive and negative demands of the stream...
With a suitable method for ranking user influence in micro-blogging services, we could identify influential individuals who help information reach large populations. Here a novel parallel social influence model is proposed t...
The number of nodes inside supercomputers is continuously increasing. As detailed in the TOP500 list, there are now systems that include more than one million nodes; for instance China's Tianhe-2. To cope with this...
We present in this paper a security-driven solution for scheduling of N independent jobs on M parallel machines that minimizes three different objectives simultaneously, namely the failure probability, the total compl...
The programming model of parallel tasks is a suitable programming abstraction for parallel applications running on heterogeneous clusters, which are clusters composed of multiple subclusters. In this model, an applica...
With the development of microarray technology, it is now possible to study and measure the expression profiles of thousands of genes simultaneously, which can help identify subgroups of a specific disease or extract hi...
Graph500 is a benchmark suite for big data analysis. Matrices used for Graph500 inherit the properties of graph analysis workloads such as breadth-first search for SNS and PageRank for web search engines. Especially power sav...