Based on the principle of QR loop iterations, this paper implements a parallel algorithm based on the hardware of GPU (Graphic Process Unit) by using routines from CUDA (Computer Unified Device Architecture) to find t...
详细信息
ISBN:
(纸本)9781479925667
Based on the principle of QR loop iterations, this paper implements a parallel algorithm based on the hardware of GPU (Graphic Process Unit) by using routines from CUDA (Computer Unified Device Architecture) to find the eigenvalues of general matrices. CPU and GPU computing card form a client-server computing framework. Here, CPU can be regarded as a client and GPU card is considered as a computing server. For the experiment environment, this paper chooses a GPU card with the model of NVIDIA GeForce GTX460 as a server side and a CPU with the model of Intel Core i5-760 quad-core as a client side. Win7 64-bit is selected as the operating system. The parallel implementation consists of two parts:PA H and PA QR. PA H is a procedure that transforms a matrix A into the Hessenberg matrix B. PA QR is the actual parallel algorithm of the QR iterations that is imposed on the Hessenberg matrix B for finding eigenvalues of the general matrix A. The speedup ratio of the proposed algorithm is jarless when the number of iterations becomes greater. The experimental results show that the parallel implementation with CUDA on GPU only makes use of less running time than traditional sequential algorithms. The speedup ratio of PA H is between 1.79 and 7.81. The speedup ratio of PA QR is between 3.24 and 118.9. Especially, when the order of general matrix equals 8192, the amount of iterations becomes 10000. The speedup ratio of the PA-H and the PA QR can run up to 7.81 and 118.9 respectively.
Approximate string matching using the k-difference technique has been widely applied to many fields such as pattern recognition and computational biology. Data dependency exists in the traditional sequential algorithm...
详细信息
Approximate string matching using the k-difference technique has been widely applied to many fields such as pattern recognition and computational biology. Data dependency exists in the traditional sequential algorithm. Therefore, it is hard to design a parallel algorithm for approximate string matching with k differences. This paper presents a technique to eliminate data dependency. Based on this technique, this paper also presents a parallel algorithm which can calculate the elements in the same row of the edit distance matrix in parallel by eliminating data dependency. The algorithm has high parallelism, but requires synchronization. To validate the proposed algorithm, it is implemented on GPU and multiple-core CPUs. Moreover, the CUDA optimization techniques are also presented in the paper. Finally, experimental results show that, compared with the traditional sequential algorithm on CPU with twenty-four cores, the proposed parallel algorithm achieves speedup of 7-42 on GPU.
The Graph Isomorphism (GI) problem has been extensively studied due to its significant applications. The most effective class of GI algorithms, i.e., canonical labeling algorithms, are suitable for either graphs with ...
详细信息
The Graph Isomorphism (GI) problem has been extensively studied due to its significant applications. The most effective class of GI algorithms, i.e., canonical labeling algorithms, are suitable for either graphs with high randomness or symmetry, or graphs for which both of them are not strongly held. Also, the core operations of canonical labeling algorithms, i.e., individuation-refinement (IR) and certificate comparison, usually occupy more than 70% of the total running time. How to weaken the limitations of structures and improve the efficiency of these operations are challenges. In this paper, we present an efficient GI algorithm called PEACE, which is particularly suitable for graphs with high randomness or symmetry. We present a parallel implementation of our algorithm on GPUs. We design some new techniques and also use some existing methods to speed up calculations under CUDA. More importantly, these techniques can be applied to all IR-based GI algorithms. We evaluate the proposed algorithm on various graphs to make comprehensive comparison with currently the most efficient canonical labeling algorithms on CPUs. Experimental results show that PEACE is superior to other algorithms on graphs with high symmetry or many automorphisms, and up to 50% performance increase can be achieved in the best case. We also apply our parallel techniques on these algorithms, and compare the performance on CPU and multiple GPU devices. The results show that the techniques make all algorithms gain 15~55 speedup.
There are limitations in traditional anti-theft technologies. All the technologies have a common ground, that is: the devices are exposed to the space, also easily found and destroyed by adversaries. This paper demons...
详细信息
keyword search has become one of hot topics in the field of information retrieval. It can provide users a simple and friendly interface. But the efficiency of some existing keyword search algorithms is low and there a...
详细信息
Due to the nature of multi-radio multi-channel wireless sensor networks, such as the quality of service of the links, channel conflict etc., we investigated the problem of data query based on data cache and channel sw...
详细信息
Due to the nature of multi-radio multi-channel wireless sensor networks, such as the quality of service of the links, channel conflict etc., we investigated the problem of data query based on data cache and channel switch, and proved it to be an NP-complete problem. Firstly, we constructed a LP equation based on the data flow conservation and link-channel constraint etc. to formulate the problem, then designed a polynomial approximate algorithm. The algorithm used dynamic programming strategy to minimize the delay of unit data packet transmission from cache nodes to the query node, greedily chose a cache node with the smallest delay of unit data packet transmission, and collected the new covered data packets. Theoretical analysis and experimental results indicate that the proposed algorithm can reduce the communicate delay and improve the efficiency of query effectively.
Data aggregation is a fundamental and yet time-consuming task in WSNs, especially in high-density WSNs. Therefore, people have focused on the problem of minimum-latency data aggregation. The problem has been already p...
详细信息
Data aggregation is a fundamental and yet time-consuming task in WSNs, especially in high-density WSNs. Therefore, people have focused on the problem of minimum-latency data aggregation. The problem has been already proved that it is an NP-hard. This paper proposes a cluster-based data aggregation scheduling algorithm called MPMC in multi-channel and multi-power WSNs to minimize the data aggregation latency. The paper adopts the idea of that the low power is used for packet transmission in inner-cluster and high power is used for packet transmission between clusters. This paper analyzes the number of channel under different topologies that approaches a constant. In simulation experiments, MPMC compares with the best algorithm based on single channel and the best algorithm based on multi-channel. Simulation results show that the MPMC algorithm proposed in this paper achieves the minimum average latency.
A distributed multi-dimensional probabilistic Top-k (DMPT) query processing algorithm was proposed for sensor networks. DMPT exploited Skyline operator to get the Top-k results and reduced data transmissions and query...
详细信息
A distributed multi-dimensional probabilistic Top-k (DMPT) query processing algorithm was proposed for sensor networks. DMPT exploited Skyline operator to get the Top-k results and reduced data transmissions and query delay through feedback and filtering mechanism. Data uncertainty, multi-dimensional attributes, distributed network and energy limitation were considered in DMPT, and the Top-k results were attained by computing Skylayers of data. Experiments on real and simulated data are conducted to demonstrate that DMPT has better energy efficiency and faster response compared with traditional algorithm.
Solving the eigenvalues of matrices is an open problem which is often related to scientific computation. With the increasing of the order of matrices, traditional sequential algorithms are unable to meet the needs for...
详细信息
In the procedure of three-way handshake of transmission control protocol connection establishment, it involves the management of half-connection and connection tables. However, in the famous networks simulator NS2, th...
详细信息
In the procedure of three-way handshake of transmission control protocol connection establishment, it involves the management of half-connection and connection tables. However, in the famous networks simulator NS2, there is only a formal description instead of a concrete and complete implementation for the procedure of TCP connection establishment. An improvement was beed made on the point. Table structures for half-connections and connections are added, and the way of managing half-connection table in Linux kernel is also imported into NS2. Simulation results show that it is clear to monitor the change in half-connection table in the TCP connection establishment procedure, and thus the requirement of simulating the connection establishment procedure in studies such as protection from TCP SYN flooding attack can be met.
暂无评论