ISBN: 9781479927173 (print)
As FPGA devices grow in size, the long run-time of placement is becoming a major challenge for the FPGA design flow. Simulated annealing is the best-known method for this problem because of its good quality of result (QoR), but its computation time is unsatisfactory. In this paper, we propose a parallel placement algorithm named MPP-SA (Multi-core Parallel Placement algorithm based on Simulated Annealing). Our goal is to provide a fast placement algorithm with high QoR. MPP-SA uses the same annealing schedule as traditional simulated annealing, but it moves blocks concurrently using multiple threads that run on different cores of the same processor. To ensure correct results, MPP-SA also uses synchronization and locking mechanisms, which introduce some overhead. However, experimental results show that this overhead does not seriously affect the performance of our algorithm, especially for large circuits. Compared with the T_VPlace placement algorithm in VPR 5.0, MPP-SA decreases the run-time of five benchmark circuits of different sizes by an average of 32%-42% without losing QoR.
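The abstract does not give MPP-SA's data structures or locking scheme, so the following is only a minimal sketch of the general idea: several threads propose block swaps under a shared annealing schedule, with a lock protecting each evaluate-and-commit step and a join acting as a barrier before the temperature drops. The one-dimensional placement, the two-pin `nets`, the single coarse lock, and every function name here are assumptions made for illustration, not the authors' design.

```python
import math
import random
import threading


def accept(delta_cost, temperature, rng):
    """Metropolis rule: always accept improvements, sometimes accept worse moves."""
    if delta_cost <= 0:
        return True
    return rng.random() < math.exp(-delta_cost / temperature)


def wirelength(placement, nets):
    """Toy cost: placement[slot] = block; sum of slot distances over two-pin nets."""
    pos = {block: slot for slot, block in enumerate(placement)}
    return sum(abs(pos[a] - pos[b]) for a, b in nets)


def worker(placement, nets, temperature, moves, lock, seed):
    rng = random.Random(seed)  # per-thread generator (hypothetical seeding scheme)
    for _ in range(moves):
        i = rng.randrange(len(placement))
        j = rng.randrange(len(placement))
        with lock:  # assumed coarse lock: evaluate and commit one swap atomically
            before = wirelength(placement, nets)
            placement[i], placement[j] = placement[j], placement[i]
            delta = wirelength(placement, nets) - before
            if not accept(delta, temperature, rng):
                # Rejected move: undo the swap.
                placement[i], placement[j] = placement[j], placement[i]


def parallel_anneal(placement, nets, n_threads=4, t_start=5.0, t_end=0.05,
                    alpha=0.8, moves=50):
    lock = threading.Lock()
    temperature = t_start
    round_no = 0
    while temperature > t_end:
        threads = [
            threading.Thread(
                target=worker,
                args=(placement, nets, temperature, moves, lock,
                      round_no * n_threads + k),
            )
            for k in range(n_threads)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # barrier: all threads finish before the temperature is lowered
        temperature *= alpha
        round_no += 1
    return placement
```

A single lock serializes every move, so this sketch shows the structure rather than any speedup; a practical design would presumably lock only the regions a move touches. In CPython the global interpreter lock also prevents these threads from executing Python code truly in parallel.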
Precise integration methods for solving structural dynamic responses use a time integration formula composed of two parts: the multiplication of an exponential matrix with a vector, and an integration term. The second term can be solved by a series solution. Two hybrid-granularity parallel algorithms are designed: the exponential matrix and the first term are computed by a fine-grained parallel algorithm, and the second term is computed by a coarse-grained parallel algorithm. Numerical examples show that these two hybrid-granularity parallel algorithms obtain higher speedup and parallel efficiency than two existing parallel algorithms.
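For orientation, the exponential matrix in precise integration is commonly computed by splitting the step into 2^N sub-steps, taking a short Taylor series for the increment over one sub-step, and then doubling N times while keeping the identity separate to limit round-off. The sketch below is a serial, pure-Python version of that standard scheme, not the paper's parallel algorithm; the helper names and the default of 20 doublings are assumptions.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_scale(a, s):
    return [[s * x for x in row] for row in a]


def precise_exponential(h, tau, n_doublings=20):
    """Approximate exp(h * tau) by Taylor expansion on tau / 2**N plus N doublings."""
    n = len(h)
    dt = tau / 2 ** n_doublings
    hd = mat_scale(h, dt)
    hd2 = mat_mul(hd, hd)
    hd3 = mat_mul(hd2, hd)
    hd4 = mat_mul(hd2, hd2)
    # ta holds exp(h*dt) - I, truncated after the fourth-order term.
    ta = mat_add(
        mat_add(hd, mat_scale(hd2, 1.0 / 2.0)),
        mat_add(mat_scale(hd3, 1.0 / 6.0), mat_scale(hd4, 1.0 / 24.0)),
    )
    # (I + ta)^2 = I + (2*ta + ta*ta): double the step without adding I each time.
    for _ in range(n_doublings):
        ta = mat_add(mat_scale(ta, 2.0), mat_mul(ta, ta))
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return mat_add(identity, ta)
```

For example, with `h = [[0.0, 1.0], [-1.0, 0.0]]` and `tau = 1.0` the result should approximate the rotation matrix with entries cos(1) and sin(1). The matrix products inside the doubling loop are the kind of work a fine-grained parallel scheme would distribute.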
ISBN: 9781467311595 (print)
A parallel algorithm for computing the covariance matrix, which is used in the dimensionality reduction of hyperspectral images based on Principal Component Analysis (PCA) and Minimum Noise Fraction (MNF), is proposed in this paper. The performance of the parallel algorithm is discussed on the basis of experiments in a cluster environment using the Message Passing Interface (MPI). Gustafson's law and Amdahl's law, which are commonly used to analyze parallel algorithm results, are also discussed for this experiment. Finally, some areas and questions for further research are listed.
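As a reminder of what the two laws predict, here is a small sketch of both formulas. The function names and the example numbers are illustrative assumptions; the abstract reports no parallel fraction or processor count.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: fixed problem size, serial part limits the speedup."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def gustafson_speedup(parallel_fraction, workers):
    """Gustafson's law: problem size grows with the number of workers.

    Here parallel_fraction is measured on the parallel run.
    """
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * workers
```

With a parallel fraction of 0.9 on 10 workers, Amdahl's law gives a speedup of about 5.26, while Gustafson's scaled speedup is 9.1, which is why the two laws can lead to different conclusions about the same experiment.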
ISBN: 9780769546766 (print)
Change detection is an important technique in damage assessment. As the number of remote sensing images and the complexity of algorithms rise, the demand for processing power is increasing. In this paper, we propose PLog-FLICM, a parallel algorithm for change detection. It is implemented on the AMD Accelerated Parallel Processing (APP) SDK v2, based on the Open Computing Language. The parallel characteristics and implementation details of the proposed PLog-FLICM algorithm are presented. Experiments on several Synthetic Aperture Radar (SAR) images demonstrate that the proposed algorithm outperforms other algorithms, and that the parallel design can greatly reduce the computational time of change detection. It achieves speedups of between 63 and 145 times on an AMD Radeon HD 6870 Graphics Processing Unit (GPU).
ISBN: 9783642314643 (digital)
ISBN: 9783642314636; 9783642314643 (print)
Tackling the current volume of graph-structured data requires parallel tools. We extend our work on analyzing such massive graph data with the first massively parallel algorithm for community detection that scales to current data sizes, scaling to graphs of over 122 million vertices and nearly 2 billion edges in under 7300 seconds on a massively multithreaded Cray XMT. Our algorithm achieves moderate parallel scalability without sacrificing sequential operational complexity. Community detection partitions a graph into subgraphs more densely connected within the subgraph than to the rest of the graph. We take an agglomerative approach similar to Clauset, Newman, and Moore's sequential algorithm, merging pairs of connected intermediate subgraphs to optimize different graph properties. Working in parallel opens new approaches to high performance. On smaller data sets, we find the output's modularity compares well with the standard sequential algorithms.
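Modularity, the quality measure mentioned at the end of the abstract, can be computed per community as the fraction of edges inside it minus the squared fraction of edge endpoints attached to it. The sketch below assumes an undirected, unweighted, non-empty edge list and a hypothetical `community` mapping from vertex to label; it is a plain serial calculation, not the paper's Cray XMT implementation.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-style modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)  # assumed non-empty
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # total degree of the community's vertices
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )
```

For two triangles joined by a single edge, with each triangle as its own community, this gives 5/14; putting every vertex in one community gives 0.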
A significant number of recent applications require the numerical solution of large systems of Abel-Volterra integral equations. Here we propose a parallel algorithm to numerically solve a class of these systems, designed for a distributed-memory MIMD architecture. To achieve good efficiency, we employ a fully parallel and fast-converging waveform relaxation (WR) method and evaluate the lag term using FFT techniques. To accelerate the convergence of the WR method and to best exploit the parallel architecture, we develop special strategies. The performance of the resulting code, NSWR4, is illustrated on some examples. (c) 2008 Elsevier B.V. All rights reserved.
Parallel algorithms for accurate summation and dot product are proposed. They are parallelized versions of the fast and accurate algorithms for calculating sums and dot products using error-free transformations that were recently proposed by Ogita et al. [T. Ogita, S.M. Rump, S. Oishi, Accurate sum and dot product, SIAM J. Sci. Comput. 26 (6) (2005) 1955-1988]. Ogita et al. have shown that their algorithms are fast in terms of measured computing time. However, because of the strong data dependence within those algorithms, they are difficult to parallelize. Like the original algorithms, the parallel algorithms proposed in this paper are designed to achieve results as if computed in K-fold working precision while retaining the speed of the originals. Numerical results are presented showing the performance of the proposed parallel algorithm for calculating the dot product. (C) 2008 Elsevier B.V. All rights reserved.
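To make "error-free transformation" concrete: the TwoSum transformation returns the floating-point sum of two numbers together with the exact rounding error, and a compensated summation accumulates those errors separately. The sketch below shows TwoSum, a serial compensated sum in the style of Ogita et al.'s twofold-precision summation, and one simple way to split the work into independent chunks. The chunked combination is an assumption for illustration only and is not claimed to be the algorithm proposed in the paper.

```python
def two_sum(a, b):
    """Error-free transformation: x is fl(a + b) and x + y equals a + b exactly."""
    x = a + b
    z = x - a
    y = (a - (x - z)) + (b - z)
    return x, y


def sum2(values):
    """Compensated summation: return the running sum and the accumulated error."""
    s = 0.0
    e = 0.0
    for v in values:
        s, err = two_sum(s, v)
        e += err
    return s, e


def chunked_sum2(values, n_chunks):
    """Sum each chunk independently, then combine the partial sums with two_sum."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    # Each chunk below could be handled by a separate worker.
    partials = [sum2(values[i:i + size]) for i in range(0, len(values), size)]
    s = 0.0
    e = 0.0
    for partial_sum, partial_err in partials:
        s, err = two_sum(s, partial_sum)
        e += err + partial_err
    return s + e
```

For the input `[1e16, 1.0, -1e16]`, plain left-to-right floating-point addition returns 0.0 because the 1.0 is lost to rounding, whereas `chunked_sum2` with two chunks recovers 1.0.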
ISBN: 9780769548524; 9781467330930 (print)
Parallel computing models play a fundamental role in advanced computing. Based on a study of existing parallel computing models, this paper puts forward a parallel computing model, Layer Forward Net, for the Cubic-R architecture, and describes the model's structure, parameters, and logic abstractly. Finally, for the typical N-body problem, this paper designs a parallel algorithm and analyzes its complexity. The comparison shows that the model has low computational complexity, grows layer by layer, and has other merits.
ISBN: 9781467345651 (print)
Efficient mapping of a real-time HD video application to graphics hardware is challenging. Developers face the challenges of choosing the right parallelism model, balancing per-thread processing granularity across the massive computing resources on the GPU, and partitioning tasks between the CPU and GPU. This paper illustrates the mapping approaches through a case study of an HD H.264 encoder based on the X264 reference code, and then evaluates it in depth on state-of-the-art CPUs and GPUs. We first split most of the computing tasks into Single-Instruction Multiple-Thread (SIMT) kernels, which are then chained into a certain input/output data stream. We then implement a complete H.264 encoder on the Compute Unified Device Architecture (CUDA) platform. Finally, we present methods for exploiting multi-level parallelism and memory efficiency when mapping H.264 code, which we use to increase the efficiency of execution on GPUs. Our experimental results show that efficient GPU computation and real-time encoding performance are achieved with CUDA.
ISBN: 9781450312035 (print)
Monochromatic-square-free grid coloring is a challenging computational problem with connections to Ramsey-theoretic combinatorics and multiparty communication complexity for which no polynomial time algorithm is known. In this paper, we report on a parallel search for exact grid coloring solutions and its implementation on a large-scale cluster computer. We obtain the first-known 2-color solution for the 14 X 14 grid.
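Whatever search strategy is used, a candidate coloring has to be verified. The sketch below checks a grid for a monochromatic square, assuming that "square" means four grid points forming an axis-aligned square whose corners all share one color; that reading, the list-of-rows representation, and the function name are assumptions, and the parallel search itself is not shown.

```python
def has_monochromatic_square(grid):
    """Return True if some axis-aligned square has four corners of the same color."""
    rows = len(grid)
    cols = len(grid[0])
    for i in range(rows):
        for j in range(cols):
            color = grid[i][j]
            # k is the side length; the square must stay inside the grid.
            for k in range(1, min(rows - i, cols - j)):
                if (grid[i][j + k] == color
                        and grid[i + k][j] == color
                        and grid[i + k][j + k] == color):
                    return True
    return False
```

A 2 x 2 checkerboard passes this check, but a 3 x 3 checkerboard does not, because its four outer corners share a color. The check costs on the order of n^3 corner comparisons for an n x n grid, so verification is cheap compared with the search.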