Flow direction algorithms based on gridded DEMs are among the most widely used algorithms in digital terrain analysis. Being a typical recursive algorithm, the flow direction algorithm coded traditionally for sequentia...
The application of word sense disambiguation (WSD) methods based on supervised machine learning is limited by the difficulties in defining sense tags and acquiring labeled data for training. In this paper, the two pr...
This paper investigates the use of Graphics Processing Units (GPUs) as general-purpose parallel architectures for accelerating the solution of the Economic Dispatch (ED) problem via stochastic search algorithm...
ISBN: 9783642233975 (print)
We present new multithreaded vertex ordering and distance-k graph coloring algorithms that are well-suited for multicore platforms. The vertex ordering techniques rely on various notions of "degree", are known to be effective in reducing the number of colors used by a greedy coloring algorithm, and are generic enough to be applicable to contexts other than coloring. We employ approximate degree computation in the ordering algorithms, and speculation and iteration in the coloring algorithms, as our primary tools for breaking sequentiality and achieving effective parallelization. The algorithms have been implemented using OpenMP, and experiments conducted on Intel Nehalem and other multicore machines using various types of graphs attest that the algorithms provide scalable runtime performance. The number of colors the algorithms use is often close to optimal. The techniques used for computing the ordering and coloring in parallel are applicable to other problems where there is an inherent ordering to the computations that needs to be relaxed to increase concurrency.
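The "speculation and iteration" idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' OpenMP implementation: it is a sequential Python simulation of distance-1 coloring only, and the function name `speculative_coloring`, the adjacency-list input, and the rule of re-queuing the higher-numbered endpoint of a conflicting edge are all assumptions made for illustration. Each round colors every pending vertex against a stale snapshot (standing in for threads that cannot see each other's writes), then detects conflicts and retries only the losers.

```python
def speculative_coloring(adj):
    """Simulate speculative, iterative greedy coloring.

    adj: adjacency list of an undirected graph, adj[v] = list of neighbors.
    Returns (color, rounds), where color[v] is a non-negative integer.
    """
    n = len(adj)
    color = [-1] * n            # -1 means "not yet colored"
    pending = list(range(n))    # vertices still to be (re)colored
    rounds = 0
    while pending:
        rounds += 1
        # Speculative phase: every pending vertex picks the smallest color
        # not seen in a snapshot taken at the start of the round, so
        # vertices colored in the same round cannot see each other.
        snapshot = color[:]
        for v in pending:
            forbidden = {snapshot[w] for w in adj[v] if snapshot[w] != -1}
            c = 0
            while c in forbidden:
                c += 1
            color[v] = c
        # Conflict-detection phase: for each edge whose endpoints share a
        # color, re-queue the higher-numbered endpoint (hypothetical
        # tie-breaking rule). The lowest pending vertex is never re-queued,
        # so the pending set shrinks every round.
        pending = [
            v for v in pending
            if any(w < v and color[w] == color[v] for w in adj[v])
        ]
    return color, rounds


if __name__ == "__main__":
    # A 5-cycle: 0-1-2-3-4-0
    cycle = [[1, 4], [0, 2], [1, 3], [2, 4], [3, 0]]
    colors, rounds = speculative_coloring(cycle)
    print(colors, rounds)
```

In a real multithreaded setting the snapshot is implicit (threads race on a shared color array), and conflicts are usually rare, so few rounds are needed; the all-stale snapshot used here is a worst case chosen to keep the simulation deterministic.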
ISBN: 9783642202810 (print)
This paper reports on methods for parallelizing artificial neural network algorithms on multithreaded and multicore CPUs in order to speed up the training process. The developed algorithms were implemented in two common parallel programming paradigms, and their performance is assessed using four datasets with diverse numbers of patterns and with different neural network architectures. All results show a significant increase in computation speed, with computation time reduced nearly linearly with the number of cores for problems with very large training datasets.
Computing up to 100 GOPS without cooling is essential for high-end embedded systems and is much required by markets. A novel master-slave multi-SIMD architecture and its kernel (template) based parallel programming flow is...
We present a high-performance tsunami-prediction system using General-Purpose Graphics Processing Units (GPGPU). It is based on TUNAMI-N1, a numerical analysis model for the investigation of near-field tsunamis. It uses l...
ISBN: 9781457716171 (print)
This paper presents a novel highly parallel decoder architecture for the quasi-cyclic low-density parity-check (QC-LDPC) codes defined in the WiMAX system. Based on the turbo-decoding message passing (TDMP) algorithm, this architecture requires 8 to 16 clock cycles for each iteration of the decoding process. In a normalized comparison with the state-of-the-art work, this design achieves up to 6.5x higher parallelism and a 76% power reduction. The energy per bit per iteration of this design is only 1/5 that of the previous work.
Commercial off-the-shelf (COTS) graphics processing units (GPUs) perform the signal processing operations needed for video games and similar consumer applications. The high volume and competitive nature of that industry have produced inexpensive GPUs with impressive amounts of signal processing power. These devices use parallel processing architectures to execute DSP algorithms far faster than the single-core, or even multi-core, central processing units typically found in workstations. This paper describes a project that improves the performance of a radar telemetry application using an NVIDIA(TM) GPU and CUDA(TM) software, although the results could be extended to other devices.
ISBN: 9781450306980 (print)
Molecular dynamics (MD) simulation has broad applications, but its irregular memory-access pattern makes performance optimization a challenge. This paper presents a joint application/architecture study to enhance on-chip parallelism of MD on a Godson-T-like many-core architecture. First, a preprocessing step leveraging an adaptive divide-and-conquer framework is designed to exploit locality through a memory hierarchy with software-controlled memory. Then we propose three incremental optimization strategies: (1) a novel data layout that re-organizes linked-list cell data structures to improve data locality; (2) an on-chip locality-aware parallel algorithm to enhance data reuse; and (3) a pipelining algorithm to hide latency to shared memory. Experiments on a Godson-T simulator exhibit a strong-scaling parallel efficiency of 0.99 on 64 cores, which is confirmed by an FPGA emulator. Detailed analysis shows that optimizations utilizing architectural features to maximize data locality and to enhance data reuse benefit scalability most. Furthermore, a simple performance model suggests that the optimization scheme is likely to scale well toward exascale. Certain architectural features are found essential for these optimizations, which could guide future hardware developments.
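For readers unfamiliar with the linked-list cell structure mentioned in strategy (1), the following is a generic sketch of it, together with one common way of making the same data contiguous in memory. It is not the layout proposed in the paper, whose details the abstract does not give. The function names, the cubic box with side `box`, and the choice of sorting particles by cell index are assumptions for illustration only.

```python
def build_cell_list(positions, box, cutoff):
    """Classic linked-list cell structure for a cubic box of side `box`.

    head[c] is the index of the last particle inserted into cell c
    (or -1 if the cell is empty); nxt[i] is the next particle in the same
    cell as i (or -1 at the end of the chain).
    """
    ncell = max(1, int(box // cutoff))   # cells per dimension
    size = box / ncell                   # cell edge length (>= cutoff)
    head = [-1] * (ncell ** 3)
    nxt = [-1] * len(positions)
    cell_of = []
    for i, (x, y, z) in enumerate(positions):
        # Clamp so a particle sitting exactly on the upper wall stays inside.
        cx = min(int(x / size), ncell - 1)
        cy = min(int(y / size), ncell - 1)
        cz = min(int(z / size), ncell - 1)
        c = (cx * ncell + cy) * ncell + cz
        cell_of.append(c)
        nxt[i] = head[c]                 # push particle i onto cell c's chain
        head[c] = i
    return head, nxt, cell_of


def cell_members(head, nxt, c):
    """Follow the chain of cell c; consecutive hops may be far apart in memory."""
    members = []
    i = head[c]
    while i != -1:
        members.append(i)
        i = nxt[i]
    return members


def contiguous_layout(positions, cell_of, ncells):
    """Hypothetical locality-friendly layout: particles sorted by cell index.

    Returns (packed, start, order): packed[start[c]:start[c + 1]] holds the
    positions of the particles in cell c, and order[k] is the original index
    of packed[k].
    """
    order = sorted(range(len(positions)), key=lambda i: cell_of[i])
    packed = [positions[i] for i in order]
    start = [0] * (ncells + 1)
    for c in cell_of:
        start[c + 1] += 1                # count particles per cell
    for c in range(ncells):
        start[c + 1] += start[c]         # prefix sum gives slice boundaries
    return packed, start, order
```

With the linked list, visiting a cell's particles chases indices scattered across the position array; with the sorted layout, the same particles occupy one contiguous slice, which is the kind of locality improvement the abstract alludes to.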