This paper investigates the use of Graphics Processing Units (GPUs) as general-purpose parallel architectures for accelerating the solution of the Economic Dispatch (ED) problem via stochastic search algorithm...
ISBN: 9783642217289 (print)
Parallel Computing Systems (PCS) are currently widely used in many applications involving complex, computation-intensive problems, because they can process computations efficiently using a parallel scheme. The ARS cluster is a low-cost PCS developed to process full-field digital mammograms. In this system, eight processors communicate over an Ethernet network, using Linux (Fedora 7) as the operating system and Matlab Distributed Computing Server (MDCS) as the platform for processing the digital mammograms. In this paper, the Wavelet Transform Modulus Maxima (WTMM) method, implemented on the ARS cluster, is used to detect tumor edges in digital mammograms. The study involved 80 digitized mammographic images obtained from the Malaysian National Cancer Center (NCC). The performance of the PCS in detecting tumor edges in digital mammograms using WTMM on the ARS cluster is reported. The experimental results showed that the speedup of the PCS improves as the number of processors increases.
We present and evaluate a set of architectures for conversational dialogue systems, exploring rule-based and statistical classification approaches. In a case study, we show that while a rule-based dialogue policy is c...
We introduce a parallel corpus of spoken Cantonese and written Chinese. This sentence-aligned corpus consists of transcriptions of Cantonese spoken in television programs in Hong Kong, and their corresponding Chinese ...
We present a method for developing dense linear algebra algorithms that scale seamlessly to thousands of cores. This is made possible by our project, DPLASMA (Distributed PLASMA), which uses a novel generic distribute...
Commercial off-the-shelf (COTS) graphics processing units (GPUs) perform the signal processing operations needed for video games and similar consumer applications. The high volume and competitive nature of that industry have produced inexpensive GPUs with impressive amounts of signal processing power. These devices use parallel processing architectures to execute DSP algorithms far faster than the single-core, or even multi-core, central processing units typically found in workstations. This paper describes a project that improves the performance of a radar telemetry application using an NVIDIA™ GPU and CUDA™ software, although the results could be extended to other devices.
An approach to executing an improved soft morphological filter (ISMF) on a Graphics Processing Unit (GPU) is presented in this paper. ISMF, regarded as an extension of the standard morphological operators, performs well in re...
ISBN: 3642131352 (print)
The proceedings contain 95 papers. The topics discussed include: scalable co-clustering algorithms; parallel prefix computation in the recursive dual-net; a two-phase differential synchronization algorithm for remote files; query optimization over parallel relational data warehouses in distributed environments by simultaneous fragmentation and allocation; a high efficient on-chip interconnection network in SIMD CMPs; design of a slot assignment scheme for link error distribution on wireless grid networks; dynamic resource tuning for flexible core chip multiprocessors; a grid based system for closure computation and online service; parallel domain decomposition methods for high-order finite element solutions of the Helmholtz problem; frequencies; quick forwarding of queries to relevant peers in a hierarchical P2P file search system; cluster-fault-tolerant routing in burnt pancake graphs; and edge-bipancyclicity of all conditionally faulty hypercubes.
ISBN: 3642131182 (print)
The proceedings contain 95 papers. The topics discussed include: scalable co-clustering algorithms; parallel prefix computation in the recursive dual-net; a two-phase differential synchronization algorithm for remote files; query optimization over parallel relational data warehouses in distributed environments by simultaneous fragmentation and allocation; a high efficient on-chip interconnection network in SIMD CMPs; design of a slot assignment scheme for link error distribution on wireless grid networks; dynamic resource tuning for flexible core chip multiprocessors; a grid based system for closure computation and online service; parallel domain decomposition methods for high-order finite element solutions of the Helmholtz problem; frequencies; quick forwarding of queries to relevant peers in a hierarchical P2P file search system; cluster-fault-tolerant routing in burnt pancake graphs; and edge-bipancyclicity of all conditionally faulty hypercubes.
ISBN: 9781457716171 (print)
This paper presents a novel, highly parallel decoder architecture for the quasi-cyclic low-density parity-check (QC-LDPC) codes defined in the WiMAX system. Based on the turbo-decoding message passing (TDMP) algorithm, this architecture takes 8 to 16 clock cycles per iteration of the decoding process. In a normalized comparison with the state-of-the-art work, this design achieves up to 6.5x higher parallelism and a 76% power reduction. The energy per bit per iteration of this design is only 1/5 that of the previous work.