In this paper we used a generalized net that allows parallel optimization of multilayer neural networks. For training, the backpropagation algorithm with momentum was considered. We proposed a general...
ISBN: 9783642131189 (print)
In the drug discovery field, solving the problem of virtual screening is a long-term goal. The scoring function, which evaluates the fitness of the docking result, is one of the major challenges in virtual screening. In general, scoring in docking requires a large number of floating-point calculations and usually takes several weeks or even months to finish. This time cost is unacceptable, especially when a highly fatal and infectious virus such as SARS or H1N1 emerges. This paper presents how to leverage the computational power of the GPU to accelerate the Amber [2] scoring of Dock6 [1] with the NVIDIA CUDA [3] platform. We also discuss several factors that greatly influence performance after porting the Amber scoring to the GPU, including thread management, data transfer, and divergence hiding. Our GPU implementation shows a 6.5x speedup over the original version running on an AMD dual-core CPU for the same problem size.
In this paper we study algorithms for performing the LU and QR factorizations of dense matrices. Recently, two communication optimal algorithms have been introduced for distributed memory architectures, referred to as...
ISBN: 9783642144028 (print)
The availability of real parallelism in multi-core based architectures has revived interest in concurrent computing in general, and parallel computing in particular. New languages and libraries have recently been proposed to increase productivity in the context of these architectures. In this paper we present a novel approach that resorts to the service abstraction for annotating parallelism.
ISBN: 9780791849026 (print)
This paper presents a GPU-based parallel Population Based Incremental Learning (PBIL) algorithm with a local search for bound-constrained optimization problems. PBIL, which was derived from genetic algorithms, evolves the genotype of an entire population. Graphics Processing Units (GPUs) are an emerging technology for desktop parallel computing. In this research, the classical PBIL is adapted to the data-parallel GPU computing platform. The global search of PBIL is enhanced by a local Pattern Search method. The hybrid PBIL method is implemented in the GPU environment and compared to a similar implementation on a conventional Central Processing Unit (CPU). Computational results indicate that the GPU-accelerated PBIL method is effective and faster than the corresponding CPU implementation.
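To make the idea of evolving the genotype of an entire population concrete, the following is a minimal CPU-only sketch of classical binary PBIL. It is not the paper's hybrid GPU method: the OneMax objective (count of 1 bits), the function name `pbil_onemax`, and all parameter values are assumptions chosen for illustration, and the local Pattern Search step is omitted.

```python
import random


def pbil_onemax(n_bits=16, pop_size=30, generations=50,
                learning_rate=0.1, seed=0):
    """Classical binary PBIL on the OneMax toy problem (illustrative only)."""
    rng = random.Random(seed)
    # The "genotype of the population" is a single probability vector:
    # prob[i] is the probability that bit i is sampled as 1.
    prob = [0.5] * n_bits
    best, best_fit = None, -1
    for _ in range(generations):
        # Sample a whole population from the probability vector.
        population = [
            [1 if rng.random() < p else 0 for p in prob]
            for _ in range(pop_size)
        ]
        # OneMax fitness is the number of 1 bits, so sum() is the objective.
        elite = max(population, key=sum)
        if sum(elite) > best_fit:
            best, best_fit = elite, sum(elite)
        # Shift the probability vector toward the best sample of this round.
        prob = [
            (1.0 - learning_rate) * p + learning_rate * bit
            for p, bit in zip(prob, elite)
        ]
    return prob, best, best_fit


if __name__ == "__main__":
    prob, best, best_fit = pbil_onemax()
    print("best fitness:", best_fit)
    print("probability vector:", [round(p, 2) for p in prob])
```

Sampling and evaluating the individuals are independent per individual, which is the part a data-parallel implementation would typically distribute across threads.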
ISBN: 9783642131356 (print)
Enumerating chemical compounds with given path frequencies is a fundamental procedure in chemo- and bio-informatics. Applications include structure determination, novel molecular development, etc. The problem has been proven to be NP-hard. Many methods have been proposed to solve it; however, most of them are heuristic algorithms. Fujiwara et al. proposed a sequential branch-and-bound algorithm. Although it reaches all solutions and avoids exhaustive searching, the computation time still increases significantly as the number of atoms increases. Hence, in this paper, a parallel algorithm is presented for solving this problem. The experimental results showed that computation time was reduced even when more processes were launched. Moreover, the speed-up ratio for most of the test cases was satisfactory, and the approach showed potential for use in drug design.
ISBN: 9783642131189 (print)
Multimedia applications are among the most dominant computing workloads driving innovations in high-performance and cost-effective systems. In this regard, modern general-purpose microprocessors have included multimedia extensions (e.g., MMX, SSE, VIS, MAX, ALTIVEC) in their instruction set architectures to improve multimedia performance at little added cost. Whereas prior studies of multimedia extensions have primarily focused on a single processor, this paper quantitatively evaluates the impact of multimedia extensions on system performance and efficiency for different numbers of processing elements (PEs) within an integrated multiprocessor array. This paper also identifies the optimal PE granularity for the array system and implementation technology in terms of throughput, area efficiency, and energy efficiency using architectural and workload simulation. Experimental results with cycle-accurate simulation and technology modeling show that MMX-type instructions (representative of Intel's multimedia extensions) achieve an average speedup ranging from 1.24x (for a 65,536-PE system) to 5.65x (for a 4-PE system) over the baseline performance. In addition, the MMX-enhanced processor array increases both area and energy efficiency over the baseline for all configurations and programs. Moreover, the highest area and energy efficiency are achieved with between 256 and 1,024 PEs. These evaluation techniques, composed of performance simulation and technology modeling, can provide solutions to the design challenges in a new class of multiprocessor array systems for multimedia.
ISBN: 9783642143892 (print)
We investigate the performance of the routines in LAPACK and the Successive Band Reduction (SBR) toolbox for the reduction of a dense matrix to tridiagonal form, a crucial preprocessing stage in the solution of the symmetric eigenvalue problem, on general-purpose multicore processors. In response to the advances of hardware accelerators, we also modify the code in SBR to accelerate the computation by off-loading a significant part of the operations to a graphics processor (GPU). Performance results illustrate the parallelism and scalability of these algorithms on current high-performance multi-core architectures.
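As a rough illustration of what reducing a dense symmetric matrix to tridiagonal form means, here is a textbook, unblocked Householder reduction in pure Python. It is only a sketch: it is not the blocked LAPACK routine, not the two-stage SBR approach, and not the GPU off-loading described above, and the helper names `matmul` and `householder_tridiagonalize` and the sample matrix are assumptions made for this example.

```python
import math


def matmul(x, y):
    """Dense matrix product of two square lists-of-lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def householder_tridiagonalize(a):
    """Return T = Q^T A Q, tridiagonal, for a symmetric matrix A."""
    n = len(a)
    t = [[float(v) for v in row] for row in a]
    for k in range(n - 2):
        # Part of column k below the diagonal; all but its first entry
        # must be zeroed.
        x = [t[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(xi * xi for xi in x))
        if norm_x == 0.0:
            continue  # column is already in the required form
        # Choose the sign that avoids cancellation in v[0].
        alpha = -norm_x if x[0] >= 0 else norm_x
        v = x[:]
        v[0] -= alpha
        norm_v = math.sqrt(sum(vi * vi for vi in v))
        v = [vi / norm_v for vi in v]
        # Reflector H = I - 2 v v^T, embedded in the trailing block.
        h = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                h[i][j] -= 2.0 * v[i - k - 1] * v[j - k - 1]
        # H is symmetric and orthogonal, so H T H is a similarity transform.
        t = matmul(matmul(h, t), h)
    return t


if __name__ == "__main__":
    sample = [[4, 1, -2, 2],
              [1, 2, 0, 1],
              [-2, 0, 3, -2],
              [2, 1, -2, -1]]
    for row in householder_tridiagonalize(sample):
        print([round(v, 4) for v in row])
```

Forming each reflector explicitly and multiplying full matrices keeps the sketch short but costs O(n^4) overall; production routines instead apply the reflectors as rank-2 updates and block them so that most of the work becomes matrix-matrix operations.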
ISBN: 9781424448937 (print)
Natural human-robot interaction requires different and more robust models of language understanding (NLU) than non-embodied NLU systems. In particular, architectures are required that (1) process language incrementally in order to be able to provide early backchannel feedback to human speakers; (2) use pragmatic contexts throughout the understanding process to infer missing information; and (3) handle the underspecified, fragmentary, or otherwise ungrammatical utterances that are common in spontaneous speech. In this paper, we describe our attempts at developing an integrated natural language understanding architecture for HRI, and demonstrate its novel capabilities using challenging data collected in human-human interaction experiments.
Scientific applications usually exhibit irregular patterns of execution and high resource usage. Parallel architectures are a feasible solution to these drawbacks, but porting software to parallel platforms means...