ISBN (print): 9781467351652; 9780769549149
The execution of parallel applications using grid computing requires an environment that enables them to be executed, managed, scheduled and monitored. The execution environment must provide a processing model, consisting of programming and execution models, with the objective of appropriately exploiting grid computing characteristics. This paper proposes a parallel processing model based on shared variables for grid computing, consisting of an execution model that is appropriate for the grid and a CPAR parallel language programming model. The environment is designed to execute parallel applications in grid computing, where all the characteristics present in grid computing are transparent to users. The results show that this environment is an efficient solution for the execution of parallel applications.
ISBN (print): 9783319654829; 9783319654812
Modern computer systems rely heavily on parallel processing, not only because of the multicore CPUs bundled with any machine, even mobile devices, but increasingly thanks to the parallel processing capacities of graphics processing units (GPUs), general-purpose computing on graphics processing units (GPGPU) being one example. In this paper, relying on the DirectX 12 framework, we propose an innovative approach to enable parallel processing for graphical rendering on both the CPU and GPU for the popular Racket functional programming language (formerly PLT Scheme), importantly without compromising Racket's usability and programmer-friendliness. Our performance evaluations show significant improvements with respect to execution time (3x speed-up in some cases), CPU utilisation time (reduced by as much as 80% in some scenarios) and the frame rate when using moving graphics.
ISBN (digital): 9783642281518
ISBN (print): 9783642281501
The two-volume set LNCS 7133 and LNCS 7134 constitutes the thoroughly refereed post-conference proceedings of the 10th International Conference on Applied Parallel and Scientific Computing, PARA 2010, held in Reykjavík, Iceland, in June 2010. These volumes contain three keynote lectures, 29 revised papers and 45 minisymposia presentations arranged on the following topics: cloud computing, HPC algorithms, HPC programming tools, HPC in meteorology, parallel numerical algorithms, parallel computing in physics, scientific computing tools, HPC software engineering, simulations of atomic scale systems, tools and environments for accelerator based computational biomedicine, GPU computing, high performance computing interval methods, real-time access and processing of large data sets, linear algebra algorithms and software for multicore and hybrid architectures in honor of Fred Gustavson on his 75th birthday, memory and multicore issues in scientific computing - theory and praxis, multicore algorithms and implementations for application problems, fast PDE solvers and a posteriori error estimates, and scalable tools for high performance computing.
ISBN (print): 9783642246685
The latest advancements in computer graphics architectures, such as the replacement of some fixed stages of the pipeline with programmable stages (shaders), have enabled the development of parallel general-purpose applications on massively parallel graphics architectures (streaming processors). For years the graphics processing unit (GPU) has been optimized for increasingly high throughput of massively parallel floating-point computations. However, only applications that exhibit data-level parallelism can achieve substantial acceleration on such architectures. In this paper we present a parallel implementation of the GridRT architecture for GPGPU ray tracing. This architecture can expose two levels of parallelism in ray tracing: parallel ray processing and parallel intersection tests. We also present a traditional parallel implementation of ray tracing in GPGPU, for comparison against the GridRT-GPGPU implementation.
ISBN (print): 9781479923410
2D image convolution is ubiquitous in image processing and computer vision problems such as feature extraction. Exploiting parallelism is a common strategy for accelerating convolution. Parallel processors keep getting faster, but algorithms such as image convolution remain memory-bound on parallel processors such as GPUs. Therefore, reducing memory communication is fundamental to accelerating image convolution. To reduce memory communication, we reorganize the convolution algorithm to prefetch image regions into registers, and we do more work per thread with fewer threads. To enable portability to future architectures, we implement a convolution autotuner that sweeps the design space of memory layouts and loop unrolling configurations. We focus on convolution with small filters (2x2 to 7x7), but our techniques can be extended to larger filter sizes. Depending on filter size, our speedups on two NVIDIA architectures range from 1.2x to 4.5x over state-of-the-art GPU libraries.
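The idea of loading an image region once and computing several outputs from it can be illustrated with a minimal, sequential Python sketch. This is not the authors' GPU implementation: the function names, the `tile` parameter and the use of a local list as a stand-in for registers are all assumptions made for illustration.

```python
def convolve2d_valid(image, kernel):
    """Reference direct 2D convolution ('valid' region, kernel flipped)."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[y + i][x + j] * kernel[kh - 1 - i][kw - 1 - j]
            out[y][x] = acc
    return out


def convolve2d_tiled(image, kernel, tile=2):
    """Same result, but each outer iteration (a hypothetical 'thread')
    copies one input region into a local buffer and then produces up to
    tile x tile outputs from that buffer."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for ty in range(0, oh, tile):
        for tx in range(0, ow, tile):
            th = min(tile, oh - ty)  # partial tiles at the image edge
            tw = min(tile, ow - tx)
            # "Prefetch": one read of the (th+kh-1) x (tw+kw-1) input region.
            region = [row[tx:tx + tw + kw - 1]
                      for row in image[ty:ty + th + kh - 1]]
            for y in range(th):
                for x in range(tw):
                    acc = 0.0
                    for i in range(kh):
                        for j in range(kw):
                            acc += region[y + i][x + j] * kernel[kh - 1 - i][kw - 1 - j]
                    out[ty + y][tx + x] = acc
    return out
```

With a 3x3 filter and `tile=2`, each region holds 16 input values and serves 4 outputs, whereas computing the same 4 outputs independently would read 36 values. That reuse is the effect the sketch is meant to show; it does not model registers, memory layouts or the autotuner.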
ISBN (print): 078037889X
To accelerate the execution of most DSP (digital signal processing) algorithms, such as FFT, FIR and vector operations, while keeping the flexibility of the chip, a reconfigurable architecture for DSP (named ReDAr) is proposed and implemented; it will ultimately be applied to the radar system of Automatic Navigation Equipment. By analyzing these algorithms, the structure of the Reconfigurable Processing Element (RPE), the crossbar interconnect network, the memory organization, the host controlling strategy and the data sequencing scheme of the architecture are conceived, and parts of them, including the RPE, crossbar and data sequencer, are reconfigurable. After configuration, the architecture can be interconnected into a parallel and pipelined framework that closely matches the algorithms and behaves like dedicated hardware. In simulation, the performance of these algorithms mapped onto this architecture is comparable to that of algorithm-specific chips on the market and satisfies the requirements of the targeted application.
ISBN (print): 9783642246685
Proteins are among the most vital macromolecules at the cellular level. In order to understand the function of a protein, its structure needs to be determined. For this purpose, different computational approaches have been introduced. Genetic algorithms can be used to search the vast space of all possible conformations of a protein in order to find its native structure. A framework for the design of such algorithms that is generic, easy to use and fast on distributed systems may help further development of genetic-algorithm-based approaches. We propose such a framework based on a parallel master-slave model, implemented in C++ and the Message Passing Interface (MPI). We evaluated its performance on distributed systems with different numbers of processors and achieved a linear acceleration in proportion to the number of processing units.
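The master-slave pattern described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the framework from the paper: a thread pool stands in for MPI worker processes, `toy_energy` is a hypothetical placeholder for a real conformation energy function, and every name and parameter here is invented for the example.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def toy_energy(conformation):
    # Hypothetical stand-in for a protein energy function (lower is better).
    return sum(x * x for x in conformation)


def evolve(pop_size=20, genome_len=8, generations=15, workers=4, seed=0):
    """Master loop: farm out fitness evaluation, then select and breed."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    history = []  # best energy seen in each generation
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            # The master hands individuals to workers, which return energies.
            energies = list(pool.map(toy_energy, population))
            history.append(min(energies))
            ranked = [ind for _, ind in
                      sorted(zip(energies, population), key=lambda p: p[0])]
            parents = ranked[:pop_size // 2]  # elitism: best half survives
            children = []
            while len(parents) + len(children) < pop_size:
                a, b = rng.sample(parents, 2)
                cut = rng.randrange(1, genome_len)   # one-point crossover
                child = a[:cut] + b[cut:]
                child[rng.randrange(genome_len)] += rng.gauss(0, 0.1)  # mutation
                children.append(child)
            population = parents + children
    return history
```

Only the fitness evaluation is distributed; selection, crossover and mutation stay on the master. Because the best half of each generation is carried over unchanged, the best energy in `history` never increases from one generation to the next.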
ISBN (print): 9781424426256
Erbium- and ytterbium-doped fiber lasers are becoming important sources for applications ranging from telecom to industry. This work focuses on laser architectures for non-conventional telecommunication bands and high-power pulsed sources for micromachining and material processing.
ISBN (print): 9783642131189
Finite-Difference Time-Domain (FDTD) has proven to be a very useful computational electromagnetics algorithm. However, a scheme based on traditional general-purpose processors can be computationally prohibitive and require thousands of CPU hours, which hinders the large-scale application of FDTD. With the rapid progress in GPU hardware capability and programmability, we propose in this paper a novel scheme in which a GPU is applied to accelerate three-dimensional FDTD with UPML absorbing boundary conditions. This GPU-based scheme can reduce the computation time significantly while obtaining high accuracy compared with the CPU-based scheme. With only one AMD ATI HD4850 GPU, when the computational domain is up to 180x80x180, our implementation of the GPU-based FDTD performs approximately 93 times faster than the one running on an Intel E2180 dual-core CPU.
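For readers unfamiliar with FDTD, the leapfrog field update that such schemes parallelize can be sketched in one dimension. This is a deliberately simplified illustration and not the paper's method: it is 1D rather than 3D, uses normalized units, has fixed zero-field ends instead of UPML absorbing boundaries, and runs sequentially. The function name, parameters and Gaussian source are assumptions.

```python
import math


def fdtd_1d(n_cells=200, n_steps=50, courant=0.5, source_cell=100):
    """1D Yee-style leapfrog update in normalized units (sketch only)."""
    ez = [0.0] * n_cells        # electric field at integer grid points
    hy = [0.0] * (n_cells - 1)  # magnetic field at half grid points
    for t in range(n_steps):
        # Update H from the spatial difference of E.
        for k in range(n_cells - 1):
            hy[k] += courant * (ez[k + 1] - ez[k])
        # Update interior E from the spatial difference of H;
        # ez[0] and ez[-1] are never written, so they stay at zero.
        for k in range(1, n_cells - 1):
            ez[k] += courant * (hy[k] - hy[k - 1])
        # Soft Gaussian source added to the field (assumed pulse shape).
        ez[source_cell] += math.exp(-((t - 30) / 10.0) ** 2)
    return ez, hy
```

Within one time step, every cell's update depends only on neighbouring values from the previous half step, so all cells can be updated independently. That data-parallel structure is what makes the method a natural fit for GPUs.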