The web browser is a CPU-intensive program. Especially on mobile devices, web pages load too slowly, spending significant time processing a document's appearance. Due to power constraints, most hardware-driven ...
ISBN: 9783642144028 (print)
The power of contemporary processors is based more and more on multicore architectures. This kind of power is accessible only to parallel applications that are able to provide work for each core. Creating a scalable parallel/multithreaded application that uses the available cores efficiently is a difficult task, especially if I/O performance must be considered as well. We consider a multithreaded database loader with a compressing function. The performance of the loader is examined from a number of perspectives. Because compression is a computationally intensive task, parallel execution can potentially provide a big advantage in this case. A list of performance-related areas we encountered is presented and discussed. We identify and verify tools that allow us to deal with specific performance areas. We find that only an orchestrated employment of several tools can bring the desired effect. The discussion provides a general procedure one can follow when improving the performance of multithreaded programs. Key performance areas specific to the database loader are pointed out. Special interest is directed towards performance variations observed when many parallel threads are active on a multicore CPU. A significant slowdown of computations is observed if many threads are computing simultaneously. The slowdown is related mainly to memory access and cache behavior, and it is much larger for a Core2 Quad system than for a dual Xeon machine.
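To illustrate why compression is a natural candidate for parallel execution in a loader of this kind, here is a minimal sketch that compresses independent chunks on a thread pool. It is not the loader from the paper: the function names, the chunk size, and the worker count are assumptions chosen for illustration, and it relies on CPython's `zlib` generally releasing the global interpreter lock while compressing.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_chunks(data: bytes, chunk_size: int = 1 << 16, workers: int = 4) -> list:
    """Split data into fixed-size chunks and compress each chunk on a thread pool.

    chunk_size and workers are illustrative defaults, not tuned values.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the chunks can be reassembled later.
        return list(pool.map(zlib.compress, chunks))


def decompress_chunks(blobs: list) -> bytes:
    """Inverse of compress_chunks: decompress each blob and concatenate."""
    return b"".join(zlib.decompress(blob) for blob in blobs)


if __name__ == "__main__":
    payload = b"row,value\n" * 100_000
    blobs = compress_chunks(payload)
    assert decompress_chunks(blobs) == payload
    print(len(payload), "bytes ->", sum(len(b) for b in blobs), "bytes")
```

Compressing chunks independently costs some compression ratio in exchange for parallelism. Whether the threads actually scale depends on memory access and cache behavior, which is the effect the abstract reports.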
ISBN: 9783642143892 (print)
The MPI and OpenMP implementations of the parallel simulated annealing algorithm solving the vehicle routing problem with time windows (VRPTW) are presented. The algorithm consists of a number of components which co-operate periodically by exchanging their best solutions found to date. The objective of the work is to explore the speedups and scalability of the two implementations. For comparisons, selected VRPTW benchmarking tests are used.
In this paper we develop a parallel numerical algorithm for modelling the electromagnetic properties of thin conductive layers. The explicit finite difference scheme is obtained after approximation of the system of d...
ISBN: 9783642144028 (print)
Stochastic ALgorithms for Ultra-fast Transport in sEmiconductors (SALUTE) is a Grid application which integrates a set of novel Monte Carlo, quasi-Monte Carlo and hybrid algorithms for solving various computationally intensive problems important for industry (design of modern semiconductor devices). SALUTE studies memory and quantum effects during the femtosecond relaxation process due to electron-phonon interaction in one-band semiconductors or quantum wires. There are two main reasons for running this application on the Grid: (i) quantum problems are very computationally intensive; (ii) the inherently parallel nature of Monte Carlo applications makes efficient use of Grid resources. In this paper we study the quasirandom approach in SALUTE, using the scrambled Halton, Sobol and Niederreiter sequences. A large number of tests have been performed on the SEEGRID grid infrastructure using a specially developed grid implementation scheme. Novel results for energy and density distribution, obtained in the inhomogeneous case with applied electric field, are presented.
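As a small illustration of the quasirandom approach mentioned above, the following sketch generates a plain (unscrambled) Halton sequence and uses it for a quasi-Monte Carlo estimate of a simple integral. It is not SALUTE code: the scrambling used in the paper is omitted, and the function names and the test integrand are assumptions made for this example.

```python
def radical_inverse(index: int, base: int) -> float:
    """Reflect the base-`base` digits of `index` about the radix point."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton(n_points: int, bases=(2, 3)) -> list:
    """First n_points of the unscrambled Halton sequence, one coprime base per dimension."""
    return [tuple(radical_inverse(i, b) for b in bases)
            for i in range(1, n_points + 1)]


def qmc_estimate(f, n_points: int, bases=(2, 3)) -> float:
    """Average f over Halton points in the unit cube (quasi-Monte Carlo integration)."""
    points = halton(n_points, bases)
    return sum(f(*p) for p in points) / n_points


if __name__ == "__main__":
    # The integral of x*y over the unit square is exactly 0.25.
    print(qmc_estimate(lambda x, y: x * y, 1000))
```

Each point depends only on its index, so different index ranges can be generated on different workers without communication, which is one reason such sequences fit Grid-style execution.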
ISBN: 9783642143892 (print)
In this paper, we propose an implementation of a parallel three-dimensional fast Fourier transform (FFT) with two-dimensional decomposition on a massively parallel cluster of multi-core processors. The proposed parallel three-dimensional FFT algorithm is based on the multicolumn FFT algorithm. We show that a two-dimensional decomposition effectively improves performance by reducing the communication time for larger numbers of MPI processes. We successfully achieved a performance of over 401 GFlops on 256 nodes of Appro Xtreme-X3 (648 nodes, 147.2 GFlops/node, 95.4 TFlops peak performance) for a 256^3-point FFT.
ISBN: 9783642143892 (print)
Large-scale computing requires parallelization in order to arrive at a solution in a reasonable time. Today parallelization is standard in the simulation of fluid problems. On the other hand, adaptation is a technique that allows for dynamic modification of the mesh as the need for locally higher resolution arises. Adaptation used during parallel simulation leads to an unbalanced numerical load. This in turn decreases the efficiency of parallelization. Dynamic load balancing strategies should be applied in order to ensure proper parallelization efficiency. The paper presents the potential benefits of applying dynamic load balancing to adaptive flow problems simulated in parallel environments.
ISBN: 9783642143892 (print)
The Union-Find algorithm is used for maintaining a number of non-overlapping sets from a finite universe of elements. The algorithm has applications in a number of areas, including the computation of spanning trees, sparse linear algebra, and image processing. Although the algorithm is inherently sequential, there have been some previous efforts at constructing parallel implementations. These have mainly focused on shared memory computers. In this paper we present the first scalable parallel implementation of the Union-Find algorithm suitable for distributed memory computers. Our new parallel algorithm is based on an observation of how the Find part of the sequential algorithm can be executed more efficiently. We show the efficiency of our implementation through a series of tests to compute spanning forests of very large graphs.
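For reference, the sequential algorithm that the abstract starts from can be sketched as follows, applied to the spanning-forest computation it mentions. This is the textbook version with path halving and union by rank, not the distributed-memory algorithm of the paper. The function names are chosen for this example.

```python
def find(parent: list, x: int) -> int:
    """Return the representative of x's set, shortening the path as we go (path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # point x at its grandparent
        x = parent[x]
    return x


def union(parent: list, rank: list, a: int, b: int) -> bool:
    """Merge the sets containing a and b; return False if they were already merged."""
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a == root_b:
        return False
    if rank[root_a] < rank[root_b]:
        root_a, root_b = root_b, root_a
    parent[root_b] = root_a  # attach the shallower tree under the deeper one
    if rank[root_a] == rank[root_b]:
        rank[root_a] += 1
    return True


def spanning_forest(n_vertices: int, edges: list) -> list:
    """Keep exactly those edges that join two previously separate components."""
    parent = list(range(n_vertices))
    rank = [0] * n_vertices
    return [(u, v) for u, v in edges if union(parent, rank, u, v)]


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (0, 2), (3, 4)]
    print(spanning_forest(5, edges))  # prints [(0, 1), (1, 2), (3, 4)]
```

Every `union` reads and writes the shared `parent` array, which is why the algorithm is regarded as inherently sequential and why a distributed-memory version needs a different organization of the Find step.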
ISBN: 3642152767 (print)
The proceedings contain 98 papers. The topics discussed include: starsscheck: a tool to find errors in task-based parallel programs; automated tuning in parallel sorting on multi-core architectures; estimating and exploiting potential parallelism by source-level dependence profiling; efficient graph partitioning algorithms for collaborative grid workflow developer environments; profile-driven selective program loading; characterizing the impact of using spare-cores on application performance; a model for space-correlated failures in large-scale distributed systems; architecture exploration for efficient data transfer and storage in data-parallel applications; non-clairvoyant scheduling of multiple bag-of-tasks applications; extremal optimization approach applied to initial mapping of distributed Java programs; a parallel implementation of the Jacobi-Davidson eigensolver and its application in a plasma turbulence code; and exploiting fine-grained parallelism on Cell processors.
ISBN: 3642152902 (print)
The proceedings contain 98 papers. The topics discussed include: starsscheck: a tool to find errors in task-based parallel programs; automated tuning in parallel sorting on multi-core architectures; estimating and exploiting potential parallelism by source-level dependence profiling; efficient graph partitioning algorithms for collaborative grid workflow developer environments; profile-driven selective program loading; characterizing the impact of using spare-cores on application performance; a model for space-correlated failures in large-scale distributed systems; architecture exploration for efficient data transfer and storage in data-parallel applications; non-clairvoyant scheduling of multiple bag-of-tasks applications; extremal optimization approach applied to initial mapping of distributed Java programs; a parallel implementation of the Jacobi-Davidson eigensolver and its application in a plasma turbulence code; and exploiting fine-grained parallelism on Cell processors.