ISBN: 9781479975624 (print)
The proceedings contain 10 papers. The topics discussed include: scaling parallel 3-D FFT with non-blocking MPI collectives; exploiting data representation for fault tolerance; VCube: a provably scalable distributed diagnosis algorithm; TX: algorithmic energy saving for distributed dense matrix factorizations; CholeskyQR2: a simple and communication-avoiding algorithm for computing a tall-skinny QR factorization on a large-scale parallel system; deflation strategies to improve the convergence of communication-avoiding GMRES; a framework for parallel genetic algorithms for distributed memory architectures; the anatomy of Mr. Scan: a dissection of performance of an extreme scale GPU-based clustering algorithm; performance and portability with OpenCL for throughput-oriented HPC workloads across accelerators, coprocessors, and multicore processors; and a hierarchical tridiagonal system solver for heterogenous supercomputers.
ISBN: 9781467345651; 9780769549033 (print)
In linear algebra, Cholesky factorization is useful for solving a system of equations with a symmetric positive definite coefficient matrix. Cholesky factorization is roughly twice as fast as LU factorization, which applies to general matrices. In recent years, with advances in technology, a Fermi GPU card can accommodate hundreds of cores, compared with the 8 or 16 cores of a CPU. As a result, there is a trend toward using the graphics card as a general-purpose graphics processing unit (GPGPU) for parallel computation. In this work, Volkov's hybrid implementation of Cholesky factorization is evaluated on the new Fermi GPU alongside other implementations, and some improvement strategies are proposed. In experiments, compared with the CPU version using the Intel Math Kernel Library (MKL), our proposed GPU improvement strategy achieves a speedup of 3.85x on the Cholesky factorization of a square matrix of dimension 10,000.
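To make the first sentence of this abstract concrete, the following is a minimal pure-Python sketch of solving A x = b for a symmetric positive definite A through its Cholesky factor. It is not the GPU or MKL code evaluated in the paper; the function names `cholesky` and `solve_spd` are illustrative assumptions. The "roughly twice as fast" remark matches the standard operation counts: about n^3/3 floating-point operations for Cholesky against about 2n^3/3 for LU.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a = L * L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: a[j][j] minus the squares already placed in row j.
        d = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = s / L[j][j]
    return L


def solve_spd(a, b):
    """Solve a x = b using the Cholesky factor of a (hypothetical helper name)."""
    L = cholesky(a)
    n = len(b)
    # Forward substitution: L y = b.
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    # Back substitution: L^T x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x
```

Only the lower triangle of the input is read, which is one reason the factorization does about half the work of LU on the same matrix.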
ISBN: 9783540723592 (print)
XtremWeb-CH (XWCH) is a software system that makes it easy for scientists and industrial users to deploy and execute their parallel and distributed applications on a public-resource computing infrastructure. The objective of XWCH is to develop a true high-performance peer-to-peer platform with a distributed scheduling and communication system. The main idea is to build a completely symmetric model in which nodes can be providers and consumers at the same time. This paper describes the different "components" of an XWCH infrastructure and the new features this platform offers compared with similar global computing projects. It also describes the porting, deployment, and execution of a CPU-intensive phylogenetic application on an experimental XWCH platform.
ISBN: 9781450340618 (print)
Advanced silicon and plasmonic nanophotonics is undergoing rapid progress due to its manifold applications in high-data-rate communication links and in imaging and sensing. Our group has been at the forefront of new devices and device physics. In this talk we will first review progress in our group in a wide variety of fundamental technologies and physics needed to extend the advances in nanophotonics. We will then illustrate these ideas with several new device types that we have recently demonstrated at Columbia based on new simulation modalities. Our approach to modeling and simulation is to use fully accurate methods and techniques and to achieve new capabilities based on massively parallel and high-performance computation. Many of our advances are based on new hardware strengths and testing with distributed and parallel systems.
ISBN: 9781450311960 (print)
An enormous number of news articles are added and updated on the Internet around the clock. This requires frequent and intensive processing by the news retrieval system, a requirement that the news retrieval systems in use today barely meet. Cloudpress 2.0, presented in this paper, is designed and implemented to be scalable, robust, and fault tolerant. It exploits the MapReduce paradigm for fetching, processing, organizing, and summarizing all the news articles, and it uses the power of cloud computing. Furthermore, it uses novel approaches for parallel processing, for storing the news articles in a distributed database, and for visualizing them in 3D. It also includes a novel query expansion feature for searching the news articles. Cloudpress 2.0 also allows on-the-fly, extractive summarization of news articles based on the input query.
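The abstract does not describe Cloudpress 2.0's actual jobs, so the following is only a generic, single-process sketch of the MapReduce pattern it names, applied to one plausible retrieval task: building an inverted index from words to article identifiers. All names (`map_phase`, `shuffle`, `reduce_phase`, `build_index`) and the sample article ids are assumptions for illustration.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit one (word, doc_id) pair per word in the article."""
    for word in text.lower().split():
        yield word, doc_id


def shuffle(pairs):
    """Shuffle step: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, doc_ids):
    """Reduce step: collapse the grouped ids into a sorted, duplicate-free list."""
    return word, sorted(set(doc_ids))


def build_index(articles):
    """Run map, shuffle, and reduce in-process over a dict of {doc_id: text}."""
    pairs = [pair for doc_id, text in articles.items()
             for pair in map_phase(doc_id, text)]
    return dict(reduce_phase(word, ids) for word, ids in shuffle(pairs).items())
```

In a real deployment the map and reduce calls would run on separate workers and the shuffle would be handled by the framework; here they are chained in one process only to show the data flow.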
ISBN: 9780769527369 (print)
As one of the killer applications in NGI, peer-to-peer networks (P2P for short) have developed rapidly in recent years. We survey and catalog the currently active research topics in P2P networks, compare and review most of the research work in P2P networks, and summarize the research methods and their problems.
ISBN: 3540410104 (print)
This paper presents an environment for distributed genetic programming using MPI. Genetic programming is a stochastic evolutionary learning methodology that can greatly benefit from parallel/distributed implementations. We describe the distributed system, as well as a user-friendly graphical interface to the tool. The usefulness of the distributed setting is demonstrated by the results obtained to date on several difficult problems, one of which is described in the text.
ISBN: 9783540695004 (print)
Despite the continuous advances in grid computing in recent years, grid programming paradigms are dominated by the message-passing concept. There is little support for other paradigms such as shared data or associative programming. In this paper we analyze some of the existing solutions for grid shared-data programming and highlight some of their drawbacks. We propose a new architecture and its core features, as well as new means of evaluating its behavior in various scenarios, including next-generation grid systems. In addition to the simplicity of our solution, we believe that it would allow us to easily apply further extensions.
We are developing a system for collaborative research and development for a distributed group of researchers at different institutions around the world. In a new paradigm for collaborative computational science, the computer code and supporting infrastructure themselves become the collaborating instrument, just as an accelerator becomes the collaborating tool for large numbers of distributed researchers in particle physics. The design of this "Collaboratory" allows many users, with very different areas of expertise, to work coherently together on distributed computers around the world. Different supercomputers may be used separately or, for problems exceeding the capacity of any single system, multiple supercomputers may be networked together through high-speed gigabit networks. Central to this Collaboratory is a new type of community simulation code, called "Cactus". The scientific driving force behind this project is the simulation of Einstein's equations for studying black holes, gravitational waves, and neutron stars, which has brought together researchers in very different fields from many groups around the world to make advances in the study of relativity and astrophysics. But the system is also being developed to provide scientists and engineers who lack expert knowledge of parallel or distributed computing, mesh refinement, and so on, with a simple framework for solving any system of partial differential equations on many parallel computer systems, from traditional supercomputers to networks of workstations.
ISBN: 0818678763 (print)
The C* language is a data-parallel extension of the C language which incorporates parallel data types. Since the C++ language provides operator overloading, a C++ library can implement the C* parallel extensions with a similar syntax. Although library implementations are highly portable, some overheads make them impractical. The two major overheads incurred are temporaries in each operator application and the inability to detect regular communication patterns. The C++ overloading mechanism forces a temporary for each operator application. Also, regular communications in C* are syntactically indistinguishable from general point-to-point communications. We tackled these problems extensively in a library. The template mechanism, a form of type parameterization in C++, is used to eliminate temporaries by delaying operator application and evaluating the entire expression at once. The polymorphic type dispatch mechanism is used to detect regular communications by assigning particular types to potentially regular communications. We have implemented the library on the CM-5 and compared its performance with the C* compiler using three simple examples. The techniques presented offer improved performance comparable to that of the C* compiler: the library is close to or 1.5 times slower than the compiler in two examples, and even faster in one.
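The idea of "delaying operator application and evaluating the entire expression at once" can be sketched as follows. The paper does this with C++ templates, which are resolved at compile time; this Python version only mirrors the deferred-evaluation idea at run time, and every class and method name in it (`Expr`, `Vec`, `BinOp`, `at`, `assign`) is a hypothetical choice for illustration, not the library's API.

```python
import operator


class Expr:
    """Base class: overloaded operators build an expression tree instead of a result."""

    def __add__(self, other):
        return BinOp(self, other, operator.add)

    def __mul__(self, other):
        return BinOp(self, other, operator.mul)


class BinOp(Expr):
    """Deferred element-wise operation; holds no element data of its own."""

    def __init__(self, left, right, op):
        if len(left) != len(right):
            raise ValueError("operand lengths differ")
        self.left, self.right, self.op = left, right, op

    def __len__(self):
        return len(self.left)

    def at(self, i):
        # Element i is computed on demand from the operands' element i.
        return self.op(self.left.at(i), self.right.at(i))


class Vec(Expr):
    """Concrete vector that owns its elements."""

    def __init__(self, data):
        self.data = list(data)

    def __len__(self):
        return len(self.data)

    def at(self, i):
        return self.data[i]

    def assign(self, expr):
        # One pass over the indices evaluates the whole expression tree,
        # so no intermediate vector is built for each operator.
        self.data = [expr.at(i) for i in range(len(expr))]
        return self
```

With vectors `a`, `b`, and `c`, writing `a + b * c` only builds a small tree of `BinOp` nodes; the element-wise work happens once, inside `assign`, which is the run-time counterpart of the temporary elimination described in the abstract.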