Existing formats for Sparse Matrix-Vector Multiplication (SpMV) on the GPU are outperforming their corresponding implementations on multi-core CPUs. In this paper, we present a new format called Sliced COO (SCOO) and an efficient CUDA implementation to perform SpMV on the GPU using atomic operations. We compare SCOO performance to existing formats of the NVIDIA Cusp library using large sparse matrices. Our results for single-precision floating-point matrices show that SCOO outperforms the COO and CSR formats for all tested matrices and the HYB format for all tested unstructured matrices on a single GPU. Furthermore, our dual-GPU implementation achieves an efficiency of 94% on average. Due to the lower performance of existing CUDA-enabled GPUs for atomic operations on double-precision floating-point numbers, the SCOO implementation for double precision does not consistently outperform the other formats for every unstructured matrix. Overall, the average speedup of SCOO for the tested benchmark dataset is 3.33 (1.56) compared to CSR, 5.25 (2.42) compared to COO, and 2.39 (1.37) compared to HYB for single (double) precision on a Tesla C2075. Furthermore, comparison to a Sandy Bridge CPU shows that SCOO on a Fermi GPU outperforms the multi-threaded CSR implementation of the Intel MKL library on an i7-2700K by a factor between 5.5 (2.3) and 18 (12.7) for single (double) precision. (C) 2013 Elsevier B.V. All rights reserved.
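To make the role of atomic operations concrete, the following is a minimal sequential sketch of SpMV over plain COO triplets. It is not the SCOO format or the paper's CUDA kernel; the function name `spmv_coo` and the example matrix are illustrative assumptions. The comment marks the accumulation that would have to be an atomic add if each nonzero were processed by a separate GPU thread.

```python
from typing import List, Sequence


def spmv_coo(rows: Sequence[int], cols: Sequence[int], vals: Sequence[float],
             x: Sequence[float], n_rows: int) -> List[float]:
    """Compute y = A @ x for a matrix A stored as COO (row, col, value) triplets."""
    y = [0.0] * n_rows
    for r, c, v in zip(rows, cols, vals):
        # Several nonzeros can share the same row r. In a parallel version
        # with one thread per nonzero, this update of y[r] is the step that
        # would need an atomic add to avoid lost updates.
        y[r] += v * x[c]
    return y


# Illustrative 3x3 matrix (assumed data, not from the paper):
# [[2, 0, 1],
#  [0, 3, 0],
#  [4, 0, 5]]
rows = [0, 0, 1, 2, 2]
cols = [0, 2, 1, 0, 2]
vals = [2.0, 1.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_coo(rows, cols, vals, x, n_rows=3)
print(y)  # [5.0, 6.0, 19.0]
```

Rows 0 and 2 each receive two contributions, which is exactly the write conflict that atomics resolve on the GPU.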
Intel Cilk Plus extends C and C++ to enable writing composable deterministic parallel software that can exploit both the thread and vector parallelism commonly available in modern hardware.
J is an open source programming language with a rich collection of well-designed primitives and a consistent, compact, mathematics-like syntax. It has amazing array facilities, superb numeric and data processing capabilities, and graceful features al...
ISBN: 9781611973129 (digital)
ISBN: 9781611973112 (print)
The Fortran language standard has undergone significant upgrades in recent years (1990, 1995, 2003, and 2008). Numerical Computing with Modern Fortran illustrates many of these improvements through practical solutions to a number of scientific and engineering problems. Readers will discover techniques for modernizing algorithms written in Fortran; examples of Fortran interoperating with C or C++ programs, plus using the IEEE floating-point standard for efficiency; illustrations of parallel Fortran programming using coarrays, MPI, and OpenMP; and a supplementary website with downloadable source codes discussed in the book. Audience: This book is intended for Fortran programmers seeking to update their programming skills using the language's latest features and for C and C++ programmers who want to understand key software aspects of numerical computing using modern Fortran. It is suitable for an upper-level undergraduate or early graduate course on advanced numerical scientific computing. Contents: Introduction; Chapter 1: The Modern Fortran Source; Chapter 2: Modules for Subprogram Libraries; Chapter 3: Generic Subprograms; Chapter 4: Sparse Matrices, Defined Operations, Overloaded Assignment; Chapter 5: Object-Oriented Programming for Numerical Applications; Chapter 6: Recursion in Fortran; Chapter 7: Case Study: Toward a Modern QUADPACK Routine; Chapter 8: Case Study: Quadrature Routine qag2003; Chapter 9: IEEE Arithmetic Features and Exception Handling; Chapter 10: Interoperability with C; Chapter 11: Defined Operations for Sparse Matrix Solutions; Chapter 12: Case Study: Two Sparse Least-Squares System Examples; Chapter 13: Message Passing with MPI in Standard Fortran; Chapter 14: Coarrays in Standard Fortran; Chapter 15: OpenMP in Fortran; Chapter 16: Modifying Source to Remove Obsolescent or Deleted Features; Chapter 17: Software Testing; Chapter 18: Compilers; Chapter 19: Software Tools; Chapter 20: Fortran Book Code on SIAM Web Site; Bibliography; Index.
ISBN: 9780769550763 (print)
In our development team at Sandia National Laboratories we have honed our Scrum processes to where we continually deliver high-performance engineering analysis software to our customers. We deliver despite non-ideal circumstances, including development work that can be categorized as exploratory research, regular use of part-time developers, team size that varies widely among Sprints, highly specialized technical skill sets and a broad range of deliverables. We believe our methodologies can be applied to many research-oriented environments such as those found in government laboratories, academic institutions and corporate research facilities. Our goal is to increase the adoption of Lean/Agile project management in these environments by sharing our experiences with those research-oriented development teams who are considering using Lean/Agile, or have started and are encountering problems. In this paper we discuss how we create and prioritize our product backlog, write our user stories, calculate our capacity, plan our Sprints, report our results and communicate our progress to customers. By providing guidance and evidence of success in these areas we hope to overcome real and perceived obstacles that may limit the adoption of Lean/Agile techniques in research-oriented development environments.
Legacy scientific codes are often repurposed to fit adaptive needs, but making such code adaptive without changing the original source programs can be challenging. Adaptive Code Collage (ACC) meets this challenge using function-call interception in a language-neutral way at link time, transparently "catching" and redirecting function calls.
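The idea of catching and redirecting calls without editing the caller can be sketched as follows. This is only a runtime analogy, not ACC's link-time mechanism: the `symbols` namespace is a hypothetical stand-in for the symbol resolution a linker performs, and `legacy_driver`, `intercept`, and the clamping behavior are illustrative assumptions.

```python
import math
from types import SimpleNamespace
from typing import Callable, List

# Hypothetical stand-in for a linker symbol table: callers resolve names here.
symbols = SimpleNamespace(kernel=math.sqrt)


def legacy_driver(values: List[float]) -> List[float]:
    """Stand-in for legacy code; it is never edited below."""
    return [symbols.kernel(v) for v in values]


def intercept(original: Callable[[float], float],
              log: List[float]) -> Callable[[float], float]:
    """Return a wrapper that records each call and redirects it."""
    def wrapper(x: float) -> float:
        log.append(x)  # "catch" the call
        # Redirect: clamp negative inputs before forwarding to the original,
        # so math.sqrt no longer raises ValueError for them.
        return original(max(x, 0.0))
    return wrapper


intercepted: List[float] = []
# Rebind the symbol; legacy_driver itself is left untouched.
symbols.kernel = intercept(symbols.kernel, intercepted)

result = legacy_driver([4.0, -1.0, 9.0])
print(result)       # [2.0, 0.0, 3.0]
print(intercepted)  # [4.0, -1.0, 9.0]
```

The caller's source is unchanged; only the binding it resolves through is swapped, which is the property the abstract attributes to interception at link time.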
Although LAPACK is a powerful library, it is difficult to use. JLAPACK, a Java translation obtained automatically from the Fortran LAPACK sources, retains exactly the same difficult-to-use interface as the LAPACK routines. The MTJ library implements an object-oriented Java interface to JLAPACK that hides many complicated details. ScalaLab exploits the flexibility of the Scala language to present an even friendlier and more convenient interface to the powerful but complicated JLAPACK library. The article describes the interfacing of the low-level JLAPACK routines within the ScalaLab environment, which is accomplished rather easily by exploiting well-suited features of the Scala language. The paper also demonstrates the convenience of using JLAPACK routines for linear algebra operations from within ScalaLab.