Peptide-spectrum matching is one of the most time-consuming portion of the database search method for assignment of tandem mass spectra to peptides. In this study, we develop a parallel algorithm for peptide-spectrum ...
详细信息
Peptide-spectrum matching is one of the most time-consuming portion of the database search method for assignment of tandem mass spectra to peptides. In this study, we develop a parallel algorithm for peptide-spectrum matching using single-instructionmultipledata (SIMD) instructions. Unlike other parallel algorithms in peptide-spectrum matching, our algorithm parallelizes the computation of matches between a single spectrum and a given peptide sequence from the database. It also significantly reduces the number of comparison operations. Extra improvements are obtained by using SIMD instructions to avoid conditional branches and unnecessary memory access within the algorithm. The implementation of the developed algorithm is based on the Streaming SIMD Extensions technology that is embedded in most Intel microprocessors. Similar technology also exists in other modern microprocessors. A simulation shows that the developed algorithm achieves an 18-fold speedup over the previous version of Real-Time Peptide-Spectrum Matching algorithm [F. X. Wu et al., Rapid Commun. Mass Sepctrom. 2006, 20, 1199-1208]. Therefore, the developed algorithm can be employed to develop real-time control methods for MS/MS.
High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and effi...
详细信息
High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing environment currently exhibited by GPUs. One way of addressing this challenge is to embrace better techniques and develop tools tailored to their needs. This article presents one simple technique, CPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL, two open-source tool-kits that supports this technique. In introducing PyCUDA and PyOpenCL, this article proposes the combination of a dynamic, high-level scripting language with the massive performance of a CPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems. The concept of RTCG is simple and easily implemented using existing, robust infrastructure. Nonetheless it is powerful enough to support (and encourage) the creation of custom application-specific tools by its users. The premise of the paper is illustrated by a wide range of examples where the technique has been applied with considerable success. (C) 2011 Elsevier B.V. All rights
The major parallel architecture classes are considered: single-instruction multiple-data (SIMD) computers, tightly coupled multiple-instructionmultiple-data (MIMD) computers, hypercuboid computers and constant-valenc...
详细信息
The major parallel architecture classes are considered: single-instruction multiple-data (SIMD) computers, tightly coupled multiple-instructionmultiple-data (MIMD) computers, hypercuboid computers and constant-valence MIMD computers. An argument that the PRAM model is universal over tightly coupled and hypercube systems, but not over constant-valence-topology, loosely coupled-system is reviewed, showing precisely how the PRAM model is too powerful to permit broad universality. Ways in which a model of computation can be restricted to become universal over less powerful architectures are discussed. The Bird-Meertens formalism (R.S. Bird, 1989), is introduced and it is shown how it is used to express computations in a compact way. It is also shown that the Bird-Meertens formalism is universal over all four architecture classes and that nontrivial restrictions of functional programming languages exist that can be efficiently executed on disparate architectures. The use of the Bird-Meertens formalism as the basis for a programming language is discussed, and it is shown that it is expressive enough to be used for general programming. Other models and programming languages with architecture-independent properties are reviewed
A hardware architecture that addresses at least part of the potential parallelism in each of the three levels of vision abstraction, low (sensory), intermediate (symbolic), and high (knowledge-based), is described. Th...
详细信息
A hardware architecture that addresses at least part of the potential parallelism in each of the three levels of vision abstraction, low (sensory), intermediate (symbolic), and high (knowledge-based), is described. The machine, called the image understanding architecture (IUA), consists of three different, tightly coupled parallel processors; the content addressable array parallel processor (CAAPP) at the low level, the intermediate communication associative processor (ICAP) at the intermediate level, and the symbolic processing array (SPA) at the high level. The CAAPP and ICAP levels are controlled by an array control unit (ACU) that takes its directions from the SPA level. The SPA is a multiple-instructionmultiple-data (MIMD) parallel processor, while the intermediate and low levels operat in multiple modes. The CAAPP operates in single-instruction multiple-data (SIMD) associative or multiassociative mode, and the ICAP operates in single-program multiple-data (SPMD) or MIMD mode
Achieving peak throughput on modern CPUs requires maximizing the use of single-instruction, multiple-data (SIMD) or vector compute units. single-program, multiple-data (SPMD) programming models are an effective way to...
详细信息
ISBN:
(纸本)9798400701016
Achieving peak throughput on modern CPUs requires maximizing the use of single-instruction, multiple-data (SIMD) or vector compute units. single-program, multiple-data (SPMD) programming models are an effective way to use high-level programming languages to target these ISAs. Unfortunately, many SPMD frameworks have evolved to have either overly-restrictive language specifications or under-specified programming models, and this has slowed the widescale adoption of SPMD-style programming. This paper introduces Parsimony (PARallel SIMd), a SPMD programming approach built with semantics designed to be compatible with multiple languages and to cleanly integrate into the standard optimizing compiler toolchains for those languages. We first explain the Parsimony programming model semantics and how they enable a standalone compiler IR-to-IR pass that can perform vectorization independently of other passes, improving the language and toolchain compatibility of SPMD programming. We then demonstrate a LLVM prototype of the Parsimony approach that matches the performance of ispc, a popular but more restrictive SPMD approach, and achieves 97% of the performance of hand-written AVX-512 SIMD intrinsics on over 70 benchmarks ported from the Simd Library. We finally discuss where Parsimony has exposed parts of existing language and compiler flows where slight improvements could further enable improved SPMD program vectorization.
暂无评论