版权所有:内蒙古大学图书馆 技术提供:维普资讯• 智图
内蒙古自治区呼和浩特市赛罕区大学西街235号 邮编: 010021
作者机构:Campus Box 1045 1 Brookings Drive St. LouisMO63130 United States MIT CSAIL 32 Vassar Street CambridgeMA02139 United States Intel Corporation 77 Reed Road HudsonMA01749 United States
出 版 物:《ACM Transactions on Parallel Computing》 (ACM Trans. Parallel Comp.)
年 卷 期:2015年第2卷第3期
页 面:1–42页
核心收录:
基 金:This work was supported in part by the National Science Foundation under Grants CNS-1017058 and CCF-1162148. Tao B. Schardl is supported in part by an NSF Graduate Research Fellowship.I-Ting Angelina Lee, Charles E. Leiserson, Tao B. Schardl, Zhunping Zhang, and Jim Sukha. 2015. On-the-fly pipeline parallelism. ACM Trans. Parallel Comput. 2, 3, Article 17 (September 2015), 42 pages. DOI: http://dx.doi.org/10.1145/2809808 This work was supported in part by the National Science Foundation under Grants CNS-1017058 and CCF-1162148. Tao B. Schardl is supported in part by an NSF Graduate Research Fellowship. Authors’ addresses: I.-T. A. Lee, 1 Brookings Drive, Campus Box 1045, St. Louis, MO 63130 email: angelee@wustl.edu C. E. Leiserson, T. B. Schardl, and Z. Zhang, MIT CSAIL, 32 Vassar Street, Cambridge, MA 02139 emails: {cel, neboat}@mit.edu, gnipnuhz@gmail.com J. Sukha, Intel Corporation, 77 Reed Road, Hudson, MA 01749 email: jim.sukha@intel.com. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or permissions@acm.org. ©c 2015 ACM 2329-4949/2015/09-ART17 $15.00 DOI: http://dx.doi.org/10.1145/2809808
主 题:Parallel programming
摘 要:Pipeline parallelism organizes a parallel program as a linear sequence of stages. Each stage processes elements of a data stream, passing each processed data element to the next stage, and then taking on a new element before the subsequent stages have necessarily completed their processing. Pipeline parallelism is used especially in streaming applications that perform video, audio, and digital signal processing. Three out of 13 benchmarks in PARSEC, a popular software benchmark suite designed for shared-memory multiprocessors, can be expressed as pipeline parallelism. Whereas most concurrency platforms that support pipeline parallelism use a construct-and-run approach, this article investigates on-the-fly pipeline parallelism, where the structure of the pipeline emerges as the program executes rather than being specified a priori. On-the-fly pipeline parallelism allows the number of stages to vary from iteration to iteration and dependencies to be data dependent. We propose simple linguistics for specifying on-the-fly pipeline parallelism and describe a provably efficient scheduling algorithm, the PIPER algorithm, which integrates pipeline parallelism into a work-stealing scheduler, allowing pipeline and fork-join parallelism to be arbitrarily nested. The PIPER algorithm automatically throttles the parallelism, precluding runaway pipelines. Given a pipeline computation with T1 work and T∞ span (critical-path length), PIPER executes the computation on P processors in TP ≤ T1/P+ O(T∞ +lg P) expected time. PIPER also limits stack space, ensuring that it does not grow unboundedly with running time. We have incorporated on-the-fly pipeline parallelism into a Cilk-based work-stealing runtime system. Our prototype Cilk-P implementation exploits optimizations such as lazy enabling and dependency folding. We have ported the three PARSEC benchmarks that exhibit pipeline parallelism to run on Cilk-P. One of these, x264, cannot readily be executed by systems that supp