ISBN: 9781450326056 (print)
The industry shift toward emerging forms of parallel and distributed computing (PDC), including multi-core CPUs, cloud computing, and general-purpose use of GPUs, has naturally led to an increased presence of PDC elements in undergraduate Computer Science curriculum recommendations, such as the new and substantial "PD" knowledge area in the joint ACM/IEEE CS2013 recommendations [1]. How can undergraduate students grasp the extensive and complex range of PDC principles and practices, and apply that knowledge in problem solving, while PDC technologies continue to evolve rapidly? Parallel design patterns are descriptions of effective solutions to recurring parallel programming problems in particular contexts, and they have emerged from long-standing industry practice. Parallel patterns occur at all computational levels, ranging from low-level concurrent execution patterns (such as message passing or thread pool patterns) to high-level software design patterns suitable for organizing entire systems or their components (such as model-view-controller or pipe-and-filter patterns). The sheer number of parallel patterns, which reflects the full breadth and complexity of PDC, can be quite daunting for a newcomer. However, the ubiquity of parallel patterns in all forms of parallel and distributed computation makes these patterns relevant and illuminating at all undergraduate levels. Knowledge of parallel patterns, as reusable elements of parallel design, guides problem solving during the creation of parallel programs, and those enduring design patterns remain relevant and useful as new PDC infrastructure emerges in this rapidly evolving field. This panel presents four viewpoints representing various approaches for teaching parallel patterns to CS undergraduates at multiple academic levels. Moderator Dick Brown co-directs (with Adams and Shoop) the CSinparallel project (***), which produces and shares modular materials for incrementally adding parallelism to existing undergraduate computer science courses.
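As a concrete illustration of one low-level pattern named above, the following self-contained C++ sketch implements a fork/join "map" decomposition over a data set followed by a reduction; it is an illustrative example only and is not drawn from the CSinparallel materials.

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    std::vector<int> data(1000000, 1);
    const unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<long long> partial(nthreads, 0);
    std::vector<std::thread> workers;

    // Fork: each worker sums one contiguous chunk of the data
    // (the "map" / geometric decomposition pattern).
    for (unsigned t = 0; t < nthreads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t chunk = data.size() / nthreads;
            const std::size_t begin = t * chunk;
            const std::size_t end = (t + 1 == nthreads) ? data.size() : begin + chunk;
            partial[t] = std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
        });
    }

    // Join: wait for all workers, then combine the partial sums (reduction).
    for (auto& w : workers) w.join();
    std::cout << std::accumulate(partial.begin(), partial.end(), 0LL) << std::endl;
}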
ISBN: 9781450326568 (print)
Today, almost all computer architectures are parallel and heterogeneous: a combination of multiple CPUs, GPUs, and specialized processors. This creates a challenging problem for application developers who want to develop high-performance programs without the effort required to use low-level, architecture-specific parallel programming models (e.g., OpenMP for CMPs, CUDA for GPUs, MPI for clusters). Domain-specific languages (DSLs) are a promising solution to this problem because they provide an avenue for high-level, application-specific abstractions with implicit parallelism to be mapped directly to low-level, architecture-specific programming models, providing both high programmer productivity and high execution performance. In this talk I will describe an approach to building high-performance DSLs, based on DSL embedding in a general-purpose programming language, metaprogramming, and a DSL infrastructure called Delite. I will describe how we transform DSL programs into efficient first-order low-level code using domain-specific optimization, parallelism and locality optimization with parallel patterns, and architecture-specific code generation. All optimizations and transformations are implemented in Delite, an extensible DSL compiler infrastructure that significantly reduces the effort required to develop new DSLs. Delite DSLs for machine learning, data querying, graph analysis, and scientific computing all achieve performance competitive with manually parallelized C++ code.
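To make the mapping idea concrete, here is a hedged C++/OpenMP sketch of the kind of transformation described: a high-level, implicitly parallel map pattern whose body a DSL compiler could lower to an architecture-specific parallel loop. Delite itself embeds DSLs in Scala; the code below only illustrates the concept and is not Delite's API.

#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

// High-level view: apply f to every element of xs; parallelism stays implicit.
template <typename F>
std::vector<double> map_pattern(const std::vector<double>& xs, F f) {
    std::vector<double> out(xs.size());
    // Lowered view: the architecture-specific loop a DSL backend could emit
    // for a multicore CPU (compile with -fopenmp; otherwise the pragma is
    // ignored and the loop runs sequentially).
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(xs.size()); ++i)
        out[static_cast<std::size_t>(i)] = f(xs[static_cast<std::size_t>(i)]);
    return out;
}

int main() {
    std::vector<double> xs(1 << 20, 2.0);
    std::vector<double> ys = map_pattern(xs, [](double x) { return std::sqrt(x); });
    std::cout << ys.front() << std::endl;
}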
ISBN: 9781450326568 (print)
This talk has two parts. The first part will discuss possible directions for computer architecture research, including architecture as infrastructure, energy first, impact of new technologies, and cross-layer opportunities. This part is based on a 2012 Computing Community Consortium (CCC) whitepaper effort led by Hill, as well as other recent National Academy and ISAT studies. See: http://***/ccc/docs/init/***. The second part of the talk will discuss one or more examples of the cross-layer research advocated in the first part. For example, our analysis shows that many "big-memory" server workloads, such as databases, in-memory caches, and graph analytics, pay a high cost for page-based virtual memory: up to 50% of execution time wasted. Via small changes to the operating system (Linux) and hardware (x86-64 MMU), this work reduces the execution time these workloads waste to less than 0.5%. The key idea is to map part of a process's linear virtual address space with a new incarnation of segmentation, while providing compatibility by mapping the rest of the virtual address space with paging.
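A minimal C++ sketch of the translation rule described in the second part, assuming a single direct segment described by base, limit, and offset values: virtual addresses inside the segment translate with a bounds check and one addition, while the rest of the address space falls back to conventional paging. The structure and function names are illustrative, not the hardware interface from this work.

#include <cstdint>
#include <cstdio>

struct DirectSegment {
    std::uint64_t base;    // first virtual address covered by the segment
    std::uint64_t limit;   // one past the last virtual address covered
    std::uint64_t offset;  // physical address = virtual address + offset
};

// Stand-in for the conventional multi-level x86-64 page-table walk.
std::uint64_t page_walk(std::uint64_t va) { return va; }

// Addresses inside the direct segment need no TLB entry and no page walk;
// everything else remains paged for compatibility.
std::uint64_t translate(std::uint64_t va, const DirectSegment& seg) {
    if (va >= seg.base && va < seg.limit)
        return va + seg.offset;
    return page_walk(va);
}

int main() {
    DirectSegment seg{0x100000000ULL, 0x200000000ULL, 0x040000000ULL};
    std::printf("%llx\n", (unsigned long long)translate(0x100000abcULL, seg));  // in segment
    std::printf("%llx\n", (unsigned long long)translate(0x000000abcULL, seg));  // falls back to paging
}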
ISBN: 9781450319225 (print)
We present a simple yet effective technique for improving performance of lock-based code using the hardware lock elision (HLE) feature in Intel's upcoming Haswell processor. We also describe how to extend Haswell's HLE mechanism to achieve a similar effect to our lock elision scheme entirely in hardware.
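For readers unfamiliar with HLE, the sketch below shows the typical way lock elision is requested on Haswell: a test-and-set spin lock whose acquire and release are tagged with GCC's HLE memory-order hints, which emit the XACQUIRE/XRELEASE prefixes when built with -mhle. It illustrates only the baseline mechanism, not the elision scheme proposed in the paper.

#include <cstdio>

#ifndef __ATOMIC_HLE_ACQUIRE   // fall back to plain atomics off x86 / older compilers
#define __ATOMIC_HLE_ACQUIRE 0
#define __ATOMIC_HLE_RELEASE 0
#endif

struct ElidedSpinLock {
    int flag = 0;

    void lock() {
        // XACQUIRE-prefixed exchange: the write to flag is elided and the
        // critical section first runs transactionally; a data conflict aborts
        // it, and the code re-executes with the lock actually taken.
        while (__atomic_exchange_n(&flag, 1,
                                   __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE))
            while (__atomic_load_n(&flag, __ATOMIC_RELAXED))
                ;  // spin until the lock looks free, then retry the exchange
    }

    void unlock() {
        // XRELEASE-prefixed store ends the elided critical section.
        __atomic_store_n(&flag, 0, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
    }
};

int main() {
    ElidedSpinLock l;
    l.lock();
    std::puts("in critical section");
    l.unlock();
}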
ISBN: 9781450311601 (print)
The proceedings contain 57 papers. The topics discussed include: scalable framework for mapping streaming applications onto multi-GPU systems; efficient performance evaluation of memory hierarchy for highly multithreaded graphics processors; extending a C-like language for portable SIMD programming; DOJ: dynamically parallelizing object-oriented programs; GPU-based NFA implementation for memory efficient high speed regular expression matching; concurrent tries with efficient non-blocking snapshots; deterministic parallel random-number generation for dynamic-multithreading platforms; algorithm-based fault tolerance for dense matrix factorizations; revisiting the combining synchronization technique; FlexBFS: a parallelism-aware implementation of breadth-first search on GPU; optimizing remote accesses for offloaded kernels: application to high-level synthesis for FPGA; and the boat hull model: adapting the roofline model to enable performance prediction for parallel computing.
ISBN: 9781450319225 (print)
JavaScript, the most popular language on the Web, is rapidly moving to the server side, becoming even more pervasive. Still, JavaScript lacks support for shared-memory parallelism, making it challenging for developers to exploit the multicores present in both servers and clients. In this paper we present TigerQuoll, a novel API and runtime for parallel programming in JavaScript. TigerQuoll features an event-based API and a parallel runtime allowing applications to exploit a mutable shared memory space. The programming model of TigerQuoll features automatic consistency and concurrency management, such that developers do not have to deal with shared-data synchronization. TigerQuoll supports an innovative transaction model that allows for eventual consistency to speed up high-contention workloads. Experiments show that TigerQuoll applications scale well, allowing one to implement common parallelism patterns in JavaScript.
ISBN: 9781450319225 (print)
Recently, graph computation has emerged as an important class of high-performance computing applications whose characteristics differ markedly from those of traditional, compute-bound kernels. Libraries such as BLAS, LAPACK, and others have been successful in codifying best practices in numerical computing. The data-driven nature of graph applications necessitates a more complex application stack incorporating runtime optimization. In this paper, we present a method of phrasing graph algorithms as collections of asynchronous, concurrently executing, concise code fragments which may be invoked both locally and in remote address spaces. A runtime layer performs a number of dynamic optimizations, including message coalescing, message combining, and software routing. Practical implementations and performance results are provided for a number of representative algorithms.
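The following single-process C++ sketch illustrates the style of computation described: a level-synchronous BFS written as small handler fragments that are enqueued for the owner of each target vertex and delivered in batches, standing in for the runtime's message coalescing and routing. The Runtime type and its send_handler/flush operations are illustrative assumptions, not the paper's actual interface.

#include <cstdio>
#include <functional>
#include <utility>
#include <vector>

struct Runtime {
    int nplaces;
    // Per-destination buffers: handler invocations are coalesced rather than sent eagerly.
    std::vector<std::vector<std::function<void()>>> outbox;
    explicit Runtime(int n) : nplaces(n), outbox(n) {}

    // Enqueue a code fragment for the place that owns vertex v.
    void send_handler(int v, std::function<void()> h) {
        outbox[v % nplaces].push_back(std::move(h));
    }

    // Deliver all buffered handlers (stand-in for message coalescing, combining, and routing).
    void flush() {
        for (auto& buf : outbox) {
            for (auto& h : buf) h();
            buf.clear();
        }
    }
};

int main() {
    // Tiny graph as adjacency lists; dist holds BFS levels from vertex 0.
    std::vector<std::vector<int>> adj = {{1, 2}, {3}, {3}, {}};
    std::vector<int> dist(adj.size(), -1);
    Runtime rt(2);

    dist[0] = 0;
    std::vector<int> frontier = {0};
    while (!frontier.empty()) {
        std::vector<int> next;
        for (int u : frontier)
            for (int v : adj[u])
                // The asynchronous fragment: runs at v's owner and relaxes its distance.
                rt.send_handler(v, [&, u, v] {
                    if (dist[v] == -1) { dist[v] = dist[u] + 1; next.push_back(v); }
                });
        rt.flush();
        frontier = std::move(next);
    }
    for (std::size_t v = 0; v < dist.size(); ++v)
        std::printf("dist[%zu] = %d\n", v, dist[v]);
}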