ISBN: (Print) 9781605583976
Assuming that the multicore revolution plays out the way the microprocessor industry expects, it seems that within a decade most programming will involve parallelism at some level. One needs to ask how this affects the way we teach computer science, or even how we have people think about computation. With regard to teaching there seem to be three basic choices: (1) we only train a small number of experts in parallel computation who develop a collection of libraries, and everyone else just uses them; (2) we leave our core curriculum pretty much as is, but add some advanced courses on parallelism or perhaps tack on a few lectures at the end of existing courses; or (3) we start teaching parallelism from the start and embed it throughout the curriculum, with the idea of getting students to think about parallelism as the most natural form of computation and sequential computation as a special case. This talk will examine some of the implications of the third option. It will argue that thinking about parallelism, when treated in an appropriate way, might be as easy as or easier than thinking sequentially. A key prerequisite, however, is to identify what the core ideas in parallelism are and how they might be layered and integrated with existing concepts. Another more difficult issue is how to cleanly integrate these ideas among courses. After all, much of the success of sequential computation follows from the concept of a random access machine and its ability to serve as a simple, albeit imperfect, interface between programming languages, algorithm analysis, and hardware design. The talk will go through an initial list of some core ideas in parallelism, and an approach to integrating these ideas between parallel algorithms, programming languages, and, to some extent, hardware. This requires, however, moving away from the concept of a machine model as an interface for thinking about computation.
Authors: Sulzmann, Martin; Lam, Edmund S. L.
Affiliations: Programming Logics and Semantics Group, IT University of Copenhagen, Rued Langgaards Vej 7, 2300 Copenhagen S, Denmark; School of Computing, National University of Singapore, S16 Level 5, 3 Science Drive 2, Singapore 117543, Singapore
Multi-set constraint rewriting allows for a highly parallel computational model and has been used in a multitude of application domains such as constraint solving, agent specification etc. Rewriting steps can be appli...
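A minimal sketch of the rewriting model the abstract refers to, under the usual multiset-rewriting reading: the store is a multiset of constraints, a rule replaces matched constraints with new ones, and rule instances that match disjoint constraints are independent and could in principle fire in parallel. The gcd rule and the array-based store below are illustrative assumptions, not taken from the paper.

```c
#include <stdio.h>

/* Toy multiset of constraints gcd(n), stored as an array, with one rule:
 *   gcd(m), gcd(n)  <=>  0 < m <= n  |  gcd(m), gcd(n - m)
 * Rule instances that match disjoint pairs of constraints are independent
 * of each other; this sequential sketch simply applies the rule until no
 * instance fires. */
#define STORE_SIZE 4

int main(void) {
    int store[STORE_SIZE] = {12, 18, 30, 42};  /* gcd(12), gcd(18), gcd(30), gcd(42) */
    int fired = 1;
    while (fired) {
        fired = 0;
        for (int i = 0; i < STORE_SIZE; i++)
            for (int j = 0; j < STORE_SIZE; j++)
                if (i != j && store[i] > 0 && store[i] <= store[j]) {
                    store[j] -= store[i];      /* rewrite gcd(n) to gcd(n - m) */
                    fired = 1;
                }
        /* One pass may fire several non-overlapping instances; repeat to fixpoint. */
    }
    for (int i = 0; i < STORE_SIZE; i++)
        printf("gcd(%d) ", store[i]);          /* one gcd survives; the rest are gcd(0) */
    printf("\n");
    return 0;
}
```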
ISBN: (Print) 9781605581170
The proceedings contain 25 papers. The topics discussed include: order-sorted dependency pairs; macros for context-free grammars; inferring precise polymorphic type dependencies in logic programs; a type system for safe memory management and its proof of correctness; programming with proofs and explicit contexts; towards execution time estimation in abstract machine-based languages; similarity-based reasoning in qualified logic programming; classifying integrity checking methods with regard to inconsistency tolerance; comprehending finite maps for algorithmic debugging of higher-order functional programs; parallel execution of multi-set constraint rewrite rules; a rewriting framework for the composition of access control policies; global difference constraint propagation for finite domain solvers; and dynamic variable elimination during propagation solving.
ISBN: (Print) 9781595939609
High performance SIMD text processing using the method of parallel bit streams is introduced with a case study of UTF-8 to UTF-16 transcoding. A forward transform converts byte-oriented character stream data into eight parallel bit streams. Decoding, validation and computation of UTF-8 indexed UTF-16 bit streams are performed using bit-parallel logic and shifting operations. Conversion from UTF-8 indexing to UTF-16 indexing is performed using parallel bit deletion. The inverse transform is applied to yield high and low UTF-16 byte streams which are then merged. Combined with optimization techniques for blocks of ASCII data, speed-ups of 3 to 25 times are achieved on commodity processors compared with optimized byte-at-a-time code. Further applications of the method of parallel bit streams to bulk text processing applications are briefly discussed along with future prospects for the combination of intra-register and intra-chip parallelism on multicore processors.
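A small serial sketch of the forward transform and one bit-parallel step on the resulting streams, assuming a 64-byte block and ordinary 64-bit words rather than SIMD registers; the block size, word width, and the `transpose_to_bit_streams` helper are illustrative assumptions, not the paper's SIMD implementation.

```c
#include <stdint.h>
#include <stdio.h>

/* Forward transform: bit_stream[i] collects bit i (most significant first)
 * of every byte in a 64-byte block, so each stream word describes the whole
 * block at once. */
static void transpose_to_bit_streams(const uint8_t block[64], uint64_t bit_stream[8]) {
    for (int i = 0; i < 8; i++)
        bit_stream[i] = 0;
    for (int j = 0; j < 64; j++)
        for (int i = 0; i < 8; i++)
            bit_stream[i] |= (uint64_t)((block[j] >> (7 - i)) & 1) << (63 - j);
}

int main(void) {
    uint8_t block[64] = {0};
    const char *s = "parallel bit streams";
    for (int j = 0; s[j]; j++)
        block[j] = (uint8_t)s[j];

    uint64_t bs[8];
    transpose_to_bit_streams(block, bs);

    /* One bit-parallel classification: a UTF-8 byte starts a multi-byte
     * sequence exactly when its two high bits are 11, so the whole block is
     * classified with a single AND of two stream words. */
    uint64_t prefix_bytes = bs[0] & bs[1];
    printf("prefix-byte mask: %016llx\n", (unsigned long long)prefix_bytes);
    return 0;
}
```

The point of the layout is that per-byte tests such as the prefix-byte classification above reduce to a handful of word-wide logical operations instead of a loop over bytes.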
ISBN: (Print) 9781595939609
The next generations of supercomputers are projected to have hundreds of thousands of processors. However, as the number of processors grows, the scalability of applications will be the dominant challenge. This forces us to reexamine some of the fundamental ways that we approach the design and use of parallel languages and runtime systems. In this paper we show how the globally shared arrays in a popular Partitioned Global Address Space (PGAS) language, Unified Parallel C (UPC), can be combined with a new collective interface to improve both performance and scalability. This interface allows subsets, or teams, of threads to perform a collective together. As opposed to MPI's communicators, our interface allows sets of threads to be placed in teams instantly rather than explicitly constructing communicators, thus allowing for more dynamic team construction and manipulation. We motivate our ideas with three application kernels: dense matrix multiplication, dense Cholesky factorization, and multidimensional Fourier transforms. We describe how the three aforementioned applications can be succinctly written in UPC, thereby aiding productivity. We also show how such an interface allows for scalability by running on up to 16,384 processors on the BlueGene/L. In a few lines of UPC code, we wrote a dense matrix multiply routine that achieves 28.8 TFlop/s and a 3D FFT that achieves 2.1 TFlop/s. We analyze our performance results through models and show that the machine resources rather than the interfaces themselves limit the performance.
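For contrast, this is the communicator-based pattern the abstract argues against, written with standard MPI calls: running a collective over a subset of ranks first requires explicitly constructing (and later freeing) a communicator. The split into two "row" teams and the allreduce are illustrative choices; the UPC team API itself is not reproduced here because the abstract does not give its signatures.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Explicit construction step: split the ranks into two "row" teams.
     * MPI_Comm_split is itself a collective over the parent communicator. */
    int color = (rank < size / 2) ? 0 : 1;
    MPI_Comm row;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &row);

    /* Team collective: sum the ranks within each row. */
    int local = rank, row_sum = 0;
    MPI_Allreduce(&local, &row_sum, 1, MPI_INT, MPI_SUM, row);
    printf("rank %d: row sum = %d\n", rank, row_sum);

    MPI_Comm_free(&row);
    MPI_Finalize();
    return 0;
}
```

A team-based interface of the kind the abstract describes would instead name the participating threads directly at the collective call, avoiding the separate construction and teardown steps shown here.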
Abstract machines provide a certain separation between platform-dependent and platform-independent concerns in compilation. Many of the differences between architectures are encapsulated in the specific abstract machi...