Parallel programming is an excellent way to speed up computation: processes execute simultaneously, with the work divided among the available threads. OpenMP, available for C, C++, and Fortran...
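A minimal sketch of the work-sharing idea this abstract describes, assuming an invented toy array and size rather than anything from the paper (compile with gcc -fopenmp):

    /* Hedged OpenMP sketch: the loop iterations are divided among
     * the available threads. Array name and size are illustrative. */
    #include <stdio.h>

    int main(void) {
        enum { N = 1000000 };
        static double a[N];

        #pragma omp parallel for    /* work shared among threads */
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * i;

        printf("a[%d] = %f\n", N - 1, a[N - 1]);
        return 0;
    }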
Complex developmental systems are constructed whose different parts develop simultaneously, thanks to parallel programming of their respective developmental processes.
In the sciences, it is common to use the so-called "big operator" notation to express the iteration of a binary operator (the reducer) over a collection of values. Such a notation typically assumes that the ...
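As a hedged illustration of the big-operator idea, the sketch below folds + over a collection using OpenMP's reduction clause; the parallelization is valid precisely because the reducer is associative, which is the kind of assumption such notations typically make. The array contents are invented for the example (compile with gcc -fopenmp):

    #include <stdio.h>

    int main(void) {
        enum { N = 1000 };
        double x[N], sum = 0.0;
        for (int i = 0; i < N; i++) x[i] = i + 1;

        /* Valid in parallel only because + is associative. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += x[i];

        printf("sum = %g\n", sum);   /* N*(N+1)/2 = 500500 */
        return 0;
    }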
Array algorithms where operations are applied to disjoint parts of an array lend themselves well to parallelism, since parallel threads can operate on the parts of the array without synchronisation. However, implement...
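A hedged sketch of the pattern this abstract describes, with invented array contents: each thread computes a half-open slice [lo, hi) of the array, and because the slices are pairwise disjoint, no synchronization is needed (compile with gcc -fopenmp):

    #include <stdio.h>
    #include <omp.h>

    /* Touches only the caller's slice of the array. */
    static void scale(double *a, int lo, int hi, double k) {
        for (int i = lo; i < hi; i++)
            a[i] *= k;
    }

    int main(void) {
        enum { N = 8 };
        double a[N] = {1, 2, 3, 4, 5, 6, 7, 8};

        #pragma omp parallel
        {
            int t  = omp_get_thread_num();
            int p  = omp_get_num_threads();
            int lo = (int)((long long)N * t / p);
            int hi = (int)((long long)N * (t + 1) / p);
            scale(a, lo, hi, 10.0);   /* slices are pairwise disjoint */
        }

        for (int i = 0; i < N; i++) printf("%g ", a[i]);
        printf("\n");
        return 0;
    }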
We show that program synthesis can generate GPU algorithms as well as their optimized implementations. Using the scan kernel as a case study, we describe our evolving synthesis techniques. Relying on our synthesizer, ...
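The paper's synthesized kernels are not shown here; as a hedged illustration of the scan primitive itself, below is a Blelloch-style work-efficient exclusive scan written as plain C for a power-of-two input. On a GPU, each level of the two loops would run as one parallel step:

    #include <stdio.h>

    static void exclusive_scan(int *a, int n) {          /* n must be 2^k */
        for (int d = 1; d < n; d <<= 1)                  /* up-sweep */
            for (int i = 2 * d - 1; i < n; i += 2 * d)
                a[i] += a[i - d];
        a[n - 1] = 0;
        for (int d = n / 2; d >= 1; d >>= 1)             /* down-sweep */
            for (int i = 2 * d - 1; i < n; i += 2 * d) {
                int t = a[i - d];
                a[i - d] = a[i];
                a[i] += t;
            }
    }

    int main(void) {
        int a[8] = {3, 1, 7, 0, 4, 1, 6, 3};
        exclusive_scan(a, 8);
        for (int i = 0; i < 8; i++) printf("%d ", a[i]); /* 0 3 4 11 11 15 16 22 */
        printf("\n");
        return 0;
    }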
Context: Writing software for the current generation of parallel systems requires significant programmer effort, and the community is seeking alternatives that reduce effort while still achieving good performance. Objective: Measure the effect of parallel programming models (message-passing vs. PRAM-like) on programmer effort. Design, setting, and subjects: One group of subjects implemented sparse-matrix dense-vector multiplication using message-passing (MPI), and a second group solved the same problem using a PRAM-like model (XMTC). The subjects were students in two graduate-level classes: one class was taught MPI and the other was taught XMTC. Main outcome measures: Development time, program correctness. Results: Mean XMTC development time was 4.8 h less than mean MPI development time (95% confidence interval, 2.0-7.7), a 46% reduction. XMTC programs were more likely to be correct, but the difference in correctness rates was not statistically significant (p = .16). Conclusions: XMTC solutions for this particular problem required less effort than MPI equivalents, but further studies are necessary which examine different types of problems and different levels of programmer experience. (C) 2008 Elsevier Inc. All rights reserved.
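A hedged sketch of the study's task in the MPI model: sparse-matrix dense-vector multiplication with rows block-distributed in CSR form and the dense vector replicated on every rank. The tiny matrix is invented for illustration, and every rank reuses the same local block rather than reading its own:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Each rank holds only its own rows; this 2-row CSR block
         * is illustrative data, not from the study. */
        int    rowptr[] = {0, 2, 3};          /* 2 local rows, 3 nonzeros */
        int    colidx[] = {0, 2, 1};
        double val[]    = {4.0, -1.0, 2.5};
        double x[]      = {1.0, 2.0, 3.0};    /* replicated dense vector */
        double y[2];

        for (int i = 0; i < 2; i++) {         /* local SpMV, no communication */
            y[i] = 0.0;
            for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
                y[i] += val[k] * x[colidx[k]];
        }
        printf("rank %d: y = (%g, %g)\n", rank, y[0], y[1]);

        MPI_Finalize();
        return 0;
    }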
The article describes the process of computing the Z-transform neural network on the basis of the input and output signals of the analyzed object. Parallel algorithms for performing these calculations are presented, and different parallel architectures with different numbers of processors are analyzed, showing their advantages and limitations. (C) 2019 Elsevier B.V. All rights reserved.
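A hedged sketch of the kind of calculation involved, assuming the standard Z-transform X(z) = sum_n x[n] z^(-n) evaluated at several points on the unit circle, with the points divided among threads; the signal values are invented (compile with gcc -fopenmp):

    #include <stdio.h>
    #include <complex.h>
    #include <math.h>

    int main(void) {
        enum { N = 4, M = 3 };
        double x[N] = {1.0, 0.5, 0.25, 0.125};   /* illustrative signal */
        double complex X[M];

        #pragma omp parallel for                 /* one z point per iteration */
        for (int m = 0; m < M; m++) {
            double complex z = cexp(I * 2.0 * M_PI * m / M);  /* unit circle */
            X[m] = 0.0;
            for (int n = 0; n < N; n++)
                X[m] += x[n] * cpow(z, -n);
        }

        for (int m = 0; m < M; m++)
            printf("X[%d] = %g %+gi\n", m, creal(X[m]), cimag(X[m]));
        return 0;
    }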
ISBN: (Print) 1595936025
Moore's Law will continue to increase the number of transistors on die for a couple of decades, as silicon technology moves from 65nm today to 45nm, 32nm, and 22nm in the future. Since power and thermal constraints increase with frequency, multi-core or many-core microprocessors will be the way of the future. In the near future, hardware platforms will have sixteen or more cores on die to achieve more than one Tera Instructions Per Second (TIPS) of computation power. These cores will communicate with each other through an on-die interconnect fabric with more than one TB/s of on-die bandwidth and less than 30 cycles of latency. Off-die D-cache will employ 3D stacked memory technology to tremendously increase off-die cache/memory bandwidth and reduce latency. Fast copper flex cables will link CPU and DRAM on the socket, and optical silicon photonics will provide up to one Tb/s of I/O bandwidth between boxes. A hardware system with TIPS of compute power operating on terabytes of data makes this a "tera-scale" platform. What are the software implications of the hardware change from uniprocessors to tera-scale platforms with many cores as "the way of the future"? It will be a great challenge for programming environments to help programmers develop concurrent code for most client software. A good concurrent programming environment should extend existing programming languages that typical programmers are familiar with, and bring benefits for concurrent programming. There are many research topics. Example topics include flexible parallel programming models based on the needs of applications, better synchronization mechanisms such as Transactional Memory to replace the simple "Thread + Lock" structure, nested data-parallel language primitives with new protocols, fine-grained synchronization mechanisms with hardware support, perhaps fine-grained message passing, advanced compiler optimizations for threaded code, and software tools in the concurrent programming environment. A more interesting problem...
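As a hedged illustration of the "Transactional Memory instead of Thread + Lock" direction this abstract mentions, GCC's experimental -fgnu-tm extension lets a critical update be written as an atomic transaction; this is a sketch of the concept, not any API from the talk (compile with gcc -fgnu-tm):

    #include <stdio.h>

    static long counter;

    void deposit(long amount) {
        __transaction_atomic {        /* runs atomically, no explicit lock */
            counter += amount;
        }
    }

    int main(void) {
        deposit(10);
        deposit(32);
        printf("counter = %ld\n", counter);
        return 0;
    }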
Orléans Skeleton Library (OSL) is a library of parallel algorithmic skeletons in C++ on top of MPI. It provides a structured approach to parallel programming. Skeletons in OSL are based on the bulk synchro...
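OSL's actual API is not reproduced here; as a hedged, generic illustration of the algorithmic-skeleton idea, a "map" skeleton in C applies a user-supplied function to every element while hiding the (potentially parallel) iteration pattern from the caller (compile with gcc -fopenmp):

    #include <stdio.h>

    typedef double (*unary_fn)(double);

    /* The skeleton owns the iteration pattern; callers never write
     * the parallel loop themselves. NOT OSL's actual C++/MPI API. */
    static void map_skeleton(unary_fn f, double *a, int n) {
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            a[i] = f(a[i]);
    }

    static double square(double v) { return v * v; }

    int main(void) {
        double a[5] = {1, 2, 3, 4, 5};
        map_skeleton(square, a, 5);
        for (int i = 0; i < 5; i++) printf("%g ", a[i]);
        printf("\n");
        return 0;
    }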
In this paper we present the Orchid system, a new portable and scalable platform for parallel programming, suitable for any type of distributed-memory architecture. It includes C libraries that facilitate dynamic process allocation, asynchronous process communication, and global process synchronization. It also integrates a set of flexible mechanisms for the implementation of a wide variety of Distributed Shared Memory (DSM) paradigms. As an example, two different DSM paradigms are proposed. Moreover, a new polyparametric model is suggested, which can be used in the performance evaluation of any DSM paradigm. Orchid has been successfully used for the development of a large-scale application: an environment for parallel logic programming based on attribute grammars.
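Orchid's primitives are system-specific and not reproduced here; as a hedged analogue, the MPI sketch below shows two of the three facilities the abstract lists, asynchronous process communication (MPI_Isend/MPI_Irecv) and global process synchronization (MPI_Barrier); dynamic process creation would correspond to MPI_Comm_spawn:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size, token = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (size >= 2) {
            MPI_Request req;
            if (rank == 0) {
                token = 42;
                /* asynchronous send: returns before delivery completes */
                MPI_Isend(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
                MPI_Wait(&req, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Irecv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
                MPI_Wait(&req, MPI_STATUS_IGNORE);
                printf("rank 1 received %d\n", token);
            }
        }
        MPI_Barrier(MPI_COMM_WORLD);   /* global process synchronization */
        MPI_Finalize();
        return 0;
    }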