A visit to the neighborhood PC retail store provides ample proof that we are in the multi-core era. The key differentiator among manufacturers today is the number of cores that they pack onto a single chip. The clock frequency of commodity processors has reached its limit, however, and is likely to stay below 4 GHz for years to come. As a result, adding cores is not synonymous with increasing computational power. To take full advantage of the performance enhancements offered by the new multi-core hardware, a corresponding shift must take place in the software infrastructure - a shift to parallel computing.
The MultiFlex system is an application-to-platform mapping tool that integrates heterogeneous parallel components - H/W or S/W - into a homogeneous platform programming environment. This leads to higher quality designs through encapsulation and abstraction. Two high-level parallel programming models are supported by the following MultiFlex platform mapping tools: a distributed system object component (DSOC) object-oriented message passing model and a symmetrical multiprocessing (SMP) model using shared memory. We demonstrate the combined use of the MultiFlex multiprocessor mapping tools, supported by high-speed hardware-assisted messaging, context-switching, and dynamic scheduling using the StepNP demonstrator multiprocessor system-on-chip platform, for two representative applications: 1) an Internet traffic management application running at 2.5 Gb/s and 2) an MPEG4 video encoder (VGA resolution, at 30 frames/s). For these applications, a combination of the DSOC and SMP programming models was used in an interoperable fashion. After optimization and mapping, processor utilization rates of 85%-91% were demonstrated for the traffic manager. For the MPEG4 encoder, the average processor utilization was 88%.
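The two models the abstract contrasts can be sketched in miniature. The following Python analogue is illustrative only (it is not the MultiFlex API): the same parallel sum is written once in a message-passing style, as in DSOC, and once in a shared-memory style, as in SMP.

```python
import threading, queue

# --- Message-passing style (DSOC-like): workers communicate via queues ---
def mp_sum(values, n_workers=2):
    tasks, results = queue.Queue(), queue.Queue()
    def worker():
        while True:
            chunk = tasks.get()
            if chunk is None:          # poison pill: shut the worker down
                break
            results.put(sum(chunk))
    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads: t.start()
    step = max(1, len(values) // n_workers)
    chunks = [values[i:i+step] for i in range(0, len(values), step)]
    for c in chunks: tasks.put(c)
    for _ in threads: tasks.put(None)
    for t in threads: t.join()
    return sum(results.get() for _ in chunks)

# --- Shared-memory style (SMP-like): workers update one location under a lock ---
def smp_sum(values, n_workers=2):
    total, lock = [0], threading.Lock()
    def worker(chunk):
        local = sum(chunk)             # compute privately ...
        with lock:                     # ... then publish under the lock
            total[0] += local
    step = max(1, len(values) // n_workers)
    threads = [threading.Thread(target=worker, args=(values[i:i+step],))
               for i in range(0, len(values), step)]
    for t in threads: t.start()
    for t in threads: t.join()
    return total[0]

print(mp_sum(list(range(100))), smp_sum(list(range(100))))  # 4950 4950
```

In MultiFlex the interest is that both styles interoperate in one application; here they simply coexist as two functions.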
Parallel programming of high-performance computers has emerged as a key technology for the numerical solution of large-scale problems arising in computational science and engineering (CSE). The authors believe that principles and techniques of parallel programming are among the essential ingredients of any CSE curriculum as well as any computer science curriculum. Today, opinions on the role and importance of parallel programming are diverse. Rather than seeing it as a marginally beneficial skill optionally taught at the graduate level, we understand parallel programming as a crucial basic skill that should be taught as an integral part of the undergraduate computer science curriculum. A practical training course developed for computer science undergraduates at Aachen University is described. Its goal is to introduce young computer science students to different parallel programming paradigms for shared and distributed memory computers as well as to give a first exposition to the field of computational science by simple, yet carefully chosen sample problems. (C) 2003 Elsevier B.V. All rights reserved.
ISBN (print): 9781424423712
High-throughput distributed data analysis based on clustered computing is gaining increasing importance in the field of computational biology. This paper describes a parallel programming approach and its software implementation using the Message Passing Interface (MPI) to parallelize a computationally intensive algorithm for identifying cellular contexts. We report a successful implementation on a 1,024-processor Beowulf cluster to analyze microarray data consisting of hundreds of thousands of measurements from different datasets. Detailed performance evaluation shows that data analysis that could have taken months on a stand-alone computer was accomplished in less than a day.
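The paper's scatter-compute-gather pattern can be sketched as follows. This is illustrative only: the authors use MPI on a cluster, while here Python's multiprocessing stands in for MPI's scatter/gather, and score() is a hypothetical placeholder for the computationally intensive cellular-context kernel.

```python
from multiprocessing import Pool

def score(probe_row):
    # stand-in for the per-measurement kernel (hypothetical)
    return sum(x * x for x in probe_row)

def analyze(dataset, n_procs=4):
    with Pool(n_procs) as pool:            # "scatter" rows across workers
        scores = pool.map(score, dataset)  # compute in parallel
    return max(scores)                     # "gather" and reduce on the master

if __name__ == "__main__":
    data = [[i, i + 1, i + 2] for i in range(1000)]
    print(analyze(data))
```

The same shape maps directly onto `MPI_Scatter`, a local compute loop, and `MPI_Reduce` when moving from one machine to a cluster.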
This paper proposes a parallel programming scheme for the cross-point array with resistive random access memory (RRAM). Synaptic plasticity in unsupervised learning is realized by tuning the conductance of each RRAM cell. Inspired by spike-timing-dependent plasticity (STDP), the programming strength is encoded into the spike firing rate (i.e., pulse frequency) and the overlap time (i.e., duty cycle) of the pre-synaptic node and post-synaptic node, and simultaneously applied to all RRAM cells in the cross-point array. Such an approach achieves parallel programming of the entire RRAM array, requiring only local information from the pre-synaptic and post-synaptic nodes of each RRAM cell. As demonstrated by digital peripheral circuits implemented in 65 nm CMOS, the programming time of a 40 kb RRAM array is 84 ns, a 900X speedup compared with a state-of-the-art software approach to sparse coding in image feature extraction.
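The locality property can be illustrated with a toy numerical model (not the paper's circuit): assume pulses are aligned at the start of each period, and take each cell's conductance update to be proportional to the overlap time of its pre- and post-synaptic pulse trains. Every cell then updates in the same step using only its own row and column signals.

```python
def overlap_time(duty_pre, duty_post, period=1.0):
    # with aligned pulses, the overlap is the smaller of the two on-times
    return min(duty_pre, duty_post) * period

def program_array(pre_duties, post_duties, g, eta=0.1):
    # one parallel programming step: every cell (i, j) updates at once,
    # using only its local pre (row) and post (column) signals
    for i, dp in enumerate(pre_duties):
        for j, dq in enumerate(post_duties):
            g[i][j] += eta * overlap_time(dp, dq)
    return g

g = [[0.0] * 2 for _ in range(2)]
g = program_array([0.5, 1.0], [0.2, 0.8], g)
print([[round(x, 3) for x in row] for row in g])
```

The update rule and the learning rate `eta` are assumptions for illustration; in the paper the strength is realized physically through pulse frequency and duty cycle rather than an explicit multiply-accumulate.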
ISBN (print): 9781605587349
We present Chorus, a high-level parallel programming model suitable for irregular, heap-manipulating applications like mesh refinement and epidemic simulations, and JChorus, an implementation of the model on top of Java. One goal of Chorus is to express the dynamic and instance-dependent patterns of memory access that are common in typical irregular applications. Its other focus is locality of effects: the property that in many of the same applications, typical imperative commands only affect small, local regions in the shared heap. Chorus addresses dynamism and locality through the unifying abstraction of an object assembly: a local region in a shared data structure equipped with a short-lived, speculative thread of control. The thread of control in an assembly can only access objects within the assembly. While objects can migrate from assembly to assembly, such migration is local - i.e., objects only move from one assembly to a neighboring one - and does not lead to aliasing. Programming primitives include a merge operation, by which an assembly merges with an adjacent assembly, and a split operation, which splits an assembly into smaller ones. Our abstractions are race- and deadlock-free, and inherently data-centric. We demonstrate that Chorus and JChorus allow natural programming of several important applications exhibiting irregular data-parallelism. We also present an implementation of JChorus based on a many-to-one mapping of assemblies to lower-level threads, and report on preliminary performance numbers.
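The assembly abstraction can be sketched in a few lines. This is a hypothetical rendering, not the JChorus API: the heap is partitioned into disjoint assemblies, an assembly's code touches only its own objects, and merge/split change the partition locally.

```python
class Assembly:
    def __init__(self, objects):
        self.objects = set(objects)    # disjoint from every other assembly

    def merge(self, neighbor):
        # merge with an adjacent assembly: union the two object sets
        return Assembly(self.objects | neighbor.objects)

    def split(self, part):
        # split off a subset of our objects into a new assembly
        assert part <= self.objects
        self.objects -= part
        return Assembly(part)

a = Assembly({1, 2})
b = Assembly({3})
ab = a.merge(b)                 # grow to cover work that spans both regions
c = ab.split({2})               # shrink back to small local regions
print(sorted(ab.objects), sorted(c.objects))  # [1, 3] [2]
```

The real model attaches a speculative thread of control to each assembly and restricts merges to neighbors in the data structure; those parts are omitted here.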
ISBN (print): 9783642131356
A soft-core system allows designers to conveniently modify the components of the architecture they design. In some systems, a uni-core processor cannot provide enough computing power for applications that demand a large amount of computation. To improve the performance of a multi-core system, parallel programming is as important an issue as the hardware architecture design. Current parallelizing compilers have difficulty parallelizing programs effectively, so the programmer must decide from the start how to allot tasks to each processor. In this paper, we present a software framework for designing parallel programs. The proposed framework provides a convenient parallel programming environment for programmers developing a multi-core system's software. The experiments show that the proposed framework can parallelize programs effectively using the provided functions.
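The up-front allotment the abstract describes can be sketched as a static schedule: tasks are assigned to cores before execution (round-robin here) instead of relying on a parallelizing compiler or a runtime scheduler. The names are illustrative, not the paper's framework API.

```python
import threading

def run_static(tasks, n_cores=2):
    results = [None] * len(tasks)
    plan = [[] for _ in range(n_cores)]
    for i, t in enumerate(tasks):          # programmer-chosen allotment
        plan[i % n_cores].append((i, t))
    def core(my_tasks):
        for i, t in my_tasks:
            results[i] = t()               # each core runs only its share
    threads = [threading.Thread(target=core, args=(p,)) for p in plan]
    for th in threads: th.start()
    for th in threads: th.join()
    return results

print(run_static([lambda: 1, lambda: 2, lambda: 3]))  # [1, 2, 3]
```

A static plan trades load balance for predictability, which suits a soft-core design where the number of processors is fixed at synthesis time.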
ISBN (print): 9781450359337
SyDPaCC is a set of libraries for the Coq interactive theorem prover. It allows the development of correct functional parallel programs on distributed lists through the transformation of naive sequential programs that are considered as specifications. To offer the parallelization of functions on other data structures, the first step is to implement a parallel version of the considered data structure and to provide parallel implementations of the primitive functions manipulating it. This paper presents such a first step: a binary tree extension which includes new map and reduce pure functional algorithmic skeletons for binary trees. Such algorithmic skeletons are templates of parallel algorithms, realized in a functional context as higher-order functions implemented in parallel. The use of these new primitives is illustrated on example applications.
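A sequential model of the two skeletons gives their specification. In SyDPaCC these are verified higher-order functions with parallel implementations extracted from Coq; the tree type and names below are illustrative Python, not the library's API.

```python
class Leaf:
    def __init__(self, v): self.v = v

class Node:
    def __init__(self, v, l, r): self.v, self.l, self.r = v, l, r

def tree_map(f, t):
    # apply f to every value, preserving the tree's shape
    if isinstance(t, Leaf):
        return Leaf(f(t.v))
    return Node(f(t.v), tree_map(f, t.l), tree_map(f, t.r))

def tree_reduce(op, t):
    # combine a node's value with the reductions of its subtrees;
    # op must be associative for a parallel implementation to be valid
    if isinstance(t, Leaf):
        return t.v
    return op(t.v, op(tree_reduce(op, t.l), tree_reduce(op, t.r)))

t = Node(1, Leaf(2), Node(3, Leaf(4), Leaf(5)))
doubled = tree_map(lambda x: 2 * x, t)
print(tree_reduce(lambda a, b: a + b, doubled))  # 30
```

A parallel implementation evaluates independent subtrees on different processors; the associativity requirement on `op` is what licenses regrouping the reductions.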