ISBN: (Print) 9781424465330
The Pilot library offers a new method for programming parallel clusters in C. Formal elements from Communicating Sequential Processes (CSP) were used to realize a process/channel model of parallel computation that reduces opportunities for deadlock and other communication errors. This simple model, plus an application programming interface (API) fashioned on C's formatted I/O, is designed to make the library easy for novice scientific C programmers to learn. Optional runtime services, including deadlock detection, help the programmer debug communication issues. Pilot forms a thin layer on top of the standard Message Passing Interface (MPI), preserving the latter's portability and efficiency with little performance impact. MPI's powerful collective operations can still be accessed within the conceptual model.
Although Java was not specifically designed for the computationally intensive numeric applications that are the typical fodder of highly parallel machines, its widespread popularity and portability make it an interesting candidate vehicle for massively parallel programming. With the advent of high-performance optimizing Java compilers, the open question is: how can Java programs best exploit massive parallelism? The authors have been contemplating this question via libraries of Java routines for specifying and coordinating parallel codes. It would be most desirable to have these routines written in 100%-Pure Java; however, a more expedient solution is to provide Java wrappers (stubs) to existing parallel coordination libraries such as MPI. MPI is an attractive alternative because, like Java, it is portable. We discuss both approaches here. In undertaking this study, we have also identified some minor modifications of the current language specification that would make 100%-Pure Java parallel programming more natural.
Parallel programming is difficult. The need for correct and efficient parallel programs is important, and one way to meet this requirement is to work along the refinement chain. Beginning with a specification written in TLA+ (for instance), we can transform it, or refine it, into finer-grained specifications. At some step, enough structure will have appeared that we can bridge the gap and fill in this structure. We introduce a more concrete version of TLA+, CTLA, in which structuring concerns can be expressed but where distribution, mapping, and implementation problems are avoided. Indeed, we firmly believe that it is a mistake to go immediately from TLA+ to a real language like CC++, since the gap is still too wide. A numerical example supports our claim.
ISBN: (Print) 9781479913725
We have developed two new approaches to teaching parallel computing to undergraduates using higher-level tools that lead to ease of programming, good software design, and scalable programs. The first approach uses a new software environment that creates a higher level of abstraction for parallel and distributed programming based upon a pattern-programming approach. The second approach uses compiler directives to describe how a program should be parallelized. We have studied whether using the above tools helps students better grasp the concepts of parallel computing, across the two campuses of the University of North Carolina Wilmington and the University of North Carolina Charlotte, using a televideo network. We also taught MPI and OpenMP in the traditional fashion so that we could ask the students to compare and contrast the approaches. An external evaluator conducted three surveys during the semester and analyzed the data. In this paper, we discuss the techniques we used, the assignments we gave the students, and the results of what we learned.
The multicore revolution is now happening on both desktop and server systems and is expected to soon enter the embedded space. For the last decades, hardware manufacturers have been able to deliver more powerful CPUs through higher clock speeds and advanced memory systems. However, the frequency is no longer increasing; instead, the number of cores on each CPU is. Software development for embedded uniprocessor systems is completely dominated by imperative-style programming, deeply rooted in C and in the scheduling of threads and processes. We believe that the multicore challenge requires new methodologies and new tools to make efficient use of the hardware. Dataflow programming, which has received considerable attention over the years, is a promising candidate for the design and implementation of certain classes of applications, such as complex media coding, network processing, imaging and digital signal processing, and embedded control, on parallel hardware. This talk discusses current problem areas within the embedded domain and presents the Open Dataflow framework. Traditionally, very little work has been done on real-time analysis and design of dataflow systems. The difficulties involved, which relate to the high level of dynamicity, are discussed, and some research ideas are presented.
Due to the huge computing resources the grid can provide, researchers have used the grid to run very large-scale applications over large numbers of computing and I/O nodes. However, since the computing nodes in a grid are spread geographically over a wide area, communication latency varies significantly between nodes. Thus, running existing parallel applications over the whole grid can result in worse performance even with a larger number of computing nodes. Hence, in the grid environment, parallel applications usually still run on a single cluster. It is expected that the emerging lambda network technology can be used for the backbone networks of grids and improve communication performance between computing nodes. In this paper, we show the potential benefit of the lambda network for parallel applications in a grid environment. Our measurement results reveal that the NAS parallel benchmarks over a lambda grid can achieve more than 50% higher performance than in the single-cluster case. In addition, the results show that parallel programming libraries such as MPI still need to be improved with respect to tolerance of network delay and topology awareness.
Specifying dataflow applications efficiently is one of the greatest challenges facing Network-on-Chip (NoC) simulation and exploration. BTS (Behavior-level Traffic Simulation) was proposed to specify behavior-level applications more efficiently than the conventional message-passing programming model does. To alleviate the complexity of parallel programming, BTS has the computation tasks implemented as sequential modules with data shared among them. Parameterization was also proposed in BTS to produce pseudo messages pointing to the shared data and to support data-driven scheduling. As substitutes for conventional parallel applications, BTS-based ones inherit their computation models and the underlying scheduling schemes. The pseudo messages are consistent in function and size with those of their conventional counterparts, so BTS-based applications and conventional ones produce identical traffic and identical results for NoC simulation. Case studies showed that BTS can speed up application specification by reusing existing sequential code, especially domain-specific languages implemented as libraries of sequential subroutines.
This paper addresses the issues of programming a multi-level parallel computer. The computer has an architecture that combines multiple levels of parallelism for efficient computation. To exploit the full potential of this architecture, special features are added to its programming language along with special functions in its library. The base language is similar to OpenCL. We keep the original OpenCL hierarchical (global, local, and private) memory organization while extending OpenCL with features and library functions for message passing and remote function calls. We also add short vector types and operations that are frequently used in graphics and image processing. These features and library functions facilitate effective parallel programming using a combination of multiple levels of parallelism.
This paper is devoted to research on bitmap image processing based on wavelet functions. The Daubechies wavelet was used as the mathematical model for filtering, compression, and smoothing of two-dimensional signals, because analysis of existing wavelet functions showed that the Daubechies family is the most effective for image processing. OpenMP parallel programming in C/C++ was used to parallelize the computing processes in these image-processing problems.
Parallel programming has to date remained inaccessible to the average scientific programmer. Parallel programming languages are generally foreign to most scientific applications programmers, who speak only Fortran. Automatic parallelization techniques have so far proved unsuccessful at extracting large amounts of parallelism from sequential codes and do not encourage the development of new, inherently parallel algorithms. In addition, there is a lack of consistency in the programmer interface across architectures, which requires programmers to invest a lot of effort in porting code from one parallel machine to another. This paper discusses the object-oriented Fortran language and support routines developed at Mississippi State in support of parallelizing complex field simulations. This interface is based on Fortran to ease its acceptance by scientific programmers and is implemented on top of the Unix operating system for portability.