The authors describe the design and implementation of C40PVM, a PVM runtime environment for TMS320C40 networks. With our C40PVM runtime environment, parallel applications can be developed easily on C40 systems and ported to other parallel computing platforms. The performance of our runtime environment is also analyzed using a DSP application for vector quantization.
Cloud computing has gained significant traction in recent years. The Map-Reduce framework is currently the most dominant programming model in cloud computing settings. In this paper, we describe Granules, a lightweight, streaming-based runtime for cloud computing which incorporates support for the Map-Reduce framework. Granules provides rich lifecycle support for developing scientific applications with support for iterative, periodic and data driven semantics for individual computations and pipelines. We describe our support for variants of the Map-Reduce framework. The paper presents a survey of related work in this area. Finally, this paper describes our performance evaluation of various aspects of the system, including (where possible) comparisons with other comparable systems.
Irregular applications, which rely on pointer-based data structures, are often difficult to parallelize. The input-dependent nature of their execution means that traditional parallelization techniques are unable to exploit any latent parallelism in these algorithms. Instead, we turn to optimistic parallelism, where regions of code are speculatively run in parallel while runtime mechanisms ensure proper execution. The performance of such optimistically parallelized algorithms is often dependent on the schedule for parallel execution; improper choices can prevent successful parallel execution. We demonstrate this through the motivating example of Delaunay mesh refinement, an irregular algorithm, which we have parallelized optimistically using the Galois system. We apply several scheduling policies to this algorithm and investigate their performance, showing that careful consideration of scheduling is necessary to maximize parallel performance.
In this paper, we present "rules of thumb" for the efficient and straightforward parallelization of cellular neural networks (CNNs) processing image data on cluster architectures. The rules result from the application and optimization of the simple but effective structural data parallel approach, which is based on the SPMD model. Digital gray-scale images were used to evaluate the optimized parallel cellular neural network program. The process of parallelizing the algorithm employs HPF to generate an MPI-based program.
Poor single event upset (SEU) and single event latchup (SEL) immunity is a major concern in the high-speed RF phase-locked loops (PLLs) incorporated in many current commercial satellites. As a result, greater demands are placed at the system level to compensate for this. These include reloading programming every clock cycle, parallel interfaces and redundancy, which result in increased size, weight, complexity and power. We present in this paper a 1.1 GHz integer-N PLL which is inherently SEL immune, has SEU rates less than 10⁻⁹ errors/bit-day (orders of magnitude better than currently available), excellent phase noise performance and standby current up to 100 krad(Si) total dose. This part is currently being manufactured on Peregrine Semiconductor's 0.8 μm ultra-thin silicon-on-sapphire UTSi® process.
ISBN (digital): 9781665422963
ISBN (print): 9781665404495
Peachy Parallel Assignments are high-quality assignments for teaching parallel and distributed computing. They are selected competitively for presentation at the Edu* workshops. All of the assignments have been successfully used in class, and they are selected based on their ease of adoption by other instructors and for being cool and inspirational to students. This paper presents a paper-and-pencil assignment asking students to analyze the performance of different system configurations and an assignment in which students parallelize a simulation of the evolution of simple living organisms.
Pervasive Grid Computing Platforms include centralized computing nodes (e.g., parallel servers) as well as decentralized and mobile devices. Pervasive Grid applications include data- and computing-intensive components which can also be mapped onto decentralized and mobile nodes. The practical success of this mapping also depends on deriving application configurations that account for the limited memory capabilities of those resources. In this paper we target this issue by showing how we can study and configure the memory requirements of an Emergency Management application. We present our solutions by using the ASSISTANT programming model for Pervasive Grid applications.
ISBN (print): 9780769520261
Typical grid computing scenarios involve many distributed hardware and software components. The more components that are involved, the more likely it is that one of them may fail. In order for grid computing to succeed, there must be a simple mechanism to determine which component failed and why. Instrumentation of all grid applications and middleware is an important part of the solution to this problem. However, it must be possible to control and adapt the amount of instrumentation data produced in order to not be flooded by this data. We describe a scalable, high-performance instrumentation activation mechanism that addresses this problem.
ISBN (print): 9781450310284
This tutorial provides an opportunity to experiment with a new language designed to support the safe, secure, and productive development of parallel programs. ParaSail is a new language with pervasive parallelism coupled with extensive compile-time checking of annotations in the form of assertions, preconditions, postconditions, etc. ParaSail does all checking at compile time, and eliminates race conditions, null dereferences, uninitialized data access, numeric overflow, out of bounds indexing, etc. as well as statically checking the truth of all user-written assertions. After a short introduction to the language, attendees will receive a prototype ParaSail compiler and an accompanying ParaSail Virtual Machine interpreter for writing and testing ParaSail programs. The tutorial/workshop will finish with a group discussion and feedback on the experience of using this new language.
In this paper, we analyze the performance of the floating point digital signal processor (DSP) TMS320C6711 for an implementation of motion estimation for video coding. Two relevant motion estimation techniques were implemented: BMA (block matching algorithm) and BMGT (block matching using geometric transforms). These have been combined with fast block matching algorithms to speed up the process. In order to increase the DSP performance, we have optimized some programming mechanisms, such as the level of code parallelism, hand-designed assembly code and an efficient usage of internal memory as cache. This implementation has shown that real-time motion estimation of BMA type can be implemented in this DSP. However, BMGT-type motion estimation cannot be done by one DSP alone in real-time applications, due to its high computational complexity.
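For readers unfamiliar with block matching, the following is a minimal sketch of an exhaustive (full-search) BMA. It is not the paper's optimized DSP implementation: the abstract does not state the matching criterion or search strategy, so the sum of absolute differences (SAD) and the exhaustive search window are assumptions here, and the function names `sad` and `full_search` are hypothetical.

```python
# Illustrative sketch only. Frames are 2-D lists of gray-level integers.
# The SAD criterion and exhaustive search are assumptions, not taken from the paper.

def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in
    `cur` and the block displaced by (dx, dy) in `ref`."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search):
    """Return (dx, dy, cost) with the lowest SAD over all displacements
    within +/- `search` pixels; candidates outside `ref` are skipped."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue  # candidate block would fall outside the frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

The cost of this search grows with the square of the window size times the block area, which is why fast block matching algorithms, like those the abstract mentions, evaluate only a subset of candidate displacements.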