The COSMO-SkyMed constellation will acquire data from its four SAR satellites in several imaging modes and will generate focused data products. Because images will be acquired at fine geometric resolution and will cover medium-sized swaths, the SAR processing involved is well suited to a parallel programming implementation.
Object-oriented programming, design patterns, and frameworks are common techniques that have been used to reduce the complexity of sequential programming. We have applied these techniques to the more difficult domain of parallel programming. This paper describes CO2P3S, a pattern-based parallel programming system that generates parallel programs from parallel design patterns. We demonstrate CO2P3S by applying a new design pattern called the Wavefront pattern to three problems. We show that it is quick and easy to use CO2P3S to generate structurally correct parallel programs with good speed-ups on shared-memory computers.
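As an illustration only, the dependency structure that a wavefront computation exploits can be sketched as follows. This is not code generated by CO2P3S, and the abstract does not name its three problems; the longest-common-subsequence table and the function name `lcs_length_wavefront` are assumptions chosen for the sketch. Each cell depends on its upper, left, and upper-left neighbours, so all cells on one anti-diagonal are mutually independent and could be computed in parallel.

```python
def lcs_length_wavefront(a: str, b: str) -> int:
    """Length of the longest common subsequence, filled in wavefront order.

    Hypothetical example problem: cell (i, j) depends only on (i-1, j),
    (i, j-1) and (i-1, j-1), all of which lie on earlier anti-diagonals.
    """
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]

    # d = i + j indexes one anti-diagonal (one "wave") of the table.
    for d in range(2, n + m + 1):
        lo = max(1, d - m)   # smallest valid row on this diagonal
        hi = min(n, d - 1)   # largest valid row on this diagonal
        # The iterations of this inner loop are independent of each other;
        # a parallel runtime could hand them to different workers.
        for i in range(lo, hi + 1):
            j = d - i
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]
```

The sketch runs the inner loop sequentially; it only makes the independent units of work explicit.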
The article considers the theoretical foundations and structural solutions of the Galois basis for constructing the elements of microprocessor-based computers. A methodology for implementing vertical information technology is presented.
In this paper, we use tensor product notation as the framework of a programming methodology for designing block recursive algorithms on various computer networks. In our previous work, we proposed a programming methodology for designing block recursive algorithms on shared-memory and distributed-memory multiprocessors without considering the interconnection of processors. We extend that work to consider block recursive algorithms on direct networks and multistage interconnection networks. We use parallel prefix computation as an example to illustrate the methodology. First, we represent the prefix computation problem as a computational matrix, which may not be suitable for deriving algorithms on specific computer networks. In this methodology, we add two steps to derive tensor product formulas of parallel prefix algorithms on computer networks: (1) decompose the computational matrix into two submatrices, and (2) construct an augmented matrix. The augmented matrix can be factorized so that each term is a tensor product formula and fits a specified network topology. With the augmented matrix, the input data is also extended: in addition to the input data, an auxiliary vector is used as temporary storage. The content of the temporary storage is related to the decomposition of the original computational matrix. We present the methodology to derive various parallel prefix algorithms on hypercube, omega, and baseline networks and verify the correctness of the resulting tensor product formulas by induction.
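For orientation, the prefix computation itself can be sketched in two ways. The sketch below is not the paper's tensor product derivation: it assumes that, for prefix sums, the computational matrix is the lower-triangular all-ones matrix, and it uses the standard recursive-doubling scan as a stand-in for a parallel prefix algorithm. The function names are hypothetical.

```python
from operator import add


def prefix_matrix(n):
    """Assumed computational matrix for prefix sums: lower-triangular ones."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def scan_by_matrix(x):
    """Prefix sums as a matrix-vector product y = L x."""
    matrix = prefix_matrix(len(x))
    return [sum(row[j] * x[j] for j in range(len(x))) for row in matrix]


def scan_recursive_doubling(x, op=add):
    """Inclusive scan in about log2(n) rounds.

    Within one round every position reads only the previous round's
    values, so the updates of a round could run in parallel.
    """
    out = list(x)
    step = 1
    while step < len(out):
        prev = list(out)  # snapshot of the previous round
        for i in range(step, len(out)):
            out[i] = op(prev[i - step], prev[i])
        step *= 2
    return out
```

Both functions return the same vector for numeric input; the second one exposes the round structure that a mapping onto a network topology would have to respect.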
ISBN: (print) 0780372522
Concurrent programming may allow the programmer to create fast, responsive systems without the cost overhead of full parallelism. One method of achieving concurrency is to use multiple threads of execution in a single address space. By declaring many threads within the confines of a single process, a programmer can achieve potential parallelism at low overhead. Can a concurrent implementation always outperform a sequential implementation on a uniprocessor? This project implements a sorting algorithm, Bucket Sort, both sequentially and concurrently. The core research investigates the performance factors by varying the number of threads and the problem size. Experimental results are presented for both implementations, and the design considerations of concurrent programming are also studied.
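A minimal sketch of the two variants being compared might look like the following. It is not the project's implementation: the function name `bucket_sort`, the equal-width bucketing rule, and the one-thread-per-bucket scheme are assumptions made for illustration. In CPython the global interpreter lock also means the threaded path shows concurrency rather than a guaranteed speed-up, which is close to the uniprocessor question the abstract raises.

```python
import threading


def bucket_sort(values, num_buckets=4, use_threads=False):
    """Sort numbers by distributing them into equal-width buckets.

    With use_threads=True each bucket is sorted in its own thread
    (hypothetical scheme); otherwise the buckets are sorted one by one.
    """
    if not values:
        return []
    lo, hi = min(values), max(values)
    # Fall back to width 1 when all values are equal (avoids dividing by 0).
    width = (hi - lo) / num_buckets or 1

    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        # Clamp so that the maximum value lands in the last bucket.
        idx = min(int((v - lo) / width), num_buckets - 1)
        buckets[idx].append(v)

    if use_threads:
        threads = [threading.Thread(target=b.sort) for b in buckets]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    else:
        for b in buckets:
            b.sort()

    # Buckets cover increasing value ranges, so concatenation is sorted.
    return [v for b in buckets for v in b]
```

Timing both paths with `time.perf_counter` while varying `num_buckets` and the input size would reproduce the shape of the experiment described, though not its results.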
In this extended abstract we sketch the use of programmable logic to accelerate the simulation of pulsed neural networks. We compare our approach to solutions based on DSPs and digital neuroprocessors. Our solution is a rapid prototyping accelerator board based on a data flow concept. The accelerator provides three module sockets with a rather simple 32-bit interface. The design focuses on maximal data throughput to and from each module. Owing to the architecture, very high parallelism between the modules can be achieved. Two programmable devices on each module are supported by the on-board programming and test unit, which provides in-circuit programming by the host during operation. As a result, the accelerator delivers high performance and flexibility without introducing a complex interface or handling. Any programmable device, such as an FPGA, a CPLD, or a special architecture like Kress-Arrays, may be used on a module of this accelerator board; hence both coarse-grain and fine-grain architectures can be used.
Program performance may be improved by efficiently programming some key sections of the software. We present a methodology for converting selected portions of source code into automatically scalable multithreaded routines, without forcing programmers to concentrate on parallel programming issues. The resulting routines can be reused across various projects, operating systems, and system architectures. To support this methodology, two separate but tightly coupled tools have been developed: the PARSA(TM) software development environment (SDE) and the ThreadMan(TM) thread manager. The SDE addresses programming issues by providing a graphical, object-based approach to developing multithreaded routines that shields users from parallel programming. ThreadMan manages the software developed using the SDE. ThreadMan is a user-level thread manager that automatically spawns and schedules threads at runtime. Two examples have been developed using this methodology to demonstrate that there is virtually no degradation in performance compared to sequential code on a single-processor system, and that scalability is achieved as the number of processors is increased.
As the Internet began its exponential growth into a global information environment, software was often unreliable and slow, and had difficulty interoperating with other systems. Supercomputing node counts also continue to follow high growth trends. Supercomputer and grid resource management software must mature into a reliable computational platform in much the same way that web services matured for the Internet. DOGMA The Next Generation (DOGMA-NG) improves on current resource management approaches by using tested off-the-shelf enterprise technologies to build a robust, scalable, and extensible resource management platform. Distributed web service technologies constitute the core of DOGMA-NG's design and provide fault tolerance and scalability. DOGMA-NG's use of open standard web technologies and efficient management algorithms promises to reduce management time and accommodate the growing size of future supercomputers. The use of web technologies also provides the opportunity for a new parallel programming paradigm, enterprise web services parallel programming, which also benefits from the scalable, robust component architecture.
A parallel programming paradigm dictates the way in which an application is to be expressed. It also restricts the algorithms that may be used in the application. Unfortunately, runtime systems for parallel computing often impose a particular programming paradigm. For a wider choice of algorithms, it is desirable to support more than one paradigm. In this paper we consider SilkRoad II, a variant of the Cilk runtime system for cluster computing. What is unique about SilkRoad II is its memory model, which supports multiple paradigms on top of the underlying software distributed shared memory. The RC-dag memory consistency model of SilkRoad II is introduced. Our experimental results show that the stronger RC-dag can achieve performance comparable to LC of Cilk while supporting a larger set of paradigms with rather good performance.