ISBN: 0818675829 (print)
In this paper we discuss the runtime support required for the parallelization of unstructured data-parallel applications on nonuniform and adaptive environments. The approach presented is reasonably general and is applicable to a wide variety of regular as well as irregular applications. We present performance results for the solution of an unstructured mesh problem on a cluster of heterogeneous workstations.
ISBN: 0818675829 (print)
Dedicated Cluster Parallel Computers (DCPCs) are emerging as low-cost, high-performance environments for many important applications in science and engineering. A significant class of applications that perform well on a DCPC are coarse-grain applications that involve large amounts of file I/O. Current research in parallel file systems for distributed systems is providing a mechanism for adapting these applications to the DCPC environment. We present the Parallel Virtual File System (PVFS), a system that provides disk striping across multiple nodes in a distributed parallel computer and file partitioning among tasks in a parallel program. PVFS is unique among similar systems in that it uses a streams-based approach that represents each file access with a single set of request parameters and decouples the number of network messages from the details of the file's striping and partitioning. PVFS also provides support for efficient collective file accesses and allows overlapping file partitions. We present results of early performance experiments that show PVFS achieves excellent speedups in accessing moderately sized file segments.
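As an illustration of disk striping in general, the following minimal sketch maps a logical file offset to a node and a local offset under a simple round-robin layout. The function names, the round-robin policy, and the parameters `stripe_size` and `num_nodes` are assumptions made for this example; the abstract does not describe the actual PVFS layout or interface.

```python
def locate(offset, stripe_size, num_nodes):
    """Map a logical byte offset to (node index, byte offset in that node's local file).

    Assumes a hypothetical round-robin layout: stripe unit k lives on node k % num_nodes.
    """
    stripe_index = offset // stripe_size
    node = stripe_index % num_nodes
    local_stripe = stripe_index // num_nodes  # stripe units already stored on that node
    return node, local_stripe * stripe_size + offset % stripe_size


def split_request(offset, length, stripe_size, num_nodes):
    """Break one contiguous logical request into (node, local_offset, nbytes) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        node, local = locate(offset, stripe_size, num_nodes)
        # Take whatever remains of the current stripe unit, or of the request.
        chunk = min(stripe_size - offset % stripe_size, end - offset)
        pieces.append((node, local, chunk))
        offset += chunk
    return pieces


# A 100-byte read starting at offset 48, with 64-byte stripe units on 4 nodes.
pieces = split_request(48, 100, stripe_size=64, num_nodes=4)
```

In this sketch a single request descriptor (offset and length) is expanded into per-node pieces on demand, which loosely mirrors the idea of describing an access with one set of parameters rather than one message per stripe unit.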
ISBN: 1864352094 (print)
High-resolution real-time images can be obtained using the Phase Shift Beamforming method, which utilises CORDIC to compute the large number of complex multiplications needed. An optimised word-parallel pipelined CORDIC architecture is introduced with appropriate controllers to form a fast beamforming system. Also described is an analogue front-end for implementing window functions and synchronous complex sampling. The current system is configured to form 2-D images; however, using additional CORDIC processors, a 3-D beamformer can be realised.
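To show why CORDIC suits phase-shift beamforming, the sketch below rotates a complex sample (x + jy) by a phase angle using only the add and scale-by-power-of-two steps of rotation-mode CORDIC. It is a floating-point model written for illustration: the function name, the iteration count, and the final gain correction are assumptions, and it does not model the fixed-point, word-parallel pipelined architecture the paper describes.

```python
import math


def cordic_rotate(x, y, angle, iterations=16):
    """Rotate the vector (x, y) by `angle` radians using CORDIC micro-rotations.

    Valid for |angle| up to about 1.74 rad (the sum of the elementary angles).
    In hardware, the multiplications by 2**-i would be bit shifts.
    """
    # Each micro-rotation stretches the vector; accumulate the total gain.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        shift = 2.0 ** -i
        # Simultaneous update so both lines use the previous x and y.
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)

    return x / gain, y / gain


# Multiplying the sample 3 + 4j by exp(j*pi/2) should give approximately -4 + 3j.
re, im = cordic_rotate(3.0, 4.0, math.pi / 2)
```

Because each iteration needs only shifts, adds, and a small table of arctangent constants, one such stage per iteration maps naturally onto a pipeline.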
Custom computers use SRAM-based Field Programmable Gate Arrays as a co-processor resource in addition to the CPU. This allows algorithm developers to adapt not only the software but also the hardware of the computer on an application-by-application basis. Algorithm speedups of up to hundreds of times compared to standard software have been reported. This paper investigates the applicability of custom computing techniques for video processing applications.
In some real environments, signal source localization can be accomplished more adequately with distributed source models. When the signal sources are distributed over an area, we cannot directly use well-known DOA estimation methods, because these methods are based on the point source assumption. In this paper, we propose a two-dimensional distributed signal source model. Then, we address the estimation of the elevation and azimuth angles of distributed sources based on the proposed model.
ISBN: 0818672552 (print)
This paper presents the 'Planned Direct Transfer' programming model, developed by Mercury Computer Systems to meet the requirements of embedded high-performance computing applications. In this model, data transfers are 'Planned' before they occur, resulting in execution with low software overhead; they are also 'Direct', in that they do not require intermediate data copying. This paper locates the Planned Direct Transfer (PDT) model in the landscape of the standard approaches of Shared Memory and Message Passing.
ISBN: 0818672552 (print)
Blocking locks are commonly used in parallel programs to improve application performance and system throughput. However, most implementations of such locks suffer from two major problems: latency and scalability. In this paper, we propose an implementation of blocking locks using scheduler adaptation, which exploits the interaction between thread schedulers and locks. Through experiments with well-known multiprocessor applications on a KSR2 multiprocessor, we demonstrate how such an implementation considerably reduces the latency and improves the scalability of blocking locks.
ISBN: 1864352094 (print)
This paper investigates the performance and robustness of the space-time integration technique by applying it to two different scenarios: a target moving in a straight line through a sonobuoy field, and a target undertaking a major course change. Although most techniques will track a target moving in a straight line quite successfully, many have difficulty tracking the target as it manoeuvres through a slow turn. The performance of this approach is compared with the more traditional extended Kalman filter algorithm, and the advantages and disadvantages of each technique are highlighted. Results are presented for scenarios using both simulated and sea trial data.
ISBN: 0818675829 (print)
TOP-C is a task-oriented parallel C interface. It presents a master-slave task architecture that greatly eases the parallelization of code. It is intended for applications where a compiler would have difficulty recognizing opportunities for data-parallelism. The model has been implemented for both shared memory processors and networks of workstations. There is also a sequential version, useful during development, which runs the same application code. Ease of use has been a strong motivation behind its design. For this reason, TOP-C is organized in an SPMD style, with one primary subroutine call to invoke it. Its main features are: (a) task-parallelism, (b) a single shared, global data structure, and (c) restricted master-slave communication.
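The three features listed above can be illustrated with a generic master-worker sketch. This is not the TOP-C interface (which is a C library whose calls are not given in the abstract); the function `run_master_worker` and its callback parameters are hypothetical names chosen for this example, and Python threads stand in for the workers.

```python
from concurrent.futures import ThreadPoolExecutor


def run_master_worker(tasks, do_task, update_shared, shared, workers=4):
    """Run tasks in a master-worker style with one shared data structure.

    Workers only read `shared` and return a result to the master; only the
    master modifies `shared`, after all results have come back.
    """
    tasks = list(tasks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Task-parallel phase: each worker receives a task and the shared data.
        results = list(pool.map(lambda t: do_task(t, shared), tasks))
    # Master-only phase: fold every result into the shared structure.
    for task, result in zip(tasks, results):
        update_shared(shared, task, result)
    return shared


def square(task, shared):
    # Hypothetical worker routine: reads nothing from `shared` in this toy case.
    return task * task


def accumulate(shared, task, result):
    # Hypothetical master routine: the single place where shared data changes.
    shared["total"] += result


state = run_master_worker(range(5), square, accumulate, {"total": 0})
```

Setting `workers=1` gives a sequential run of the same application code, loosely analogous to the sequential version mentioned in the abstract.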
A model for virtual memory in a distributed memory parallel computer is proposed. It uses a novel parallel computing operating system framework and leads to the definition of two strategies for implementing parallel virtual memory. Careful analysis and simulation results indicate that dynamic page allocation performs better for applications that exhibit some locality of reference of public data and for applications whose data space does not fit in the physical memory available. Static page allocation is more efficient in cases of poor locality and small data space (no virtual memory needed).