ISBN: (print) 0818678763
In this paper we present the results of a parallel implementation of a heart field simulation algorithm. Biomagnetic field applications offer wide scope for parallel algorithms. Pathological changes in the human body, especially in the heart muscle, can be diagnosed and localised by means of biomagnetic field parameters. The aim of this diagnostic method is to fit an individual reference model of the patient's heart field. Based on differences between the reference model and the measured biomagnetic field parameters, the type and position of defects in the heart can be located. The most time-consuming components of the whole algorithm are the matrix computations, especially the matrix inversion. The matrix inversion can be implemented on a parallel distributed-memory system. In this paper we discuss the routing, the parallel matrix inversion, and the speedup for different network topologies as a function of the number of processors and the problem size.
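As an illustration of why matrix inversion lends itself to a distributed-memory implementation, the following minimal sketch inverts a matrix by Gauss-Jordan elimination. The abstract does not say which inversion method the authors use, so the choice of Gauss-Jordan and the function name `invert` are assumptions made for this example only. The point it shows is that, once the pivot row is known, every other row is updated independently, so each processor could update its own block of rows after receiving the pivot row.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting.

    Illustrative sequential sketch; not the method of the paper.
    """
    n = len(matrix)
    # Build the augmented matrix [A | I].
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        pivot_row = aug[col]
        # The updates below do not depend on each other. On a distributed-memory
        # machine, the pivot row would be broadcast and each processor would
        # update only the rows it owns.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                if f != 0.0:
                    aug[r] = [a - f * b for a, b in zip(aug[r], pivot_row)]
    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]
```

In such a scheme the per-step communication is the broadcast of one pivot row, which is one reason the achievable speedup can depend on the network topology as well as on the number of processors and the matrix size.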
ISBN: (print) 0818678763
Because it is very difficult to balance the load among processors for a nonuniform loop at compile time, and dynamic methods come at the price of extra overhead, this paper proposes an adaptive hybrid scheduling scheme in which the distribution of loop iterations is divided into a few rounds and the block size in each round is determined adaptively according to the average overhead due to dynamic scheduling. Experimental results also show the effect of the scheduling parameter, which programmers can select according to the probability that a fetching processor does not perform an additional task fetch.
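The round-based structure described above can be sketched as follows. The abstract does not give the formula that derives the block size from the measured scheduling overhead, so this example substitutes a generic rule: each round hands out a fixed fraction of the remaining iterations, split evenly across processors, and never less than a minimum chunk. The function `round_chunks` and its parameters `fraction` and `min_chunk` are hypothetical names; `min_chunk` stands in for an overhead-derived lower bound.

```python
def round_chunks(total_iters, num_procs, fraction=0.5, min_chunk=1):
    """Split a loop of total_iters iterations into rounds of per-processor blocks.

    Returns a list of rounds; each round is a list of half-open (start, end)
    ranges, at most one per processor. The block-size rule is an assumed
    stand-in, not the adaptive rule of the paper.
    """
    if num_procs < 1 or min_chunk < 1:
        raise ValueError("num_procs and min_chunk must be at least 1")
    rounds = []
    start = 0
    while start < total_iters:
        remaining = total_iters - start
        # Blocks shrink from round to round, but never below min_chunk, so
        # that the cost of fetching a block stays small relative to its work.
        chunk = max(min_chunk, int(remaining * fraction) // num_procs)
        this_round = []
        for _ in range(num_procs):
            if start >= total_iters:
                break
            end = min(start + chunk, total_iters)
            this_round.append((start, end))
            start = end
        rounds.append(this_round)
    return rounds
```

Large early blocks keep the number of fetches low, while small late blocks let processors that finish early pick up leftover work. A larger `min_chunk` trades some load balance for fewer fetches.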
ISBN: (print) 0818678763
This paper presents a new rapid thread replacement mechanism, which is important in multithreading technology. Analysis of the memory system indicates that memory utilization decreases as the cache hit ratio increases. The parallelism between thread computation and thread replacement is found by analyzing their working processes. Based on these observations, we propose a rapid multithread replacement mechanism that overlaps thread replacement with thread computation. In particular, with a finite number of hardware contexts, this mechanism can play the same role as infinite contexts by tolerating the replacement overhead. By modifying the general thread switching model, we build a thread replacement model and evaluate the mechanism both theoretically and experimentally. Finally, we discuss the hardware implementation and outline the problems to be resolved in the future.
ISBN: (print) 0818678763
Fast and efficient communication is one of the major design goals not only for parallel systems but also for clusters of workstations. The proposed model of the high-performance communication device ATOLL (1) features very low latency for the start of communication operations and reduces the software overhead for communication-specific functions. To close the gap between off-the-shelf microprocessors and the communication system, a highly sophisticated processor interface implements atomic start of communication, MMU support, and a flexible event scheduling scheme. The interconnectivity of ATOLL, provided by four independent network ports combined with cut-through routing, allows the configuration of a large variety of network topologies. A software-transparent error correction mechanism significantly reduces the required protocol overhead. The presented simulation results promise high-performance, low-latency communication.
This paper proposes an efficient parallel approach to texture classification for image retrieval. The idea behind this method is to pre-extract texture features in terms of texture energy measurements associated with a 'tuned' mask and store them in a multi-scale and multi-orientation texture class database via a two-dimensional linked list for querying. Each texture class sample in the database can thus be traced by its texture energy in a two-dimensional row-sorted matrix. Parallel search strategies are introduced to quickly identify the entries closest to the input texture throughout the given texture energy matrix. In contrast to traditional search methods, our approach uses different computation patterns depending on the number of available processors and is concerned with robust and work-optimal parallel algorithms for row search and minimum finding, based on the accelerated cascading technique and a dynamic processor allocation scheme. Applications of the proposed parallel search and multisearch algorithms to both single-image and multiple-image classification are discussed. The time complexity analysis shows that our proposal speeds up the classification tasks in a simple but dynamic manner. Examples are presented of the texture classification task applied to image retrieval of Brodatz textures, comprising various orientations and scales.
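The search problem described above, finding the entry of a row-sorted matrix closest to a query value, can be sketched sequentially as follows. This is not the paper's work-optimal parallel algorithm: it is a plain baseline that binary-searches each row and then takes the minimum, written to show the two phases (row search, minimum find) that the abstract names. The function names `closest_in_row` and `closest_in_matrix` are hypothetical, and each row is assumed to be sorted in ascending order.

```python
from bisect import bisect_left


def closest_in_row(row, target):
    """Return the index of the entry in a sorted, non-empty row closest to target."""
    i = bisect_left(row, target)
    # The closest entry is either just before or at the insertion point.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(row)]
    return min(candidates, key=lambda j: abs(row[j] - target))


def closest_in_matrix(matrix, target):
    """Return (row, column) of the entry closest to target in a row-sorted matrix."""
    best = None
    # Row-search phase: the rows are independent, so each could be handled
    # by a different processor.
    for r, row in enumerate(matrix):
        if not row:
            continue
        c = closest_in_row(row, target)
        distance = abs(row[c] - target)
        # Minimum-find phase: here a running minimum; in parallel this
        # would be a reduction over the per-row results.
        if best is None or distance < best[0]:
            best = (distance, r, c)
    if best is None:
        raise ValueError("matrix has no entries")
    return best[1], best[2]
```

For a matrix with m rows of n entries, this baseline takes O(m log n) time on one processor. With one processor per row, the row searches run concurrently and only the final minimum needs coordination.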
ISBN: (print) 0818678135
Clusters of networked commercial, off-the-shelf (COTS) workstations are presently used for computation-intensive tasks that were typically assigned to parallel computers in the past. However, it is hardly possible to predict the timing behavior of such systems or to give guarantees about execution times. In this paper we show how our SONiC (Shared Objects Net-interconnected Computer) system can control the timing and partitioning of a workstation as a step towards a distributed real-time system built from COTS components. SONiC provides a class-based programming interface for creating replicated shared objects of arbitrary, user-defined sizes. Weak consistency protocols are employed to improve the system's performance. Our Scheduling Service ensures the requested interactive behavior of a workstation while simultaneously giving a specified number of CPU cycles to parallel tasks. Using off-line scheduling methods, we are able to implement real-time Guaranteed Services on COTS workstations.
ISBN: (print) 0818678763
Automatic model generation is studied as part of a hybrid modeling strategy using simulation for performance analysis. Two major steps have to be carried out in this context. First, the program under investigation has to be translated into a model. During the translation, the runtime has to be estimated for numerous computational blocks of statements, which are replaced by simple delays. Second, for performance estimation, the model has to be analyzed by an evaluation tool. Model evaluation as well as runtime estimation of computational blocks requires the values of certain variables, the control variables. We discuss the problem of automatically defining control variables in general and consider some important cases. For the implementation of a model-generating tool, we concentrate on parallel Fortran programs that use message-passing primitives for interprocess communication.
The proceedings contain 32 papers. Topics discussed include algorithms for parallelization, distributed computer systems and networking, software tools and environments, parallel finite and boundary elements, applications in fluid flow, and applications in applied science.
Parallel computing on clusters of workstations is receiving much attention from the research community. Unfortunately, many aspects of parallel computing on this kind of parallel computing engine are not very well understood. These include the workstation architectures, the network protocols, the communication-to-computation ratio, the load balancing strategies, and the data partitioning schemes. The aim of this paper is to assess the strengths and limitations of a cluster of workstations by capturing the effects of these issues. This has been achieved by evaluating the performance of this computing environment in the execution of a parallel ray tracing application through analytical modeling and extensive experimentation.
The composition and performance of music is a plural activity that combines the outcomes of a number of procedures, many of which involve functions that operate in parallel. In terms of sound-synthesis operations, a significant number of generative and signal-processing operations involve a combination of concurrent elements, ranging from the production of simultaneous notes by a single instrument to the superimposition of totally independent outputs, where a number of different components contribute to the audio spectrum. The traditional computer processor is a serial device, restricted for the most part to the execution of instructions as a single stream of events. Thus processes that require the aggregation of functions executed in parallel must be simulated by some means of cyclical tasking and data accumulation. In the case of digital audio synthesis and signal processing applications, the resultant effects on overall processor performance quickly become significant, limiting the number of individual components that can be handled in real time. About ten years ago, the Music Technology Group at the University of Durham started a series of investigations into the construction of computing architectures for audio applications that embraced a significant degree of true parallelism, based in the first instance on the INMOS Transputer. This article describes some of the most important outcomes of this line of investigation and highlights aspects that hold particular relevance for future designs of parallel audio processors.