ISBN: 3540656413 (print)
The proceedings contain 69 papers. The special focus in this conference is on parallel numerics, parallel computing in image processing, video processing, and multimedia. The topics include: non-standard parallel solution strategies for distributed sparse linear systems; optimal tridiagonal solvers on mesh interconnection networks; parallel pivots LU algorithm on the Cray T3E; experiments with parallel one-sided and two-sided algorithms for SVD; combined systolic array for matrix portrait computation; a class of explicit two-step Runge-Kutta methods with enlarged stability regions for parallel computers; a parallel strongly implicit algorithm for solution of diffusion equations; a parallel algorithm for Lagrange interpolation on k-ary n-cubes; long range correlations among multiple processors; a Monte Carlo method with inherent parallelism for numerically solving partial differential equations with boundary conditions; blocking techniques in numerical software; HPF and numerical libraries; an object library for parallel sparse array computation; performance analysis and derived parallelization strategy for a SCF program at the Hartree-Fock level; computational issues in optimizing ophthalmic lens; parallel finite element modeling of solidification processes; architectural approaches for multimedia processing; on parallel reconfigurable architectures for image processing; parallel multiresolution image segmentation with watershed transformation; and solving irregular inter-processor data dependency in image understanding tasks.
In this paper, we have designed an efficient parallel algorithm for performing 3D image reconstruction. In our framework, the 3D image is reconstructed from a series of 2D images produced by ultrasonography, computed tomography, etc. The paper discusses a general parallel algorithm for 3D image reconstruction over the CRCW, CREW and EREW PRAM models. We have developed efficient implementations of this algorithm on vector machines, on a distributed system comprising a cluster of workstations, and on various interconnection networks such as mesh networks and reconfigurable bus networks. The performance of these algorithms is tested using simulation experiments on 3D image reconstruction of the vitreous region of the eye from ophthalmic ultrasonograms. A novel approximation scheme is also proposed that drastically improves performance for specific kinds of images. Results indicate that the time complexities of the algorithms are consistent with the expected theoretical values and that the images obtained show no compromise in accuracy.
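The abstract does not spell out the reconstruction steps. As a rough illustration only, one common way to build a volume from a stack of 2D slices is to interpolate linearly between neighbouring slices. Each pair of neighbours can be processed independently, which is what makes the task easy to parallelize. The function names and the choice of linear interpolation are assumptions made for this sketch, not the paper's algorithm.

```python
def interpolate_slices(slice_a, slice_b, steps):
    """Return `steps` intermediate slices between two 2D slices (lists of rows)."""
    result = []
    for s in range(1, steps + 1):
        t = s / (steps + 1)  # fractional position between slice_a and slice_b
        result.append([
            [(1 - t) * a + t * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(slice_a, slice_b)
        ])
    return result


def reconstruct_volume(slices, steps):
    """Stack the given slices, filling each gap with interpolated slices.

    Every gap depends only on its two neighbouring slices, so the gaps
    could be handed to separate processors (assumption for illustration).
    """
    volume = [slices[0]]
    for a, b in zip(slices, slices[1:]):
        volume.extend(interpolate_slices(a, b, steps))
        volume.append(b)
    return volume
```

For example, `reconstruct_volume([[[0.0]], [[2.0]]], steps=1)` places one interpolated slice between the two inputs.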
ISBN: 1581131461 (print)
Given a typical parallel system and a collection of applications that are to execute on the system, a common problem is determining an effective allocation of processors among the applications. In this paper a learning approach is applied to processor allocation. The approach is to use a stochastic learning automaton (SLA) as a decision tool. An SLA reads the current state description, makes an allocation decision, evaluates that decision at some later time, modifies its decision-making process, and tries to find the best allocation strategy by learning from its previous mistakes. The method is applied to the problem of allocating processors to parallel applications in a distributed system such as a cluster of workstations, and is validated through simulation. The results of this study show that a learning approach based on a stochastic learning automaton is effective at making processor allocation decisions in a parallel system.
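The decide, evaluate, adjust loop described above can be sketched with a textbook linear reward-inaction automaton. The abstract does not say which update scheme the authors use, so the scheme, the class name, and the `reward_chance` feedback table below are all assumptions chosen for illustration.

```python
import random


class LearningAutomaton:
    """Linear reward-inaction automaton over a fixed set of actions
    (here: candidate processor counts). Illustrative sketch only."""

    def __init__(self, actions, rate=0.1, seed=None):
        self.actions = list(actions)
        self.rate = rate
        self.probs = [1.0 / len(self.actions)] * len(self.actions)
        self.rng = random.Random(seed)

    def choose(self):
        # Sample an action index according to the current probabilities.
        return self.rng.choices(range(len(self.actions)), weights=self.probs)[0]

    def update(self, index, rewarded):
        # Reward: move probability mass toward the chosen action.
        # Penalty: leave the distribution unchanged ("inaction").
        if not rewarded:
            return
        for i in range(len(self.probs)):
            if i == index:
                self.probs[i] += self.rate * (1.0 - self.probs[i])
            else:
                self.probs[i] *= 1.0 - self.rate


def train(automaton, reward_chance, steps=2000):
    """Run the decision loop against a hypothetical environment in which
    reward_chance[action] is the probability of favourable feedback."""
    for _ in range(steps):
        index = automaton.choose()
        rewarded = automaton.rng.random() < reward_chance[automaton.actions[index]]
        automaton.update(index, rewarded)
    best = max(range(len(automaton.probs)), key=automaton.probs.__getitem__)
    return automaton.actions[best]
```

A call such as `train(LearningAutomaton([1, 2, 4, 8], seed=1), {1: 0.2, 2: 0.4, 4: 0.9, 8: 0.5})` returns the allocation the automaton currently favours. Because learning is stochastic, this is not guaranteed to be the action with the highest reward chance.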
ISBN: 3540664432 (print)
This paper presents a system that produces efficient implementations of parallel array-based algorithms from high-level specifications. It is structured as a transformation through a series of progressively more detailed representations. This allows the use of high-level programming features without losing the fine control of low-level languages. During the transformation process, parallel implementation decisions are introduced. Finally, a representation is reached which can be translated to C+MPI.
With the proliferation of workstation clusters connected by high-speed networks, providing efficient system support for concurrent applications engaging in nontrivial interaction has become an important problem. Two principal barriers to harnessing parallelism are: one, efficient mechanisms that achieve transparent dependency maintenance while preserving semantic correctness, and two, scheduling algorithms that match coupled processes to distributed resources while explicitly incorporating their communication costs. This paper describes a set of performance features, their properties, and their implementation in a system support environment called DUNES that achieves transparent dependency maintenance (IPC, file access, memory access, process creation/termination, process relationships) under dynamic load balancing. The two principal performance features are push/pull-based active and passive end-point caching and communication-sensitive load balancing. Collectively, they mitigate the overhead introduced by the transparent dependency maintenance mechanisms. Communication-sensitive load balancing, in addition, affects the scheduling of distributed resources to application processes, where both communication and computation costs are explicitly taken into account. DUNES' architecture endows commodity operating systems with distributed operating system functionality while achieving transparency with respect to their existing application base. DUNES also preserves semantic correctness with respect to single-processor semantics. We show performance measurements of a UNIX-based implementation on SPARC and x86 architectures over high-speed LAN environments. We show that significant performance gains in terms of system throughput and parallel application speed-up are achievable.
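To make the idea of weighing communication against computation concrete, here is a minimal placement sketch. It is not DUNES' scheduler. The additive cost model, the function name `pick_host`, and all parameters are hypothetical, chosen only to show how a placement decision changes once remote traffic is priced in.

```python
def pick_host(host_load, traffic, placement, remote_cost=1.0):
    """Choose a host for one process.

    host_load:   {host: current computation load}
    traffic:     {peer process: message volume exchanged with that peer}
    placement:   {peer process: host where the peer currently runs}
    remote_cost: assumed price per unit of traffic that crosses hosts

    The cost of a host is its load plus the priced traffic to peers that
    would be remote if the process ran there (illustrative model only).
    """
    def cost(host):
        remote = sum(volume for peer, volume in traffic.items()
                     if placement[peer] != host)
        return host_load[host] + remote_cost * remote

    # Sorting makes tie-breaking deterministic.
    return min(sorted(host_load), key=cost)
```

With `host_load={"a": 3.0, "b": 1.0}` and a chatty peer on host `a`, the process stays on the busier host `a`. Setting `remote_cost=0` reduces the policy to plain load balancing and selects `b`.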
ISBN: 0780356535 (print)
Detectors for high data rate wideband code-division multiple access (CDMA) networks must contend with both multiple access interference (MAI) and intersymbol interference (ISI). Our reverse link detector suppresses the MAI and ISI with a feedforward filter (FFF), which linearly processes the samples at the output of parallel chip matched filters (CMFs), and a feedback filter (FBF), which linearly processes detected symbols. We propose a fully-connected (FC) architecture whose synergy arises from sharing FFF and FBF contents among users to enable efficient multiuser recursive least-squares (RLS) adaptation towards the minimum mean-square error (MMSE) coefficients. Simulations validate the theoretical BER analysis and demonstrate that the FC detector trains in several hundred symbols, avoids severe error propagation, and supports 8 simultaneous 2 Mb/s QPSK users in an asynchronous CDMA system with a processing gain of 8, a delay spread of 1.25 µs, and an average SNR of 15 dB, with uncoded BERs less than 10^-4 and 10^-4 with 97% and 99.3% probabilities, respectively.
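For readers unfamiliar with RLS adaptation, the following sketch shows the standard exponentially weighted update for a single real-valued linear filter. The detector in the abstract is multiuser and operates on complex baseband samples, so this is a deliberately simplified illustration. The function names, the forgetting factor `lam`, and the initialization constant `delta` are assumptions made for the sketch.

```python
import random


def rls_update(w, P, x, d, lam=0.99):
    """One recursive least-squares step.

    w: filter weights, P: inverse correlation matrix estimate (symmetric),
    x: input vector, d: desired output, lam: forgetting factor.
    Returns the updated (w, P) and the a-priori error.
    """
    n = len(w)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(x[i] * Px[i] for i in range(n))
    gain = [value / denom for value in Px]
    err = d - sum(w[i] * x[i] for i in range(n))  # error before the update
    w = [w[i] + gain[i] * err for i in range(n)]
    # P <- (P - gain * x^T P) / lam; x^T P equals Px^T because P is symmetric.
    P = [[(P[i][j] - gain[i] * Px[j]) / lam for j in range(n)] for i in range(n)]
    return w, P, err


def identify(true_w, samples=200, lam=1.0, delta=1e4, seed=0):
    """Adapt toward a known (hypothetical) target filter from noiseless data."""
    rng = random.Random(seed)
    n = len(true_w)
    w = [0.0] * n
    P = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _step in range(samples):
        x = [rng.uniform(-1.0, 1.0) for _k in range(n)]
        d = sum(a * b for a, b in zip(true_w, x))
        w, P, _err = rls_update(w, P, x, d, lam)
    return w
```

Calling `identify([0.5, -0.3])` drives the weights close to the target within a few hundred samples, which mirrors the fast training behaviour that motivates RLS over slower gradient-based adaptation.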
The problem of maintenance of materialized views has recently been the object of increased research activity, mainly because of applications related to data warehousing. Many sequential view maintenance algorithms are ...
A dynamic speculative multithreaded processor automatically extracts thread-level parallelism from sequential binary applications without software support. The hardware is responsible for partitioning the program into threads and managing inter-thread dependencies. Currently published dynamic thread partitioning algorithms work by detecting loops or procedures, or by partitioning at fixed intervals. Research has thus far examined these algorithms in isolation from one another. This paper makes two contributions. First, it quantitatively compares different dynamic partitioning algorithms in the context of a fixed architecture. The architecture is a single-chip shared-memory multiprocessor enhanced to allow thread and value speculation. Second, this paper presents a new dynamic partitioning algorithm called MEM-slicing. Insights into the development and operation of this algorithm are presented. The technique is particularly suited to irregular, non-numeric programs, and greatly outperforms other algorithms in this domain. MEM-slicing is shown to be an important tool for enabling the automatic parallelization of irregular binary applications. Over SPECint95, an average speedup of 3.4 is achieved on 8 processors.
In the solution of large-scale numerical problems, parallel computing is becoming simultaneously more important and more difficult. The complex organization of today's multiprocessors with several memory hierarchies has forced the scientific programmer to choose between simple but unscalable code and scalable but extremely complex code that does not port to other architectures. This paper describes how the SMARTS runtime system and the POOMA C++ class library for high-performance scientific computing work together to exploit data parallelism in scientific applications while hiding the details of managing parallelism and data locality from the user. We present innovative algorithms, based on the macro-dataflow model, for detecting data parallelism and efficiently executing data-parallel statements on shared-memory multiprocessors. We also describe how these algorithms can be implemented on clusters of SMPs.
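The macro-dataflow idea can be illustrated with a small dependence analysis. Statements that touch disjoint data may run concurrently, and a statement becomes runnable once everything it depends on has finished. The sketch below is not the SMARTS algorithm. It assumes that each statement declares the arrays it reads and writes, and the function names are hypothetical.

```python
def build_dependencies(statements):
    """statements: ordered list of (name, reads, writes), with reads and
    writes given as sets of array names. Returns {name: set of earlier
    statement names it must wait for}."""
    deps = {}
    for i, (name, reads, writes) in enumerate(statements):
        deps[name] = set()
        for prev_name, prev_reads, prev_writes in statements[:i]:
            read_after_write = prev_writes & reads
            write_after_read = prev_reads & writes
            write_after_write = prev_writes & writes
            if read_after_write or write_after_read or write_after_write:
                deps[name].add(prev_name)
    return deps


def schedule_waves(deps):
    """Group statements into waves; all statements within one wave are
    mutually independent and could be executed in parallel."""
    remaining = dict(deps)
    done = set()
    waves = []
    while remaining:
        ready = sorted(n for n, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("cyclic dependencies")
        waves.append(ready)
        done.update(ready)
        for n in ready:
            del remaining[n]
    return waves
```

For three statements where `s1` writes `a`, `s2` writes `d`, and `s3` reads both `a` and `d`, the first wave holds `s1` and `s2` and the second wave holds `s3`.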