ISBN: 0818682272 (print)
In this paper we present a runtime library design based on the two-phase collective I/O technique for irregular applications. The design is motivated by the requirements of a large number of ASCI (Accelerated Strategic Computing Initiative) applications, although the design and interface are general enough to be used by any irregular application. We present two designs, namely "Collective I/O" and "Pipelined Collective I/O". In the first scheme, all processors participate in the I/O at the same time, which makes scheduling of I/O requests simpler but creates a possibility of contention at the I/O nodes. In the second approach, processors are divided into several groups so that only one group performs I/O at a time while the next group performs communication to rearrange data, and this entire process is pipelined. This reduces contention at the I/O nodes but requires more complicated scheduling and may degrade communication performance. We obtained up to 40 MBytes/sec application-level performance on Caltech's Intel Paragon (with 16 I/O nodes, each containing one disk), which includes on-the-fly reordering costs. We observed up to 60 MBytes/sec on the ASCI/Red machine with only three I/O nodes (with RAIDs).
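The two-phase idea in the abstract above can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the library described in the paper: the function name `two_phase_write`, the block-wise assignment of file domains, and the in-memory list standing in for the file are all hypothetical choices made for illustration.

```python
def two_phase_write(local_data, file_size):
    """Simulate a two-phase collective write (illustrative sketch only).

    local_data: one list per simulated process, holding (global_index, value)
                pairs in arbitrary (irregular) order.
    file_size:  number of elements in the target file.
    Returns the resulting "file" as a Python list.
    """
    nprocs = len(local_data)
    # Assumption: each process owns one contiguous file domain of equal size.
    chunk = (file_size + nprocs - 1) // nprocs

    # Phase 1 (communication): route every element to the process
    # that owns the file domain containing its global index.
    inbox = [[] for _ in range(nprocs)]
    for pairs in local_data:
        for idx, val in pairs:
            inbox[idx // chunk].append((idx, val))

    # Phase 2 (I/O): each process reorders what it received and
    # writes a single contiguous block.
    file = [None] * file_size
    for rank, pairs in enumerate(inbox):
        start = rank * chunk
        end = min(start + chunk, file_size)
        block = [None] * max(0, end - start)
        for idx, val in pairs:
            block[idx - start] = val
        file[start:end] = block
    return file


if __name__ == "__main__":
    scattered = [[(5, "f"), (0, "a")], [(3, "d"), (1, "b")], [(4, "e"), (2, "c")]]
    print(two_phase_write(scattered, 6))
```

In the pipelined variant described in the abstract, the two phases would overlap across processor groups; the sketch runs them one after the other for clarity.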
In this paper, we describe a new scheme for checkpointing parallel applications on message-passing scalable distributed memory systems. The novelty of our scheme is that a checkpointed application can be restored, fro...
ISBN: 0818680679 (print)
Decision support systems use On-Line Analytical Processing (OLAP) to analyze data by posing complex queries that require different views of data. Traditionally, a relational approach (ROLAP) has been taken to build such systems. More recently, multi-dimensional database techniques (MOLAP) have been applied to decision-support applications. Data is stored in multidimensional arrays, which are a natural way to express the multi-dimensionality of the enterprise and are better suited for analysis. Precomputed aggregate calculations in a data cube can provide efficient query processing for OLAP applications. In this paper we present algorithms and results for in-memory data cube construction on distributed memory machines.
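To make "precomputed aggregate calculations in a data cube" concrete, the following is a minimal sequential sketch that computes a sum for every subset of the dimensions. It is not the distributed-memory algorithm of the paper; the function name `data_cube`, the dictionary-based row format, and the choice of sum as the aggregate are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def data_cube(rows, dims, measure):
    """Compute sum aggregates for every group-by over `dims` (sketch only).

    rows:    list of dicts, one per fact.
    dims:    tuple of dimension names.
    measure: name of the numeric field to sum.
    Returns {group_by_tuple: {key_tuple: total}} with 2**len(dims) entries.
    """
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in group)
                totals[key] += row[measure]
            cube[group] = dict(totals)
    return cube


if __name__ == "__main__":
    facts = [
        {"product": "A", "region": "east", "sales": 10},
        {"product": "A", "region": "west", "sales": 5},
        {"product": "B", "region": "east", "sales": 7},
    ]
    cube = data_cube(facts, ("product", "region"), "sales")
    print(cube[("product",)])
```

A query such as "total sales per product" then becomes a lookup in `cube[("product",)]` rather than a scan of the fact table.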
The aim of this paper is to present the results of our survey on the applications and problems arising from parallel processing in the area of airline reservation systems. We used the reservation system of Olympic Airways as our basic model. We describe the systems and the problems that arise, and finally give our proposals and solutions.
Solving large computationally intensive problems requires significant processing power, and the current trend in achieving this power is not to increase the throughput of individual processors but to subdivide the problem into co-operating tasks so that many processors may be used to solve it. Stand-alone workstations, such as the PC and the SUN machines, delivering tens of millions of operations per second are commonplace, but to achieve high parallel efficiency, distributed systems must support rapid communication between a large number of processors on a single coherent network. Parallel programming packages have been developed to assist in parallel program development, such as the Parallel Virtual Machine (PVM) and High Performance Fortran (HPF). In this paper we present an analysis of PVM and HPF within the context of the boundary element method. Domain decomposition is used to subdivide and distribute a two-dimensional potential problem onto a collection of networked stand-alone workstations using PVM and HPF. We observe that the development of parallel applications using HPF is very efficient and straightforward when compared with PVM. However, PVM offers better performance efficiency and a greater degree of freedom than HPF, since PVM allows both coarse-grained and fine-grained parallelism. The performance of PVM and HPF is further compared with that of 3L Parallel Fortran on a T800 transputer network.
The present paper deals with the simulation of the nonlinear and time-dependent behaviour of complex structures in engineering. To overcome the limiting factors of such computer simulations (the computer run time and the memory requirement), we use a dynamic-explicit time integration procedure for the solution of the semi-discrete equations of motion, which is very well suited for parallel processing. First we give a brief review of the theoretical background of the mechanical modelling and the dynamic-explicit technique for the solution of the semi-discrete equations of motion. Then the concept of parallelisation is discussed.
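The abstract does not specify which explicit scheme is used. As a hedged illustration of why explicit integration parallelises well, the sketch below assumes a central-difference (leapfrog) update with a lumped, diagonal mass matrix, so that each degree of freedom is updated independently. The function name `explicit_step` and its arguments are hypothetical.

```python
def explicit_step(u, v_half, mass, f_int, f_ext, dt):
    """One leapfrog step for M * a = f_ext - f_int(u) (illustrative sketch).

    u:      displacements at time t_n, one entry per degree of freedom.
    v_half: velocities at time t_n - dt/2.
    mass:   lumped (diagonal) masses; this is an assumption of the sketch.
    f_int:  function returning the internal force vector for given u.
    f_ext:  external force vector.
    """
    forces = f_int(u)
    # Each entry depends only on its own degree of freedom, so this loop
    # could be split across processors without solving a linear system.
    new_v = [v + dt * (fe - fi) / m
             for v, fe, fi, m in zip(v_half, f_ext, forces, mass)]
    new_u = [x + dt * nv for x, nv in zip(u, new_v)]
    return new_u, new_v


if __name__ == "__main__":
    # Single linear spring with stiffness 1 and mass 1, released from u = 1.
    u, v = [1.0], [0.0]
    for _ in range(6283):  # roughly one period (2 * pi) at dt = 0.001
        u, v = explicit_step(u, v, [1.0], lambda x: [1.0 * x[0]], [0.0], 0.001)
    print(u[0])
```

Explicit schemes of this kind are only conditionally stable, so the time step must stay below a limit set by the highest natural frequency of the discretised structure.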
ISBN: 0819425885 (print)
The two-dimensional (2D) Discrete Fourier Transform (DFT) frequently needs to be performed in digital image processing. Although the computing time of the 2D DFT can be dramatically reduced by using the 2D Fast Fourier Transform (FFT), the processing time for a very large array is still intolerably long. The development of parallel processing systems promotes the application of the 2D FFT. In this paper, we present the implementation of the 2D FFT as a general procedure by the row-column method and the vector-radix method on a general-purpose massively parallel processing system, the DAWN 1000, developed in China. Even though the 2D FFT is parallel in nature, the requirement of corner-turning and the existence of data communication make its implementation more complicated. We analyze the impact of the machine capacity and the computing complexity on the algorithm efficiency and evaluate the implementation in terms of the arithmetic operations as well as the data transfer. The comparison of the two methods shows that each has its own advantages and disadvantages. Combining their traits, we design a new implementation algorithm that takes into account flexibility, efficiency, and the complexity of the communication. As an example, we perform spaceborne SAR image processing using the new approach.
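The row-column method mentioned above can be sketched sequentially as follows: transform every row, transpose (the "corner turn"), transform every row again, and transpose back. This is a minimal illustration rather than the DAWN 1000 implementation; the helper names are hypothetical, and the recursive radix-2 `fft1d` assumes that both image dimensions are powers of two.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 FFT; len(x) must be a power of two (assumption)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def transpose(m):
    """Corner turn: swap rows and columns."""
    return [list(col) for col in zip(*m)]


def fft2d_row_column(image):
    """2D FFT by the row-column method (sequential sketch)."""
    rows_done = [fft1d(row) for row in image]                  # 1D FFT on rows
    cols_done = [fft1d(col) for col in transpose(rows_done)]   # 1D FFT on columns
    return transpose(cols_done)


if __name__ == "__main__":
    print(fft2d_row_column([[1, 2], [3, 4]]))
```

On a distributed-memory machine the row transforms are independent, while the transpose is the step that requires communication between processors, which is the corner-turning cost the abstract refers to.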
Object dataflow is a popular approach used in parallel rendering. The data representing the 3D scene is statically distributed among processors, and objects are fetched and cached only on demand. Most previous object dataflow methods were implemented on shared memory architectures and exploited spatial coherency to reduce hardware cache misses. In this paper, we propose an efficient model for object dataflow parallel volume rendering on message passing machines. The algorithm is introduced, and its ray storage mechanism is used to support latency hiding by postponing computation on inactive rays. Memory usage is optimized by letting objects migrate and replicate at different processors rather than using the common static assignments. Our cache-only-memory approach uses a distributed-directory scheme to trace the location of objects at other nodes. A mechanism to minimize network congestion was implemented, which optimizes channel utilization. Unlike previous methods, our approach can benefit from temporal coherence and effectively minimizes communication costs during animation on limited-bandwidth multiprocessing environments. We report results of the algorithm's implementation on several platforms, including the Cray T3D, the Convex SPP, and a DEC Alpha cluster of workstations (COW), where it achieved higher efficiency and scalability than existing algorithms.
In parallel and distributed systems, an important issue in managing a decentralized task queue is load balancing among multiple processors. In this paper, we propose a scheme for this problem that uses a symmetric broadcast network (SBN), which provides an efficient and robust communication pattern between processors. We compare the performance of the SBN-based load balancing algorithm with a randomization-based algorithm, a gradient algorithm, and an extended gradient algorithm on a broad range of computing and communication platforms. All four algorithms were first implemented on an 8-processor Intel iPSC-2, a hypercube-based multicomputer. Then, the programs were ported to Parallel Virtual Machine (PVM). Using PVM, we compared all four algorithms on (i) an 8-processor bus-based Silicon Graphics multiprocessor (SGI), (ii) two DEC Alpha workstations connected by a Local Area Network, and (iii) the SGI and the two DEC Alphas connected by the Internet. We found that our SBN-based algorithm performed well over a wide range of workloads and of computer and communication configurations.
ISBN: 0818681306 (print)
This paper considers whether the seemingly disparate fields of Computational Intelligence (CI) and computer architecture can profit from each other's principles, results and experience. In the process, we identify important common issues, such as parallelism, distribution of data and control, granularity and regularity. We present two novel computer architectures which have profited from principles found in CI, and identify two constraints on CI to eliminate the hidden influence of the von Neumann model of computation.