This paper presents an efficient parallel algorithm for computing the mutual range-join of N sets of numbers on shared-nothing hypercube computers. The algorithm iteratively joins each set to the mutual range-join of ...
详细信息
This paper presents an efficient parallel algorithm for computing the mutual range-join of N sets of numbers on shared-nothing hypercube computers. The algorithm iteratively joins each set to the mutual range-join of the preceding sets. Each join is performed on all processors of the hypercube in parallel. The algorithm uses a global sorting method to distribute the elements of the first set evenly across all processors in increasing order, a new data balancing technique to distribute the elements of subsequent sets to match the intermediate set at each processor and to compensate for join skew, and a new efficient local range-join procedure. We analyse the performance of this algorithm and demonstrate that it improves on the best previously published algorithm for this problem when the join selectivity factor is small. The method can also be applied to similar problems such as band-join and equi-join.
We present the Operator design pattern which can be used for the design of both sequential and data parallel applications. To reach this goal, we show how the participants of this pattern can be implemented either for...
详细信息
We present the Operator design pattern which can be used for the design of both sequential and data parallel applications. To reach this goal, we show how the participants of this pattern can be implemented either for a sequential or a parallel execution. Besides, reusing the sequential design for a parallel application decreases the cost of parallelization by allowing the maintenance of a unique application for the two execution environments. The proposed approach for parallel programming does not require any dedicated compiler or code pre-processing. Nothing but object oriented features, such as inheritance and polymorphism, is used to provide the distributed behaviour of the parallel participants of the pattern. The Operator pattern can help to solve many design issues in relation with the development of reusable software components for data collection processing. Moreover, we are confident that many programmers who want to migrate their applications towards parallelism can find it helpful.
Collective communications such as broadcast and reduction are commonly used in data parallel programs. It is important to understand the performance of such primitive communications to characterize parallel systems an...
详细信息
Collective communications such as broadcast and reduction are commonly used in data parallel programs. It is important to understand the performance of such primitive communications to characterize parallel systems and analyze the performance of parallel applications running on specific parallel systems. We measured the performance of collective communication operations on several multiprocessor systems. In this paper, we report experimental results for collective communication performance on distributed memory systems. We also describe the performance prediction of data parallel programs using the performance of the primitives.
We propose a new approach for developing parallel I/O- and compute-intensive applications. At a high level of abstraction, a macro data flow description describes how processing and disk access operations are combined...
详细信息
We propose a new approach for developing parallel I/O- and compute-intensive applications. At a high level of abstraction, a macro data flow description describes how processing and disk access operations are combined. This high-level description (CAP) is precompiled into compilable and executable C++ source language. parallel file system components specified by CAP are offered as reusable CAP operations. Low-level parallel file system components can, thanks to the CAP formalism, be combined with processing operations in order to yield efficient pipelined parallel I/O and compute intensive programs. The underlying parallel system is based on commodity components (PentiumPro processors, Fast Ethernet) and runs on top of WindowsNT. The CAP-based parallel program development approach is applied to the development of an I/O and processing intensive tomographic 3D image visualization application. Configurations range from a single PentiumPro I-disk system to a four PentiumPro 27-disk system. We show that performances scale well when increasing the number of processors and disks. With the largest configuration, the system is able to extract in parallel and project into the display space between three and four 512/spl times/512 images per second. The images may have any orientation and are extracted from a 100 MByte 3D tomographic image striped over the available set of disks.
Software development for high-performance (parallel/distributed) computing (HPC) is a non-trivial process; its complexity can be primarily attributed to the increased degrees of freedom that have to be resolved and tu...
详细信息
Software development for high-performance (parallel/distributed) computing (HPC) is a non-trivial process; its complexity can be primarily attributed to the increased degrees of freedom that have to be resolved and tuned in such an environment. Performance prediction tools enable a developer to evaluate various available design alternatives and can assist in HPC application software development. In this paper, we first present a novel "interpretive" approach for accurate and cost-effective performance prediction. The approach has been used to develop an interpretive HPF/Fortran 90D application performance prediction framework. The accuracy and usability of the performance prediction framework are experimentally validated. We then outline the stages typically encountered during application software development for parallel/distributed HPC and highlight the significance and requirements of a performance prediction tool at the relevant stages. Numerical results using benchmarking kernels and application codes are presented to demonstrate the application of the interpretive performance prediction framework at different stages of the software development process.
Gang scheduling is a job scheduling policy for parallel computers that combines elements of space-sharing and time-sharing. In this paper we analyze the performance of gang scheduling policies that allow the remapping...
详细信息
Gang scheduling is a job scheduling policy for parallel computers that combines elements of space-sharing and time-sharing. In this paper we analyze the performance of gang scheduling policies that allow the remapping of an executing job to a new set of processors. Most previously proposed gang-scheduling policies do not allow such job remapping under the assumption that it is prohibitively expensive. Through a detailed trace-driven simulation, we analyze the tradeoff between the benefits and overheads of such job relocation. Our results show that gang-scheduling policies that support such job relocation offer significant performance gains over policies that do not use remapping.
The emergence of small portable computers and the advances in wireless networking have made mobile computing today a reality. Information systems and databases are among the applications that make mobile computing att...
详细信息
The emergence of small portable computers and the advances in wireless networking have made mobile computing today a reality. Information systems and databases are among the applications that make mobile computing attractive. While the topic of querying data in wireless and mobile systems has received a lot of attention, techniques to efficiently update data in these systems while providing transaction semantics are not fully developed. We present a novel protocol that uses the broadcast facility to help mobile units do some of the work of verifying if the transactions being run by them need to be aborted. Only when the mobile unit cannot detect any conflict is the server involved in completing the verification. Of course, if the transaction can commit, the server will install the valves in the central database and notify the mobile units (again, using the broadcast channel). The protocol uses a modified version of optimistic control. We study the performance of the protocol by means of a detailed simulation.
Decision support systems use online analytical processing (OLAP) to analyze data by posing complex queries that require different views of data. Traditionally, a relational approach (ROLAP) has been taken to build suc...
详细信息
Decision support systems use online analytical processing (OLAP) to analyze data by posing complex queries that require different views of data. Traditionally, a relational approach (ROLAP) has been taken to build such systems. More recently, multi-dimensional database techniques (MOLAP) have been applied to decision-support applications. Data is stored in multi-dimensional arrays, which is a natural way to express the multi-dimensionality of the enterprise and is more suited for analysis. Precomputed aggregate calculations in a data cube can provide efficient query processing for OLAP applications. In this paper, we present algorithms and results for in-memory data cube construction on distributed-memory machines.
Based on a geometric method, this paper presents an efficient and authenticated cryptoscheme for establishing secure group-oriented data communications in Internet environments. We assume that the Internet environment...
详细信息
ISBN:
(纸本)0818682272
Based on a geometric method, this paper presents an efficient and authenticated cryptoscheme for establishing secure group-oriented data communications in Internet environments. We assume that the Internet environments consist of many hosts, and each host has many users attached to it. The secure group-oriented communication scheme proposed in this paper incorporates the public-key distribution and the trigonometry concepts as the basic theory. Since this scheme does not need any trusted key distribution center to distribute the common secret session key between two parties and can reduce the computation time needed for securely sending messages to a group of receivers by using multiplication operations instead of modular exponentiation, it is quite suitable to be used in Internet environments so that the key distribution is convenient, time-saving and reparable. Furthermore, an authentication protocol is also proposed. Such a protocol can not only identify both the sender and the receiver of a group correctly, but can also make sure the transmitted message has reached its destination safely.
We have been investigating the efficiency of Genetic Algorithms (GA) for solving for a variety of real problems. During our investigations we have concluded that the large amount computational time required to find GA...
详细信息
We have been investigating the efficiency of Genetic Algorithms (GA) for solving for a variety of real problems. During our investigations we have concluded that the large amount computational time required to find GA based solutions on conventional computers is restrictive. We are therefore developing an innovative new computer architecture, suitable for the solution of large scale problems using GAs. In this paper we introduce the SIMD-GA (Single Instruction stream Multiple Data stream Genetic Algorithm), and discuss its hardware design and implementation. By taking advantage of the recent advances is HDLs (Hardware Description Language) and FPGA's (Field Programmable Gate Array) we have been able to quickly develop and prototype a PE (Processing Element) for a SIMD-GA. This approach allows us to build a cost-effective parallel processing architecture to overcome the problem of the computational time required for traditional sequential GA implementation.
暂无评论