ISBN: 3540203591 (print)
There have been many parallel external sorting algorithms reported, such as NOW-Sort, SPsort, and hill sort. They are designed for sorting large-scale data stored on disk, but they differ in speed, throughput, and cost-effectiveness. Most of them deal with data that are uniformly distributed over their value range, and few results have yet been reported on parallel external sorting of data with arbitrary distributions. In this paper, we present two distribution-insensitive parallel external sorting algorithms that use a sampling technique and histogram counts to achieve an even distribution of data among processors, which in turn yields superb performance. Experimental results on a cluster of Linux workstations show up to a 63% reduction in execution time compared with the earlier NOW-Sort.
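The abstract does not spell out the two algorithms, so the following is only a minimal sketch of the general sampling idea, simulated on one machine: splitters are chosen as evenly spaced quantiles of a random sample, and histogram counts show how many keys each processor would receive. The function names and the oversampling factor of 8 are hypothetical choices, not taken from the paper.

```python
import bisect
import random


def choose_splitters(data, num_procs, oversample=8, seed=0):
    """Pick num_procs - 1 splitters as evenly spaced quantiles of a random sample."""
    rng = random.Random(seed)
    sample_size = min(len(data), num_procs * oversample)  # hypothetical oversampling
    sample = sorted(rng.sample(data, sample_size))
    return [sample[(i * sample_size) // num_procs] for i in range(1, num_procs)]


def histogram(data, splitters):
    """Count how many keys fall into each bucket (one bucket per processor)."""
    counts = [0] * (len(splitters) + 1)
    for x in data:
        counts[bisect.bisect_right(splitters, x)] += 1
    return counts


def sample_sort(data, num_procs):
    """Simulate the parallel sort: route keys to buckets, then sort each bucket."""
    splitters = choose_splitters(data, num_procs)
    buckets = [[] for _ in range(num_procs)]
    for x in data:
        buckets[bisect.bisect_right(splitters, x)].append(x)
    # Each bucket would be sorted by its own processor; concatenation is globally sorted.
    return [x for bucket in buckets for x in sorted(bucket)]


# Skewed (exponentially distributed) keys, where equal-width value ranges would be unbalanced.
rng = random.Random(1)
data = [int(rng.expovariate(0.01)) for _ in range(10_000)]
print(histogram(data, choose_splitters(data, 4)))
```

Because splitters follow the sampled quantiles rather than fixed value ranges, bucket sizes track the data's actual distribution; a fuller scheme might use the histogram counts to refine the splitters iteratively.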
ISBN: 9729881618 (print)
The revolution in computing brought about by the Internet is changing the nature of computing from a personalized environment to a ubiquitous one in which both data and computational resources are distributed across the network. Client-server communication protocols permit parallel ad hoc queries of frequently updated databases, but they do not provide the functionality to automatically perform continual queries that track changes in those data sources over time. Because the state of data resources is not persisted, users must repeatedly query databases and manually compare search results over time. To date, continual query systems have lacked both external and internal scalability. Herein we describe CQServer, a scalable, platform- and implementation-independent system that uses a distributed object infrastructure for heterogeneous enterprise computation of both content-based and time-based continual queries.
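To make the notion of a continual query concrete, here is a minimal single-process sketch, assuming a query is any function returning (key, value) rows. It re-evaluates the query when a time-based trigger is due and reports content changes relative to the previous result. `ContinualQuery`, `poll`, and `maybe_poll` are hypothetical names; this is not CQServer's interface, and it ignores distribution entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, Iterable, Optional, Tuple

Row = Tuple[Hashable, object]
Delta = Tuple[dict, dict, dict]  # (added, removed, changed)


@dataclass
class ContinualQuery:
    """Hypothetical continual query: re-runs `query` and reports what changed."""
    query: Callable[[], Iterable[Row]]
    interval: float = 60.0  # time-based trigger period
    next_due: float = 0.0
    last: Dict[Hashable, object] = field(default_factory=dict)

    def poll(self) -> Delta:
        """Content-based part: diff the current result against the previous one."""
        current = dict(self.query())
        added = {k: v for k, v in current.items() if k not in self.last}
        removed = {k: v for k, v in self.last.items() if k not in current}
        changed = {k: (self.last[k], v) for k, v in current.items()
                   if k in self.last and self.last[k] != v}
        self.last = current
        return added, removed, changed

    def maybe_poll(self, now: float) -> Optional[Delta]:
        """Time-based part: only re-evaluate once the interval has elapsed."""
        if now < self.next_due:
            return None
        self.next_due = now + self.interval
        return self.poll()


prices = {"IBM": 100}
cq = ContinualQuery(lambda: prices.items(), interval=10)
print(cq.maybe_poll(0))   # ({'IBM': 100}, {}, {})
prices["IBM"] = 105
prices["SUN"] = 20
print(cq.maybe_poll(5))   # None: not due yet
print(cq.maybe_poll(10))  # ({'SUN': 20}, {}, {'IBM': (100, 105)})
```

Keeping the previous result inside the query object is what removes the manual "query again and compare" step the abstract describes.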
ISBN: 078037889X (print)
This paper discusses the decomposition of tasks for VLSI simulation on distributed-memory multicomputers. Mathematical and physical analyses are given for exploiting the parallelism of these operations, and an efficient decomposition algorithm is proposed. Using this algorithm, a large-scale circuit can be decomposed into N sub-circuits of similar size while keeping the set of interconnect nodes to a minimum, which benefits later dynamic load distribution and balancing. The algorithm can be implemented in a parallel processing environment. Some experimental results for the decomposition algorithm are presented, followed by conclusions and future work.
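The abstract does not describe the decomposition algorithm itself. As a stand-in, the sketch below uses a simple, well-known heuristic (greedy breadth-first region growing) on a graph whose vertices are circuit nodes and whose edges are connections. It yields N parts of near-equal size and counts cut edges as a proxy for the interconnect set. This is a swapped-in technique for illustration, not the paper's algorithm, and it makes no optimality claim; the function names are hypothetical.

```python
from collections import deque


def grow_partition(adj, n_parts):
    """Greedy BFS region growing into n_parts parts of near-equal size."""
    nodes = sorted(adj)
    target = -(-len(nodes) // n_parts)  # ceil(len(nodes) / n_parts)
    part = {}
    for p in range(n_parts):
        queue = deque()
        size = 0
        while size < target:
            if not queue:
                rest = [v for v in nodes if v not in part]
                if not rest:
                    break  # every node is assigned
                queue.append(rest[0])  # (re)seed from an unassigned node
            v = queue.popleft()
            if v in part:
                continue
            part[v] = p
            size += 1
            queue.extend(w for w in adj[v] if w not in part)
    return part


def cut_size(adj, part):
    """Number of undirected edges whose endpoints lie in different parts."""
    return sum(1 for v in adj for w in adj[v] if v < w and part[v] != part[w])


# A 4x4 grid stands in for a small circuit graph.
grid = {(r, c): [] for r in range(4) for c in range(4)}
for (r, c) in list(grid):
    for nb in ((r + 1, c), (r, c + 1)):
        if nb in grid:
            grid[(r, c)].append(nb)
            grid[nb].append((r, c))

parts = grow_partition(grid, 4)
print("interconnect edges:", cut_size(grid, parts))
```

Growing each part from a seed keeps its nodes connected where possible, which tends to keep the cut small; production partitioners typically add refinement passes on top of such an initial split.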
ISBN: 3540200541 (print)
Based on Luo's parallel algorithm [4] for certain Toeplitz cyclic tridiagonal systems on distributed-memory multicomputers, we present an improved algorithm. Its communication mechanism is simple, and its redundant computation is small when solving massive systems. The numerical experiments show that the parallel efficiency of the improved algorithm is higher than that of Luo's algorithm [4].
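For reference, a Toeplitz cyclic tridiagonal system has a constant sub-diagonal a, diagonal b, and super-diagonal c, with corner entries closing the cycle, so row i reads a*x[i-1] + b*x[i] + c*x[i+1] = d[i] with indices taken modulo n. The sketch below is a standard serial solver (the Thomas algorithm plus a Sherman-Morrison correction for the corners), the kind of sequential baseline a parallel method is compared against. It is not Luo's algorithm or the improved parallel algorithm, and it assumes a diagonally dominant system so that no pivoting is needed.

```python
def thomas(sub, diag, sup, rhs):
    """Solve a (non-cyclic) tridiagonal system; sub[0] and sup[-1] are ignored."""
    n = len(diag)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = sup[0] / diag[0]
    dp[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        m = diag[i] - sub[i] * cp[i - 1]
        cp[i] = sup[i] / m if i < n - 1 else 0.0
        dp[i] = (rhs[i] - sub[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def solve_cyclic_toeplitz(a, b, c, d):
    """Solve a*x[i-1] + b*x[i] + c*x[i+1] = d[i] (indices mod n) via Sherman-Morrison."""
    n = len(d)  # assumes n >= 3 and a diagonally dominant system
    alpha, beta = c, a  # bottom-left A[n-1][0] and top-right A[0][n-1]
    gamma = -b  # free parameter; -b avoids cancellation in diag[0]
    diag = [float(b)] * n
    diag[0] = b - gamma
    diag[-1] = b - alpha * beta / gamma
    sub, sup = [float(a)] * n, [float(c)] * n
    y = thomas(sub, diag, sup, d)  # A' y = d
    u = [0.0] * n
    u[0], u[-1] = gamma, alpha
    z = thomas(sub, diag, sup, u)  # A' z = u
    # With v = (1, 0, ..., 0, beta/gamma): x = y - (v.y / (1 + v.z)) z
    factor = (y[0] + beta * y[-1] / gamma) / (1.0 + z[0] + beta * z[-1] / gamma)
    return [yi - factor * zi for yi, zi in zip(y, z)]


d = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = solve_cyclic_toeplitz(1.0, 4.0, 1.0, d)
n = len(d)
residual = max(abs(x[i - 1] + 4.0 * x[i] + x[(i + 1) % n] - d[i]) for i in range(n))
print(residual)
```

The corner entries are folded into a rank-one update, so the cyclic system costs two ordinary tridiagonal solves plus O(n) extra work.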
ISBN: 3540200541 (print)
Using unneeded computing power on the Internet for distributed computing is becoming increasingly feasible. To increase the willingness to provide such spare computing power, a secure platform is needed for executing untrusted code. We present the architecture of the JX operating system, which can be used to safely execute untrusted code. The problem of erroneous agents crashing the system is solved by using Java, a type-safe language, as the implementation language. The resource consumption of agents is controlled by a security manager, which inspects every interaction between an agent and a system service. If the security policy does not approve the use of a system service, access can be denied. An agent execution system built upon JX is presented to illustrate the security problems that occur and the solutions provided by the JX operating system.
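The mediation pattern described here, in which every agent-to-service interaction passes through a security manager that can deny it, can be sketched as follows. This is a hypothetical Python illustration: JX itself is written in Java and relies on Java's type safety for isolation, which Python does not provide, and the class names and the set-of-permitted-services policy format are assumptions.

```python
class AccessDenied(Exception):
    """Raised when the policy does not approve an agent's use of a service."""


class SecurityManager:
    """Hypothetical reference monitor: policy maps agent -> permitted services."""

    def __init__(self, policy):
        self.policy = policy

    def check(self, agent, service):
        if service not in self.policy.get(agent, set()):
            raise AccessDenied(f"{agent} may not use {service}")


class ServiceRegistry:
    """Every agent call is routed through the security manager before dispatch."""

    def __init__(self, manager):
        self.manager = manager
        self.services = {}

    def register(self, name, fn):
        self.services[name] = fn

    def call(self, agent, name, *args):
        self.manager.check(agent, name)  # inspect the interaction first
        return self.services[name](*args)


registry = ServiceRegistry(SecurityManager({"agent1": {"clock"}}))
registry.register("clock", lambda: 42)
registry.register("fs.write", lambda path: f"wrote {path}")
print(registry.call("agent1", "clock"))  # 42
try:
    registry.call("agent1", "fs.write", "/tmp/x")
except AccessDenied as exc:
    print("denied:", exc)  # denied: agent1 may not use fs.write
```

Because agents only reach services through `call`, the policy check cannot be bypassed in this sketch; in JX that guarantee comes from the type-safe language rather than from convention.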
The paper presents the design and development of an online remote trace measurement and analysis system. The work combines the strengths of the TAU performance system with that of the VNG distributed parallel trace an...
This work addresses the pattern of turbulent kinetic energy generated by distortion and the effect of external disturbances on boundary layer transition. This is investigated with direct numerical simulation of grid turbulence convected through a linear turbine blade cascade. Comparisons are made with results from earlier computations of flow through the same cascade with a turbulence-free inflow and with an inflow containing migrating wakes. The distribution of turbulence in the passage strongly depends on the mean flow field and can partly be explained by the travel time needed for the inlet turbulence to drift to a given location. This results in a local amplification of turbulence near the leading-edge stagnation region and, in the passage, on the pressure side near the trailing edge. The penetration of disturbances into the blade boundary layers induces bypass transition. In particular, the transition pattern on the suction side of the blade differs significantly among the three types of inflow. (C) 2003 Elsevier Science Inc. All rights reserved.
This work presents a simple but effective approach for solving two representative linear algebra operations in parallel on Ethernet-based clusters: matrix multiplication and LU matrix factorization. The main obj...
ISBN: 3540203591 (print)
In this paper, we propose a distributed algorithm for solving a resource location problem in distributed systems. The proposed algorithm is fully distributed in the sense that it assumes no centralized control, and it has the notable property that it always finds a target node satisfying a given property, if one exists. The simulation results imply that (1) the performance of the underlying load sharing scheme can be significantly improved by increasing the precision of node location, and (2) in the proposed scheme, the average number of inquiries per location is bounded by a very small value (e.g., only two inquiries suffice even when the underlying system consists of 100 nodes).
ISBN: 3540203591 (print)
Solving systems of linear equations is central to scientific computation. In this paper, we focus on using Intel's Pentium Streaming SIMD Extensions (SSE) for a parallel implementation of the LU-decomposition algorithm. Two implementations of LU decomposition (non-SSE and SSE) are compared, as are two variants of the algorithm for the SSE version. Our results show that the SSE version is on average 2.25 times faster than the non-SSE version. This speedup is higher than the 1.74-times speedup of Intel's SSE implementation. The source of the speedup is the high reuse of loaded data achieved by efficiently organizing SSE instructions.
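The data-reuse idea can be illustrated with a right-looking LU decomposition (Doolittle form, no pivoting) whose trailing-matrix update walks the pivot row in chunks of four, the number of single-precision floats an SSE register holds. Each chunk of the pivot row is fetched once and applied to every row below it. This pure-Python sketch only mirrors the loop structure; it performs no actual SIMD, it is neither the paper's code nor Intel's implementation, and it assumes a matrix that needs no pivoting, such as a strictly diagonally dominant one.

```python
LANES = 4  # an SSE register holds four 32-bit floats


def lu_inplace(a):
    """Doolittle LU without pivoting; overwrites a with unit-lower L and U."""
    n = len(a)
    for k in range(n):
        pivot = a[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot: this sketch does no pivoting")
        for i in range(k + 1, n):  # multipliers form column k of L
            a[i][k] /= pivot
        for j0 in range(k + 1, n, LANES):  # trailing update, LANES columns at a time
            chunk = a[k][j0:j0 + LANES]  # "load" the pivot-row chunk once...
            for i in range(k + 1, n):  # ...and reuse it for every row below
                lik = a[i][k]
                row = a[i]
                for off, ukj in enumerate(chunk):
                    row[j0 + off] -= lik * ukj
    return a


def multiply_lu(lu):
    """Rebuild L @ U from the packed factors, to check the factorization."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # L[i][p] is 1 on the diagonal and lu[i][p] below it; U[p][j] = lu[p][j] for p <= j.
            out[i][j] = sum((1.0 if p == i else lu[i][p]) * lu[p][j]
                            for p in range(min(i, j) + 1))
    return out


A = [[6.0, 1.0, 2.0, 0.0, 1.0],
     [1.0, 5.0, 1.0, 2.0, 0.0],
     [2.0, 1.0, 6.0, 1.0, 1.0],
     [0.0, 2.0, 1.0, 7.0, 1.0],
     [1.0, 0.0, 1.0, 1.0, 8.0]]
LU = lu_inplace([row[:] for row in A])
err = max(abs(x - y) for r1, r2 in zip(multiply_lu(LU), A) for x, y in zip(r1, r2))
print(err)
```

The 5x5 example leaves a partial final chunk, which the slice handles; in real SSE code such remainders usually need separate scalar or masked handling.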