Within the last several years, a number of parallel algorithms for the join operation have been proposed. However, almost none of these algorithms takes advantage of the underlying parallel storage structures or data declustering methods of the operand relations. The paper introduces the concept of a parallel join algorithm that is aware of the parallel storage structure or declustering method. Two classes of parallel join algorithms, which take advantage of the underlying parallel B+-tree index, are proposed and analyzed. One class is based on the range-partition strategy; the other is based on the hash-partition strategy. The parallel execution times of the algorithms are linearly proportional to max{N/P, M/P}, where N and M are the numbers of tuples of the operand relations and P is the number of processing nodes. The proposed parallel join algorithms are compared with well-known parallel join algorithms in practice. Theoretical and experimental results show that the proposed algorithms are more efficient than the others when at least one operand relation has a parallel B+-tree index on the join attributes.
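The hash-partition strategy mentioned in this abstract can be illustrated with a minimal sketch. This is a generic partitioned hash join simulated sequentially, not the B+-tree-aware algorithm the paper proposes; the function names (`partition`, `local_hash_join`, `partitioned_join`) and the tuple layout are assumptions made for illustration only. If join keys spread evenly, each of the P simulated nodes handles roughly N/P and M/P tuples, which is the quantity the stated execution-time bound depends on.

```python
from collections import defaultdict


def partition(relation, key_index, num_nodes):
    """Assign each tuple to a node by hashing its join attribute."""
    parts = [[] for _ in range(num_nodes)]
    for tup in relation:
        parts[hash(tup[key_index]) % num_nodes].append(tup)
    return parts


def local_hash_join(r_part, s_part, r_key, s_key):
    """Join two partitions held by the same node with an in-memory hash table."""
    table = defaultdict(list)
    for r in r_part:
        table[r[r_key]].append(r)
    # Probe the table with every tuple of the second partition.
    return [r + s for s in s_part for r in table.get(s[s_key], [])]


def partitioned_join(R, S, r_key, s_key, num_nodes):
    """Simulate P nodes one after another (hypothetical driver, no real parallelism)."""
    r_parts = partition(R, r_key, num_nodes)
    s_parts = partition(S, s_key, num_nodes)
    result = []
    for node in range(num_nodes):
        result.extend(local_hash_join(r_parts[node], s_parts[node], r_key, s_key))
    return result
```

Because matching keys always hash to the same node, no node needs tuples held by another node once partitioning is done; in a real system the loop over nodes would run concurrently.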
Automatically discovering high-level knowledge modeled by complicated functions, ordinary differential equations, and difference equations in databases is a very important and difficult task in KDD research. In this paper, high-level knowledge modeled by ordinary differential equations (ODEs) is discovered automatically in dynamic data by an asynchronous parallel evolutionary modeling algorithm (APHEMA). A numerical example is used to demonstrate the potential of APHEMA. The results show that the dynamic models discovered automatically by computer can sometimes compare with models discovered by humans.
Existing position-based unicast routing algorithms, which forward packets in the geographic direction of the destination, require that the forwarding node knows the positions of all neighbors in its transmission range...
Poset belief propagation, or PBP, is a flexible generalization of ordinary belief propagation which can be used to (approximately) solve many probabilistic inference problems. In this paper, we summarize some experimental results comparing the performance of PBP to conventional BP techniques.
ISBN: 1892512416 (print)
Although algorithms for the reconfigurable mesh (R-Mesh) are fast, they are very difficult to implement because most algorithms employ buses that span Ω(N) processors. On such buses, the constant bus-delay assumption, which is central to all R-Mesh algorithms, does not hold. In this paper, we consider a powerful restriction of the R-Mesh, called LR-Mesh, and show that a large class of fundamental LR-Mesh algorithms can be efficiently implemented using limited-delay buses. We introduce a new measure of bus delay, called "bends-cost", and describe an LR-Mesh implementation for which bends-cost is a faithful measure of the actual bus delay. Next, we show that an important class of LR-Mesh algorithms (which includes algorithms for prefix sums, multiple addition, and sorting) can be implemented efficiently using buses whose delay is at most D, a parameter of the design. In particular, if the technology used can support a delay of D = N^ε for an arbitrarily small constant ε > 0, then the running times of these algorithms are within a constant factor of their idealized LR-Mesh counterparts.
In this work, we study the parallelization, on a distributed memory architecture, of Hansen's algorithm, an interval Branch-and-Bound algorithm for solving the continuous global optimization problem with inequality constraints. Since this algorithm is dynamic and irregular, we propose, in particular, some parallel algorithms that balance the load with respect to both the quantity and the quality of boxes. Our proposed techniques are based on the "best-first strategy" criterion and on the cyclic redistribution of the boxes. The numerical simulations are performed using the PROFIL/BIAS libraries [1,2] for computation and the "MPI/C++" environment for communication.
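To make the "best-first strategy" concrete, here is a minimal sequential sketch of interval branch-and-bound in one dimension. It is not Hansen's algorithm as studied in the paper: it omits inequality constraints, load balancing, PROFIL/BIAS, and MPI. The objective `f`, its hand-written range enclosure `f_bounds`, and the function `best_first_minimize` are assumptions chosen purely for illustration.

```python
import heapq


def f(x):
    """Hypothetical objective with its minimum at x = 1."""
    return (x - 1.0) ** 2


def f_bounds(lo, hi):
    """Enclose the range of f over the box [lo, hi] (hand-written for this f)."""
    a, b = lo - 1.0, hi - 1.0
    upper = max(a * a, b * b)
    lower = 0.0 if a <= 0.0 <= b else min(a * a, b * b)
    return lower, upper


def best_first_minimize(lo, hi, tol=1e-6):
    """Always expand the box with the smallest lower bound (best-first)."""
    best_x = 0.5 * (lo + hi)
    best_val = f(best_x)
    heap = [(f_bounds(lo, hi)[0], lo, hi)]
    while heap:
        lower, a, b = heapq.heappop(heap)
        if lower > best_val:
            continue  # this box cannot contain a better point
        mid = 0.5 * (a + b)
        val = f(mid)
        if val < best_val:
            best_val, best_x = val, mid
        if b - a < tol:
            continue  # box is small enough, stop splitting it
        for c, d in ((a, mid), (mid, b)):
            child_lower = f_bounds(c, d)[0]
            if child_lower <= best_val:
                heapq.heappush(heap, (child_lower, c, d))
    return best_x, best_val
```

In a distributed setting, the priority queue is what makes the workload irregular: the number of boxes a node holds says little about how promising they are, which is why the abstract distinguishes the quantity of boxes from their quality.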
Given two strings X and Y of lengths m and n, respectively, the all-substrings longest common subsequence (ALCS) problem obtains the lengths of the subsequences common to X and any substring of Y. The sequential algor...
ISBN: 1892512416 (print)
A codesign is the simultaneous design of hardware and software subsystems. In our codesign, we exploit the highly parallel nature of matrix multiplication, which cannot be exploited in our purely software implementation. The hardware part of our codesign system is responsible for performing the arithmetic operations. This includes the matrix multiplier, which performs the multiplication and addition operations of matrix multiplication concurrently. Our matrix multiplier is modeled in VHDL and runs on an ARC-PCI FPGA board. The purpose of the software part of our codesign system is to provide I/O to the hardware. This part is implemented on a PC with a C program and a device driver to communicate with the board. We present a performance comparison of our codesign and our purely software implementation, as well as a comparison with existing parallel implementations. Examples of applications that require large, fast matrix multiplication are bipartite graph determination (non-existence of odd cycles), economics (Leontief input-output model), power-invariant transformations (power systems), cryptography, and genetics modeling (Markov chains).
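The parallelism this abstract refers to comes from the fact that every output entry of a matrix product is an independent dot product. The sketch below shows that structure in software only; it says nothing about the VHDL design or the FPGA board, and the names `row_times_matrix` and `matmul` as well as the use of a thread pool are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def row_times_matrix(row, B):
    """Compute one output row; each entry is an independent dot product."""
    cols = len(B[0])
    return [sum(row[k] * B[k][j] for k in range(len(B))) for j in range(cols)]


def matmul(A, B, workers=4):
    """Multiply A (n x m) by B (m x p), handing each output row to a worker."""
    if A and len(A[0]) != len(B):
        raise ValueError("inner dimensions do not match")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves input order, so rows come back in the right place.
        return list(pool.map(lambda row: row_times_matrix(row, B), A))
```

No output row depends on any other, so the rows (and, within a row, the individual entries) can be computed in any order or all at once. Dedicated hardware can run the multiply and add steps concurrently, whereas this Python version mainly demonstrates the independence of the work items.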
Deadlock prevention for routing messages has a central role in communication networks, since it directly influences the correctness of parallel and distributed systems. In this paper, we extend some of the computation...
A self-organized approach to manage a distributed proxy system called Adaptive Distributed Caching (ADC) has been proposed previously [8]. We model each proxy as an autonomous agent that is equipped to decide how to d...