In this paper we present a knowledge base for the generation and optimization of query execution plans for parallel database systems. This knowledge base forms the basis of a novel extended blackboard architecture for the optimization of parallel query execution plans, which follows a bottom-up building block approach. It makes it possible to find (near-)optimal execution plans in a possibly huge search space in reasonable time.
The extraction of functional dependencies is a fundamental activity in the database design recovery process. Existing algorithms for this task are computationally expensive and appear to be impractical when applied to large legacy database instances; for example, their performance deteriorates when the number of attributes and/or instances is large. This paper presents strategies for parallelising the functional dependency discovery process. We propose three parallel discovery models based on horizontal, vertical, and matrix slicing of database tables. We exploit both program parallelism and data parallelism in our implementations. The resulting discovery approaches are more applicable to large real-world databases.
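To make the horizontal slicing idea concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a table stored as a list of dictionaries, and the names `slice_summary`, `fd_holds`, and `n_slices` are hypothetical. Each slice of rows is summarised independently, and the summaries are merged before the dependency is judged, because two rows that violate a dependency may fall into different slices.

```python
from concurrent.futures import ThreadPoolExecutor


def slice_summary(rows, lhs, rhs):
    """Map each left-hand-side value in one slice to the right-hand-side values seen with it."""
    summary = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        summary.setdefault(key, set()).add(tuple(row[a] for a in rhs))
    return summary


def fd_holds(rows, lhs, rhs, n_slices=2):
    """Check whether lhs -> rhs holds, summarising horizontal slices in parallel."""
    size = max(1, -(-len(rows) // n_slices))  # ceiling division, at least 1
    slices = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor() as pool:
        summaries = list(pool.map(lambda s: slice_summary(s, lhs, rhs), slices))
    # Merge step: a violation may span two slices, so combine before deciding.
    merged = {}
    for summary in summaries:
        for key, values in summary.items():
            merged.setdefault(key, set()).update(values)
    return all(len(values) == 1 for values in merged.values())


# Hypothetical sample data, for illustration only.
rows = [
    {"emp": 1, "dept": "A", "site": "X"},
    {"emp": 2, "dept": "A", "site": "X"},
    {"emp": 3, "dept": "B", "site": "Y"},
    {"emp": 4, "dept": "B", "site": "X"},
]
print(fd_holds(rows, ["emp"], ["dept"]))
print(fd_holds(rows, ["dept"], ["site"]))
```

A thread pool is used here only to keep the sketch short; a process pool or a distributed runtime would be the more likely choice for large tables.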
ISBN: (print) 0818682596
We study efficient parallel solutions to the problem of selecting r elements at specified ranks from a set of n arbitrary elements, known as multiselection, on a hypercube with p processors, p, r ≤ n. We propose two parallel algorithms based on different approaches, where one requires processors to operate in the SIMD mode, and the other in the MIMD mode. Our SIMD algorithm runs in time O((log n log log n) min{r, log n}) when p = Θ(n), and O(n^ε min{r, (1-ε) log n}) when p = n^ε for any 0
Author: V.B. Fyodorov, RAS Institute for High Performance Computer Systems, Moscow, Russia
Three types of optoelectronic architectures for N×N high-performance switching fabrics with multibit word-parallel data transmission through connected pairs of free-space optical channels are considered. The fabrics differ in their functional capability to realize self-routing, strictly nonblocking, conflict-free networking under arbitrary call requests. The possibility of creating such networks from laser and photodetector arrays, smart pixel structures, free-space optics, lenslets or selfoc lens arrays, and electronic control circuits is discussed.
A method that generates a static network for a dedicated parallel computer from an application program is proposed. The article describes the heuristic code scheduling algorithm needed to generate the network. It also describes the dependency analysis method needed for inserting data transfer instructions and for network generation. A network generation system based on this algorithm was developed, together with a parallel computer simulator used to estimate the performance of the generated network and the parallelized program. In experiments with several sample programs, the system achieved 87.0-94.5% parallel processing efficiency with 32 processors. These results demonstrate the validity of the generated network and the parallelized program.
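As a quick worked example of what these efficiency figures imply, the sketch below assumes the usual definition of parallel efficiency as speedup divided by processor count; the abstract itself does not state which definition was used. The function name `speedup_from_efficiency` is hypothetical.

```python
def speedup_from_efficiency(efficiency, processors):
    """Invert efficiency = speedup / processors (assumed definition)."""
    return efficiency * processors


# Reported efficiency range with 32 processors.
for eff in (0.870, 0.945):
    speedup = speedup_from_efficiency(eff, 32)
    print(f"{eff:.1%} efficiency on 32 processors -> speedup of about {speedup:.2f}")
```

Under that assumption, the reported range corresponds to speedups of roughly 27.8 to 30.2 over a single processor.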
The architectural performance gain of microprocessors is approaching saturation because instruction-level parallelism yields only small gains. In this paper, we discuss the design points and some tentative solutions for overcoming this bottleneck, and propose a processor architecture called Very Large Data Path. This architecture broadens the window of instruction analysis to extract 10 times the parallel gain of conventional superscalar processors. This paper discusses the system elements and shows some preliminary evaluation results.
The DSM system we propose in this paper is implemented entirely at the operating system level as a component of RHODOS' Memory (Space) Manager. In addition, it is integrated with RHODOS' existing invalidation-based DSM, allowing programmers to choose the consistency protocol best suited to their application. These factors enable RHODOS DSM to provide the user with a transparent, efficient and scalable shared memory programming environment. In this paper, we describe the logical design, implementation and performance study of an update-based DSM that strictly adheres to the above criteria. These criteria allow the user to program using a familiar model while taking advantage of the greater scalability of COWs.
A c-vertex-ranking of a graph G for a positive integer c is a labeling of the vertices of G with integers such that, for any label i, deletion of all vertices with labels >i leaves connected components, each having at most c vertices with label i. We present a parallel algorithm to find a c-vertex-ranking of a partial k-tree using the minimum number of ranks. This is the first parallel algorithm for c-vertex-ranking of a partial k-tree G, and takes O(log n) time using a polynomial number of processors on the common CRCW PRAM for any positive integer c and any fixed integer k, where n is the number of vertices in G.
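The definition above can be turned into a small sequential checker. The sketch below only verifies that a given labeling is a c-vertex-ranking; it is not the parallel algorithm from the paper and makes no attempt to minimise the number of ranks. The adjacency-dictionary representation and the name `is_c_vertex_ranking` are assumptions made for illustration.

```python
def is_c_vertex_ranking(adj, label, c):
    """Return True if `label` is a c-vertex-ranking of the graph `adj`.

    adj maps each vertex to an iterable of its neighbours;
    label maps each vertex to an integer rank.
    """
    for i in set(label.values()):
        # Delete every vertex whose label is greater than i.
        kept = {v for v in adj if label[v] <= i}
        seen = set()
        for start in kept:
            if start in seen:
                continue
            # Depth-first search over one connected component of what remains.
            stack = [start]
            seen.add(start)
            count = 0  # vertices with label exactly i in this component
            while stack:
                v = stack.pop()
                if label[v] == i:
                    count += 1
                for w in adj[v]:
                    if w in kept and w not in seen:
                        seen.add(w)
                        stack.append(w)
            if count > c:
                return False
    return True


# Hypothetical example: a path on four vertices 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(is_c_vertex_ranking(path, {0: 1, 1: 2, 2: 1, 3: 3}, c=1))
print(is_c_vertex_ranking(path, {0: 1, 1: 1, 2: 2, 3: 1}, c=2))
```

With c = 1 the check reduces to the ordinary vertex-ranking condition, which is why the second labeling passes for c = 2 but would fail for c = 1.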
Intermodule bandwidth is one of the major constraints on the performance of current and future parallel systems. We propose and evaluate several high-performance bus-based parallel architectures, including bus-based cyclic networks (BCNs) and quotient cyclic networks (BQCNs), which are particularly efficient in view of their respective intermodule communication patterns. The intercluster connection in a BCN is defined on a set of nodes whose addresses are cyclic shifts of one another. The node degree of a basic BCN is 3, while those of BQCNs and enhanced BCNs can vary from a small constant (e.g., 2) to as large as required, thus providing flexibility and an effective tradeoff between cost and performance. A variety of algorithms can be performed efficiently on these networks, which demonstrates the versatility of BCNs and BQCNs.
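The phrase "addresses that are cyclic shifts of one another" can be illustrated by grouping addresses into shift classes. This is only a sketch of that grouping, not a construction of a BCN: the abstract does not specify the address format, so the non-empty digit strings used here are an assumption, as are the names `cyclic_shifts` and `shift_classes`.

```python
def cyclic_shifts(address):
    """Return all cyclic shifts of a non-empty address string."""
    return {address[k:] + address[:k] for k in range(len(address))}


def shift_classes(addresses):
    """Group addresses so that members of a group are cyclic shifts of one another.

    Each group is keyed by its lexicographically smallest shift.
    """
    classes = {}
    for address in addresses:
        representative = min(cyclic_shifts(address))
        classes.setdefault(representative, []).append(address)
    return classes


# Hypothetical example: all 3-bit binary addresses.
addresses = [format(n, "03b") for n in range(8)]
for representative, members in sorted(shift_classes(addresses).items()):
    print(representative, members)
```

In this example the eight addresses fall into four groups; under the assumption above, each group is a set of nodes that one intercluster connection would be defined on.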
The main contribution of this work is to fathom the power and flexibility of the Mesh with Hybrid Buses via simulation. We propose two algorithms that perform an O(1) time stepwise simulation of an N-processor dynamic Priority CRCW-PRAM endowed with M memory cells. Our first algorithm uses a Mesh with Hybrid Buses of size max{N, MN^(ε/2)} × MN^(ε/2) for some fixed constant ε, 0