We have selected and presented aspects of probability theory and have emphasized their applications to computer system design and analysis. Some general references for further reading on probability theory are given in the brief bibliographic section that follows.
The running time of Shellsort, with the number of passes restricted to O(log N), was thought for some time to be Θ(N^{3/2}), due to general results of Pratt. Sedgewick recently gave an O(N^{4/3}) bound, but extensions of his method to provide better bounds seem to require new results on a classical problem in number theory. In this paper, we use a different approach to achieve O(N^{1+ε/√(lg N)}) for any ε > 0.
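For orientation, a minimal Shellsort sketch follows. It assumes Shell's original halving increments (N/2, N/4, ..., 1), which are not the increment sequences analyzed in the abstract above; the function name `shellsort` and the `gaps` parameter are illustrative only.

```python
def shellsort(items, gaps=None):
    """Return a sorted copy of items using Shellsort.

    gaps is an assumed, illustrative parameter: a decreasing list of
    increments that must end in 1 for the result to be fully sorted.
    When omitted, Shell's halving sequence is used.
    """
    a = list(items)
    n = len(a)
    if gaps is None:
        gaps = []
        g = n // 2
        while g > 0:
            gaps.append(g)
            g //= 2
    for gap in gaps:
        # One pass: insertion sort on every subsequence with stride `gap`.
        for i in range(gap, n):
            value = a[i]
            j = i
            while j >= gap and a[j - gap] > value:
                a[j] = a[j - gap]
                j -= gap
            a[j] = value
    return a
```

The halving sequence uses about lg N passes, so it falls in the O(log N)-pass setting the abstract describes, although its worst-case running time is quadratic.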
The Trie Hashing (TH), defined by Litwin, is one of the fastest access methods for dynamic and ordered files. The hashing function is defined in terms of a trie, which is basically a binary tree where a character string is associated implicitly with each node. This string is compared with a prefix of the given key in the search process, and depending on the result either the left or the right child is chosen as the next node to visit. The leaf nodes point to buckets which contain the records. The buckets are on a disk, whereas the trie itself is in the core memory. In this paper we consider concurrent execution of the TH operations. In addition to the usual search, insertion and deletion operations, we also include range queries among the concurrent operations. Our algorithm locks only leaf nodes and at most two nodes need to be locked simultaneously by any operation regardless of the number of buckets being accessed. The modification required in the basic data structure in order to accommodate concurrent operations is very minor.
The performance of two basic external sorting algorithms, distributive sorting and mergesort, is compared in an environment where even the main memory usage involves a cost. Performance is measured by total execution time and main memory space-time integral. For optimal behavior, both algorithms prefer a small block size and a similar order of external sort. Their memory requirement is similar during the external phase, but during the internal phase distributive sorting requires a larger working space. For small records, the optimal behavior of distributive sorting is obtained with fewer external passes and its space-time integral is smaller. For large records, the number of passes at the optimal point of mergesort is similar to or even smaller than that of distributive sorting, resulting in a smaller space-time integral. In all cases, the performance of distributive sorting degrades more mildly around the local optima.
We consider a recursive sorting algorithm in which, in each invocation, a new variable and a new procedure (using the variable globally) are defined and the procedure is passed to recursive calls. This algorithm is proved correct with Hoare-style pre- and postassertions. We also discuss the same algorithm expressed as a functional program.
Simulation quality is only as good as the quality of the models you're simulating. Knowing the types of component and board-interconnect models available from chip- and board-level-simulator vendors can help you make your design work the first time around.
This paper discusses the relationship between parallelism granularity and system overhead of dataflow computer systems, and indicates that a trade-off between them should be determined to obtain optimal efficiency of the overall *** the basis of this discussion, a macro-dataflow computational model is established to exploit the task-level *** as a macro-dataflow computer, an Experimental Distributed Dataflow Simulation System (EDDSS) is developed to examine the effectiveness of the macro-dataflow computational model.
External sorting is usually accomplished by first creating sorted runs, then merging the runs. In the merge phase, writing and calculating can be overlapped by reading if two input buffers are used for each sorted run. If the memory is very large, the input buffers will be large and using two input buffers per sorted run will be more efficient than using only one input buffer per run and risking reduced overlap of reading and writing. In many cases, merging time can be cut in half. We derive a formula for estimating the total time for merging for a given memory size, file size, number of merging passes and for a given disk drive. We present an extreme example where in spite of having two buffers per run, significant non-overlap occurs. However, in realistic problems, we show that making one merge pass with two input buffers per run is near optimal. This contradicts earlier results on merging which do not take large memory into account.
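The two phases named above (run creation, then a single merge pass) can be sketched in memory as follows. This is only an illustration under simplifying assumptions: the runs are held in Python lists rather than on disk, and no input buffering or read/write overlap is modeled, so the timing formula from the abstract is not represented. The helper names `make_runs` and `merge_runs` are hypothetical.

```python
import heapq


def make_runs(records, run_size):
    """Phase 1: split records into chunks of run_size and sort each chunk."""
    return [
        sorted(records[i:i + run_size])
        for i in range(0, len(records), run_size)
    ]


def merge_runs(runs):
    """Phase 2: merge all sorted runs in one k-way pass.

    heapq.merge reads each run lazily and always yields the smallest
    remaining head element across the runs.
    """
    return list(heapq.merge(*runs))
```

In a disk-based setting, each list would be replaced by a buffered reader over a run file; the double-buffering scheme discussed in the abstract concerns how those readers are refilled while merging continues.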
A software system is described for the compression of a large look-up table to a smaller one, consistent with a worst-case error predefined by the user. The tables and a suitable source code for accessing them are automatically generated, with very little user intervention. The techniques of linear interpolation and the partitioning of one table into several are shown to be particularly attractive for reducing the table size, especially when the considerable effort of manual generation to a known accuracy is removed. The use of linear interpolation incurs only a small speed penalty when executed on a digital signal processor and the large reductions in table size thus achieved can make the method a faster and more reliable alternative to either the exact or approximate evaluation of many functions.
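To make the linear-interpolation idea concrete, here is a small sketch that tabulates a function at equally spaced points and interpolates between neighbouring entries. It is not the software system described above: the function names, the choice of `math.sin`, and the table size of 65 entries are assumptions made for illustration, and the table partitioning and automatic code generation mentioned in the abstract are not shown.

```python
import math


def build_table(func, lo, hi, entries):
    """Tabulate func at `entries` equally spaced points on [lo, hi]."""
    step = (hi - lo) / (entries - 1)
    table = [func(lo + i * step) for i in range(entries)]
    return table, step


def lookup(table, lo, step, x):
    """Approximate func(x) by linear interpolation; assumes lo <= x <= hi."""
    pos = (x - lo) / step
    # Clamp so that x == hi uses the last interval instead of running off the end.
    i = min(int(pos), len(table) - 2)
    frac = pos - i
    return table[i] + frac * (table[i + 1] - table[i])
```

For a twice-differentiable function, the worst-case interpolation error with spacing h is bounded by (h^2 / 8) max|f''|. For the sine on [0, π/2] with 65 entries this is about 7.5e-5, which indicates how a user-specified error bound could translate into a table size.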
For each value to be sorted in the process of the parallel bubble sort computation, we evaluate the exact time necessary to route the value to its final position. Using this evaluation we design some efficient parallel sorting algorithms that can be implemented on a mesh-connected processor array and analyze their time complexities. Our algorithms are combinations of parallel bubble sorts in different directions, and their control hardware is very simple. Although the time complexities of our algorithms are O(N^{1/2} log N), they are as fast as the implementations of Batcher's bitonic sort and odd-even merge sort on the mesh-connected processor array for practical values of N, 1 ≤ N ≤ 128^2. We also show a parallel sort that is very fast in the average case for practical values of N, 1 ≤ N ≤ 128^2.
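As background, the parallel bubble sort on a one-dimensional array (often called odd-even transposition sort) can be simulated sequentially as below. This is a simplified sketch, not the mesh algorithms of the abstract: it assumes a linear array of N cells rather than a two-dimensional mesh, and the function name is illustrative.

```python
def odd_even_transposition_sort(values):
    """Simulate parallel bubble sort on a linear array of len(values) cells.

    Each outer iteration is one parallel step: all compare-exchanges in
    that step touch disjoint neighbour pairs, so real hardware could
    perform them simultaneously. N steps are enough to sort N values.
    """
    a = list(values)
    n = len(a)
    for step in range(n):
        start = step % 2  # alternate between pairs (0,1),(2,3).. and (1,2),(3,4)..
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

The mesh algorithms in the abstract apply sorts of this kind along rows and columns in different directions; how those passes are combined is specific to the paper and is not reproduced here.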