In this paper we present new results on sequential and parallel construction of optimal and almost-optimal length-restricted prefix-free codes. We show that length-restricted prefix-free codes with error 1/n^k for any k > 0 can be constructed in O(n log n) time, or in O(log n) time with n CREW processors. A length-restricted code with error 1/n^k for any k ≤ L/log_Φ n, where Φ = (1 + √5)/2, can be constructed in O(log n) time with n/log n CREW processors. We also describe an algorithm for the construction of optimal length-restricted codes with maximum codeword length L that works in O(L) time with n CREW processors.
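The abstract states only complexity bounds. For context, the sketch below shows one standard sequential way to compute optimal length-restricted codeword lengths, the package-merge algorithm of Larmore and Hirschberg, which runs in O(nL) time; it is not the paper's parallel algorithm, and the function name and interface are illustrative only.

```python
from collections import Counter

def length_limited_code_lengths(weights, L):
    """Optimal codeword lengths l_i <= L minimizing sum(w_i * l_i), computed
    with the package-merge algorithm (Larmore-Hirschberg).  Sequential
    O(n*L)-time sketch, not the paper's parallel algorithm."""
    n = len(weights)
    if n == 0:
        return []
    if n == 1:
        return [1]
    if 2 ** L < n:
        raise ValueError("need 2**L >= n for a prefix-free code on n symbols")

    def symbol_coins():
        # one coin per symbol at the current level: (weight, symbol multiset)
        return [(w, Counter({i: 1})) for i, w in enumerate(weights)]

    packages = []                      # packages carried up from the level below
    for _ in range(L - 1):             # process levels L, L-1, ..., 2
        level = sorted(symbol_coins() + packages, key=lambda c: c[0])
        packages = [(level[j][0] + level[j + 1][0], level[j][1] + level[j + 1][1])
                    for j in range(0, len(level) - 1, 2)]

    # At level 1, keep the 2n-2 cheapest items; every symbol coin contained in
    # a selected item adds one bit to that symbol's codeword length.
    top = sorted(symbol_coins() + packages, key=lambda c: c[0])[:2 * n - 2]
    lengths = Counter()
    for _, members in top:
        lengths += members
    return [lengths[i] for i in range(n)]
```

For example, weights [1, 1, 1, 8] with L = 2 yield lengths [2, 2, 2, 2], whereas L = 3 allows the unrestricted optimum [3, 3, 2, 1].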
Authors: THOMPSON, C. D., Division of Computer Science, University of California
This paper surveys nine designs for VLSI circuits that compute N-element Fourier transforms. The largest of the designs requires O(N^2 log N) units of silicon area; it can start a new Fourier transform every O(log N) time units. The smallest designs have about 1/Nth of this throughput, but they require only 1/Nth as much area.
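As a software reference point only: the circuits surveyed compute the N-element discrete Fourier transform, and a textbook radix-2 recursion such as the sketch below (not any of the nine VLSI designs) defines that computation for N a power of two.

```python
import cmath

def fft(x):
    """Radix-2 Cooley-Tukey FFT of a sequence whose length is a power of two.
    This defines the N-element transform the surveyed circuits compute; it
    says nothing about their silicon area or throughput."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```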
This paper presents several deterministic algorithms for selecting the kth largest record from a set of n records on any n-node hypercubic network. All of the algorithms are based on the selection algorithm of Cole and Yap, as well as on various sorting algorithms for hypercubic networks. Our fastest algorithm runs in O(lg n lg* n) time, very nearly matching the trivial Ω(lg n) lower bound. Previously, the best upper bound known for selection was O(lg n lg lg n). A key subroutine in our O(lg n lg* n)-time selection algorithm is a sparse version of the Sharesort algorithm that sorts n records using p processors, p ≥ n, in O(lg n (lg lg p - lg lg(p/n))^2) time.
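The abstract does not describe the selection procedure itself. As a baseline, here is the classic sequential deterministic selection by medians of medians; the paper's contribution is running this style of pruning, together with sorting subroutines such as sparse Sharesort, on an n-node hypercubic network, which the sketch does not model. Selecting the kth largest is symmetric (select the (n-k+1)th smallest).

```python
def select_kth_smallest(records, k):
    """Deterministic sequential selection (median of medians), 1-indexed.
    Baseline sketch only; it does not model a hypercubic network."""
    assert 1 <= k <= len(records)
    if len(records) <= 5:
        return sorted(records)[k - 1]
    groups = [sorted(records[i:i + 5]) for i in range(0, len(records), 5)]
    medians = [g[len(g) // 2] for g in groups]
    pivot = select_kth_smallest(medians, (len(medians) + 1) // 2)
    lows = [r for r in records if r < pivot]
    highs = [r for r in records if r > pivot]
    num_pivots = len(records) - len(lows) - len(highs)
    if k <= len(lows):
        return select_kth_smallest(lows, k)
    if k <= len(lows) + num_pivots:
        return pivot
    return select_kth_smallest(highs, k - len(lows) - num_pivots)
```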
My research focuses on methods to analyze and mine large datasets as well as their practical realizations and applications. The key question of interest to me is: How can we effectively and efficiently distill useful information from large, complex, and potentially noisy datasets? To approach this question, we are developing systems for scalable data analysis and data mining, for working with incomplete and noisy data, for data-intensive optimization, as well as for extracting structured information from natural-language text. This article highlights some of my work in these areas.
In this work we present a parallel algorithm for the Concurrent Atomistic Continuum (CAC) formulation that can be integrated into existing molecular dynamics codes. The CAC methodology is briefly introduced and its parallel implementation in LAMMPS is detailed and then demonstrated through benchmarks that compare CAC simulation results with corresponding all-MD (molecular dynamics) results. The parallel efficiency of the algorithm is demonstrated when simulating systems represented by both atoms and finite elements. The verification benchmarks include dynamic crack propagation and branching in a Si single crystal, wave propagation and scattering in a Si phononic crystal, and phonon transport through the phase interface in a PbTe/PbSe heteroepitaxial system. In each of these benchmarks the CAC algorithm is shown to be in good agreement with MD-only models. This parallel CAC algorithm thus offers one of the first scalable multiscale material simulation methodologies that relies solely on atomic-interaction models.
This article points out that our parallel algorithm provides the maximum planar subgraph, and it is compared with the maximal planar subgraph provided by Jayakumar et al. in the above paper. The space-time product complexity is also compared.
We discuss parallel algorithms to compute the ghost layer in computational, distributed memory, recursively adapted meshes. Its creation is a fundamental, necessary task in executing most parallel, element-based computer simulations. Common methods differ in that the ghost layer may either be inherently part of the mesh data structure that is maintained and modified, or kept separate and constructed/deleted as needed. In this work, we present a design following the latter approach, which we chose for its modularity of algorithms and data structures. We target arbitrary adaptive, nonconforming forest-of-trees meshes of mixed element shapes, such as cubes, prisms, and tetrahedra, and restrict ourselves to ghost elements across mesh faces. Our algorithm has low code complexity and redundancy since we reduce it to generic codimension-1 subalgorithms that can be flexibly combined. We recover older algorithms for cubic elements as special cases and optimize further using recursive, amortized tree searches and traversals.
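To make the object concrete, the toy sketch below computes a face ghost layer for one partition under the simplifying assumption that element ownership and face adjacency are already available as plain dictionaries (a hypothetical interface). The paper's actual algorithms operate on adaptive, nonconforming forest-of-trees meshes via recursive tree searches, none of which is modeled here.

```python
def face_ghost_layer(owned, face_neighbors, owner_of):
    """Face ghost layer of one partition: every remotely owned element that
    shares a face with a locally owned element, grouped by owning rank.
    `owned` is the set of locally owned element ids, `face_neighbors` maps an
    element id to the ids of its face-adjacent elements, and `owner_of` maps
    an element id to its rank (hypothetical, simplified mesh interface)."""
    ghosts = {}
    for elem in owned:
        for nbr in face_neighbors[elem]:
            if nbr not in owned:
                ghosts.setdefault(owner_of[nbr], set()).add(nbr)
    return ghosts  # rank -> ghost element ids to request from that rank
```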
Background: Alignment-free methods are a popular approach for comparing biological sequences, including complete genomes. The methods range from probability distributions of sequence composition to first and higher-order Markov chains, where a k-th order Markov chain over DNA has 4^k formal parameters. To circumvent this exponential growth in parameters, variable-length Markov chains (VLMCs) have gained popularity for applications in molecular biology and other areas. VLMCs adapt the depth depending on sequence context and thus curtail excesses in the number of parameters. The scarcity of fast, or even parallel, software tools prompted the development of a parallel implementation using lazy suffix trees and a hash-based alternative. Results: An extensive evaluation was performed on genomes ranging from 12 Mbp to 22 Gbp. Relevant learning parameters were chosen guided by the Bayesian Information Criterion (BIC) to avoid over-fitting. Our implementation greatly improves upon the state-of-the-art even in serial execution. It exhibits very good parallel scaling, with speed-ups for long sequences close to the optimum indicated by Amdahl's law (about 3 for 4 threads and about 6 for 16 threads). Conclusions: Our parallel implementation, released as open source under the GPLv3 license, provides a practically useful alternative to the state-of-the-art and allows the construction of VLMCs even for very large genomes significantly faster than previously possible. Additionally, our parameter selection based on BIC gives guidance to end users comparing genomes.
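To illustrate why variable-length contexts keep the parameter count manageable, the toy sketch below counts DNA contexts up to a maximum depth and keeps only those seen often enough. Real VLMC training, including the paper's implementation over lazy suffix trees, prunes with statistical criteria such as Kullback-Leibler divergence or BIC rather than a raw count threshold; the function is illustrative only.

```python
from collections import Counter

def frequent_contexts(sequence, max_depth, min_count):
    """Count all substrings (contexts) of length 1..max_depth and keep those
    occurring at least min_count times.  A full k-th order Markov chain over
    DNA would need 4**k contexts; pruning rare ones is the idea behind VLMCs,
    although real trainers prune with statistical criteria, not raw counts."""
    counts = Counter()
    for depth in range(1, max_depth + 1):
        for i in range(len(sequence) - depth + 1):
            counts[sequence[i:i + depth]] += 1
    return {ctx: c for ctx, c in counts.items() if c >= min_count}
```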
The fastest known NC algorithm for maximal matching is a simple reduction to Luby's NC algorithm for maximal independent sets. It runs in O(log^2 n) time using m^3 Δ processors on an EREW PRAM. In this paper, we present an algorithm that finds a maximal matching of a given n-vertex m-edge graph with maximum degree Δ in O((Δ^2/p) · log^2 n) time using p · (m + n)/log n processors on an EREW PRAM, for any 1 ≤ p ≤ Δ^2. In particular, for p = Δ^2 our algorithm has the same running time as the algorithm of Luby but uses a much smaller number of processors (Δ^2 · (m + n)/log n instead of m^3 Δ).
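For reference, a maximal matching (not to be confused with a maximum matching) is trivial to compute sequentially; the sketch below fixes the object the PRAM algorithm computes but captures none of the processor/time trade-off that is the paper's contribution.

```python
def greedy_maximal_matching(edges):
    """Sequential greedy maximal matching: keep an edge whenever both of its
    endpoints are still unmatched.  The result is maximal (no further edge
    can be added) but not necessarily maximum."""
    matched = set()
    matching = []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching
```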
This paper considers the problem of scheduling dynamic parallel computations to achieve linear speedup without using significantly more space per processor than that required for a single-processor execution. Utilizing a new graph-theoretic model of multithreaded computation, execution efficiency is quantified by three important measures: T_1 is the time required to execute the computation on one processor, T_∞ is the time required by an infinite number of processors, and S_1 is the space required to execute the computation on one processor. A computation executed on P processors is time-efficient if the time is O(T_1/P + T_∞), that is, it achieves linear speedup when P = O(T_1/T_∞), and it is space-efficient if it uses O(S_1 P) total space, that is, the space per processor is within a constant factor of that required for a one-processor execution. The first result derived from this model shows that there exist multithreaded computations such that no execution schedule can simultaneously achieve efficient time and efficient space. But by restricting attention to "strict" computations (those in which all arguments to a procedure must be available before the procedure can be invoked), much more positive results are obtainable. Specifically, for any strict multithreaded computation, a simple online algorithm can compute a schedule that is both time-efficient and space-efficient. Unfortunately, because the algorithm uses a global queue, the overhead of computing the schedule can be substantial. This problem is overcome by a decentralized algorithm that can compute and execute a P-processor schedule online in expected time O(T_1/P + T_∞ lg P) and worst-case space O(S_1 P lg P), including overhead costs.
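The measures T_1 and T_∞ are properties of the computation DAG and are easy to state in code; the sketch below computes them for a small DAG given per-node costs (a hypothetical interface). It only fixes the quantities the bounds O(T_1/P + T_∞) and O(S_1 P) refer to; the scheduling algorithms themselves, and the space measure S_1, are not modeled.

```python
def work_and_span(children, cost):
    """T_1 (total work) and T_infinity (critical-path length) of a computation
    DAG.  `children` maps each node to the nodes that depend on it and must be
    acyclic; `cost` gives each node's execution time."""
    t1 = sum(cost[v] for v in children)
    memo = {}

    def longest_path_from(v):
        if v not in memo:
            memo[v] = cost[v] + max((longest_path_from(c) for c in children[v]),
                                    default=0)
        return memo[v]

    t_inf = max(longest_path_from(v) for v in children)
    return t1, t_inf
```

With a greedy schedule on P processors, such a computation finishes within T_1/P + T_∞ steps, which is what "time-efficient" means in the abstract.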