We present parallel algorithms for computation of time-slot assignments in time-division multiplex (TDM) switching systems. The algorithms apply to a general class of TDM switching systems called hierarchical switching systems (HSS), which have a three-stage switching structure. The algorithms are based on modeling the time-slot assignment problem as a network-flow problem. Previous algorithms for finding an optimal time-slot assignment in these switching systems are inherently sequential, and no parallel algorithms are known for this problem. If $M$ is the number of users of the switching system, $N$ is the switch size, and $L$ is the length of an optimal time-slot assignment, the best-known sequential TSA algorithm runs in $O(M^2 \cdot \min(N, \sqrt{M}) \cdot \min(L, M^2))$ time. We first describe an algorithm using $L/2$ processors with running time $O(M^3 \log L)$ on a PRAM model of computation. We then generalize it to $P \le L/2$ processors, with running time $O(M^3 \log P + M^2 \cdot \min(N, \sqrt{M}) \cdot \min(L/P, M^2))$. An efficient implementation of the algorithm on a hypercube multiprocessor with $P$ processors has the same time complexity. A massively parallel version of the algorithm runs in $O(M^2 \log M \log L)$ time on $ML/2$ processors. Finally, we discuss how the above algorithms can be applied to the class of SS/TDMA switching systems.
Parallel algorithms for planar graph isomorphism and several related problems are presented. Two models of parallel computation are considered: the CREW-PRAM model and the two-dimensional array of processors. The results include $O(\sqrt{n})$-time mesh algorithms for finding a good separating cycle and the triconnected components of a planar graph, and for solving the single-function coarsest partitioning problem.
Perceptual grouping is a key intermediate-level vision problem. Parallel solutions to this problem are characterized by uneven distribution of symbolic features among the processors, unbalanced workload, and irregular interprocessor data dependency caused by the input image. In this paper, we propose two load-balancing techniques for parallelizing perceptual grouping on distributed-memory machines. By using an initial workload estimate, we first partition the computations to distribute the workload across the processors. In addition, we asynchronously perform ongoing task migrations to adapt to the unbalanced workload, which may evolve differently from the initial estimate. We also discuss two strategies to manage the irregular interprocessor data dependency. To illustrate our ideas, perceptual grouping steps used in an integrated vision system for building detection are used as examples. Our experimental results show that, given 8K extracted line segments from a 1K x 1K image, both the line and junction grouping steps can be completed in 0.644 s on a 32-node SP2 and in 0.585 s on a 32-node T3D. For the same grouping steps, a serial implementation requires 10.550 s and 10.023 s on a single node of SP2 and T3D, respectively. The implementations were performed using the Message Passing Interface standard and are portable to other high-performance computing platforms. (C) 1998 Academic Press.
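The abstract above does not say how the initial workload estimate is turned into a partition. As a rough illustration only, the sketch below uses a swapped-in heuristic (greedy longest-processing-time assignment), not the authors' method. The names `partition_by_estimate`, `costs`, and `num_procs` are hypothetical.

```python
import heapq

def partition_by_estimate(costs, num_procs):
    """Greedy LPT partition (an assumed heuristic, not the paper's scheme).

    costs[t] is the estimated cost of task t. Tasks are taken heaviest
    first and each goes to the processor with the smallest load so far.
    Returns one list of task indices per processor.
    """
    heap = [(0, p) for p in range(num_procs)]  # (current load, processor id)
    assignment = [[] for _ in range(num_procs)]
    for task in sorted(range(len(costs)), key=lambda t: -costs[t]):
        load, p = heapq.heappop(heap)          # least-loaded processor
        assignment[p].append(task)
        heapq.heappush(heap, (load + costs[task], p))
    return assignment

costs = [5, 3, 3, 2, 2, 1]
parts = partition_by_estimate(costs, 2)
loads = [sum(costs[t] for t in part) for part in parts]
```

A static partition like this only balances the estimated costs. The asynchronous task migration described in the abstract is what would correct for estimates that turn out to be wrong at run time.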
This paper presents parallel algorithms for priority queue operations on a $p$-processor EREW-PRAM. The algorithms are based on a new data structure, the Min-path Heap (MH), which is obtained as an extension of the traditional binary-heap organization. Using an MH, it is shown that insertion of a new item or deletion of the smallest item from a priority queue of $n$ elements can be performed in $O((\log n)/p + \log\log n)$ parallel time, while construction of an MH from a set of $n$ items takes $O(n/p + \log n)$ time. The given algorithms for insertion and deletion achieve the best possible running time for any number of processors $p$ with $p \in O(\log n / \log\log n)$, while the MH construction algorithm employs up to $\Theta(n/\log n)$ processors optimally. The paper ends with a brief discussion of the applicability of MHs to the development of efficient parallel algorithms for some important combinatorial problems.
A distributed rule-based system for automatic speech recognition is described. Acoustic property extraction and feature hypothesization are performed by the application of sequences of operators. These sequences, called plans, are executed by cooperative expert programs. Experimental results on the automatic segmentation and recognition of phrases, made of connected letters and digits, are described and discussed.
A new, parallel approach for generating Bresenham-type lines is developed. Coordinate pairs which approximate straight lines on a square grid are derived from line equations. These pairs serve as a basis for the development of four new parallel algorithms. One of the algorithms uses the fact that straight line generation is equivalent to a vector prefix sums calculation. The algorithms execute on a binary tree of processors. Each node in the tree performs a simple calculation that involves only additions and shifts. All four algorithms have time complexity $O(\log_2 n)$, where $n = 2^m$ denotes the number of points generated and $n-1$ is the number of processors in the tree. This compares to $O(n)$ for Bresenham's algorithm executed on a sequential processor. Pipelining can be used to achieve a constant time per line generation as long as line length is less than $n$.
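To make the prefix-sums observation concrete, here is a minimal sketch under stated assumptions. It is not one of the paper's four algorithms. It simulates a Hillis-Steele scan (a swapped-in technique, in place of the paper's binary tree of processors) and rounds the ideal line half-up, which may break ties differently from Bresenham's algorithm. The function names are hypothetical, and the line is assumed to run from (0, 0) to (dx, dy) with 0 <= dy <= dx and dx > 0.

```python
def scan_rounds(values):
    """Inclusive prefix sums in ceil(log2 n) data-parallel rounds.

    Within one round every position is updated independently, so each
    round could run in constant time given one processor per element.
    """
    a = list(values)
    d = 1
    while d < len(a):
        a = [a[i] + a[i - d] if i >= d else a[i] for i in range(len(a))]
        d *= 2
    return a

def line_points(dx, dy):
    """Grid points approximating the line (0,0)-(dx,dy), 0 <= dy <= dx."""
    # Prefix sums of the constant increment dy give sums[i] == i * dy.
    sums = scan_rounds([0] + [dy] * dx)
    # Integer round-half-up of i * dy / dx, without floating point.
    return [(i, (2 * s + dx) // (2 * dx)) for i, s in enumerate(sums)]

points = line_points(8, 3)
```

Once the running sums are available, each y-coordinate depends only on its own sum, so the final step is also fully parallel.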
The channel routing problem of a set of two-terminal nets in the knock-knee model is considered. A new approach to route all the nets within d tracks, where d is the density, such that the corresponding layout can be realized with three layers is developed. The routing and the layer assignment algorithms run in O(log n) time with n/log n processors on the CREW PRAM model under the reasonable assumption that all terminals lie in the range [1, N], where N = O(n).
This paper presents efficient and portable implementations of a powerful image enhancement process, the Symmetric Neighborhood Filter (SNF), and an image segmentation technique that makes use of the SNF and a variant of the conventional connected components algorithm which we call delta-Connected Components. We use efficient techniques for distributing and coalescing data as well as efficient combinations of task and data parallelism. The image segmentation algorithm makes use of an efficient connected components algorithm based on a novel approach for parallel merging. The algorithms have been coded in SPLIT-C and run on a variety of platforms, including the Thinking Machines CM-5, IBM SP-1 and SP-2, Cray Research T3D, Meiko Scientific CS-2, Intel Paragon, and workstation clusters. Our experimental results are consistent with the theoretical analysis (and provide the best known execution times for segmentation, even when compared with machine-specific implementations). Our test data include difficult images from the Landsat Thematic Mapper (TM) satellite data.
The main results of this paper are efficient parallel algorithms, MSP and LOCATE, for computing minimal spanning trees and locating minimal paths in directed graphs, respectively. Algorithm MSP has time complexity $O(\log^3 n)$ using $O(n^3/\log n)$ processors, while LOCATE has time complexity $O(\log n)$ using $O(n^2)$ processors. Algorithm MSP is derived from sequential algorithms, when the unbounded parallelism model is used.
Tensor factorization has proven useful in a wide range of applications, from sensor array processing to communications, speech and audio signal processing, and machine learning. With few recent exceptions, all tensor factorization algorithms were originally developed for centralized, in-memory computation on a single machine; and the few that break away from this mold do not easily incorporate practically important constraints, such as non-negativity. A new constrained tensor factorization framework is proposed in this paper, building upon the Alternating Direction Method of Multipliers (ADMoM). It is shown that this simplifies computations, bypassing the need to solve constrained optimization problems in each iteration; and it naturally leads to distributed algorithms suitable for parallel implementation. This opens the door for many emerging big data-enabled applications. The methodology is exemplified using non-negativity as a baseline constraint, but the proposed framework can incorporate many other types of constraints. Numerical experiments are encouraging, indicating that ADMoM-based non-negative tensor factorization (NTF) has high potential as an alternative to state-of-the-art approaches.
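As a hedged illustration of how this kind of splitting avoids a constrained solve in each iteration, consider the generic scaled-form updates for a single non-negative least-squares subproblem, $\min_{x \ge 0} \tfrac{1}{2}\|Ax - b\|_2^2$. This is the textbook form and not necessarily the paper's exact formulation. The symbols $A$, $b$, $x$, the auxiliary copy $z$, the scaled dual variable $u$, and the penalty $\rho > 0$ are assumptions introduced here.

```latex
\begin{aligned}
x^{k+1} &= \left(A^{\top}A + \rho I\right)^{-1}\left(A^{\top}b + \rho\,(z^{k} - u^{k})\right)
  && \text{(unconstrained least squares)} \\
z^{k+1} &= \max\!\left(0,\; x^{k+1} + u^{k}\right)
  && \text{(element-wise projection onto } z \ge 0\text{)} \\
u^{k+1} &= u^{k} + x^{k+1} - z^{k+1}
  && \text{(dual update)}
\end{aligned}
```

The constraint appears only in the $z$-update, which is an element-wise clamp. The $x$-update is an ordinary linear solve whose matrix $A^{\top}A + \rho I$ does not change across iterations, so a factorization can be reused.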