Data parallel visual reconstruction and partitioning algorithms and the associated code are developed for a vector random access machine (V-RAM). Finite element algorithms are constructed for solving the one-dimensional visual reconstruction problem, with the input data consisting of a symmetrical top hat loading that models interacting step discontinuities. The advantage of the V-RAM implementation is that the code applies to a variety of architectures. A specific implementation is performed on a Distributed Array Processor (DAP) simulator on the VAX 6000–420. Execution times on the DAP simulator are obtained and are found to agree with the algorithmic complexities of the V-RAM code.
A distributed arithmetic implementation of two-dimensional FIR digital filters for real-time image processing is presented. Parallelism and pipelining are two features of the proposed filter structure that contribute to its high-speed performance. Speed performance and hardware complexity are evaluated, and the effects of finite-precision arithmetic are considered.
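As a rough illustration of the distributed arithmetic idea named in this abstract, the sketch below computes a single inner product of fixed integer coefficients with signed samples using only table lookups, shifts, and additions. It is not the paper's filter structure: it is one-dimensional, it ignores pipelining and finite-precision effects, and the names `build_lut`, `da_inner_product`, and the `bits` parameter are hypothetical.

```python
def build_lut(coeffs):
    """Precompute every possible partial sum of the coefficients.

    Bit k of the table address selects whether coeffs[k] is included.
    """
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_inner_product(coeffs, samples, bits=8):
    """Return sum(coeffs[k] * samples[k]) without any multiplication.

    Assumes integer coefficients and samples that fit in `bits`-bit
    two's complement (hypothetical word length for this sketch).
    """
    lut = build_lut(coeffs)
    mask = (1 << bits) - 1
    words = [s & mask for s in samples]  # two's-complement encoding
    acc = 0
    for b in range(bits):
        # Gather bit b of every sample into one table address.
        addr = 0
        for k, w in enumerate(words):
            addr |= ((w >> b) & 1) << k
        partial = lut[addr] << b
        # The most significant bit has negative weight in two's complement.
        acc += -partial if b == bits - 1 else partial
    return acc
```

A two-dimensional filter output is also an inner product of a coefficient window with a pixel window, so the same lookup scheme applies in principle once the window is flattened, although the table size grows exponentially with the number of taps.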
The authors present a general approach to self-scheduling a non-uniform parallel loop on a distributed-memory machine. The approach has two phases: a static scheduling phase and a dynamic scheduling phase. In addition to reducing scheduling overhead, the static scheduling phase allows the data needed by the statically scheduled iterations to be prefetched. The dynamic scheduling phase balances the workload. Data distribution methods for self-scheduling are also a focus of this paper. The authors classify the data distribution methods into four categories and present partial duplication, a method that allows the problem size to grow linearly in the number of processors. Experiments conducted on a 64-node NCUBE show that as much as 79% improvement is achieved over static scheduling on the generation of a false-color image.
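The two-phase idea in this abstract can be sketched as a small simulation. This is only an illustration under simplifying assumptions: it ignores communication, prefetching, and scheduling overhead, and the function name and the `static_fraction` and `chunk` parameters are hypothetical rather than taken from the paper.

```python
import heapq


def two_phase_schedule(costs, num_procs, static_fraction=0.5, chunk=1):
    """Simulate a static phase followed by a dynamic self-scheduling phase.

    costs[i] is the running time of loop iteration i.
    Returns the finish time of each processor.
    """
    n_static = int(len(costs) * static_fraction)
    finish = [0] * num_procs

    # Static phase: contiguous blocks of iterations are assigned up front.
    for p in range(num_procs):
        lo = p * n_static // num_procs
        hi = (p + 1) * n_static // num_procs
        finish[p] = sum(costs[lo:hi])

    # Dynamic phase: the processor that becomes idle first takes the next chunk.
    idle = [(t, p) for p, t in enumerate(finish)]
    heapq.heapify(idle)
    for start in range(n_static, len(costs), chunk):
        t, p = heapq.heappop(idle)
        t += sum(costs[start:start + chunk])
        finish[p] = t
        heapq.heappush(idle, (t, p))
    return finish
```

For example, with `costs = [1, 1, 1, 1, 10, 1, 1, 1]` on two processors, a purely static split (`static_fraction=1.0`) leaves one processor with all of the expensive work, while the two-phase schedule lets the other processor pick up the remaining cheap iterations.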
Parallel evolution strategies are proving worthwhile in a variety of contexts. In this paper, besides the classical genetic and evolutionary strategies, a hybrid evolutionary approach that incorporates memory of the search history within the structure is analyzed. The parallel evolution algorithms are mapped onto a distributed-memory MIMD multicomputer whose processors are configured in a torus topology. The simulations are conducted using the quadratic assignment problem as an artificial environment. The relationship between genetic representations and recombination operators is investigated. The experimental results show the value of structures richer than bit strings and the effectiveness of memory for the evolution process.
A solution to the partitioning problem is presented for a class of data parallel algorithms (including, for example, explicit difference methods for time-dependent PDEs and image processing algorithms based on local filters). Conditions are formulated that characterize the optimal partitioning. From them, an explicit formula for the optimal partitioning is derived, which is valid in special cases. For the general case, the conditions provide a basis for the formulation of iterative partitioning algorithms. One such algorithm is proposed. The partitioning algorithm is intended as a tool to be used in utility routines or, ultimately, compilers, to enhance SPMD programming of MIMD-type computers with distributed memory. Results from an application in image analysis show that the algorithm is suitable for this purpose.
Constraints of the form $\sum_{n=0}^{N-1} P_n = 1$ and $P_n \in [0, 1]$, i.e., unit-simplex constraints, arise frequently in nonlinear neural optimization problems (e.g., traveling salesman, graph matching, and fuzzy membership). Current methods for incorporating such constraints suffer from one or more of the following disadvantages: (i) they do not strictly confine the search to the unit simplex, with the result that it proceeds over a space of unnecessarily high dimension and can converge to inconsistent solutions; (ii) they may introduce unwanted additional constraints; (iii) they may possess numerical instabilities; and (iv) they induce nonlocal connection patterns, which is disadvantageous from an implementation standpoint. In this paper we present a stable deterministic approach for incorporating unit-simplex constraints based on a hierarchical deformable-template structure. This approach (i) guarantees strict confinement of the search to the unit-simplex constraint set without introducing unwanted constraints; (ii) leads to a hierarchical, rather than a global, network interconnection structure; (iii) allows multiresolution processing; and (iv) allows easy closed-form incorporation of certain other inherently global constraints, such as general recursive symmetries. Selected examples are presented which illustrate and demonstrate large-scale application of the template method.
ISBN: 0818627751 (print)
This conference proceedings contains 70 papers. The topics discussed are image processing applications; performance analysis of parallel and multiprocessing programs; applications of parallel architectures to simulation and real-time control; parallel languages; network-attached storage systems; parallel algorithms; systems support for languages; molecular dynamics; software tools; object-oriented programming; computational fluid dynamics; parallel computer architectures; load balancing; computational methods; interprocessor communication; distributed-memory multicomputers; compilers; conversions of sequential to parallel programs; parallel program applications; debugging; hypercubes; irregular problems; and systems issues.
ISBN: 0819409391 (print)
In most instances the boundaries between textured regions are defined by the gray level contrasts which result from the local interaction between the texture elements in each region. In such cases, the boundaries can be accurately characterized by gray level edge segments. Using these edge segments to localize the texture boundary directly addresses the major problem associated with texture segmentation, namely the conflict between localization accuracy and classification accuracy. The accuracy of segmentation methods that rely only on spatially distributed properties to characterize the texture is limited to the spatial extent of the property used. In contrast, gray level edges are significantly more localized. However, before they can be of any use, the gray level edge segments defining the texture boundary must be isolated from the edges defining the texture elements. In this paper, we define a set of properties to do this. We also incorporate these properties into a parallel distributed algorithm which is used to segment a set of sample texture images.
This thesis describes the partitioning of the linear image filtering problem for multiple DSP systems interconnected by slotted ring networks. An analysis of various partitioning methods is presented. The block parallel transform method, based on overlap-and-save processing in the frequency domain, is identified as an effective algorithm for image filtering with multiple processors. In this algorithm, image blocks are distributed to separate processors where filtering operations occur. The slotted ring architecture is shown to effectively support the block parallel transform method. Other architectures are identified which can provide comparable performance. An implementation of the block parallel transform method on a representative slotted ring system is described. Processing times correspond well with the analytical model, and speed increases are roughly proportional to the number of processors used. A system with four processors filtered a 128 x 128 image 6 to 7 times faster than PC-based software.
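The overlap-and-save processing this abstract refers to can be illustrated with a minimal one-dimensional sketch. It is not the thesis implementation: it works on a 1-D signal rather than an image, it computes each block's circular convolution directly where a real system would multiply FFTs, and the function names and the `block_len` parameter are hypothetical.

```python
def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences (stand-in for FFT multiply)."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]


def direct_fir(x, h):
    """Reference causal FIR filter: y[i] = sum_k h[k] * x[i - k]."""
    return [sum(h[k] * x[i - k] for k in range(len(h)) if i - k >= 0)
            for i in range(len(x))]


def overlap_save(x, h, block_len):
    """Filter x with h block by block; each block could go to its own processor."""
    m = len(h)
    n = block_len + m - 1                # segment length handled per block
    h_padded = list(h) + [0] * (n - m)
    padded = [0] * (m - 1) + list(x)     # zero history before the first sample
    out = []
    for start in range(0, len(x), block_len):
        seg = padded[start:start + n]
        seg += [0] * (n - len(seg))      # pad the final, possibly short, segment
        y = circular_convolve(seg, h_padded)
        out.extend(y[m - 1:])            # first m-1 outputs are wrap-around garbage
    return out[:len(x)]
```

Because each segment carries the `m - 1` samples of history it needs, the blocks do not depend on one another's results, which is what makes distributing them to separate processors straightforward.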
Several methods for parallel affine image warping on a linear processor array are considered. The methods were implemented on the Carnegie Mellon Warp machine and the Carnegie Mellon-Intel Corporation iWarp computer (treated as a linear array), and performance figures are provided. Both systolic methods, which feed one of the images in a stream, and non-systolic methods, which partition both images, are treated. A scanline method that combines some of the features of both, but which requires a fast transposed method, is also described. The authors articulate three characteristics that affect the design of parallel image warping algorithms: affine warping is easily invertible, the mapping is known at the start of execution, and nearby input pixels map to nearby output pixels. The authors conclude that non-systolic methods give slightly better execution time and are easier to program than systolic methods but require much larger processor memories.
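The first two characteristics listed in this abstract (the warp is easily invertible and the mapping is known up front) are what make inverse-mapped warping practical. The sketch below is a plain sequential illustration with nearest-neighbor sampling, not any of the systolic or non-systolic methods from the paper; the function names and the parameter layout `(a, b, c, d, tx, ty)` are assumptions made for this example.

```python
def invert_affine(a, b, c, d, tx, ty):
    """Invert the map x' = a*x + b*y + tx, y' = c*x + d*y + ty."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("affine map is not invertible")
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    itx = -(ia * tx + ib * ty)
    ity = -(ic * tx + id_ * ty)
    return ia, ib, ic, id_, itx, ity


def warp_affine(src, params, fill=0):
    """Warp a 2-D list image by inverse mapping with nearest-neighbor sampling."""
    h, w = len(src), len(src[0])
    ia, ib, ic, id_, itx, ity = invert_affine(*params)
    out = [[fill] * w for _ in range(h)]
    # Each output row depends only on the (read-only) source image,
    # so rows could be divided among processors.
    for yo in range(h):
        for xo in range(w):
            xs = round(ia * xo + ib * yo + itx)
            ys = round(ic * xo + id_ * yo + ity)
            if 0 <= xs < w and 0 <= ys < h:
                out[yo][xo] = src[ys][xs]
    return out
```

For instance, `warp_affine([[1, 2, 3], [4, 5, 6]], (1, 0, 0, 1, 1, 0))` shifts the image one pixel to the right and fills the vacated column with `fill`.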