The authors present a general approach to self-scheduling a non-uniform parallel loop on a distributed-memory machine. The approach has two phases: a static scheduling phase and a dynamic scheduling phase. In addition to reducing scheduling overhead, the static scheduling phase allows the data needed by the statically scheduled iterations to be prefetched. The dynamic scheduling phase balances the workload. Data distribution methods for self-scheduling are also a focus of this paper. The authors classify the data distribution methods into four categories and present partial duplication, a method that allows the problem size to grow linearly with the number of processors. Experiments conducted on a 64-node NCUBE show an improvement of as much as 79% over static scheduling on the generation of a false-color image.
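The two-phase idea can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' scheduler: the function name `two_phase_schedule`, the `static_fraction` parameter, the contiguous-block static assignment, and the one-iteration-at-a-time dynamic phase are all hypothetical choices made for illustration, and communication and prefetching costs are ignored.

```python
import heapq


def two_phase_schedule(costs, num_workers, static_fraction=0.5):
    """Simulate a static phase followed by a self-scheduled dynamic phase.

    costs[i] is the (assumed known, for simulation only) run time of iteration i.
    Returns (makespan, assignment), where assignment[i] is the worker index.
    """
    n = len(costs)
    n_static = int(n * static_fraction)
    finish = [0.0] * num_workers
    assignment = [None] * n

    # Static phase: hand out contiguous blocks up front, so each worker
    # knows in advance which iterations (and which data) it will need.
    block = -(-n_static // num_workers) if n_static else 0  # ceiling division
    for i in range(n_static):
        w = i // block
        finish[w] += costs[i]
        assignment[i] = w

    # Dynamic phase: the worker that becomes idle first takes the next iteration.
    heap = [(finish[w], w) for w in range(num_workers)]
    heapq.heapify(heap)
    for i in range(n_static, n):
        t, w = heapq.heappop(heap)
        assignment[i] = w
        finish[w] = t + costs[i]
        heapq.heappush(heap, (finish[w], w))

    return max(finish), assignment


if __name__ == "__main__":
    costs = [1, 1, 1, 1, 5, 1, 1, 1]  # one expensive iteration
    print(two_phase_schedule(costs, 2, static_fraction=0.5))
    print(two_phase_schedule(costs, 2, static_fraction=1.0))
```

Setting `static_fraction=1.0` reduces the simulation to purely static block scheduling, which makes it easy to compare the two regimes on the same cost vector.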
Parallel evolution strategies are proving to be worthwhile in a variety of contexts. In this paper, besides the classical genetic and evolutionary strategies, a hybrid evolutionary approach that incorporates memory of the search history within the structure is analyzed. The parallel evolution algorithms are mapped onto a distributed-memory MIMD multicomputer whose processors are configured in a torus topology. The simulations are conducted using the quadratic assignment problem as an artificial environment. The relationship between genetic representations and recombination operators is investigated. The experimental results show the value of structures richer than bit strings and the effectiveness of memory for the evolution process.
A solution to the partitioning problem is presented for a class of data parallel algorithms (including, for example, explicit difference methods for time-dependent PDEs and image processing algorithms based on local filters). Conditions that characterize the optimal partitioning are formulated. From them, an explicit formula for the optimal partitioning is derived, which is valid in special cases. For the general case, the conditions provide a basis for the formulation of iterative partitioning algorithms. One such algorithm is proposed. The partitioning algorithm is intended as a tool to be used in utility routines or, ultimately, compilers, to enhance SPMD programming of MIMD-type computers with distributed memory. Results from an application in image analysis show that the algorithm is suitable for this purpose.
Development of hearing machines with brain-like performance can be approached by faithfully modeling the human auditory nervous system. As an initial step in this effort, a composite cochlear model has been built which processes acoustic signals in a manner consistent with physiological responses recorded from the mammalian auditory nerve. The model integrates the current state of knowledge about cochlear function and provides a coherent picture of the representation of acoustic signals in the pattern of neural impulses distributed across a tonotopically organized array of auditory-nerve fibers. The model explicitly represents the active electro-mechanical responses of outer hair cells as a possible mechanism for adaptive control of basilar membrane damping, and introduces lateral coupling of resistive elements in the cochlear mechanics based on this mechanism. The output of the model is represented as an acoustic image that is transferred to the central auditory nervous system in a massively parallel fashion over the auditory nerve. The impact of the cochlear nonlinearity on this image is explored. Experiments applying this model to speech analysis show advantages of the cochlear modeling approach over traditional linear analysis methods.
ISBN: 0818627751 (print)
These conference proceedings contain 70 papers. The topics discussed are image processing applications; performance analysis of parallel and multiprocessing programs; applications of parallel architectures to simulation and real-time control; parallel languages; network-attached storage systems; parallel algorithms; systems support for languages; molecular dynamics; software tools; object-oriented programming; computational fluid dynamics; parallel computer architectures; load balancing; computational methods; interprocessor communication; distributed-memory multicomputers; compilers; conversions of sequential to parallel programs; parallel program applications; debugging; hypercubes; irregular problems; and systems issues.
ISBN: 0819409391 (print)
In most instances the boundaries between textured regions are defined by the gray level contrasts which result from the local interaction between the texture elements in each region. In such cases, the boundaries can be accurately characterized by gray level edge segments. Using these edge segments to localize the texture boundary directly addresses the major problem associated with texture segmentation, namely the conflict between localization accuracy and classification accuracy. The accuracy of segmentation methods which rely only on spatially distributed properties to characterize the texture is limited to the spatial extent of the property used. In contrast, gray level edges are significantly more localized. However, before they can be of any use, the gray level edge segments defining the texture boundary must be isolated from the edges defining the texture elements. In this paper, we define a set of properties to do this. We also incorporate these properties into a parallel distributed algorithm which is used to segment a set of sample texture images.
This thesis describes the partitioning of the linear image filtering problem for multiple DSP systems interconnected by slotted ring networks. An analysis of various partitioning methods is presented. The block parallel transform method, based on overlap-and-save processing in the frequency domain, is identified as an effective algorithm for image filtering with multiple processors. In this algorithm, image blocks are distributed to separate processors where filtering operations occur. The slotted ring architecture is shown to effectively support the block parallel transform method. Other architectures are identified which can provide comparable performance. An implementation of the block parallel transform method on a representative slotted ring system is described. Processing times correspond well with the analytical model, and speed increases are roughly proportional to the number of processors used. A system with four processors filtered a 128 x 128 image 6 to 7 times faster than PC-based software.
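Overlap-and-save processing, on which the block parallel transform method is based, can be sketched in one dimension. The code below is an illustration under stated assumptions rather than the thesis implementation: it filters a 1-D signal instead of a 2-D image, uses a naive O(n^2) DFT written with `cmath` in place of a fast transform, and processes blocks in a sequential loop where a multiprocessor system would send each block to a separate processor. The names `dft`, `overlap_save`, and `block_len` are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def overlap_save(signal, kernel, block_len=8):
    """Causal FIR filtering of `signal` by `kernel` using overlap-and-save blocks."""
    m = len(kernel)
    step = block_len - (m - 1)  # new output samples produced per block
    if step <= 0:
        raise ValueError("block_len must exceed len(kernel) - 1")

    # Kernel spectrum is computed once and shared by all blocks.
    h_freq = dft(list(kernel) + [0] * (block_len - m))

    # Prepend m-1 zeros (history for the first block) and pad the tail so
    # that the last block is full.
    padded = [0] * (m - 1) + list(signal) + [0] * (-len(signal) % step)

    out = []
    for start in range(0, len(padded) - (m - 1), step):
        # Each block overlaps the previous one by m-1 samples and is
        # independent of the others, so it could be filtered elsewhere.
        block = padded[start:start + block_len]
        b_freq = dft(block)
        y = dft([a * b for a, b in zip(b_freq, h_freq)], inverse=True)
        # The first m-1 samples are corrupted by circular wrap-around: discard.
        out.extend(v.real for v in y[m - 1:])
    return out[:len(signal)]


if __name__ == "__main__":
    print(overlap_save([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [1, 0, -1]))
```

Because every block carries its own overlap region, no data needs to be exchanged between blocks during filtering, which is the property that makes the method attractive for distribution across processors.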
The authors present a parallel storage scheme to distribute the elements of an N*N matrix over N memory banks, where N is any (odd or even) power of two, such that rows, columns, forward and backward diagonals, and square or rectangular blocks can be accessed simultaneously without memory conflict. They present a simple scheme for address generation, which requires only logic operations and can be completed in constant time. They present two network implementation methods for data alignments for this storage scheme. Unlike previously proposed routing algorithms, the algorithms for hypercube routing in this paper are free from network conflict. They do not require buffering and the duration of a 'step' is shorter; they are therefore more efficient in terms of both hardware cost and speed. The authors also present a simple MIN implementation scheme for the realization of the data alignments. Schemes for processing smaller matrices efficiently on larger-scale systems are also developed.
Several methods for parallel affine image warping on a linear processor array are considered. The methods were implemented on the Carnegie Mellon Warp machine and the Carnegie Mellon-Intel Corporation iWarp computer (treated as a linear array), and performance figures are provided. Both systolic methods, which feed one of the images in a stream, and non-systolic methods, which partition both images, are treated. A scanline method that combines some of the features of both, but which requires a fast transposed method, is also described. The authors articulate three characteristics that affect the design of parallel image warping algorithms: affine warping is easily invertible, the mapping is known at the start of execution, and nearby input pixels map to nearby output pixels. The authors conclude that non-systolic methods give slightly better execution time and are easier to program than systolic methods but require much larger processor memories.
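The first two characteristics (the warp is easily invertible, and the mapping is known before execution) are what make inverse-mapping implementations straightforward. The following is a minimal sequential sketch, not any of the Warp or iWarp methods from the paper: it assumes the forward map x' = a*x + b*y + tx, y' = c*x + d*y + ty, nearest-neighbor sampling, and an output image of the same size as the input. The names `invert_affine` and `warp_affine` are hypothetical.

```python
def invert_affine(a, b, c, d, tx, ty):
    """Return the parameters of the inverse of x' = a*x + b*y + tx, y' = c*x + d*y + ty."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("affine map is not invertible")
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    # Inverse translation is -A^{-1} t.
    itx = -(ia * tx + ib * ty)
    ity = -(ic * tx + id_ * ty)
    return ia, ib, ic, id_, itx, ity


def warp_affine(image, params, fill=0):
    """Warp a 2-D list `image` by the forward affine `params` using inverse mapping."""
    h, w = len(image), len(image[0])
    # The inverse is computed once, before any pixel is touched.
    ia, ib, ic, id_, itx, ity = invert_affine(*params)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):  # output rows are independent of one another
        for x in range(w):
            sx = round(ia * x + ib * y + itx)  # nearest-neighbor source column
            sy = round(ic * x + id_ * y + ity)  # nearest-neighbor source row
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = image[sy][sx]
    return out


if __name__ == "__main__":
    img = [[1, 2, 3],
           [4, 5, 6]]
    # Identity matrix with a translation of one pixel to the right.
    print(warp_affine(img, (1, 0, 0, 1, 1, 0)))
```

Since each output pixel depends only on a small neighborhood of the input (the third characteristic), the outer loop over output rows could in principle be split across processors, at the cost of each processor holding the input region its rows map back to.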
This paper presents a system for dynamic intelligent scheduling and control of reconfigurable parallel processors. The purpose of the system is to provide a rapid prototyping capability for computer vision/image processing tasks. The scheduler particularly addresses the problems of algorithms with execution times that depend on the image data and processing scenarios that vary dynamically based on the input image. Since conventional scheduling methods cannot produce schedules for most tasks of this type, a dynamic controller is used to schedule the task and reconfigure the machine "on the fly." This dynamic scheduling system attempts to balance the overall processing scenario with the needs of the individual routines that make up the task. This paper discusses the implementation of the DISC (Dynamic Intelligent Scheduling and Control) system. Emphasis is on the scheduling heuristics as they apply to a reconfigurable parallel processor, the information in the system database, and the use of the system for prototyping computer vision/image processing tasks on a partitionable parallel system.