Many avionics systems use specialized parallel architectures to speed processing and to increase system reliability. The software used therein is frequently divided into tasks and executed concurrently on multiple processors under strict real-time constraints critical to the mission's successful performance. Scheduling and planning are needed to manage the computational resources of such avionics architectures effectively. Since most real-time scheduling problems are known to be NP-hard, an approximation approach that applies heuristic methods using conventional computer algorithms has been used to solve them. Artificial intelligence (AI) planners have been used extensively in manufacturing scheduling and operations research. In this paper, we demonstrate through an example the idea of using AI planners to perform scheduling. We derive a solution to scheduling several image tasks on a distributed computer system using the AI planner PRODIGY. The basic characteristics of AI planners in general and the PRODIGY solver in particular are described, and the domain theory and problem specification for our problem are presented in the PRODIGY description language PDL.
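As a point of reference for the heuristic approach this abstract contrasts with AI planning, the sketch below shows a simple greedy list-scheduling rule (longest task first, onto the least-loaded processor). It is an illustration only: the task names, durations, and processor count are hypothetical, and PRODIGY itself searches a declarative domain theory written in PDL rather than applying a fixed rule like this.

```python
# Minimal sketch of a conventional greedy list-scheduling heuristic of the
# kind used for NP-hard multiprocessor scheduling. Task set, durations, and
# processor count are hypothetical; this is not the paper's PRODIGY method.
import heapq

def list_schedule(tasks, num_procs):
    """Assign (name, duration) tasks to processors, longest task first."""
    # Each heap entry is (current finish time, processor id).
    procs = [(0.0, p) for p in range(num_procs)]
    heapq.heapify(procs)
    schedule = {}
    for name, duration in sorted(tasks, key=lambda t: -t[1]):
        finish, p = heapq.heappop(procs)           # least-loaded processor
        schedule[name] = (p, finish, finish + duration)
        heapq.heappush(procs, (finish + duration, p))
    return schedule

if __name__ == "__main__":
    tasks = [("filter", 4.0), ("segment", 2.5), ("classify", 1.5), ("io", 1.0)]
    for name, (p, start, end) in list_schedule(tasks, 2).items():
        print(f"{name}: proc {p}, {start:.1f}-{end:.1f}")
```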
Degree level: Ph.D., Doctor of Philosophy, Ph.D./Interdisciplinary
Object-driven dataflow (ODF) is a methodology that provides a system for building applications on heterogeneous multiprocessing systems or networks that support multiple parallel processing topologies in a hierarchical structure. Its distinguishing features are driven by the generalization of macro-dataflow computing to heterogeneous systems and by a focus on large-system development through object-oriented computing. Natural heterogeneous processing is examined to gain insight into the principles and conceptual models of heterogeneous processing, which are integrated into ODF. ODF was developed to be applicable to both research and product development in the area of image processing. To enable this, it must provide load-time (not compile-time) adaptability without sacrificing performance. The intent is to take advantage of the inherent parallelism of dataflow computing at a high level, while allowing individual tasks to be developed with traditional imperative methods if desired. ODF determines the execution flow at the task level, in addition to providing the communication framework for intra-task communication. Dynamic binding is used to initiate the appropriate code on each of the nodes given the availability of the different types of nodes, the object types, and the specific application. The objects are commonly decomposed to the grain size appropriate to a given task on the available processing node.
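The role dynamic binding plays here can be illustrated with a small sketch. Everything below is hypothetical (the dissertation's actual interfaces are not reproduced): a registry maps (task type, node type) pairs to implementations, and at load time each task is bound to whichever implementation matches an available node type.

```python
# Hypothetical sketch of ODF-style load-time binding: a task is bound to an
# implementation chosen from the node types available when the application
# starts, not at compile time. All names are illustrative.
IMPLEMENTATIONS = {}  # (task type, node type) -> callable

def register(task_type, node_type):
    def deco(fn):
        IMPLEMENTATIONS[(task_type, node_type)] = fn
        return fn
    return deco

@register("convolve", "vector_node")
def convolve_vector(data):
    return f"vectorized convolution of {data}"

@register("convolve", "scalar_node")
def convolve_scalar(data):
    return f"scalar convolution of {data}"

def bind(task_type, available_nodes):
    """Pick the first available node type with an implementation (load time)."""
    for node in available_nodes:
        impl = IMPLEMENTATIONS.get((task_type, node))
        if impl:
            return impl
    raise LookupError(f"no implementation of {task_type} for {available_nodes}")

if __name__ == "__main__":
    task = bind("convolve", ["vector_node", "scalar_node"])
    print(task("image block 0"))
```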
ISBN (print): 0818664274
An edge detection process in computer vision and image processing detects various types of significant features that appear as discontinuities in intensity. This paper presents our experience with parallelizing an edge detection algorithm that reduces noise and unnecessary detail in a gray-scale image from a coarse level to a fine level of resolution by using an edge focusing technique. Numerical methods and parallel implementations of edge focusing are presented. The edge detection algorithms are implemented on three representative message-passing architectures: a low-cost heterogeneous PVM network, an Intel iPSC/860 hypercube, and a CM-5 massively parallel multicomputer. Our objectives are to provide insight into implementation and performance issues for image-processing applications on general-purpose message-passing architectures, to investigate the implications of network variations, and to evaluate the computing scalability of the three network systems by examining the execution and communication patterns of the edge detection application.
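For readers unfamiliar with edge focusing, the following is a minimal single-process sketch of the coarse-to-fine idea, assuming NumPy and SciPy; the paper's specific numerical scheme and its message-passing decomposition are not reproduced. Edges found at a coarse (heavily smoothed) scale gate the search at each finer scale, which is what suppresses noise and unnecessary detail.

```python
# Sketch of coarse-to-fine edge focusing: gradient-magnitude edges at each
# finer scale are kept only near edges surviving from the coarser scale.
import numpy as np
from scipy import ndimage

def edge_focus(image, sigmas=(8.0, 4.0, 2.0, 1.0), thresh=0.1):
    mask = np.ones_like(image, dtype=bool)       # candidate edge region
    for sigma in sigmas:                         # coarse -> fine
        grad = ndimage.gaussian_gradient_magnitude(image, sigma)
        edges = grad > thresh * grad.max()
        # keep fine-scale edges only near the coarser-scale edge region
        mask = edges & ndimage.binary_dilation(mask, iterations=3)
    return mask

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = np.zeros((128, 128)); img[:, 64:] = 1.0      # step edge
    img += 0.05 * rng.standard_normal(img.shape)       # noise
    print("edge pixels:", int(edge_focus(img).sum()))
```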
The problem considered in this paper is the definition of an efficient parallel algorithm for texture analysis of an image. The target architectures are distributed-memory general-purpose MIMD parallel machines. The solutions proposed here are based on two different methods: the Statistical Feature Matrix and the Wavelet Decomposition.
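A typical data decomposition for texture analysis on a distributed-memory MIMD machine splits the image into strips, computes features on each strip independently, and gathers the results. The sketch below illustrates that pattern in plain NumPy, with local variance standing in for the Statistical Feature Matrix and wavelet features, which are not reproduced here; a real implementation would also exchange halo rows so window statistics match at strip boundaries.

```python
# Scatter row strips to "nodes", compute a per-pixel texture statistic on
# each strip, then gather. Local variance is a stand-in feature.
import numpy as np

def local_variance(strip, win=7):
    pad = win // 2
    padded = np.pad(strip, pad, mode="reflect")
    out = np.empty(strip.shape, dtype=float)
    for i in range(strip.shape[0]):
        for j in range(strip.shape[1]):
            out[i, j] = padded[i:i + win, j:j + win].var()
    return out

def parallel_texture(image, num_nodes=4):
    strips = np.array_split(image, num_nodes, axis=0)   # scatter
    features = [local_variance(s) for s in strips]      # each node works
    return np.vstack(features)                          # gather

if __name__ == "__main__":
    img = np.random.default_rng(1).random((64, 64))
    print(parallel_texture(img).shape)
```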
The Discrete Wavelet Transform (DWT) is becoming a widely used tool in image processing and other data analysis areas. A non-conventional variation of a spatiotemporal 3D DWT has been developed in order to analyze motion in time-sequential imagery. The computational complexity of this algorithm is Θ(n³), where n is the number of samples in each dimension of the input image sequence. Methods are therefore needed to increase the speed of these computations for large data sets. Fortunately, wavelet decomposition is very amenable to parallelization. Coarse-grained parallel versions of this process have been designed and implemented on three different architectures: a distributed network of Sun SPARCstation 2 workstations; two Intel hypercubes (an iPSC/2 and an iPSC/860); and a Thinking Machines Corporation CM-5, a massively parallel SPMD machine. This non-conventional 3D wavelet decomposition is very suitable for coarse-grain implementation on parallel computers with proper load balancing. Close to linear speedup over serial implementations has been achieved on the distributed network, and near-linear speedup was obtained on the hypercubes and the CM-5 for a variety of image-processing applications.
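As background on the building block involved, the sketch below computes one level of a separable 3D Haar DWT in NumPy (the paper's non-conventional spatio-temporal variant is not reproduced). Each axis pass averages and differences pairs of samples, which is where the Θ(n³) work per level comes from; in a coarse-grained parallel version, slabs of the volume would be distributed across processors.

```python
# One level of a separable 3D Haar DWT: a filter pass along each axis
# splits the data into approximation (lo) and detail (hi) halves.
import numpy as np

def haar_axis(x, axis):
    """One Haar step along one axis: (approx, detail) halves, concatenated."""
    x = np.moveaxis(x, axis, 0)
    lo = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    hi = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return np.moveaxis(np.concatenate([lo, hi]), 0, axis)

def haar3d_level(volume):
    for axis in range(3):          # x, y, then t
        volume = haar_axis(volume, axis)
    return volume

if __name__ == "__main__":
    vol = np.random.default_rng(2).random((16, 16, 16))
    coeffs = haar3d_level(vol)
    print(coeffs.shape)            # (16, 16, 16): 8 subbands in one array
```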
Segmentation and other image processing operations rely on convolution calculations with heavy computational and memory-access demands. The article presents an analysis of a texture segmentation application containing a 96×96 convolution. Sequential execution required several hours on single-processor systems, with over 99% of the time spent performing the large convolution; 70% to 75% of the execution time is attributable to cache misses within the convolution. We implemented the same application on CM-5, iPSC/860, and PVM distributed-memory multicomputers, tailoring the parallel algorithms to each machine's architecture. Parallelization significantly reduced execution time, taking 49 seconds on a 512-node CM-5 and 6.5 minutes on a 32-node iPSC/860. The results indicate that, for large-kernel convolutions, the size and bandwidth of the fast memory store are more important than processor power or communication overhead.
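The standard way to parallelize such a large-kernel convolution on a distributed-memory machine is to split the image into row strips, give each strip a halo of kernel_height // 2 rows from its neighbors, convolve locally, and stitch the results. The sketch below simulates that decomposition in a single process, with SciPy's FFT-based convolution standing in for each node's local computation; the paper's machine-specific tailoring is not reproduced.

```python
# Strip decomposition with halo rows for a large-kernel 2D convolution.
# Each "node" convolves its strip plus halo; stitched output matches the
# monolithic result.
import numpy as np
from scipy.signal import fftconvolve

def strip_convolve(image, kernel, num_nodes=4):
    halo = kernel.shape[0] // 2
    rows = np.array_split(np.arange(image.shape[0]), num_nodes)
    pieces = []
    for idx in rows:
        lo = max(idx[0] - halo, 0)             # halo rows from neighbor above
        hi = min(idx[-1] + halo + 1, image.shape[0])
        local = fftconvolve(image[lo:hi], kernel, mode="same")
        pieces.append(local[idx[0] - lo : idx[0] - lo + len(idx)])
    return np.vstack(pieces)

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    img, k = rng.random((256, 256)), rng.random((96, 96))
    ref = fftconvolve(img, k, mode="same")
    print(np.allclose(strip_convolve(img, k), ref))   # True
```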