ISBN: 0769509908 (print)
The binary-swap and the parallel-pipelined methods are two popular image composition methods for volume rendering on distributed memory multicomputers. However, these methods either restrict the number of processors to a power of two or require many steps to transform image data, which results in high communication overheads. In this paper, we present an efficient image composition method, rotate-tiling (RT), for parallel volume rendering on distributed memory multicomputers. The RT method can fully utilize all available processors and minimize the communication overheads. In addition, we provide a data compression method, template run-length encoding (TRLE), to further reduce the size of the communicated data. To evaluate the performance of the RT method, we compare it with the binary-swap method and the parallel-pipelined method through both theoretical analysis and experimental tests. In the theoretical analysis, we derive the best performance bound of the RT method in terms of the startup time, the data transmission time, the number of processors, and the number of initial blocks of a sub-image. In the experimental tests, we implemented the three methods on an SP2 parallel machine and used three volume datasets as test samples. The experimental results show that our method outperforms the binary-swap and the parallel-pipelined methods for all test samples and matches the results of the theoretical analysis. The experimental results also show that the TRLE method further reduces the composition time for all three methods.
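The abstract does not describe how TRLE works, so the following is only a minimal sketch of plain run-length encoding, the general idea that TRLE builds on. It illustrates why sub-images with long runs of identical (for example, blank) pixels shrink before transmission. The function names `rle_encode` and `rle_decode` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def rle_encode(pixels: List[int]) -> List[Tuple[int, int]]:
    """Encode a row of pixel values as (value, run_length) pairs.

    This is plain run-length encoding, NOT the paper's TRLE scheme.
    """
    runs: List[Tuple[int, int]] = []
    for value in pixels:
        if runs and runs[-1][0] == value:
            # Extend the current run.
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            # Start a new run.
            runs.append((value, 1))
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (value, run_length) pairs back into the original row."""
    pixels: List[int] = []
    for value, length in runs:
        pixels.extend([value] * length)
    return pixels


row = [0, 0, 0, 0, 7, 7, 0, 0, 0]
encoded = rle_encode(row)
```

Here nine pixel values collapse into three runs; the saving grows with the proportion of blank pixels in a sub-image.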
ISBN: 9539676940 (print)
Software pipelining is an instruction-level loop scheduling method for achieving high-performance fine-grain parallelism on VLIW (very long instruction word) processors. This paper presents a novel software pipelining method for non-pipelining parallel processors based on integer scaling and retiming transformations. This approach generalises and simplifies the analogous extended retiming model of T.W. O'Neil et al. (see Proc. ISCA 12th Int. Conf. Parallel & Distributed Computing Syst., p.292-7, 1999; Proc. of ICASSP'99 Conf., vol.4, p.2001-4, 1999). Matrix techniques are used to simplify the corresponding graph transformations. Some general properties taken from algebraic graph theory are applied to obtain general scheduling techniques: the node and cycle methods. The two-phase scheduling method considered is first defined by means of two standard linear programming problems. We then transform these problems into variants of the maximum cost-to-time ratio problem and the shortest path problem in order to obtain efficient polynomial-time algorithms. An example of software pipelining optimization of a digital correlator is also given.
ISBN: 0769509843 (print)
This paper describes a parallel implementation developed to improve the time performance of the Iterative Closest Point algorithm. Within each iteration, the correspondence calculations are distributed among the processor resources. At the end of each iteration, the results of the correspondence determination are communicated back to a central processor and the current transformation is calculated. A number of additional techniques were developed to improve upon this basic scheme. Calculating the partial sums within each distributed resource made it unnecessary to transmit the correspondence values back to the central processor, which reduced the communication overhead and improved time performance. Randomly distributing the points among the processor resources resulted in better load balancing, which further improved time performance. We also found that thinning the image by randomly removing a certain percentage of the points did not improve performance, when viewed as the progression of MSE with time. The method was implemented and tested on a 22-node Beowulf-class cluster. For a large image, linear performance improvements were obtained for up to 16 processors, while they held for up to 8 processors with a smaller image.
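The partial-sum idea can be illustrated with a minimal sketch. It assumes a 2D rigid alignment (the abstract does not state the dimensionality or the exact sums used), and the names `partial_sums`, `combine`, and `rigid_transform_2d` are hypothetical. Each worker reduces its local correspondences to nine numbers, and the central processor needs only those numbers to recover the least-squares rotation and translation.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]
Pair = Tuple[Point, Point]  # (model point p, matched data point q)


def partial_sums(pairs: Iterable[Pair]) -> List[float]:
    """Worker side: reduce local correspondences to nine running sums.

    Layout: [n, sum px, sum py, sum qx, sum qy,
             sum px*qx, sum px*qy, sum py*qx, sum py*qy]
    """
    s = [0.0] * 9
    for (px, py), (qx, qy) in pairs:
        s[0] += 1.0
        s[1] += px
        s[2] += py
        s[3] += qx
        s[4] += qy
        s[5] += px * qx
        s[6] += px * qy
        s[7] += py * qx
        s[8] += py * qy
    return s


def combine(parts: Sequence[Sequence[float]]) -> List[float]:
    """Central side: add the workers' partial sums element-wise."""
    return [sum(column) for column in zip(*parts)]


def rigid_transform_2d(s: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares (theta, tx, ty) such that q is approximately R(theta) p + t."""
    n, spx, spy, sqx, sqy, sxx, sxy, syx, syy = s
    pcx, pcy, qcx, qcy = spx / n, spy / n, sqx / n, sqy / n
    # Centered dot and cross terms, computed from the sums alone.
    dot = (sxx + syy) - n * (pcx * qcx + pcy * qcy)
    cross = (sxy - syx) - n * (pcx * qcy - pcy * qcx)
    theta = math.atan2(cross, dot)
    c, sn = math.cos(theta), math.sin(theta)
    tx = qcx - (c * pcx - sn * pcy)
    ty = qcy - (sn * pcx + c * pcy)
    return theta, tx, ty
```

With this layout each worker sends nine floats per iteration regardless of how many points it holds, which is the source of the reduced communication overhead the abstract reports.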
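Because of the memory limits of a single machine, radar remote sensing images are generally processed by randomly accessing the image file on disk through I/O functions. Processing an entire image therefore takes a great deal of time and rarely meets the needs of practical applications. Using JIAJIA, a software distributed shared memory system for networks, the physical memories of several PCs are connected into one larger shared memory space, which allows multiple PCs to process remote sensing images synchronously, conveniently, and quickly. By analyzing the serial algorithms for geometric correction, image filtering, and supervised classification of SAR images, we developed the corresponding parallel algorithms. Experiments on eight Pentium II PCs running Linux, each with a 400 MHz clock and 256 MB of memory, achieved superlinear speedup in every case.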
The proceedings contain 15 papers from the conference on parallel and distributed methods for image processing II. The topics discussed include: parallel DSP with memory and I/O processors; analog VLSI implementation of a morphological associative memory; real-time parallel video image processing on a PC cluster; thread concept for automatic task parallelization in image analysis; new parallel vision environment in heterogeneous networked computing and toolkit for parallel image processing.
ISBN: 0819438626 (print)
Parallel systems provide a robust approach for high-performance computing. Lately, parallel computing has become more accessible as new parallel environments have evolved. The low cost and high performance of off-the-shelf PC processors have made PC-based multiprocessor systems popular. These systems typically contain two or four processors. Standardized POSIX threads have formed an environment for the effective utilization of several processors. Moreover, distributed computing using networks of workstations has increased. The motivation for this work is to apply these techniques in computer vision. The Hough Transform (HT) is a well-known method for detecting global features in digital images. However, in practice, the sequential HT is slow for large images. We study the behavior of the line-detecting HT with both message-passing workstation networks and shared-memory multiprocessor systems. The parallel approaches suggested in this paper seem to decrease the computation time of the HT significantly. Thus, the methods are useful for real-world applications.
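One common way to parallelize the line-detecting HT is to split the edge pixels among workers, let each worker fill its own accumulator, and then sum the accumulators. The abstract does not say which decomposition the authors used, so the sketch below is an assumption. It runs the "workers" sequentially to show only the data decomposition, and the names `hough_votes` and `merge_accumulators` are hypothetical.

```python
import math
from typing import Dict, Iterable, Sequence, Tuple

Accumulator = Dict[Tuple[int, int], int]  # (rho, theta index) -> votes


def hough_votes(points: Iterable[Tuple[int, int]], n_theta: int = 180) -> Accumulator:
    """Vote in (rho, theta) space for one subset of edge pixels.

    Uses the normal form rho = x*cos(theta) + y*sin(theta),
    with theta sampled at n_theta steps over [0, pi).
    """
    acc: Accumulator = {}
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            acc[(rho, t)] = acc.get((rho, t), 0) + 1
    return acc


def merge_accumulators(accs: Iterable[Accumulator]) -> Accumulator:
    """Sum per-worker accumulators; voting is additive, so order is irrelevant."""
    total: Accumulator = {}
    for acc in accs:
        for cell, votes in acc.items():
            total[cell] = total.get(cell, 0) + votes
    return total


# Ten edge pixels on the horizontal line y = 5, split between two "workers".
points = [(x, 5) for x in range(10)]
chunks: Sequence[Sequence[Tuple[int, int]]] = [points[:5], points[5:]]
merged = merge_accumulators(hough_votes(chunk) for chunk in chunks)
```

Because votes are additive, the merged accumulator equals the one a single sequential pass would produce. In a message-passing setting the merge step is a reduction over the workers, and in a shared-memory setting it can be a sum over per-thread accumulators.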
Clustering is a basic operation in image processing and computer vision, and it plays an important role in unsupervised pattern recognition and image segmentation. While there are many methods for clustering, single-link hierarchical clustering is one of the most popular techniques. In this paper, combining the advantages of optical transmission and electronic computation, we design efficient parallel hierarchical clustering algorithms on arrays with reconfigurable optical buses (AROB). We first design three efficient basic operations: the multiplication of two N x N matrices, finding the minimum spanning tree of a graph with N vertices, and identifying the connected component containing a specified vertex. Based on these three operations, an O(log N) time parallel hierarchical clustering algorithm is proposed using N^3 processors. Furthermore, if an AROB with four-port connections is allowed, two constant-time clustering algorithms can also be derived using N^4 and N^3 processors, respectively. These results improve on previously known algorithms developed on various parallel computational models. (C) 2000 Academic Press.
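The link between single-link clustering and the minimum spanning tree can be shown with a short sequential sketch. This is not the AROB algorithm from the paper; it is a plain Kruskal-style procedure with union-find, and the function name `single_link_clusters` is hypothetical. Merging the closest pair of clusters repeatedly is the same as adding MST edges in increasing order, so stopping when k components remain yields the k single-link clusters.

```python
from typing import Dict, List


def single_link_clusters(dist: List[List[float]], k: int) -> List[int]:
    """Return a cluster label per point using single-link merging.

    Sequential sketch: add edges in increasing order of distance
    (Kruskal's MST construction) until only k components remain.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    components = n
    for _, i, j in edges:
        if components <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri  # merge the two closest clusters
            components -= 1

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    labels: Dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


# Five points on a line; the large gap between 2 and 10 separates the clusters.
coords = [0.0, 1.0, 2.0, 10.0, 11.0]
dist = [[abs(a - b) for b in coords] for a in coords]
labels = single_link_clusters(dist, k=2)
```

Stopping the merge loop at different values of k gives the successive levels of the single-link hierarchy.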
ISBN: 0819437638 (print)
We show how to extract important information about the motion of a target (an enemy aircraft) from a sequence of its image frames obtained using a single imaging sensor mounted on an aircraft. Specifically, we present an algorithm which estimates the following 12 parameters of an enemy aircraft from its image sequence: position (three), linear velocity (three), attitude (three), instantaneous angular velocity (three) and optional acceleration (three). To extract the attitude, we use an algorithm that matches the captured image frame to the modeled image. The objective function is the correlation between these two images, and we use simulated annealing and downhill simplex methods to maximize it. Finally, estimation of the position and the attitude is accomplished with a 12-state extended Kalman filter. Through simulations, we show that the proposed algorithm is superior to conventional algorithms which do not use the attitude information.