ISBN: 0769509517 (print)
We propose distributed fractal image compression and decompression on a Parallel Virtual Machine (PVM) system. We apply a regional search to fractal image compression to reduce the communication cost on the distributed PVM system. The regional search is a partitioned iterated function system search over a region of the image instead of over the whole image. Because the area surrounding a partitioned block is likely to be similar to that block, finding the fractal codes by regional search gives a higher compression ratio and a shorter compression time. When implemented on the PVM, fractal image compression using regional search reduces the compression time with little compression loss. When we compress the 1024×1024 Lena image using a region size of 512×512 on the PVM with four Pentium II-300 PCs, the compression time is 13.6 seconds, the compression ratio is 6.34, and the PSNR is 38.59. By contrast, conventional fractal image compression takes 176 seconds, with a compression ratio of 6.30 and a PSNR of 39.68. When the region size is 128×128, the compression time is 7.8 seconds, the compression ratio is 7.53, and the PSNR is 36.67. In the future, this method could be applied to fractal image compression using neural networks.
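The effect of restricting the search to a region can be illustrated with a small sketch. It only enumerates candidate domain-block positions and does not perform the block matching itself; the domain block size (16), the search step (16), and the helper names `region_of` and `domain_positions` are assumptions made for illustration and do not come from the abstract.

```python
def region_of(row, col, region_size):
    """Return (row0, col0, size) of the square region containing pixel (row, col)."""
    return (row // region_size * region_size,
            col // region_size * region_size,
            region_size)


def domain_positions(image_size, domain_size, step, region=None):
    """List the top-left corners of candidate domain blocks.

    region=None scans the whole image (conventional search);
    region=(row0, col0, size) scans only that square (regional search).
    """
    row0, col0, size = region if region is not None else (0, 0, image_size)
    last = size - domain_size  # largest offset that keeps the block inside
    return [(row0 + r, col0 + c)
            for r in range(0, last + 1, step)
            for c in range(0, last + 1, step)]


# Hypothetical parameters: 16x16 domain blocks on a non-overlapping grid.
full = domain_positions(1024, 16, 16)
regional = domain_positions(1024, 16, 16, region_of(600, 100, 512))
print(len(full), len(regional))  # 4096 1024
```

Under these assumed parameters, a range block in a 512×512 region is compared against a quarter of the candidates used by a whole-image search, and a worker that owns one region needs only that region's pixels, which is consistent with the reduced communication cost described above.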
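Because of the limited memory of a single machine, radar remote sensing images are usually processed by using I/O functions to randomly access the image file on disk, so processing a whole image takes a great deal of time and rarely meets the needs of practical applications. Based on JIAJIA, a software distributed shared memory system for networks, the physical memories of multiple PCs are joined into one larger shared memory space, enabling the PCs to process remote sensing images synchronously, conveniently, and quickly. By analyzing the serial algorithms for geometric correction, image filtering, and supervised classification of SAR images, corresponding parallel algorithms were developed and tested on eight Pentium II PCs running Linux, each with a 400 MHz clock and 256 MB of memory; all of the experiments achieved superlinear speedup.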
The proceedings contain 15 papers from the conference on Parallel and Distributed Methods for Image Processing II. The topics discussed include: parallel DSP with memory and I/O processors; analog VLSI implementation of a morphological associative memory; real-time parallel video image processing on a PC cluster; thread concept for automatic task parallelization in image analysis; new parallel vision environment in heterogeneous networked computing and toolkit for parallel image processing.
ISBN: 9539676940 (print)
Software pipelining is an instruction-level loop scheduling method for achieving high-performance fine-grain parallelism on VLIW (very long instruction word) processors. This paper presents a novel software pipelining method for non-pipelining parallel processors based on integer scaling and retiming transformations. This approach generalises and simplifies the analogous extended retiming model of T.W. O'Neil et al. (see Proc. ISCA 12th Int. Conf. Parallel & Distributed Computing Syst., pp. 292-7, 1999; Proc. of ICASSP'99 Conf., vol. 4, pp. 2001-4, 1999). Matrix techniques are used to simplify the corresponding graph transformations. Some general properties taken from algebraic graph theory are applied to obtain general scheduling techniques: node and cycle methods. The two-phase scheduling method considered is first defined by means of two standard linear programming problems. We then transform these problems into variants of the maximum cost-to-time ratio problem and the shortest path problem in order to obtain efficient polynomial-time algorithms. An example of software pipelining optimization of a digital correlator is also given.
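For readers unfamiliar with the maximum cost-to-time ratio problem mentioned above, the following is a minimal sketch of the generic problem, not the paper's transformation. It finds the largest ratio of total cost to total time over all cycles of a directed graph by binary search combined with Bellman-Ford negative-cycle detection. The function names, the edge format `(u, v, cost, time)`, and the example graph are assumptions; the sketch also assumes non-negative costs, integer times, at least one cycle, and a total time of at least 1 on every cycle.

```python
def has_cycle_with_ratio_above(n, edges, lam):
    """True if some cycle has (sum of cost) / (sum of time) > lam.

    Such a cycle is a negative cycle under edge weights lam*time - cost,
    which Bellman-Ford detects when a relaxation still succeeds in pass n.
    """
    dist = [0.0] * n  # equivalent to a virtual source joined to every node
    for _ in range(n):
        changed = False
        for u, v, cost, time in edges:
            w = lam * time - cost
            if dist[u] + w < dist[v] - 1e-12:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False
    return True


def max_cycle_ratio(n, edges, tol=1e-9):
    """Approximate the maximum cost-to-time ratio over all cycles."""
    lo = 0.0
    hi = float(sum(cost for _, _, cost, _ in edges))  # valid upper bound
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if has_cycle_with_ratio_above(n, edges, mid):
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical graph: cycle 0->1->2->0 has ratio 6/2, cycle 1->2->1 has 4/1.
edges = [(0, 1, 2, 0), (1, 2, 3, 0), (2, 0, 1, 2), (2, 1, 1, 1)]
ratio = max_cycle_ratio(3, edges)
```

When cost is read as computation time and time as the number of delays on an edge of a loop's data-flow graph, this ratio is the usual lower bound on the achievable iteration period, which is why the problem arises in loop scheduling.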
ISBN: 0769509843 (print)
This paper describes a parallel implementation developed to improve the time performance of the Iterative Closest Point algorithm. Within each iteration, the correspondence calculations are distributed among the processor resources. At the end of each iteration, the results of the correspondence determination are communicated back to a central processor and the current transformation is calculated. A number of additional techniques were developed that improve upon this basic scheme. Calculating the partial sums within each distributed resource made it unnecessary to transmit the correspondence values back to the central processor, which reduced the communication overhead and improved time performance. Randomly distributing the points among the processor resources resulted in better load balancing, which further improved time performance. We also found that thinning the image by randomly removing a certain percentage of the points did not improve the performance when viewed as the progression of MSE with time. The method was implemented and tested on a 22-node Beowulf-class cluster. For a large image, linear performance improvements were obtained for up to 16 processors, while they held for up to 8 processors with a smaller image.
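The partial-sum idea can be sketched as follows. The abstract does not say which sums are accumulated, so this example assumes the quantities used by the standard closed-form rigid alignment: the pair count, the coordinate sums of both point sets, and the sum of outer products. The function names are hypothetical, and the code runs sequentially to show only what each worker would send and how the central processor would combine it.

```python
def partial_sums(pairs):
    """Per-worker reduction over matched 3D pairs (p, q).

    Returns (n, sum_p, sum_q, sum_pqT): a fixed-size summary, so the
    matched pairs themselves never have to be transmitted.
    """
    n = 0
    sp, sq = [0.0] * 3, [0.0] * 3
    spq = [[0.0] * 3 for _ in range(3)]
    for p, q in pairs:
        n += 1
        for i in range(3):
            sp[i] += p[i]
            sq[i] += q[i]
            for j in range(3):
                spq[i][j] += p[i] * q[j]
    return n, sp, sq, spq


def combine(parts):
    """Central step: add the summaries received from all workers."""
    n = 0
    sp, sq = [0.0] * 3, [0.0] * 3
    spq = [[0.0] * 3 for _ in range(3)]
    for pn, psp, psq, pspq in parts:
        n += pn
        for i in range(3):
            sp[i] += psp[i]
            sq[i] += psq[i]
            for j in range(3):
                spq[i][j] += pspq[i][j]
    return n, sp, sq, spq


def cross_covariance(total):
    """Cross-covariance sum of (p - mean_p)(q - mean_q)^T from the totals."""
    n, sp, sq, spq = total
    return [[spq[i][j] - sp[i] * sq[j] / n for j in range(3)]
            for i in range(3)]
```

Each worker sends 16 numbers per iteration regardless of how many points it holds. The rotation and translation would then be derived from the centroids and the cross-covariance matrix, for example with an SVD, which is omitted here.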
We have performed the first detailed spatially resolved spectroscopy of Cas A in the 1.6-10 keV energy range, using data taken with the MECS spectrometer on board the BeppoSAX Observatory. The well-calibrated point spread function in the central region of the MECS allowed us to perform a spatial deconvolution of the data at full energy resolution. We eventually generated a set of spectra covering a region of ∼3′ radius around the centre of Cas A. The results obtained by fitting these spectra with a non-equilibrium ionisation plasma model and a power law improve our knowledge of the chemical and physical parameters of the Cas A supernova remnant: (i) a single thermal component is sufficient to fit all the spectra; (ii) kT is rather uniformly distributed, with a minimum in the east and a maximum in the west, and no evidence is found for the high kT expected from the interaction of the main shock with the ISM; (iii) from the distribution of the values of the ionisation parameter n_e t we infer the presence of two distinct components: the first (a) with n_e in the range 1-10 cm⁻³, the second (b) with values ten times higher; if we associate component a with the CSM and component b with the ejecta, the mass ratio M_a/M_b ≤ 1/10 indicates a progenitor star that lost only a small fraction of its envelope during its pre-SN life. Under this hypothesis, the distribution of component b across the remnant suggests that the explosion was not spherically symmetric; (iv) the distribution of abundances indicates that we are detecting a CSM component with almost solar composition and an ejecta component enriched in heavier elements. Abundances found for α-elements are consistent with the current view that Cas A was produced by the explosion of a massive star. A low Fe overabundance can be an indication that at the moment of the explosion the mass-cut was rather high, locking most of the produced ⁵⁶Ni into the stellar remnant.
There are two fundamental problems to be solved in any scalable computer system: tolerating and hiding the latency of remote accesses, and tolerating and hiding idling due to synchronization among parallel processes. Architectures that cannot solve these issues will fail in building large-scale parallel processing systems. One possible solution for tolerating memory and synchronization latency is the introduction of threads and a fast context-switching mechanism among threads. Systems that support this technique are called multithreaded systems. Multimedia applications usually require large computing power, and thus massively parallel systems are good candidates for such tasks. Additionally, multimedia applications usually involve processing huge amounts of data (e.g. audio or video information), so both classical shared-memory and distributed-memory parallel systems may be inadequate for fulfilling all the needs. Finally, multimedia applications (e.g. image processing) may in some cases require a computing model other than the one that current commodity RISC processors provide. A range of multithreaded architectures that are massively parallel and have distributed memory for the sake of scalability can be ideal for multimedia applications. Such architectures, which support remote memory accesses, may be a proper combination of different computing models, e.g. the von Neumann and dataflow models. In this paper, the design space of multithreaded architectures is introduced, and a particular architecture, called KUMP/D (Kyushu University Multimedia Processor on Datarol-II), is described. It is also shown how a multithreaded architecture can be built in a short design cycle by using a commercial high-end microprocessor and easily programmable hardware devices.
Clustering is a basic operation in image processing and computer vision, and it plays an important role in unsupervised pattern recognition and image segmentation. While there are many methods for clustering, single-link hierarchical clustering is one of the most popular techniques. In this paper, taking advantage of both optical transmission and electronic computation, we design efficient parallel hierarchical clustering algorithms on arrays with reconfigurable optical buses (AROB). We first design three efficient basic operations: the multiplication of two N × N matrices, finding the minimum spanning tree of a graph with N vertices, and identifying the connected component containing a specified vertex. Based on these three operations, an O(log N) time parallel hierarchical clustering algorithm is proposed using N^3 processors. Furthermore, if an AROB with four-port connections is allowed, two constant-time clustering algorithms can also be derived using N^4 and N^3 processors, respectively. These results improve on previously known algorithms developed on various parallel computational models. © 2000 Academic Press.
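The link between single-link clustering, minimum spanning trees, and connected components can be shown with a short sequential sketch. This is not the AROB algorithm from the abstract; it is the well-known sequential equivalent, in which the merges of single-link clustering are the minimum spanning tree edges taken in increasing order of weight (Kruskal's algorithm with a union-find structure). The function names and the example data are assumptions.

```python
from itertools import combinations


def single_link_merges(dist):
    """Return the merge sequence [(height, i, j), ...] for single-link clustering.

    dist is a symmetric N x N matrix. The accepted edges are exactly the
    minimum spanning tree edges, visited in increasing order of weight.
    """
    n = len(dist)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    merges = []
    for d, i, j in sorted((dist[i][j], i, j) for i, j in combinations(range(n), 2)):
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two different clusters
            parent[ri] = rj
            merges.append((d, i, j))
    return merges


def clusters_at(dist, threshold):
    """Clusters obtained by cutting the hierarchy at the given height."""
    n = len(dist)
    label = list(range(n))
    for d, i, j in single_link_merges(dist):
        if d <= threshold:
            old, new = label[i], label[j]
            label = [new if x == old else x for x in label]
    groups = {}
    for idx, lab in enumerate(label):
        groups.setdefault(lab, []).append(idx)
    return sorted(groups.values())


# Hypothetical data: four points on a line at 0, 1, 3 and 7.
xs = [0, 1, 3, 7]
dist = [[abs(a - b) for b in xs] for a in xs]
```

Cutting the hierarchy at a threshold leaves the connected components of the spanning tree restricted to edges no longer than that threshold, which matches the connected-component operation listed among the basic operations above.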