In this paper we present a homography algorithm to produce image mosaics, using parallelism to solve a multiple Singular Value Decomposition (SVD) system. We analyse four state-of-the-art SVD methods and choose the one that best suits the expected size of the matrices derived from the datasets of interest. We then use CUDA to accelerate the solution of the homogeneous transformation matrices.
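The abstract does not show the homography estimation itself; as a rough, sequential illustration of the SVD-based step the paper parallelizes, here is a minimal NumPy sketch of the Direct Linear Transform (the paper's CUDA batching of many SVDs is not reproduced, and the function name is ours):

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate the 3x3 homography mapping src -> dst (n >= 4 point pairs)
    via the Direct Linear Transform: stack two equations per pair into A,
    then take the right singular vector of the smallest singular value."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)      # null-space vector of A, up to scale
    return H / H[2, 2]            # normalise so H[2, 2] == 1

# Sanity check on a pure translation by (2, 3).
src = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst = [(x + 2, y + 3) for x, y in src]
H = homography_dlt(src, dst)
```

In a mosaicking pipeline many such systems (one per image pair) are independent, which is what makes the SVD stage a natural target for GPU parallelism.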
Image fusion is a technique for combining multiple images from a single sensor or multiple sensors into a single composite image without introducing artifacts. This paper presents a novel implementation of Laplacian pyramid image fusion on a field-programmable gate array (FPGA). Real-time image fusion using pyramid decomposition is achieved by utilizing a re-usable memory architecture and parallelisation techniques to give an output in 35 ms for an image of resolution 320×256, a speedup of 17× over a general-purpose-computer-based solution and of 2.2× over a GPU-based implementation.
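To make the fusion scheme concrete, here is a minimal NumPy sketch of Laplacian pyramid fusion. It is not the paper's FPGA design: it uses crude 2×2 mean pooling as the low-pass filter instead of a proper Gaussian kernel, and all function names are ours.

```python
import numpy as np

def down(img):                      # 2x2 mean pooling: crude low-pass + decimate
    return img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))

def up(img):                        # nearest-neighbour expansion back to 2x size
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels):
    pyr = []
    for _ in range(levels):
        low = down(img)
        pyr.append(img - up(low))   # detail band at this level
        img = low
    pyr.append(img)                 # coarsest residual
    return pyr

def fuse(a, b, levels=3):
    """Fuse two same-size images: keep the stronger detail coefficient per
    pixel and level, average the coarse residual, then collapse the pyramid."""
    pa, pb = laplacian_pyramid(a, levels), laplacian_pyramid(b, levels)
    fused = [np.where(np.abs(x) >= np.abs(y), x, y) for x, y in zip(pa[:-1], pb[:-1])]
    fused.append(0.5 * (pa[-1] + pb[-1]))
    img = fused[-1]
    for band in reversed(fused[:-1]):
        img = up(img) + band        # inverse transform: expand and add detail
    return img

a = np.random.rand(8, 8)            # dimensions must be divisible by 2**levels
out = fuse(a, a)                    # fusing an image with itself recovers it
```

The per-pixel, per-level selection is exactly the kind of data-parallel work that maps well onto FPGA pipelines or GPU threads.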
Subsurface images are widely used by oil companies to find oil reservoirs. The construction of these images involves collecting and processing a huge amount of seismic data. Generally, oil companies use compression algorithms to reduce storage and transmission costs. Currently, the compression process runs on-site on CPU architectures, whereas the construction of the subsurface images is carried out on GPU clusters. For this reason, the decompression process has to run on GPU architectures, so fast, parallel decompression algorithms implemented on GPUs are required. We implemented an algorithm that performs the decompression of seismic traces on a GPU. The algorithm is based on a 2D Lifting Wavelet Transform. The decompression algorithm was developed in CUDA 6.5 and run on a GeForce GTX 660 GPU. The algorithm was tested using different data sets supplied by an oil company. Experimental results allowed us to establish how the compression ratio affects the performance of our algorithm. Additionally, we show how the number of threads per block affects this performance.
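The abstract does not specify which wavelet the lifting scheme uses; as an assumed illustration only, here is a one-level separable 2D Haar transform in lifting form in NumPy (predict/update steps, rows then columns). The paper's CUDA kernels and its actual filter pair are not reproduced:

```python
import numpy as np

def lift_forward(x):
    """One Haar lifting level along the last axis:
    predict (odd -= even), then update (even += odd / 2)."""
    even, odd = x[..., ::2].copy(), x[..., 1::2].copy()
    odd -= even                    # predict step: detail coefficients
    even += odd / 2                # update step: approximation coefficients
    return even, odd

def lift_inverse(even, odd):
    even = even - odd / 2          # undo update
    odd = odd + even               # undo predict
    x = np.empty(even.shape[:-1] + (even.shape[-1] * 2,))
    x[..., ::2], x[..., 1::2] = even, odd
    return x

def lwt2(img):
    """Separable 2D forward transform: rows first, then columns."""
    lo, hi = lift_forward(img)
    ll, lh = lift_forward(lo.T)
    hl, hh = lift_forward(hi.T)
    return ll.T, lh.T, hl.T, hh.T

def ilwt2(ll, lh, hl, hh):
    lo = lift_inverse(ll.T, lh.T).T
    hi = lift_inverse(hl.T, hh.T).T
    return lift_inverse(lo, hi)
```

Because each lifting step touches disjoint even/odd sample pairs, the transform is embarrassingly parallel across traces, which is why it suits a threads-per-block GPU mapping.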
When training morphological operators that are locally defined with respect to a neighborhood window, one must deal with the trade-off between window size and statistical precision of the learned operator. More precisely, windows that are too small result in large restriction errors due to the constrained operator space, while windows that are too large result in large variance errors due to an often insufficient number of samples. A two-level training method that combines a number of operators designed on distinct windows of moderate size is an effective way to mitigate this issue. However, in order to train combined operators, one must specify not only how many operators will be combined, but also the window for each of them. To date, a genetic algorithm that searches for window combinations has produced the best results for this problem. In this work we propose an alternative approach that is computationally much more efficient. The proposed method efficiently reduces the search space by ranking the windows of a collection according to an entropy-based measure estimated from input-output joint probabilities. The computational efficiency comes from the fact that only a few operators need to be trained. Experimental results show that this method outperforms the best results obtained with manually selected combinations and is competitive with the genetic-algorithm-based solution. The proposed approach is thus a promising step towards fully automating the design of binary morphological operators.
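A heavily simplified sketch of such an entropy-based ranking, assuming one (pattern, label) sample per image rather than the sliding-window sampling an actual operator-design setup would use; all names are ours, not the paper's:

```python
from collections import Counter
import math

def conditional_entropy(samples):
    """H(Y | X) from (pattern, label) pairs. Windows whose observed patterns
    leave little uncertainty about the output score low and rank first."""
    joint = Counter(samples)
    marg = Counter(p for p, _ in samples)
    n = len(samples)
    h = 0.0
    for (p, _), c in joint.items():
        h -= (c / n) * math.log2(c / marg[p])   # -p(x,y) * log2 p(y|x)
    return h

def rank_windows(inputs, labels, windows):
    """Rank candidate windows (tuples of pixel indices into a flat input)
    by ascending conditional entropy of the output given the window pattern."""
    scored = []
    for w in windows:
        samples = [(tuple(x[i] for i in w), y) for x, y in zip(inputs, labels)]
        scored.append((conditional_entropy(samples), w))
    return [w for _, w in sorted(scored, key=lambda t: t[0])]

# Pixel 0 fully determines the label; pixel 1 is noise, so window (0,) wins.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 1, 1]
order = rank_windows(inputs, labels, [(1,), (0,)])
```

Only the top-ranked windows then need trained operators, which is where the speedup over a genetic search comes from.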
With the advent of multicore processor architectures and the existence of a huge legacy code base, the need for efficient and scalable parallelizing compilers is growing. While multicore processors were seen as the way forward to address known challenges such as the memory, power and ILP walls, efficient parallelization to make use of the multiple cores is still an open issue. In this paper, we present two complementary tools, MCROF and XPU, which provide an alternative development path for parallelizing applications and address the challenges of identifying potential parallelism and exploiting it in a different way. The MCROF tool provides a detailed profile of the data flowing inside an application, and the XPU programming paradigm provides an intuitive and simple interface for expressing parallelism as well as the necessary runtime support. We demonstrate through two different use cases that performance gains of up to 4× can be achieved over available commercial compilers.
LZW compression is a well-known patented lossless compression method used in the Unix file compression utility "compress" and in the GIF and TIFF image formats. It converts an input string of characters (or 8-bit unsigned integers) into a string of codes using a code table (or dictionary) that maps strings to codes. Since the code table is generated by repeatedly adding newly appearing substrings during the conversion, LZW compression is very hard to parallelize. The main purpose of this paper is to accelerate LZW compression for TIFF images using a CUDA-enabled GPU. Our goal is to implement the LZW compression algorithm using several CUDA acceleration techniques, although this is a very hard task. Suppose that a GPU generates an image produced by a computer graphics or image processing CUDA program, and we want to archive it as an LZW-compressed TIFF image on the SSD connected to the host PC. We focused on the following two scenarios. Scenario 1: the resulting image is compressed using the GPU and written to the SSD through the host PC. Scenario 2: it is transferred to the host PC, then compressed and written to the SSD using the CPU. Experimental results using an NVIDIA GeForce GTX 980 and an Intel Core i7-4790 show that Scenario 1, using our LZW compression implemented on the GPU, is about 3 times faster than Scenario 2. From this fact, we can say that it makes sense to compress images using a GPU before archiving them on the SSD.
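The sequential dictionary growth the abstract describes can be seen in a textbook sketch of LZW (classic reference implementation, not the paper's GPU version; the TIFF variant additionally uses variable-width codes and Clear/EOI markers, omitted here):

```python
def lzw_compress(data: bytes) -> list[int]:
    """Sequential LZW: extend the current match w while w+c is in the table;
    otherwise emit the code for w and add w+c as a new dictionary entry.
    Codes 0..255 are the single-byte strings."""
    table = {bytes([i]): i for i in range(256)}
    w, out = b"", []
    for b in data:
        wc = w + bytes([b])
        if wc in table:
            w = wc
        else:
            out.append(table[w])
            table[wc] = len(table)   # newly seen substring gets the next code
            w = bytes([b])
    if w:
        out.append(table[w])
    return out

def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the same table on the fly, one code behind the compressor."""
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for k in codes[1:]:
        entry = table[k] if k in table else w + w[:1]   # the KwKwK corner case
        out.append(entry)
        table[len(table)] = w + entry[:1]
        w = entry
    return b"".join(out)
```

Each emitted code depends on every table entry added before it, which is the serial dependency that makes a GPU formulation of this loop genuinely difficult.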
In this paper we design and implement a target-tracking data fusion algorithm based on a two-stage graph solution using the computational model Gamma (General Abstract Model for Multiset mAnipulation). The proposed solution is the first parallel implementation of the PPTS (Pairs of Plots in Two Stages) method. For this, we employed three Gamma implementations, two of which exploit the resources of a parallel hardware environment, one using MPI (Message Passing Interface) and the other a GPU (Graphics Processing Unit). The studied algorithm was evaluated in terms of the parallelism exploited, and a performance analysis was carried out across the three Gamma implementations. The aim of this study is to tackle a real problem using the Gamma paradigm, which contributes to implementations of the Gamma computational model, since it enables performance analysis of these implementations and yields suggestions for possible improvements. In addition, this work contributes to the PPTS method by providing a parallelization of its first stage.
As the data acquisition capabilities of Earth observation (EO) satellites have improved substantially in the past few years, large amounts of high-resolution satellite images are continuously downlinked to ground stations. This volume of data grows rapidly beyond the users' capability to access the images' content in reasonable time. Hence, automatic and fast interpretation of a large data volume is a computationally intensive task. Recently, approximate nearest neighbour search has been used for content-based image retrieval in sub-linear time. Kernelized locality-sensitive hashing (KLSH) is a well-known approximate method which has recently shown promising results for fast remote sensing image retrieval. This paper proposes a novel parallelization of KLSH using Graphics Processing Units (GPUs) in order to perform fast parallel image retrieval. The proposed method was tested on high-dimensional feature vectors from two satellite-based image datasets, where an average speedup of 20× was achieved.
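To illustrate the hashing-based retrieval idea, here is a plain random-hyperplane LSH sketch in NumPy. Note this is the non-kernelized variant, much simpler than the KLSH the paper parallelizes (KLSH draws its hash functions in a kernel-induced feature space); all names are ours:

```python
import numpy as np

def hash_vectors(X, planes):
    """Sign of the projection onto each random hyperplane gives one hash bit;
    pack the bits into a single integer bucket key per vector."""
    bits = (X @ planes.T) > 0
    return (bits * (1 << np.arange(planes.shape[0]))).sum(axis=1)

def build_index(X, n_bits=8, seed=0):
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_bits, X.shape[1]))
    keys = hash_vectors(X, planes)
    buckets = {}
    for i, k in enumerate(keys):
        buckets.setdefault(int(k), []).append(i)
    return planes, buckets

def query(q, X, planes, buckets):
    """Scan only the query's bucket and rank those candidates exactly,
    giving sub-linear search when buckets are small."""
    cand = buckets.get(int(hash_vectors(q[None], planes)[0]), [])
    return min(cand, key=lambda i: np.linalg.norm(X[i] - q), default=None)

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 16))          # stand-in feature vectors
planes, buckets = build_index(X)
hit = query(X[3], X, planes, buckets)      # exact match lands in its own bucket
```

The per-vector hashing (a matrix product plus thresholding) is the part that parallelizes trivially on a GPU; candidate re-ranking within a bucket is the remaining sequential tail.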
In graphics processing units (GPUs), memory access latency is one of the most critical performance hurdles. Several warp schedulers and memory prefetching algorithms have been proposed to avoid the long memory access latency. Prior application characterization studies shed light on the interaction between applications, the GPU microarchitecture and memory subsystem behavior. Most of these studies, however, only present aggregate statistics on how the memory system behaves over the entire application run. In particular, they do not consider how individual load instructions in a program contribute to the aggregate memory system behavior. The analysis presented in this paper shows that there are two distinct classes of load instructions, categorized as deterministic and non-deterministic loads. Using a combination of profiling data from a real GPU card and cycle-accurate simulation data, we show that there is a significant performance disparity when executing these two types of loads. We discuss and suggest several approaches to treat these two load categories differently within the GPU microarchitecture to optimize memory system performance.
Denoising of Time-of-Flight (ToF) range data is an important task prior to further data processing. Existing techniques commonly work at the post-processing level. This paper presents a novel approach for improving data quality at the image acquisition level by automatically determining the best integration time for arbitrary scenes. Our approach works on a per-pixel basis and uses knowledge gained from an extensive analysis of the underlying sensor behavior regarding intensity, amplitude and distance error to reduce the overall error, prevent oversaturation and minimize the adaptation time. It also works well in the presence of varying reflectivities and quick changes in the scene. This represents a significant improvement over previous methods.