ISBN: 9783540854500 (print)
Modern Graphics Processing Units (GPUs) consist of several SIMD processors and thus provide a high degree of parallelism at low cost. We introduce a new approach to systematically develop parallel image reconstruction algorithms for GPUs from their parallel equivalents for distributed-memory machines. We use High-Level Petri Nets (HLPN) to intuitively describe the parallel implementations for distributed-memory machines. By annotating the functions of the HLPN with memory requirements and information about data distribution, we are able to identify parallel functions that can be implemented efficiently on the GPU. For an important iterative medical image reconstruction algorithm, the list-mode OSEM algorithm, we demonstrate the limitations of its distributed-memory implementation and show how our HLPN-based approach leads to a fast implementation on GPUs that is reusable across different medical imaging devices.
ISBN: 9781479986767 (print)
Pupil detection and tracking is an important step in developing a human-computer interaction system. To develop a human eye-computer interaction system, we examine pupil detection and tracking by image processing techniques. In image processing, illumination generally has a direct influence on image quality. If the influence of illumination is small, a good-quality image can be obtained, and the subsequent image processing steps are likely to succeed. In this paper, to avoid the influence of illumination, we combine a hardware configuration consisting of an infrared light-emitting diode (LED) light, a sensitive infrared camera, and an infrared (IR) filter. In experiments with this hardware configuration, we investigate the effectiveness of pupil detection and tracking by image processing techniques for a human eye-computer interaction system.
Image matching has played a key role in object recognition and localization. One central problem is to find an efficient and effective approach to search for the best matching between two image sets. In contrast to conventional matching techniques, the innovation of the method detailed in this paper is a hierarchical Chamfer matching scheme based on the dynamic detection of interesting points. The algorithm extends the traditional methods by using interesting points in place of edge points in the distance transform for the matching measurement. The search for the best matching is guided by minimizing a given matching criterion in an interesting-points pyramid from coarse level to fine level. The pyramid is created through a dynamic thresholding scheme, and this hierarchical structure aims to reduce the computational load. The processing speed is further improved by a parallel implementation on a low-cost heterogeneous PVM (Parallel Virtual Machine) network without specific software and hardware requirements. The experimental results demonstrate that our algorithm is simple to implement, reliable, efficient, and quite insensitive to noise and other disturbances.
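To make the distance-transform matching step concrete, here is a minimal single-level sketch, not the authors' implementation. It assumes feature points are given as (row, column) pairs, uses the common 3-4 chamfer weights, and searches a list of candidate translations exhaustively. The pyramid, the dynamic thresholding, and the PVM parallelisation described above are omitted, and all function names are hypothetical.

```python
def chamfer_distance_transform(points, height, width):
    """Two-pass 3-4 chamfer distance transform of a set of feature points."""
    inf = 10**9
    dist = [[inf] * width for _ in range(height)]
    for r, c in points:
        dist[r][c] = 0
    # Forward pass: propagate distances from the top-left neighbours.
    forward = ((0, -1, 3), (-1, -1, 4), (-1, 0, 3), (-1, 1, 4))
    # Backward pass: propagate distances from the bottom-right neighbours.
    backward = ((0, 1, 3), (1, 1, 4), (1, 0, 3), (1, -1, 4))
    for mask, rows, cols in (
        (forward, range(height), range(width)),
        (backward, range(height - 1, -1, -1), range(width - 1, -1, -1)),
    ):
        for r in rows:
            for c in cols:
                for dr, dc, w in mask:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < height and 0 <= cc < width:
                        dist[r][c] = min(dist[r][c], dist[rr][cc] + w)
    return dist


def chamfer_score(dist, template, offset):
    """Mean distance-transform value under the translated template points."""
    dr, dc = offset
    return sum(dist[r + dr][c + dc] for r, c in template) / len(template)


def best_offset(dist, template, offsets):
    """Return the candidate translation with the lowest chamfer score."""
    return min(offsets, key=lambda off: chamfer_score(dist, template, off))
```

Because the distance transform is computed once per image, each candidate translation costs only one lookup per template point, which is what keeps a coarse-to-fine search inexpensive.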
ISBN: 9781509060580 (print)
Asynchronous iterations can be used to implement fixed-point methods such as Jacobi and Gauss-Seidel on parallel computers with high synchronization costs. However, they are rarely considered in practice due to their low convergence rate. This paper describes a GPU implementation of a novel Power Flow analysis model using asynchronous iterations. We present our model for the solution of the Power Flow analysis problem, prove its convergence, and evaluate its performance on a GPU.
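The difference between a synchronous and an asynchronous fixed-point iteration can be illustrated on a small linear system. The sketch below is a generic illustration, not the Power Flow model of the paper: it assumes a strictly diagonally dominant matrix stored as a list of rows, and it imitates asynchrony on a single thread by updating one randomly chosen component at a time from whatever values are currently stored. The function names and the test system are hypothetical.

```python
import random


def jacobi(A, b, iterations=50):
    """Synchronous Jacobi iteration for A x = b."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        # Every component is computed from the previous iterate only, so all
        # n updates are independent but must finish before the next sweep.
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
    return x


def chaotic_relaxation(A, b, updates=600, seed=0):
    """Single-threaded imitation of an asynchronous iteration.

    One randomly chosen component is updated in place per step, using the
    values that happen to be stored at that moment; there are no sweeps and
    no barriers between updates.
    """
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    for _ in range(updates):
        i = rng.randrange(n)
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
    return x


# Hypothetical test system whose exact solution is [1, 1, 1].
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]
x_sync = jacobi(A, b)
x_async = chaotic_relaxation(A, b)
```

Both variants approach the same fixed point for this matrix; the asynchronous one trades a less predictable update order for the absence of synchronization points.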
ISBN: 9781509060580 (print)
Constructing a fitting actor system to solve a problem is a task that requires experience. Deducing its properties and behaviour without a running instance and a set of test cases is even harder. In this paper, we show how methods from linear algebra allow us to make statements about the number of messages in a given system. This can be used to reason about how the system reacts to bigger or changed inputs and to analyse how optimisations will change its behaviour. Furthermore, different numbers of actors or other design alternatives can be compared.
ISBN: 0819442836 (print)
Low-level image processing operations usually involve simple and repetitive operations over the entire input image, so image processors may communicate frequently with the memory system or with each other; hence the image processor should provide a high throughput rate. In this article we present an architectural design and analysis of a parallel RISC image processor. The processor is based on the PCI bus to speed up a range of image processing operations. Another characteristic of the processor is that a new three-port host bridge is integrated into it. The implementation of commonly used image processing algorithms and their performance evaluation are also discussed.
As data volumes grow rapidly, distributed computations are widely employed in data centers to provide cheap and efficient methods to process large-scale parallel datasets. Various computation models have been proposed to improve the abstraction of distributed datasets and hide the details of parallelism. However, most of them follow the single-layer partitioning method, which prevents developers from expressing a multi-level partitioning operation succinctly. To overcome this problem, we present the NDD (Nested Distributed Dataset) data model. It is a more compact and expressive extension of the Spark RDD (Resilient Distributed Dataset) that relieves developers of the burden of manually writing the logic for multi-level partitioning cases. Based on the NDD model, we develop an open-source framework called Bigflow, which serves as an optimization layer over the computation engines of the most widely used processing frameworks. With the help of Bigflow, some advanced optimization techniques, which otherwise may only be applied manually by experienced programmers, are enabled automatically in a distributed data processing job. Currently, Bigflow processes about 3 PB of data daily in the data centers of Baidu. According to customer experience, it can significantly reduce code length and improve performance over the intuitive programming style.
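The idea of multi-level partitioning can be sketched in plain, single-machine Python. This is only an illustration of the concept under simple assumptions (records are tuples, each level is defined by a key function); it does not use or reproduce the Bigflow or Spark APIs, and the helper names `group_by` and `nested_apply` are hypothetical.

```python
from collections import defaultdict


def group_by(records, key):
    """Partition records into a dict of lists by key(record)."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    return dict(groups)


def nested_apply(records, keys, fn):
    """Partition by each key function in turn, then apply fn to each
    innermost group. The result mirrors the nesting of the keys."""
    if not keys:
        return fn(records)
    first, rest = keys[0], keys[1:]
    return {
        k: nested_apply(group, rest, fn)
        for k, group in group_by(records, first).items()
    }


# Hypothetical log records: (site, day, hits).
logs = [
    ("a.com", "mon", 3),
    ("a.com", "mon", 5),
    ("a.com", "tue", 1),
    ("b.com", "mon", 2),
]
totals = nested_apply(
    logs,
    [lambda r: r[0], lambda r: r[1]],
    lambda group: sum(r[2] for r in group),
)
# totals == {"a.com": {"mon": 8, "tue": 1}, "b.com": {"mon": 2}}
```

The per-group function is written once and never mentions the partitioning levels, which is the kind of separation a nested dataset abstraction aims to provide.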
ISBN: 1932415262 (print)
As the algorithms that are used to reconstruct medical images from measurable projection data continue to mature, medical image reconstruction has remained an interesting and important topic for medical researchers. The main challenge in medical image reconstruction is how to establish an economical and efficient computing system that can be used to perform fast image reconstructions. In this paper, we present a distributed computing system based on P2P technologies and demonstrate how the design and implementation of this system address the challenge through a case study of an iterative EM medical image reconstruction algorithm. Computational experiments are designed to study the performance of the EM algorithm using the system. Overall, this study provides insights into the large-scale computation of iterative medical image reconstruction in a Grid environment.
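For orientation, the textbook maximum-likelihood EM (MLEM) update for emission data can be sketched as follows. The abstract does not specify which EM variant the authors use, so this is an assumption: the sketch uses a small dense system matrix `A` (measurements by pixels), noise-free data `g`, and a uniform start image, and it runs on a single machine rather than on a P2P system. All names are hypothetical.

```python
def mlem(A, g, iterations=100):
    """Textbook MLEM iteration: f_j <- f_j / s_j * sum_i A_ij * g_i / (A f)_i."""
    n_meas, n_pix = len(A), len(A[0])
    # Sensitivity of each pixel: column sums of the system matrix.
    sens = [sum(A[i][j] for i in range(n_meas)) for j in range(n_pix)]
    f = [1.0] * n_pix
    for _ in range(iterations):
        # Forward projection of the current estimate.
        proj = [sum(A[i][j] * f[j] for j in range(n_pix)) for i in range(n_meas)]
        # Ratio of measured to estimated data (guarding against division by zero).
        ratio = [g[i] / proj[i] if proj[i] > 0 else 0.0 for i in range(n_meas)]
        # Back projection of the ratio, normalised by the sensitivity.
        f = [
            f[j] / sens[j] * sum(A[i][j] * ratio[i] for i in range(n_meas))
            for j in range(n_pix)
        ]
    return f


# Hypothetical two-pixel example: g is the exact projection of f = [2, 3].
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
g = [2.0, 3.0, 5.0]
f_est = mlem(A, g)
```

The forward and back projections are sums over independent rows and columns, which is why the algorithm lends itself to being split across many machines.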
ISBN: 9781538673089 (print)
With the rapid development of big data processing technology, streaming data processing and real-time data analysis have gradually become new research hotspots. Both industry and academia have invested heavily in research on efficient methods for processing the massive data generated in environments such as the Internet and e-commerce. Meanwhile, high-performance computing technology and supercomputers are also looking for new areas of business growth. The convergence of big data processing and high-performance computing technology is the general trend of big data analysis in the future. This paper gives a brief overview of typical technologies in the convergence of big data processing and high-performance computing.
ISBN: 0819433039 (print)
One major difficulty in designing an architecture for the parallel implementation of the Discrete Wavelet Transform (DWT) is that the DWT is not a block transform. As a result, frequent communication has to be set up between processors to exchange data so that correct boundary wavelet coefficients can be computed. The significant communication overhead thus hampers the improvement of the efficiency of parallel systems, especially for processor networks with large communication latencies. In this paper we propose a new technique, called Boundary Postprocessing, that allows the correct transform of boundary samples. The basic idea is to model the DWT as a Finite State Machine (FSM) based on the lifting factorization of the wavelet filterbanks. Application of this technique leads to a new parallel DWT architecture, Split-and-Merge, which requires data to be communicated only once between neighboring processors for any arbitrary level of wavelet decomposition. Example designs and performance analysis for 1D and 2D DWT show that the proposed technique can greatly reduce the interprocessor communication overhead. As an example, in a two-processor case our proposed approach shows an average speedup of about 30% compared to the best currently available parallel computation.
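The boundary dependence mentioned above is easy to see in a lifting implementation. The sketch below shows one level of the reversible integer 5/3 wavelet written as two lifting steps, assuming an even-length list of integers and symmetric extension at the signal ends. It is a generic illustration of lifting, not the Boundary Postprocessing technique or the Split-and-Merge architecture of the paper, and the function names are hypothetical.

```python
def lift_53_forward(x):
    """One level of the reversible integer 5/3 wavelet via lifting.

    Expects an even-length list of integers; returns (approximation, detail).
    """
    even, odd = x[0::2], x[1::2]
    n = len(odd)
    # Predict step: each detail needs the even sample to its right
    # (mirrored at the right end of the signal).
    d = [odd[i] - (even[i] + even[min(i + 1, n - 1)]) // 2 for i in range(n)]
    # Update step: each approximation needs the detail to its left
    # (mirrored at the left end of the signal).
    s = [even[i] + (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(n)]
    return s, d


def lift_53_inverse(s, d):
    """Undo lift_53_forward by reversing the lifting steps in opposite order."""
    n = len(d)
    even = [s[i] - (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(n)]
    odd = [d[i] + (even[i] + even[min(i + 1, n - 1)]) // 2 for i in range(n)]
    x = [0] * (2 * n)
    x[0::2], x[1::2] = even, odd
    return x
```

If the signal is split between two processors, the predict step at the end of the first block needs an even sample owned by the second, and the update step at the start of the second block needs a detail owned by the first. These neighbour dependencies are the source of the interprocessor communication that the abstract describes.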