The current standard for diagnosing liver tumors is contrast-enhanced multiphase computed tomography. On this basis, several software tools have been developed by research groups worldwide to support physicians, for example, in measuring remnant liver volume, analyzing tumors, and planning resections. Several algorithms have been developed to perform these tasks, and the segmentation of the liver usually stands at the beginning of the processing chain. Consequently, a vast number of CT-based liver segmentation algorithms have been developed. However, clinics are slowly moving away from CT, the current gold standard for diagnosing liver diseases, towards magnetic resonance imaging. In this work, we use a Probabilistic Active Shape Model with an MR-specific preprocessing and appearance model to segment the liver in contrast-enhanced MR images. The evaluation is based on 8 clinical datasets.
We propose a real-time polygon reduction method for online reproduction of a three-dimensional spatial model using image processing. Currently, the Microsoft Kinect is a popular device for capturing wide-area, detailed depth images. When reconstructing 3D data from the depth image and the RGB image, as well as when transmitting 3D data in real time, it is important to reduce the data size. In this paper, we introduce a polygon reduction method that uses line detection to find architectural surfaces and their joint lines in an RGB image. The system discards most of the depth information, keeping only a representative value per surface together with part of the RGB image. The remaining data are used to reconstruct simple polygon data rendered via point clouds or texture mapping. An operational test of the proposed method confirms that the polygon reduction reduces the data size without increasing the duration of the 3D reconstruction.
Although the expectation-maximization (EM)-based 3D computed tomography (CT) reconstruction algorithm lowers radiation exposure, its long execution time hinders practical usage. To accelerate this process, we introduce a novel external memory bandwidth reduction strategy that reuses both the sinogram and the voxel intensities. In addition, a customized computing engine based on a field-programmable gate array (FPGA) is presented to increase the effective memory bandwidth. Experiments on actual patient data show that an 85× speedup can be achieved over a single-threaded CPU.
ISBN:
(Print) 9781479944965
This paper presents a GPU-based system for real-time traffic sign detection and recognition which can classify the 48 different traffic signs included in the library. The proposed design has three stages: pre-processing, feature extraction, and classification. For high-speed processing, we propose a window-based histogram-of-gradients algorithm that is highly optimized for parallel processing on a GPU. To detect signs of various sizes, processing is applied at 32 scale levels. For more accurate recognition, multiple levels of support vector machines are employed to classify the traffic signs. The proposed system can process video at 27.9 frames per second with an active resolution of 1,628 × 1,236 pixels. Evaluated on the BelgiumTS dataset, the experimental results show a detection rate of about 91.69% with 3.39 × 10⁻⁵ false positives per window and a recognition rate of about 93.77%.
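As an aside, the per-window gradient-histogram feature this kind of detector builds on can be sketched in a few lines of plain Python. This is a generic, serial illustration; the bin count, normalization, and function name are assumptions, not the paper's GPU-optimized parameters:

```python
import math

def hog_window(img, bins=9):
    """Toy histogram-of-gradients descriptor for one grayscale window.

    `img` is a list of rows of intensities. Central differences give
    the gradient; each interior pixel votes its magnitude into an
    unsigned-orientation bin over 0..180 degrees. Bin count and
    normalization are assumptions, not the paper's parameters.
    """
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[min(int(ang * bins / 180.0), bins - 1)] += mag
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]

# A vertical edge: every gradient points horizontally, so all votes
# land in the first orientation bin.
window = [[0, 0, 10, 10]] * 4
desc = hog_window(window)
```

A vertical edge concentrates all of its votes in the first orientation bin, which is the kind of structure the downstream classifier stages pick up on; the GPU version parallelizes these loops over windows and pixels.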
This paper compares several computational approaches to Synthetic Aperture Sequential Beamforming (SASB) targeting consumer-level parallel processors such as multi-core CPUs and GPUs. The proposed implementations demonstrate that ultrasound imaging using SASB can be executed in real time with significant headroom for post-processing. The CPU implementations are optimized using Single Instruction Multiple Data (SIMD) instruction extensions and multithreading, and the GPU computations are performed using the OpenCL and OpenGL APIs. The implementations include refocusing (dynamic focusing) of a set of fixed-focus scan lines received from a BK Medical UltraView 800 scanner and subsequent image processing for B-mode imaging and rendering to screen. Benchmarking is performed using a clinically evaluated imaging setup consisting of 269 scan lines × 1472 complex samples (1.58 MB per frame, 16 frames per second) on an Intel Core i7 2600 CPU with an AMD HD7850 and an NVIDIA GTX680 GPU. The fastest CPU and GPU implementations use 14% and 1.3% of the real-time budget of 62 ms/frame, respectively. The maximum achieved processing rate is 1265 frames/s.
ISBN:
(Print) 9781479960989
In this work we have parallelized the Maximum Likelihood Expectation-Maximization (MLEM) and Ordered Subset Expectation Maximization (OSEM) algorithms to improve the efficiency of reconstructions of multiple-pinhole SPECT and cone-beam CT data. We implemented the parallelized versions of the algorithms on a general-purpose graphics processing unit (GPGPU): the 448 cores of an NVIDIA Tesla M2070 GPU with 6 GB RAM. We compared their run times against those of the corresponding CPU implementations running on the 8 cores of an AMD Opteron 6128 CPU with 32 GB RAM. We have further shown how optimizing the thread balance can accelerate the GPU implementation.
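The multiplicative update shared by MLEM and OSEM is compact enough to sketch serially. A toy version in plain Python (dense system matrix, noiseless data, illustrative names; no relation to the paper's GPU kernels) looks like this:

```python
def osem(A, y, n_iters=200, n_subsets=2):
    """Toy serial OSEM; with n_subsets=1 it reduces to plain MLEM.

    A is a dense system matrix (rows = detector bins, cols = voxels)
    and y the measured counts. Each sub-iteration applies the
    multiplicative EM update using one subset of the rows; these are
    the loops a GPU version parallelizes over voxels and bins.
    """
    n_bins, n_vox = len(A), len(A[0])
    x = [1.0] * n_vox                              # flat initial image
    subsets = [list(range(s, n_bins, n_subsets)) for s in range(n_subsets)]
    for _ in range(n_iters):
        for sub in subsets:
            # forward projection of the current estimate on this subset
            fp = {i: sum(A[i][j] * x[j] for j in range(n_vox)) for i in sub}
            for j in range(n_vox):
                sens = sum(A[i][j] for i in sub)   # sensitivity
                if sens == 0.0:
                    continue
                back = sum(A[i][j] * y[i] / fp[i] for i in sub if fp[i] > 0)
                x[j] *= back / sens                # multiplicative update
    return x

# Noiseless 2-voxel toy problem: recover x_true from y = A @ x_true.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 1.0]]
x_true = [2.0, 3.0]
y = [sum(a * b for a, b in zip(row, x_true)) for row in A]
x = osem(A, y)
```

With more than one subset the updates cycle through the subsets in order, the trait OSEM exploits for faster early convergence.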
The 3D FFT is critical in many physical simulations and image processing applications. On FPGAs, however, the 3D FFT was thought to be inefficient relative to other methods such as convolution-based implementations of multigrid. We find the opposite: a simple design, operating at a conservative frequency, takes 4 μs for 16³, 21 μs for 32³, and 215 μs for 64³ single-precision data points. The first two of these compare favorably with the 25 μs and 29 μs obtained running on a current Nvidia GPU. Of broader significance, this is a critical piece in implementing a large-scale FPGA-based MD engine: even a single FPGA is capable of keeping the FFT off of the critical path for a large fraction of possible MD simulations.
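The 3D FFT is separable: three sweeps of 1D FFTs, one per axis. A minimal plain-Python sketch (a textbook radix-2 FFT, not the FPGA pipeline) makes that structure explicit:

```python
import cmath

def fft1(a):
    """Recursive radix-2 FFT; the length must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft1(a[0::2]), fft1(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft3(cube):
    """3D FFT of cube[z][y][x] as three axis-by-axis sweeps of 1D FFTs."""
    n = len(cube)
    c = [[fft1(row) for row in plane] for plane in cube]   # along x
    for z in range(n):                                     # along y
        for x in range(n):
            col = fft1([c[z][y][x] for y in range(n)])
            for y in range(n):
                c[z][y][x] = col[y]
    for y in range(n):                                     # along z
        for x in range(n):
            pencil = fft1([c[z][y][x] for z in range(n)])
            for z in range(n):
                c[z][y][x] = pencil[z]
    return c

# A unit impulse at the origin transforms to a flat (all-ones) spectrum.
n = 4
cube = [[[0j] * n for _ in range(n)] for _ in range(n)]
cube[0][0][0] = 1.0 + 0j
spec = fft3(cube)
```

An FPGA design pipelines and streams these sweeps rather than looping over them, but the data movement pattern is the same.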
We study and characterize the performance of operations in an important class of applications on GPUs and Many Integrated Core (MIC) architectures. Our work is motivated by applications that analyze low-dimensional spatial datasets captured by high-resolution sensors, such as image datasets obtained from whole-slide tissue specimens using microscopy scanners. Common operations in these applications involve the detection and extraction of objects (object segmentation), the computation of features of each extracted object (feature computation), and the characterization of objects based on these features (object classification). In this work, we have identified the data access and computation patterns of operations in the object segmentation and feature computation categories. We systematically implement and evaluate the performance of these operations on modern CPUs, GPUs, and MIC systems for a microscopy image analysis application. Our results show that the performance on a MIC of operations that perform regular data access is comparable to, and sometimes better than, that on a GPU. On the other hand, GPUs are significantly more efficient than MICs for operations that access data irregularly, a consequence of the low performance of MICs on random data access. We have also examined the coordinated use of MICs and CPUs. Our experiments show that a performance-aware task scheduling strategy improves performance by about 1.29× over a first-come-first-served strategy. This allows applications to achieve high performance efficiency on CPU-MIC systems: the example application attained an efficiency of 84% on 192 nodes (3072 CPU cores and 192 MICs).
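The performance-aware versus first-come-first-served contrast can be illustrated with a toy greedy scheduler in plain Python. The cost table, device names, and policy below are invented for illustration; the paper's scheduler is more sophisticated:

```python
def schedule(tasks, devices, perf_aware=True):
    """Greedy task placement; returns the makespan.

    `tasks` maps task id to an estimated runtime per device. When
    perf_aware is True each task goes to whichever device finishes it
    earliest given current load; otherwise (FCFS-style) it goes to the
    least-loaded device, ignoring suitability. All numbers and names
    here are invented for illustration.
    """
    load = {d: 0.0 for d in devices}
    for t, cost in tasks.items():
        if perf_aware:
            d = min(devices, key=lambda d: load[d] + cost[d])
        else:
            d = min(devices, key=lambda d: load[d])
        load[d] += cost[d]
    return max(load.values())

# Irregular (segmentation-like) tasks are much slower on the MIC;
# regular (feature-like) tasks run about as well on either device.
tasks = {
    "seg1":  {"gpu": 1.0, "mic": 4.0},
    "seg2":  {"gpu": 1.0, "mic": 4.0},
    "feat1": {"gpu": 1.0, "mic": 1.2},
    "feat2": {"gpu": 1.0, "mic": 1.2},
}
aware = schedule(tasks, ["gpu", "mic"], perf_aware=True)
fcfs = schedule(tasks, ["gpu", "mic"], perf_aware=False)
```

The performance-aware policy keeps irregular work on the GPU and regular work on the MIC, finishing sooner than the suitability-blind assignment.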
We present initial results from a new image generation approach for low-latency displays such as those needed in head-worn AR devices. Avoiding the usual video interfaces, such as HDMI, we favor direct control of the internal display technology. We illustrate our new approach with a bench-top optical see-through AR proof-of-concept prototype that uses a Digital Light Processing (DLP™) projector whose Digital Micromirror Device (DMD) imaging chip is directly controlled by a computer, similar to the way random access memory is controlled. We show that a perceptually-continuous-tone dynamic gray-scale image can be efficiently composed from a very rapid succession of binary (partial) images, each calculated from the continuous-tone image generated with the most recent tracking data. As the DMD projects only a binary image at any moment, it cannot instantly display this latest continuous-tone image, and conventional decomposition of a continuous-tone image into binary time-division-multiplexed values would induce just the latency we seek to avoid. Instead, our approach maintains an estimate of the image the user currently perceives, and at every opportunity allowed by the control circuitry, sets each binary DMD pixel to the value that will reduce the difference between that user-perceived image and the newly generated image from the latest tracking data. The resulting displayed binary image is "neither here nor there," but always approaches the moving target that is the constantly changing desired image, even when that image changes every 50 μs. We compare our experimental results with imagery from a conventional DLP projector with similar internal speed, and demonstrate that AR overlays on a moving object are more effective with this kind of low-latency display device than with displays of similar speed that use a conventional video interface.
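The per-pixel decision described above, setting each binary mirror to whichever state moves the user-perceived image toward the latest target, can be sketched with an assumed leaky-integrator model of perception (the decay model and every name below are a simplification invented for illustration, not the paper's model):

```python
def dmd_step(perceived, target, alpha=0.1):
    """One update tick for a single binary DMD mirror.

    `perceived` models the eye's running average of the mirror's
    on/off history as a leaky integrator with gain `alpha` (an assumed
    perceptual model, not the paper's). The mirror is set to whichever
    binary value moves the perceived intensity toward `target`.
    """
    on = (1.0 - alpha) * perceived + alpha         # mirror on  -> 1.0
    off = (1.0 - alpha) * perceived                # mirror off -> 0.0
    bit = 1 if abs(on - target) <= abs(off - target) else 0
    return bit, (on if bit else off)

# Drive one pixel toward a mid-gray target: the duty cycle of the
# emitted bit stream approaches the target intensity.
target, perceived, bits = 0.3, 0.0, []
for _ in range(200):
    bit, perceived = dmd_step(perceived, target)
    bits.append(bit)
```

Because each tick acts on the most recent target, the target can change every tick (as with 50 μs tracking updates) and the bit stream simply starts chasing the new value.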
Recognition is a hard problem due to the high dimensionality of input image data. Principal component analysis (PCA) is one of the most popular algorithms for reducing dimensionality. The main constraint of PCA is the execution time required to update the decomposition when new data are included; therefore, parallel computation is needed. Opening GPU architectures to general-purpose computation makes it possible to perform this computation on a powerful parallel platform. In this paper, a modified fast PCA (MFPCA) algorithm is presented for the GPU architecture, and the suitability of the algorithm for the face recognition task is discussed. The performance and efficiency of the MFPCA algorithm are studied on large-scale datasets. Experimental results show a decrease in the MFPCA execution time while preserving the quality of the results.
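For context, the eigen step at the heart of PCA-style dimensionality reduction can be sketched with power iteration in plain Python; this is a generic stand-in, not the paper's MFPCA or its GPU kernels:

```python
def top_component(data, n_iters=100):
    """Leading principal component via power iteration.

    A generic stand-in for the eigen step PCA methods perform. `data`
    is a list of equal-length feature vectors. The covariance matrix
    is never formed explicitly: each step computes w = X^T (X v) / n.
    """
    n, dim = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(dim)]
    xc = [[row[j] - mean[j] for j in range(dim)] for row in data]
    v = [1.0] * dim
    for _ in range(n_iters):
        xv = [sum(r[j] * v[j] for j in range(dim)) for r in xc]
        w = [sum(xv[i] * xc[i][j] for i in range(n)) / n for j in range(dim)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return v

# Points spread mainly along the x axis: the leading component aligns
# with (1, 0) up to sign.
pts = [[-3.0, 0.1], [-1.0, -0.1], [1.0, 0.1], [3.0, -0.1]]
v = top_component(pts)
```

Projecting the data onto the first few such components is what reduces dimensionality before the recognition stage; the matrix-vector products inside the loop are what a GPU parallelizes.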