ISBN:
(Print) 9783319780542; 9783319780535
The row-wise and column-wise prefix-sum computation of a matrix has many applications in the area of image processing, such as computation of the summed area table and the Euclidean distance map. It is known that the prefix-sums of a 1-dimensional array can be computed efficiently on the GPU. Hence, the row-wise prefix-sums of a matrix can also be computed efficiently on the GPU by executing this prefix-sum algorithm for every row in parallel. However, the same approach does not work well for computing the column-wise prefix-sums, because it performs inefficient strided access to the global memory. The main contribution of this paper is to present an almost optimal column-wise prefix-sum algorithm on the GPU. Since all elements in an input matrix must be read and the resulting prefix-sums must be written, computation of the column-wise prefix-sums cannot be faster than simple matrix duplication in the global memory of the GPU. Quite surprisingly, experimental results using an NVIDIA TITAN X show that our column-wise prefix-sum algorithm runs only 2-6% slower than matrix duplication. Thus, our column-wise prefix-sum algorithm is almost optimal.
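To make the access-pattern issue concrete, here is a minimal NumPy sketch (not the paper's GPU algorithm) contrasting row-wise and column-wise scans, together with the common transpose workaround; the matrix contents are arbitrary:

```python
import numpy as np

# Toy matrix; on a GPU the data lives in global memory, where the
# access order determines whether reads coalesce.
a = np.arange(12, dtype=np.float64).reshape(3, 4)

row_psum = np.cumsum(a, axis=1)  # row-wise: consecutive elements, friendly access
col_psum = np.cumsum(a, axis=0)  # column-wise: strided access on a GPU

# A common GPU workaround: transpose, scan rows, transpose back.
# It restores coalesced access at the cost of two extra passes.
col_psum_t = np.cumsum(a.T, axis=1).T
assert np.allclose(col_psum, col_psum_t)
```

The transpose trick restores contiguous access for the scan itself, but it pays for two extra passes over the matrix, which is exactly the kind of overhead an almost optimal algorithm must avoid.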
ISBN:
(Digital) 9781510616486
ISBN:
(Print) 9781510616486
Deep convolutional neural networks have found success in semantic image segmentation tasks in computer vision and medical imaging. These algorithms are executed on conventional von Neumann processor architectures or GPUs. This is suboptimal. Neuromorphic processors that replicate the structure of the brain are better suited to train and execute deep learning models for image segmentation by relying on massively parallel processing. However, given that they closely emulate the human brain, on-chip hardware and digital memory limitations also constrain them. Adapting deep learning models to execute image segmentation tasks on such chips requires specialized training and validation. In this work, we demonstrate, for the first time, spinal image segmentation performed using a deep learning network implemented on neuromorphic hardware of the IBM TrueNorth Neurosynaptic System, and we validate the performance of our network by comparing it to human-generated segmentations of spinal vertebrae and disks. To achieve this on neuromorphic hardware, the training model constrains the coefficients of individual neurons to {-1, 0, 1} using the Energy Efficient Deep Neuromorphic (EEDN) network training algorithm. Given the roughly 1 million neurons and 256 million synapses, the scale and size of the neural network implemented by the IBM TrueNorth allow us to execute the requisite mapping between segmented images and non-uniform intensity MR images >20 times faster than on a GPU-accelerated network while using <0.1 W. This speed and efficiency imply that a trained neuromorphic chip can be deployed in intra-operative environments where real-time medical image segmentation is necessary.
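As a rough illustration of the weight constraint described above (not the EEDN training algorithm itself), the following sketch maps real-valued weights to {-1, 0, 1} by simple thresholding; the threshold value is an arbitrary assumption:

```python
import numpy as np

def ternarize(weights: np.ndarray, threshold: float = 0.05) -> np.ndarray:
    """Map real-valued weights to {-1, 0, 1}.

    A simplified stand-in for the constraint EEDN enforces during
    training; the threshold is an illustrative choice, not a value
    from the paper.
    """
    q = np.zeros_like(weights, dtype=np.int8)
    q[weights > threshold] = 1
    q[weights < -threshold] = -1
    return q

w = np.random.randn(256, 256) * 0.1
print(np.unique(ternarize(w)))  # -> [-1  0  1]
```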
ISBN:
(Digital) 9781351778572
ISBN:
(Print) 9781138712263
This book describes methods and algorithms for image pre-processing and recognition. These methods are based on a parallel shift technology of the imaging copy, as well as simple mathematical operations, to allow the generation of a minimum set of features for describing and recognizing an image. The book also describes the theoretical foundations of parallel shift technology and pattern recognition. Based on these methods and theories, it is intended to help researchers design artificial intelligence systems and robotics, and develop software and hardware applications.
ISBN:
(Print) 9781450356305
Advances in vision processing have ignited a proliferation of mobile vision applications, including augmented reality. However, limited by the inability to rapidly reconfigure sensor operation for performance-efficiency tradeoffs, high power consumption causes vision applications to drain the device's battery. To explore the potential impact of enabling rapid reconfiguration, we use a case study around marker-based pose estimation to understand the relationship between image frame resolution, task accuracy, and energy efficiency. Our case study shows that, to balance energy efficiency and task accuracy, the application needs to dynamically and frequently reconfigure sensor resolution. To explore the latency bottlenecks to sensor resolution reconfiguration, we define and profile the end-to-end reconfiguration latency and frame-to-frame latency of changing capture resolution on an LG Nexus 5X device. We identify three major sources of sensor resolution reconfiguration latency in current Android systems: (i) sequential configuration patterns, (ii) expensive system calls, and (iii) imaging pipeline delay. Based on our intuitions, we propose a redesign of the Android camera system to mitigate these sources of latency. Enabling smooth transitions between sensor configurations will unlock new classes of adaptive-resolution vision applications.
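The measurement below is a rough desktop analogue of the frame-to-frame latency profiling described above, using OpenCV rather than the Android camera stack; the camera index and the two resolutions are assumptions:

```python
import time
import cv2

# Desktop analogue, not the Android pipeline: time how long a
# capture-resolution change takes before the next frame arrives.
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1920)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 1080)
cap.read()  # warm up at the initial resolution

t0 = time.perf_counter()
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)   # reconfigure to a lower resolution
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
ok, frame = cap.read()                   # first frame at the new resolution
t1 = time.perf_counter()

if ok:
    print(f"reconfiguration-to-frame latency: {(t1 - t0) * 1e3:.1f} ms")
cap.release()
```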
In the last few decades, 3D/4D ultrasonography has been gaining popularity not only as a scientific research topic but also as a new modality of medical imaging in clinical applications. However, the design and implementation of a 3D/4D device for high-quality ultrasound imaging within portable, handheld systems is a technological challenge. The design of transmit/receive (Tx/Rx) electronics that operate efficiently with 2D array transducers comprising thousands of elements, the enormous amount of input/output data that must be transferred and processed, and power consumption limits are just a few of the difficulties that arise. No less important is the development of reliable and numerically efficient algorithms for 3D/4D imaging that take all these restrictions into account. The main objective of this paper is to present a new hybrid spectral domain imaging (HSDI) method that delivers an original and innovative solution to the technical limitations of modern 3D/4D ultrasonography. The developed image reconstruction method is based on plane-wave insonification (PWI) with sub-aperture data acquisition combined with frequency-domain (FD) data processing. The performance of the method was tested using Field II simulated acoustic data of a 3D cyst phantom. For a single 3D low-resolution image (LRI) of 64×64×512 pixels, the proposed HSDI method is about 100 times faster than its counterpart based on the PWI synthetic aperture time-domain (TD) method for a single Tx/Rx event. Moreover, the frame rate increase is proportional to the number of sub-apertures used to synthesize a single high-resolution image (HRI).
ISBN:
(Digital) 9781728144849
ISBN:
(Print) 9781728144856
Hyperspectral image registration is a relevant task for real-time applications like environmental disaster management or search and rescue scenarios. Traditional algorithms for this problem were not really devoted to real-time performance. The HYFMGPU algorithm arose as a high-performance GPU-based solution to fill this gap. Nevertheless, a single-GPU solution is not enough, as sensors are evolving and generating images with finer resolutions and wider wavelength ranges. An MPI+CUDA multi-GPU implementation of HYFMGPU was previously presented. However, that solution exposes the programming complexity of combining MPI with an accelerator programming model. In this paper we present a new and more abstract programming approach for this type of application, which provides high efficiency while simplifying the programming of the multi-device parts of the code. The solution uses Hitmap, a library that eases the programming of parallel applications based on distributed arrays. It takes a more algorithm-oriented approach than MPI, including abstractions for the automatic partition and mapping of arrays at runtime with arbitrary granularity, as well as techniques to build flexible communication patterns that transparently adapt to the data partitions. We show how these abstractions apply to this application class. We present a comparison of development effort metrics between the original MPI implementation and the one based on Hitmap, with reductions of up to 95% in the Halstead score for specific work redistribution steps. We finally present experimental results showing that these abstractions are internally implemented in a highly efficient way that can reduce the overall execution time by up to 37% compared with the original MPI implementation.
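For readers unfamiliar with distributed-array programming, the sketch below shows the kind of row-block partition and gather that a library like Hitmap automates, written in plain mpi4py rather than Hitmap's own API; the array shape and the per-block workload are illustrative assumptions:

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Row-block partition of a 2D array across processes (hypothetical shape).
rows, cols = 1024, 2048
counts = [(rows // size + (r < rows % size)) * cols for r in range(size)]
displs = [sum(counts[:r]) for r in range(size)]

image = np.random.rand(rows, cols) if rank == 0 else None
local = np.empty((counts[rank] // cols, cols))

comm.Scatterv([image, counts, displs, MPI.DOUBLE], local, root=0)
local *= 2.0  # stand-in for the real per-block registration work
comm.Gatherv(local, [image, counts, displs, MPI.DOUBLE], root=0)

if rank == 0:
    print("done:", image.shape)
```

The counts/displacements bookkeeping visible here, and its adaptation whenever the partition changes, is exactly what such abstractions derive automatically at runtime.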
Motion estimation is a core task in computer vision, and many applications utilize optical flow methods as fundamental tools to analyze motion in images and videos. Optical flow is the apparent motion of objects in image sequences that results from relative motion between the objects and the imaging perspective. Today, optical flow fields are utilized to solve problems in various areas such as object detection and tracking, interpolation, visual odometry, etc. In this dissertation, three problems from different areas of computer vision and the solutions that make use of modified optical flow methods are explained. The contributions of this dissertation are approaches and frameworks that introduce i) a new optical flow-based interpolation method to achieve minimally divergent velocimetry data, ii) a framework that improves the accuracy of change detection algorithms in synthetic aperture radar (SAR) images, and iii) a set of new methods to integrate proton magnetic resonance spectroscopy (1H-MRSI) data into three-dimensional (3D) neuronavigation systems for tumor biopsies. In the first application, an optical flow-based approach for the interpolation of minimally divergent velocimetry data is proposed. The velocimetry data of incompressible fluids contain signals that describe the flow velocity. The approach uses the additional flow velocity information to guide the interpolation process towards reduced divergence in the interpolated data. In the second application, a framework that mainly consists of optical flow methods and other image processing and computer vision techniques is proposed to improve object extraction from synthetic aperture radar images. The proposed framework is used to distinguish between actual motion and motion detected due to misregistration in SAR image sets, and it can lead to more accurate and meaningful change detection and improve object extraction from SAR datasets. In the third application, a set of new methods is presented that aim to improve the integration of 1H-MRSI data into 3D neuronavigation systems for tumor biopsies.
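As background, a generic dense optical flow baseline (Farneback's method via OpenCV) is sketched below on a synthetic pair of frames; the dissertation's contributions are modified flow methods built on ideas like this, not this baseline:

```python
import cv2
import numpy as np

# Synthetic frames: a bright square shifted 3 px to the right.
prev = np.zeros((128, 128), dtype=np.uint8)
prev[40:80, 40:80] = 255
curr = np.roll(prev, 3, axis=1)

# Dense flow field: one (dx, dy) vector per pixel.
flow = cv2.calcOpticalFlowFarneback(
    prev, curr, None,
    pyr_scale=0.5, levels=3, winsize=15,
    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

# Average horizontal motion inside the square should be close to +3 px.
print(flow[40:80, 40:80, 0].mean())
```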
The fast compression of images is a requisite in many applications like TV production, teleconferencing, or digital cinema. Many of the algorithms employed in current image compression standards are inherently sequential. High-performance implementations of such algorithms often require specialized hardware like field-programmable gate arrays (FPGAs). Graphics Processing Units (GPUs) do not commonly achieve high performance on these algorithms because the algorithms do not exhibit fine-grained parallelism. Our previous work introduced a new core algorithm for wavelet-based image coding systems, tailored for massively parallel architectures, called bitplane coding with parallel coefficient processing (BPC-PaCo). This paper introduces the first high-performance, GPU-based implementation of BPC-PaCo. A detailed analysis of the algorithm aids its implementation on the GPU. The main insights behind the proposed codec are an efficient thread-to-data mapping, smart memory management, and the use of efficient cooperation mechanisms to enable inter-thread communication. Experimental results indicate that the proposed implementation meets the requirements for real-time high-resolution (4K) digital cinema, yielding speedups of 30x with respect to the fastest implementations of current compression standards. A power consumption evaluation also shows that our implementation consumes 40x less energy than state-of-the-art methods at equivalent performance.
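To illustrate the bitplane view of the data that coders in this family sweep, the sketch below extracts bitplanes from a few made-up quantized coefficients, from most to least significant; it is not BPC-PaCo itself:

```python
import numpy as np

# Hypothetical quantized wavelet coefficients: split into sign and
# magnitude, then view the magnitudes one bitplane at a time.
coeffs = np.array([[5, -3], [12, 0]], dtype=np.int32)
signs = coeffs < 0
mags = np.abs(coeffs)

nplanes = int(mags.max()).bit_length()
for b in range(nplanes - 1, -1, -1):  # MSB first, as bitplane coders scan
    plane = (mags >> b) & 1           # one bit per coefficient
    print(f"bitplane {b}:\n{plane}")
```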
ISBN:
(Print) 9781538666142
The Convolutional Neural Network (CNN) is a state-of-the-art algorithm widely used in applications such as face recognition, intelligent monitoring, image recognition, and text recognition. Because of its high computational complexity, many efficient hardware accelerators have been proposed to exploit the high degree of parallelism in CNNs. However, accelerators implemented on FPGAs and ASICs usually sacrifice generality for higher performance and lower power consumption. Other accelerators, such as GPUs, are general enough, but they lead to higher power consumption. Fine-grained dataflow architectures, which depart from the conventional von Neumann architecture, show natural advantages in processing CNN-like algorithms with high computational efficiency and low power consumption, while remaining broadly applicable and adaptable. In this paper, we propose a scheme for implementing and optimizing CNNs on accelerators based on a fine-grained dataflow architecture. The experimental results reveal that, using our scheme, the performance of AlexNet running on the dataflow accelerator is 3.11x higher than that on an NVIDIA Tesla K80, and the power consumption of our hardware is 8.52x lower than that of the K80.
Recently, high A-line speed, parallel, wide-field imaging for optical coherence tomography angiography (OCTA) has become more prevalent, resulting in a dramatic increase in data quantity that poses a challenge for real-time imaging, even for GPU-based data processing. In this manuscript, we propose a new OCTA processing technique, Gabor optical coherence tomographic angiography (GOCTA), for label-free human retinal angiography imaging. In spectral domain optical coherence tomography (SDOCT), previous OCTA algorithms required k-space resampling and the Fourier transform (FFT) over the entire data set of interference fringes to calculate blood flow information, which is computationally intensive. As the anterior-posterior radius of the adult eye is nearly constant, only 3 A-scan lines need to be processed to obtain the gross orientation of the retina using a sphere model. Subsequently, en face microvascular images can be obtained with the GOCTA algorithm from the interference fringes directly, without the steps of k-space resampling, numerical dispersion compensation, FFT, and maximum (mean) projection, resulting in a significant improvement of the data processing speed, 4 to 20 times faster than existing methods. GOCTA is potentially suitable for SDOCT systems in en face preview applications requiring real-time microvascular imaging. (C) 2017 Optical Society of America
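The sketch below illustrates the core idea of depth-selective Gabor filtering of a spectral interference fringe (not the full GOCTA pipeline); the fringe frequency, kernel size, and bandwidth are made-up values:

```python
import numpy as np

# Synthetic spectral fringe: a single reflector produces one fringe
# frequency across the spectrometer pixels k.
n = 2048
k = np.arange(n)
fringe = np.cos(2 * np.pi * 0.05 * k)

# Complex Gabor kernel tuned to that fringe frequency (illustrative
# values): a Gaussian envelope times a complex exponential.
t = np.arange(-64, 65)
f0, sigma = 0.05, 16.0
gabor = np.exp(-0.5 * (t / sigma) ** 2) * np.exp(2j * np.pi * f0 * t)

# Convolving with the complex kernel band-passes the fringe; the
# magnitude responds strongly only when the fringe frequency matches
# f0, i.e. when a reflector sits at the targeted depth, so no full
# FFT over the data set is needed.
response = np.abs(np.convolve(fringe, gabor, mode="same"))
print(response.mean())
```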