ISBN (digital): 9781728192741
ISBN (print): 9781728192758
We present a scheme for edge-aware Gaussian image filtering on the GPU with linear complexity in the number of image pixels. The execution time decreases as the number of Streaming Multiprocessors (SMs) on the GPU increases. We make use of a domain transformation and a complex-valued recursive formulation of the Gaussian filter. The algorithm partitions the image into disjoint regions, in which the filtering operations are computed in parallel, avoiding communication between regions. Our implementation leads to a real-time solution on a modern GPU. With the RTX 2080 Ti, we achieved an execution time of less than 10 milliseconds for 2 filtering iterations on high-resolution RGB images of dimensions $2048\times 2048$.
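To illustrate how a domain transformation makes a recursive filter edge-aware, here is a minimal 1D CPU sketch. It is not the authors' method: it swaps in a real-valued first-order recursive filter in place of the complex-valued recursive Gaussian described in the abstract, keeps the feedback coefficient fixed across iterations, and ignores the GPU partitioning entirely. The function name `edge_aware_filter_1d` and the parameters `sigma_s` and `sigma_r` are assumptions made for this example.

```python
import math


def edge_aware_filter_1d(signal, sigma_s=10.0, sigma_r=0.1, iterations=2):
    """Smooth a 1D signal while preserving large intensity jumps (edges).

    Hypothetical helper: a first-order recursive filter run in a
    domain-transformed coordinate system, used here as a simplified
    stand-in for a recursive Gaussian.
    """
    n = len(signal)
    # Distance between neighbouring samples in the transformed domain.
    # A large intensity jump (an edge) becomes a large distance.
    dist = [0.0] + [
        1.0 + (sigma_s / sigma_r) * abs(signal[i] - signal[i - 1])
        for i in range(1, n)
    ]
    # Feedback coefficient of the recursive filter (fixed here for simplicity).
    a = math.exp(-math.sqrt(2.0) / sigma_s)

    out = [float(v) for v in signal]
    for _ in range(iterations):
        # Causal (left-to-right) pass.
        for i in range(1, n):
            w = a ** dist[i]  # close to 0 across an edge, so nothing leaks over
            out[i] += w * (out[i - 1] - out[i])
        # Anti-causal (right-to-left) pass makes the response symmetric.
        for i in range(n - 2, -1, -1):
            w = a ** dist[i + 1]
            out[i] += w * (out[i + 1] - out[i])
    return out


if __name__ == "__main__":
    step = [0.0] * 8 + [1.0] * 8
    print(edge_aware_filter_1d(step))
```

Each sample is visited a constant number of times per pass, which is where the linear cost in the number of samples comes from. For an image, the same passes would be run along rows and then along columns.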
Transmission electron tomography (TET) is a widely used biomedical imaging technique. Iterative reconstruction with regularization is the common approach to obtain high-quality reconstructed images for TET. Mumford-Shah (MS) regularization has the advantage of preserving image edges, but it is computation- and memory-intensive because it is NP-hard when applied to TET. In this work, we design an FPGA accelerator for iterative image reconstruction for TET with MS regularization. We first design the accelerator with multiple processing elements (PEs) to leverage the parallelism in the TET reconstruction. We then schedule the forward projection and back projection according to the imaging geometry of TET to optimize off-chip memory access. Finally, we optimize the local buffer with customized partitioning for the stencil memory access of basic operations to increase the throughput of a single PE. Our FPGA accelerator achieves 6.68X and 1.87X speedups with the same image quality compared with the same algorithm implemented on CPU and GPU, respectively. In addition, the off-chip bandwidth requirements of forward and back projection are reduced by one to two orders of magnitude. Compared with state-of-the-art acceleration works on GPU and FPGA, we achieve 1.18X and 1.91X throughput, respectively.
Many volumetric rendering algorithms use spatial 3D grids as the underlying data structure. Efficient representation, construction, and traversal of these grids are essential to achieve real-time performance, particularly for time-varying data such as in fluid simulations. In this paper, we present improvements on algorithms for building and traversing Bounding Volume Hierarchies (BVH) designed for sparse volumes. Our main insight was to simplify the data layout by grouping voxels in buckets, preserving their spatial locality using Morton codes, instead of using bricks, as current solutions do. Our solution uses neither pointers nor stacks, which allows it to run directly in compute shaders, and provides, on average, a 9.3x improvement in construction speed compared with state-of-the-art approaches for Linear Bounding Volume Hierarchies (LBVH).
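As a brief illustration of how Morton codes preserve spatial locality, the sketch below interleaves the bits of 3D voxel coordinates and then groups the sorted voxels into fixed-size buckets. The helper names (`part1by2`, `morton3`, `bucket_voxels`), the 10-bit coordinate limit, and the fixed bucket size are assumptions made for this example; the abstract does not specify how the authors define their buckets.

```python
def part1by2(n: int) -> int:
    """Spread the low 10 bits of n so two zero bits separate each original bit."""
    n &= 0x000003FF
    n = (n ^ (n << 16)) & 0xFF0000FF
    n = (n ^ (n << 8)) & 0x0300F00F
    n = (n ^ (n << 4)) & 0x030C30C3
    n = (n ^ (n << 2)) & 0x09249249
    return n


def morton3(x: int, y: int, z: int) -> int:
    """30-bit Morton code: bits are interleaved as ... z1 y1 x1 z0 y0 x0."""
    return part1by2(x) | (part1by2(y) << 1) | (part1by2(z) << 2)


def bucket_voxels(voxels, bucket_size=2):
    """Sort voxels along the Morton curve and cut the order into buckets.

    Hypothetical grouping rule: consecutive runs of `bucket_size` voxels.
    """
    ordered = sorted(voxels, key=lambda v: morton3(*v))
    return [ordered[i:i + bucket_size] for i in range(0, len(ordered), bucket_size)]


if __name__ == "__main__":
    voxels = [(1, 1, 1), (0, 0, 0), (2, 0, 0), (0, 1, 0)]
    for bucket in bucket_voxels(voxels):
        print(bucket)
```

Because the sorted order is a flat array, a hierarchy built on top of it can be addressed by index ranges rather than pointers, which is the property that LBVH-style builders generally rely on.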
This work presents an online pipeline for incremental 3D reconstruction and 6-DoF camera pose estimation based on colored point clouds captured by consumer RGB-D cameras. The proposed approach combines geometric matching provided by the point cloud with photometric matching provided by the color sensor through an adaptive weighting scheme that avoids possible misalignment errors between RGB and depth data. Our experimental results indicate that the 3D reconstructions achieved by the proposed scheme are visually better than or similar to those of a competitive approach.
In this paper, a low-cost single-camera, single-projector system that could be built into a desktop nail printer is described. The system is used to capture an image of a fingernail and to generate a 3D h...
Convolutional neural networks (CNNs) have been used in several computer vision applications. However, most successful models are pre-trained on large labeled datasets. Adapting such models to new applications (or datasets) with no label information might be an issue, calling for the construction of a suitable model from scratch. In this paper, we introduce an interactive method to estimate CNN filters from image markers with no need for backpropagation or pre-trained models. The method, named FLIM (feature learning from image markers), exploits the user's knowledge about image regions that discriminate objects for marker selection. For a given CNN architecture and user-drawn markers in an input image, FLIM can estimate the CNN filters by clustering marker pixels in a layer-by-layer fashion, i.e., the filters of the current layer are estimated from the output of the previous one. We demonstrate the advantages of FLIM for object delineation over alternatives based on a state-of-the-art pre-trained model and the Lab color space. The results indicate the potential of the method towards the construction of explainable CNN models.
Human action recognition has become one of the most active fields of research in computer vision due to its wide range of applications, such as surveillance, medical and industrial environments, and smart homes, among others. Recently, deep learning has been successfully used to learn powerful and interpretable features for recognizing human actions in videos. Most existing deep learning approaches have been designed to process video information as RGB image sequences. For this reason, a preliminary decoding process is required, since video data are often stored in a compressed format. However, decoding a video demands a high computational load and memory usage. To overcome this problem, we propose a deep neural network capable of learning directly from compressed video. Our approach was evaluated on two public benchmarks, the UCF-101 and HMDB-51 datasets, demonstrating recognition performance comparable to state-of-the-art methods, with the advantage of running up to 2 times faster in terms of inference speed.
Industrial pipeline faults, such as blockages, can create major problems for engineers and financial losses for the company. Blockage detection is necessary for the smooth functioning of an industry and the safety of the environment. This work presents a model for non-invasive inspection of pipes. It proposes the use of a neural network to identify the obstruction stage in the fertilizer industry, using external thermal images obtained from the pipelines. A dataset capable of mapping the external thermal behavior to the profile of the internal deposit is developed. The Multilayer Perceptron neural network was able to learn the mapping from thermal pixels to a deposit profile, obtaining satisfactory results.
Automatic violence detection in video surveillance is crucial for social and personal security. Due to the massive video data produced by surveillance cameras installed in different environments like airports, trains, stadiums, schools, etc., traditional video monitoring by human operators becomes inefficient. In this context, developing systems capable of automatically detecting violent actions is a challenging task. This study describes a method to detect and localize violent acts in video surveillance using dynamic images, CNNs, and weakly supervised localization methods. Experimental results demonstrate the effectiveness of our approach when applied to three public benchmark datasets: Hockey Fight [1], Violent Flows [2], and UCFCrime2Local [3].
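A dynamic image summarizes the motion of a short clip in a single frame by taking a weighted sum of its frames. The sketch below uses the simple linear weighting $\alpha_t = 2t - T - 1$, which is often associated with approximate rank pooling; this choice is an assumption, since the abstract does not say which construction the authors use. The function name `dynamic_image` and the representation of each frame as a flat list of pixel values are also hypothetical.

```python
def dynamic_image(frames):
    """Collapse T frames into one image via a weighted temporal sum.

    Assumed weighting: alpha_t = 2t - T - 1 for t = 1..T, so early frames
    get negative weights and late frames positive ones.
    Each frame is a flat list of pixel values of equal length.
    """
    T = len(frames)
    coeffs = [2 * t - T - 1 for t in range(1, T + 1)]
    n_pixels = len(frames[0])
    return [
        sum(c * frame[i] for c, frame in zip(coeffs, frames))
        for i in range(n_pixels)
    ]


if __name__ == "__main__":
    # Pixel 0 is static; pixel 1 brightens over time.
    clip = [[5.0, 0.0], [5.0, 1.0], [5.0, 2.0]]
    print(dynamic_image(clip))
```

The weights sum to zero, so static pixels cancel out and only temporal change survives. The resulting single image can then be fed to an ordinary 2D CNN.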
The Brazilian National Department of Transport Infrastructure (DNIT) maintains the National Traffic Counting Plan (PNCT). The main goal of PNCT is to evaluate the current flow of traffic on federal highways in order to define public policies. However, DNIT still performs its quantitative classification surveys manually or with invasive equipment. For traffic studies, it is crucial to seek more modern solutions that enable a larger number of automated, non-invasive, and low-cost classification surveys. This paper proposes a system that uses YOLOv3 for object detection and Deep SORT for multiple-object tracking. On real-world videos collected on Brazilian roads, we obtained a precision above 90% in the global vehicle count. We also show that our proposal outperformed previously proposed tools, with 99.15% precision on public datasets. We believe this proposal allows the development of a traffic analysis tool for automating volumetric traffic surveys, improving DNIT's agility and generating savings for the public coffers.