ISBN (print): 9781665448994
Neural networks are used in many real-world applications, but they often struggle to estimate their own confidence. This is particularly problematic for computer vision applications that make high-stakes decisions affecting humans and their lives. In this paper we present a meta-analysis of the literature, showing that most if not all computer vision applications do not use proper epistemic uncertainty quantification, which means that these models ignore their own limitations. We describe the consequences of using models without proper uncertainty quantification, and motivate the community to adopt versions of their models that have properly calibrated epistemic uncertainty, in order to enable out-of-distribution detection. We close the paper with a summary of challenges in estimating uncertainty for computer vision applications, together with recommendations.
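As an illustration of what epistemic uncertainty quantification can look like in practice, the sketch below splits the predictive uncertainty of a small ensemble into a data-related part and a model-related part. The ensemble approach and the names `entropy` and `ensemble_uncertainty` are assumptions chosen for illustration; the abstract does not prescribe a specific method.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ensemble_uncertainty(member_probs):
    """Decompose predictive uncertainty for a single input.

    member_probs: one list of class probabilities per ensemble member.
    Returns (total, aleatoric, epistemic). The epistemic term is the
    entropy of the averaged prediction minus the average member entropy,
    so it grows when the members disagree with each other.
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean_probs = [
        sum(member[c] for member in member_probs) / n_members
        for c in range(n_classes)
    ]
    total = entropy(mean_probs)
    aleatoric = sum(entropy(member) for member in member_probs) / n_members
    return total, aleatoric, total - aleatoric


# Hypothetical predictions: members that agree vs. members that disagree.
agree = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
disagree = [[0.99, 0.01], [0.01, 0.99], [0.5, 0.5]]
```

An input on which the members disagree strongly gets a high epistemic score and could be flagged as possibly out of distribution, whereas a single softmax output offers no such signal.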
ISBN (print): 9781479943098
In this paper we present a novel event-based stereo vision matching approach based on time correlation, using segmentation to restrict the matching process to active image areas and exploiting the event-driven behavior of a silicon retina sensor. Stereo matching is used in depth-generating camera systems to solve the correspondence problem and reconstruct 3D data. With conventional frame-based cameras, this correspondence problem is a time-consuming and computationally expensive task. To overcome this issue, embedded systems can be used to speed up the calculation of stereo matching results. Instead of synchronous intensity or color images, the silicon retina delivers asynchronous events when the illumination changes. It provides sparse input data, and therefore the output of the stereo vision algorithm (the depth map) is also sparse. The high temporal resolution of such event-driven sensors leads to high data rates. To handle these data rates and the correspondence problem in real time, we implemented our stereo matching algorithm on a field programmable gate array (FPGA). The results show that our matching criterion, based on the time of occurrence of an event, leads to a small average distance error, and that the parallel hardware architecture and efficient memory utilization result in a frame rate of up to 1140 fps.
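The idea of matching events by their time of occurrence can be sketched in a few lines of software. This is a minimal illustration, not the authors' FPGA design: it assumes rectified sensors (so candidates lie on the same row), and the event layout `(t_us, x, y)`, the function name `match_events`, and both thresholds are hypothetical. The segmentation step that restricts matching to active areas is omitted.

```python
def match_events(left_events, right_events, max_disparity=40, max_dt_us=500):
    """Match each left event to the right event closest in time.

    Events are (t_us, x, y) tuples. Only right events on the same row,
    within the disparity range, and within max_dt_us microseconds are
    considered. Returns a sparse list of (x, y, disparity) tuples.
    """
    # Index right-sensor events by row so each lookup touches one row only.
    right_by_row = {}
    for t_r, x_r, y_r in right_events:
        right_by_row.setdefault(y_r, []).append((t_r, x_r))

    matches = []
    for t_l, x_l, y_l in left_events:
        best = None  # (time difference, disparity) of the best candidate
        for t_r, x_r in right_by_row.get(y_l, []):
            disparity = x_l - x_r
            dt = abs(t_l - t_r)
            if 0 <= disparity <= max_disparity and dt <= max_dt_us:
                if best is None or dt < best[0]:
                    best = (dt, disparity)
        if best is not None:
            matches.append((x_l, y_l, best[1]))
    return matches


# Hypothetical events: the second left event has no right event close in time.
left = [(1000, 30, 5), (2000, 50, 7)]
right = [(1010, 22, 5), (1400, 28, 5), (9000, 45, 7)]
sparse_disparities = match_events(left, right)
```

Because only pixels that fired produce output, the resulting disparity list is sparse, which matches the sparse depth map described above.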
ISBN (print): 9780769549903
Several papers have addressed ellipse detection as a first step for various computer vision applications, but most of the proposed solutions are too slow to be applied in real time on large images or with limited hardware resources, as in the case of mobile devices. This demo is based on a novel algorithm for fast and accurate ellipse detection. The proposed algorithm relies on a careful selection of arcs that are candidates to form ellipses and on the use of a Hough transform to estimate parameters in a decomposed space. The demo will show it working on a commercial smartphone.
ISBN (digital): 9781728125060
ISBN (print): 9781728125060
Event-based cameras, also known as neuromorphic cameras, are bioinspired sensors able to perceive changes in the scene at high frequency with low power consumption. Because they became available only very recently, only a limited amount of work addresses object detection on these devices. In this paper we propose two neural network architectures for object detection: YOLE, which integrates the events into surfaces and uses a frame-based model to process them, and fcYOLE, an asynchronous event-based fully convolutional network which uses a novel and general formalization of the convolutional and max pooling layers to exploit the sparsity of camera events. We evaluate the algorithm with different extensions of publicly available datasets, and on a novel synthetic dataset.
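To make "integrating events into surfaces" concrete, here is one plausible scheme: a leaky accumulator in which every pixel decays linearly over time and is bumped whenever an event arrives. The decay rule, the parameters `leak` and `increment`, and the function name `integrate_events` are assumptions for illustration; the abstract does not specify how YOLE builds its surfaces.

```python
def integrate_events(events, width, height, leak=0.001, increment=1.0):
    """Accumulate (t, x, y) events, sorted by t, into a 2D surface.

    Between consecutive events every pixel loses leak * elapsed_time
    (clamped at zero); the pixel that fired then gains `increment`.
    The returned surface is a list of rows and can be handed to any
    frame-based model.
    """
    surface = [[0.0] * width for _ in range(height)]
    last_t = None
    for t, x, y in events:
        if last_t is not None:
            decay = leak * (t - last_t)
            for row in surface:
                for i in range(width):
                    row[i] = max(0.0, row[i] - decay)
        surface[y][x] += increment
        last_t = t
    return surface


# Hypothetical events on a 3x2 sensor: two at pixel (1, 1), one at (0, 0).
events = [(0, 1, 1), (100, 1, 1), (200, 0, 0)]
surface = integrate_events(events, width=3, height=2)
```

This dense version touches every pixel on every event, which is exactly the cost an asynchronous formulation such as fcYOLE aims to avoid by exploiting sparsity.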
ISBN (print): 9780769549903
Lane feature extraction is one of the key computational steps in lane analysis systems. In this paper, we propose a lane feature extraction method that enables different configurations of embedded solutions, addressing both accuracy and the constraints of embedded systems. The proposed lane feature extraction process is evaluated in detail using real-world lane data, to explore its effectiveness for embedded realization and its adaptability to varying contextual information such as lane types and environmental conditions.
ISBN (print): 9781665448994
We present an approach to supervised action recognition in the dark and report our results on the ARID dataset [60]. Most previous works only evaluate performance on large, well-illuminated datasets like Kinetics and HMDB51. We demonstrate that our work is able to achieve a very low error rate while being trained on a much smaller dataset of dark videos. We also explore a variety of training and inference strategies, including domain transfer methodologies, and propose a simple but useful frame selection strategy. Our empirical results demonstrate that we beat previously published baseline models by 11%.
ISBN (digital): 9781538661000
ISBN (print): 9781538661000
Building footprints (BFP) provide useful visual context for users of digital maps when navigating in space. This paper proposes a method for extracting and symbolizing building footprints from satellite imagery using a convolutional neural network (CNN). The CNN architecture outputs rotated rectangles, providing a symbolized approximation that works well for small buildings. Experiments are conducted on the four cities in the DeepGlobe Challenge dataset (Las Vegas, Paris, Shanghai, Khartoum). Our method performs best on suburbs consisting of individual houses. These experiments show that either large buildings or buildings without clear delineation produce weaker results in terms of precision and recall.
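A rotated-rectangle output can be turned into a drawable footprint by computing its four corners. The sketch below assumes the rectangle is parameterized as center, width, height, and a rotation angle in degrees; that parameterization and the name `rotated_rect_corners` are assumptions, since the abstract does not state how the CNN encodes its output.

```python
import math


def rotated_rect_corners(cx, cy, width, height, angle_deg):
    """Return the four corners of a rectangle rotated about its center.

    The rotation uses the standard 2D rotation matrix with the angle
    measured from the x-axis toward the y-axis. Corners are listed in
    order around the rectangle, starting from the (-w/2, -h/2) corner.
    """
    angle = math.radians(angle_deg)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    half_w, half_h = width / 2.0, height / 2.0
    offsets = [(-half_w, -half_h), (half_w, -half_h),
               (half_w, half_h), (-half_w, half_h)]
    return [
        (cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
        for dx, dy in offsets
    ]


# Hypothetical footprint: a 4 x 2 building centered at (10, 20).
axis_aligned = rotated_rect_corners(10.0, 20.0, 4.0, 2.0, 0.0)
quarter_turn = rotated_rect_corners(10.0, 20.0, 4.0, 2.0, 90.0)
```

Five numbers per building give a compact symbol for a map, but they cannot represent L-shaped or otherwise complex outlines, which is consistent with the weaker results reported for large buildings.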
ISBN (print): 9781728193601
We describe an efficient method of improving the performance of vision algorithms operating on video streams by reducing the amount of data captured and transferred from image sensors to analysis servers in a data-aware manner. The key concept is to combine guided, highly heterogeneous sampling with an intelligent Scene Cache. This enables the system to adapt to spatial and temporal patterns in the scene, thus reducing redundant data capture and processing. A software prototype of our framework running on a general-purpose embedded processor enables superior object detection accuracy (by 56%) at similar energy consumption (slight improvement of 4%) compared to an H.264 hardware accelerator.
ISBN (print): 9781665448994
Motion segmentation is a technique to detect and localize class-agnostic motion in videos. This motion is assumed to be relative to a stationary background and usually originates from objects such as vehicles or humans. When the camera also moves, frame differencing approaches that do not have to model the stationary background over minutes, hours, or even days are more promising than background subtraction methods. In this paper, we propose a Deep Convolutional Neural Network (DCNN) for multi-modal motion segmentation: the current image contributes appearance information to distinguish between relevant and irrelevant motion, and frame differencing captures the temporal information, which is the scene's motion independent of the camera motion. We fuse this information to obtain an effective and efficient approach for robust motion segmentation. The effectiveness is demonstrated using the multi-spectral CDNet-2014 dataset, which we re-labeled for motion segmentation. We specifically show that we can detect tiny moving objects significantly better than methods based on optical flow.
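Classical frame differencing, the temporal cue mentioned above, reduces to a thresholded absolute difference between two consecutive frames. The sketch below shows only that cue on grayscale frames stored as lists of rows; it is not the proposed DCNN, it does not compensate for camera motion, and the name `frame_difference_mask` and the threshold value are hypothetical.

```python
def frame_difference_mask(prev_frame, curr_frame, threshold=25):
    """Mark pixels whose intensity changed by more than `threshold`.

    Both frames are equally sized lists of rows of integer intensities.
    Returns a binary mask (1 = changed, 0 = unchanged) of the same shape.
    """
    return [
        [1 if abs(curr - prev) > threshold else 0
         for prev, curr in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


# Hypothetical 2x3 frames: one pixel brightens strongly, one only slightly.
previous = [[10, 10, 10],
            [10, 10, 10]]
current = [[10, 90, 10],
           [10, 10, 20]]
mask = frame_difference_mask(previous, current)
```

Such a mask needs no long-term background model, but it reacts to any intensity change; this is why the abstract pairs it with appearance information from the current image.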
ISBN (print): 9781665448994
This paper introduces a novel dataset for video enhancement and studies the state-of-the-art methods of the NTIRE 2021 challenge on quality enhancement of compressed video. The challenge is the first NTIRE challenge in this direction, with three competitions, hundreds of participants and tens of proposed solutions. Our newly collected Large-scale Diverse Video (LDV) dataset is employed in the challenge. In our study, we analyze the solutions of the challenge and several representative methods from previous literature on the proposed LDV dataset. We find that the NTIRE 2021 challenge advances the state of the art of quality enhancement on compressed video.