We present a novel approach for accelerating convolutions during inference for CPU-based architectures. The most common method of computation involves packing the image into the columns of a matrix (im2col) and perfor...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
We present a novel approach for accelerating convolutions during inference for CPU-based architectures. The most common method of computation involves packing the image into the columns of a matrix (im2col) and performing general matrix multiplication (GEMM) with a matrix of weights. This results in two main drawbacks: (a) im2col requires a large memory buffer and can experience inefficient memory access, and (b) while GEMM is highly optimized for scientific matrices multiplications, it is not well suited for convolutions. We propose an approach that takes advantage of scalar-matrix multiplication and reduces memory overhead. Our experiments with commonly used network architectures demonstrate a significant speedup compared to existing indirect methods.
Material recognition is researched in both computervision and vision science fields. In this paper, we investigated how humans observe material images and found the eye fixation information improves the performance o...
详细信息
ISBN:
(数字)9781538661000
ISBN:
(纸本)9781538661000
Material recognition is researched in both computervision and vision science fields. In this paper, we investigated how humans observe material images and found the eye fixation information improves the performance of material image classification models. We first collected eye-tracking data from human observers and used it to fine-tune a generative adversarial network for saliency prediction (SalGAN). We then fused the predicted saliency map with material images and fed them to CNN models for material classification. The experiment results show that the classification accuracy is improved than those using original images. This indicates that human's visual cues could benefit computational models as priors.
Live demonstration setup. (Left) The setup consists of a DAVIS346B event camera connected to a standard consumer laptop and undergoes some motion. (Right) The motion estimates are plotted in red and, for rotation-like...
详细信息
ISBN:
(纸本)9781665448994
Live demonstration setup. (Left) The setup consists of a DAVIS346B event camera connected to a standard consumer laptop and undergoes some motion. (Right) The motion estimates are plotted in red and, for rotation-like motions, the angular velocities provided by the camera IMU are also plotted in blue. This plot exemplifies an event camera undergoing large rotational motions (up to ~ 1000 deg/s) around the (a) x-axis, (b) y-axis and (c) z-axis. Overall, the incremental motion estimation method follows the IMU measurements. Optionally, the resultant global optical flow can also be shown, as well as the corresponding generated events by accumulating them onto the image plane (bottom left corner).
We introduce the first benchmark for a new problem - recognizing human action adverbs (HAA): "Adverbs Describing Human Actions" (ADHA). We demonstrate some key features of ADHA: a semantically complete set o...
详细信息
ISBN:
(数字)9781538661000
ISBN:
(纸本)9781538661000
We introduce the first benchmark for a new problem - recognizing human action adverbs (HAA): "Adverbs Describing Human Actions" (ADHA). We demonstrate some key features of ADHA: a semantically complete set of adverbs describing human actions, a set of common, describable human actions, and an exhaustive labelling of simultaneously emerging actions in each video. We commit an in-depth analysis on the implementation of current effective models in action recognition and image captioning on adverb recognition, and the results reveal that such methods are unsatisfactory. Furthermore, we propose a novel three-stream hybrid model to tackle the HAA problem, which achieves better performances and receives relatively promising results.
A typical digital signal processor (DSP) uses hierarchical memory to handle the trade-off between cost and speed. It has a fast on-chip memory with data-access rates similar to the DSP's processing rate but it is ...
详细信息
ISBN:
(纸本)9780769549903
A typical digital signal processor (DSP) uses hierarchical memory to handle the trade-off between cost and speed. It has a fast on-chip memory with data-access rates similar to the DSP's processing rate but it is not large enough to hold the entire Image data. Image buffers typically reside in the larger external memory like DDR whose data access rate is similar to 4-6X slower than the processor rate. Cache or direct memory access (DMA) mechanisms are used to improve the slow access rate of external memory using the internal memory. Optimizing an embedded processing application to be efficient for such hierarchical memory systems requires block-based algorithm design. This is usually accomplished by manually re-designing the code. This effort requires several man months and DSP expertise. In this paper, we automate this process and demonstrate a performance improvement of similar to 2-4X over conventional frame level processing. We believe that the proposed solution is novel in the sense that it is fully automated and scalable to any memory size and speed. We use a compiler assisted parser to extract the relevant function parameters and use them to re-target the code to be block-based and handle memory management automatically. This is an offline code generation process with self-verification. We have implemented and tested the parser for Texas Instruments (TI) C6000 DSPs but the method is generic to work with any processor core.
We propose a simple yet effective proposal-free architecture for lidar panoptic segmentation. We jointly optimize both semantic segmentation and class-agnostic instance classification in a single network using a pilla...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
We propose a simple yet effective proposal-free architecture for lidar panoptic segmentation. We jointly optimize both semantic segmentation and class-agnostic instance classification in a single network using a pilla-rbased bird's-eye view representation. The instance classification head learns pairwise affinity between pillars to determine whether the pillars belong to the same instance or not. We further propose a local clustering algorithm to propagate instance ids by merging semantic segmentation and affinity predictions. Our experiments on nuScenes dataset show that our approach outperforms previous proposal-free methods and is comparable to proposal-based methods which requires extra annotation from object detection.
In this paper we present a unified formulation for a large class of relative pose problems with radial distortion and varying calibration. For minimal cases, we show that one can eliminate the number of parameters dow...
详细信息
ISBN:
(纸本)9781665448994
In this paper we present a unified formulation for a large class of relative pose problems with radial distortion and varying calibration. For minimal cases, we show that one can eliminate the number of parameters down to one to three. The relative pose can then be expressed using varying calibration constraints on the fundamental matrix, with entries that are polynomial in the parameters. We can then apply standard techniques based on the action matrix and Sturm sequences to construct our solvers. This enables efficient solvers for a large class of relative pose problems with radial distortion, using a common framework. We evaluate a number of these solvers for robust two-view inlier and epipolar geometry estimation, used as minimal solvers in RANSAC.
Previous research on localizing a target region in an image referred to by a natural language expression has occurred within an object-centric paradigm. However, in practice, there may not be any easily named or ident...
详细信息
ISBN:
(纸本)9781728193601
Previous research on localizing a target region in an image referred to by a natural language expression has occurred within an object-centric paradigm. However, in practice, there may not be any easily named or identifiable objects near a target location. Instead, references may need to rely on basic visual attributes, such as color or geometric clues. An expression like "a red something beside a blue vertical line" could still pinpoint a target location. As such, we begin to explore the open challenge of computational object-agnostic reference by constructing a novel dataset and by devising a new set of algorithms that can identify a target region in an image when given a referring expression containing only basic conceptual features.
This paper describes the third Affective Behavior Analysis in-the-wild (ABAW) Competition, held in conjunction with ieee International conference on computervision and patternrecognition (CVPR), 2022. The 3rd ABAW C...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
This paper describes the third Affective Behavior Analysis in-the-wild (ABAW) Competition, held in conjunction with ieee International conference on computervision and patternrecognition (CVPR), 2022. The 3rd ABAW Competition is a continuation of the Competitions held at ICCV 2021, ieee FG 2020 and ieee CVPR 2017 conferences, and aims at automatically analyzing affect. This year the Competition encompasses four Challenges: i) uni-task Valence-Arousal Estimation, ii) uni-task Expression Classification, iii) uni-task Action Unit Detection, and iv) MultiTask-Learning. All the Challenges are based on a common benchmark database, Aff-Wild2, which is a large scale in-the-wild database and the first one to be annotated in terms of valence-arousal, expressions and action units. In this paper, we present the four Challenges, with the utilized Competition corpora, we outline the evaluation metrics and present both the baseline systems and the top performing teams' per Challenge. Finally we illustrate the obtained results of the baseline systems and of all participating teams.
In this paper we present an extensive evaluation of instance segmentation in the context of images containing clothes. We propose a multi level evaluation that completes the classical overlapping criteria given by IoU...
详细信息
ISBN:
(纸本)9781665448994
In this paper we present an extensive evaluation of instance segmentation in the context of images containing clothes. We propose a multi level evaluation that completes the classical overlapping criteria given by IoU. In particular, we quantify both the contour and color content accuracy of the the predicted segmentation masks. We demonstrate that the proposed evaluation framework is relevant to obtain meaningful insights on models performance through experiments conducted on five state of the art instance segmentation methods.
暂无评论