ISBN: (digital) 9781665487399
ISBN: (print) 9781665487399
We present a novel approach for accelerating convolutions during inference on CPU-based architectures. The most common method of computation involves packing the image into the columns of a matrix (im2col) and performing general matrix multiplication (GEMM) with a matrix of weights. This has two main drawbacks: (a) im2col requires a large memory buffer and can suffer from inefficient memory access, and (b) while GEMM is highly optimized for scientific matrix multiplication, it is not well suited to convolutions. We propose an approach that takes advantage of scalar-matrix multiplication and reduces memory overhead. Our experiments with commonly used network architectures demonstrate a significant speedup compared to existing indirect methods.
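To make the two computation styles concrete, here is a minimal NumPy sketch, assuming a single image in (C, H, W) layout, stride 1, and no padding. `im2col_conv2d` is the im2col + GEMM baseline described above; `scalar_matrix_conv2d` is only one illustrative reading of a scalar-matrix formulation (each weight scales a shifted view of the input and is accumulated into the output) and is not the authors' algorithm, which the abstract does not detail. Both function names are hypothetical.

```python
import numpy as np

def im2col_conv2d(x, w):
    """Baseline: valid, stride-1 convolution (cross-correlation, as in most
    DL frameworks) via im2col + GEMM.
    x: (C, H, W) input; w: (K, C, R, S) weights -> (K, H-R+1, W-S+1) output."""
    C, H, W = x.shape
    K, _, R, S = w.shape
    Ho, Wo = H - R + 1, W - S + 1
    # im2col buffer: C*R*S rows, one column per output pixel.
    # This (C*R*S) x (Ho*Wo) copy is the memory overhead of the method.
    cols = np.empty((C * R * S, Ho * Wo), dtype=x.dtype)
    row = 0
    for c in range(C):
        for r in range(R):
            for s in range(S):
                cols[row] = x[c, r:r + Ho, s:s + Wo].reshape(-1)
                row += 1
    # GEMM: (K x CRS) @ (CRS x HoWo); weight rows are flattened in (c, r, s) order
    out = w.reshape(K, C * R * S) @ cols
    return out.reshape(K, Ho, Wo)

def scalar_matrix_conv2d(x, w):
    """Illustrative alternative: accumulate scalar * shifted input view.
    No im2col buffer is materialized (NumPy still allocates a small
    temporary of size Ho x Wo per multiply-accumulate)."""
    C, H, W = x.shape
    K, _, R, S = w.shape
    Ho, Wo = H - R + 1, W - S + 1
    out = np.zeros((K, Ho, Wo), dtype=np.result_type(x, w))
    for k in range(K):
        for c in range(C):
            for r in range(R):
                for s in range(S):
                    # one scalar weight times one shifted (Ho x Wo) slice of channel c
                    out[k] += w[k, c, r, s] * x[c, r:r + Ho, s:s + Wo]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))     # 3 channels, 8x8 image
w = rng.standard_normal((4, 3, 3, 3))  # 4 filters of size 3x3
print(np.allclose(im2col_conv2d(x, w), scalar_matrix_conv2d(x, w)))  # True
```

Both functions compute the same output; the first builds a (C·R·S) × (H_out·W_out) buffer before the GEMM, while the second never does. Pure-Python loops like these say nothing about speed; a real CPU kernel would vectorize and block the accumulation.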
ISBN: (print) 0769523722
In this paper, we present a method for computing the localization of a mobile robot with reference to a learning video sequence. The robot is first guided along a path by a human, while the camera records a monocular learning sequence. A 3D reconstruction of the path and the environment is then computed offline from the learning sequence. This 3D reconstruction is used to compute the pose of the robot in real time (30 Hz) during autonomous navigation. Results from our localization method are compared to ground truth measured with a differential GPS.
ISBN: (print) 9798350365474
Despite their remarkable performance, the explainability of vision Transformers (ViTs) remains a challenge. While forward attention-based token attribution techniques have become popular in text processing, their suitability for ViTs has not been extensively explored. In this paper, we compare these methods against state-of-the-art input attribution methods from the vision literature, revealing their limitations due to improper aggregation of information across layers. To address this, we introduce two general techniques, PLUS and SkipPLUS, that can be composed with any input attribution method to more effectively aggregate information across layers while handling noisy layers. Through comprehensive and quantitative evaluations of faithfulness and human interpretability on a variety of ViT architectures and datasets, we demonstrate the effectiveness of PLUS and SkipPLUS, establishing a new state of the art in white-box token attribution. We conclude with a comparative analysis highlighting the strengths and weaknesses of the best versions of all the studied methods. The code used in this paper is freely available at https://***/NightMachinery/SkipPLUS-cvpr-2024.
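The abstract does not spell out PLUS or SkipPLUS, so the sketch below shows only one well-known forward attention-based token attribution technique from the text-processing literature: attention rollout, which multiplies per-layer attention matrices (each mixed with the identity to account for residual connections) from the first layer to the last. It assumes head-averaged, row-stochastic attention matrices and a residual mixing weight of 0.5; whether the paper evaluates this exact variant is an assumption, and the function name is hypothetical.

```python
import numpy as np

def attention_rollout(attentions, residual_weight=0.5):
    """attentions: list of (T, T) row-stochastic attention matrices, one per
    layer, already averaged over heads, ordered from first to last layer.
    Returns a (T, T) matrix whose row i attributes token i at the output
    to the input tokens."""
    T = attentions[0].shape[0]
    rollout = np.eye(T)
    for A in attentions:
        # mix in the identity for the residual connection, then re-normalize rows
        A_hat = residual_weight * A + (1.0 - residual_weight) * np.eye(T)
        A_hat = A_hat / A_hat.sum(axis=-1, keepdims=True)
        # compose with the layers below: later layers multiply on the left
        rollout = A_hat @ rollout
    return rollout
```

For a ViT whose [CLS] token sits at index 0, `rollout[0, 1:]` can be reshaped to the patch grid to give a per-patch relevance map. The product treats every layer identically, which illustrates how a single noisy layer can distort the aggregated attribution.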
ISBN: (print) 9781665448994
In this paper, we propose an online movement-specific vehicle counting system to realize robust traffic flow analysis at crowded intersections. Our proposed framework adopts PP-YOLO as the vehicle detector and adapts the Deep-Sort algorithm to perform multi-object tracking. To realize online and robust vehicle counting, we further adopt a shape-based movement assignment strategy to differentiate movements, together with carefully designed spatial constraints that effectively reduce false-positive counts. Our proposed framework achieves an overall S1-score of 0.9467, ranking first in the AICITY2021-track1 challenge.
ISBN: (print) 9781665448994
We present a new state of the art on the text-to-video retrieval task on the MSRVTT and LSMDC benchmarks, where our model outperforms all previous solutions by a large margin. Moreover, these state-of-the-art results are achieved using a single model and without fine-tuning. This multidomain generalisation is achieved by a proper combination of different video caption datasets. We show that our practical approach to training on different datasets can improve the test results on each of them. Additionally, we check for overlap between many popular datasets and show that both MSRVTT and ActivityNet contain a significant overlap between their test and training parts. More details are available at https://***/papermsucode/mdmmt.
ISBN: (print) 9781665448994
As the demand for deep learning solutions increases, the need for explainability becomes even more fundamental. In this setting, particular attention has been given to visualization techniques, which try to attribute the right relevance to each input pixel with respect to the output of the network. In this paper, we focus on Class Activation Mapping (CAM) approaches, which provide an effective visualization by taking weighted averages of the activation maps. To enhance the evaluation and the reproducibility of such approaches, we propose a novel set of metrics to quantify explanation maps, which prove more effective and simplify comparisons between approaches. To evaluate the appropriateness of the proposal, we compare different CAM-based visualization methods on the entire ImageNet validation set, fostering proper comparisons and reproducibility.
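As a rough illustration of "weighted averages of the activation maps", the sketch below combines K feature maps with per-channel class weights, keeps the positive part, and rescales the result to [0, 1]. The weights would come from the classifier's fully connected layer in the original CAM or from spatially averaged gradients in Grad-CAM; the ReLU and max-normalization follow common Grad-CAM practice and are assumptions here, as is the function name. Using a weighted sum rather than a weighted mean only changes a constant factor, which the normalization removes.

```python
import numpy as np

def class_activation_map(activations, weights):
    """activations: (K, h, w) feature maps from the last conv layer.
    weights: (K,) per-channel importance for the target class.
    Returns an (h, w) map scaled to [0, 1]."""
    cam = np.tensordot(weights, activations, axes=1)  # sum_k w_k * A_k -> (h, w)
    cam = np.maximum(cam, 0.0)                        # keep positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam            # all-zero map stays zero
```

In practice the (h, w) map is upsampled to the input resolution before being overlaid on the image or scored with an evaluation metric.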
ISBN: (print) 9781467367592
In this paper, we study the problem of recovering the lighting from a single image of an object whose surface is covered with random specular microfacets. We show that such reflectors can be interpreted as a randomized mapping from the lighting to the image. These specular objects have optical properties very different from both diffuse surfaces and smooth specular objects such as metals, so we design a special imaging system to photograph them robustly and effectively. We present simple yet reliable algorithms to calibrate the proposed system and to perform inference. We conduct experiments to verify the correctness of our model assumptions and to demonstrate the effectiveness of our pipeline.
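Because light transport is linear in the illumination, a fixed camera and object give image = T · lighting for some transport matrix T; with random specular microfacets, each pixel catches a glint from only a few light directions, so T behaves like a random sparse matrix. The sketch below is one plausible reading of the "randomized mapping", not the authors' pipeline: it assumes a discretized set of light directions, calibrates T by switching on one direction at a time, and recovers the lighting by least squares. The `capture` interface and both function names are hypothetical.

```python
import numpy as np

def calibrate_transport(capture, n_lights):
    """Build T column by column: lighting one direction at a time records how
    that light maps onto the pixels.
    capture(lighting) -> flattened image (hypothetical camera interface)."""
    columns = []
    for j in range(n_lights):
        basis = np.zeros(n_lights)
        basis[j] = 1.0
        columns.append(capture(basis))
    return np.stack(columns, axis=1)  # shape (n_pixels, n_lights)

def recover_lighting(T, image):
    """Invert image = T @ lighting in the least-squares sense."""
    lighting, *_ = np.linalg.lstsq(T, image, rcond=None)
    return lighting

# Simulated example: each pixel reflects light from a few random directions.
rng = np.random.default_rng(1)
n_pixels, n_lights = 400, 20
T_true = (rng.random((n_pixels, n_lights)) < 0.05) * rng.random((n_pixels, n_lights))
T = calibrate_transport(lambda L: T_true @ L, n_lights)
L_true = rng.random(n_lights)
print(np.allclose(recover_lighting(T, T_true @ L_true), L_true))
```

Plain least squares is the simplest choice; with real camera noise, a non-negativity constraint or regularization on the lighting would usually be added.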
ISBN: (print) 0769523722
Land-use classification is an important problem in remote sensing, with a wide range of applications. In this paper, we propose a hybrid method that fuses edge and region information for the land-use classification of multispectral images. It mainly consists of image pre-processing, initial segmentation, and region merging. In particular, a novel spatial mean shift procedure is proposed to extract information that is used in the subsequent steps. For multispectral image processing, we also design a band-weighting strategy that adaptively assigns a proper weight to each band according to the region being processed. Experimental results on Landsat TM and ETM+ images validate the performance of the proposed method.
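The abstract does not define its spatial mean shift procedure or its adaptive band-weighting rule. For orientation only, the sketch below shows the standard Gaussian-kernel mean shift iteration, which moves a point uphill on the kernel density of the feature vectors, using a per-band weighted distance. The feature layout (for example, pixel coordinates stacked with band values), the fixed weight vector, and the function name are all assumptions; the paper adapts its weights per region.

```python
import numpy as np

def mean_shift_mode(features, start, bandwidth, band_weights, max_iter=100, tol=1e-6):
    """Gaussian-kernel mean shift from `start` toward a density mode of
    `features` (N, D). band_weights (D,) scale each dimension in the
    squared distance: d2 = sum_d w_d * (f_d - x_d)^2 (hypothetical stand-in
    for an adaptive band-weighting strategy)."""
    x = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        diff = (features - x) * np.sqrt(band_weights)
        d2 = np.sum(diff ** 2, axis=1)
        k = np.exp(-0.5 * d2 / bandwidth ** 2)   # kernel weight of each sample
        new_x = k @ features / k.sum()           # kernel-weighted mean
        if np.linalg.norm(new_x - x) < tol:
            return new_x
        x = new_x
    return x
```

Running this from every pixel and grouping pixels that converge to the same mode gives a basic mean shift segmentation, which a region-merging step could then refine.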
ISBN: (print) 9780769549903
Translation symmetry is one of the most important pattern characteristics in natural and man-made environments. Detecting translation symmetry is a grand challenge in computer vision, with a wide spectrum of real-world applications ranging from industrial settings to design, arts, entertainment, and education. This paper describes the algorithm we submitted to the Symmetry Detection Competition 2013. We introduce two new concepts in our symmetric repetitive pattern detection algorithm. The first is a bottom-up detection-inference approach, which extends the versatility of current detection methods to higher-level segmentation. The second is a framework for a new theoretical analysis of invariant repetitive patterns. This is crucial for extracting symmetric and non-symmetric structure, but has received less coverage in previous literature on pattern detection and classification.
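The competition algorithm itself is not specified in the abstract. To show what detecting translation symmetry means at its simplest, the sketch below estimates the period of a repeating 1D signal from its normalized autocorrelation. This is a generic technique, not the authors' bottom-up detection-inference method, and a 2D repetitive pattern would need two lattice vectors rather than a single lag; the function name is hypothetical.

```python
import numpy as np

def dominant_period(signal, tol=1e-9):
    """Return (lag, score): the smallest lag whose normalized autocorrelation
    is within `tol` of the best score over lags 1..len(signal)//2."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    n = len(x)
    lags = np.arange(1, n // 2 + 1)
    scores = np.full(len(lags), -np.inf)
    for i, lag in enumerate(lags):
        a, b = x[:-lag], x[lag:]
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        if denom > 0:
            scores[i] = np.dot(a, b) / denom  # cosine similarity of shifted copies
    # multiples of the true period score just as well, so prefer the smallest lag
    best = np.flatnonzero(scores >= scores.max() - tol)[0]
    return int(lags[best]), float(scores[best])

print(dominant_period(np.tile([0, 1, 3, 1, 0, 2, 0], 6))[0])  # 7
```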
Author:
Rigoutsos, I. (IBM Corp., Thomas J. Watson Research Center, Computational Biology Center, Bioinformatics & Pattern Discovery, Yorktown Heights, NY 10598, USA)
ISBN: (print) 0818684976
We derive and discuss a set of parametric equations which, given a convex 3D feature domain K, generate affine invariants whose values are uniformly distributed in the region [0,1]×[0,1]×[0,1]. Once the shape of the feature domain K is determined and fixed, it is straightforward to compute the values of the parameters, so the proposed scheme can be tuned to a specific feature domain. The features of all recognizable objects (models) are assumed to be three-dimensional points uniformly distributed over K. The scheme leads to improved discrimination power and improved computational-load and storage-load balancing, and it can also be used to determine and identify biases in the database of recognizable models (over-represented constructs of object points). Straightforward extensions produce rigid-transformation and similarity-transformation invariants with the same good distribution properties, making this approach generally applicable.
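The parametric equations that make the invariants uniform on the unit cube are not given in the abstract. As background, the sketch below computes the raw affine invariants that a scheme of this kind starts from, assuming four non-coplanar 3D basis points: the affine coordinates (a, b, c) of a fifth point, which are unchanged by any invertible affine map x ↦ Mx + t because both the edge matrix and the offset are multiplied by M. In general these raw coordinates are not uniformly distributed, which is the problem the paper addresses; this sketch does not attempt that reshaping, and the function name is hypothetical.

```python
import numpy as np

def affine_invariants(basis, point):
    """Affine coordinates (a, b, c) of `point` w.r.t. four non-coplanar 3D
    basis points p0..p3, i.e. point = p0 + a(p1-p0) + b(p2-p0) + c(p3-p0)."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in basis)
    E = np.column_stack([p1 - p0, p2 - p0, p3 - p0])  # 3x3 edge matrix
    # under x -> Mx + t: E -> M E and (point - p0) -> M (point - p0), so M cancels
    return np.linalg.solve(E, np.asarray(point, dtype=float) - p0)
```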