ISBN: 9781665445092 (print)
We introduce PREDATOR, a model for pairwise point-cloud registration with deep attention to the overlap region. Unlike previous work, our model is specifically designed to also handle point-cloud pairs with low overlap. Its key novelty is an overlap-attention block for early information exchange between the latent encodings of the two point clouds. In this way, the subsequent decoding of the latent representations into per-point features is conditioned on the respective other point cloud, and can thus predict which points are not only salient but also lie in the overlap region between the two point clouds. The ability to focus on points that are relevant for matching greatly improves performance: PREDATOR raises the rate of successful registrations by more than 20% in the low-overlap scenario, and also sets a new state of the art for the 3DMatch benchmark with 89% registration recall.
ISBN: 9781665445092 (print)
Human activities can be learned from video. With effective modeling it is possible to discover not only the action labels but also the temporal structure of the activities, such as the progression of the sub-activities. Automatically recognizing such structure from the raw video signal is a new capability that promises authentic modeling and successful recognition of human-object interactions. Toward this goal, we introduce Asynchronous-Sparse Interaction Graph Networks (ASSIGN), a recurrent graph network that is able to automatically detect the structure of interaction events associated with entities in a video scene. ASSIGN pioneers learning of the autonomous behavior of video entities, including their dynamic structure and their interaction with coexisting neighbors. Entities' lives in our model are asynchronous to those of others, and therefore more flexible in adapting to complex scenarios. Their interactions are sparse in time, and hence more faithful to the true underlying nature and more robust in inference and learning. ASSIGN is tested on human-object interaction recognition and shows superior performance in segmenting and labeling human sub-activities and object affordances from raw videos. The native ability of ASSIGN to discover temporal structure also eliminates the dependence on external segmentation that was previously mandatory for this task.
ISBN: 9781665445092 (print)
Adversarial robustness corresponds to the susceptibility of deep neural networks to imperceptible perturbations made at test time. In the context of image tasks, many algorithms have been proposed to make neural networks robust to adversarial perturbations made to the input pixels. These perturbations are typically measured in an ℓ_p norm. However, robustness often holds only for the specific attack used for training. In this work we extend the above setting to consider the problem of training deep neural networks that can be made simultaneously robust to perturbations applied in multiple natural representation spaces. For the case of image data, examples include the standard pixel representation as well as the representation in the discrete cosine transform (DCT) basis. We design a theoretically sound algorithm with formal guarantees for the above problem. Furthermore, our guarantees also hold when the goal is to require robustness with respect to multiple ℓ_p norm based attacks. We then derive an efficient practical implementation and demonstrate the effectiveness of our approach on standard datasets for image classification.
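To make the idea of measuring one perturbation in two representation spaces concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm: the helper names `lp_norm` and `dct_ortho`, the 1-D signal, and the perturbation values are all assumptions chosen for illustration. It computes the ℓ_p norm of a small perturbation in the "pixel" basis and again after an orthonormal DCT-II.

```python
import math

def lp_norm(v, p):
    """l_p norm of a vector; p may be math.inf for the max norm."""
    if p == math.inf:
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def dct_ortho(v):
    """Orthonormal 1-D DCT-II (illustrative, O(n^2) implementation)."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

# Hypothetical perturbation: small alternating offsets on 8 "pixels".
delta = [0.01, -0.01, 0.01, -0.01, 0.01, -0.01, 0.01, -0.01]
delta_dct = dct_ortho(delta)

print("pixel  l2  :", lp_norm(delta, 2))
print("DCT    l2  :", lp_norm(delta_dct, 2))
print("pixel  linf:", lp_norm(delta, math.inf))
print("DCT    linf:", lp_norm(delta_dct, math.inf))
```

Because this DCT is orthonormal, the ℓ_2 norm is the same in both bases, while the ℓ_∞ norm is not. A perturbation budget stated in one space therefore does not translate directly into the same budget in the other, which is one way to see why robustness in a single representation need not carry over.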
In order to solve the problem of wind turbine blade recognition, deep learning recognition methods are generally used. However, such algorithms require a large number of training samples and have high computational co...
ISBN: 9781665483797 (print)
This paper presents an optical character recognition (OCR) system capable of interpreting captured images of hard disk drive and solid-state drive labels with high accuracy. Manual checking of the disk capacity and part number found on the labels is time-consuming, error-prone, and labor-intensive. Automating the inspection through optical character recognition, using image pre-processing and machine vision, contributes to an easier inspection process, better management of records, and faster cycle time. The images captured using a vision camera went through different stages of image pre-processing via OpenCV-Python and recognition through Google Tesseract. Different categorical variables, including exposure time and the location of text in a captured image, were used to determine and improve the overall recognition accuracy. By improving the lighting condition through the addition of light sources, the developed OCR system was able to achieve a character recognition accuracy of 99.375%.
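The abstract does not say how character recognition accuracy was computed. As a sketch under that caveat, one common convention is to count edit-distance errors between the ground-truth label text and the OCR output. The function names and the label strings below are hypothetical and are not taken from the paper.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def character_accuracy(truth, recognized):
    """Assumed metric: 1 - (edit errors / ground-truth length), floored at 0."""
    if not truth:
        return 1.0 if not recognized else 0.0
    return max(0.0, 1.0 - edit_distance(truth, recognized) / len(truth))

# Hypothetical label text and OCR output with one misread character.
truth = "ABC123 500GB"
recognized = "ABC1Z3 500GB"
print(f"{character_accuracy(truth, recognized):.4f}")
```

Under this convention, a single misread character in a 12-character label gives an accuracy of 11/12. Other definitions, such as position-wise matching, would give different numbers for outputs with inserted or dropped characters.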
ISBN: 9798350368208 (print)
The proceedings contain 304 papers. The topics discussed include: parameter matching and regenerative braking energy recovery strategy for pure electric commercial vehicles; an optimized design of an electric vehicle wireless charging robotic arm; research on automatic classification and recognition system of works of art based on computer vision; research on energy-saving control strategy of residents' smart home; modeling designed for regenerative braking energy recovery of pure electric commercial vehicles; building facade design and optimization based on computer technology and improved genetic algorithm; application research of wireless sensor network in structural health monitoring of civil engineering; and design and application of control strategy for the ‘single forced draft fan, double induced draft fan, and double air preheater’ configuration of air and flue gas system in thermal power units.
ISBN: 9798350365474 (digital)
ISBN: 9798350365481 (print)
Digital memes are widely used in people’s daily lives on social media platforms. Composed of images and descriptive texts, memes are often distributed with a flair of sarcasm or humor, yet can also spread harmful content or biases arising from social and cultural factors. Aside from mainstream tasks such as meme generation and classification, generating explanations for memes has become more vital, and poses the challenge of not propagating biases that are already embedded. Our work studied whether recent advanced Vision-Language Models (VL models) can fairly explain meme contents from different domains/topics, contributing to a unified benchmark for meme explanation. With the dataset, we semi-automatically and manually evaluate the quality of VL model-generated explanations, identifying the major categories of biases in meme explanations.
ISBN: 9781665445092 (print)
In this paper, we aim to recognize materials with combined use of auditory and visual perception. To this end, we construct a new dataset named GLAudio that consists of both the geometry of the object being struck and the sound captured from either modal sound synthesis (for virtual objects) or real measurements (for real objects). Besides global geometries, our dataset also takes local geometries around different hitpoints into consideration. This local information is less explored in existing datasets. We demonstrate that local geometry has a greater impact on the sound than the global geometry and offers more cues in material recognition. To extract features from different modalities and perform proper fusion, we propose a new deep neural network GLAVNet that comprises multiple branches and a well-designed fusion module. Once trained on GLAudio, our GLAVNet provides state-of-the-art performance on material identification and supports fine-grained material categorization.
ISBN: 9781665445092 (print)
Physical adversarial attacks against object detectors have seen increasing success in recent years. However, these attacks require direct access to the object of interest in order to apply a physical patch. Furthermore, to hide multiple objects, an adversarial patch must be applied to each object. In this paper, we propose a contactless translucent physical patch containing a carefully constructed pattern, which is placed on the camera's lens, to fool state-of-the-art object detectors. The primary goal of our patch is to hide all instances of a selected target class. In addition, the optimization method used to construct the patch aims to ensure that the detection of other (untargeted) classes remains unharmed. Therefore, in our experiments, which are conducted on state-of-the-art object detection models used in autonomous driving, we study the effect of the patch on the detection of both the selected target class and the other classes. We show that our patch was able to prevent the detection of 42.27% of all stop sign instances while maintaining high (nearly 80%) detection of the other classes.
ISBN: 9781665445092 (print)
Batch Normalization (BatchNorm) has become the default component in modern neural networks to stabilize training. In BatchNorm, centering and scaling operations, along with mean and variance statistics, are utilized for feature standardization over the batch dimension. The batch dependency of BatchNorm enables stable training and better representation of the network, but inevitably ignores the representation differences among instances. We propose to add a simple yet effective feature calibration scheme into the centering and scaling operations of BatchNorm, enhancing the instance-specific representations at negligible computational cost. The centering calibration strengthens informative features and reduces noisy features. The scaling calibration restricts the feature intensity to form a more stable feature distribution. Our proposed variant of BatchNorm, namely Representative BatchNorm, can be plugged into existing methods to boost the performance of various tasks such as classification, detection, and segmentation.
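For reference, the centering and scaling operations of standard BatchNorm described above can be sketched as follows for a single feature channel. This covers only the baseline standardization over the batch dimension, not the calibration scheme the paper proposes. The function name, the default `eps`, and the sample activations are assumptions made for illustration.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize one feature channel over the batch dimension.

    Centering subtracts the batch mean; scaling divides by the batch
    standard deviation. gamma and beta are the usual learnable affine
    parameters, fixed here for simplicity.
    """
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n  # biased (population) variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

# Hypothetical activations of one channel across a batch of four instances.
activations = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm(activations)
print(normalized)
```

Every instance is normalized with the same batch-level mean and variance. This shared use of statistics is the batch dependency that the abstract says ignores representation differences among instances.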