ISBN: 9781728132938 (print)
Non-local low-rank tensor approximation has been developed as a state-of-the-art method for hyperspectral image (HSI) denoising. Unfortunately, while the denoising performance of these methods benefits little from additional spectral bands, their running time increases significantly. In this paper, we claim that the HSI lies in a global low-rank spectral subspace, and that the spectral subspaces of all full-band patch groups should lie in this global low-rank subspace. This motivates us to propose a unified spatial-spectral paradigm for HSI denoising. As the new model is hard to optimize, an efficient algorithm motivated by alternating minimization is developed. This is done by first learning a low-dimensional orthogonal basis and the related reduced image from the noisy HSI. Then, non-local low-rank denoising and iterative regularization are developed to refine the reduced image and the orthogonal basis, respectively. Finally, experiments on both synthetic and real datasets demonstrate the superiority of the proposed method over state-of-the-art HSI denoising methods.
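The key step is projecting the B-band image onto a low-dimensional orthogonal spectral basis so that later processing works on a small reduced image instead of every band. The sketch below illustrates this step under simplifying assumptions. It uses a rank-1 basis found by power iteration on the band Gram matrix, which is not the paper's basis-learning or iterative-regularization procedure. The non-local low-rank denoiser that would act on the reduced image is also omitted. The names `dominant_basis` and `project` are hypothetical.

```python
import math


def dominant_basis(Y, iters=100):
    """Approximate the top eigenvector of Y @ Y^T by power iteration.

    Y is a list of B band rows, each holding N pixel values (a B x N matrix).
    The returned unit vector plays the role of a rank-1 orthogonal spectral basis.
    """
    B = len(Y)
    # Band-by-band Gram matrix C = Y Y^T (B x B)
    C = [[sum(a * b for a, b in zip(Y[i], Y[j])) for j in range(B)] for i in range(B)]
    e = [1.0 / math.sqrt(B)] * B  # unit-norm starting vector
    for _ in range(iters):
        v = [sum(C[i][j] * e[j] for j in range(B)) for i in range(B)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:  # all-zero input: keep the starting vector
            break
        e = [x / norm for x in v]
    return e


def project(Y, e):
    """Return the reduced image Z = e^T Y (length N) and its reconstruction e Z (B x N)."""
    B, N = len(Y), len(Y[0])
    Z = [sum(e[b] * Y[b][n] for b in range(B)) for n in range(N)]
    Y_hat = [[e[b] * z for z in Z] for b in range(B)]
    return Z, Y_hat
```

Any denoiser applied to `Z` processes k reduced images rather than B bands. This is the intuition for why a subspace approach keeps running time from growing with the band count.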
ISBN: 9781665448994 (print)
Existing computer vision research on artwork struggles with recognizing artworks' fine-grained attributes and with the lack of curated annotated datasets, which are costly to create. In this work, we use CLIP (Contrastive Language-Image Pre-Training) [12] to train a neural network on a variety of art image and text pairs, learning directly from raw descriptions of images or, if available, from curated labels. The model's zero-shot capability allows it to predict the most relevant natural language description for a given image without directly optimizing for the task. Our approach addresses two challenges: instance retrieval and fine-grained artwork attribute recognition. We use the iMet Dataset [20], which we consider the largest annotated artwork dataset. Our code and models will be available at https://***/KeremTurgutlu/clip_art
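CLIP makes zero-shot predictions by embedding the image and each candidate description in a shared space, then picking the description with the highest cosine similarity. The sketch below shows only that final selection step. The embedding vectors and description strings are made-up placeholders, not outputs of a trained CLIP encoder. The helper names `cosine` and `zero_shot_predict` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def zero_shot_predict(image_emb, text_embs):
    """Return the description whose embedding is most similar to the image embedding.

    text_embs maps each candidate description to its embedding vector.
    """
    return max(text_embs, key=lambda desc: cosine(image_emb, text_embs[desc]))


# Toy usage with placeholder 3-D embeddings (real CLIP embeddings are much larger).
candidates = {
    "a painting of a horse": [1.0, 0.0, 0.0],
    "a ceramic vase": [0.0, 1.0, 0.0],
    "a bronze sculpture": [0.0, 0.0, 1.0],
}
```

Because no classifier head is trained, adding a new attribute only requires writing a new description and embedding it.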
ISBN: 9781665448994 (print)
When creating a new labeled dataset, human analysts or data reductionists must review and annotate large numbers of images. This process is time-consuming and a barrier to the deployment of new computer vision solutions, particularly for rarely occurring objects. To reduce the number of images requiring human attention, we evaluate the utility of images created from 3D models and refined with a generative adversarial network for selecting confidence thresholds that significantly reduce false alarm rates. The resulting approach has been demonstrated to cut the number of images needing review by 50% while preserving a 95% recall rate, using only 6 labeled examples of the target.
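Selecting a threshold for a target recall can be sketched as follows. Rank the detector's confidence scores on known positives (here imagined as scores on generated target images). Take the highest score that still keeps the required fraction of positives. Then count how many unlabeled images fall at or above it. This is a generic illustration under that assumption, not the paper's exact procedure. All scores are invented, and the helper names are hypothetical.

```python
import math


def threshold_for_recall(positive_scores, target_recall=0.95):
    """Highest score threshold that still keeps at least target_recall of the positives.

    positive_scores must be non-empty; images scoring >= the threshold are kept.
    """
    scores = sorted(positive_scores, reverse=True)
    # Number of positives that must stay at or above the threshold; the small
    # epsilon guards against floating-point round-off in target_recall * n.
    k = max(1, math.ceil(target_recall * len(scores) - 1e-9))
    return scores[k - 1]


def images_to_review(pool_scores, threshold):
    """Images an analyst would still need to look at."""
    return [s for s in pool_scores if s >= threshold]
```

Raising the threshold as far as the recall constraint allows is what removes false alarms from the review queue. Everything scoring below it is never shown to an analyst.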
This paper describes an Active Character Recognition methodology, henceforth referred to as ACR. We present a method that uses an active heuristic function, similar to the one used by the A* search algorithm, that adaptively determines the length of the feature vector as well as the features themselves used to classify an input pattern. ACR adapts to factors such as the quality of the input pattern, its intrinsic similarities to and differences from the patterns of other classes it is being compared against, and the processing time available. Furthermore, finer resolution is accorded only to certain "zones" of the input pattern which are deemed important given the classes being discriminated. Experimental results support the methodology presented: the recognition rate of ACR is about 96% on the NIST data sets, and it runs faster than traditional classification methods.
This paper presents a novel approach for generating and analyzing epipolar plane images (EPIs) from video sequences taken from a moving platform subject to vibration, so that a 3D model of an arbitrary scene can be constructed. Our approach solves two problems: (1) how to generate EPIs from video under motion more general than pure translation; (2) how to analyze the huge amount of data in the EPIs robustly and efficiently. For the first problem, a 3D image stabilization method is proposed which decouples the vibration from the vehicle's motion so that good EPIs and panoramic view images (PVIs) can be generated. For the second problem, we propose an efficient panoramic EPI analysis (PEPIA) method in which only one scanline of each EPI is processed. PEPIA combines the advantages of PVIs and EPIs and consists of three important steps: locus orientation detection, motion boundary localization, and occlusion/resolution recovery. The output of PEPIA, a layered 3D panorama, is very useful for visual navigation and virtual reality modeling. Since camera calibration, image segmentation, and feature extraction and matching are avoided, all the proposed algorithms are fully automatic and rather general. Results on real image sequences are given.
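Locus orientation detection rests on a classic EPI relation. Assume a camera translating along the image x-axis at constant speed V, with focal length f in pixels. A point at depth Z then projects to x(t) = f (X - V t) / Z. Its locus in the EPI is therefore a straight line with slope dx/dt = -f V / Z, so Z = f V / |dx/dt|. The sketch below fits the slope by least squares and converts it to depth. It covers only this idealized, already-stabilized case, not PEPIA's motion-boundary or occlusion handling. The function names are hypothetical.

```python
def fit_slope(ts, xs):
    """Least-squares slope dx/dt of a locus traced in an EPI (needs >= 2 distinct times)."""
    n = len(ts)
    mt = sum(ts) / n
    mx = sum(xs) / n
    num = sum((t - mt) * (x - mx) for t, x in zip(ts, xs))
    den = sum((t - mt) ** 2 for t in ts)
    return num / den


def depth_from_slope(slope, focal_px, speed):
    """Depth Z = f * V / |dx/dt| for a camera translating at constant speed V."""
    return focal_px * speed / abs(slope)
```

Steeper loci correspond to nearer points. This is why orientation alone, measured along a single scanline, is enough to recover depth once the vibration has been removed.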
ISBN: 9781665445092 (print)
Computer vision is increasingly effective at segmenting objects in images and videos; however, scene effects related to the objects (shadows, reflections, generated smoke, etc.) are typically overlooked. Identifying such scene effects and associating them with the objects producing them is important for improving our fundamental understanding of visual scenes, and can also assist a variety of applications such as removing, duplicating, or enhancing objects in video. In this work, we take a step towards solving this novel problem of automatically associating objects with their effects in video. Given an ordinary video and a rough segmentation mask over time of one or more subjects of interest, we estimate an omnimatte for each subject: an alpha matte and color image that includes the subject along with all its related time-varying scene elements. Our model is trained only on the input video in a self-supervised manner, without any manual labels, and is generic: it produces omnimattes automatically for arbitrary objects and a variety of effects. We show results on real-world videos containing interactions between different types of subjects (cars, animals, people) and complex effects, ranging from semitransparent elements such as smoke and reflections to fully opaque effects such as objects attached to the subject.
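Since each omnimatte is an alpha matte plus a color image, editing effects such as removal or duplication come down to re-compositing layers. The per-pixel sketch below uses the standard Porter-Duff "over" operator, applied back to front onto an opaque background. It assumes non-premultiplied RGB in [0, 1] and a known layer order. The network that estimates the layers is not shown, and the function names are hypothetical.

```python
def over(fg_rgb, fg_alpha, bg_rgb):
    """Porter-Duff 'over' for one pixel with non-premultiplied colors."""
    return tuple(fg_alpha * f + (1.0 - fg_alpha) * b for f, b in zip(fg_rgb, bg_rgb))


def composite(background_rgb, layers):
    """Composite (rgb, alpha) layers back to front over an opaque background."""
    out = background_rgb
    for rgb, alpha in layers:
        out = over(rgb, alpha, out)
    return out
```

Dropping a subject's layer from `layers` removes the subject together with its shadow or reflection. Repeating a layer duplicates both.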
ISBN: 0769523722 (print)
This paper presents an axiomatic approach to corner detection. In the first part of the paper, we review five currently used corner detection methods for graylevel images (Harris-Stephens, Forstner, Shi-Tomasi, Rohr, and Kenney et al.). This is followed by a discussion of extending these corner detectors to images with different pixel dimensions, such as signals (pixel dimension one) and tomographic medical images (pixel dimension three), as well as different intensity dimensions, such as color or LADAR images (intensity dimension three). These extensions are motivated by analyzing a particular example of optical flow in pixel and intensity spaces of arbitrary dimension. Placing corner detection in a general setting enables us to state four axioms that any corner detector might reasonably be required to satisfy. Our main result is that only the Shi-Tomasi detector (and, equivalently, the Kenney et al. 2-norm detector) satisfies all four axioms.
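Two of the reviewed detectors can be compared directly on the 2x2 structure tensor M, formed from image gradients Ix and Iy in a window. Shi-Tomasi scores a pixel by the smaller eigenvalue of M. Harris-Stephens uses det(M) - k trace(M)^2. The sketch below computes both. It assumes a uniformly weighted window (no Gaussian weighting) and the commonly used k = 0.04, and it does not encode the paper's axioms. The function names are hypothetical.

```python
import math


def structure_tensor(ix, iy):
    """Sum gradient outer products over a window: returns (a, b, c) for [[a, b], [b, c]]."""
    a = sum(gx * gx for gx in ix)
    b = sum(gx * gy for gx, gy in zip(ix, iy))
    c = sum(gy * gy for gy in iy)
    return a, b, c


def shi_tomasi(a, b, c):
    """Smaller eigenvalue of the symmetric 2x2 tensor [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return mean - radius


def harris(a, b, c, k=0.04):
    """Harris-Stephens response det(M) - k * trace(M)^2."""
    return (a * c - b * b) - k * (a + c) ** 2
```

In a window containing only horizontal gradients (an edge), one eigenvalue is zero, so the Shi-Tomasi score is zero and the Harris response is negative. In a window with gradients in both directions, both scores become positive.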
ISBN: 0818684976 (print)
An algorithm for tracking a person's head is presented. The head's projection onto the image plane is modeled as an ellipse whose position and size are continually updated by a local search. The search combines the output of a module concentrating on the intensity gradient around the ellipse's perimeter with that of another module focusing on the color histogram of the ellipse's interior. Since these two modules have roughly orthogonal failure modes, they complement one another. The result is a robust, real-time system that tracks a person's head accurately enough to automatically control the camera's pan, tilt, and zoom in order to keep the person centered in the field of view at a desired size. Extensive experimentation shows the algorithm's robustness with respect to full 360-degree out-of-plane rotation, up to 90-degree tilting, severe but brief occlusion, arbitrary camera movement, and multiple moving people in the background.
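The local search scores each candidate ellipse with both modules and keeps the best combined score. The sketch below shows one way to do this. It measures color with normalized histogram intersection, rescales each module's scores to [0, 1] over the candidate set, and sums them. The min-max normalization and equal weighting are assumptions for illustration. The gradient scores are taken as precomputed numbers, and all function names are hypothetical.

```python
def histogram_intersection(model, candidate):
    """Fraction of the candidate histogram's mass matched by the model histogram."""
    total = sum(candidate)
    if total == 0:
        return 0.0
    return sum(min(m, c) for m, c in zip(model, candidate)) / total


def normalize(scores):
    """Rescale scores to [0, 1] across the candidate set (all zeros if they are equal)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def best_candidate(gradient_scores, color_scores):
    """Index of the candidate ellipse with the highest combined normalized score."""
    g = normalize(gradient_scores)
    c = normalize(color_scores)
    combined = [gi + ci for gi, ci in zip(g, c)]
    return max(range(len(combined)), key=combined.__getitem__)
```

Normalizing before summing keeps one module from dominating merely because its raw scores span a larger range. This matters when the two modules are meant to cover each other's failure modes.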
ISBN: 9781728125060 (digital)
ISBN: 9781728125060 (print)
Detecting spoofing attacks plays a vital role in deploying automatic face recognition for biometric authentication in applications such as access control, face payment, and device unlocking. In this paper, we propose a new anti-spoofing network architecture that takes advantage of multi-modal image data and aggregates intra-channel features at multiple network layers. We also transfer strong facial features learned for face recognition and show their benefits for detecting spoofing attacks. Finally, to improve the generalization of our method to unseen attacks, we use an ensemble of models trained separately for distinct types of spoofing attacks. The proposed method achieves state-of-the-art results on CASIA-SURF [26], the largest multi-modal anti-spoofing dataset.
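The abstract does not say how the per-attack-type models are combined, so the following is only a hedged sketch of one simple option. It averages the spoof probabilities of the ensemble members and applies a decision threshold. The attack-type keys, the averaging rule, and the 0.5 threshold are all assumptions. The function name is hypothetical.

```python
def ensemble_is_spoof(model_scores, threshold=0.5):
    """Fuse per-attack-type model outputs into one live/spoof decision.

    model_scores maps a (hypothetical) attack-type name to that model's spoof
    probability in [0, 1]. Averaging is an assumed fusion rule.
    Returns (is_spoof, fused_score).
    """
    fused = sum(model_scores.values()) / len(model_scores)
    return fused >= threshold, fused
```

A max rule, which flags an input if any specialist is confident, is an equally plausible alternative. The choice trades false rejections of live faces against missed attacks.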
ISBN: 9781509014378 (print)
Perceiving distance from two camera images, a task called stereo vision, is fundamental for many applications in robotics and automation. However, algorithms that compute this information with high accuracy have high computational complexity. One such algorithm, Semi-Global Matching (SGM), performs well in many stereo vision benchmarks while maintaining manageable computational complexity. Nevertheless, CPU and GPU implementations of this algorithm often fail to process camera images in real time, especially in power-constrained embedded environments. This work presents a novel architecture for computing disparities with SGM. The proposed architecture is highly scalable and applicable to low-power embedded as well as high-performance, multi-camera, high-resolution applications.
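SGM aggregates per-pixel matching costs along 1D paths through the image. Each step adds a small penalty P1 for a disparity change of one and a larger penalty P2 for bigger jumps. It then subtracts the previous minimum to keep values bounded. The sketch below implements this standard path-cost recurrence for a single left-to-right path along one scanline, followed by winner-take-all disparity selection. Full SGM sums the aggregated costs of several paths (typically 8 or 16). Cost computation and the paper's hardware architecture are not represented, and the function names are hypothetical.

```python
def aggregate_path(costs, p1, p2):
    """Aggregate matching costs along one left-to-right scanline path.

    costs[x][d] is the matching cost C(p, d) of pixel x at disparity d.
    Returns L[x][d] = C(x, d) + min(L[x-1][d], L[x-1][d±1] + p1,
                                    min_k L[x-1][k] + p2) - min_k L[x-1][k].
    """
    num_d = len(costs[0])
    L = [list(costs[0])]  # the first pixel on the path has no predecessor
    for x in range(1, len(costs)):
        prev = L[-1]
        prev_min = min(prev)
        row = []
        for d in range(num_d):
            candidates = [prev[d], prev_min + p2]  # same disparity, or a large jump
            if d > 0:
                candidates.append(prev[d - 1] + p1)  # disparity change of one
            if d < num_d - 1:
                candidates.append(prev[d + 1] + p1)
            row.append(costs[x][d] + min(candidates) - prev_min)
        L.append(row)
    return L


def winner_take_all(L):
    """Pick the lowest-cost disparity for each pixel."""
    return [min(range(len(row)), key=row.__getitem__) for row in L]
```

In the test data, the third pixel's raw costs favor disparity 0. The smoothness penalties pull it back to the disparity shared by its neighbors, which is the regularizing effect that makes SGM more accurate than purely local matching.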