ISBN: 9781665445092 (print)
A deraining network can be interpreted as a conditional generator that removes rain streaks from an image. Most existing image deraining methods ignore model errors caused by uncertainty, which reduces embedding quality. Unlike existing methods that embed low-quality features into the model directly, we replace low-quality features with latent high-quality features. We borrow the idea of closed-loop feedback from automatic control to obtain these latent high-quality features, and propose a new method for error detection and feature compensation to address model errors. Extensive experiments on benchmark datasets and on specific real datasets demonstrate that the proposed method outperforms recent state-of-the-art methods. Code is available at: https://***/LI-Hao-SJTU/DerainRLNet
ISBN: 9789819984619; 9789819984626 (print)
Most existing unsupervised re-identification methods use a clustering-based approach to generate pseudo-labels as supervisory signals, allowing deep neural networks to learn discriminative representations without annotations. However, drawbacks in clustering algorithms and the lack of discriminative ability early in training seriously limit performance. A severe problem arises from path dependency, wherein noisy samples rarely have a chance to escape from their assigned clusters during iterative training. To tackle this challenge, we propose a novel label refinement strategy based on stable cluster reconstruction. Our approach contains two modules, the stable cluster reconstruction (SCR) module and the similarity recalculate (SR) module. It reconstructs more stable clusters and re-evaluates the relationship between samples and clearer cluster representatives, providing complementary information for pseudo-labels at the instance level. Our approach effectively improves unsupervised reID performance, achieving state-of-the-art results on four benchmark datasets. Specifically, our method achieves 46.0% and 39.1% mAP on the challenging datasets VeRi776 and MSMT17, respectively.
This paper proposes an intelligent management computer vision system based on the Internet of Things for the special needs of football stadiums. The system integrates advanced image processing algorithms and computer ...
ISBN: 9781665445092 (print)
Both Non-Local (NL) operation and sparse representation are crucial for Single Image Super-Resolution (SISR). In this paper, we investigate their combination and propose a novel Non-Local Sparse Attention (NLSA) with a dynamic sparse attention pattern. NLSA is designed to retain the long-range modeling capability of the NL operation while enjoying the robustness and high efficiency of sparse representation. Specifically, NLSA rectifies non-local attention with spherical locality-sensitive hashing (LSH), which partitions the input space into hash buckets of related features. For every query signal, NLSA assigns a bucket to it and computes attention only within that bucket. The resulting sparse attention prevents the model from attending to noisy and less informative locations, while reducing the computational cost from quadratic to asymptotically linear with respect to the spatial size. Extensive experiments validate the effectiveness and efficiency of NLSA. With a few non-local sparse attention modules, our architecture, called the non-local sparse network (NLSN), reaches state-of-the-art performance for SISR both quantitatively and qualitatively.
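The bucketed attention described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the hash used here (the index of the largest entry among random projections and their negations) is one common angular LSH variant chosen for illustration, and the names `lsh_bucket`, `bucketed_attention`, and `n_projections` are hypothetical. The sketch also does not bound bucket sizes, which a linear-cost implementation would need to do.

```python
import math
import random


def lsh_bucket(vec, projections):
    """Angular hash (assumed variant): argmax over [Rx, -Rx]."""
    scores = [sum(r * x for r, x in zip(row, vec)) for row in projections]
    scores = scores + [-s for s in scores]
    return max(range(len(scores)), key=scores.__getitem__)


def bucketed_attention(features, n_projections=2, seed=0):
    """Softmax attention computed only among features sharing a bucket."""
    rng = random.Random(seed)
    dim = len(features[0])
    # Random Gaussian projection directions shared by all features.
    projections = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                   for _ in range(n_projections)]

    # Group feature indices by hash bucket.
    buckets = {}
    for idx, feat in enumerate(features):
        buckets.setdefault(lsh_bucket(feat, projections), []).append(idx)

    output = [None] * len(features)
    for members in buckets.values():
        for i in members:
            # Dot-product similarity to every feature in the same bucket.
            logits = [sum(a * b for a, b in zip(features[i], features[j]))
                      for j in members]
            peak = max(logits)
            weights = [math.exp(v - peak) for v in logits]
            total = sum(weights)
            # Weighted average of the bucket's features.
            output[i] = [
                sum(w * features[j][d] for w, j in zip(weights, members)) / total
                for d in range(dim)
            ]
    return output, buckets
```

Because the hash depends only on direction, a feature and any positively scaled copy of it land in the same bucket, which is the sense in which buckets group related features.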
Extracting text from complex real-world images poses a significant challenge in computer vision due to cluttered backgrounds, diverse fonts, and varying orientations. Traditional methods struggle with accuracy in such...
ISBN: 9781665445092 (print)
Mechanical image stabilization using actuated gimbals enables capturing long-exposure shots without suffering from blur due to camera motion. These devices, however, are often physically cumbersome and expensive, limiting their widespread use. In this work, we propose to digitally emulate a mechanically stabilized system from the input of a fast unstabilized camera. To exploit the trade-off between motion blur at long exposures and low SNR at short exposures, we train a CNN that estimates a sharp high-SNR image by aggregating a burst of noisy short-exposure frames, related by unknown motion. We further suggest learning the burst's exposure times in an end-to-end manner, thus balancing the noise and blur across the frames. We demonstrate this method's advantage over the traditional approach of deblurring a single image or denoising a fixed-exposure burst on both synthetic and real data.
Studies related to bird species identification, movements, and behavior are important for protecting the environment and measuring biodiversity, especially for ornithological research and conservation. Researchers hav...
ISBN: 9781665445092 (print)
Long-tailed data distributions are common in many multi-label visual recognition tasks, and training directly on such data usually leads to relatively low performance on tail classes. While re-balanced data sampling can improve performance on tail classes, it may also hurt performance on head classes during training due to label co-occurrence. In this paper, we propose a new approach that trains on both uniform and re-balanced samplings in a collaborative way, improving performance on both head and tail classes. More specifically, we design a visual recognition network with two branches: one takes the uniform sampling as input while the other takes the re-balanced sampling as input. For each branch, we conduct visual recognition using a binary-cross-entropy-based classification loss with learnable logit compensation. We further define a new cross-branch loss to enforce consistency when the same input image goes through the two branches. We conduct extensive experiments on the VOC-LT and COCO-LT datasets. The results show that the proposed method significantly outperforms previous state-of-the-art methods on long-tailed multi-label visual recognition.
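The three ingredients named in this abstract (re-balanced sampling, a binary-cross-entropy loss with logit compensation, and a cross-branch consistency term) can be sketched as follows. The abstract does not give their exact forms, so everything here is an assumption made for illustration: the compensation is taken to be a per-class affine adjustment of the logits, the consistency term is a mean squared difference between branch probabilities, and the re-balanced sampler picks a class uniformly and then an image containing it. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_with_compensation(logits, targets, scale, bias):
    """Mean BCE over classes after a per-class affine logit adjustment.

    The affine form (scale * z + bias) is an assumption; in training,
    scale and bias would be learnable parameters.
    """
    loss = 0.0
    for z, y, s, b in zip(logits, targets, scale, bias):
        p = sigmoid(s * z + b)
        p = min(max(p, 1e-7), 1.0 - 1e-7)  # avoid log(0)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss / len(logits)


def consistency_loss(logits_a, logits_b):
    """Mean squared gap between the two branches' probabilities (assumed form)."""
    gaps = [(sigmoid(a) - sigmoid(b)) ** 2 for a, b in zip(logits_a, logits_b)]
    return sum(gaps) / len(gaps)


def rebalanced_sample(labels, rng):
    """Pick a class uniformly, then an image that contains that class."""
    n_classes = len(labels[0])
    present = [c for c in range(n_classes) if any(row[c] for row in labels)]
    chosen = rng.choice(present)
    return rng.choice([i for i, row in enumerate(labels) if row[chosen]])
```

With nine images of a head class and one image of a tail class, this sampler returns the tail image about half the time, whereas uniform sampling would return it about one time in ten.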
Low-light object detection has drawn considerable attention in computer vision as the need for reliable vision systems that can function in difficult lighting conditions grows. Low levels of ...
ISBN: 9781665445092 (print)
We introduce a new approach for audio-visual speech separation. Given a video, the goal is to extract the speech associated with a face in spite of simultaneous background sounds and/or other human speakers. Whereas existing methods focus on learning the alignment between the speaker's lip movements and the sounds they generate, we propose to leverage the speaker's face appearance as an additional prior to isolate the corresponding vocal qualities they are likely to produce. Our approach jointly learns audio-visual speech separation and cross-modal speaker embeddings from unlabeled video. It yields state-of-the-art results on five benchmark datasets for audio-visual speech separation and enhancement, and generalizes well to challenging real-world videos of diverse scenarios.