In situations when both the output and the result are graphical, classification is used. The science's name was decided upon because of its concentration on image analysis. Imaging, satellite data, contrasts ampli...
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
In smartphones and mobile camera devices, the Image Signal Processor (ISP) reconstructs the RAW image into an sRGB image for human viewing through a series of signal-processing modules. Because the ISP transformation is non-linear, degradation is difficult to model in the sRGB domain. Most existing super-resolution methods operate directly on the sRGB image produced by the ISP, and therefore face more difficult degradation patterns. To address this challenge, we propose an enhanced transformer network named RBSFormer. Unlike methods that operate on sRGB images, RBSFormer takes RAW images as input, thus avoiding the complex degradation introduced by ISP processing. We design two enhanced core components in RBSFormer, i.e., Enhanced Cross-Covariance Attention (EXCA) and Enhanced Gated Feedforward Network (EGFN), and we further introduce data augmentation in the RAW domain and hybrid ensemble strategies to enhance our results. Experimental results demonstrate superior performance against the majority of methods both qualitatively and quantitatively. With fewer parameters, our RBSFormer achieved 3rd place on all evaluation metrics on both the official validation and testing sets in the NTIRE 2024 Challenge on RAW Image Super Resolution.
Contour-based instance segmentation methods have developed rapidly recently but feature rough and hand-crafted front-end contour initialization, which restricts the model performance, and an empirical and fixed backen...
Fine-grained image classification has been a very popular research topic in the fields of computer vision and pattern recognition in recent years. At present, fine-grained image classification by deep learning is mainly bas...
Existing methods for shadow removal in high-resolution images may not be effective because training is time-consuming and visual data is lost when images are cropped or resized, which highlights the need for more efficient methods. In this paper, we propose a novel Pyramid Ensemble Structure (PES) for High-Resolution Image Shadow Removal. Our approach takes advantage of multiple scales by constructing pyramid inputs that capture a wide range of shadow sizes and shapes. We then train the network in pyramid stages to enhance global information processing. Furthermore, an ensemble of different shadow removal models is employed, and the maximum value is chosen to indicate the least amount of remaining shadow in the output. Experiments on both validation and testing data sets confirm the effectiveness of our method. In the Image Shadow Removal Challenge competition, our method obtained a PSNR score of 22.36 (1st place) and an SSIM score of 0.70 (2nd place) on the test sets.
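The maximum-value ensemble step described in this abstract can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes each model output is a same-sized grayscale image stored as nested lists, and that a brighter pixel means less remaining shadow. The name `max_ensemble` and the sample values are hypothetical.

```python
def max_ensemble(predictions):
    """Fuse same-sized grayscale outputs by taking the per-pixel maximum.

    Assumption: a larger pixel value means less remaining shadow.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    height, width = len(predictions[0]), len(predictions[0][0])
    for pred in predictions:
        if len(pred) != height or any(len(row) != width for row in pred):
            raise ValueError("all predictions must share one shape")
    # For every pixel position, keep the brightest value across models.
    return [
        [max(pred[y][x] for pred in predictions) for x in range(width)]
        for y in range(height)
    ]


# Hypothetical outputs of two shadow removal models on a 2x2 image.
model_a = [[0.2, 0.8], [0.5, 0.1]]
model_b = [[0.4, 0.6], [0.3, 0.9]]
fused = max_ensemble([model_a, model_b])
```

A real pipeline would likely apply the same rule per colour channel on full-resolution arrays; the abstract does not specify those details.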
3D object detection is an essential perception task in autonomous driving to understand the environments. The Bird's-Eye-View (BEV) representations have significantly improved the performance of 3D detectors with ...
ISBN (print): 9781665488105
Handling clustering problems is important in data statistics, pattern recognition and image processing. The mean-shift algorithm, a common unsupervised algorithm, is widely used to solve clustering problems. However, the mean-shift algorithm is restricted by its huge computational resource cost. In previous research [1], we proposed a novel GPU-accelerated Faster Mean-shift algorithm, which greatly sped up the cosine-embedding clustering problem. In this study, we extend and improve the previous algorithm to handle Euclidean distance metrics. Unlike conventional GPU-based mean-shift algorithms, our algorithm adopts novel Seed Selection & Early Stopping approaches, which greatly increase computing speed and reduce GPU memory consumption. In simulation testing, when processing a 200K-point clustering problem, our algorithm achieved around 3 times speedup compared to the state-of-the-art GPU-based mean-shift algorithms, with optimized GPU memory consumption. Moreover, in this study, we implemented a plug-and-play model for the faster mean-shift algorithm, which can be easily deployed. (Plug-and-play model is available: https://***/masqm/Faster-Mean-Shift-Euc)
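To make the ideas of seed selection and early stopping concrete, here is a small CPU-only sketch of mean-shift with a flat kernel and Euclidean distance. It is not the paper's GPU algorithm: the random seed sampling, the shift-below-tolerance stopping rule, and the mode-merging threshold are all assumptions chosen for illustration, and the function name `mean_shift` and its parameters are hypothetical.

```python
import math
import random


def mean_shift(points, bandwidth, num_seeds=8, tol=1e-4, max_iter=100, rng_seed=0):
    """Flat-kernel mean-shift that iterates only from a subset of seed points."""
    rng = random.Random(rng_seed)
    # Seed selection (assumed rule): shift a random subset, not every point.
    seeds = rng.sample(points, min(num_seeds, len(points)))
    modes = []
    for seed in seeds:
        center = seed
        for _ in range(max_iter):
            neighbors = [p for p in points if math.dist(p, center) <= bandwidth]
            if not neighbors:
                break
            new_center = tuple(sum(c) / len(neighbors) for c in zip(*neighbors))
            shift = math.dist(new_center, center)
            center = new_center
            # Early stopping (assumed rule): stop once the seed barely moves.
            if shift < tol:
                break
        modes.append(center)
    # Merge converged seeds that landed on (almost) the same mode.
    merged = []
    for mode in modes:
        if all(math.dist(mode, kept) > bandwidth / 2 for kept in merged):
            merged.append(mode)
    return merged


# Two well-separated toy clusters near (0, 0) and (10, 10).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
modes = sorted(mean_shift(points, bandwidth=1.0, num_seeds=4))
```

With four seeds drawn from two clusters of three points each, at least one seed starts in each cluster, so both modes are found while fewer trajectories are computed than in the classic all-points variant.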
Convolutional neural networks (CNNs) and Transformers have achieved significant success in image signal processing. However, little effort has been made to effectively combine the properties of these two architectures for image deraining. In this paper, we propose an effective deraining method, the dilated convolutional Transformer (DCT), which enlarges the receptive fields of the network to aggregate global information. The fundamental building unit of our approach is the dilformer block, which contains multi-dilconv sparse attention (MDSA) and a multi-dilconv feed-forward network (MDFN). The MDSA calculates a multi-scale query to generate an accurate similarity map so that rich multi-scale information can be better utilized for high-quality image reconstruction. In addition, we adopt ReLU to replace the original softmax to enforce sparsity in the Transformer for better feature aggregation. The MDFN is further established to better integrate the rain information of different scales in the feature transformation. Extensive experiments on the benchmarks show favorable performance against state-of-the-art approaches.
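The effect of replacing softmax with ReLU in attention can be seen in a toy example. The sketch below covers only that substitution for a single query, using scaled dot-product scores; it does not model the multi-dilated convolutions of MDSA, and the function `attention_weights` and the sample vectors are hypothetical.

```python
import math


def attention_weights(query, keys, activation="softmax"):
    """Return attention weights of one query over a list of keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    if activation == "relu":
        # ReLU zeroes out non-positive similarities, giving sparse weights.
        return [max(0.0, s) for s in scores]
    # Softmax assigns a strictly positive weight to every key.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


query = [1.0, 0.0]
keys = [[2.0, 0.0], [-1.0, 0.0], [0.0, 3.0]]
dense = attention_weights(query, keys, "softmax")
sparse = attention_weights(query, keys, "relu")
```

Here `dense` spreads some weight over all three keys, while `sparse` keeps only the key that is positively aligned with the query. Note that ReLU weights are not normalized to sum to one.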
Transformer architectures have become state-of-the-art models in computer vision and natural language processing. To a significant degree, their success can be attributed to self-supervised pre-training on large-scale unlabeled datasets. This work investigates the use of self-supervised masked image reconstruction to advance transformer models for hyperspectral remote sensing imagery. To facilitate self-supervised pre-training, we build a large dataset of unlabeled hyperspectral observations from the EnMAP satellite and systematically investigate modifications of the vision transformer architecture to optimally leverage the characteristics of hyperspectral data. We find significant improvements in accuracy on different land cover classification tasks over both standard vision and sequence transformers using (i) blockwise patch embeddings, (ii) spatial-spectral self-attention, (iii) spectral positional embeddings and (iv) masked self-supervised pre-training. The resulting model outperforms standard transformer architectures by +5% accuracy on a labeled subset of our EnMAP data and by +15% on the Houston2018 hyperspectral dataset, making it competitive with a strong 3D convolutional neural network baseline. In an ablation study on label efficiency based on the Houston2018 dataset, self-supervised pre-training significantly improves transformer accuracy when little labeled training data is available. The self-supervised model outperforms randomly initialized transformers and the 3D convolutional neural network by +7-8% when only 0.1-10% of the training labels are available.
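Masked image reconstruction starts by hiding a random subset of patches and asking the model to reconstruct them from the visible ones. The sketch below shows only that patch-selection step. The 75% mask ratio is an assumption borrowed from common masked-autoencoder practice, not a value stated in the abstract, and `random_mask` is a hypothetical helper.

```python
import random


def random_mask(num_patches, mask_ratio=0.75, rng_seed=0):
    """Split patch indices into visible and masked sets (assumed ratio)."""
    rng = random.Random(rng_seed)
    num_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    # The encoder would see only the visible patches; the reconstruction
    # loss would be computed on the masked ones.
    visible = [i for i in range(num_patches) if i not in masked]
    return visible, sorted(masked)


visible, masked = random_mask(num_patches=16)
```

For hyperspectral data, the patch index could run over spatial-spectral blocks rather than purely spatial tiles, though how the authors index patches is not described here.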
ISBN (print): 9781665448994
The proceedings contain 516 papers. The topics discussed include: OmniLayout: room layout reconstruction from indoor spherical panoramas; boosting adversarial robustness using feature level stochastic smoothing; beyond joint demosaicking and denoising: an image processing pipeline for a pixel-bin image sensor; assessment of deep learning based blood pressure prediction from PPG and rPPG signals; towards domain-specific explainable AI: model interpretation of a skin image classifier using a human approach; DAMSL: domain agnostic meta score-based learning; deep learning based spatial-temporal in-loop filtering for versatile video coding; automated tackle injury risk assessment in contact-based sports - a rugby union example; two-stage network for single image super-resolution; and ***: dataset for automatic mapping of buildings, woodlands, water and roads from aerial imagery.