This paper surveys newer approaches to the evolving field of medical image classification that do not conventionally use convolutional neural networks (CNNs). While analyzing, firstly, an all feed-forward a...
In this paper, we propose a novel and effective Fused Network, an image captioning framework based on a residual attention mechanism and an ordered memory module that enables a computer to produce the more a...
Traditional focal stack methods require multiple shots to capture images focused at different distances in the same scene, which makes them ill-suited to dynamic scenes. Generating a high-quality all-in-focus image from a single shot is challenging because single-image defocus deblurring is a highly ill-posed problem. In this paper, to restore an all-in-focus image, we propose the event focal stack, defined as the event streams captured during a continuous focal sweep. Given an RGB image focused at an arbitrary distance, we exploit the high temporal resolution of event streams to automatically select refocusing timestamps and reconstruct the corresponding refocused images with events, forming a focal stack. Guided by the neighbouring events around the selected timestamps, we merge the focal stack with appropriate weights and restore a sharp all-in-focus image. Experimental results on both synthetic and real datasets show superior performance over state-of-the-art methods.
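The abstract derives its merging weights from neighbouring events, which requires event data not available here. As a stand-in, the sketch below merges an already-reconstructed grayscale focal stack using per-pixel weights from a simple Laplacian sharpness measure, a common focus measure and explicitly not the paper's event-guided weighting. `merge_focal_stack` and `laplacian_energy` are hypothetical names.

```python
import numpy as np

def laplacian_energy(img: np.ndarray) -> np.ndarray:
    """Per-pixel focus measure: squared 4-neighbour Laplacian (edge-padded)."""
    p = np.pad(img, 1, mode="edge")
    lap = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
           - 4.0 * p[1:-1, 1:-1])
    return lap ** 2


def merge_focal_stack(stack: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Fuse a (K, H, W) grayscale focal stack into one all-in-focus image.

    Each slice is weighted per pixel by its share of the total sharpness, so
    the slice that is most in focus at a pixel dominates the result there.
    eps keeps the weights defined where every slice is flat.
    """
    weights = np.stack([laplacian_energy(s) for s in stack]) + eps  # (K, H, W)
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights * stack).sum(axis=0)


# Usage: two 8x8 slices, each sharp (checkerboard) on one half only.
checker = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)
flat = np.full((8, 8), 0.5)
cols = np.arange(8)
near = np.where(cols < 4, checker, flat)   # left half in focus
far = np.where(cols >= 4, checker, flat)   # right half in focus
fused = merge_focal_stack(np.stack([near, far]))
```

Away from the seam between the two halves, the fused image copies whichever slice is sharp there; in the paper, this sharpness-based weighting is replaced by weights guided by the events.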
Relighting an outdoor scene is challenging due to diverse illumination and salient cast shadows. Intrinsic image decomposition on outdoor photo collections can partly address this problem using weakly supervised labels derived from albedo and normal consistency in multiview stereo. With neural radiance fields (NeRF), editing the appearance code can produce more realistic results without explicitly interpreting how outdoor scene images are formed. This paper proposes to complement intrinsic estimation via volume rendering with NeRF and via inverting the photometric image formation model with convolutional neural networks (CNNs). The former produces richer and more reliable pseudo labels (cast shadows and sky appearance in addition to albedo and normals) for training the latter to predict interpretable and editable lighting parameters through a single-image prediction pipeline. We demonstrate the advantages of our method for both intrinsic image decomposition and relighting on various real outdoor scenes.
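The abstract does not spell out its photometric image formation model. To show what "inverting" such a model means, the sketch below runs the forward direction under an assumed sun-plus-sky Lambertian model with a cast-shadow mask: pixel = albedo times (direct sun shading plus ambient sky). `render_outdoor` and its parameters are hypothetical. In this model, relighting means keeping albedo and normals fixed and re-rendering with new lighting parameters (and, strictly, an updated shadow mask).

```python
import numpy as np

def render_outdoor(albedo, normals, sun_dir, sun_rgb, sky_rgb, shadow):
    """Assumed sun-plus-sky Lambertian image formation model.

    albedo:  (H, W, 3) reflectance in [0, 1]
    normals: (H, W, 3) unit surface normals
    sun_dir: (3,) unit vector pointing towards the sun
    sun_rgb, sky_rgb: (3,) direct sun and ambient sky radiance
    shadow:  (H, W) sun visibility, 1 = lit, 0 = in cast shadow
    """
    cos_term = np.clip(normals @ sun_dir, 0.0, None)     # (H, W); back-facing -> 0
    direct = (shadow * cos_term)[..., None] * sun_rgb    # (H, W, 3) direct sunlight
    shading = direct + sky_rgb                           # sky light reaches every pixel
    return albedo * shading


# Usage: two upward-facing pixels under an overhead sun, the second in shadow.
albedo = np.full((1, 2, 3), 0.5)
normals = np.zeros((1, 2, 3))
normals[..., 2] = 1.0
sun_dir = np.array([0.0, 0.0, 1.0])
sun_rgb = np.array([1.0, 0.9, 0.8])
sky_rgb = np.array([0.2, 0.25, 0.3])
shadow = np.array([[1.0, 0.0]])
img = render_outdoor(albedo, normals, sun_dir, sun_rgb, sky_rgb, shadow)
```

The shadowed pixel receives only sky light, which is why cast shadows and sky appearance are useful pseudo labels when estimating the lighting parameters in reverse.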
ISBN (electronic): 9798350368741
ISBN (print): 9798350368758
Hyperspectral images have found many applications in medicine owing to their rich spectral information. However, feature extraction from hyperspectral images still faces problems, especially in extracting contextual features across spectral bands: a single convolutional kernel may restrict the receptive field and fail to capture the sequential properties of the data. Meanwhile, because of the large data volume of hyperspectral images, current medical hyperspectral imaging work tends to address segmentation or classification in isolation. This paper proposes a 3D Swin Transformer framework with multi-task joint learning that learns multiple tasks simultaneously on hyperspectral tongue images. Built on the 3D Swin Transformer, the framework treats cross-band contextual feature learning on hyperspectral images as a sequence-to-sequence prediction problem. A 3D Transformer encoder serves as the shared feature extractor, and task-specific decoders make predictions for different visual tasks such as segmentation and classification. We performed simultaneous segmentation and classification of the tongue coating region on hyperspectral tongue images. The results show that the proposed model performs well on the individual segmentation and classification tasks and outperforms other multi-task convolutional neural network models.
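Treating cross-band learning as a sequence problem requires turning the cube into tokens. A 3D Swin-style encoder typically begins by cutting the input into non-overlapping 3D patches; the sketch below shows only that step, with a hypothetical function name and patch size (the paper's actual patch size is not given in the abstract).

```python
import numpy as np

def partition_3d_patches(cube: np.ndarray, patch=(4, 4, 8)) -> np.ndarray:
    """Split an (H, W, B) hyperspectral cube into non-overlapping 3D patch tokens.

    Returns an (N, ph * pw * pb) array with N = (H/ph) * (W/pw) * (B/pb),
    ordered row-major over the (height, width, band) patch grid, so tokens
    that share a spatial location but cover successive band groups are adjacent.
    """
    h, w, b = cube.shape
    ph, pw, pb = patch
    if h % ph or w % pw or b % pb:
        raise ValueError("cube dimensions must be divisible by the patch size")
    x = cube.reshape(h // ph, ph, w // pw, pw, b // pb, pb)
    x = x.transpose(0, 2, 4, 1, 3, 5)   # (nh, nw, nb, ph, pw, pb)
    return x.reshape(-1, ph * pw * pb)


# Usage: an 8x8 image with 16 bands becomes 2 * 2 * 2 = 8 tokens of 128 values.
cube = np.arange(8 * 8 * 16, dtype=float).reshape(8, 8, 16)
tokens = partition_3d_patches(cube)
```

In a full model, these tokens would be linearly embedded and fed to the shared encoder, whose features the segmentation and classification decoders both consume.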
Depth information in an image is essential for accurate positioning. However, stricter requirements on model scale are inevitable. This paper proposed the Lower Layer Efficient Neural Network (LL-ENet) on t...
ISBN (print): 9781665431828
In this work, we propose an image processing method that combines contrast limited adaptive histogram equalization (CLAHE) with a gamma transform to address the illumination problem in facial expression recognition. Applied to a dedicated illumination dataset (Extended Yale B), the algorithm yields better visual results than CLAHE or gamma correction alone. We further evaluate the method for facial expression recognition using a convolutional neural network (CNN) pre-trained on the FER2013 dataset. Enhancing the CK+ and Oulu expression datasets with this preprocessing algorithm yields accuracies of 89.24% and 70.24%, respectively. Compared with the unprocessed datasets, the preprocessing increases classification accuracy on the Oulu dataset by 7%.
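A minimal numpy sketch of the two ingredients, under stated assumptions: the equalization step here is the global (single-tile) version of contrast-limited equalization, whereas CLAHE proper applies the same clipping per tile and blends neighbouring tiles bilinearly (OpenCV's `cv2.createCLAHE` is a standard tiled implementation). The order (equalization, then gamma), `clip_limit=2.0`, and `gamma=0.8` are assumptions, not values from the paper, and all function names are hypothetical.

```python
import numpy as np

def clip_limited_equalize(img: np.ndarray, clip_limit: float = 2.0) -> np.ndarray:
    """Global contrast-limited histogram equalization of a uint8 image.

    Bins are clipped at clip_limit times the mean bin count and the clipped
    excess is spread evenly over all 256 bins before the CDF is built.
    """
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    limit = clip_limit * img.size / 256.0
    excess = np.clip(hist - limit, 0.0, None).sum()
    hist = np.minimum(hist, limit) + excess / 256.0
    cdf = np.cumsum(hist)
    lut = np.round(255.0 * (cdf - cdf[0]) / (cdf[-1] - cdf[0]))
    return lut.astype(np.uint8)[img]


def gamma_correct(img: np.ndarray, gamma: float) -> np.ndarray:
    """Power-law transform out = 255 * (in / 255) ** gamma, via a lookup table."""
    lut = np.round(255.0 * (np.arange(256) / 255.0) ** gamma)
    return lut.astype(np.uint8)[img]


def preprocess(img: np.ndarray, clip_limit: float = 2.0, gamma: float = 0.8) -> np.ndarray:
    """Contrast-limited equalization followed by gamma correction (order assumed)."""
    return gamma_correct(clip_limited_equalize(img, clip_limit), gamma)


# Usage: a synthetic under-exposed crop whose values lie only in 0..63.
rng = np.random.default_rng(0)
dark = rng.integers(0, 64, size=(48, 48), dtype=np.uint8)
enhanced = preprocess(dark)
```

Clipping the histogram limits how strongly equalization can stretch any single intensity band (and so limits noise amplification), while the gamma stage then adjusts overall brightness nonlinearly.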
ISBN (electronic): 9798350361674
ISBN (print): 9798350361681
With the rapid development of computers and the internet, digital image forgery detection has become an important research topic in computer vision. In this article, we propose a dual-stream network that detects image tampering and locates forged regions. The network combines a ringed residual U-Net with an RGB stream and a DCT stream; because it fuses frequency-domain features with RGB features, we name it the hybrid feature ringed residual U-Net (HFRRU-Net). Unlike existing works, HFRRU-Net is trained end to end without image preprocessing or post-processing, which may speed up training and prediction. In the experiments, HFRRU-Net is evaluated on five public image forgery datasets: CASIA v1.0, Columbia, In-The-Wild, Nist2016, and Realistic. The results show that HFRRU-Net achieves higher F1 scores than other methods on four of these datasets, and visual comparisons further demonstrate the effectiveness of the proposed method.
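The abstract does not say how the DCT stream's input is computed. One plausible choice, assumed here, is the JPEG-style 8x8 block DCT-II of a grayscale image, since JPEG compression artifacts live on that grid. The sketch builds the orthonormal DCT matrix explicitly; `dct_matrix` and `block_dct` are hypothetical names (SciPy's `scipy.fft.dctn` with `norm="ortho"` computes the same per-block transform).

```python
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)   # DC row uses the smaller normalization
    return m


def block_dct(gray: np.ndarray, n: int = 8) -> np.ndarray:
    """2D DCT-II of every non-overlapping n x n block.

    Height and width must be divisible by n; the output has the same shape,
    with each block replaced by its DCT coefficients.
    """
    h, w = gray.shape
    d = dct_matrix(n)
    blocks = gray.reshape(h // n, n, w // n, n).transpose(0, 2, 1, 3)  # (bh, bw, n, n)
    coeffs = d @ blocks @ d.T          # matmul broadcasts over the block grid
    return coeffs.transpose(0, 2, 1, 3).reshape(h, w)


# Usage: a flat 16x16 image has energy only in each block's DC coefficient.
flat = np.full((16, 16), 10.0)
coeffs = block_dct(flat)
```

Because the basis is orthonormal, a constant block of value c maps to a single DC coefficient 8c, and any tampering that disturbs the block grid shows up as unexpected high-frequency coefficients.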
Existing approaches in video captioning concentrate on exploring global frame features in uncompressed videos, while the freely available and critical saliency information already encoded in compressed videos is...
ISBN (print): 9781665426053
Video analytics systems are evolving rapidly, and their effectiveness depends on the quality of the initial stages of the processing pipeline, namely the segmentation and recognition of objects in the scene. The success of these procedures depends primarily on image quality, which in turn depends on many factors: the technical parameters of video sensors, low or uneven lighting, changes in scene illumination due to weather conditions, changes in illumination over time, or changes in what is happening in the scene. This paper presents a novel method for determining the optimal value of the gamma correction parameter, which enables the best-quality image to be selected automatically. The method is based on gamma correction, which reflects properties of the human visual system, effectively reduces the negative impact of changes in scene illumination, and is widely used in practice because it is simple to adjust and efficient to implement. A technique is developed for automatically selecting the optimal gamma value at which the corrected image reaches maximum quality.
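The abstract does not specify the image-quality criterion used to pick gamma. The sketch below assumes a deliberately simple one, mean brightness closest to mid-gray, and searches a grid of candidate gamma values for the best-scoring corrected image. `select_gamma`, `quality`, and the candidate range are hypothetical; swapping in a different no-reference quality metric only changes `quality`.

```python
import numpy as np

def apply_gamma(img: np.ndarray, gamma: float) -> np.ndarray:
    """Gamma correction on an image normalized to [0, 1]."""
    return np.power(img, gamma)


def quality(img: np.ndarray, target_mean: float = 0.5) -> float:
    """Assumed quality score: closeness of mean brightness to mid-gray."""
    return -abs(float(img.mean()) - target_mean)


def select_gamma(img: np.ndarray, candidates=np.linspace(0.1, 3.0, 30)) -> float:
    """Return the candidate gamma whose corrected image scores highest."""
    scores = [quality(apply_gamma(img, g)) for g in candidates]
    return float(candidates[int(np.argmax(scores))])


# Usage: an under-exposed and an over-exposed synthetic frame.
dark = np.linspace(0.0, 0.2, 1000)
bright = np.linspace(0.8, 1.0, 1000)
g_dark = select_gamma(dark)
g_bright = select_gamma(bright)
```

With out = in ** gamma on [0, 1], values of gamma below 1 brighten and values above 1 darken, so the search picks gamma < 1 for the dark frame and gamma > 1 for the bright one.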