Fashion image retrieval (FIR) is a challenging task that involves searching a massive collection of fashion products for items similar to a query image. FIR in different garments and shoes are popular in liter...
ISBN: 9781467399616 (print)
Intrinsic image decomposition is an important topic in computer vision and computer graphics applications. However, the problem is challenging when only the information in a single image is available. Therefore, additional priors or supplementary information, such as multiple images or user interactions, are necessary to address it. In this paper, we propose a novel scheme that uses multiple images for intrinsic image decomposition, based on a strategy similar to robust principal component analysis (RPCA). RPCA exploits the fact that the reflectance layer is common to images of a scene taken under different lighting, and attempts to decompose the data matrix constructed from the input images into a low-rank matrix and a sparse matrix. This is possible if the sparse matrix is sufficiently sparse, which is often not the case in computer vision applications. Moreover, the weighting parameter between the low-rank and sparse matrices greatly affects the accuracy of the results, and tuning this parameter can be tricky. This paper proposes a rank-constrained PCA algorithm (RCPCA) for solving background recovery problems. Fixing the rank of the low-rank matrix to 1 allows RCPCA to better recover the low-rank matrix from the data matrix. Comprehensive tests show that RCPCA produces more stable and accurate results than RPCA.
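The rank-1 idea in this abstract can be illustrated with a small sketch. This is not the authors' RCPCA algorithm, which the abstract does not specify. It is a generic alternating scheme, assumed here for illustration only, that fits a rank-1 matrix by truncated SVD and then soft-thresholds the residual to obtain the sparse part. The function names, the threshold `tau`, the iteration count, and the synthetic data are all hypothetical.

```python
import numpy as np

def soft_threshold(x, tau):
    """Shrink every entry of x toward zero by tau (the usual sparsity step)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def rank1_plus_sparse(D, tau=0.1, n_iter=50):
    """Split D into a rank-1 matrix L and a sparse matrix S.

    Illustrative alternating scheme (an assumption, not the paper's method):
      1. L <- best rank-1 approximation of D - S (leading SVD component)
      2. S <- soft-thresholded residual D - L
    """
    D = np.asarray(D, dtype=float)
    S = np.zeros_like(D)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(D - S, full_matrices=False)
        L = s[0] * np.outer(U[:, 0], Vt[0])   # rank fixed to 1
        S = soft_threshold(D - L, tau)
    return L, S

# Synthetic data: each column plays the role of one vectorized image.
rng = np.random.default_rng(0)
u = rng.uniform(0.5, 1.0, size=40)    # hypothetical per-pixel component
v = rng.uniform(0.5, 1.0, size=8)     # hypothetical per-image scaling
L_true = np.outer(u, v)               # exactly rank 1
S_true = np.zeros_like(L_true)
idx = rng.choice(L_true.size, size=16, replace=False)
S_true.flat[idx] = 2.0                # a few large sparse outliers
D = L_true + S_true

L_hat, S_hat = rank1_plus_sparse(D, tau=0.1)
print("rank of L_hat:", np.linalg.matrix_rank(L_hat))
print("relative error of L_hat:",
      np.linalg.norm(L_hat - L_true) / np.linalg.norm(L_true))
```

Fixing the rank removes the need to balance a nuclear-norm term against the sparsity term, which is the tuning difficulty the abstract attributes to RPCA. The threshold `tau` still has to be chosen in this sketch.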
ISBN: 9783031581731; 9783031581748 (print)
Precision agriculture requires high-resolution images to monitor crop growth and health effects. However, capturing such images from aerial platforms is difficult because of altitude, motion blur, and limited resolution. Early detection and identification of crop diseases can help prevent their spread and minimize the impact on crop yields. To address the challenge of capturing high-quality images, this paper uses the recent degradation model BSRGAN to enhance low-resolution aerial images of potato crops from an initial size of 750 x 750 to a higher resolution of 3000 x 3000. The experimental results indicate that the proposed BSRGAN method surpasses existing super-resolution techniques in terms of visual quality. The study uses the Potato Multispectral images Dataset, whose images have dimensions of 750 x 750. Interpolation methods are then applied to enhance the resolution, resulting in a super-resolved dataset with dimensions of 3000 x 3000. Subsequently, the dataset is labeled using the publicly available tool *** (***: https:// ***/***), based on three categories of crop health: (a) Healthy, (b) Potato Leafroll Virus (PLRV), and (c) Verticillium wilt, as specified by the Potato Disease Identification, Agriculture, and Horticulture Development Board 2023 (AHDB). After labeling the data, we use You Only Look Once version 8 (YOLOv8) for potato crop health detection on the 3000 x 3000 dataset. The YOLOv8 algorithm is trained on the super-resolved dataset for object detection and classification. It achieves a mAP of over 73% for healthy potato leaves, better than without super-resolution, and an overall mAP of 56% across all three categories. These findings demonstrate the potential of deep-learning-based approaches to accurately and efficiently identify potato leaf diseases, empowering farmers to protect their crops.
In this work, we have designed a local descriptor based on the reassigned Stankovic time-frequency distribution. The Stankovic distribution is one of the improved extensions of the well-known Wigner-Ville distribution. ...
Image aesthetics assessment is an emerging research domain. It deals with classifying images into categories according to how pleasant they are for users to view. In this ar...
Haze poses challenges in many vision-related applications. Image dehazing has therefore become a popular topic among vision researchers. Available methods use various priors, deep learning models, or a combination of both to get ...
ISBN: 9781509016457 (print)
This paper presents a study on the exploitation of visual information from two radically different points of view. Computer vision is a branch of artificial intelligence that focuses on extracting useful information from an image. Image matching is a fundamental aspect of many problems in computer vision, and several algorithms have been developed for this purpose. Based on this research, this paper presents a review of the previous work.
Document image binarization separates the foreground from the background and is a crucial pre-processing step in OCR. The accuracy of binarization strongly influences the accuracy of OCR. Various degradations like ina...
ISBN: 9781728190808 (print)
Image-to-image translation is an emerging method of computer vision dataset augmentation that transfers the style of real-life images onto synthetic ones, making them more realistic. In our work, we propose an incremental improvement over the adversarial learning generator architectures used by image-to-image translation models. First, we use a single network instead of two, creating a more memory-efficient model that allows end-to-end training at high resolutions. Second, inspired by recent work on semantic segmentation architectures, we enhance our model by employing a multi-scale encoding and stylization phase, allowing better control over contextual and spatial features. Given a synthetic image, our framework allows its multimodal translation into the real domain. Our model shows promising results at narrowing the semantic gap between synthetic and real data.
Automatic and semi-automatic methodologies based on pattern recognition and on image and video processing are widely used in healthcare services. In particular, image- and video-guided systems have successfully replaced various med...