The task of image style transfer is to automatically redraw an input image in the style of another image, such as an artist's painting. The disadvantage of conventional stylization algorithms is that they produce a single result: a user who is not satisfied with the way the style was transferred has no way to redo the stylization. The paper provides an overview of existing style transfer methods that generate diverse results after each run and proposes two new methods. The first method enables diversity by concatenating a random vector into the inner image representation inside the neural network and by reweighting image features accordingly in the loss function. The second method allows diverse stylizations by passing the stylized image through orthogonal transformations, which affect the way the target style is transferred. These transformation blocks are trained to replicate patterns from additional pattern images, which serve as additional input and give the end user an interpretable way to control stylization variability. Qualitative and quantitative comparisons demonstrate that both methods are capable of generating different stylizations, with higher variability achieved by the second method. The code of both methods is available on GitHub.
Few would disagree that artificial intelligence (AI) holds potential for advancing knowledge and innovation. Over the past decades, substantial research has been devoted to the development and application of AI in soil science. While most of today's AI applications in soil science are related to machine learning (ML), AI also encompasses other fields such as digital image analysis, natural language processing (NLP), expert systems, and knowledge representation. This review aims to provide a comprehensive overview of AI in soil science. A definition of AI that equates intelligence with rationality is provided, followed by a typical classification of AI into the three main domains of sensing and interacting, reasoning and decision-making, and learning and predicting. From this framework, a taxonomy of AI in soil research is derived and serves as a basis for a literature review. The major findings are as follows: i) AI in soil science is diverse, with applications in decision support systems, image classification, prediction with ML, and expert systems; ii) AI in soil science is currently almost exclusively characterized by ML; iii) applications of ML are predominantly found in the field of digital soil mapping and in the development of pedotransfer functions; and iv) most AI applications are used for prediction purposes. A few notable exceptions stand apart from mainstream applications, particularly in the realms of NLP, the development of soil cognitive models, and interpretable ML. Based on these findings, I discuss points of attention, such as the use of AI almost exclusively for prediction at the expense of explanation and the lack of integration of soil knowledge in algorithmic AI solutions. I envision that future developments could include the use of AI for text recognition of legacy soil profile data, providing a new source of soil information. Another promising line of research is the language processing of soil texts to build meta-analyses that summarize the growing bod
Visual question answering (VQA) is a problem of interest to researchers in both computer vision and natural language processing. In VQA, a system is given an image and a natural-language question about that image, and is expected to answer in natural language. To find the right answer, a VQA algorithm may need to use common sense and external knowledge to make sense of the information in the image. In this paper, we discuss some of the main ideas behind VQA systems and provide a comprehensive literature survey of the current state of the art in VQA and visual reasoning from four perspectives: problem definition and challenges, approaches, existing datasets, and evaluation metrics. We conclude our survey with a discussion of potential future research directions in this area, intended to stimulate new ideas and creative approaches to solving current problems and developing new applications.
In aerial image classification, integrating advanced vision transformers with optimal preprocessing techniques is pivotal for enhancing model performance. This study presents SwinSight, a novel hierarchical vision tra...
Pill image recognition by machine vision can reduce the risk of taking the wrong medications, a severe healthcare problem. Automated dispensing machines and home applications both need reliable image processing techniques to cope with changing viewing conditions, a large number of classes, and the similarity in pill appearance. The problem is attacked with a multi-stream, two-phase metric embedding neural model. To enhance the metric learning procedure, dynamic margin setting is introduced into the loss function. Moreover, it is shown that besides the visual features of drug samples, even the free text of drug leaflets (processed with a natural language model) can be used to set the value of the margin in the triplet loss and thus increase recognition accuracy at test time. Thus, besides using the conventional metric learning approach, the given discriminating features can be explicitly injected into the metric model using NLP of the free text of pill leaflets or descriptors of images of selected pills. The performance on two datasets is analysed, and a 1.6% (two-sided) and 2.89% (one-sided) increase in Top-1 accuracy on the CURE dataset is reported compared to the existing best results. The inference time on CPU and GPU makes the proposed model suitable for different kinds of applications in medical pill verification; moreover, the approach applies to other areas of object recognition where few-shot problems arise. The proposed high-level feature injection method (into a low-level metric learning model) can also be exploited in other cases where class features can be well described with textual or visual cues.
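As a rough illustration of the dynamic-margin idea described above, the following sketch computes a hinge triplet loss in which the margin is set per triplet from class descriptors. It is not the authors' implementation: the function names, the `base` and `scale` parameters, and the rule that the margin grows with the cosine distance between the descriptor of the anchor class and that of the negative class are all assumptions made for illustration; the abstract does not state how the margin is derived.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin):
    """Hinge triplet loss; `margin` may be a scalar or one value per triplet."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)  # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative, axis=1)  # anchor-negative distance
    return np.maximum(d_pos - d_neg + margin, 0.0)

def dynamic_margin(desc_anchor_cls, desc_negative_cls, base=0.2, scale=0.5):
    """Hypothetical margin rule (an assumption, not taken from the paper):
    base margin plus a term proportional to the cosine distance between
    the text or image descriptors of the two classes."""
    a = np.asarray(desc_anchor_cls, dtype=float)
    n = np.asarray(desc_negative_cls, dtype=float)
    cos_sim = np.sum(a * n, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(n, axis=1)
    )
    return base + scale * (1.0 - cos_sim)

# One triplet of 2-D embeddings and one pair of 2-D class descriptors.
anchor = np.array([[0.0, 0.0]])
positive = np.array([[1.0, 0.0]])
negative = np.array([[3.0, 0.0]])
margin = dynamic_margin(np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]))
loss = triplet_loss(anchor, positive, negative, margin)
```

In a training loop the descriptors would come from a language model applied to the leaflet text or from image features of selected pills, as the abstract suggests; here they are plain arrays so that the sketch stays self-contained.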
ISBN: (print) 9781510679344; 9781510679351
The image compression field is witnessing a paradigm shift thanks to the rise of neural network-based models. In this context, the JPEG committee is in the process of standardizing the first learning-based image compression standard, known as JPEG AI. While most of the research to date has focused on image coding for humans, JPEG AI plans to address both human and machine vision by presenting several non-normative decoders addressing multiple image processing and computer vision tasks in addition to standard reconstruction. While the impact of conventional image compression on computer vision tasks has already been addressed, no study has been conducted to assess the impact of learning-based image compression on such tasks. In this paper, the impact of learning-based image compression, including JPEG AI, on computer vision tasks is reviewed and discussed, mainly focusing on the image classification task along with object detection and segmentation. This study reviews the impact of image compression with JPEG AI on various computer vision models. It shows the superiority of JPEG AI over other conventional and learning-based compression models.
Visual object tracking, crucial in aerial applications such as surveillance, cinematography, and chasing, faces challenges despite AI advancements. Current solutions lack full reliability, leading to common tracking failures in the presence of fast motions or long-term occlusions of the subject. To tackle this issue, a 3D motion model is proposed that employs camera/vehicle states to locate a subject in inertial coordinates. Next, a probability distribution is generated over future trajectories, and these are sampled using a Monte Carlo technique to provide search regions that are fed into an online appearance learning process. This 3D motion model incorporates machine-learning approaches for direct range estimation from monocular images. The model adapts computationally by adjusting search areas based on tracking confidence. It is integrated into DiMP, an online, deep learning-based appearance model. The resulting tracker is evaluated on the VIOT dataset, whose sequences contain both images and camera states, achieving a 68.9% tracking precision compared to DiMP's 49.7%. This approach demonstrates increased tracking duration, improved recovery after occlusions, and better handling of fast motions. Additionally, this strategy outperforms random searches by about 3.0%. Air-to-ground visual object tracking has several applications, including surveillance, cinematography, and chasing. Briefly, camera states and a vision-based range estimation are added to the tracking method to locate the target in inertial coordinates and introduce a probability distribution to predict the future positions of the target. The results of adding this motion model to the DiMP tracker demonstrate a tracking precision improvement of 19.2 percentage points.
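The Monte Carlo step described above can be sketched as follows. This is only an illustration under stated assumptions, not the paper's motion model: it assumes a constant-velocity prediction in inertial coordinates, isotropic Gaussian uncertainty, and a hypothetical rule that widens the spread as tracking confidence drops. The function name and the parameters `base_sigma` and `n_samples` are invented for the example.

```python
import numpy as np

def sample_search_centers(position, velocity, dt, confidence,
                          n_samples=100, base_sigma=1.0, rng=None):
    """Draw candidate 3D target positions around a predicted position.

    Assumptions (not from the source): constant-velocity prediction and
    a Gaussian spread that doubles as confidence falls from 1 to 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    position = np.asarray(position, dtype=float)
    velocity = np.asarray(velocity, dtype=float)
    predicted = position + velocity * dt             # predicted inertial position
    sigma = base_sigma * (2.0 - confidence)          # hypothetical confidence rule
    return rng.normal(loc=predicted, scale=sigma, size=(n_samples, 3))

# Example: target at the origin moving 2 m/s along x, predicted 0.5 s ahead.
samples = sample_search_centers(
    position=[0.0, 0.0, 0.0],
    velocity=[2.0, 0.0, 0.0],
    dt=0.5,
    confidence=0.3,
    n_samples=500,
    rng=np.random.default_rng(0),
)
```

Each sampled position would then be projected into the image using the camera state to define a search region for the appearance model; that projection depends on the camera model and is omitted here.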
Detection of sky regions is one of the most crucial challenges in image processing and computer vision applications, including scene parsing, picture retrieval, weather forecasting, and robot navigation. However, it is challenging to detect sky regions under certain circumstances, particularly in gloomy and overcast conditions. This study aims to summarize sky region detection approaches, challenges, and applications together. Additionally, classical and deep learning-based approaches have been delineated. An extensive literature review has been conducted to achieve the objectives of the study. It has emerged that various machine learning and deep learning approaches have been proposed. Unfortunately, most of the approaches lose efficacy under overcast or difficult lighting conditions, as most of them are trained on an ideal dataset. Moreover, a taxonomy of sky region detection challenges has been proposed, categorizing the identified challenges into edge-based, color-based, texture-based, deep learning-based methods, etc. The challenging datasets that are being used for robust sky detection methods have been presented.
ISBN: (print) 9798350350494; 9798350350500
Hyperspectral anomaly detection is crucial for applications like aerial surveillance in remote sensing images. However, robust identification of anomalous pixels remains challenging. A novel spectral-spatial anomaly detection technique called Dual-Domain Autoencoders (DDA) is proposed to address these challenges. First, Nonnegative Matrix Factorization (NMF) is applied to decompose the hyperspectral data into anomaly and background components. The designation is then refined using intersection masking. Next, a spectral autoencoder is trained on the identified background signature pixels and used to reconstruct the image. The reconstruction error highlights spectral anomalies. Furthermore, a spatial autoencoder is trained on principal component patches from likely background areas. The fused reconstruction error from the spectral and spatial autoencoders is finally used to give enhanced anomaly detection. Experiments demonstrate higher AUC for DDA than for individual autoencoders and benchmark methods. The integration of matrix factorization and dual-domain, fused autoencoders thus provides superior anomaly identification. Spatial modeling further constrains the background, enabling accurate flagging of unusual local hyperspectral patterns. This study demonstrates the effectiveness of employing autoencoders trained on intelligently sampled hyperspectral pixel signatures and spatial features for improved spectral-spatial anomaly detection.
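The final fusion step can be illustrated with a small sketch. The abstract does not say how the two reconstruction errors are combined, so the rule below (min-max normalization of each error map followed by a weighted sum) is an assumption, as are the function name `fuse_errors` and the `weight` parameter. The error values are placeholders standing in for per-pixel autoencoder reconstruction errors.

```python
import numpy as np

def fuse_errors(spectral_err, spatial_err, weight=0.5):
    """Combine two per-pixel reconstruction-error maps into one anomaly score.

    Assumed fusion rule (not from the source): scale each map to [0, 1]
    and take a weighted sum; higher scores indicate likelier anomalies.
    """
    def minmax(x):
        x = np.asarray(x, dtype=float)
        span = x.max() - x.min()
        # A constant map carries no information, so it contributes zeros.
        return np.zeros_like(x) if span == 0 else (x - x.min()) / span

    return weight * minmax(spectral_err) + (1.0 - weight) * minmax(spatial_err)

# Placeholder error values for three pixels.
spectral_err = np.array([0.0, 1.0, 2.0])
spatial_err = np.array([0.0, 0.0, 4.0])
score = fuse_errors(spectral_err, spatial_err)
most_anomalous = int(np.argmax(score))
```

Normalizing before summing keeps one error map from dominating purely because of its scale; other fusion rules, such as a product or a maximum, would fit the same description in the abstract.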
Underwater image processing has received tremendous attention in the past few years. The reason for increased research in this area is that taking images underwater is very difficult. Images obtained underwater frequently suffer from quality deterioration issues such as poor contrast, blurred features, colour variations, non-uniform lighting, the presence of dust particles, noise at the bottom of the sea, different properties of the water medium, and so on. The improvement of underwater images is a critical problem in image processing and computer vision for a variety of practical applications. Addressing it requires methods that improve image quality at capture time. Because the capture process is the same underwater as in normal conditions, a mechanism for improving the quality of the captured image afterwards is also required. A complete and in-depth study of relevant accomplishments and developments, particularly a survey of underwater image methods and datasets, which are a critical issue in underwater image processing and intelligent applications, is still lacking. In this paper, we first provide a review of more than 85 articles on the most recent advancements in underwater image restoration methods, underwater image enhancement methods, and underwater image enhancement using deep learning and machine learning methods, along with the techniques, datasets, and evaluation criteria. To provide a thorough grasp of underwater image restoration, enhancement, and enhancement using deep learning and machine learning, we explore the strengths and limits of existing techniques. Additionally, we offer thorough, unbiased reviews and evaluations of the representative methodologies for five distinct types of underwater situations, whose usefulness varies across underwater circumstances. Two main evaluations, subjective image quality evaluation and objective image quali