In this study, an ad hoc image-processing pipeline has been developed and proposed for semantically segmenting wheat kernel data acquired through near-infrared hyperspectral imaging (HSI). The Gaussian Mixture Model (GMM), a soft clustering method, has been employed for this task, yielding noteworthy results in both kernel and germ segmentation. A comparative analysis was conducted in which GMM was compared against two hard clustering methods, hierarchical clustering and k-means, as well as other clustering algorithms common in food HSI applications. Notably, GMM exhibited the highest accuracy, with a Jaccard index of 0.745, surpassing hierarchical clustering at 0.698 and k-means at 0.652. Furthermore, the spectral variations observed across the wheat kernel topology can be exploited for semantic image segmentation, especially for selecting the germ portion within the wheat kernels. These findings carry practical significance for professionals in hyperspectral imaging and machine vision, particularly for food product quality assessment and real-time inspection.
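A minimal sketch of this kind of pixel-wise GMM clustering with Jaccard scoring is given below, assuming a hyperspectral cube and a binary ground-truth kernel mask; the toy data, the two-component setting, and the scikit-learn calls are illustrative rather than the authors' exact pipeline.

```python
# Minimal sketch of GMM-based pixel clustering for HSI segmentation (assumed
# setup, not the paper's exact pipeline): one spectrum per pixel, soft
# clustering, then Jaccard index against a ground-truth mask.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import jaccard_score

rows, cols, bands = 64, 64, 200          # toy dimensions
rng = np.random.default_rng(0)
cube = rng.random((rows, cols, bands))   # stand-in for a NIR-HSI cube
gt_mask = np.zeros((rows, cols), dtype=int)
gt_mask[16:48, 16:48] = 1                # stand-in kernel mask

X = cube.reshape(-1, bands)              # one spectrum per pixel

# Soft clustering: each pixel gets a posterior over mixture components.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
labels = gmm.fit_predict(X).reshape(rows, cols)

# Align cluster ids with the mask before scoring (clustering is label-agnostic).
if jaccard_score(gt_mask.ravel(), labels.ravel()) < \
   jaccard_score(gt_mask.ravel(), (1 - labels).ravel()):
    labels = 1 - labels

print("Jaccard index:", jaccard_score(gt_mask.ravel(), labels.ravel()))
```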
The task of image style transfer is to automatically redraw an input image in the style of another image, such as an artist's painting. The disadvantage of conventional stylization algorithms is the uniqueness of the result: if users are not satisfied with the way the style was transferred, they have no option to redo the stylization. The paper provides an overview of existing style transfer methods that generate diverse results after each run and proposes two new methods. The first method enables diversity by concatenating a random vector into the inner image representation inside the neural network and by reweighting image features accordingly in the loss function. The second method allows diverse stylizations by passing the stylized image through orthogonal transformations, which affect the way the target style is transferred. These blocks are trained to replicate patterns from additional pattern images, which serve as extra input and provide an interpretable way for the end user to control stylization variability. Qualitative and quantitative comparisons demonstrate that both methods are capable of generating different stylizations, with higher variability achieved by the second method. The code of both methods is available on GitHub.
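The first idea can be illustrated with a short PyTorch sketch: a random vector is concatenated to an intermediate feature map so that each forward pass can produce a different stylization. The layer sizes and the placement of the noise are assumptions, and the accompanying feature reweighting in the loss is not shown.

```python
# Illustrative sketch (assumed architecture details): concatenate a per-image
# random vector to an intermediate feature map so repeated runs differ.
import torch
import torch.nn as nn

class NoisyBottleneck(nn.Module):
    def __init__(self, channels: int, noise_dim: int = 16):
        super().__init__()
        self.noise_dim = noise_dim
        # Fuse image features and the broadcast noise back to `channels`.
        self.fuse = nn.Conv2d(channels + noise_dim, channels, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        b, _, h, w = feats.shape
        # One random vector per image, broadcast over spatial positions.
        z = torch.randn(b, self.noise_dim, 1, 1, device=feats.device)
        z = z.expand(b, self.noise_dim, h, w)
        return self.fuse(torch.cat([feats, z], dim=1))

feats = torch.randn(2, 256, 32, 32)      # stand-in encoder features
out = NoisyBottleneck(256)(feats)        # different result on each call
print(out.shape)                         # torch.Size([2, 256, 32, 32])
```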
Visual question answering (VQA) is a problem that researchers in both computer vision and natural language processing are interested in studying. In VQA, a system is given an image and a question in natural language about that image, and it is expected to answer in natural language. To find the right answer, a VQA algorithm may need to apply common-sense reasoning to the information in the image together with external knowledge. In this paper, we discuss some of the main ideas behind VQA systems and provide a comprehensive literature survey of the current state of the art in VQA and visual reasoning from four perspectives: problem definition and challenges, approaches, existing datasets, and evaluation metrics. We conclude our survey with a discussion and some potential future research directions in this area to generate new ideas and creative approaches to solving current problems and developing new applications.
In aerial image classification, integrating advanced vision transformers with optimal preprocessing techniques is pivotal for enhancing model performance. This study presents SwinSight, a novel hierarchical vision tra...
Pill image recognition by machine vision can reduce the risk of taking the wrong medication, a severe healthcare problem. Automated dispensing machines and home applications both need reliable image-processing techniques to cope with changing viewing conditions, a large number of classes, and the similarity in pill appearance. The problem is addressed with a multi-stream, two-phase metric embedding neural model. To enhance the metric learning procedure, dynamic margin setting is introduced into the loss function. Moreover, it is shown that besides the visual features of drug samples, even the free text of drug leaflets (processed with a natural language model) can be used to set the value of the margin in the triplet loss and thus increase recognition accuracy at test time. Thus, besides using the conventional metric learning approach, the given discriminating features can be explicitly injected into the metric model using the NLP of the free text of pill leaflets or descriptors of images of selected pills. The performance on two datasets is analysed, and a 1.6% (two-sided) and 2.89% (one-sided) increase in Top-1 accuracy on the CURE dataset is reported compared with the existing best results. The inference time on CPU and GPU makes the proposed model suitable for different kinds of applications in medical pill verification; moreover, the approach applies to other areas of object recognition where few-shot problems arise. The proposed high-level feature injection method (into a low-level metric learning model) can also be exploited in other cases where class features can be well described with textual or visual cues.
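The dynamic-margin idea can be sketched as a triplet loss whose margin is modulated by the textual dissimilarity between the anchor's and the negative's leaflet embeddings; the margin formula and the use of cosine distance below are assumptions for illustration, not the paper's exact loss.

```python
# Hedged sketch of a triplet loss with a per-triplet dynamic margin driven by
# leaflet-text dissimilarity (assumed formulation).
import torch
import torch.nn.functional as F

def dynamic_margin_triplet_loss(anchor, positive, negative,
                                anchor_text, negative_text,
                                base_margin=0.2, scale=0.3):
    # Text embeddings would come from an NLP model over drug leaflets.
    text_dist = 1.0 - F.cosine_similarity(anchor_text, negative_text, dim=-1)
    margin = base_margin + scale * text_dist          # per-triplet margin

    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

# Toy usage with random tensors standing in for image and text embeddings.
a, p, n = (torch.randn(8, 128) for _ in range(3))
at, nt = torch.randn(8, 64), torch.randn(8, 64)
print(dynamic_margin_triplet_loss(a, p, n, at, nt).item())
```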
Neutrosophic sets (NS) have been referred to as interval fuzzy sets, applied to minimize uncertainty and fuzziness in the computer-vision and machine-learning communities, and hence employed in several applications. As fa...
ISBN (Print): 9781510679344; 9781510679351
The image compression field is witnessing a paradigm shift thanks to the rise of neural network-based models. In this context, the JPEG committee is in the process of standardizing the first learning-based image compression standard, known as JPEG AI. While most research to date has focused on image coding for humans, JPEG AI plans to address both human and machine vision by presenting several non-normative decoders addressing multiple image-processing and computer vision tasks in addition to standard reconstruction. While the impact of conventional image compression on computer vision tasks has already been addressed, no study has been conducted to assess the impact of learning-based image compression on such tasks. In this paper, the impact of learning-based image compression, including JPEG AI, on computer vision tasks is reviewed and discussed, mainly focusing on image classification along with object detection and segmentation. The study examines the impact of compression with JPEG AI across various computer vision models and shows the superiority of JPEG AI over other conventional and learning-based compression models.
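The kind of assessment discussed here can be sketched as a simple loop: run a pretrained classifier on images after a compression/decompression round trip and track top-1 accuracy per rate point. In the sketch below, standard JPEG stands in for a learning-based codec such as JPEG AI, whose reference software is not assumed; the model and preprocessing choices are illustrative.

```python
# Hedged sketch: measure how a pretrained classifier's top-1 prediction is
# affected by a lossy compression round trip (JPEG used as a stand-in codec).
import io
import torch
from PIL import Image
from torchvision import models, transforms

model = models.resnet50(weights="IMAGENET1K_V2").eval()
preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def top1_after_compression(img: Image.Image, quality: int) -> int:
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)   # codec under test
    decoded = Image.open(io.BytesIO(buf.getvalue())).convert("RGB")
    with torch.no_grad():
        logits = model(preprocess(decoded).unsqueeze(0))
    return int(logits.argmax(dim=1))

# Accuracy per rate point is then the fraction of images whose predicted
# class after compression matches the ground-truth label.
```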
Visual object tracking, crucial in aerial applications such as surveillance, cinematography, and chasing, faces challenges despite AI advancements. Current solutions lack full reliability, leading to common tracking failures in the presence of fast motions or long-term occlusions of the subject. To tackle this issue, a 3D motion model is proposed that employs camera/vehicle states to locate a subject in inertial coordinates. Next, a probability distribution is generated over future trajectories, which are sampled using a Monte Carlo technique to provide search regions that are fed into an online appearance learning process. This 3D motion model incorporates machine-learning approaches for direct range estimation from monocular images. The model adapts computationally by adjusting search areas based on tracking confidence. It is integrated into DiMP, an online, deep learning-based appearance model. The resulting tracker is evaluated on the VIOT dataset, with sequences of both images and camera states, achieving 68.9% tracking precision compared with DiMP's 49.7%. The approach yields longer tracking durations, improved recovery after occlusions, and robustness to faster motions. Additionally, this strategy outperforms random searches by about 3.0%. Air-to-ground visual object tracking has several applications, including surveillance, cinematography, and chasing. Briefly, camera states and vision-based range estimation are added to the tracking method to locate the target in inertial coordinates and introduce a probability distribution for predicting the target's future positions. Adding this motion model to the DiMP tracker yields a 19.2% improvement in tracking precision.
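A minimal sketch of the motion-model idea: propagate the target's inertial position, sample future positions Monte Carlo-style, and project them through the camera to obtain a pixel-space search region. The constant-velocity model, noise level, and pinhole projection below are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch: Monte Carlo sampling of future target positions in inertial
# coordinates, projected into the image to bound the tracker's search region.
import numpy as np

def monte_carlo_search_region(pos, vel, dt, K, R, t, n_samples=500, sigma=0.5):
    rng = np.random.default_rng(0)
    # Propagate with a constant-velocity model plus Gaussian process noise.
    future = pos + vel * dt + rng.normal(0.0, sigma, size=(n_samples, 3))
    # Project inertial points into the image with a pinhole camera model.
    cam = R @ future.T + t.reshape(3, 1)            # camera-frame points
    uv = (K @ cam)[:2] / (K @ cam)[2]               # perspective division
    x0, y0 = uv.min(axis=1)
    x1, y1 = uv.max(axis=1)
    return x0, y0, x1, y1                           # search box in pixels

K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1.0]])
box = monte_carlo_search_region(np.array([0.0, 0.0, 20.0]),
                                np.array([1.0, 0.0, 0.0]), dt=0.5,
                                K=K, R=np.eye(3), t=np.zeros(3))
print(box)
```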
Detection of sky regions is one of the most crucial challenges in image processing and computer vision applications, including scene parsing, picture retrieval, weather forecasting, and robot navigation. However, it is difficult to detect sky regions under certain circumstances, particularly in gloomy and overcast conditions. This study aims to summarize sky region detection approaches, challenges, and applications together; additionally, classical and deep learning-based approaches are delineated. An extensive literature review has been conducted to achieve the objectives of the study. It emerges that various machine and deep learning approaches have been proposed; unfortunately, most lose efficacy under overcast or poor lighting conditions, as most are trained on idealized datasets. Moreover, a taxonomy of sky region detection challenges is proposed, categorizing the identified challenges into edge-based, color-based, texture-based, and deep learning-based methods, among others. The challenging datasets utilized for robust sky detection methods are also presented.
Underwater image processing has received tremendous attention in the past few years. The reason for the increased research in this area is that acquiring images underwater is very difficult: images obtained underwater frequently suffer from quality deterioration such as poor contrast, blurred features, colour variations, non-uniform lighting, suspended dust particles, noise near the sea bottom, the optical properties of the water medium, and so on. Improving underwater images is therefore a critical problem in image processing and computer vision for a variety of practical applications. To address this problem, methods are needed to increase image quality at capture time; however, since the capture process itself is the same underwater as on land, mechanisms to improve the quality of already captured images are also required. A complete and in-depth study of relevant accomplishments and developments, particularly a survey of underwater image methods and datasets, which are a critical issue in underwater image processing and intelligent applications, is still lacking. In this paper, we first review more than 85 articles on the most recent advancements in underwater image restoration methods, underwater image enhancement methods, and underwater image enhancement using deep learning and machine learning, along with their techniques, datasets, and evaluation criteria. To provide a thorough grasp of underwater image restoration and enhancement, including deep learning and machine learning approaches, we explore the strengths and limits of existing techniques. Additionally, we offer thorough, unbiased reviews and evaluations of representative methodologies for five distinct types of underwater situations, which vary in their usefulness under different underwater circumstances. Two main evaluations, subjective image quality evaluation and objective image quality evaluation ...
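As a concrete example of the classical enhancement techniques this kind of survey covers, the sketch below applies gray-world white balance followed by CLAHE on the luminance channel, a common baseline for the colour casts and poor contrast of underwater images; it is a generic baseline for illustration, not a method proposed in the review.

```python
# Generic classical enhancement baseline (illustrative, not from the survey):
# gray-world white balance + CLAHE on the L channel of Lab.
import cv2
import numpy as np

def enhance_underwater(bgr: np.ndarray) -> np.ndarray:
    # Gray-world white balance: scale each channel toward the global mean.
    means = bgr.reshape(-1, 3).mean(axis=0)
    balanced = np.clip(bgr * (means.mean() / means), 0, 255).astype(np.uint8)

    # CLAHE on the luminance channel to boost local contrast.
    lab = cv2.cvtColor(balanced, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return cv2.cvtColor(cv2.merge([clahe.apply(l), a, b]), cv2.COLOR_LAB2BGR)

# Usage: enhanced = enhance_underwater(cv2.imread("frame.png"))
```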