In this study, an ad hoc image-processing pipeline has been developed and proposed for semantically segmenting wheat kernel data acquired through near-infrared hyperspectral imaging (HSI). The Gaussian Mixture Model (GMM), a soft clustering method, has been employed for this task, yielding noteworthy results in both kernel and germ segmentation. A comparative analysis was conducted in which GMM was compared against two hard clustering methods, hierarchical clustering and k-means, as well as other clustering algorithms common in food HSI applications. Notably, GMM exhibited the highest accuracy, with a Jaccard index of 0.745, surpassing hierarchical clustering at 0.698 and k-means at 0.652. Furthermore, the spectral variations observed across the wheat kernel topology can be exploited for semantic image segmentation, especially for selecting the germ portion within the wheat kernels. These findings carry practical significance for professionals in hyperspectral imaging and machine vision, particularly for food product quality assessment and real-time inspection.
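A minimal sketch of this kind of pixel-wise GMM clustering with Jaccard scoring is given below, assuming a hyperspectral cube and a binary ground-truth kernel mask; the toy data, the two-component setting, and the scikit-learn calls are illustrative rather than the authors' exact pipeline.

```python
# Minimal sketch of GMM-based pixel clustering for HSI segmentation (assumed
# setup, not the paper's exact pipeline): one spectrum per pixel, soft
# clustering, then Jaccard index against a ground-truth mask.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import jaccard_score

rows, cols, bands = 64, 64, 200          # toy dimensions
rng = np.random.default_rng(0)
cube = rng.random((rows, cols, bands))   # stand-in for a NIR-HSI cube
gt_mask = np.zeros((rows, cols), dtype=int)
gt_mask[16:48, 16:48] = 1                # stand-in kernel mask

X = cube.reshape(-1, bands)              # one spectrum per pixel

# Soft clustering: each pixel gets a posterior over mixture components.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
labels = gmm.fit_predict(X).reshape(rows, cols)

# Align cluster ids with the mask before scoring (clustering is label-agnostic).
if jaccard_score(gt_mask.ravel(), labels.ravel()) < \
   jaccard_score(gt_mask.ravel(), (1 - labels).ravel()):
    labels = 1 - labels

print("Jaccard index:", jaccard_score(gt_mask.ravel(), labels.ravel()))
```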
The task of image style transfer is to automatically redraw an input image in the style of another image, such as an artist's painting. The disadvantage of conventional stylization algorithms is the uniqueness of the result: if users are not satisfied with the way the style was transferred, they have no option to redo the stylization. The paper provides an overview of existing style transfer methods that generate diverse results after each run and proposes two new methods. The first method enables diversity by concatenating a random vector into the inner image representation inside the neural network and by reweighting image features accordingly in the loss function. The second method allows diverse stylizations by passing the stylized image through orthogonal transformations, which affect the way the target style is transferred. These blocks are trained to replicate patterns from additional pattern images, which serve as extra input and provide an interpretable way for the end user to control stylization variability. Qualitative and quantitative comparisons demonstrate that both methods are capable of generating different stylizations, with higher variability achieved by the second method. The code of both methods is available on GitHub.
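The first idea can be illustrated with a short PyTorch sketch: a random vector is concatenated to an intermediate feature map so that each forward pass can produce a different stylization. The layer sizes and the placement of the noise are assumptions, and the accompanying feature reweighting in the loss is not shown.

```python
# Illustrative sketch (assumed architecture details): concatenate a per-image
# random vector to an intermediate feature map so repeated runs differ.
import torch
import torch.nn as nn

class NoisyBottleneck(nn.Module):
    def __init__(self, channels: int, noise_dim: int = 16):
        super().__init__()
        self.noise_dim = noise_dim
        # Fuse image features and the broadcast noise back to `channels`.
        self.fuse = nn.Conv2d(channels + noise_dim, channels, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        b, _, h, w = feats.shape
        # One random vector per image, broadcast over spatial positions.
        z = torch.randn(b, self.noise_dim, 1, 1, device=feats.device)
        z = z.expand(b, self.noise_dim, h, w)
        return self.fuse(torch.cat([feats, z], dim=1))

feats = torch.randn(2, 256, 32, 32)      # stand-in encoder features
out = NoisyBottleneck(256)(feats)        # different result on each call
print(out.shape)                         # torch.Size([2, 256, 32, 32])
```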
Visual question answering (VQA) is a problem that researchers in both computer vision and natural language processing are interested in studying. In VQA, a system is given an image and a question in natural language about that image, and it is expected to answer in natural language. To find the right answer, a VQA algorithm may need to apply common-sense reasoning to the information in the image together with external knowledge. In this paper, we discuss some of the main ideas behind VQA systems and provide a comprehensive literature survey of the current state of the art in VQA and visual reasoning from four perspectives: problem definition and challenges, approaches, existing datasets, and evaluation metrics. We conclude our survey with a discussion and some potential future research directions in this area to generate new ideas and creative approaches to solving current problems and developing new applications.
In aerial image classification, integrating advanced vision transformers with optimal preprocessing techniques is pivotal for enhancing model performance. This study presents SwinSight, a novel hierarchical vision tra...
Pill image recognition by machine vision can reduce the risk of taking the wrong medication, a severe healthcare problem. Automated dispensing machines and home applications both need reliable image-processing techniques to cope with changing viewing conditions, a large number of classes, and the similarity in pill appearance. The problem is addressed with a multi-stream, two-phase metric embedding neural model. To enhance the metric learning procedure, dynamic margin setting is introduced into the loss function. Moreover, it is shown that besides the visual features of drug samples, even the free text of drug leaflets (processed with a natural language model) can be used to set the value of the margin in the triplet loss and thus increase recognition accuracy at test time. Thus, besides using the conventional metric learning approach, the given discriminating features can be explicitly injected into the metric model using the NLP of the free text of pill leaflets or descriptors of images of selected pills. The performance on two datasets is analysed, and a 1.6% (two-sided) and 2.89% (one-sided) increase in Top-1 accuracy on the CURE dataset is reported compared with the existing best results. The inference time on CPU and GPU makes the proposed model suitable for different kinds of applications in medical pill verification; moreover, the approach applies to other areas of object recognition where few-shot problems arise. The proposed high-level feature injection method (into a low-level metric learning model) can also be exploited in other cases where class features can be well described with textual or visual cues.
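The dynamic-margin idea can be sketched as a triplet loss whose margin is modulated by the textual dissimilarity between the anchor's and the negative's leaflet embeddings; the margin formula and the use of cosine distance below are assumptions for illustration, not the paper's exact loss.

```python
# Hedged sketch of a triplet loss with a per-triplet dynamic margin driven by
# leaflet-text dissimilarity (assumed formulation).
import torch
import torch.nn.functional as F

def dynamic_margin_triplet_loss(anchor, positive, negative,
                                anchor_text, negative_text,
                                base_margin=0.2, scale=0.3):
    # Text embeddings would come from an NLP model over drug leaflets.
    text_dist = 1.0 - F.cosine_similarity(anchor_text, negative_text, dim=-1)
    margin = base_margin + scale * text_dist          # per-triplet margin

    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

# Toy usage with random tensors standing in for image and text embeddings.
a, p, n = (torch.randn(8, 128) for _ in range(3))
at, nt = torch.randn(8, 64), torch.randn(8, 64)
print(dynamic_margin_triplet_loss(a, p, n, at, nt).item())
```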
Neutrosophic sets (NS) have been referred to as interval fuzzy sets, applied to minimize uncertainty and fuzziness in the computer-vision and machine-learning communities, and hence employed in several applications. As fa...
ISBN (Print): 9781510679344; 9781510679351
The image compression field is witnessing a paradigm shift thanks to the rise of neural network-based models. In this context, the JPEG committee is in the process of standardizing the first learning-based image compression standard, known as JPEG AI. While most research to date has focused on image coding for humans, JPEG AI plans to address both human and machine vision by presenting several non-normative decoders addressing multiple image-processing and computer vision tasks in addition to standard reconstruction. While the impact of conventional image compression on computer vision tasks has already been addressed, no study has been conducted to assess the impact of learning-based image compression on such tasks. In this paper, the impact of learning-based image compression, including JPEG AI, on computer vision tasks is reviewed and discussed, mainly focusing on image classification along with object detection and segmentation. The study examines the impact of compression with JPEG AI across various computer vision models and shows the superiority of JPEG AI over other conventional and learning-based compression models.
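The kind of assessment discussed here can be sketched as a simple loop: run a pretrained classifier on images after a compression/decompression round trip and track top-1 accuracy per rate point. In the sketch below, standard JPEG stands in for a learning-based codec such as JPEG AI, whose reference software is not assumed; the model and preprocessing choices are illustrative.

```python
# Hedged sketch: measure how a pretrained classifier's top-1 prediction is
# affected by a lossy compression round trip (JPEG used as a stand-in codec).
import io
import torch
from PIL import Image
from torchvision import models, transforms

model = models.resnet50(weights="IMAGENET1K_V2").eval()
preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def top1_after_compression(img: Image.Image, quality: int) -> int:
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)   # codec under test
    decoded = Image.open(io.BytesIO(buf.getvalue())).convert("RGB")
    with torch.no_grad():
        logits = model(preprocess(decoded).unsqueeze(0))
    return int(logits.argmax(dim=1))

# Accuracy per rate point is then the fraction of images whose predicted
# class after compression matches the ground-truth label.
```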
Visual object tracking, crucial in aerial applications such as surveillance, cinematography, and chasing, faces challenges despite AI advancements. Current solutions lack full reliability, leading to common tracking failures in the presence of fast motions or long-term occlusions of the subject. To tackle this issue, a 3D motion model is proposed that employs camera/vehicle states to locate a subject in inertial coordinates. Next, a probability distribution is generated over future trajectories, which are sampled using a Monte Carlo technique to provide search regions that are fed into an online appearance learning process. This 3D motion model incorporates machine-learning approaches for direct range estimation from monocular images. The model adapts computationally by adjusting search areas based on tracking confidence. It is integrated into DiMP, an online, deep learning-based appearance model. The resulting tracker is evaluated on the VIOT dataset, with sequences of both images and camera states, achieving 68.9% tracking precision compared with DiMP's 49.7%. The approach yields longer tracking durations, improved recovery after occlusions, and robustness to faster motions. Additionally, this strategy outperforms random searches by about 3.0%. Air-to-ground visual object tracking has several applications, including surveillance, cinematography, and chasing. Briefly, camera states and vision-based range estimation are added to the tracking method to locate the target in inertial coordinates and introduce a probability distribution for predicting the target's future positions. Adding this motion model to the DiMP tracker yields a 19.2% improvement in tracking precision.
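A minimal sketch of the motion-model idea: propagate the target's inertial position, sample future positions Monte Carlo-style, and project them through the camera to obtain a pixel-space search region. The constant-velocity model, noise level, and pinhole projection below are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch: Monte Carlo sampling of future target positions in inertial
# coordinates, projected into the image to bound the tracker's search region.
import numpy as np

def monte_carlo_search_region(pos, vel, dt, K, R, t, n_samples=500, sigma=0.5):
    rng = np.random.default_rng(0)
    # Propagate with a constant-velocity model plus Gaussian process noise.
    future = pos + vel * dt + rng.normal(0.0, sigma, size=(n_samples, 3))
    # Project inertial points into the image with a pinhole camera model.
    cam = R @ future.T + t.reshape(3, 1)            # camera-frame points
    uv = (K @ cam)[:2] / (K @ cam)[2]               # perspective division
    x0, y0 = uv.min(axis=1)
    x1, y1 = uv.max(axis=1)
    return x0, y0, x1, y1                           # search box in pixels

K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1.0]])
box = monte_carlo_search_region(np.array([0.0, 0.0, 20.0]),
                                np.array([1.0, 0.0, 0.0]), dt=0.5,
                                K=K, R=np.eye(3), t=np.zeros(3))
print(box)
```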
Detection of sky regions is one of the most crucial challenges in image processing and computer vision applications, including scene parsing, picture retrieval, weather forecasting, and robot navigation. However, it is difficult to detect sky regions under certain circumstances, particularly in gloomy and overcast conditions. This study aims to summarize sky region detection approaches, challenges, and applications together; additionally, classical and deep learning-based approaches are delineated. An extensive literature review has been conducted to achieve the objectives of the study. It emerges that various machine and deep learning approaches have been proposed; unfortunately, most lose efficacy under overcast or poor lighting conditions, as most are trained on idealized datasets. Moreover, a taxonomy of sky region detection challenges is proposed, categorizing the identified challenges into edge-based, color-based, texture-based, and deep learning-based methods, among others. The challenging datasets utilized for robust sky detection methods are also presented.
Underwater image processing has received tremendous attention in the past few years. The reason for the increased research in this area is that acquiring images underwater is very difficult: images obtained underwater frequently suffer from quality deterioration such as poor contrast, blurred features, colour variations, non-uniform lighting, suspended dust particles, noise near the sea bottom, the optical properties of the water medium, and so on. Improving underwater images is therefore a critical problem in image processing and computer vision for a variety of practical applications. To address this problem, methods are needed to increase image quality at capture time; however, since the capture process itself is the same underwater as on land, mechanisms to improve the quality of already captured images are also required. A complete and in-depth study of relevant accomplishments and developments, particularly a survey of underwater image methods and datasets, which are a critical issue in underwater image processing and intelligent applications, is still lacking. In this paper, we first review more than 85 articles on the most recent advancements in underwater image restoration methods, underwater image enhancement methods, and underwater image enhancement using deep learning and machine learning, along with their techniques, datasets, and evaluation criteria. To provide a thorough grasp of underwater image restoration and enhancement, including deep learning and machine learning approaches, we explore the strengths and limits of existing techniques. Additionally, we offer thorough, unbiased reviews and evaluations of representative methodologies for five distinct types of underwater situations, which vary in their usefulness under different underwater circumstances. Two main evaluations, subjective image quality evaluation and objective image quality evaluation ...
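As a concrete example of the classical enhancement techniques this kind of survey covers, the sketch below applies gray-world white balance followed by CLAHE on the luminance channel, a common baseline for the colour casts and poor contrast of underwater images; it is a generic baseline for illustration, not a method proposed in the review.

```python
# Generic classical enhancement baseline (illustrative, not from the survey):
# gray-world white balance + CLAHE on the L channel of Lab.
import cv2
import numpy as np

def enhance_underwater(bgr: np.ndarray) -> np.ndarray:
    # Gray-world white balance: scale each channel toward the global mean.
    means = bgr.reshape(-1, 3).mean(axis=0)
    balanced = np.clip(bgr * (means.mean() / means), 0, 255).astype(np.uint8)

    # CLAHE on the luminance channel to boost local contrast.
    lab = cv2.cvtColor(balanced, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return cv2.cvtColor(cv2.merge([clahe.apply(l), a, b]), cv2.COLOR_LAB2BGR)

# Usage: enhanced = enhance_underwater(cv2.imread("frame.png"))
```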