Search Results - Inner Mongolia University Library

Diversified image style transfer-approaches, new methods and directed variability control

Machine Vision and Applications, 2025, vol. 36, no. 2, pp. 1-12

Authors: Ustyuzhanin, Alexander; Kitov, Victor; Kitov, Vladimir. Affiliations: Yandex Technol LLC, Artificial Intelligence & Res Dept, 16 Lva Tolstogo St, Moscow 119021, Russia; Plekhanov Russian Univ Econ, Artificial Intelligence Lab, 36 Stremyanny Per, Moscow 117997, Russia; Lomonosov Moscow State Univ, Fac Computat Math & Cybernet, GSP-1, 2nd Acad Bldg, Moscow 119991, Russia

The task of image style transfer is to automatically redraw an input image in the style of another image, such as an artist's painting. The disadvantage of conventional stylization algorithms is that they produce a single result: a user who is not satisfied with the way the style was transferred has no option to redo the stylization. The paper provides an overview of existing style transfer methods that generate diverse results on each run and proposes two new methods. The first method enables diversity by concatenating a random vector into the inner image representation inside the neural network and by reweighting image features accordingly in the loss function. The second method allows diverse stylizations by passing the stylized image through orthogonal transformations, which affect the way the target style is transferred. These blocks are trained to replicate patterns from additional pattern images, which serve as additional input and give the end user an interpretable way to control stylization variability. Qualitative and quantitative comparisons demonstrate that both methods are capable of generating different stylizations, with higher variability achieved by the second method. The code for both methods is available on GitHub.

Keywords: Artistic rendering; Diversity; image processing; image generation; Neural networks
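
The first method in the abstract above concatenates a random vector into the network's inner image representation. A minimal sketch of that one step follows. The abstract does not say how the vector is injected, so the Gaussian sampling, the broadcasting of each random value over the spatial grid, and the name `concat_noise` are all assumptions; the loss reweighting is not shown.

```python
import random

def concat_noise(features, noise_dim, rng):
    """Append noise_dim constant channels to a C x H x W feature map.

    Assumption: each entry of the random vector z is broadcast over the
    whole H x W grid and stacked as an extra channel.
    """
    height = len(features[0])
    width = len(features[0][0])
    z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    noise_channels = [[[value] * width for _ in range(height)] for value in z]
    return features + noise_channels, z

rng = random.Random(0)
features = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]  # toy map: C=2, H=3, W=4
augmented, z = concat_noise(features, noise_dim=5, rng=rng)
print(len(augmented), len(augmented[0]), len(augmented[0][0]))  # channels, height, width
```

Drawing a fresh `z` on each run is what would let the same input produce a different stylization each time.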

Artificial intelligence in soil science

European Journal of Soil Science, 2025, vol. 76, no. 2

Authors: Wadoux, Alexandre M. J.-C. Affiliations: Univ Montpellier, Inst Agro, LISAH, INRAE, IRD, AgroParisTech, Montpellier, France

Few would disagree that artificial intelligence (AI) holds potential for advancing knowledge and innovation. Over the past decades, substantial research has been devoted to the development and application of AI in soil science. While most of today's AI applications in soil science are related to machine learning (ML), AI also encompasses other fields such as digital image analysis, natural language processing (NLP), expert systems, and knowledge representation. This review aims to provide a comprehensive overview of AI in soil science. A definition of AI that equates intelligence with rationality is provided, followed by a typical classification of AI into the three main domains of sensing and interacting, reasoning and decision-making, and learning and predicting. From this framework, a taxonomy of AI in soil research is derived and serves as a basis for a literature review. The major findings are as follows: i) AI in soil science is diverse, with applications in decision support systems, image classification, prediction with ML and expert systems; ii) AI in soil science is currently almost exclusively characterized by ML; iii) applications of ML are predominantly found in the field of digital soil mapping and for the development of pedotransfer functions; and iv) most AI applications are used for prediction purposes. A few notable exceptions stand apart from mainstream applications, particularly in the realms of NLP, the development of soil cognitive models, and interpretable ML. Based on these findings, I discuss attention points, such as using AI almost exclusively for prediction at the expense of explanation and the lack of integration of soil knowledge in algorithmic AI solutions. I envision that future developments could include the use of AI for text recognition of legacy soil profile data, providing a new source of soil information. Another promising line of research is the language processing of soil texts to build meta-analyses that summarize the growing bod

Keywords: decision-making; deep learning; expert systems; information processing; machine intelligence; machine learning

VQA and Visual Reasoning: An overview of approaches, datasets, and future direction

Neurocomputing, 2025, vol. 622

Authors: Zakari, Rufai Yusuf; Owusu, Jim Wilson; Qin, Ke; Wang, Hailin; Lawal, Zaharaddeen Karami; He, Tao. Affiliations: Univ Elect Sci & Technol China, Xiyuan Ave, Chengdu 611731, Peoples R China; Univ Brunei Darussalam, Jalan Tungku Link, BE-1410 Bandar Seri Begawan, Brunei

Visual question answering (VQA) is a problem of interest to researchers in both computer vision and natural language processing. In VQA, a system is given an image and a natural-language question about that image, and is expected to answer in natural language. To find the right answer, a VQA algorithm may need common sense and external knowledge to interpret the information in the image. In this paper, we discuss some of the main ideas behind VQA systems and provide a comprehensive literature survey of the current state of the art in VQA and visual reasoning from four perspectives: problem definition and challenges, approaches, existing datasets, and evaluation metrics. We conclude our survey with a discussion and some potential future research directions in this area to generate new ideas and creative approaches to solving current problems and developing new applications.

Keywords: machine learning; Deep learning; VQA; Natural language processing; Reasoning; Computer vision

SwinSight: a hierarchical vision transformer using shifted windows to leverage aerial image classification

Multimedia Tools and Applications, 2024, vol. 83, no. 39, pp. 86457-86478

Authors: Pradhan, Praveen Kumar; Das, Alloy; Kumar, Amish; Baruah, Udayan; Sen, Biswaraj; Ghosal, Palash. Affiliations: Department of Information Technology, Sikkim Manipal Institute of Technology, Sikkim Manipal University, East Sikkim, Sikkim, Majitar 737136, India; Centre for Computers and Communication Technology, Sikkim, Chisopani, South Sikkim 737136, India; CVPR Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, West Bengal, Kolkata 700108, India; Birangana Sati Sadhani Rajyik Vishwavidyalaya, Assam, Golaghat 785621, India

In aerial image classification, integrating advanced vision transformers with optimal preprocessing techniques is pivotal for enhancing model performance. This study presents SwinSight, a novel hierarchical vision transformer optimized for aerial image classification, which effectively addresses the computational challenges typically associated with transformers through a shifted window mechanism. The core of the research focuses on enhancing model performance by integrating a systematic preprocessing approach using Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), and Fast Fourier Transform (FFT). An extensive ablation study evaluates six permutations of these techniques, aiming to identify the most effective sequence for preprocessing. Results indicate that the sequence of DCT, followed by DWT, then FFT, significantly excels, achieving a high classification accuracy of 93.16% and maintaining a rapid inference time of 0.0049 seconds per frame. This sequence’s superior performance highlights the critical role of preprocessing order in optimizing feature extraction, thereby boosting the efficacy of the classification process. SwinSight’s advancements not only set a new benchmark for aerial image analysis but also offer broader implications for enhancing image processing workflows in various applications, contributing to theoretical insights and practical improvements in image-based machine learning tasks. This paper not only offers a practical solution for aerial image classification for diverse applications such as agriculture, environmental monitoring, land use applications, security, and beyond but also presents a novel SAIOD (Sikkim Aerial images dataset for Object Detection) to the computer vision research community, fostering added advancements. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.

Keywords: Discrete wavelet transforms
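
The ablation described in the abstract above compares the six orderings of DCT, DWT, and FFT. The sketch below enumerates those orderings and chains the three transforms on a toy 1-D signal. It rests on several assumptions: the paper works on images, whereas this uses 1-D pure-Python transforms; the DWT is a one-level Haar transform; the FFT stage is a naive DFT reduced to magnitudes so the output stays real; and the names `STAGES` and `run_pipeline` are hypothetical. How the paper actually chains the stages is not stated in the abstract.

```python
import cmath
import itertools
import math

def dct(x):
    """Unnormalized DCT-II of a 1-D sequence."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
            for k in range(n)]

def haar_dwt(x):
    """One-level Haar DWT (even-length input): approximation then detail coefficients."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx + detail

def fft_magnitude(x):
    """Magnitudes of a naive O(n^2) DFT, kept real so stages can be chained."""
    n = len(x)
    return [abs(sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)))
            for k in range(n)]

STAGES = {"DCT": dct, "DWT": haar_dwt, "FFT": fft_magnitude}

def run_pipeline(order, signal):
    """Apply the named stages to the signal in the given order."""
    for name in order:
        signal = STAGES[name](signal)
    return signal

orderings = list(itertools.permutations(STAGES))  # all 3! = 6 orderings
signal = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]
outputs = {order: run_pipeline(order, signal) for order in orderings}
for order in orderings:
    print(" -> ".join(order))
```

In an ablation of this kind, each entry of `outputs` would be fed to the classifier and the orderings compared on accuracy and inference time.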

Metric-based pill recognition with the help of textual and visual cues

IET Image Processing, 2024, vol. 18, no. 14, pp. 4623-4638

Authors: Radli, Richard; Voroshazi, Zsolt; Czuni, Laszlo. Affiliations: Univ Pannonia, Image Proc Res Lab, H-8200 Veszprem, Hungary

Pill image recognition by machine vision can reduce the risk of taking the wrong medications, a severe healthcare problem. Automated dispensing machines and home applications both need reliable image processing techniques to cope with changing viewing conditions, a large number of classes, and the similarity in pill appearance. The problem is attacked with a multi-stream, two-phase metric embedding neural model. To enhance the metric learning procedure, dynamic margin setting is introduced into the loss function. Moreover, it is shown that besides the visual features of drug samples, even free text of drug leaflets (processed with a natural language model) can be used to set the value of the margin in the triplet loss and thus increase the recognition accuracy in testing. Thus, besides using the conventional metric learning approach, the given discriminating features can be explicitly injected into the metric model using the NLP of the free text of pill leaflets or descriptors of images of selected pills. The performance on two datasets is analysed and a 1.6% (two-sided) and 2.89% (one-sided) increase in Top-1 accuracy on the CURE dataset is reported compared to existing best results. The inference time on CPU and GPU makes the proposed model suitable for different kinds of applications in medical pill verification; moreover, the approach applies to other areas of object recognition where few-shot problems arise. The proposed high-level feature injection method (into a low-level metric learning model) can also be exploited in other cases, where class features can be well described with textual or visual cues.

Keywords: image processing; neural net architecture; object recognition
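
The abstract above sets the margin of a triplet loss dynamically from textual or visual descriptors. A minimal sketch of that idea follows, assuming a standard triplet loss, max(0, d(a, p) - d(a, n) + m). The margin rule in `dynamic_margin` (a base value plus a term that grows with the distance between the two classes' text embeddings) and its `base` and `scale` parameters are hypothetical; the abstract does not give the actual rule.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def dynamic_margin(text_emb_pos, text_emb_neg, base=0.2, scale=0.5):
    """Hypothetical rule: the margin grows with the distance between
    the text embeddings of the positive and negative classes."""
    return base + scale * euclidean(text_emb_pos, text_emb_neg)

def triplet_loss(anchor, positive, negative, margin):
    """Standard triplet loss with an externally supplied margin."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy image embeddings and toy leaflet-text embeddings.
anchor, positive, negative = (0.0, 0.0), (0.0, 1.0), (0.0, 2.0)
margin = dynamic_margin((0.0, 0.0), (0.0, 2.0))
loss = triplet_loss(anchor, positive, negative, margin)
print(margin, loss)
```

With a fixed margin every class pair is pushed apart by the same amount; a per-pair margin lets side information decide which pairs need more separation.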

On the impact of learning-based image compression on computer vision tasks

Conference on Applications of Digital Image Processing XLVII

Authors: Akamatsu, Shunsuke; Testolina, Michela; Upenik, Evgeniy; Ebrahimi, Touradj. Affiliations: Waseda Univ, Adv Multimedia Syst Lab, Shillman Hall 4013-14-9 Okubo, Shinjuku Ku, Tokyo 1690072, Japan; Ecole Polytech Fed Lausanne EPFL, Multimedia Signal Proc Grp MMSPG, CH-1015 Lausanne, Switzerland

ISBN: (print) 9781510679344; 9781510679351

The image compression field is witnessing a shift in paradigm thanks to the rise of neural network-based models. In this context, the JPEG committee is in the process of standardizing the first learning-based image compression standard, known as JPEG AI. While most of the research to date has focused on image coding for humans, JPEG AI plans to address both human and machine vision by presenting several non-normative decoders addressing multiple image processing and computer vision tasks in addition to standard reconstruction. While the impact of conventional image compression on computer vision tasks has already been addressed, no study has been conducted to assess the impact of learning-based image compression on such tasks. In this paper, the impact of learning-based image compression, including JPEG AI, on computer vision tasks is reviewed and discussed, mainly focusing on the image classification task along with object detection and segmentation. This study reviews the impact of image compression with JPEG AI on various computer vision models. It shows the superiority of JPEG AI over other conventional and learning-based compression models.

Keywords: JPEG AI; learning-based image compression; computer vision; image classification; object detection

Probabilistic 3D motion model for object tracking in aerial applications

IET Image Processing, 2024, vol. 18, no. 8, pp. 2011-2027

Authors: Mirtajadini, Seyed Hojat; Atashgah, MohammadAli Amiri; Shahbazi, Mohammad. Affiliations: Univ Tehran, Fac New Sci & Technol, Tehran, Iran; Iran Univ Sci & Technol, Sch Mech Engn, Tehran, Iran

Visual object tracking, crucial in aerial applications such as surveillance, cinematography, and chasing, faces challenges despite AI advancements. Current solutions lack full reliability, leading to common tracking failures in the presence of fast motions or long-term occlusions of the subject. To tackle this issue, a 3D motion model is proposed that employs camera/vehicle states to locate a subject in the inertial coordinates. Next, a probability distribution is generated over future trajectories, which are sampled using a Monte Carlo technique to provide search regions that are fed into an online appearance learning process. This 3D motion model incorporates machine-learning approaches for direct range estimation from monocular images. The model adapts computationally by adjusting search areas based on tracking confidence. It is integrated into DiMP, an online and deep learning-based appearance model. The resulting tracker is evaluated on the VIOT dataset with sequences of both images and camera states, achieving a 68.9% tracking precision compared to DiMP's 49.7%. This approach demonstrates increased tracking duration and improved recovery after occlusions and fast motions. Additionally, this strategy outperforms random searches by about 3.0%. Air-to-ground visual object tracking has several applications, including surveillance, cinematography, and chasing. Briefly, camera states and a vision-based range estimation are added to the tracking method to locate the target in inertial coordinates and introduce a probability distribution to predict the future positions of the target. The results of adding this motion model to the DiMP tracker demonstrate a 19.2-percentage-point improvement in tracking precision.

Keywords: computer vision; motion estimation; object tracking
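
The abstract above samples a probability distribution over future trajectories with a Monte Carlo technique to obtain search regions. The sketch below illustrates that sampling step under assumptions not taken from the paper: a constant-velocity motion model in 3-D inertial coordinates, zero-mean Gaussian acceleration noise, and the hypothetical name `sample_search_centers`. Projection of the sampled positions into the image and the confidence-based adaptation of the search area are not shown.

```python
import random

def sample_search_centers(position, velocity, dt, accel_sigma, n_samples, rng):
    """Draw future 3-D positions under a constant-velocity model
    perturbed by Gaussian acceleration noise (assumed model)."""
    centers = []
    for _ in range(n_samples):
        accel = [rng.gauss(0.0, accel_sigma) for _ in range(3)]
        centers.append(tuple(p + v * dt + 0.5 * a * dt * dt
                             for p, v, a in zip(position, velocity, accel)))
    return centers

rng = random.Random(42)
centers = sample_search_centers(position=(0.0, 0.0, 10.0),
                                velocity=(1.0, 2.0, 0.0),
                                dt=0.5, accel_sigma=2.0,
                                n_samples=100, rng=rng)
print(len(centers), centers[0])
```

Each sampled position would serve as the center of one candidate search region for the appearance model.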

A Taxonomy of Sky Region Detection: Challenges and Solutions

Data Intelligence, 2024, vol. 6, no. 4, pp. 1057-1085

Authors: Athar I. Alboqomi; Rehan Ullah Khan. Affiliations: Department of Information Technology, College of Computer, Qassim University, Buraydah, Saudi Arabia

Detection of sky regions is one of the most crucial challenges in image processing and computer vision applications, including scene parsing, picture retrieval, weather forecasting, and robot navigation. However, it is challenging to detect sky regions under certain circumstances, particularly in gloomy and overcast conditions. This study aims to summarize sky region detection approaches, challenges, and applications together. Additionally, classical and deep learning-based approaches have been delineated. An extensive literature review has been conducted to achieve the objectives of the study. It has emerged that various machine and deep learning approaches have been proposed. Unfortunately, most of the approaches lose efficacy when encountering overcast or changing lighting conditions, as most of them are trained on an ideal dataset. Moreover, a taxonomy of sky region detection challenges has been proposed, categorizing the identified challenges into edge-based, color-based, texture-based, deep learning-based methods, etc. The challenging datasets that are being utilized for robust sky detection methods have been presented.

Keywords: Computer vision; Semantic segmentation; Sky detection; image processing

Spectral-Spatial Anomaly Detection in Hyperspectral Imagery Based on Dual-Domain Autoencoders

13th Iranian/3rd International Machine Vision and Image Processing Conference (MVIP)

Authors: Aghili, Mohamad Ebrahim; Ghassemian, Hassan; Arani, Maryam Imani. Affiliations: Tarbiat Modares Univ, Fac Elect & Comp Engn, Image Proc & Informat Anal Lab, Tehran, Iran

ISBN: (print) 9798350350494; 9798350350500

Hyperspectral anomaly detection is crucial for applications like aerial surveillance in remote sensing images. However, robust identification of anomalous pixels remains challenging. A novel spectral-spatial anomaly detection technique called Dual-Domain Autoencoders (DDA) is proposed to address these challenges. First, Nonnegative Matrix Factorization (NMF) is applied to decompose the hyperspectral data into anomaly and background components. The designation is then refined using intersection masking. Next, a spectral autoencoder is trained on identified background signature pixels and used to reconstruct the image. The reconstruction error highlights spectral anomalies. Furthermore, a spatial autoencoder is trained on principal component patches from likely background areas. The fused reconstruction error from the spectral and spatial autoencoders is finally used to give enhanced anomaly detection. Experiments demonstrate higher AUC for DDA over individual autoencoders and benchmark methods. The integration of matrix factorization and dual-domain, fused autoencoders thus provides superior anomaly identification. Spatial modeling further constrains the background, enabling accurate flagging of unusual local hyperspectral patterns. This study demonstrates the effectiveness of employing autoencoders trained on intelligently sampled hyperspectral pixel signatures and spatial features for improved spectral-spatial anomaly detection.

Keywords: hyperspectral image; anomaly detection; autoencoder; matrix factorization; spectral-spatial features
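
The abstract above fuses the reconstruction errors of a spectral and a spatial autoencoder into one anomaly score. The sketch below shows only that fusion step, taking the per-pixel errors as given. The fusion rule (min-max normalization of each error map followed by a weighted average), the `weight` parameter, and the function names are assumptions; the abstract does not state how the errors are combined.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def fuse_errors(spectral_err, spatial_err, weight=0.5):
    """Weighted average of the two normalized error maps (assumed rule)."""
    s = min_max(spectral_err)
    p = min_max(spatial_err)
    return [weight * a + (1.0 - weight) * b for a, b in zip(s, p)]

# Toy per-pixel reconstruction errors for four pixels.
spectral = [0.1, 0.2, 0.9, 0.15]
spatial = [1.0, 1.2, 4.0, 1.1]
scores = fuse_errors(spectral, spatial)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
print(scores, most_anomalous)
```

Normalizing before averaging keeps one error map from dominating the score simply because its values are on a larger scale.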

A systematic review of the methodologies for the processing and enhancement of the underwater images

Multimedia Tools and Applications, 2023, vol. 82, no. 25, pp. 38371-38396

Authors: Singh, Nishant; Bhat, Aruna. Affiliations: Delhi Technol Univ, Dept Comp Sci & Engn, Delhi 110042, India

Underwater image processing has received tremendous attention in the past few years. The reason for increased research in this area is that the process of taking images underwater is very difficult. Images obtained underwater frequently suffer from quality deterioration issues such as poor contrast, blurring features, colour variations, non-uniform lighting, the presence of dust particles, noise at the bottom of the sea, different properties of the water medium, and so on. The improvement of underwater images is a critical problem in image processing and computer vision for a variety of practical applications. To address this problem, we need to find some other methods to increase the quality of the image while capturing it underwater. But capturing the image in normal circumstances as well as underwater is the same, so once we get an image, some mechanism to increase the quality of the captured image will also be required. A complete and in-depth study of relevant accomplishments and developments, particularly the survey of underwater image methods and datasets, which are a critical issue in underwater image processing and intelligent application, is still lacking. In this paper, we first provide a review of more than 85 articles on the most recent advancements in underwater image restoration methods, underwater image enhancement methods, and underwater image enhancement using deep learning and machine learning methods, along with the techniques, data sets, and evaluation criteria. To provide a thorough grasp of underwater image restoration, enhancement, and enhancement using deep learning and machine learning, we explore the strengths and limits of existing techniques. Additionally, we offer thorough, unbiased reviews and evaluations of the representative methodologies for five distinct types of underwater situations, which vary their usefulness in various underwater circumstances. Two main evaluations, subjective image quality evaluation and objective image quali

Keywords: Underwater image enhancement; Underwater image restoration; Deep learning; image quality evaluation; image dehazing; image datasets
