Machine learning is the state of the art for many recurring tasks in several heterogeneous domains. In the last decade, it has also been widely used in Precision Agriculture (PA) and Wild Flora Monitoring (WFM) to address a set of problems with a big impact on the economy, society, and academia, heralding a paradigm shift across industry and research. Many applications in those fields involve image processing and computer vision stages. Remote sensing devices are a very popular choice for image acquisition in this context, and in particular, Unmanned Aerial Vehicles (UAVs) offer a good trade-off between cost and area coverage. For these reasons, the research literature is rich in works that address problems in the Precision Agriculture and Wild Flora Monitoring domains with machine learning and computer vision methods applied to UAV imagery. In this work, we review this literature, with a special focus on algorithms, model sizing, dataset characteristics, and innovative technical solutions presented in many domain-specific models, providing the reader with an overview of research trends in recent years.
ISBN: (Print) 9798350349405; 9798350349399
User-generated content (UGC) is ubiquitous across the internet as a result of billions of videos and images being uploaded each day. All kinds of UGC media are affected by natural distortions, occurring both during and after capture, which are inherently diverse and commingled. These distortions have different perceptual effects based on the media content. Given recent dramatic increases in the consumption of short-form content, the analysis and control of their perceptual quality has become an important problem. Regardless of the content, many UGC videos have overlaid and embedded texts in them, which are visually salient. Hence text quality has a significant impact on the global perception of video or image quality and needs to be studied. One of the most important factors in perceptual text quality in user-generated media is legibility, which has been studied very little in the context of computer vision. Predicting text legibility can also help in text recognition applications such as image search or document identification. This work aims at modeling text legibility using computer vision techniques and thus studying the relationship between text quality and legibility. We propose a modified dataset variant of COCO-Text [1] and a model for predicting text legibility for both handwritten and machine-generated texts. We also demonstrate how models trained to predict text legibility can help in the prediction of text (perceptual) quality. The dataset and models can be accessed here https://***/research/Quality/***.
ISBN: (Print) 9798400706882
In-sensor computing has revolutionized modern vision-based applications, particularly in scenarios like autonomous vehicles and robotics where real-time or near-real-time processing is crucial. By enabling data processing at the sensor level, in-sensor computing eliminates the need to transmit data to cloud servers, significantly reducing latency and enhancing decision-making speed. Central to the in-sensor computing paradigm, CMOS image sensors (CISs) with edge computing play a pivotal role in machine vision applications. The need for high resolution, low power, and real-time operation aligns seamlessly with the demands of modern vision-based applications. In this paper, we propose a novel approach for real-time image edge detection with an in-sensor, ADC-less sensing solution that achieves high energy efficiency and speed. The design utilizes the column-parallel architecture of existing CIS and the row-wise pixel readout scheme. Column voltages of three consecutive rows with a delay arrangement extract 4-bit edge pixels without deriving the actual digital image pixels. A time-to-digital conversion (TDC) technique using a 4-bit counter eliminates the requirement for a power-hungry ADC. A 256(H) × 256(V) 2D CMOS pixel array with 10 µm pixel pitch is simulated using Spectre in TSMC 65 nm low-power technology. CMOS pixels with wide dynamic range (WDR) capture light intensity variation up to 92 dB [10]. Simulation results show energy consumption of 2 pW per pixel per frame, operating at a frame rate of 3.9 kfps, all well contained within a modest 0.5 mW power budget. The resultant frame rate is notably superior in terms of speed, accompanied by a more than tenfold reduction in power consumption per edge frame-pixel compared to the existing prior art.
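The row-difference scheme the abstract describes can be modeled behaviorally in software. The sketch below is a minimal numerical model, not the actual analog circuit: the helper name `edge_4bit` and the full-scale normalization are assumptions for illustration, standing in for the paper's delay arrangement and 4-bit TDC counter that quantize the vertical gradient between the first and last of three consecutive row readouts.

```python
import numpy as np

def edge_4bit(rows, full_scale=1.0):
    """Behavioral model of ADC-less edge extraction: quantize the
    vertical gradient between the first and last of three consecutive
    row voltages into a 4-bit edge code (hypothetical software stand-in
    for the paper's TDC counter, not the circuit itself)."""
    r0, _r1, r2 = rows
    grad = np.abs(r2 - r0)                    # vertical difference signal
    codes = np.clip((grad / full_scale) * 15, 0, 15)
    return codes.astype(np.uint8)             # 4-bit edge pixels

# Toy example: a step edge between row 0 and row 2.
r0 = np.zeros(8)
r2 = np.full(8, 0.5)
print(edge_4bit([r0, np.full(8, 0.25), r2]))  # prints [7 7 7 7 7 7 7 7]
```

A flat region (identical rows) would yield code 0, so only transitions between rows produce nonzero edge codes, mirroring the idea of extracting edges without digitizing the full image.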
Classifying microplanktons in digital holographic images is challenging due to a multitude of factors. For instance, shifts in viewpoint can alter how microplanktons are perceived, while illumination changes can affect the visibility of certain features. Geometric anomalies can distort the shapes of these microplanktons, and the presence of noise within the digital holographic microscope can further alter local image features. Additionally, the difficulty of data collection results in dataset imbalance, leading to a biased classification problem. These class-imbalanced datasets pose a considerable hurdle in machine learning applications: categorical representations tend to favor majority classes while neglecting minority classes that are equally important for a comprehensive understanding of microplankton diversity. Accordingly, this research contributes what we believe to be a novel debiasing method using channel attention blocks (DCABs) and a novel attention product. It enhances the model's ability to focus on relevant features while mitigating the effects of bias. The method was applied to six biased models (VGG16, ResNet50V2, ResNet152V2, InceptionV3, Xception, and ShuffleNetV2) as well as ShincNet. The proposed method achieved a significant reduction in the degree of bias (DoB) and KL divergence (KL) values for all six biased models. With just 6.68M parameters and 6.4 GFLOPs, the DCAB for ShincNet demonstrated competitive performance in terms of DoB (0.125) and KL (0.82) compared to four state-of-the-art debiasing techniques. (c) 2025 Optica Publishing Group. All rights, including for text and data mining (TDM), Artificial Intelligence (AI) training, and similar technologies, are reserved.
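The abstract does not publish the DCAB or the attention product itself, but the general channel attention mechanism it builds on can be sketched. Below is a generic squeeze-and-excitation-style channel attention in NumPy, assuming a (H, W, C) feature map and randomly initialized bottleneck weights purely for illustration; it is not the authors' block.

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Generic squeeze-and-excitation-style channel attention (a sketch;
    the paper's DCAB and attention product are not reproduced here).
    feat: (H, W, C) feature map; w1, w2: bottleneck weight matrices."""
    squeeze = feat.mean(axis=(0, 1))               # global average pool -> (C,)
    hidden = np.maximum(0, squeeze @ w1)           # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2)))    # sigmoid weight per channel
    return feat * gate                             # reweight channels

rng = np.random.default_rng(0)
f = rng.standard_normal((4, 4, 8))                 # toy feature map
w1 = rng.standard_normal((8, 2))                   # C -> C/4 squeeze
w2 = rng.standard_normal((2, 8))                   # C/4 -> C excite
out = channel_attention(f, w1, w2)
```

Each channel is scaled by a learned weight in (0, 1); during debiasing, such gates let the network up-weight features informative for minority classes rather than those dominated by majority-class statistics.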
In recent years, improved GAN models have been widely applied in the field of machine vision, covering not only traditional image processing but also image conversion, image synthesis, and more. First, this paper describes the basic principles and open problems of GANs, then introduces several improved GAN models, including Info-GAN, DC-GAN, f-GAN, Cat-GAN, and others. Second, several improved GAN models for different applications in the field of machine vision are described. Finally, future trends in the development of GANs are discussed.
It is widely accepted that human expressions, accounting for roughly sixty percent of all daily interactions, are among the most authentic forms of communication. Numerous studies have explored the importance of facial expressions and the development of machine-assisted recognition techniques. Significant progress has been made in facial and expression recognition, largely due to the rapid growth of machine learning and computer vision. A variety of algorithmic approaches and methods exist for detecting and recognizing facial expressions and features. This study investigates various optimization algorithms used with convolutional neural networks for facial expression recognition. The primary focus is on the Adam, RMSProp, stochastic gradient descent, and AdaMax optimizers. A comprehensive comparison examines the key aspects of each optimizer, including its advantages and disadvantages. Furthermore, the study incorporates findings from recent work that used these optimizers in various applications, highlighting their performance in terms of training time and precision. The aim is to illuminate the process of selecting a suitable optimizer for a specific application, analysing the trade-offs between training speed and higher accuracy. The study concludes with a discussion of the technical challenges posed by these optimizers and of future improvements for achieving more optimal results.
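The update rules being compared can be written in a few lines. The sketch below implements the standard SGD and Adam updates (RMSProp and AdaMax follow the same pattern) on a toy quadratic objective in NumPy; the function names and the toy problem are illustrative and are not the paper's CNN training setup.

```python
import numpy as np

def sgd_step(x, g, lr=0.1):
    """Plain stochastic gradient descent update."""
    return x - lr * g

def adam_step(x, g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update (Kingma & Ba); state carries (m, v, t)."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * g            # first-moment estimate
    v = b2 * v + (1 - b2) * g * g        # second-moment estimate
    m_hat = m / (1 - b1 ** t)            # bias correction
    v_hat = v / (1 - b2 ** t)
    return x - lr * m_hat / (np.sqrt(v_hat) + eps), (m, v, t)

# Minimize f(x) = x^2 (gradient 2x) with both optimizers.
x_sgd = x_adam = 5.0
state = (0.0, 0.0, 0)
for _ in range(100):
    x_sgd = sgd_step(x_sgd, 2 * x_sgd)
    x_adam, state = adam_step(x_adam, 2 * x_adam, state)
```

On this toy problem SGD converges geometrically, while Adam takes near-constant steps bounded by the learning rate; the trade-offs the study analyses (training speed versus final accuracy) arise from exactly these differing step-size dynamics.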
We present a new data generation method to facilitate automatic machine interpretation of 2D engineering part drawings. While such drawings are a common medium for clients to encode design and manufacturing requirements, a lack of computer support for automatically interpreting them forces part manufacturers to resort to laborious manual interpretation, which, in turn, severely limits processing capacity. Although recent advances in trainable computer vision methods may enable automatic machine interpretation, it remains challenging to apply such methods to engineering drawings due to a lack of labeled training data. As one step toward this challenge, we propose a constrained data synthesis method to generate an arbitrarily large set of synthetic training drawings using only a handful of labeled examples. Our method is based on the randomization of the dimension sets subject to two major constraints to ensure the validity of the synthetic drawings. The effectiveness of our method is demonstrated in the context of a binary component segmentation task with a proposed list of descriptors. An evaluation of several image segmentation methods trained on our synthetic dataset shows that our approach to new data generation can boost the segmentation accuracy and the generalizability of the machine learning models to unseen drawings.
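The core idea of randomizing dimension sets under validity constraints can be sketched concretely. The snippet below is a simplified stand-in, not the paper's method: the function name `randomize_dims`, the scale range, and the positive-minimum clamp are assumed constraints chosen to illustrate how each sampled dimension set stays geometrically plausible.

```python
import random

def randomize_dims(base_dims, scale_range=(0.8, 1.2), min_val=1.0):
    """Perturb a drawing's dimension set under simple validity
    constraints (an illustrative stand-in for the paper's two
    constraints): each dimension is scaled within scale_range and
    clamped to a positive minimum."""
    lo, hi = scale_range
    return {name: max(min_val, value * random.uniform(lo, hi))
            for name, value in base_dims.items()}

# Generate one synthetic variant from a hypothetical labeled example.
random.seed(42)
synthetic = randomize_dims({"length": 120.0, "bore_dia": 20.0})
```

Repeating the call with fresh random draws yields an arbitrarily large family of valid drawing variants from a single labeled example, which is the leverage the abstract claims for training segmentation models.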
This paper aims to explore an innovative method combining computer vision and machine learning to accurately identify and analyze various movements in badminton. This paper first summarizes the application prospect of...
Convolutional neural networks (CNNs) have significantly contributed to recent advances in machine learning and computer vision. Although initially designed for image classification, the application of CNNs has stretched far beyond the context of images alone. Some exciting applications, e.g., in natural language processing and image segmentation, implement one-dimensional CNNs, often after a pre-processing step that transforms higher-dimensional input into a suitable data format for the networks. However, local correlations within data can diminish or vanish when one converts higher-dimensional data into a one-dimensional string. The Hilbert space-filling curve can minimize this loss of locality. Here, we study this claim rigorously by comparing an analytical model that quantifies locality preservation with the performance of several neural networks trained with and without Hilbert mappings. We find that Hilbert mappings offer a consistent advantage over the traditional flatten transformation in test accuracy and training speed. The results also depend on the chosen kernel size, agreeing with our analytical model. Our findings quantify the importance of locality preservation when transforming data before training a one-dimensional CNN and show that the Hilbert space-filling curve is a preferential transformation to achieve this goal.
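The Hilbert mapping at the heart of this comparison can be sketched directly. The code below is a standard distance-to-coordinate conversion for the Hilbert curve (not the authors' implementation), used to flatten a 2^k × 2^k image along the curve so that pixels adjacent in 2D tend to remain adjacent in the resulting 1D string.

```python
def hilbert_d2xy(order, d):
    """Map distance d along the Hilbert curve to (x, y) on a
    2**order x 2**order grid (standard iterative algorithm)."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                 # rotate quadrant to keep continuity
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_flatten(img):
    """Flatten a 2^k x 2^k image (list of rows) along the Hilbert
    curve, preserving 2D locality better than row-major flattening."""
    n = len(img)
    order = n.bit_length() - 1
    return [img[y][x] for x, y in (hilbert_d2xy(order, d) for d in range(n * n))]
```

Because every consecutive pair of curve positions differs by exactly one grid step, a 1D convolution kernel sliding over `hilbert_flatten(img)` mostly sees pixels that were spatial neighbors, which is the locality-preservation property the paper quantifies.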
The significance of high-speed machine vision in scientific and technological fields is growing, especially in the era of Industry 4.0 technologies. Several pattern-matching algorithms have intriguing applications in ultralow-latency machine vision processing. However, the low frame rate of image sensors, which usually operate at tens of hertz, fundamentally limits the processing rate. This paper conceptualizes and develops a computerized pattern recognition technique that can be applied to investigate light beam profiles and extract the desired information for the purposes of this case study. In the present work, automatic detection and inspection of laser spots were designed to perform analysis and alignment of the laser beam relative to the electron beam spot using the LabVIEW graphical programming environment, especially when the laser and electron beams overlap. This is one of the important steps toward realizing the fundamental aim of the test FEL: producing short wavelengths with the second, third, and fifth harmonics at 131.5, 88, and 53 nm, respectively. The tentative version of the program achieved its elementary purpose, fulfilling accurate transversal alignment of the ultrashort laser pulses with the electron beam in the FEL test facility at MAX-Lab, in addition to studying the beam's stability and jitter range. Copyright (C) 2024 The Authors.
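Automatic spot detection of the kind described above commonly reduces to locating a beam's intensity-weighted centroid. The sketch below is a generic NumPy version of that step, not the paper's LabVIEW pipeline; the function name `spot_centroid` and the relative-threshold choice are illustrative assumptions.

```python
import numpy as np

def spot_centroid(frame, thresh_frac=0.5):
    """Locate a beam spot as the intensity-weighted centroid of pixels
    above a relative threshold (a generic sketch of automatic spot
    detection; the paper's LabVIEW implementation is not reproduced)."""
    mask = frame >= thresh_frac * frame.max()   # keep the bright core
    ys, xs = np.nonzero(mask)
    w = frame[ys, xs]                           # intensities as weights
    return (xs * w).sum() / w.sum(), (ys * w).sum() / w.sum()

# Toy frame: a Gaussian spot centered at (x=12, y=8) on a 32x32 grid.
yy, xx = np.mgrid[0:32, 0:32]
frame = np.exp(-((xx - 12) ** 2 + (yy - 8) ** 2) / 8.0)
cx, cy = spot_centroid(frame)
```

Running the same centroid on both the laser and electron beam images gives two coordinate pairs whose difference is the transversal misalignment to correct; tracking the centroid frame to frame likewise gives the jitter range the abstract mentions.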