Search Results - Inner Mongolia University Library

Exploring optimizer efficiency for facial expression recognition with convolutional neural networks

JOURNAL OF ENGINEERING-JOE 2025, Vol. 2025, No. 1

Authors: Madni, Syed Hamid Hussain; Pathmanatan, Lokessh A. L.; Faheem, Muhammad; Shahzad, Hafiz Muhammad Faisal; Shah, Sajid. Affiliations: Univ Southampton Malaysia, Sch Elect & Comp Sci, Johor Baharu, Malaysia; Univ Teknol Malaysia, Fac Comp, Skudai, Johor, Malaysia; Univ Vaasa, Sch Technol & Innovat, Vaasa, Finland; VTT Tech Res Ctr Finland Ltd, Espoo, Finland; Univ Sargodha, Dept Comp Sci, Sargodha, Pakistan

It's widely accepted that human expressions, accounting for roughly sixty percent of all daily interactions, are among the most authentic forms of communication. Numerous studies are being conducted to explore the importance of facial expressions and the development of machine-assisted recognition techniques. Significant progress is being made in facial and expression recognition, largely due to the rapid growth of machine learning and computer vision. A variety of algorithmic approaches and methods exist for detecting and recognizing facial expressions and features. This study investigates various optimization algorithms used with convolutional neural networks for facial expression recognition. The primary focus is on the Adam, RMSProp, stochastic gradient descent and AdaMax optimizers. A comprehensive comparison is made, examining the key aspects of each optimizer, including its advantages and disadvantages. Furthermore, the study incorporates findings from recent studies that used these optimizers in various applications, highlighting their performance in terms of training time and precision. The aim is to illuminate the process of selecting a suitable optimizer for specific applications, analysing the trade-offs between training speed and higher accuracy levels. Moreover, this study provides a deeper analysis of the role optimizers play in machine learning-based facial expression recognition models. The study concludes with a discussion of the technical challenges posed by these optimizers and of future improvements for achieving better results.

Keywords: image classification; image processing; image recognition; optimisation; convolution neural network; machine learning
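
The four optimizers named in the abstract above differ mainly in how they scale each gradient step. The following is a minimal sketch of their standard textbook update rules applied to a toy one-dimensional loss. The loss function, learning rates, and step counts are illustrative assumptions and are not taken from the paper.

```python
import math

def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2 (an assumed example, minimum at w = 3)
    return 2.0 * (w - 3.0)

def sgd(w, steps=200, lr=0.1):
    # Plain gradient descent: a fixed step along the negative gradient
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def rmsprop(w, steps=200, lr=0.05, rho=0.9, eps=1e-8):
    # RMSProp: divide the step by a running RMS of recent gradients
    v = 0.0
    for _ in range(steps):
        g = grad(w)
        v = rho * v + (1 - rho) * g * g
        w -= lr * g / (math.sqrt(v) + eps)
    return w

def adam(w, steps=200, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: momentum (first moment) plus RMS scaling (second moment), bias-corrected
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

def adamax(w, steps=200, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # AdaMax: Adam variant that scales by an exponentially weighted infinity norm
    m = u = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g
        u = max(b2 * u, abs(g))
        w -= (lr / (1 - b1 ** t)) * m / (u + eps)
    return w

if __name__ == "__main__":
    for name, opt in [("SGD", sgd), ("RMSProp", rmsprop), ("Adam", adam), ("AdaMax", adamax)]:
        print(name, opt(0.0))
```

On a real CNN the same rules are applied per parameter, and the speed versus accuracy trade-off the abstract mentions depends on the data and hyperparameters, so this toy run should not be read as a ranking.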

Convolutional Neural Network Approach for Early Skin Cancer Detection

JOURNAL OF ELECTRICAL SYSTEMS 2023, Vol. 19, No. 3, pp. 1-14

Authors: Raut, Roshani; Gavali, Niraj; Amate, Prathamesh; Amode, Mihir Ajay; Malunjkar, Shraddha; Borkar, Pradnya. Affiliations: Pimpri Chinchwad Coll Engn, Dept Informat Technol, Pune, Maharashtra, India; Symbiosis Inst Technol, Nagpur, Maharashtra, India

The field of medical image processing is rapidly adopting artificial intelligence. Its use is required for many applications in the healthcare industry. Thanks to machine learning, an area within AI, a machine can learn from experience without explicit programming. Deep learning, a kind of machine learning, infers critical features for image processing via multiple layer processing and mathematical operations based on artificial neural networks. In the field of healthcare, which encompasses medicine and dentistry, artificial intelligence has several *** melanoma skin cancer identification is necessary for effective therapy. Melanoma, among the various types of skin cancer, has recently gained international recognition as the most deadly one since it is much more likely to spread to other body regions if not detected and treated quickly. Clinical diagnosis of various ailments is increasingly using non-invasive medical computer vision or medical image processing. These methods offer an automatic image processing tool that makes it possible to examine the lesion quickly and precisely. The procedures used in this study included building a database of dermoscopy images, preprocessing, segmenting using thresholding, extracting statistical features using asymmetry, border, colour, diameter, etc., and choosing features based on the total dermoscopy score, principal component analysis (PCA), and convolutional neural network (CNN) classification. According to the findings, a classification accuracy of 90.1% was attained.

Keywords: AlexNet; Benign; Convolutional Neural Network; Data pre-processing; Feature extraction; Image Augmentation; Malignant; Maxpooling; Training; Testing; Resnet; VGG

A Data Augmentation Method for Data-Driven Component Segmentation of Engineering Drawings

JOURNAL OF COMPUTING AND INFORMATION SCIENCE IN ENGINEERING 2024, Vol. 24, No. 1, p. 011001

Authors: Zhang, Wentai; Joseph, Joe; Chen, Quan; Koz, Can; Xie, Liuyue; Regmi, Amit; Yamakawa, Soji; Furuhata, Tomotake; Shimada, Kenji; Kara, Levent Burak. Affiliations: Carnegie Mellon Univ, Dept Mech Engn, Pittsburgh, PA 15213, USA

We present a new data generation method to facilitate an automatic machine interpretation of 2D engineering part drawings. While such drawings are a common medium for clients to encode design and manufacturing requirements, a lack of computer support to automatically interpret these drawings necessitates part manufacturers to resort to laborious manual approaches for interpretation which, in turn, severely limits processing capacity. Although recent advances in trainable computer vision methods may enable automatic machine interpretation, it remains challenging to apply such methods to engineering drawings due to a lack of labeled training data. As one step toward this challenge, we propose a constrained data synthesis method to generate an arbitrarily large set of synthetic training drawings using only a handful of labeled examples. Our method is based on the randomization of the dimension sets subject to two major constraints to ensure the validity of the synthetic drawings. The effectiveness of our method is demonstrated in the context of a binary component segmentation task with a proposed list of descriptors. An evaluation of several image segmentation methods trained on our synthetic dataset shows that our approach to new data generation can boost the segmentation accuracy and the generalizability of the machine learning models to unseen drawings.

Keywords: computational synthesis; computer aided design; data-driven engineering; machine learning for engineering applications

Application of Computer 3D Image Vision Algorithm in Intelligent Image Recognition System

2023 5th International Conference on Artificial Intelligence and Computer Applications, ICAICA 2023

Authors: Li, Yuan; Yu, Xin. Affiliations: Modern Finance Industry School, Shandong Institute of Commerce and Technology, Shandong, Jinan, China

ISBN: (print) 9798350323313

In this paper, the 3D space imaging model of machine vision is constructed. Starting from the traditional machine vision image processing algorithm flow, the image denoising process and target tracking process are optimized. The method uses the camera to collect the image and video information of the measured object, and transmits it to the controller. The controller corrects the signal obtained by the wireless sensor in the database to reproduce the position of the measured object and the 3D image. A real-time tracking method of motion trajectory based on computer vision is presented, covering autonomous object capture, 3D positioning and motion trajectory tracking. Simulation experiments show that this method is quite different from conventional image processing methods. This method has the advantages of small computation, fast running speed and good real-time performance. It meets the needs of embedded image processing. © 2023 IEEE.

Keywords: image recognition

Research on Badminton Movement Machine Learning Model Based on Computer Vision Technology

2nd IEEE International Conference on Image Processing and Computer Applications, ICIPCA 2024

Authors: Zong, Cheng. Affiliations: Hohhot Vocational College, Hohhot, China

ISBN: (print) 9798350360240

This paper aims to explore an innovative method combining computer vision and machine learning to accurately identify and analyze various movements in badminton. This paper first summarizes the application prospect of computer vision in the field of sports analysis, and introduces its specific application scenarios in badminton in detail. By constructing a complete technical framework of image preprocessing module, feature extraction algorithm and deep learning model, the complex movements of badminton players such as swing, stroke and moving pace are captured and analyzed. In the research process, we used multi-view image fusion and key point detection technology to accurately extract action features in badminton, combined with convolutional neural network (CNN), recurrent neural network (RNN), long short-term memory network (LSTM) and other deep learning models to efficiently learn and model these features. Thus, the automatic classification and recognition of badminton movement can be realized. The experimental results show that the model has significant accuracy in badminton action recognition, good generalization ability and practicability, and can be effectively applied in the badminton teaching and training process of athlete performance evaluation, competition data analysis and other aspects. This research result not only expands the practical application of computer vision technology in the field of badminton, but also provides new ideas and tools for further promoting the development of sports intelligence and digitalization. © 2024 IEEE.

Keywords: Convolutional neural networks

Improving image classification of one-dimensional convolutional neural networks using Hilbert space-filling curves

APPLIED INTELLIGENCE 2023, Vol. 53, No. 22, pp. 26655-26671

Authors: Verbruggen, Bert; Ginis, Vincent. Affiliations: Vrije Univ Brussel, Data Analyt Lab, Pl Laan 2, B-1050 Brussels, Belgium; Harvard Univ, Sch Engn & Appl Sci, 9 Oxford St, Cambridge, MA 02138, USA

Convolutional neural networks (CNNs) have significantly contributed to recent advances in machine learning and computer vision. Although initially designed for image classification, the application of CNNs has stretched far beyond the context of images alone. Some exciting applications, e.g., in natural language processing and image segmentation, implement one-dimensional CNNs, often after a pre-processing step that transforms higher-dimensional input into a suitable data format for the networks. However, local correlations within data can diminish or vanish when one converts higher-dimensional data into a one-dimensional string. The Hilbert space-filling curve can minimize this loss of locality. Here, we study this claim rigorously by comparing an analytical model that quantifies locality preservation with the performance of several neural networks trained with and without Hilbert mappings. We find that Hilbert mappings offer a consistent advantage over the traditional flatten transformation in test accuracy and training speed. The results also depend on the chosen kernel size, agreeing with our analytical model. Our findings quantify the importance of locality preservation when transforming data before training a one-dimensional CNN and show that the Hilbert space-filling curve is a preferential transformation to achieve this goal.

Keywords: image classification; image transformation; machine learning; Convolutional neural network; Supervised learning; Data preprocessing
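
To make the contrast between a Hilbert mapping and a plain flatten concrete, here is a minimal sketch using the well-known index-to-coordinate conversion for the Hilbert curve. The function names `d2xy` and `hilbert_flatten` and the square power-of-two image size are assumptions for illustration, not the authors' code.

```python
def d2xy(n, d):
    """Map index d along the Hilbert curve to (x, y) on an n x n grid.

    n is assumed to be a power of two and 0 <= d < n * n.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the current quadrant so sub-curves connect end to end
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_flatten(image):
    """Flatten a square image (list of rows) in Hilbert-curve order."""
    n = len(image)
    out = []
    for d in range(n * n):
        x, y = d2xy(n, d)
        out.append(image[y][x])
    return out

def row_major_flatten(image):
    """Ordinary flatten: concatenate the rows one after another."""
    return [pixel for row in image for pixel in row]

if __name__ == "__main__":
    img = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(row_major_flatten(img))
    print(hilbert_flatten(img))
```

Consecutive entries of the Hilbert ordering are always neighbouring pixels, whereas a row-major flatten jumps across the whole image width at the end of every row. That is the locality property the abstract refers to.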

Is Grad-CAM Explainable in Medical Images?

8th International Conference on Computer Vision and Image Processing (CVIP)

Authors: Suara, Subhashis; Jha, Aayush; Sinha, Pratik; Sekh, Arif Ahmed. Affiliations: XIM Univ, Bhubaneswar, India

ISBN: (digital) 9783031581816

ISBN: (print) 9783031581809; 9783031581816

Explainable Deep Learning has gained significant attention in the field of artificial intelligence (AI), particularly in domains such as medical imaging, where accurate and interpretable machine learning models are crucial for effective diagnosis and treatment planning. Grad-CAM is a baseline that highlights the most critical regions of an image used in a deep learning model's decision-making process, increasing interpretability and trust in the results. It is applied in many computer vision (CV) tasks such as classification and explanation. This study explores the principles of Explainable Deep Learning and its relevance to medical imaging, discusses various explainability techniques and their limitations, and examines medical imaging applications of Grad-CAM. The findings highlight the potential of Explainable Deep Learning and Grad-CAM in improving the accuracy and interpretability of deep learning models in medical imaging. The code is available in (https://***/ beasthunter758/GradEML).

Keywords: Explainable Deep Learning; Gradient-weighted Class Activation Mapping (Grad-CAM); Medical Image Analysis

Model Pruning for Infrared-Visible Image Fusion

2nd International Conference on Machine Vision, Image Processing and Imaging Technology, MVIPIT 2024

Authors: Chen, Qi; Feng, Rui. Affiliations: State Grid Beijing Electric Power Company, Beijing, China

ISBN: (print) 9798331543037

Infrared-visible image fusion combines complementary information from both modalities, enhancing scene perception in applications such as surveillance and autonomous driving. However, existing deep learning-based methods are often computationally expensive. Our approach involves training an over-parameterized fusion network, applying structured pruning to reduce model complexity, and fine-tuning the pruned model to maintain performance. The pruning process leverages the L2-norm of the Restormer blocks, ensuring that less critical components are removed while preserving essential fusion quality. Experiments on the benchmark datasets demonstrate that our approach achieves high fusion quality with significantly reduced computational costs. Ablation studies further validate the effectiveness of our pruning strategy. ©2024 IEEE.

Keywords: image fusion
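
The abstract above describes ranking components by L2-norm and removing the less critical ones. A minimal sketch of that selection step follows, assuming each block's weights are available as a flat list of numbers. The block names, the `keep_ratio` parameter, and the helper functions are hypothetical, and the paper's actual criterion for Restormer blocks may differ in detail.

```python
import math

def l2_norm(weights):
    # Euclidean norm of a flat list of weights
    return math.sqrt(sum(w * w for w in weights))

def select_blocks_to_keep(blocks, keep_ratio):
    """Return the names of the blocks with the largest L2-norm.

    blocks: dict mapping a (hypothetical) block name to a flat list of weights.
    keep_ratio: fraction of blocks to retain, between 0 and 1.
    The original block order is preserved in the result.
    """
    n_keep = max(1, int(len(blocks) * keep_ratio))
    ranked = sorted(blocks, key=lambda name: l2_norm(blocks[name]), reverse=True)
    kept = set(ranked[:n_keep])
    return [name for name in blocks if name in kept]

if __name__ == "__main__":
    toy_blocks = {
        "block_a": [3.0, 4.0],
        "block_b": [0.1, 0.1],
        "block_c": [1.0, 1.0],
        "block_d": [0.0, 2.0],
    }
    print(select_blocks_to_keep(toy_blocks, keep_ratio=0.5))
```

In the pipeline the abstract outlines, a selection like this would sit between training the over-parameterized network and fine-tuning the pruned one.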

Instruct Me More! Random Prompting for Visual In-Context Learning

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Authors: Zhang, Jiahao; Wang, Bowen; Li, Liangzhi; Nakashima, Yuta; Nagahara, Hajime. Affiliations: Osaka Univ, Suita, Osaka, Japan

ISBN: (print) 9798350318920; 9798350318937

Large-scale models trained on extensive datasets have emerged as the preferred approach due to their high generalizability across various tasks. In-context learning (ICL), a popular strategy in natural language processing, uses such models for different tasks by providing instructive prompts but without updating model parameters. This idea is now being explored in computer vision, where an input-output image pair (called an in-context pair) is supplied to the model with a query image as a prompt to exemplify the desired output. The efficacy of visual ICL often depends on the quality of the prompts. We thus introduce a method coined Instruct Me More (InMeMo), which augments in-context pairs with a learnable perturbation (prompt), to explore its potential. Our experiments on mainstream tasks reveal that InMeMo surpasses the current state-of-the-art performance. Specifically, compared to the baseline without learnable prompt, InMeMo boosts mIoU scores by 7.35 and 15.13 for foreground segmentation and single object detection tasks, respectively. Our findings suggest that InMeMo offers a versatile and efficient way to enhance the performance of visual ICL with lightweight training. Code is available at https://***/Jackieam/InMeMo.

Keywords: Algorithms Algorithms and algorithms formulations image recognition and understanding machine learning architectures
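
The mIoU figures quoted in the abstract above are averages of intersection-over-union scores. A minimal sketch of that metric for binary masks is given below. The nested-list mask format and the convention of scoring two empty masks as 1.0 are assumptions, and the abstract does not state the paper's exact evaluation protocol.

```python
def iou(pred, target):
    """Intersection over union of two binary masks given as lists of rows."""
    inter = 0
    union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks count as a perfect match
    return inter / union if union else 1.0

def mean_iou(preds, targets):
    """Average the per-image IoU over a set of predictions."""
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    pred_a = [[1, 1], [0, 0]]
    true_a = [[1, 0], [0, 0]]
    pred_b = [[0, 1], [0, 1]]
    true_b = [[0, 1], [0, 1]]
    print(mean_iou([pred_a, pred_b], [true_a, true_b]))
```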

Towards 360° image compression for machines via modulating pixel significance

Multimedia Tools and Applications 2024, Vol. 83, No. 42, pp. 90271-90288

Authors: Zheng, Silin; Shen, Xuelin; Zhang, Qiudan; Chen, Zhuo; Yang, Wenhan; Wang, Xu. Affiliations: College of Computer Science and Software Engineering, Shenzhen University, Guangdong, Shenzhen51800, China; Guangdong, Shenzhen51800, China; Peng Cheng Laboratory, Guangdong, Shenzhen51800, China

The rapid growth of computer vision-based applications, including smart cities and autonomous driving, has created a pressing demand for efficient 360° image compression and computer vision analytics. In most circumstances, 360° image compression and computer vision face challenges arising from the oversampling inherent in the Equirectangular Projection (ERP). However, these two fields often employ divergent technological approaches. While image compression aims to reduce redundancy, computer vision analytics attempts to compensate for the semantic distortion caused by the projection process, resulting in a potential conflict between the two objectives. This paper explores a potential route, i.e., 360° Image Coding for Machine (360-ICM), which offers an image processing framework that addresses both object deformation and oversampling redundancy within a unified framework. The key innovation lies in inferring a pixel-wise significance map by jointly considering the requirements of redundancy removal and object deformation offsetting. The significance map is subsequently fed to a deformation-aware image compression network, guiding the bit allocation process as an external condition. More specifically, we employ a deformation-aware image compression network that is characterized by the Spatial Feature Transform (SFT) layer, which is capable of performing complex affine transformations of high-level semantic features and is essential in dealing with the deformation. The image compression network and significance inference network are jointly trained under the supervision of a 360° image-specified object detection network, obtaining a compact representation that is both analytics-oriented and deformation-aware. Extensive experimental results have demonstrated the superiority of the proposed method over existing state-of-the-art image codecs in terms of rate-analytics performance. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part
