Search Results - Inner Mongolia University Library

Steel Surface Defect Detection Using Learnable Memory Vision Transformer

Computers, Materials & Continua, 2025, Vol. 82, No. 1, pp. 499-520

Authors: Syed Tasnimul Karim Ayon; Farhan Md. Siraj; Jia Uddin. Affiliations: Department of Computer Science and Engineering, BRAC University, Dhaka 1212, Bangladesh; Department of AI and Big Data, Endicott College, Woosong University, Daejeon 34606, Republic of Korea

This study investigates the application of Learnable Memory Vision Transformers (LMViT) for detecting metal surface flaws, comparing their performance with traditional CNNs, specifically ResNet18 and ResNet50, as well as other transformer-based models including Token to Token ViT, ViT without memory, and Parallel *** a widely-used steel surface defect dataset, the research applies data augmentation and t-distributed stochastic neighbor embedding (t-SNE) to enhance feature extraction and *** techniques mitigated overfitting, stabilized training, and improved generalization *** LMViT model achieved a test accuracy of 97.22%, significantly outperforming ResNet18 (88.89%) and ResNet50 (88.90%), as well as the Token to Token ViT (88.46%), ViT without memory (87.18%), and Parallel ViT (91.03%). Furthermore, LMViT exhibited superior training and validation performance, attaining a validation accuracy of 98.2% compared to 91.0% for ResNet18, 96.0% for ResNet50, and 89.12%, 87.51%, and 91.21% for Token to Token ViT, ViT without memory, and Parallel ViT, *** findings highlight the LMViT’s ability to capture long-range dependencies in images, an area where CNNs struggle due to their reliance on local receptive fields and hierarchical feature *** additional transformer-based models also demonstrate improved performance in capturing complex features over CNNs, with LMViT excelling particularly at detecting subtle and complex defects, which is critical for maintaining product quality and operational efficiency in industrial *** instance, the LMViT model successfully identified fine scratches and minor surface irregularities that CNNs often *** study not only demonstrates LMViT’s potential for real-world defect detection but also underscores the promise of other transformer-based architectures like Token to Token ViT, ViT without memory, and Parallel ViT in industrial scenarios where complex spatial relationships are *** research m

Keywords: Learnable Memory Vision Transformer (LMViT); Convolutional Neural Networks (CNN); metal surface defect detection; deep learning; computer vision; image classification; learnable memory; gradient clipping; label smoothing; t-SNE visualization
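
The keywords above name label smoothing and gradient clipping among the training techniques. As a rough illustration only, the sketch below shows both under their standard textbook definitions in plain Python. The function names, the smoothing factor 0.1, and the clipping threshold 1.0 are assumptions made for this example and are not taken from the paper.

```python
import math


def smooth_labels(class_index, num_classes, epsilon=0.1):
    """Smoothed one-hot target: eps / K on every class, plus (1 - eps) on the true class."""
    base = epsilon / num_classes
    target = [base] * num_classes
    target[class_index] += 1.0 - epsilon
    return target


def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale a flat list of gradients so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


# Hypothetical usage: a 6-class target and a gradient vector of norm 5.
target = smooth_labels(class_index=2, num_classes=6)
clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
```

Smoothing keeps the target a valid probability distribution while discouraging over-confident predictions; clipping bounds the size of each update step, which is one common way to stabilize training.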

Enhanced Predictive Modeling Techniques for Early Detection of COPD Utilizing 1D Convolutional Neural Networks

15th International Conference on Computing Communication and Networking Technologies, ICCCNT 2024

Authors: Jha, Sneha; Sahu, Prakash; Kumar, Santosh. Affiliations: CSVTU, Dept. of Data Science, Chhattisgarh, India; IIIT-Naya, Computer Science & Engineering, Raipur, India

ISBN: (print) 9798350370249

Artificial intelligence (AI) breakthroughs have created new opportunities in the field of medical diagnostics, especially for the early identification of respiratory conditions like Chronic Obstructive Pulmonary Disease (COPD). COPD, the most prevalent respiratory condition, is characterized by breathing difficulties, mucus (sputum) coughing, and wheezing. Early detection is critical for effective management. A one-dimensional convolutional neural network (1D CNN) model for COPD detection, optimized using the Adam and RMSprop algorithms, is proposed for this purpose. Using the ICBHI 2017 dataset, the performance of each optimizer is evaluated based on training and validation accuracy. The Adam optimizer achieved a training accuracy of 94% and a validation accuracy of 90%. In comparison, the RMSprop optimizer yielded a training accuracy of 92% and a validation accuracy of 88%. These results demonstrate that the Adam optimizer surpassed RMSprop in terms of accuracy. They also showcase the potential of the proposed 1D CNN model as a reliable diagnostic tool, contributing to the advancement of respiratory health analytics. © 2024 IEEE.

Keywords: Convolutional neural networks
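
For readers unfamiliar with the building block behind a 1D CNN, the following minimal sketch shows a single one-dimensional convolution followed by a ReLU activation in plain Python. It is not the model from the paper: the signal values and the kernel are made up for illustration, and a real network would learn many kernels and stack several such layers.

```python
def conv1d_valid(signal, kernel):
    """Slide the kernel over the signal without padding (cross-correlation form)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


# Hypothetical 1D input (for example a short audio-like trace) and a difference kernel.
signal = [0.0, 1.0, 3.0, 2.0, 0.0, -1.0]
kernel = [1.0, 0.0, -1.0]
features = relu(conv1d_valid(signal, kernel))
```

With a kernel of length 3 and no padding, a signal of length 6 yields 4 output features.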

Defense Against Graph Injection Attack in Graph Neural Networks

9th IEEE International Conference on Data Science in Cyberspace, DSC 2024

Authors: Tong, Hanjin; Zhu, Tianqing; Guan, Faqian; Zhou, Wanlei. Affiliations: China University of Geosciences, School of Computer Science, China; University of Macau, Faculty of Data Science, City, China

ISBN: (print) 9798350391367

Graph neural networks (GNNs) have shown outstanding performance in graph node classification. However, as deep learning models, GNNs can be influenced by adversarial attacks, such as graph injection attacks or graph modification attacks, which modify edges or node features on the original graph. This paper focuses on Graph Injection Attacks (GIA), which add a small number of nodes and edges to the original graph to change the prediction results. GIA has stronger attack potential and can cause more damage to the homogeneity of graphs. To tackle this problem, this paper proposes a novel defense strategy. We observe that real graphs are normally sparse, so a link prediction model may be adopted to tell reliable edges from adversarial edges. By increasing the interaction information of edges and reducing the sensitivity of vulnerable nodes to adversarial edges, the proposed method can increase the prediction accuracy. Meanwhile, we design a homogeneous filter to help identify adversarial edges, reducing the interference of adversarial edges with the model. Experiments show that our method has better defense performance than other baseline defense methods. © 2024 IEEE.

Keywords: Adversarial machine learning
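
The abstract does not specify how its filtering step works. Purely as an illustration of the general idea of pruning suspicious edges, the sketch below drops edges whose endpoint features have low cosine similarity. This similarity rule is a swapped-in, commonly used heuristic and is an assumption, not the authors' method; the function names, the threshold 0.5, and the toy graph are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def filter_edges(edges, features, threshold=0.5):
    """Keep only edges whose endpoint features are at least `threshold` similar."""
    return [(u, v) for u, v in edges if cosine(features[u], features[v]) >= threshold]


# Hypothetical toy graph: node 3 plays the role of an injected node with atypical features.
features = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0], 3: [-1.0, 0.2]}
edges = [(0, 1), (1, 2), (0, 3)]
kept = filter_edges(edges, features)
```

A fixed threshold also removes legitimate edges between dissimilar nodes, such as (1, 2) here, so in practice the cut-off would need to be tuned against clean-graph accuracy.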

A Framework for Video Based Sign Language Interpretation Using Machine Learning and Statistical Methods

4th International Conference on Sentiment Analysis and Deep Learning, ICSADL 2025

Authors: Swathi, D.; Susmetha, K.; Sreeji, S.; Mangairkarasi, S. Affiliations: B.E Computer Science and Engineering with Data Science, Sathyabama Institute of Science and Technology, Semmancheri, Tamil Nadu, Chennai 600119, India; Sathyabama Institute of Science and Technology, Semmancheri, Department of Computer Science and Engineering, Tamil Nadu, Chennai 6001109, India

ISBN: (print) 9798331523923

The Sign Language Recognition System has been designed to capture video input, process it to detect hand gestures, and translate these gestures into readable text. The project consists of several key components and steps. Video Processing: Using OpenCV, the system captures frames from the video input. MediaPipe processes these frames to detect and track hand landmarks in real time. OpenCV's capabilities allow for efficient frame extraction and basic image processing tasks such as resizing and normalization. Hand Detection and Tracking: MediaPipe's pre-trained models identify and track hand movements within the video frames. Accurate detection and tracking of the hand movements are critical for the subsequent recognition of the sign language gestures. Sign Language Recognition: The core of the system is a deep learning model, trained using TensorFlow and Keras on a dataset of sign language gestures. The model learns to classify the detected hand movements into the corresponding sign language characters or words. Convolutional Neural Networks (CNNs) are typically used for this task due to their effectiveness in image recognition. Text Display: Once the system recognizes the signs, it converts them into text and displays the output. This can be done through console output or a graphical user interface (GUI) built with Tkinter. The GUI provides a user-friendly experience, allowing users to see the translated text in real time. © 2025 IEEE.

Keywords: Convolutional neural networks

From Data to Harvest through Machine Learning based Crop Yield Forecasting

2nd International Conference on Advancements in Smart, Secure and Intelligent Computing, ASSIC 2024

Authors: Budarapu, Aadithya Vikram; Gaddam, John Hyde; Boyedi, Ramu; Muthyala, Thoshan Kumer; Kathirisetty, Nikhila. Affiliations: Dept. of Information Technology, Hyderabad, India; Dept. of Computer Science and Data Science, Hyderabad, India

ISBN: (print) 9798350370188

The maintenance and enhancement of dynamic soil characteristics are the primary focus of soil management in agriculture to increase crop productivity. Higher productivity may result from efficient soil control of resources and corrective micronutrient treatments. Using CNN and KNN algorithms, a "soil land classification and crop prediction system" application was created. In this study, two datasets are used: one for crop prediction and the other for soil land categorization. The soil types are learned using a CNN (VGG-19), and the accuracy of the model is calculated. The trained model is then used in a Flask web app to predict the type of soil. A second dataset, with nitrogen, phosphorus, potassium, pH, and temperature as features and the crop class as the label, is used to forecast crop production. These two algorithms are used to build the Flask website, which accepts a soil image for soil type prediction and land parameter inputs for crop prediction. © 2024 IEEE.

Keywords: Soils
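
To make the KNN step concrete, here is a minimal k-nearest-neighbours sketch over the five features named in the abstract (nitrogen, phosphorus, potassium, pH, temperature). It is an illustration under stated assumptions rather than the authors' implementation: the training rows, the labels `crop_a` and `crop_b`, and the choice k=3 are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points closest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical rows: ([N, P, K, pH, temperature], crop label).
train = [
    ([90.0, 42.0, 43.0, 6.5, 21.0], "crop_a"),
    ([85.0, 58.0, 41.0, 7.0, 22.0], "crop_a"),
    ([20.0, 67.0, 20.0, 5.7, 20.0], "crop_b"),
    ([25.0, 70.0, 18.0, 5.9, 19.0], "crop_b"),
]
query = [88.0, 45.0, 40.0, 6.6, 21.5]
prediction = knn_predict(train, query)
```

Because the features sit on very different scales (nutrient levels in the tens, pH in single digits), Euclidean distance here is dominated by N, P, and K; standardizing each feature before computing distances is a common remedy.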

Multilingual Synopses of Movie Narratives: A Dataset for Vision-Language Story Understanding

2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024

Authors: Sun, Yidan; Yu, Jianfei; Li, Boyang. Affiliations: College of Computing and Data Science, Nanyang Technological University, Singapore; School of Computer Science and Engineering, Nanjing University of Science and Technology, China

ISBN: (print) 9798891761681

Story video-text alignment, a core task in computational story understanding, aims to align video clips with corresponding sentences in their descriptions. However, progress on the task has been held back by the scarcity of manually annotated video-text correspondence and the heavy concentration on English narrations of Hollywood movies. To address these issues, in this paper, we construct a large-scale multilingual video story dataset named Multilingual Synopses of Movie Narratives (M-SYMON), containing 13,166 movie summary videos from 7 languages, as well as manual annotation of fine-grained video-text correspondences for 101.5 hours of video. Training on the human annotated data from M-SYMON outperforms the SOTA methods by 15.7 and 16.2 percentage points on Clip Accuracy and Sentence IoU scores, respectively, demonstrating the effectiveness of the annotations. As benchmarks for future research, we create 6 baseline approaches with different multilingual training strategies, compare their performance in both intra-lingual and cross-lingual setups, exemplifying the challenges of multilingual video-text alignment. The dataset is released at: https://***/insundaycathy/M-SyMoN. © 2024 Association for Computational Linguistics.

Keywords: Visual languages

Comparison of the SOLA model and ARIMA for Cloud Computing Resources Allocations

4th International Multidisciplinary Information Technology and Engineering Conference, IMITEC 2024

Authors: Prince Sekwatlakwatla, Sello; Malele, Vusumuzi. Affiliations: North-West University, Unit for Data Science and Computing, Department of Computer Science and Information Systems, South Africa

ISBN: (print) 9798350387988

Cloud computing solutions are becoming more and more popular as a way for organizations to improve productivity, save costs, and simplify procedures. The advantage of cloud services is that they enable users to store data remotely and access on-demand applications and services from a shared pool of configurable computing resources. Cloud computing resource allocation faces challenges in load balancing, effective resource management, and compliance with regulatory and legal requirements, requiring complex tools and techniques for optimal use. Most organizations are moving to cloud services and encouraging their clients to access services online. However, effective resource allocation is critical for improving performance and lowering costs in this area. Due to unpredictable network traffic in cloud computing, resource allocation is challenging, which causes customers to complain about application timeouts, delayed system times, and higher bandwidth use during peak hours. Resource allocation entails assigning resources such as memory, processing power, storage, and network bandwidth to various users and programs. In this regard, this study compares an ensemble method, stepwise Gaussian Linear Autoregressive (SGLA), with an individual method, autoregressive integrated moving average (ARIMA). The results show that SGLA achieved an average prediction accuracy of 97.9%, whereas ARIMA achieved 71.5%. The ensemble method thus performed better than the individual method on the same datasets. The study recommends the ensemble method for the prediction and allocation of resources in cloud computing. © 2024 IEEE.

Keywords: Cloud platforms

Reveal training performance mystery between TensorFlow and PyTorch in the single GPU environment

Science China (Information Sciences), 2022, Vol. 65, No. 1, pp. 147-163

Authors: Hulin DAI; Xuan PENG; Xuanhua SHI; Ligang HE; Qian XIONG; Hai JIN. Affiliations: National Engineering Research Center for Big Data Technology and System, Service Computing Technology and System Lab, School of Computer Science and Technology, Huazhong University of Science and Technology; Department of Computer Science, University of Warwick

Deep learning has gained tremendous success in various fields, while training deep neural networks (DNNs) is very compute-intensive, which has resulted in numerous deep learning frameworks that aim to offer better usability and higher performance to deep learning practitioners. TensorFlow and PyTorch are the two most popular frameworks. TensorFlow is more promising within the industry context, while PyTorch is more appealing in academia. However, these two frameworks differ considerably owing to their opposite design philosophies: static vs. dynamic computation graphs. TensorFlow is regarded as being more performance-friendly, as it has more opportunities to perform optimizations with the full view of the computation graph. However, there are also claims that PyTorch is sometimes faster than TensorFlow, which confuses end-users choosing between them. In this paper, we carry out analytical and experimental analysis to unravel the mystery of the comparison in single-GPU training speed between TensorFlow and PyTorch. To ensure that our investigation is as comprehensive as possible, we carefully select seven popular neural networks, which cover computer vision, speech recognition, and natural language processing (NLP). The contributions of this work are two-fold. First, we conduct detailed benchmarking experiments on TensorFlow and PyTorch and analyze the reasons for their performance difference. This work provides guidance for end-users choosing between these two frameworks. Second, we identify some key factors that affect performance, which can direct end-users to write their models more efficiently.

Keywords: deep learning; performance comparison; TensorFlow; PyTorch

Distributionally Robust Data Valuation

41st International Conference on Machine Learning, ICML 2024

Authors: Lin, Xiaoqiang; Xu, Xinyi; Wu, Zhaoxuan; Ng, See-Kiong; Low, Bryan Kian Hsiang. Affiliations: Department of Computer Science, National University of Singapore, Singapore; Institute of Data Science, National University of Singapore, Singapore

Data valuation quantifies the contribution of each data point to the performance of a machine learning model. Existing works typically define the value of data by its improvement of the validation performance of the trained model. However, this approach can be impractical to apply in collaborative machine learning and data marketplaces, since it is difficult for the parties/buyers to agree on a common validation dataset or determine the exact validation distribution a priori. To address this, we propose a distributionally robust data valuation approach to perform data valuation without known/fixed validation distributions. Our approach defines the value of data by its improvement of the distributionally robust generalization error (DRGE), thus providing a worst-case performance guarantee without a known/fixed validation distribution. However, since computing DRGE directly is infeasible, we propose using model deviation as a proxy for the marginal improvement of DRGE (for kernel regression and neural networks) to compute data values. Furthermore, we identify a notion of uniqueness where low uniqueness characterizes low-value data. We empirically demonstrate that our approach outperforms existing data valuation approaches in data selection and data removal tasks on real-world datasets (e.g., housing price prediction, diabetes hospitalization prediction). Copyright 2024 by the author(s).

Plant Fettle Detector: Unifying Fruit Quality Tracking and Plant Disease Detection through Advanced Image Analysis

3rd International Conference for Innovation in Technology, INOCON 2024

Authors: Nair, Sruthi; Bajpai, Khushi; Fadnavis, Sakshi; Jain, Shruti; Sheikh, Zeenat. Affiliations: Shri Ramdeobaba College of Engineering and Management, Department of Computer Science and Engineering - Data Science, Nagpur 44013, India

ISBN: (print) 9798350381931

India being an agricultural country, food quality tracking is a major challenge faced by common farmers across the country. This research presents an innovative integration of Convolutional Neural Networks (CNNs) to address key challenges in agriculture - fruit quality tracking and plant disease detection. The unified system comprises the Fruit Quality Tracker and the Plant Disease Detector, leveraging advanced image analysis for precision agriculture. The Fruit Quality Tracker uses computer vision to assess ripeness and defects in real-time, empowering farmers with informed decisions. Simultaneously, the Plant Disease Detector swiftly identifies and classifies diseases, mitigating economic losses. The research explores architectural nuances, development methodologies, and deployment on Google Cloud Platform. Testing shows high accuracy rates of 98.5% for fruit quality and 98.83% for plant disease detection. The system's transformative potential in agriculture is highlighted, bridging the gap between technology and sustainable food production systems. © 2024 IEEE.

Keywords: Image analysis
