Search Results - Inner Mongolia University Library

6th International Conference on Computing and Informatics, ICCI 2024

Authors: Alsahafí, Yousef S.; Asad, Muhammad. Affiliations: Department of Information technology, Computing and Information technology, Khulis College, University of Jeddah, Jeddah, Saudi Arabia; Department of Computer and Software Engineering, College of Electrical and Mechanical Engineering, National University of Science and Technology, Islamabad, Pakistan

ISBN: (print) 9798350373875

This paper presents an extensive empirical study aiming to identify the optimal combination of feature extraction techniques and machine learning algorithms, including deep learning, for automated mispronunciation detection during Quran recitation. Three feature extraction methods (Mel-Frequency Cepstral Coefficients (MFCC), Spectrogram, and Kaldi pitch) are systematically evaluated in conjunction with a diverse set of classifiers, including Gradient Boosting (GB), Support Vector Machines (SVM), Logistic Regression (LR), Random Forest (RF), Artificial Neural Network (ANN), Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM) networks. The study leverages the QDAT dataset, comprising 1500 audio samples from 150 readers, with metadata specifying age, gender, and three distinct rules for mispronunciation detection. Experimental results demonstrate that the integration of MFCC with deep learning classifiers consistently achieves the highest performance across multiple evaluation metrics. Model robustness is validated using receiver operating characteristic (ROC) curves and accuracy metrics. © 2024 IEEE.

Keywords: Speech recognition


The Bifurcation Method: White-Box Observation Perturbation Attacks on Reinforcement Learning Agents on a Cyber Physical System


23rd International Conference on Machine Learning and Cybernetics, ICMLC 2024

Authors: Broda-Milian, Kiernan; Dagdougui, Hanane; Mallah, Ranwa Al. Affiliations: Royal Military College of Canada, Electrical and Computer Engineering, Kingston, Canada; Polytechnique Montreal, Computer and Software Engineering, Quebec, Canada

ISBN: (print) 9798331528041

Components of cyber physical systems, which affect real-world processes, are often exposed to the internet. Replacing conventional control methods with Deep Reinforcement Learning (DRL) in energy systems is an active area of research, as these systems become increasingly complex with the advent of renewable energy sources and the desire to improve their efficiency. Artificial Neural Networks (ANNs) are vulnerable to specific perturbations of their inputs or features, called adversarial examples. These perturbations are difficult to detect when properly regularized, but have significant effects on the ANN's output. Because DRL uses ANNs to map optimal actions to observations, DRL agents are similarly vulnerable to adversarial examples. This work proposes a novel attack technique for continuous control using Group Difference Logits loss with a bifurcation layer. By combining aspects of targeted and untargeted attacks, the attack significantly increases the impact compared to an untargeted attack, with drastically smaller distortions than an optimally targeted attack. We demonstrate the impacts of powerful gradient-based attacks in a realistic smart energy environment, show how the impacts change with different DRL agents and training procedures, and use statistical and time-series analysis to evaluate the attacks' stealth. The results show that adversarial attacks can have significant impacts on DRL controllers, and that constraining an attack's perturbations makes it difficult to detect. However, certain DRL architectures are far more robust, and robust training methods can further reduce the impact. © 2024 IEEE.

Keywords: Adversarial machine learning


MILG: Realistic lip-sync video generation with audio-modulated image inpainting


Visual Informatics, 2024, Vol. 8, No. 3, pp. 71-81

Authors: Han Bao; Xuhong Zhang; Qinying Wang; Kangming Liang; Zonghui Wang; Shouling Ji; Wenzhi Chen. Affiliations: School of Software Technology, Zhejiang University, Hangzhou, China; College of Computer Science, Zhejiang University, Hangzhou, China; College of Engineering, Zhejiang University, Hangzhou, China

Existing lip synchronization (lip-sync) methods generate accurately synchronized mouths and faces in a generated ***, they still confront the problem of artifacts in regions of non-interest (RONI), e.g., background and other parts of a face, which decreases the overall visual *** solve these problems, we innovatively introduce diverse image inpainting to lip-sync *** propose Modulated Inpainting Lip-sync GAN (MILG), an audio-constraint inpainting network to predict synchronous *** utilizes prior knowledge of RONI and audio sequences to predict lip shape instead of image generation, which can keep the RONI ***, we integrate modulated spatially probabilistic diversity normalization (MSPD Norm) in our inpainting network, which helps the network generate fine-grained diverse mouth movements guided by the continuous audio ***, to lower the training overhead, we modify the contrastive loss in lip-sync to support small-batch-size and few-sample *** experiments demonstrate that our approach outperforms the existing state-of-the-art in image quality and authenticity while keeping lip-sync.

Keywords: Lip-sync; Image inpainting; Face generation; Modulated SPD normalization


Comparison among YOLO Series in SAR Ship Detection with Preliminary Results


13th International Conference on Computing and Pattern Recognition, ICCPR 2024

Authors: Chen, Runlin; Yu, Guanghao; Chen, Feng. Affiliations: College of Computer and Information Engineering, Xiamen University of Technology, Xiamen, China

ISBN: (print) 9798400717482

Deep learning has shown significant advantages in object detection, particularly with the You Only Look Once (YOLO) model. YOLO adopts an end-to-end training and detection method that balances speed and accuracy, making it effective for synthetic aperture radar (SAR) ship detection. This paper compares several mainstream YOLO series models, focusing on detection accuracy and efficiency in SAR ship detection. The experimental results on the SAR Ship Detection Dataset indicate that YOLOv6 outperformed the other YOLO series models in terms of accuracy and efficiency, achieving a mAP of 0.742 at 200.00 FPS. However, it has a larger parameter size (16.30M). Meanwhile, both YOLOv5 and YOLOv8 showed good overall performance with fewer parameters than YOLOv6, although their mAP and FPS were lower. This comprehensive analysis highlights the advancements of different YOLO series, providing valuable insights for further investigation on improving object detection (e.g., SAR ship detection). © 2024 Copyright held by the owner/author(s).

Keywords: Synthetic aperture radar


OCRBench: on the hidden mystery of OCR in large multimodal models


Science China (Information Sciences), 2024, Vol. 67, No. 12, pp. 23-35

Authors: Yuliang LIU; Zhang LI; Mingxin HUANG; Biao YANG; Wenwen YU; Chunyuan LI; Xu-Cheng YIN; Cheng-Lin LIU; Lianwen JIN; Xiang BAI. Affiliations: School of Artificial Intelligence and Automation, Huazhong University of Science and Technology; School of Electronic and Information Engineering, South China University of Technology; Microsoft Research; School of Computer & Communication Engineering, University of Science and Technology Beijing; Institute of Automation, Chinese Academy of Sciences; School of Software Engineering, Huazhong University of Science and Technology

Large models have recently played a dominant role in natural language processing and multimodal vision-language learning. However, their effectiveness in text-related visual tasks remains relatively unexplored. In this paper, we conducted a comprehensive evaluation of large multimodal models, such as GPT4V and Gemini, in various text-related visual tasks including text recognition, scene text-centric visual question answering (VQA), document-oriented VQA, key information extraction (KIE), and handwritten mathematical expression recognition (HMER). To facilitate the assessment of optical character recognition (OCR) capabilities in large multimodal models, we propose OCRBench, a comprehensive evaluation benchmark. OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available. Furthermore, our study reveals both the strengths and weaknesses of these models, particularly in handling multilingual text, handwritten text, non-semantic text, and mathematical expression *** importantly, the baseline results presented in this study could provide a foundational framework for the conception and assessment of innovative strategies targeted at enhancing zero-shot multimodal *** evaluation pipeline and benchmark are available at https://***/Yuliang-Liu/Multimodal OCR.

Keywords: large multimodal model; OCR; text recognition; scene text-centric VQA; document-oriented VQA; key information extraction; handwritten mathematical expression recognition


An implementation of personalized therapy in Clinical Decision Support System using adaptive transformer and hybrid deep learning network


Australian Journal of Electrical and Electronics Engineering, 2025, Vol. 22, No. 1, pp. 105-120

Authors: Praveena Rachel Kamala, S.; T, Saraswathi.; Kaliraj, V.; Nithya, A.; Hema, R.; Ramyadevi, R. Affiliations: Department of Information Technology, Easwari Engineering College, Tamil Nadu, Chennai, India; Department of CSE, S.A Engineering College, Tamil Nadu, Chennai, India; Department of Computer Science and Engineering, St.Joseph’s College of Engineering, Tamil Nadu, Chennai, India; Department of Electronics and Communication Engineering, Easwari Engineering College, Tamil Nadu, Chennai, India; Department of Computer Science and Applications, SRM Institute of Science and Technology, Tamil Nadu, Chennai, India

This paper proposes a new deep learning mechanism for personalised therapy in Clinical Decision Support Systems (CDSS). The input texts are acquired from standard data sources and forwarded to text preprocessing. In the preprocessing phase, punctuation and special character removal, stop word removal, and stemming are applied to remove noise and eliminate redundant information, improving data quality. The preprocessed text is then passed to the Adaptive Transformer Net (ATN) for feature extraction, where the attributes are optimally determined with the aid of the Adaptive Walrus Optimization Algorithm (AWOA). Finally, the resultant text is subjected to the Hybrid Deep Learning Network (HDLNet), which is implemented by integrating a Residual Long Short-Term Memory (Residual LSTM) with a Dilated Recurrent Neural Network (Dilated RNN). In the sensitivity analysis on dataset 1, the implemented technique performed 3.7% better than LSTM, 7.76% better than MobileNet, 6.7% better than Residual LSTM, and 0.90% better than Dilated RNN. Throughout the validation, conventional techniques are compared with the suggested personalised therapy approach in CDSS to demonstrate its efficacy. ©, Engineers Australia.

Keywords: Long short-term memory


The Internal Drive Force Analysis of Learning for Engineering Students


5th International Conference on Computer Science and Educational Informatization, CSEI 2023

Authors: Du, Xiaoyu; Zhou, Guanying; Han, Zhijie; Du, Ying; Qiao, Baojun. Affiliations: College of Computer and Information Engineering, Henan University, Kaifeng 475004, China; Henan Engineering Laboratory of Spatial Information Processing, Henan University, Kaifeng 475004, China; College of Software, Henan University, Kaifeng 475004, China

ISBN: (print) 9789819994984

Emerging Engineering Education is oriented toward cultivating the ability to solve complex engineering problems, which requires engineering students to have higher learning initiative and self-discipline. This paper explores the formation mechanism of college students’ learning drive from four aspects (teachers, counselors, learning environment, and learning motivation) and proposes methods and suggestions for cultivating learning drive in each of these aspects. © 2024, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Keywords: Engineering education


Harnessing LSTM Classifier to Suggest Nutrition Diet for Cancer Patients


Intelligent Automation & Soft Computing, 2023, Vol. 35, No. 2, pp. 2171-2187

Authors: S. Raguvaran; S. Anandamurugan; A.M.J. Md. Zubair Rahman. Affiliations: Department of Computer Science and Engineering, Al-Ameen Engineering College, Erode 638115, India; Department of Information and Technology, Kongu Engineering College, Erode 638060, India; Al-Ameen Engineering College, Erode 638115, India

A customized nutrition-rich diet plan is of utmost importance for cancer patients to intake healthy and nutritious foods that help them to be strong enough to maintain their body weight and body *** nutrition-rich diet foods will prevent them from the side effects caused before and after treatment thereby minimizing *** work is proposed here to provide them with an effective diet assessment plan using deep learning-based automated medical diet ***, an Enhanced Long-Short Term Memory (E-LSTM) has been proposed in this paper, especially for cancer *** proposed method will be very useful for cancer patients as this would help them predict the foods which can be consumed by them based on the nutrition analysis of food *** classification will be performed in E-LSTM by analyzing the two datasets, one with food images and another with cancer patients’ *** an in-depth analysis of the major research papers concerning deep learning strategies to identify the foods along with their nutrition composition, this method has been identified as one of the finest deep learning approaches that are used for classification *** work has been identified as the first work producing a new layer for feature extraction and providing nutrition suggestions, especially for cancer patients using the LSTM *** accuracy of prediction and classification will be improved by the dedicated layer for feature extraction in ***, it is proved that this proposed method outperforms all other existing techniques in terms of F1 Score, Precision, Recall, Classification accuracy, Training loss and Validation loss.

Keywords: Classification; diet assessment; enhanced long short term memory; nutrition suggestion


A New Low-light Image Enhancement Based on Disentanglement Representation Learning


7th IEEE Information Technology, Networking, Electronic and Automation Control Conference, ITNEC 2024

Authors: Chen, Xinran; Hu, Yanxiang; Zhang, Bo; Gao, Yaru; Hao, Caixia. Affiliations: College of Computer and Information Engineering, Tianjin Normal University, China

ISBN: (print) 9798350370782

Visible light images are the most important information source and have been widely used in many computer vision (CV) tasks. However, existing low-light image enhancement methods focus on generating visually pleasing results without taking the downstream CV tasks into account. In this paper, we propose a new low-light image enhancement method based on disentanglement representation learning. Based on an auto-encoder network, we first decompose a low-light image into a content feature and an illumination feature; then a detail enhancement module is designed to enhance the content feature. At the same time, rather than adjusting the illumination feature directly, a downstream CV task oriented illumination enhancement strategy is devised. A preferred illumination alternative can be selected adaptively from a set of pre-extracted illumination features by the subsequent vision tasks. Finally, a decoder reconstructs the enhanced image. Our method tries to collaborate with the downstream CV tasks to pursue the best overall performance. Quantitative and qualitative experiments demonstrate the advantages of the proposed method. © 2024 IEEE.

Keywords: Image enhancement


Intelligent biomedical image classification in a big data architecture using metaheuristic optimization and gradient approximation


Wireless Networks, 2024, Vol. 30, No. 8, pp. 7087-7108

Authors: Almutairi, Laila; Abugabah, Ahed; Alhumyani, Hesham; Mohamed, Ahmed A. Affiliations: Department of Computer Engineering, College of Computer and Information Sciences, Majmaah University, Majmaah 11952, Saudi Arabia; College of Technological Innovation, Zayed University, Abu Dhabi Campus, Abu Dhabi, United Arab Emirates; Department of Computer Engineering, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia; Department of Information Technology, Faculty of Computer and Information, Assiut University, Assiut 71515, Egypt

Medical imaging has experienced significant development in contemporary medicine and can now record a variety of biomedical pictures from patients to test and analyze the illness and its severity. Computer vision and artificial intelligence may outperform human diagnostic ability and uncover hidden information in biomedical images. In healthcare applications, fast prediction and reliability are of the utmost importance for assuring the timely detection of disease. Existing systems have poor classification accuracy, long computation times, and high system complexity. Low-quality images might impact the processing method, leading to subpar results. Furthermore, extensive preprocessing techniques are necessary for achieving accurate outcomes. Image contrast is one of the most essential visual parameters. Insufficient contrast may present many challenges for computer vision techniques. Traditional contrast adjustment techniques may not be adequate for many applications. Occasionally, these technologies create photos that lack crucial information. The primary contribution of this work is designing a Big Data Architecture (BDA) to improve the dependability of medical systems by producing real-time warnings and making precise forecasts about patient health conditions. A BDA-based Bio-Medical Image Classification (BDA-BMIC) system is designed to detect the illness of patients using Metaheuristic Optimization (Genetic Algorithm) and Gradient Approximation to improve the biomedical image classification process. Extensive tests are conducted on publicly accessible datasets to demonstrate that the suggested retrieval and categorization methods are superior to the current methods. The suggested BDA-BMIC system has an average detection accuracy of 94.6% and a sensitivity of 97.3% in the simulation analysis. © The Author(s) 2023.

Keywords: Image classification

