Search Results - Inner Mongolia University Library

Classification and study of music genres with multimodal Spectro-Lyrical Embeddings for Music (SLEM)

Multimedia Tools and Applications, 2025, vol. 84, no. 7, pp. 3701-3721

Authors: Mehra, Ashman; Mehra, Aryan; Narang, Pratik. Affiliations: Department of Computer Science and Information Systems, Birla Institute of Technology and Science, Goa, Pilani, India; Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, United States; Department of Computer Science and Information Systems, Birla Institute of Technology and Science, Pilani, Rajasthan, Pilani, India

The essence of music is inherently multi-modal, with audio and lyrics going hand in hand. However, very little research has studied the intricacies of the multi-modal nature of music and its relation to genres. Our work uses this multi-modality to present spectro-lyrical embeddings for music representation (SLEM), leveraging open-source, lightweight, state-of-the-art deep learning vision and language models to encode songs. This work summarises extensive experimentation with over 20 deep learning-based music embeddings of a self-curated and hand-labeled multi-lingual dataset of 226 recent songs spread over 5 genres. Our aim is to study the effects of varying the weight of lyrics and spectrograms in the embeddings on multi-class genre classification. The purpose of this study is to show that a simple linear combination of both modalities is better than either modality alone. Our methods achieve an accuracy ranging from 81.08% to 98.60% for different genres, using the K-nearest neighbors algorithm on the multimodal embeddings. We study the intricacies of genres in this representational space, including their misclassification, visual clustering with EM-GMM, and the domain-specific meaning of the multi-modal weight for each genre with respect to 'instrumentalness' and 'energy' metadata. SLEM presents one of the first works on an end-to-end method that uses spectro-lyrical embeddings without hand-engineered features. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.

Keywords: Music
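The weighted linear combination of lyric and spectrogram embeddings followed by K-nearest-neighbors classification, as described in the abstract above, might be sketched as follows. This is only a minimal illustration and not the authors' implementation: the function names, the weight `alpha`, the L2 normalisation step, the assumption that both embeddings share one dimension, and the toy two-dimensional vectors are all hypothetical.

```python
import math
from collections import Counter


def l2_normalize(vec):
    """Scale a vector to unit length (assumed preprocessing, not from the paper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def combine(spectrogram_emb, lyrics_emb, alpha):
    """Return alpha * lyrics + (1 - alpha) * spectrogram.

    Assumes both embeddings have the same dimension; alpha in [0, 1]
    is the hypothetical weight given to the lyrics modality.
    """
    s = l2_normalize(spectrogram_emb)
    t = l2_normalize(lyrics_emb)
    return [alpha * ti + (1 - alpha) * si for si, ti in zip(s, t)]


def knn_predict(train, query, k=3):
    """Majority vote among the k training embeddings closest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data: (spectrogram embedding, lyrics embedding, genre), all made up.
songs = [
    ([1.0, 0.1], [0.9, 0.2], "rock"),
    ([0.9, 0.2], [1.0, 0.0], "rock"),
    ([0.1, 1.0], [0.2, 0.9], "jazz"),
    ([0.0, 0.9], [0.1, 1.0], "jazz"),
]
alpha = 0.5
train = [(combine(s, t, alpha), genre) for s, t, genre in songs]
query = combine([0.8, 0.3], [0.7, 0.2], alpha)
predicted = knn_predict(train, query, k=3)
print(predicted)
```

Sweeping `alpha` from 0 (spectrogram only) to 1 (lyrics only) and recording accuracy per genre would mirror the kind of weight study the abstract describes.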

MCFusion: Frequency Domain Characteristics Enhancement and Feature Compensation Fusion Network for RGB-T Object Detection

IEEE Sensors Journal, 2025, vol. 25, no. 11, pp. 20880-20893

Authors: Gao, Yinbo; Liao, Zhuhua; Liu, Yizhi; Yi, Aiping; Zhang, Guoqiang. Affiliations: Hunan University of Science and Technology, China; Hunan University of Science and Technology, Department of Computer Science and Engineering, Xiangtan, China; Hainan Normal University, Department of Information Science and Technology, Haikou, China

Multi-modal object detection based on visible-thermal vision sensors has drawn significant attention because it can achieve reliable object detection in complex scenes with challenging lighting conditions such as low light or backlight. However, there has been a lack of focus on the frequency domain feature information of the visible and thermal modalities themselves, as well as on the complementarity of cross-modal features. Furthermore, current visible and thermal feature fusion methods only utilize feature information from the current layer, neglecting context information. Therefore, this paper proposes a novel network framework that enhances frequency domain characteristics and fuses cross-layer and cross-modal features to compensate for these limitations. The framework introduces two key modules: the Frequency Domain Characteristics Enhancement (FCE) module and the Cross-Layer and Cross-Modal Feature Compensation Fusion (CCF) module. The FCE module consists of two sub-modules. The Reduce High-Frequency Information Loss (FCE-RHFL) module operates on the visible modality to reduce high-frequency information loss, using methods such as high-frequency masking and frequency recombination. Meanwhile, the Enhance High-Frequency Information Representation (FCE-EHFR) module enhances high-frequency information representation for the thermal modality through convolutions with different kernels and frequency domain enhancement techniques. In the CCF module, cross-modal and cross-layer feature compensation methods are employed to compensate for differences between modalities, followed by capturing complementary information across modalities using a query-guided cross-attention mechanism. Finally, we conduct experimental comparisons on the KAIST and FLIR datasets, and the experimental results demonstrate that our method has excellent performance and robust detection results. © 2001-2012 IEEE.

Keywords: Frequency domain analysis

Segmentation of Head and Neck Tumors Using Dual PET/CT Imaging: Comparative Analysis of 2D, 2.5D, and 3D Approaches Using UNet Transformer

Computer Modeling in Engineering & Sciences, 2024, vol. 141, no. 12, pp. 2351-2373

Authors: Mohammed A. Mahdi; Shahanawaj Ahamad; Sawsan A. Saad; Alaa Dafhalla; Alawi Alqushaibi; Rizwan Qureshi. Affiliations: Information and Computer Science Department, College of Computer Science and Engineering, University of Ha’il, Ha’il 55476, Saudi Arabia; Software Engineering Department, College of Computer Science and Engineering, University of Ha’il, Ha’il 55476, Saudi Arabia; Computer Engineering Department, College of Computer Science and Engineering, University of Ha’il, Ha’il 55476, Saudi Arabia; Department of Computer and Information Sciences, Universiti Teknologi Petronas, Seri Iskandar 32610, Malaysia; Center for Research in Computer Vision (CRCV), University of Central Florida, Orlando, FL 32816, USA

The segmentation of head and neck (H&N) tumors in dual Positron Emission Tomography/Computed Tomography (PET/CT) imaging is a critical task in medical imaging, providing essential information for diagnosis, treatment planning, and outcome *** by the need for more accurate and robust segmentation methods, this study addresses key research gaps in the application of deep learning techniques to multimodal medical ***, it investigates the limitations of existing 2D and 3D models in capturing complex tumor structures and proposes an innovative 2.5D UNet Transformer model as a *** primary research questions guiding this study are: (1) How can the integration of convolutional neural networks (CNNs) and transformer networks enhance segmentation accuracy in dual PET/CT imaging? (2) What are the comparative advantages of 2D, 2.5D, and 3D model configurations in this context? To answer these questions, we aimed to develop and evaluate advanced deep-learning models that leverage the strengths of both CNNs and *** proposed methodology involved a comprehensive preprocessing pipeline, including normalization, contrast enhancement, and resampling, followed by segmentation using 2D, 2.5D, and 3D UNet Transformer *** models were trained and tested on three diverse datasets: HeckTor2022, AutoPET2023, and *** was assessed using metrics such as Dice Similarity Coefficient, Jaccard Index, Average Surface Distance (ASD), and Relative Absolute Volume Difference (RAVD). The findings demonstrate that the 2.5D UNet Transformer model consistently outperformed the 2D and 3D models across most metrics, achieving the highest Dice and Jaccard values, indicating superior segmentation *** instance, on the HeckTor2022 dataset, the 2.5D model achieved a Dice score of 81.777 and a Jaccard index of 0.705, surpassing other model *** 3D model showed strong boundary delineation performance but exhibited variability across datasets, while the

Keywords: PET/CT imaging; tumor segmentation; weighted fusion; transformer; multi-modal imaging; deep learning; neural networks; clinical oncology
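Two of the overlap metrics named in the abstract above, the Dice Similarity Coefficient and the Jaccard Index, can be computed from a predicted and a reference binary mask. The sketch below uses the standard textbook definitions on flattened 0/1 masks; the function name and toy masks are illustrative assumptions, and it does not reproduce the paper's evaluation pipeline.

```python
def dice_and_jaccard(pred, truth):
    """Return (Dice, Jaccard) for two equal-length sequences of 0/1 labels.

    Dice    = 2 * |P ∩ T| / (|P| + |T|)
    Jaccard = |P ∩ T| / |P ∪ T|
    Two empty masks are treated as a perfect match (a common convention,
    assumed here).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_count = sum(1 for p in pred if p)
    truth_count = sum(1 for t in truth if t)
    union = pred_count + truth_count - intersection
    if union == 0:
        return 1.0, 1.0
    dice = 2 * intersection / (pred_count + truth_count)
    jaccard = intersection / union
    return dice, jaccard


# Toy flattened masks (made up): 3 predicted voxels, 3 true voxels, 2 shared.
pred_mask = [1, 1, 1, 0, 0, 0]
true_mask = [0, 1, 1, 1, 0, 0]
dice, jaccard = dice_and_jaccard(pred_mask, true_mask)
print(round(dice, 4), round(jaccard, 4))
```

For a single pair of masks the two scores are linked by Jaccard = Dice / (2 - Dice); scores averaged over many cases need not satisfy that identity exactly.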

Artificial Intelligence and Natural Language Processing Inspired Chatbot Technologies

Recent Advances in Computer Science and Communications, 2024, vol. 17, no. 1, pp. 11-20

Authors: Singh, Deepti; Manju; Jatain, Aman. Affiliations: Netaji Subhash Institute of Technology, New Delhi, India; Department of Computer Science and Engineering & Information Technology, Jaypee Institute of Information Technology, Noida, India; Department of Computer Science and Engineering, Amity University, Haryana, India

Chatbots use artificial intelligence (AI) and natural language processing (NLP) algorithms to construct a clever system. By copying human connections in the most helpful way possible, chatbots emulate individuals and serve as virtual assistants. They easily interface with customers and respond to their requests. In the modern technical environment, these conversation agents, or chatbots, are considered the next-generation invention. Chatbots have become more popular in business because they can reduce customer service costs and handle multiple users at a time. Many techniques are used to involve such intelligent experts in daily business. A comprehensive analysis of the methods is needed to determine the viability of the different strategies. This paper tracks the progress of this invention and further clarifies the influence of chatbots on numerous businesses. In addition, a survey of the multiple chatbot methodologies suggested by various researchers is provided. Along with the survey, an e-commerce customer service chatbot is designed to provide an efficient and accurate answer to any query based on a dataset of frequently asked questions. This chatbot can reduce customer service costs and can handle multiple customers at the same time. © 2024 Bentham Science Publishers.

Keywords: Machine learning

Social Media-Based Surveillance Systems for Health Informatics Using Machine and Deep Learning Techniques: A Comprehensive Review and Open Challenges

Computer Modeling in Engineering & Sciences, 2024, vol. 139, no. 5, pp. 1167-1202

Authors: Samina Amin; Muhammad Ali Zeb; Hani Alshahrani; Mohammed Hamdi; Mohammad Alsulami; Asadullah Shaikh. Affiliations: Institute of Computing, Kohat University of Science and Technology, Kohat 26000, Pakistan; Department of Computer Science, College of Computer Science and Information Systems, Najran University, Najran 61441, Saudi Arabia; Department of Information System, College of Computer Science and Information Systems, Najran University, Najran 61441, Saudi Arabia

Social media (SM) based surveillance systems, combined with machine learning (ML) and deep learning (DL) techniques, have shown potential for early detection of epidemic *** review discusses the current state of SM-based surveillance methods for early epidemic outbreaks and the role of ML and DL in enhancing their ***, every year, a large amount of data related to epidemic outbreaks, particularly Twitter data, is generated by *** paper outlines the theme of SM analysis for tracking health-related issues and detecting epidemic outbreaks in SM, along with the ML and DL techniques that have been configured for the detection of epidemic *** has emerged as a promising ML technique that adapts multiple layers of representations or features of the data and yields state-of-the-art extrapolation *** recent years, along with the success of ML and DL in many other application domains, both ML and DL are also popularly used in SM *** paper aims to provide an overview of epidemic outbreaks in SM and then outlines a comprehensive analysis of ML and DL approaches and their existing applications in SM ***, this review serves the purpose of offering suggestions, ideas, and proposals, along with highlighting the ongoing challenges in the field of early outbreak detection that still need to be addressed.

Keywords: Social media; epidemic; machine learning; deep learning; health informatics; pandemic

Enhancing aviation safety: Machine learning for real-time ADS-B injection detection through advanced data analysis

Alexandria Engineering Journal, 2025, vol. 126, pp. 262-276

Authors: Rahman, Md. Atiqur; Bhuiyan, Touhid; Ali, M. Ameer. Affiliations: Department of Computer Science and Engineering, East West University, Dhaka, Bangladesh; School of Information Technology, Washington University of Science and Technology, Alexandria, United States; Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka, Bangladesh

Airplanes play a critical role in global transportation, ensuring the efficient movement of people and goods. Although generally safe, aviation systems occasionally encounter incidents and accidents that underscore the need for proactive risk management. This study employs machine learning to detect abnormalities in commercial aircraft operations using Automatic Dependent Surveillance–Broadcast (ADS-B) data. Given the growing reliance on ADS-B technology, concerns regarding its susceptibility to security breaches, such as injection attacks, have intensified. To address these vulnerabilities, we propose a robust ADS-B injection detection system. Employing GridSearchCV for model optimization, it effectively identifies and categorizes injection risks. The system's performance, evaluated using the ADS-B Message Injection Attacks Dataset, achieves outstanding results, including a value of 0.9970 for the accuracy, precision, recall, and F1 score. The proposed classifier also demonstrates a higher area under the curve (0.9999), specificity (0.9956), and Cohen's kappa (0.9954) than existing approaches, while achieving a lower log loss (0.0107). This research significantly enhances aviation security by introducing a highly accurate, computationally efficient, and reliable real-time detection model for ADS-B injection attacks, ensuring the integrity and resilience of modern flight control systems. © 2025 The Authors

Keywords: Risk management
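The abstract above reports specificity, Cohen's kappa, and log loss alongside accuracy. As a rough illustration of how those three figures are defined, the sketch below computes them for a binary attack/normal labelling from predicted probabilities. The binary simplification, the 0.5 threshold, the function name, and the toy inputs are assumptions for illustration only; they are not taken from the paper or its dataset.

```python
import math


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Return accuracy, specificity, Cohen's kappa and log loss.

    y_true: list of 0/1 labels (1 = injected message, an assumed encoding).
    y_prob: predicted probability of class 1 for each sample.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = len(y_true)

    accuracy = (tp + tn) / n
    # Specificity: share of true negatives among all actual negatives.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_expected) / (1 - p_expected) if p_expected != 1 else 0.0
    # Log loss: mean negative log-likelihood, with clipping to avoid log(0).
    eps = 1e-15
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    log_loss = -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for t, p in zip(y_true, clipped)
    ) / n
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "kappa": kappa,
        "log_loss": log_loss,
    }


# Toy example (made-up labels and probabilities).
metrics = binary_metrics([1, 1, 0, 0], [0.9, 0.4, 0.2, 0.1])
print(metrics)
```

A lower log loss means the predicted probabilities are better calibrated to the true labels, which is why the abstract cites it as a value to minimise while the other scores are maximised.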

Enhancing User Experience in AI-Powered Human-Computer Communication with Vocal Emotions Identification Using a Novel Deep Learning Method

Computers, Materials & Continua, 2025, vol. 82, no. 2, pp. 2909-2929

Authors: Ahmed Alhussen; Arshiya Sajid Ansari; Mohammad Sajid Mohammadi. Affiliations: Department of Computer Engineering, College of Computer and Information Sciences, Majmaah University, Al-Majmaah 11952, Saudi Arabia; Department of Information Technology, College of Computer and Information Sciences, Majmaah University, Al-Majmaah 11952, Saudi Arabia; Department of Computer Science, College of Engineering and Information Technology, Onaizah Colleges, Qassim 51911, Saudi Arabia

Voice, motion, and mimicry are naturalistic control modalities that have replaced text or display-driven control in human-computer communication (HCC). Specifically, the vocals contain a lot of knowledge, revealing details about the speaker’s goals and desires, as well as their internal condition. Certain vocal characteristics reveal the speaker’s mood, intention, and motivation, while word study assists the speaker’s demand to be understood. Voice emotion recognition has become an essential component of modern HCC networks. Integrating findings from the various disciplines involved in identifying vocal emotions is also challenging. Many sound analysis techniques were developed in the past. With the development of artificial intelligence (AI), and especially deep learning (DL) technology, research incorporating real data is becoming increasingly common. Thus, this research presents a novel selfish herd optimization-tuned long/short-term memory (SHO-LSTM) strategy to identify vocal emotions in human communication. The RAVDESS public dataset is used to train the suggested SHO-LSTM technique. Wiener filter (WF) and Mel-frequency cepstral coefficient (MFCC) techniques are used, respectively, to remove noise and extract features from the data. SHO is applied to optimize the LSTM network’s parameters on the extracted features for effective emotion recognition. The proposed framework was implemented in Python. In the assessment phase, numerous metrics are used to evaluate the proposed model’s detection capability, such as F1-score (95%), precision (95%), recall (96%), and accuracy (97%). The SHO-LSTM’s outcomes are contrasted with those of previously conducted research. Based on comparative assessments, the suggested approach outperforms the current approaches in vocal emotion recognition.

Keywords: Human-computer communication (HCC); vocal emotions; live vocal; artificial intelligence (AI); deep learning (DL); selfish herd optimization-tuned long/short-term memory (SHO-LSTM)

Leveraging User-Generated Comments and Fused BiLSTM Models to Detect and Predict Issues with Mobile Apps

Computers, Materials & Continua, 2024, vol. 79, no. 4, pp. 735-759

Authors: Wael M. S. Yafooz; Abdullah Alsaeedi. Affiliation: Department of Computer Science, College of Computer Science and Engineering, Taibah University, Medina 42353, Saudi Arabia

In the last decade, technical advancements and faster Internet speeds have also led to an increasing number of mobile devices and users. Thus, all contributors to society, whether young or old members, can use these mobile apps. The use of these apps eases our daily lives, and all customers who need any type of service can access it easily, comfortably, and efficiently through mobile apps. Particularly, Saudi Arabia greatly depends on digital services to assist people and visitors. Such mobile devices are used in organizing daily work schedules and services, particularly during two large occasions, Umrah and Hajj. However, pilgrims encounter mobile app issues such as slowness, conflict, unreliability, or user-unfriendliness. Pilgrims comment on these issues on mobile app platforms through reviews of their experiences with these digital services. Scholars have made several attempts to solve such mobile issues by reporting bugs or non-functional requirements by utilizing user ***, solving such issues is a great challenge, and the issues still exist. Therefore, this study aims to propose a hybrid deep learning model to classify and predict mobile app software issues encountered by millions of pilgrims during the Hajj and Umrah periods from the user perspective. Firstly, a dataset was constructed using user-generated comments from relevant mobile apps using natural language processing methods, including information extraction, the annotation process, and pre-processing steps, considering a multi-class classification problem. Then, several experiments were conducted using common machine learning classifiers, Artificial Neural Networks (ANN), Long Short-Term Memory (LSTM), and Convolutional Neural Network Long Short-Term Memory (CNN-LSTM) architectures, to examine the performance of the proposed model. Results show 96% in F1-score and accuracy, and the proposed model outperformed the mentioned models.

Keywords: Mobile apps issues; play store; user comments; deep learning; LSTM; bidirectional LSTM

Conditional-GAN-Based Face Inpainting Approaches With Symmetry and View-Degree Utilization

IEEE Access, 2024, vol. 12, pp. 87467-87478

Authors: Hong, Tzung-Pei; Wu, Jin-Hang; Su, Ja-Hwung; Yin, Tang-Kai. Affiliations: National University of Kaohsiung, Department of Computer Science and Information Engineering, Kaohsiung 811726, Taiwan; National Sun Yat-sen University, Department of Computer Science and Engineering, Kaohsiung 804201, Taiwan

Recently, image inpainting has been proposed as a solution for restoring polluted images in the field of computer vision. Face inpainting is a subfield of image inpainting that refers to a set of image editing algorithms for smoothly reconstructing the missing regions of a face. Face inpainting is more challenging than general image inpainting because it needs more face structure information. Although a number of past studies addressed face inpainting by using face segmentation, face edges, and face topology, some important information has been ignored, such as geometric and symmetric properties. Based on these concepts, in this paper we propose a two-stage face inpainting method called CGAN (Conditional Generative Adversarial Network), which integrates face landmarks with a Generative Adversarial Network (GAN). In the first stage, the face landmark is predicted as the condition, providing the GAN with important information about geometry and symmetry. The main idea in this stage is to dynamically adjust the loss by the proposed view degree. In the second stage, the masked face image and the corresponding face landmark are used as condition inputs to the GAN. Finally, the missing regions are inpainted by the proposed CGAN. To reveal the effectiveness of the proposed method, a number of evaluations were conducted on real datasets. The experimental results show that the proposed method predicts a better face landmark by using information about geometric structures and symmetric outlooks, and that the proposed CGAN therefore reconstructs the missing regions better than the compared methods. © 2013 IEEE.

Keywords: Image reconstruction

Fake News Detection on Social Media Using Ensemble Methods

Computers, Materials & Continua, 2024, vol. 81, no. 12, pp. 4525-4549

Authors: Muhammad Ali Ilyas; Abdul Rehman; Assad Abbas; Dongsun Kim; Muhammad Tahir Naseem; Nasro Min Allah. Affiliations: Department of Computer Science, COMSATS University, Islamabad 45550, Pakistan; School of Computer Science and Engineering, Kyungpook National University, Daegu 41566, Republic of Korea; Department of Computer Science and Engineering, Korea University, Seoul 02841, Republic of Korea; Department of Electronic Engineering, Yeungnam University, Gyeongsan-si 38541, Republic of Korea; Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam 34223, Saudi Arabia

In an era dominated by information dissemination through various channels like newspapers, social media, radio, and television, the surge in content production, especially on social platforms, has amplified the challenge of distinguishing between truthful and deceptive *** news, a prevalent issue, particularly on social media, complicates the assessment of news *** pervasive spread of fake news not only misleads the public but also erodes trust in legitimate news sources, creating confusion and polarizing *** the volume of information grows, individuals increasingly struggle to discern credible content from false narratives, leading to widespread misinformation and potentially harmful *** numerous methodologies proposed for fake news detection, including knowledge-based, language-based, and machine-learning approaches, their efficacy often diminishes when confronted with high-dimensional datasets and data riddled with noise or *** study addresses this challenge by evaluating the synergistic benefits of combining feature extraction and feature selection techniques in fake news *** employ multiple feature extraction methods, including Count Vectorizer, Bag of Words, Global Vectors for Word Representation (GloVe), Word to Vector (Word2Vec), and Term Frequency-Inverse Document Frequency (TF-IDF), alongside feature selection techniques such as Information Gain, Chi-Square, Principal Component Analysis (PCA), and Document *** comprehensive approach enhances the model’s ability to identify and analyze relevant features, leading to more accurate and effective fake news *** findings highlight the importance of a multi-faceted approach, offering a significant improvement in model accuracy and ***, the study emphasizes the adaptability of the proposed ensemble model across diverse datasets, reinforcing its potential for broader application in real-world *** introduce a pioneering ensemble

Keywords: Fake news detection; Machine Learning (ML); Deep Learning (DL); Chi-Square; ensembling
