Audio deepfakes are highly realistic fake audio recordings produced by AI tools that clone human voices. Advances in text-to-speech (TTS) and voice conversion (VC) technologies have made it easier to create convincing synthetic and imitative speech, turning audio deepfakes into a common and potentially dangerous form of deception. Well-known people, such as politicians and celebrities, are frequently targeted and made to appear to say controversial things in fake recordings, causing trouble on social media; even children's voices are cloned to scam parents into ransom payments. Developing effective algorithms to distinguish deepfake audio from real audio is therefore critical to preventing such fraud. Various machine learning (ML) and deep learning (DL) techniques have been proposed to identify audio deepfakes, but most of these solutions are trained on datasets in English, Portuguese, French, and Spanish, raising concerns about their accuracy for other languages. The main goal of the research presented in this paper is to evaluate the effectiveness of deep neural networks in detecting audio deepfakes in the Urdu language. Since no suitable Urdu audio dataset was available for this purpose, we created our own dataset (URFV) containing both genuine and fake recordings. The original/real Urdu recordings were gathered from random YouTube podcasts, and deepfake versions were generated with the RVC model. The dataset has three versions, with clips of 5, 10, and 15 seconds. We built several deep neural networks (RNN+LSTM, CNN+attention, TCN, CNN+RNN) to detect deepfake audio produced through imitation or synthesis. The proposed approach extracts Mel-frequency cepstral coefficient (MFCC) features from the audio in the dataset. When tested and evaluated, our models achieved noteworthy accuracy across the datasets: the RNN+LSTM model reached 97.78% (5 s), 98.89% (10 s), and 98.33% (15 s).
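The abstract does not include the authors' implementation, so the sketch below only illustrates the pipeline it describes: MFCC extraction followed by an RNN+LSTM binary classifier, here written with librosa and PyTorch. The library choices and all hyperparameters (13 MFCCs, hidden size 128, two LSTM layers) are assumptions for illustration, not the paper's settings.

```python
# Minimal sketch of the described pipeline: MFCC features -> LSTM binary classifier.
# Library choices (librosa, PyTorch) and all hyperparameters are assumptions for
# illustration; the paper does not specify its implementation details.
import librosa
import torch
import torch.nn as nn


def extract_mfcc(path: str, sr: int = 16000, n_mfcc: int = 13) -> torch.Tensor:
    """Load a clip and return its MFCC sequence with shape (frames, n_mfcc)."""
    audio, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    return torch.from_numpy(mfcc.T).float()


class LSTMDeepfakeDetector(nn.Module):
    """LSTM over MFCC frames; the last hidden state feeds a real-vs-fake head."""

    def __init__(self, n_mfcc: int = 13, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(n_mfcc, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # 0 = real, 1 = deepfake

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, (h_n, _) = self.lstm(x)        # h_n: (num_layers, batch, hidden)
        return self.head(h_n[-1])         # logits: (batch, 2)


# Usage sketch on a single (hypothetical) clip path:
# feats = extract_mfcc("clip_5s.wav").unsqueeze(0)   # (1, frames, 13)
# logits = LSTMDeepfakeDetector()(feats)
# prediction = logits.argmax(dim=-1)
```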
Partial-label learning (PLL) is a typical weakly supervised learning problem in which each training instance is annotated with a set of candidate labels. Self-training PLL models achieve state-of-the-art performance but suffer from error accumulation caused by mistakenly disambiguated instances. Although co-training can alleviate this issue by training two networks simultaneously and letting them interact, most existing co-training methods train two structurally identical networks on the same task, i.e., they are symmetric, which leaves the networks unable to correct each other because they share similar limitations. Therefore, in this paper we propose an asymmetric dual-task co-training PLL model called AsyCo, which forces its two networks, i.e., a disambiguation network and an auxiliary network, to learn from different views explicitly by optimizing distinct tasks. Specifically, the disambiguation network is trained with a self-training PLL task to learn label confidence, while the auxiliary network is trained in a supervised learning paradigm to learn from noisy pairwise similarity labels constructed from the learned label confidence. Finally, the error accumulation problem is mitigated via information distillation and confidence refinement. Extensive experiments on both uniform and instance-dependent partially labeled datasets demonstrate the effectiveness of AsyCo.
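As a hedged illustration of one step described above, the sketch below turns per-instance label confidence into noisy pairwise similarity labels for the auxiliary network using the common "same most-confident label means similar" rule; the exact construction used by AsyCo may differ.

```python
# Sketch of one ingredient the abstract describes: turning per-instance label
# confidence into noisy pairwise similarity labels for the auxiliary network.
# The exact construction in AsyCo may differ; this is an illustrative reading.
import torch


def pairwise_similarity_labels(confidence: torch.Tensor) -> torch.Tensor:
    """confidence: (batch, num_classes) label-confidence matrix from the
    disambiguation network. Returns a (batch, batch) 0/1 matrix whose entry
    (i, j) is 1 if instances i and j share the same most-confident label."""
    pseudo = confidence.argmax(dim=1)                                # (batch,)
    return (pseudo.unsqueeze(0) == pseudo.unsqueeze(1)).float()


# Usage sketch: confidences for 4 instances over 3 candidate classes.
conf = torch.tensor([[0.7, 0.2, 0.1],
                     [0.6, 0.3, 0.1],
                     [0.1, 0.1, 0.8],
                     [0.2, 0.7, 0.1]])
print(pairwise_similarity_labels(conf))
# Instances 0 and 1 are marked similar; the labels are "noisy" because the
# confidences themselves come from a partially supervised, self-trained model.
```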
In recent years, deep learning has significantly advanced skin lesion segmentation. However, annotating medical image data is specialized and costly, while obtaining unlabeled medical data is easier. To address this c...
Detecting dangerous driving behavior is a critical research area focused on identifying and preventing actions that could lead to traffic accidents, such as smoking, drinking, yawning, and drowsiness, through technica...
Image captioning is an interdisciplinary research hotspot at the intersection of computer vision and natural language processing, representing a multimodal task that integrates core technologies from both fields. This...
With the development of artificial intelligence, deep learning has been increasingly used to achieve automatic detection of geographic information, replacing manual interpretation and improving efficiency. However, re...
In order to dynamically create a sequence of textual descriptions for images, image description models often make use of the attention mechanism, which involves an automatic focus on different regions within an image....
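For readers unfamiliar with the mechanism this abstract refers to, the sketch below shows a minimal soft-attention step over image-region features; the dot-product scoring and the shapes are illustrative assumptions rather than the particular model proposed in the paper.

```python
# Minimal sketch of soft attention over image-region features, the mechanism
# the abstract refers to; dot-product scoring and shapes are assumptions made
# for illustration, not the specific model proposed in the paper.
import torch
import torch.nn.functional as F


def attend(regions: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
    """regions: (num_regions, dim) visual features; query: (dim,) decoder state.
    Returns the attention-weighted context vector of shape (dim,)."""
    scores = regions @ query              # (num_regions,) relevance scores
    weights = F.softmax(scores, dim=0)    # focus distribution over regions
    return weights @ regions              # weighted sum = context vector


# Usage sketch: 36 region features of dimension 512 and one decoder state.
context = attend(torch.randn(36, 512), torch.randn(512))
```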
Safety equipment detection is an important application of object detection, receiving widespread attention in fields such as smart construction sites and video surveillance. Significant progress has been made in objec...
Current automatic segment extraction techniques for identifying target characters in videos have several limitations, including low accuracy, slow processing speeds, and poor adaptability to diverse scenes. This paper...
With the development of deep learning in recent years, code representation learning techniques have become the foundation of many software engineering tasks such as program classification [1] and defect detection. Earlier approaches treat code as a token sequence and use CNN, RNN, and Transformer models to learn code representations.
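As a rough illustration of this token-sequence view of code, the sketch below embeds token ids, encodes them with a small Transformer encoder, and mean-pools the result into a single code vector for a task such as program classification; the vocabulary size, model width, and toy tokenizer are assumptions made purely for illustration.

```python
# Sketch of the "code as a token sequence" approach: embed token ids, run a
# Transformer encoder, and pool into one code vector for a downstream task such
# as program classification. All sizes and the toy tokenizer are illustrative.
import torch
import torch.nn as nn


class TokenSequenceEncoder(nn.Module):
    def __init__(self, vocab_size: int = 10000, dim: int = 128, num_classes: int = 10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        """token_ids: (batch, seq_len) integer ids of source-code tokens."""
        hidden = self.encoder(self.embed(token_ids))   # (batch, seq_len, dim)
        code_vec = hidden.mean(dim=1)                  # mean-pool to (batch, dim)
        return self.classifier(code_vec)               # program-class logits


# Usage sketch with a toy whitespace "tokenizer" mapping tokens to arbitrary ids:
tokens = "def add ( a , b ) : return a + b".split()
ids = torch.tensor([[hash(t) % 10000 for t in tokens]])
logits = TokenSequenceEncoder()(ids)
```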