Audio deepfakes are highly realistic fake audio recordings produced by AI tools that clone human voices. Advances in text-to-speech (TTS) and voice conversion (VC) technologies have made it easier to create convincing synthetic and imitative speech, turning audio deepfakes into a common and potentially dangerous form of deception. Well-known people, such as politicians and celebrities, are frequently targeted and made to appear to say controversial things in fake recordings, causing trouble on social media; even children's voices are cloned to scam parents into ransom payments. Developing effective algorithms to distinguish deepfake audio from real audio is therefore critical to preventing such fraud. Various machine learning (ML) and deep learning (DL) techniques have been proposed to identify audio deepfakes, but most of these solutions are trained on datasets in English, Portuguese, French, and Spanish, raising concerns about their accuracy for other languages. The main goal of the research presented in this paper is to evaluate the effectiveness of deep neural networks in detecting audio deepfakes in the Urdu language. Since no suitable Urdu audio dataset was available for this purpose, we created our own dataset (URFV) containing both genuine and fake recordings. The original/real Urdu recordings were gathered from random YouTube podcasts, and deepfake versions were generated with the RVC model. The dataset has three versions, with clips of 5, 10, and 15 seconds. We built several deep neural networks (RNN+LSTM, CNN+attention, TCN, CNN+RNN) to detect deepfake audio produced through imitation or synthesis. The proposed approach extracts Mel-frequency cepstral coefficient (MFCC) features from the audio in the dataset. When tested and evaluated, our models achieved noteworthy accuracy across the datasets: the RNN+LSTM model reached 97.78% (5 s), 98.89% (10 s), and 98.33% (15 s).
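The abstract does not include the authors' implementation, so the sketch below only illustrates the pipeline it describes: MFCC extraction followed by an RNN+LSTM binary classifier, here written with librosa and PyTorch. The library choices and all hyperparameters (13 MFCCs, hidden size 128, two LSTM layers) are assumptions for illustration, not the paper's settings.

```python
# Minimal sketch of the described pipeline: MFCC features -> LSTM binary classifier.
# Library choices (librosa, PyTorch) and all hyperparameters are assumptions for
# illustration; the paper does not specify its implementation details.
import librosa
import torch
import torch.nn as nn


def extract_mfcc(path: str, sr: int = 16000, n_mfcc: int = 13) -> torch.Tensor:
    """Load a clip and return its MFCC sequence with shape (frames, n_mfcc)."""
    audio, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    return torch.from_numpy(mfcc.T).float()


class LSTMDeepfakeDetector(nn.Module):
    """LSTM over MFCC frames; the last hidden state feeds a real-vs-fake head."""

    def __init__(self, n_mfcc: int = 13, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(n_mfcc, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # 0 = real, 1 = deepfake

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, (h_n, _) = self.lstm(x)        # h_n: (num_layers, batch, hidden)
        return self.head(h_n[-1])         # logits: (batch, 2)


# Usage sketch on a single (hypothetical) clip path:
# feats = extract_mfcc("clip_5s.wav").unsqueeze(0)   # (1, frames, 13)
# logits = LSTMDeepfakeDetector()(feats)
# prediction = logits.argmax(dim=-1)
```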
Partial-label learning (PLL) is a typical weakly supervised learning problem in which each training instance is annotated with a set of candidate labels. Self-training PLL models achieve state-of-the-art performance but suffer from error accumulation caused by mistakenly disambiguated instances. Although co-training can alleviate this issue by training two networks simultaneously and letting them interact, most existing co-training methods train two structurally identical networks on the same task, i.e., they are symmetric, which leaves the networks unable to correct each other because they share similar limitations. Therefore, in this paper we propose an asymmetric dual-task co-training PLL model called AsyCo, which forces its two networks, i.e., a disambiguation network and an auxiliary network, to learn from different views explicitly by optimizing distinct tasks. Specifically, the disambiguation network is trained with a self-training PLL task to learn label confidence, while the auxiliary network is trained in a supervised learning paradigm to learn from noisy pairwise similarity labels constructed from the learned label confidence. Finally, the error accumulation problem is mitigated via information distillation and confidence refinement. Extensive experiments on both uniform and instance-dependent partially labeled datasets demonstrate the effectiveness of AsyCo.
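As a hedged illustration of one step described above, the sketch below turns per-instance label confidence into noisy pairwise similarity labels for the auxiliary network using the common "same most-confident label means similar" rule; the exact construction used by AsyCo may differ.

```python
# Sketch of one ingredient the abstract describes: turning per-instance label
# confidence into noisy pairwise similarity labels for the auxiliary network.
# The exact construction in AsyCo may differ; this is an illustrative reading.
import torch


def pairwise_similarity_labels(confidence: torch.Tensor) -> torch.Tensor:
    """confidence: (batch, num_classes) label-confidence matrix from the
    disambiguation network. Returns a (batch, batch) 0/1 matrix whose entry
    (i, j) is 1 if instances i and j share the same most-confident label."""
    pseudo = confidence.argmax(dim=1)                                # (batch,)
    return (pseudo.unsqueeze(0) == pseudo.unsqueeze(1)).float()


# Usage sketch: confidences for 4 instances over 3 candidate classes.
conf = torch.tensor([[0.7, 0.2, 0.1],
                     [0.6, 0.3, 0.1],
                     [0.1, 0.1, 0.8],
                     [0.2, 0.7, 0.1]])
print(pairwise_similarity_labels(conf))
# Instances 0 and 1 are marked similar; the labels are "noisy" because the
# confidences themselves come from a partially supervised, self-trained model.
```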
In recent years, deep learning has significantly advanced skin lesion segmentation. However, annotating medical image data is specialized and costly, while obtaining unlabeled medical data is easier. To address this c...
Detecting dangerous driving behavior is a critical research area focused on identifying and preventing actions that could lead to traffic accidents, such as smoking, drinking, yawning, and drowsiness, through technica...
Image captioning is an interdisciplinary research hotspot at the intersection of computer vision and natural language processing, representing a multimodal task that integrates core technologies from both fields. This...
With the development of artificial intelligence, deep learning has been increasingly used to achieve automatic detection of geographic information, replacing manual interpretation and improving efficiency. However, re...
In order to dynamically create a sequence of textual descriptions for images, image description models often make use of the attention mechanism, which involves an automatic focus on different regions within an image....
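For readers unfamiliar with the mechanism this abstract refers to, the sketch below shows a minimal soft-attention step over image-region features; the dot-product scoring and the shapes are illustrative assumptions rather than the particular model proposed in the paper.

```python
# Minimal sketch of soft attention over image-region features, the mechanism
# the abstract refers to; dot-product scoring and shapes are assumptions made
# for illustration, not the specific model proposed in the paper.
import torch
import torch.nn.functional as F


def attend(regions: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
    """regions: (num_regions, dim) visual features; query: (dim,) decoder state.
    Returns the attention-weighted context vector of shape (dim,)."""
    scores = regions @ query              # (num_regions,) relevance scores
    weights = F.softmax(scores, dim=0)    # focus distribution over regions
    return weights @ regions              # weighted sum = context vector


# Usage sketch: 36 region features of dimension 512 and one decoder state.
context = attend(torch.randn(36, 512), torch.randn(512))
```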
Safety equipment detection is an important application of object detection, receiving widespread attention in fields such as smart construction sites and video surveillance. Significant progress has been made in objec...
Current automatic segment extraction techniques for identifying target characters in videos have several limitations, including low accuracy, slow processing speeds, and poor adaptability to diverse scenes. This paper...
With the development of deep learning in recent years, code representation learning techniques have become the foundation of many software engineering tasks such as program classification [1] and defect detection. Earlier approaches treat code as a token sequence and use CNN, RNN, and Transformer models to learn code representations.
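As a rough illustration of this token-sequence view of code, the sketch below embeds token ids, encodes them with a small Transformer encoder, and mean-pools the result into a single code vector for a task such as program classification; the vocabulary size, model width, and toy tokenizer are assumptions made purely for illustration.

```python
# Sketch of the "code as a token sequence" approach: embed token ids, run a
# Transformer encoder, and pool into one code vector for a downstream task such
# as program classification. All sizes and the toy tokenizer are illustrative.
import torch
import torch.nn as nn


class TokenSequenceEncoder(nn.Module):
    def __init__(self, vocab_size: int = 10000, dim: int = 128, num_classes: int = 10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        """token_ids: (batch, seq_len) integer ids of source-code tokens."""
        hidden = self.encoder(self.embed(token_ids))   # (batch, seq_len, dim)
        code_vec = hidden.mean(dim=1)                  # mean-pool to (batch, dim)
        return self.classifier(code_vec)               # program-class logits


# Usage sketch with a toy whitespace "tokenizer" mapping tokens to arbitrary ids:
tokens = "def add ( a , b ) : return a + b".split()
ids = torch.tensor([[hash(t) % 10000 for t in tokens]])
logits = TokenSequenceEncoder()(ids)
```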