Technology in today’s world has shifted payment methods from cash to digital or card payments. Therefore, in these situations, the authentication system is essential for guaranteeing the legitimacy of a ...
ISBN (digital): 9783031390593
ISBN (print): 9783031390586; 9783031390593
We explore the use of Wav2Vec 2.0, NeMo, and ESPNet models trained on a Macedonian-language dataset for the development of Automatic Speech Recognition (ASR) models for low-resource languages. The study aims to evaluate the performance of recent state-of-the-art speech recognition models in low-resource languages, such as Macedonian, where limited resources are available for training or fine-tuning. The paper presents the methodology used for data collection and preprocessing, as well as the details of the three architectures used in the study. The study evaluates the performance of each model using the word error rate (WER) and character error rate (CER) metrics and provides a comparative analysis of the results. The findings show that Wav2Vec 2.0 outperformed the other models for the Macedonian language, with a WER of 0.21 and a CER of 0.09; however, the NeMo and ESPNet models are still good candidates for creating ASR tools for low-resource languages such as Macedonian. The research provides insights into the effectiveness of different models for ASR in low-resource languages and highlights the potential of using these models to develop ASR tools for other languages in the future. These findings have significant implications for the development of ASR tools for other low-resource languages, and can potentially improve access to speech recognition technology for individuals and communities who speak these languages.
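The WER and CER metrics named in the abstract above are conventionally computed as an edit distance divided by the reference length, over words and characters respectively. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function names are illustrative, and any text normalization the study may have applied (casing, punctuation) is not reproduced here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from the reference
                            curr[j - 1] + 1,      # insertion into the reference
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Under this definition, a WER of 0.21 corresponds to roughly 21 word-level edits per 100 reference words.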
ISBN (print): 9789819787012
Despite recent advances in video action recognition achieving strong performance on existing benchmarks, these models often lack robustness when faced with natural distribution shifts between training and test data. We propose two novel evaluation methods to assess model resilience to such distribution disparity. The first method uses two datasets collected from different sources, one for training and validation and the other for testing. More precisely, we created dataset splits that use HMDB-51 or UCF-101 for training and Kinetics-400 for testing, restricted to the subset of classes that overlap between the train and test datasets. The second method extracts the feature mean of each class (i.e., the class prototype) from the target evaluation dataset’s training data, and predicts the label of a test video from the cosine similarity between the sample and the prototype of each target class. This procedure does not alter model weights using the target dataset and does not require aligning the overlapping classes of two different datasets; it is therefore a very efficient way to test model robustness to distribution shifts without prior knowledge of the target distribution. We address the robustness problem with adversarial augmentation training, which generates augmented views of videos that are "hard" for the classification model by applying gradient ascent on the augmentation parameters, together with "curriculum" scheduling of the strength of the video augmentations. We experimentally demonstrate the superior performance of the proposed adversarial augmentation approach over baselines across three state-of-the-art action recognition models: TSM, Video Swin Transformer, and Uniformer. Our curated datasets and source code are publicly available (https://***/kiyoon/video-adversarial-augmentation). The presented work provides critical insight into model robustness to distribution shifts and presents effective techniques to enhance video action recogni
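The prototype-based evaluation described in the abstract above can be sketched as follows: average the features of each class to obtain a prototype, then label a test sample with the class whose prototype has the highest cosine similarity. This is a minimal illustration under stated assumptions, not the authors' released code; features are assumed to be already extracted by a frozen model and are represented as plain lists of floats, and all function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally sized feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_prototypes(features_by_class):
    """Map each class label to the mean of its training features."""
    return {label: mean_vector(feats) for label, feats in features_by_class.items()}


def predict(feature, prototypes):
    """Return the label whose prototype is most similar to the sample."""
    return max(prototypes, key=lambda label: cosine(feature, prototypes[label]))


# Toy 2-D features standing in for video embeddings (made-up values).
prototypes = build_prototypes({
    "run": [[1.0, 0.1], [0.9, 0.2]],
    "jump": [[0.1, 1.0], [0.2, 0.8]],
})
label = predict([0.8, 0.3], prototypes)
```

Because only feature means are computed on the target data, the model weights stay untouched, which matches the property the abstract emphasizes.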
In response to the challenges posed by sweeping jamming and Digital Radio Frequency Memory (DRFM) jamming in Frequency Modulation Continuous Wave (FMCW) fuzes, a jamming mitigation method based on deep learning method...
ISBN (print): 9783031278143; 9783031278150
Customer journey analysis is important for organizations that want to learn as much as possible about the main behavior of their customers. It provides the basis for improving the customer experience within the organization. This paper addresses the problem of predicting whether a certain activity of interest will occur in the remainder of a customer journey, following the occurrence of another specific activity. For this, we propose the HIAP framework, which uses process mining techniques to analyze customer journeys. Different prediction models are compared to determine which is most suitable for high importance activity prediction. Furthermore, the effect of using a sliding window or a landmark model for (re)training a model is investigated. The framework is evaluated using a real-world health insurance dataset and a benchmark dataset. The efficiency and prediction quality results highlight the usefulness of the framework under various realistic online business settings.
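The abstract above contrasts a sliding window with a landmark model for (re)training but does not define them. The sketch below follows the usual data-stream meaning of the two terms, which is an assumption here: a sliding window keeps only the most recent events, while a landmark model keeps everything since a fixed starting point. The function names and the index-based event stream are hypothetical and are not taken from the HIAP framework.

```python
def sliding_window(events, now, window_size):
    """Training data = the `window_size` most recent events before index `now`."""
    return events[max(0, now - window_size):now]


def landmark_window(events, now, landmark=0):
    """Training data = all events from the fixed `landmark` index up to `now`."""
    return events[landmark:now]


# Toy stream: event identifiers in arrival order.
stream = list(range(10))

recent_only = sliding_window(stream, now=8, window_size=3)
since_landmark = landmark_window(stream, now=8, landmark=2)
```

The trade-off this illustrates: the sliding window has bounded size and forgets old behavior, whereas the landmark set grows with every retraining step.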
ISBN (print): 9789811977527
The proceedings contain 80 papers. The special focus in this conference is on Communication, Computing and Electronics Systems. The topics include: Smart Door Unlocking System; An Insight into AI and ICT Towards Sustainable Manufacturing; FPGA Implementation of Efficient 32-Bit 3-Operand Addition Using Kogge–Stone (KS) Parallel Prefix Adder; Automatic Mulching Machine; Multi-user Hybrid Beamforming for mmWave Systems Using Learning-Aided Link Adaptation; Prediction of Disease Using Retinal Image in Deep Learning; Plant Health Analyzer Using Convolutional Neural Networks; Behaviors of Modern Game Non-playable Characters; Low-Noise Amplifier with Co-designed Microstrip Antenna for 60 GHz Wireless Communications; Impact of High Dimensionality Reduction in Financial Datasets of SMEs with Feature Pre-processing in Data Mining; Generation of Counters and Compressors Using Sorting Network; Deep Learning-Based Triphase Community Detection for Multimedia Data; Design of a High-Speed and Low-Power AES Architecture; Improving Sleep Apnea Screening with Variational Mode Decomposition and Deep Learning Techniques; Effect of Selectively-Filled-Ethanol on Dispersion Characteristics of Circular Shaped Hollow Core Photonic Crystal Fiber; A Demand Management Planning System for a Meat Factory Based on the Predicted Market Price Under Indian Market Scenario; Performance Comparison of MCML, PFSCL, and Dynamic CML Gates with Parametric Analysis in 45 nm CMOS Technology; VANET Authentication with Privacy-Preserving Schemes: A Survey; An Effective Protection Approach for Deceive Attacker in AES Attack; Effective EMI Reduction in Medical Devices and Automotive Power Converters; Conformal Antenna with Bow and Arrow Shaped Radiator for Wireless Capsule Endoscopy; Gujarati Language Automatic Speech Recognition Using Integrated Feature Extraction and Hybrid Acoustic Model; A CNN-Based Underage Driver Detection System.
ISBN (print): 9781665467735
The proceedings contain 191 papers. The topics discussed include: virtual inertia control strategy for PV using DC capacitive and electrochemical energy storage; liquid drip detection in power plant based on machine vision; intelligent recognition algorithm of specific pattern content information based on mobile terminal equipment; research on power communication defect diagnosis technology based on unsupervised learning; key technologies of high speed modulator for satellite data transmission; research on privacy fraud detection of logistic regression based on homomorphic encryption; application research on X-ray imaging technology for online detection of gas insulated switchgear; power transformer state identification method based on operational deflection shapes and visual measurement technology; research on identification method of potential abnormal station for miniature circuit breaker; trajectory tracking error optimization based on iterative learning; research and design of the intelligent bulk transfer laboratory interconnection; energy management cloud platform in softwarized energy internet; simulation and application of a new power stealing method based on the comparison of two types of transformers; lane line detection based on machine vision; an analysis method of GIS equipment fault causes based on online monitoring and joint diagnosis; and virtual prototype design of flexible exoskeleton based on pattern recognition technology.
ISBN (digital): 9781665490627
ISBN (print): 9781665490627
Recently, PWLU has been proposed to learn specialized activation functions with a straightforward piecewise linear definition and SOTA performance across different vision tasks and neural networks. However, its uniformly distributed intervals strongly limit the flexibility of PWLU, and its definition requires a statistic-based realignment method to handle the misalignment between PWLU and the input data. This paper proposes a new piecewise linear activation function called Nonuniform Piecewise Linear Unit (N-PWLU). N-PWLU has two advantages that overcome the drawbacks of PWLU. First, nonuniformly distributed intervals are used to increase flexibility. Second, the cumulative definition establishes close connections between the parameters in different intervals, which helps alleviate the misalignment issue. With these advantages, N-PWLU significantly outperforms PWLU, especially with fewer intervals. For example, on the ImageNet classification dataset, 4-interval N-PWLU outperforms 4-interval PWLU by 1.15% top-1 accuracy in MobileNetV3. Moreover, the expressivity of 4-interval N-PWLU is comparable with that of 16-interval PWLU across different datasets and architectures. Fewer intervals simplify the computation of N-PWLU, which makes it well suited to deployment on edge devices. We believe that N-PWLU takes a step forward in learning better parametric activation functions.
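The abstract above does not give the N-PWLU formula, so the following is only a generic sketch of a piecewise linear function with nonuniform intervals in which breakpoints and their values are built cumulatively from per-interval widths and slopes. It is an assumption meant to illustrate the two ideas named in the abstract, not the paper's definition; the function name `make_pwl` and all parameter names are hypothetical, and in a trained network these parameters would be learned rather than fixed.

```python
from itertools import accumulate


def make_pwl(x_start, y_start, widths, slopes, left_slope, right_slope):
    """Build f(x) from cumulative breakpoints.

    Breakpoint k is x_start plus the sum of the first k widths, and its value
    is y_start plus the sum of slope * width over the first k intervals, so a
    change to one interval shifts every later breakpoint as well.
    """
    if len(widths) != len(slopes) or not widths:
        raise ValueError("widths and slopes must be non-empty and equally long")
    if any(w <= 0 for w in widths):
        raise ValueError("interval widths must be positive")

    xs = [x_start] + [x_start + c for c in accumulate(widths)]
    rises = [w * s for w, s in zip(widths, slopes)]
    ys = [y_start] + [y_start + c for c in accumulate(rises)]

    def f(x):
        if x < xs[0]:                      # left of the first breakpoint
            return ys[0] + left_slope * (x - xs[0])
        if x >= xs[-1]:                    # right of the last breakpoint
            return ys[-1] + right_slope * (x - xs[-1])
        for k in range(len(widths)):       # locate the containing interval
            if x < xs[k + 1]:
                return ys[k] + slopes[k] * (x - xs[k])

    return f


# Two intervals of unequal width: flat on [-1, -0.5), slope 1 on [-0.5, 1).
act = make_pwl(-1.0, 0.0, widths=[0.5, 1.5], slopes=[0.0, 1.0],
               left_slope=0.0, right_slope=1.0)
```

With few intervals the lookup loop stays short, which is consistent with the abstract's point that fewer intervals simplify computation.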
Models based on machine learning are optimization models that collect data, assess it, and deliver the reports required by specialists and management to make the best decisions. The application of contemporary machine...
The aim of this paper is to develop a machine-learning (ML) model for cyberattack prediction on the UNSW-NB15 dataset. ML is a subset of AI and a powerful tool for the detection of cyberattacks. It uses existing data to i...