This paper investigates the effect of bitrate control methods on the QoE of multi-view video and audio (MVV-A) streaming with MPEG-DASH. We apply three bitrate control methods designed for conventional single-view video streaming to an MVV-A system using MPEG-DASH. We conduct a subjective experiment while varying the available network bandwidth and investigate the effect of the methods on QoE.
Lampung Province is one of the regions with the highest cocoa production, but many factors can interfere with cocoa production. One of them is the factor of pests and diseases of cocoa that cannot be identified and pr...
This paper evaluates the QoE of video and audio transmission over a full-duplex wireless LAN with interference traffic through a computer simulation and a subjective experiment. The simulation environment consists of a pair of audiovisual transmission and reception terminals and a pair of interference-traffic transmission and reception terminals. We investigate how the transmission rate of the interference traffic and the communication distance in the wireless channel affect the output quality of the video and audio stream at the reception terminal. We then perform a subjective experiment using the video and audio output timing obtained from the simulation.
Wireless communication via unmanned aerial vehicles (UAVs) has drawn a great deal of attention due to its flexibility in establishing line-of-sight (LoS) communications. However, in complex urban and dynamic environme...
ISBN: 9798331508579 (electronic)
ISBN: 9798331508586 (print)
Emotion recognition can improve human-computer interaction by enabling systems to respond empathetically and adapt to users' emotional conditions. This capability improves user experience and supports the development of more intuitive and emotionally responsive communication systems. This study analyzes a gender-based (male and female) bimodal approach to recognizing emotions without contextual information in dialogue analysis. Using the Multimodal EmoryNLP dataset, extracted from the TV series Friends and consisting of acted speech, we focus on four primary emotions: Angry, Neutral, Joy, and Scared. RoBERTa is used for text classification, and wav2vec 2.0 is used for audio feature extraction with a Bi-LSTM model for classification. In terms of weighted F1-score, data augmentation improved performance on the original dataset from 46% to 52% and on the male dataset from 43% to 51%, while the female dataset remained at 46%. The weighted F1-score and Unweighted Average Recall (UAR) on the male dataset (51% and 48%, respectively) are higher than those on the female dataset (46% and 47%, respectively). The gender-based analysis indicates that the male and female datasets exhibit distinct performance patterns, highlighting variations in emotional expression and recognition between genders. These findings underscore the effectiveness of multimodal strategies in emotion recognition and suggest that gender-specific factors play a significant role in classification performance. While these results highlight performance trends, further validation through repeated trials and statistical analyses could provide stronger generalizations and insights into gender-based differences.
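The two metrics reported above can be illustrated with a minimal sketch. It assumes the standard definitions (weighted F1 as the support-weighted mean of per-class F1, UAR as the plain mean of per-class recall); the helper names and the toy labels are hypothetical and are not taken from the study.

```python
def per_class_stats(y_true, y_pred):
    """Return {label: (f1, recall, support)} for every label present in y_true."""
    stats = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        stats[label] = (f1, recall, tp + fn)  # support = number of true samples
    return stats


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    stats = per_class_stats(y_true, y_pred)
    total = sum(support for _, _, support in stats.values())
    return sum(f1 * support for f1, _, support in stats.values()) / total


def unweighted_average_recall(y_true, y_pred):
    """Per-class recall averaged with equal weight per class (UAR)."""
    stats = per_class_stats(y_true, y_pred)
    return sum(recall for _, recall, _ in stats.values()) / len(stats)


# Hypothetical toy labels, for illustration only.
y_true = ["Joy", "Joy", "Joy", "Neutral"]
y_pred = ["Joy", "Joy", "Neutral", "Neutral"]
print(weighted_f1(y_true, y_pred), unweighted_average_recall(y_true, y_pred))
```

Because UAR gives every class equal weight, it is less dominated by frequent classes than weighted F1, which is one reason the two are often reported together on imbalanced emotion datasets.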
Speech content is closely related to the stability of speaker embeddings in speaker verification tasks. In this paper, we propose a novel architecture based on self-constraint learning (SCL) and reconstruction task (R...
ISBN: 9798331314385 (print)
In our study, we explore methods for detecting unwanted content lurking in visual datasets. We provide a theoretical analysis demonstrating that a model capable of successfully partitioning visual data can be obtained using only textual data. Based on the analysis, we propose Hassle-Free Textual Training (HFTT), a streamlined method capable of acquiring detectors for unwanted visual content, using only synthetic textual data in conjunction with pre-trained vision-language models. HFTT features an innovative objective function that significantly reduces the necessity for human involvement in data annotation. Furthermore, HFTT employs a clever textual data synthesis method, effectively emulating the integration of unknown visual data distribution into the training process at no extra cost. The unique characteristics of HFTT extend its utility beyond traditional out-of-distribution detection, making it applicable to tasks that address more abstract concepts. We complement our analyses with experiments in out-of-distribution detection and hateful image detection. Our code is available at https://***/Saehyung-Lee/HFTT
Artificial Intelligence (AI) and marketing have transformed consumer behavior and shopping experiences, especially through Recommender Systems (RSs) in e-commerce. RSs use algorithms to provide personalized recommenda...