Referring Expression Comprehension is a multimodal task that involves two fields: Computer Vision and Natural Language Processing. Specifically, the task is to locate the image region that correspo...
With the advancement of deep learning technology, the accuracy of object detection in remote sensing images is significantly influenced by the data sources. Fusing optical and Synthetic Aperture Radar (SAR) images can...
While a number of highly accurate 2D and 3D pose estimation models already exist, special domains like sports, which usually require even higher accuracy, still leave room for improvement. Existing po...
Data-driven neural network models trained on human motion data facilitate human activity recognition and identity verification applications. However, large annotated and processed human motion datasets are scarce, lea...
Animal size impacts locomotion, due to its effect on the influence of gravity, inertia, and an animal’s internal elasticity and damping. We are developing a cat robot to further explore the impact of these four chara...
This study simulates and examines the MEMS design process for a hypothetical sensor with an electromechanical interface. Conceptualizing the sensor entails establishing its purpose, operational environment, and capacit...
This study evaluates the relevance of Phillips’ reward taxonomy in the context of contemporary video game design, with implications for modern gamification strategies. Phillips’ taxonomy categorizes game rewards int...
CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is an automated verification mechanism designed to distinguish between human visitors and automated systems as a vital component for...
This study is relevant because it addresses the problem of developing the basic principles and algorithms for providing adaptive settings for the intelligent control systems of autonomous robots as part of a human-...
ISBN: (print) 9789819620531; 9789819620548
With the advancement of information technology, Computer-Assisted Pronunciation Training (CAPT) has become an effective method for non-native (L2) speakers to learn foreign language pronunciation. However, existing automatic pronunciation quality assessment methods do not fully leverage inter-granularity relationships and do not further extract contextual features at each granularity. To address these issues, this paper proposes Bfhaformer. Bfhaformer employs an LSTM-augmented BranchFormer encoder to encode GOP features and reference phoneme features. Compared to Transformer encoders, the BranchFormer encoder introduces parallel branch structures, which enhance the capture of local features while retaining global feature information. Additionally, this paper aggregates features across different granularities within a hierarchical model structure. By aggregating the encoded features and fusing suprasegmental features at pronunciation granularities such as the word level and the utterance level, the model pays closer attention to local information at the current granularity and to contextual hierarchical relationships. Experiments on the publicly available Speechocean762 dataset demonstrate that the proposed method significantly improves all metrics at all granularities compared to the baseline models.