Accents, characterized by deviations from standard pronunciation, often lead to a sharp decline in the performance of speech recognition systems. This issue becomes even more serious when dealing with unsupervised accents under low-resource conditions. The effective utilization of limited unsupervised accent speech data to enhance accent-robust ASR remains largely unexplored and presents significant challenges. In this study, we introduce a novel approach termed Joint Unsupervised and Supervised Data Training (JUSDT) to address this issue. In JUSDT, supervised ASR training and unsupervised representation learning are treated as two distinct tasks but are trained jointly in a single training process. Building upon JUSDT, we further explore two variants: Phoneme-Quantizer JUSDT, which employs phoneme codebook learning to generate accent-invariant representations for accent normalization, and JUSDT+, which enables supervised training on unsupervised accent speech data to better incorporate both textual and acoustic contextual information. Our experiments are performed on both the Librispeech dataset and accented English ASR tasks. Results demonstrate that the proposed methods outperform our strong baseline, with relative word error rate reductions of 3.2% to 9.3% across multiple test sets.
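For readers less familiar with the metric, a relative word error rate (WER) reduction expresses the improvement as a fraction of the baseline WER rather than as an absolute difference. The sketch below shows the arithmetic only; the function name and the WER values are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction, in percent, of new_wer over baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WER values for illustration only (not results from the paper):
# a baseline at 10.0% WER and a new system at 9.07% WER.
print(round(relative_wer_reduction(10.0, 9.07), 1))
```

Note that an absolute drop of under one WER point can still correspond to a relative reduction of several percent when the baseline WER is already low.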
Large industrial facilities such as particle accelerators and nuclear power plants are critical infrastructures for scientific research and industrial processes. These facilities are complex systems that not only requ...
Continuous monitoring of human vital signs using non-contact mmWave radars is attractive due to their ability to penetrate garments and operate under different lighting conditions. Unfortunately, most prior research r...
ISBN (print): 9798400717079
Pōwhiri is a traditional welcoming ceremony in Māori culture in Aotearoa New Zealand. Certain protocols (tikanga) have to be followed and must be learned. There is a general lack of understanding regarding these protocols due to the scarcity of pōwhiri. The immersive and experiential nature of Virtual Reality (VR) can be used as a tool to increase understanding and confidence. We have implemented a VR learning tool in the reconstructed context of Te Rau Aroha marae in Bluff which allows for safe practice of pōwhiri before applying it in a real welcoming ceremony. A cultural evaluation study was conducted first, followed by a user study to determine understanding and confidence gains, highlighting the complexity of the topic. The user study clearly showed that the system was useful for increasing participants’ understanding of pōwhiri as well as their confidence surrounding it. We are confident that our experiences and key findings from the studies can be used to drive further development of VR tools in the context of visualising cultural ceremonies.
To make social robots effective in education, they need to be autonomous both in assessing the student’s engagement state and in intervening effectively in soft real-time when necessary. The Hidden Markov Model (HMM) is an interpretable machine learning technique for modeling temporal data that is commonly used post-hoc to analyse latent learning processes. In this paper, we propose an HMM-based intervention methodology for assessing and classifying the state of the student as either productive or unproductive in soft real-time. The system identifies and tracks states and patterns not conducive to learning, and a robot intervention is triggered whenever an excessively high level of unproductive engagement is detected. In a pilot study with 22 children, we evaluate this methodology in terms of both 1) the effectiveness of the interventions on the students’ learning gains and on behaviors found conducive to learning, and 2) the students’ perception of the robotic interventions. Results suggest that the robot interventions have a positive effect on the post-test scores relative to the baseline robot, although there is no significant difference in the learning gains. Moreover, interventions that try to induce reflective behaviors are most effective in inducing the required learning behavior, followed by communication-inducing interventions. Lastly, students’ perception of intervention usefulness does not reflect their actual effectiveness.
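The idea of tracking a latent productive/unproductive state online and triggering an intervention can be illustrated with a normalized HMM forward pass. This is a minimal sketch under stated assumptions, not the authors' implementation: the two-state layout follows the abstract, but the observation symbols, all probabilities, the threshold, and the `patience` rule are hypothetical placeholders.

```python
from typing import List, Sequence

STATES = ("productive", "unproductive")

# All numbers below are hypothetical placeholders, not values from the paper.
INITIAL = [0.5, 0.5]
TRANSITION = [
    [0.9, 0.1],  # from productive
    [0.2, 0.8],  # from unproductive
]
# Hypothetical observation symbols: 0 = on-task action, 1 = idle, 2 = off-task action.
EMISSION = [
    [0.7, 0.2, 0.1],  # emitted while productive
    [0.1, 0.4, 0.5],  # emitted while unproductive
]


def forward_filter(observations: Sequence[int]) -> List[List[float]]:
    """Return P(state_t | obs_1..t) for every step (normalized forward pass)."""
    beliefs: List[List[float]] = []
    prior = INITIAL
    for t, obs in enumerate(observations):
        if t > 0:
            prev = beliefs[-1]
            # Predict: push the previous belief through the transition matrix.
            prior = [sum(prev[i] * TRANSITION[i][j] for i in range(2)) for j in range(2)]
        # Update: weight by the emission likelihood, then normalize.
        unnorm = [prior[j] * EMISSION[j][obs] for j in range(2)]
        total = sum(unnorm)
        beliefs.append([u / total for u in unnorm])
    return beliefs


def should_intervene(observations: Sequence[int],
                     threshold: float = 0.8,
                     patience: int = 3) -> bool:
    """Hypothetical trigger: unproductive belief above `threshold` for `patience` steps in a row."""
    streak = 0
    for belief in forward_filter(observations):
        streak = streak + 1 if belief[1] > threshold else 0
        if streak >= patience:
            return True
    return False


print(should_intervene([0, 0, 0, 0, 0]))     # only on-task observations
print(should_intervene([2, 2, 2, 2, 2, 2]))  # only off-task observations
```

Because the forward pass only needs the previous belief and the newest observation, it can run incrementally, which is what makes this kind of filtering suitable for soft real-time use.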
ISBN (digital): 9781728188591
ISBN (print): 9781665406802
A wider incorporation of robots into classrooms is hampered by current technological limitations on full autonomy in social robots. Automated speech recognition, for example, a key enabler for vocal communication, is still unable to perform with sufficient accuracy. Past studies have shown that humans adjust their speech patterns to accommodate less skilled interlocutors. If such a response holds in human-robot interactions as well, we may be able to exploit it to lessen the burden on social robots and enable rich, autonomous vocal communication. In this paper, we explore whether a robot’s speaking ability could have an impact on children’s speech patterns, learning, and engagement by designing an interaction where a child and a robot collaborate on a Tower of Hanoi puzzle. Sixteen children aged 7 to 14 completed this collaborative task partnered with a social robot that communicated with either high verbal (full sentences), low verbal (short phrases or single words), or non-verbal (sound-based utterances) vocalization. While we found no significant impact on children’s speech patterns or learning due to the robot’s method of vocalization, children in the non-verbal condition had a significantly lower perception of the robot’s intelligence along with higher rates of providing feedback and more instances of undoing its moves. This suggests that a link may exist between a robot’s perceived speaking ability and children’s confidence in that robot’s overall intelligence and capability in a collaborative task, as well as their empathy towards a peer they perceive as less skilled in the task.
ISBN (digital): 9781728188591
ISBN (print): 9781665406802
The social aspects of therapy and training are important for patients to avoid social isolation and must be considered when designing a platform, especially for home-based rehabilitation. We propose an online version of the previously proposed tangible Pacman game for upper limb training with haptic-enabled tangible Cellulo robots. Our main objective is to enhance motivation and engagement through social integration and to provide gamified multiplayer rehabilitation at a distance. This allows relatives, children, and friends to connect and play with their loved ones while also helping them with their training from anywhere in the world, and it connects therapists to their patients through haptic linking capabilities. This is especially relevant when social distancing measures might isolate the elderly population, who make up a majority of all rehabilitation patients.
Young adults often encounter challenges in career exploration. Self-guided interventions, such as the letter-exchange exercise, where participants envision and adopt the perspective of their future selves by exchangin...
This paper introduces a novel approach for enabling real-time imitation of human head motion by a Nao robot, with a primary focus on improving human-robot interactions. Using MediaPipe as a computer vision library and DeepFace as an emotion recognition library, this research aims to capture the subtleties of human head motion, including blink actions and emotional expressions, and to incorporate these indicators into the robot’s responses. The result is a comprehensive framework that facilitates precise head imitation within human-robot interactions, using a closed-loop approach that gathers real-time feedback on the robot’s imitation performance. This feedback loop ensures a high degree of accuracy in modeling head motion, as evidenced by R² scores of 96.3 for pitch and 98.9 for yaw. Notably, the proposed approach holds promise for improving communication for children with autism, offering them a valuable tool for more effective interaction. In essence, the proposed work explores the integration of real-time head imitation and real-time emotion recognition to enhance human-robot interactions, with potential benefits for individuals with unique communication needs.
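The R² score (coefficient of determination) quoted above measures how much of the variance in the human head angles is explained by the robot's imitated angles. The sketch below shows the standard formula, R² = 1 - SS_res / SS_tot, on a 0-to-1 scale; the angle values are hypothetical, and the abstract does not state how its scores were scaled or computed.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical pitch angles in degrees (not data from the paper).
human_pitch = [0.0, 5.0, 10.0, 15.0]
robot_pitch = [0.5, 4.5, 10.5, 14.5]
print(r2_score(human_pitch, robot_pitch))
```

A score of 1.0 means the imitated angles match the human angles exactly, while 0.0 means they do no better than always predicting the mean angle.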
Visual Simultaneous Localization and Mapping (vSLAM) has achieved great progress in the computer vision and robotics communities, and has been successfully used in many fields such as autonomous robot navigation and A...