ISBN: 9781595939821 (print)
We propose a real-time gaze estimation method based on facial-feature tracking using a single video camera that does not require any special user action for calibration. Many gaze estimation methods have already been proposed; however, most conventional gaze-tracking algorithms can only be applied in experimental environments because of their complex calibration procedures and lack of usability. In this paper, we propose a gaze estimation method that can be applied to daily-life situations. Gaze directions are determined as 3D vectors connecting the eyeball centers and the iris centers. Since the eyeball center and radius cannot be directly observed from images, the geometrical relationship between the eyeball centers and the facial features, together with the eyeball radius (the face/eye model), is calculated in advance. Then, the 2D positions of the eyeball centers can be determined by tracking the facial features. While conventional methods require instructing users to perform special actions, such as looking at several reference points during calibration, the proposed method requires no such calibration action from users and can be realized by combining 3D eye-model-based gaze estimation with circle-based algorithms for eye-model calibration. Experimental results show that the gaze estimation accuracy of the proposed method is 5 degrees horizontally and 7 degrees vertically. With our proposed method, various applications that require gaze information in daily-life situations, such as gaze-communication robots and gaze-based interactive signboards, become possible.
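The geometric core of the abstract above, a gaze direction taken as the vector from the eyeball center to the iris center, can be illustrated with a short sketch. This is not the authors' implementation: the function name `gaze_angles`, the camera coordinate convention (x right, y down, z pointing away from the camera), and the yaw/pitch decomposition are all assumptions made for illustration.

```python
import math


def gaze_angles(eyeball_center, iris_center):
    """Return (yaw, pitch) in degrees for the eyeball-to-iris vector.

    Hypothetical convention: x points right, y points down, and z points
    from the camera into the scene, so a user looking straight at the
    camera has a gaze vector along negative z.
    """
    dx = iris_center[0] - eyeball_center[0]
    dy = iris_center[1] - eyeball_center[1]
    dz = iris_center[2] - eyeball_center[2]

    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    if norm == 0.0:
        raise ValueError("eyeball and iris centers coincide")

    # Horizontal angle: rotation about the vertical axis.
    yaw = math.degrees(math.atan2(dx, -dz))
    # Vertical angle: elevation above the horizontal plane (up is -y).
    pitch = math.degrees(math.atan2(-dy, math.hypot(dx, dz)))
    return yaw, pitch
```

Under this convention, a gaze aimed straight at the camera yields zero yaw and zero pitch, and the reported horizontal and vertical accuracies would correspond to errors in these two angles.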
ISBN: 9781424474936 (print)
This article introduces automatic speech recognition based on Electro-Magnetic Articulography (EMA). Movements of the tongue, lips, and jaw are tracked by an EMA device and used as features to build Hidden Markov Models (HMMs) that recognize speech from articulation alone, that is, without any audio information. Automatic phoneme recognition experiments are also conducted to examine the contribution of the EMA parameters to robust speech recognition. Using feature fusion, multistream HMM fusion, and late fusion methods, noisy audio speech has been integrated with EMA speech, and recognition experiments have been conducted. The results show that integrating the EMA parameters significantly increases an audio speech recognizer's accuracy in noisy environments.
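Of the three fusion strategies named above, late fusion is the simplest to illustrate: each recognizer scores the candidates independently, and the scores are combined afterwards. The sketch below assumes a weighted sum of per-phoneme log-likelihoods. The function name `late_fusion`, the weighting scheme, and the example scores are hypothetical; the abstract does not specify how the authors combined the streams.

```python
def late_fusion(audio_scores, ema_scores, audio_weight):
    """Combine per-phoneme log-likelihoods from two independent recognizers.

    audio_scores, ema_scores: dicts mapping phoneme label -> log-likelihood.
    audio_weight: weight in [0, 1] given to the audio stream (assumed to be
    lowered as the acoustic noise level rises).
    Returns (best_label, fused_scores).
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if audio_scores.keys() != ema_scores.keys():
        raise ValueError("both recognizers must score the same phonemes")

    fused = {
        label: audio_weight * audio_scores[label]
        + (1.0 - audio_weight) * ema_scores[label]
        for label in audio_scores
    }
    best_label = max(fused, key=fused.get)
    return best_label, fused
```

With a high audio weight the audio recognizer dominates the decision; lowering it lets the articulatory stream override an audio hypothesis that noise has made unreliable.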
Applying the technologies of a network robot system, we incorporate the recommendation methods used in e-commerce in a retail shop in the real world. We constructed a platform for ubiquitous-networked robots that focus...
This study addresses a method to predict pedestrians' long-term behavior so that a robot can provide them with services. To do that, we want to be able to predict their final goal and the trajectory t...
This paper introduces a "teleoperated communication robot" whose unique point is that its language component is performed by an operator to avoid the automatic recognition difficulty of spoken language. On t...
ISBN: 9787560848693 (print)
Mono-syllabic interjections are often used to express a reaction in conversational speech. It is known that there is a relationship between the speaking style, given by intonation and voice quality-related prosodic features, and the paralinguistic information carried by an interjection. However, it is also known that this relationship depends on the interjection type. In the present work, we analyzed the relationship between speaking style and the conveyed paralinguistic information item for several mono-syllabic interjection types in Japanese. Evaluation results show that acoustic parameters related to intonation and voice quality features, in conjunction with the identity of the interjection, are effective for disambiguating 71% of the paralinguistic information items.
Vocal fry is a voice quality that often appears in relaxed voices indicating low tension, or in pressed voices expressing attitudes/feelings of surprise, admiration and suffering. We propose a set of acoustic measures for automatically detecting vocal fry segments in speech utterances. To deal with vocal fry utterances with very low fundamental frequencies, where classic short-term analysis methods become problematic, a glottal pulse synchronized method is proposed. The acoustic measures are based on power, periodicity and similarity properties of vocal fry signals. The basic idea is to scan for power peaks in a "very short-term" power contour, check for periodicity properties, and evaluate a similarity measure between power peaks to decide whether they are likely to be vocal fry pulses. Sub-harmonic properties are also taken into account in the periodicity analysis. Evaluation of the proposed measures in automatic detection resulted in 73.3% correct detection, with an insertion error rate of 3.9%.
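The first step described above, scanning a "very short-term" power contour for peaks, can be sketched as follows. This is only an illustration under stated assumptions: the frame length, hop, rise threshold, and the function names `short_term_power_db` and `find_power_peaks` are hypothetical, and the periodicity and similarity checks that the abstract also mentions are not implemented here.

```python
import math


def short_term_power_db(samples, frame_len, hop):
    """Power contour in dB over short frames (a few ms each is assumed)."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # Small floor avoids log10(0) on silent frames.
        powers.append(10.0 * math.log10(energy + 1e-12))
    return powers


def find_power_peaks(power_db, min_rise_db, span):
    """Indices of local power maxima that stand out from their surroundings.

    A frame is kept when it is the maximum within `span` frames on each
    side and exceeds the lowest power on each side by `min_rise_db`.
    """
    peaks = []
    for i in range(span, len(power_db) - span):
        left = power_db[i - span:i]
        right = power_db[i + 1:i + 1 + span]
        is_local_max = power_db[i] >= max(left) and power_db[i] > max(right)
        rises = (power_db[i] - min(left) >= min_rise_db
                 and power_db[i] - min(right) >= min_rise_db)
        if is_local_max and rises:
            peaks.append(i)
    return peaks
```

The intervals between successive peak indices, multiplied by the hop duration, would give candidate glottal pulse periods that a later periodicity and similarity stage could accept or reject.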
ISBN: 9780616220030 (print)
Qualitative analyses are conducted on spontaneous dialogue speech of several speakers to verify the paralinguistic roles of breathy/whispery voice qualities in communication. Analyses show that breathy/whispery voices carry a variety of emotion- or attitude-related paralinguistic information. Breathiness often appeared in emphasized words/phrases, having the effect of calling/catching the listener's attention. It also appeared rhythmically in utterances expressing the speaker's excitement. In backchannels, breathiness has the effect of expressing politeness or interest in the listener's talk. When accompanied by a softer voice quality, breathiness is used to call/catch the listener's attention while expressing gentleness or tenderness. A more whispered and low-powered voice quality appears in confidential talk, in embarrassment, or when speakers are talking to themselves.
The use of voice quality features in addition to prosodic features is proposed for automatic extraction of paralinguistic information (like speech acts, attitudes and emotions) in dialog speech. Perceptual experiments...
ISBN: 9781604234497 (print)
This paper presents an analysis of the functions carried by phrase-final tones in turn-taking and dialog acts, taking into account linguistic information about the part of speech (particles and auxiliary verbs) attributed to the morphemes at phrase finals. Natural conversational speech data are segmented into inter-pause units, and each utterance unit is arranged according to its phrase-final morphemes. Turn-taking functions are annotated, and the tones of each phrase final are described by acoustic-prosodic features. Analysis results show a relationship between tones and turn-taking functions for most of the morphemes, while no clear relationship is found for some classes of morphemes that are final particles.