Mobile devices have small displays, so a virtual display that brings the benefits of a large computer display to users would be an interesting mobile accessory. However, sickness symptoms are a problem that must be solved before a virtual display can be successful. To explore the symptom levels induced by virtual displays, we tested several head-worn virtual display types in various contexts. Our results indicate that monocular and stereoscopic displays induce a significant amount of adverse symptoms. In contrast, the symptom levels induced by biocular displays do not differ from those induced by direct-view displays. The results suggest that a biocular display might be the best alternative as a mobile accessory.
This paper investigates the suitability of different tone rendering curves (gamma functions) at different external illumination levels and with different display content. It is difficult to find a single optimum gamma value because image quality depends strongly on the image content and the external illumination. Tuning the tone rendering curve based on an ambient light sensor and/or the image content of the display would significantly increase the perceived image quality.
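To make the idea of an illumination-dependent tone rendering curve concrete, the following is a minimal sketch, not the method evaluated in the paper. It assumes a plain power-law curve on normalized luminance and a hypothetical rule that interpolates gamma in log-lux between two endpoints; the function names, the endpoint gammas (2.2 and 1.6) and the lux limits are all illustrative assumptions.

```python
import math


def apply_gamma(value, gamma):
    """Map a normalized luminance value in [0, 1] through a power-law curve."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must be in [0, 1]")
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return value ** gamma


def gamma_for_ambient(lux, dim_gamma=2.2, bright_gamma=1.6,
                      dim_lux=50.0, bright_lux=10000.0):
    """Hypothetical rule: interpolate gamma linearly in log-lux.

    A lower gamma lifts mid-tones, which is assumed here to help
    readability under bright ambient light. All defaults are assumptions.
    """
    if lux <= dim_lux:
        return dim_gamma
    if lux >= bright_lux:
        return bright_gamma
    # Position of the measured illuminance between the two endpoints (0..1).
    t = (math.log10(lux) - math.log10(dim_lux)) / (
        math.log10(bright_lux) - math.log10(dim_lux))
    return dim_gamma + t * (bright_gamma - dim_gamma)


def render_pixel(value, lux):
    """Apply the ambient-dependent tone curve to one normalized pixel value."""
    return apply_gamma(value, gamma_for_ambient(lux))
```

Under these assumptions a mid-grey pixel is rendered brighter outdoors than indoors, since the selected gamma drops as the measured illuminance rises. A content-dependent variant would replace `gamma_for_ambient` with a rule driven by image statistics.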
ISBN: 9781424429424 (print)
In recent years, unit-selection-based concatenative speech synthesis using a large speech database has become popular because it can produce high-quality synthesized speech. However, such a large speech database is not practical for many applications, such as those ported to embedded devices, because of the storage requirement and the computational complexity of searching it. This paper proposes a context-based pruning algorithm and a pruning algorithm based on the waveform adjustment effect to compact the speech database. Finally, it presents experimental results and discussion.
A method is described which analyzes the basic pattern of beats in a piece of music, the musical meter. The analysis is performed jointly at three different time scales: at the temporally atomic tatum pulse level, at t...
ISBN: 0769521088 (print)
The Adaptive Multi-Rate (AMR) codec [1] was standardised for GSM in 1999. AMR offers a substantial improvement in error robustness over previous GSM speech codecs [6] by adapting speech and channel coding to the channel conditions. However, the current standard does not exploit the multi-rate capability of the AMR codec for source-signal-based adaptation, which would optimise the trade-off between average bit rate and quality. This paper presents a source-signal-based rate adaptation algorithm for the AMR codec in a GSM system. Together with fast power control, it can be used to increase the system capacity and further increase the robustness of the GSM AMR codec.
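The following minimal sketch illustrates what source-signal-based mode selection could look like in principle. It is not the algorithm of the paper: the eight bit rates are the standard AMR narrowband modes, but the selection rule (scaling the mode with per-frame energy), the function names and the dB thresholds are all hypothetical assumptions.

```python
# The eight AMR narrowband codec modes, in kbit/s.
AMR_MODES_KBPS = (4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2)


def select_mode(frame_energy_db, channel_max_kbps=12.2,
                low_db=-50.0, high_db=-20.0):
    """Pick a codec mode for one frame (hypothetical rule).

    The channel condition caps the highest allowed mode; within that cap,
    the frame energy (an assumed stand-in for a real source-signal
    classifier) decides how many bits the frame receives.
    """
    allowed = [m for m in AMR_MODES_KBPS if m <= channel_max_kbps]
    if not allowed:
        raise ValueError("channel_max_kbps is below the lowest AMR mode")
    if frame_energy_db <= low_db:
        return allowed[0]
    if frame_energy_db >= high_db:
        return allowed[-1]
    # Map the energy linearly onto the index range of the allowed modes.
    t = (frame_energy_db - low_db) / (high_db - low_db)
    return allowed[int(t * (len(allowed) - 1))]


def average_bitrate(frame_energies_db, **kwargs):
    """Average bit rate in kbit/s over a sequence of frame energies."""
    rates = [select_mode(e, **kwargs) for e in frame_energies_db]
    return sum(rates) / len(rates)
```

With these assumed thresholds, a signal that alternates between very quiet and loud frames averages well below the 12.2 kbit/s ceiling, which is the kind of saving in average bit rate that source-based adaptation targets.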
Syllabification is an essential component of many speech and language processing systems. The development of automatic speech recognizers frequently requires working with subword units such as syllables. More importantly, syllabification is an indispensable part of a speech synthesis system. In this paper we present data-driven approaches to the supervised learning and automatic detection of syllable boundaries. The generalization capability of the learning is investigated on the assignment of syllable boundaries to phoneme sequence representations in English. A rule-based self-correction algorithm is also proposed to automatically correct some syllabification errors. In a series of experiments, the neural network approach was clearly better in terms of generalization performance and complexity.
Pronunciation dictionaries are often used with other data-driven methods to model pronunciations in phoneme-based automatic speech recognition (ASR) and text-to-speech (TTS) systems. The dictionaries usually take a great amount of memory, which is a limiting factor in portable handheld devices. Compressing the pronunciation dictionaries reduces both the required transmission bandwidth and the storage memory. In this paper we present a new procedure to efficiently compress pronunciation dictionaries. First, a novel method transforms the dictionary to a lower-entropy representation. Second, the variability in the aligned pronunciation dictionary is reduced to further lower its entropy. Finally, generic lossless compression is applied to the transformed dictionary. Experiments were carried out on English names and words from the US English CMU dictionary. The proposed scheme achieved a 37.5% improvement over general-purpose lossless text compression.
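The quantity that the first two steps aim to reduce can be illustrated with a small sketch. It computes the zeroth-order Shannon entropy of a symbol sequence in bits per symbol, which gives a rough lower bound on the cost of coding the symbols independently. It does not reproduce the transformation proposed in the paper, and the function name is an assumption.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(symbols):
    """Zeroth-order Shannon entropy of a sequence, in bits per symbol."""
    counts = Counter(symbols)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over symbols of p * log2(1 / p), with p the relative frequency.
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```

A representation whose symbols are more predictable, for example one in which most aligned letter-phoneme pairs collapse onto a few frequent symbols, yields a lower value from this function and is therefore easier for a generic lossless compressor to shrink.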
An integrated method of text pre-processing and language identification is introduced to deal with the problem of mixed-language e-mail messages in a speech-enabled e-mail reading system. Our method can confidently distinguish between the supported languages and switch between several TTS engines or languages to read the portions of the text in the appropriate language. This is achieved by making use of the combined information from a text pre-processor and a language identifier that relies on both statistical information and linguistic features indicative of a particular language.
In this paper, a novel method for beginning-of-utterance detection is proposed for low-complexity ASR systems. Assuming MFCC calculations in the ASR front-end, the additional computational load due to the algorithm is negligible. The algorithm makes use of the delay between the MFCC calculation and the decoding process, which is typical in front-ends with feature normalization. The main steps of the algorithm are LDA projection of the MFCC features, mean calculation over the projected features, simple implicit SNR estimation, and weighting of the decision statistics according to the estimate. Our experimental results show that high performance is obtained down to fairly low SNR conditions, with beginning-of-utterance detection starting to fail in a safe way at about 5 dB SNR. These properties make the algorithm an attractive choice for low-complexity ASR engines.
This paper introduces a novel speech coder structure for storage applications operating at low bit rates. The coder exploits the inherent segmental nature of speech signals by dividing the input into segments of variable length. Quite often the length of a segment is the same as the length of a phoneme. The individual segments are coded using adaptive techniques that take into account the relative perceptual importance of different types of speech, e.g. voiced and unvoiced speech. These main features of the proposed approach are enabled by the fact that many of the design constraints related to real-time conversational speech can be relaxed in storage applications. A practical implementation containing the speech-adaptive segmentation is described, and its performance is verified in a listening test at average bit rates of about 1.0 kbps and 2.4 kbps. The results show that the segmental model significantly improves the coding efficiency.