This paper describes a model of pattern processing for music, based on the cost of remembrance. The algorithm builds a hierarchical pattern model with the lowest remembrance cost, which is defined by the functio...
This paper describes a conceptual basis for a multimedia compositional environment. The project will be open to the public and will be made available to artists who create multimedia art. The platforms are S...
In this paper, our recent activities concerning sensor integration in interactive digital art are described. The approach we employed for the information input from performers of interactive art is the multi-modal utiliz...
This is the second report of the PEGASUS (Performing Environment of Granulation, Automata, Succession, and Unified-Synchronism) project. In the first report at ICMC92, compact/portable/real-time Granular Synthesizer w...
ISBN: 8790834100 (print)
An unsupervised acoustic model adaptation algorithm using MLLR and speaker selection for noisy environments is proposed. The proposed algorithm requires only one arbitrary utterance and environmental noise data. The adaptation procedure is composed of the following four steps. (1) Speaker selection from a large number of database speakers is carried out using GMM speaker models based on one arbitrary utterance. (2) Initial speaker-adapted HMM acoustic models are calculated from the HMM sufficient statistics of the selected speakers, where the HMM sufficient statistics are pre-calculated and stored. (3) The environmental noise data are superimposed on a small subset of the clean speech database from the selected speakers. (4) MLLR adaptation is carried out using the noise-superimposed speech database from the selected speakers. The proposed algorithm is evaluated in a 20k-vocabulary newspaper dictation task in noisy environments. We attain an 85.7% word correct rate at 25 dB SNR, which is slightly better than the matched model obtained by E-M training on the whole noise-superimposed speech database. The proposed algorithm is also 7% better than the HMM composition algorithm.
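Step (3) of the procedure above, superimposing noise on clean speech, can be illustrated with a minimal sketch. The abstract does not say how the noise level is set; the sketch assumes the noise is scaled so that the mixture reaches a target SNR (for example the 25 dB condition). The function names `mix_at_snr` and `measured_snr_db` are hypothetical and are not taken from the paper.

```python
import math


def mean_power(samples):
    """Average power (mean of squared samples) of a signal."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` after scaling it to the requested SNR in dB.

    Assumption: SNR is defined over the whole utterance as
    10 * log10(speech power / scaled noise power).
    Returns the mixture and the gain applied to the noise.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise has zero power")
    # Required noise power is p_speech / 10^(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    mixed = [s + gain * n for s, n in zip(speech, noise)]
    return mixed, gain


def measured_snr_db(speech, scaled_noise):
    """SNR in dB between a speech signal and an already scaled noise signal."""
    return 10 * math.log10(mean_power(speech) / mean_power(scaled_noise))
```

In such a setup, the mixtures produced for the selected speakers' utterances would form the noise-superimposed data used for MLLR adaptation in step (4).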
ISBN: 0818683449 (print)
This paper presents a real-time gesture recognition method for interactive systems that can interact in a manner similar to human communication. The method recognizes a user's gesture using the eigenspace constructed from multiple input image sequences. Since this eigenspace represents the approximate three-dimensional information of the gesture, complicated gestures that may cause self-occlusion and confusion in a single input image sequence can be recognized correctly. Moreover, our method is suitable for obtaining degree information about a gesture, such as its speed and magnitude. Using our method, we realize a real-time interactive system, the Virtual Conductor System, which controls music played by a computer using the gesture recognition results, and we demonstrate the usefulness of our method.
ISBN: 8790834100 (print)
This paper describes the automatic building of N-gram language models from Web texts for large-vocabulary continuous speech recognition. Although a huge amount of well-formed text is needed to train a model, collecting and organizing such a text corpus by hand for every task requires a great deal of labor. The language model also needs to be updated frequently to cover current topics. To deal with this problem, we propose an automatic language model creation method that collects Web texts via keyword-based Web search engines. We can build a task-dependent language model by selecting suitable keywords for the task. A text filtering algorithm based on character perplexity is developed to extract proper Japanese texts from Web texts. A language model for a medical consulting task created by the proposed method achieves a word recognition rate 11.4% higher than that of a conventional newspaper language model.
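The idea of filtering by character perplexity can be sketched as follows. The abstract does not give the model order, the smoothing method, or the threshold, so this sketch assumes a character bigram model with add-one smoothing trained on a small set of trusted texts; the names `CharBigramModel` and `filter_texts` are hypothetical.

```python
import math
from collections import Counter

START = "\x02"  # artificial start-of-text marker (an assumption of this sketch)


class CharBigramModel:
    """Character bigram model with add-one smoothing."""

    def __init__(self, reference_texts):
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        chars = set()
        for text in reference_texts:
            chars.update(text)
            padded = START + text
            for prev, cur in zip(padded, padded[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1
        # One extra slot is reserved for characters never seen in training.
        self.vocab_size = len(chars) + 1

    def perplexity(self, text):
        """Per-character perplexity of `text`; lower means closer to the reference."""
        if not text:
            return float("inf")
        padded = START + text
        log_prob = 0.0
        for prev, cur in zip(padded, padded[1:]):
            numerator = self.bigram_counts[(prev, cur)] + 1
            denominator = self.context_counts[prev] + self.vocab_size
            log_prob += math.log(numerator / denominator)
        return math.exp(-log_prob / len(text))


def filter_texts(model, candidates, threshold):
    """Keep only candidates whose character perplexity is at or below `threshold`."""
    return [text for text in candidates if model.perplexity(text) <= threshold]
```

Texts that resemble the reference material score a low perplexity, while markup fragments or text in other languages score high and are dropped. A suitable threshold would have to be tuned on held-out data.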
In improvisational music, human accompanists understand patterns from the features of a soloist's playing, and use them effectively in many situations of real time performance. The authors have originated a comput...
ISBN: 8790834100 (print)
This paper describes an efficient method of unsupervised speaker adaptation. The method is based on (1) selecting a subset of speakers who are acoustically close to a test speaker, and (2) calculating adapted model parameters from the previously stored sufficient statistics of the selected speakers' data. With this method, only a small amount of unsupervised data from the test speaker is necessary for the adaptation. Also, by using the HMM sufficient statistics of the selected speakers' data, the adaptation can be done quickly. Compared with a pre-clustering method, the proposed method can obtain a better cluster because the clustering result is determined on-line according to the test speaker's data. Experimental results show that the proposed method attains a larger improvement over the speaker-independent model than MLLR does. The proposed method is evaluated in detail and discussed.
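The two steps above can be sketched in simplified form. This is only an illustration under strong assumptions: each speaker model is a single diagonal Gaussian standing in for a GMM, and the stored sufficient statistics are an occupancy count and a first-order sum for one Gaussian mean. The names `select_speakers` and `adapted_mean` are hypothetical.

```python
import math


def log_likelihood(frames, mean, var):
    """Total log-likelihood of feature frames under one diagonal Gaussian."""
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total


def select_speakers(frames, speaker_models, n_best):
    """Step (1): rank database speakers by likelihood of the test frames.

    `speaker_models` maps a speaker name to a (mean, variance) pair.
    """
    ranked = sorted(
        speaker_models,
        key=lambda name: log_likelihood(frames, *speaker_models[name]),
        reverse=True,
    )
    return ranked[:n_best]


def adapted_mean(selected, stats):
    """Step (2): combine stored statistics of the selected speakers.

    `stats` maps a speaker name to (occupancy, first_order_sum). The adapted
    mean is the pooled first-order sum divided by the pooled occupancy.
    """
    total_occupancy = sum(stats[name][0] for name in selected)
    dim = len(stats[selected[0]][1])
    pooled = [sum(stats[name][1][d] for name in selected) for d in range(dim)]
    return [value / total_occupancy for value in pooled]
```

Because the statistics are accumulated beforehand, the adaptation step reduces to sums and divisions over the selected speakers, which is consistent with the quick adaptation described above.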
This paper is intended as an investigation of some new interfaces for computer music and interactive multimedia art. We have been producing many sensors, interfaces and interactive systems for computer music, composin...