We describe improvements made over the past year to Joshua, an open-source translation system for parsing-based machine translation. The main contributions this past year are significant improvements in both speed and...
GMM-UBM-based speaker verification relies heavily on a well-trained UBM. In practice, it is often difficult to obtain a UBM that fully matches the acoustic channels in operation. To solve this problem, we propose a novel ...
ISBN: (print) 9781479903573
How do humans attend to and pick out relevant auditory objects amongst all the other sounds in the environment? Based on neurophysiological findings, we propose two task-oriented attentional mechanisms that act as Bayesian priors at two separate levels of processing: a sensory mapping stage and an object representation stage. The sensory stage is modeled as a high-dimensional mapping that captures the spectrotemporal nuances and cues of auditory objects. The object representation stage then captures the statistical distribution of the different classes of acoustic scenes. This scheme yields a relative performance improvement of 81% over a baseline system.
Speaker verification suffers from significant performance degradation on emotional speech. We present an adaptation approach based on maximum likelihood linear regression (MLLR) and its feature-space variant, CMLLR. O...
This paper presents a Korean-Thai lexicon. The research aims to study and collect the features needed to construct a Korean-Thai lexicon for natural language processing (NLP) and speech processing research. The method comprised (1) creating a Korean-Thai lexicon whose entries consist of seven parts: Korean word, Korean Revised Romanization, part of speech, sub part of speech, special characteristic, Thai meaning, and description of meaning; and (2) Korean transcription. Because useful tools for Korean-Thai machine translation are lacking, we propose creating a Korean-Thai lexicon for machine translation. The lexicon consists of 36,000 Korean words. Since gathering enough Korean words to cover all domains would take considerable time and effort, Korean Revised Romanization was applied to some words, such as terminology, names, and places.
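The seven-part entry described above could be represented roughly as follows. This is only an illustrative sketch: the class name, field names, tag values, and the sample entry are assumptions made for the example and are not taken from the lexicon itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    """One lexicon entry with the seven parts listed in the abstract.

    All field names are hypothetical; the abstract does not name them.
    """
    korean: str          # Korean word
    romanization: str    # Korean Revised Romanization
    pos: str             # part of speech
    sub_pos: str         # sub part of speech
    special: str         # special characteristic (empty if none)
    thai: str            # Thai meaning
    description: str     # description of meaning


# Sample entry chosen for illustration only (not drawn from the lexicon).
entry = LexiconEntry(
    korean="학교",
    romanization="hakgyo",
    pos="noun",
    sub_pos="common noun",
    special="",
    thai="โรงเรียน",
    description="school",
)

# A plain dictionary keyed by the Korean word gives direct lookup.
lexicon = {entry.korean: entry}
```

Keying the dictionary on the Korean word is one possible design; a real lexicon with homographs would likely need a list of entries per key.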
Hearing engages, in a seemingly effortless way, complex processes that allow our brains to parse the acoustic environment around us into perceptual sound objects, a phenomenon called streaming or stream segregation. In this paper, we explore the hypothesis that the auditory system relies on the regularity inherent in each stream to segregate it from other competing streams in the scene. These regularities are tracked via a recursive prediction that follows the evolution of each stream using a Kalman filtering approach. The proposed approach combines spectral analysis operating at the level of the auditory periphery with temporal analysis using Kalman tracking. To incorporate nonlinear relationships in the signal patterns, we employ an extended Kalman filter. The scheme is tested on sinusoidal patterns, or the two-tone paradigm. The combined spectral and temporal analysis predicts the perceptual results of stream segregation by human listeners in a two-tone paradigm.
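The idea of segregating streams by how well each one predicts the next tone can be sketched with a much simpler model than the one in the abstract. The code below is not the authors' method: it swaps the extended Kalman filter for a scalar, linear random-walk Kalman filter per stream, works directly on tone frequencies in semitones rather than on a peripheral spectral analysis, and uses noise variances and a gating threshold that are purely assumed values.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """Random-walk Kalman filter over one stream's frequency (semitones)."""
    x: float          # current frequency estimate
    p: float = 1.0    # variance of the estimate
    q: float = 0.01   # process noise variance (assumed value)
    r: float = 0.25   # observation noise variance (assumed value)

    def innovation(self, z):
        """Return the prediction error for z and the variance of that error."""
        return z - self.x, self.p + self.q + self.r

    def update(self, z):
        """Predict one step under the random-walk model, then correct with z."""
        p_pred = self.p + self.q
        gain = p_pred / (p_pred + self.r)
        self.x += gain * (z - self.x)
        self.p = (1.0 - gain) * p_pred


def segregate(tones, threshold=9.0):
    """Assign each tone to the stream whose filter predicts it best.

    A new stream is opened when every existing stream's squared normalized
    innovation exceeds `threshold` (a hypothetical gating value).
    """
    streams, labels = [], []
    for z in tones:
        scores = []
        for kf in streams:
            err, var = kf.innovation(z)
            scores.append(err * err / var)
        if scores and min(scores) <= threshold:
            best = scores.index(min(scores))
        else:
            streams.append(ScalarKalman(x=z))
            best = len(streams) - 1
        streams[best].update(z)
        labels.append(best)
    return labels


def two_tone(separation, n=8):
    """Alternating A-B-A-B sequence with B `separation` semitones above A."""
    return [0.0 if i % 2 == 0 else float(separation) for i in range(n)]
```

With these assumed parameters, `segregate(two_tone(1))` keeps all tones in one stream, while `segregate(two_tone(7))` alternates between two streams. This mirrors the qualitative two-tone result that small frequency separations fuse and large ones split, but the crossover point here is set by the assumed variances and threshold, not by any fitted perceptual data.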
Language can describe our visual world at many levels, including not only what is literally there but also the sentiment that it invokes. In this paper, we study visual language, both literal and sentimental, that des...
A table of contents (TOC) provides a quick reference to a document's content and structure. We present the first study on identifying the hierarchical structure for automatically generating a TOC using only textual features instead of structural hints such as HTML tags. We create two new datasets to evaluate our approaches for hierarchy identification. We find that our algorithm performs at a level that is sufficient for a fully automated system. For documents without given segment titles, we extend our work by automatically generating segment titles. We make the datasets and our experimental framework publicly available in order to foster future research in TOC generation.
Word recognition testing may be defined as a procedure to assess a listener's ability to identify one-syllable words (such as phonetically balanced, or PB, words) that are presented at a given suprathreshold level to arrive at a word recognition score. For Thai, the Thammasat University and Ramathibodi Hospital Phonetically Balanced Word Lists 2015 (TU-RAMA PB'15) were created with five lists, each with 25 monosyllabic words. Besides basing its phoneme distributions on large-scale Thai spoken corpora, TU-RAMA PB'15 is in line with TU PB'14 in its emphasis on phonetic balance, symmetrical phoneme occurrence, and word familiarity. To evaluate the homogeneity of the lists in terms of intelligibility as a function of level, the lists were recorded and presented to 10 normal-hearing participants at levels ranging from 0 to 50 dB HL in 2 dB increments (ascending order) until they repeated the words correctly. Using logistic regression, regression slopes and intercepts were calculated to estimate the percentage of correct performance at any given intensity and to construct psychometric functions for every list. Derived psychometric function slopes ranged from 0.2015 to 0.2262, while the intensities required for 50% intelligibility ranged from 17.0876 to 20.8856 dB HL. Two-way chi-square analysis performed on both parameters indicated no significant difference among the five lists.
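How a logistic regression slope and intercept translate into a psychometric function can be sketched as follows. This assumes the standard logistic form p(x) = 1 / (1 + exp(-(a + b·x))), with the reported slope taken to be the regression coefficient b; the abstract does not state the exact parameterization, so that reading is an assumption. The function names and the numeric values used in the usage note (slope 0.2, 50% point 20 dB HL) are illustrative round numbers, not results from the study.

```python
import math


def percent_correct(intensity_db, slope, intercept):
    """Predicted percent correct at a given intensity (dB HL).

    Uses the logistic model p = 1 / (1 + exp(-(intercept + slope * x))).
    """
    z = intercept + slope * intensity_db
    return 100.0 / (1.0 + math.exp(-z))


def threshold_50(slope, intercept):
    """Intensity at which the model predicts 50% correct.

    At 50% the logit is zero, so intercept + slope * x = 0.
    """
    return -intercept / slope


def intercept_from_threshold(slope, x50_db):
    """Recover the intercept from a slope and a known 50% point."""
    return -slope * x50_db
```

For example, with an assumed slope of 0.2 and a 50% point of 20 dB HL, `intercept_from_threshold(0.2, 20.0)` gives an intercept of -4.0, and `percent_correct(20.0, 0.2, -4.0)` evaluates to 50. Comparing lists then amounts to comparing their fitted slope and 50% point.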