We investigate predictive coding for reducing the amount of data communicated between a haptic controller and a host. This allows an increased update rate, which potentially improves quality even if the coding is lossy. A low-order predictive coder is investigated for a pneumatic force display. Due to human and device characteristics, some compression is possible without loss, although the technique is lossy in general. Lossy uniform and nonuniform quantizers are also investigated. An experiment was conducted to determine how much data reduction is possible before compression artifacts become detectable to users.
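As a rough illustration of low-order predictive coding with a uniform quantizer, the following Python sketch implements first-order closed-loop DPCM. The predictor order, the step size, and the function names `dpcm_encode` and `dpcm_decode` are assumptions made for illustration; the abstract does not specify the coder used for the pneumatic force display.

```python
def dpcm_encode(samples, step):
    """First-order closed-loop DPCM (illustrative, not the authors' coder).

    Each sample is predicted as the previous *reconstructed* value, and the
    prediction residual is quantized with a uniform step size.
    """
    codes = []
    prediction = 0.0
    for x in samples:
        residual = x - prediction
        code = round(residual / step)  # uniform quantizer index
        codes.append(code)
        prediction += code * step      # track what the decoder will reconstruct
    return codes


def dpcm_decode(codes, step):
    """Invert dpcm_encode by accumulating the dequantized residuals."""
    out = []
    prediction = 0.0
    for code in codes:
        prediction += code * step
        out.append(prediction)
    return out
```

Because the encoder predicts from its own reconstruction, the quantization error does not accumulate: each reconstructed sample stays within half a step of the input. When the input already lies on the quantizer grid, the round trip is lossless.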
On the basis of recent binary signal detection theory (BSDT), optimal recognition algorithms for complex images are constructed and their optimal performance is calculated. A methodology for comparing BSDT predictions and measured human performance is developed and applied to explaining a particular face recognition experiment. The BSDT makes possible computer codes with recognition performance better than that of humans, and its fundamental discreteness is consistent with the experiment. Related neurobiological and behavioral effects are briefly discussed.
Progressive encoding of a signal generally involves an estimation step, designed to reduce the entropy of the residual of an observation over the entropy of the observation itself. Oftentimes the conditional distributions of an observation, given already-encoded observations, are well fit within a class of symmetric and unimodal distributions (e.g., the two-sided geometric distributions in images of natural scenes, or symmetric Paretian distributions in models of financial data). It is common practice to choose an estimator that centers, or aligns, the modes of the conditional distributions, since it is common sense that this will minimize the entropy, and hence the coding cost of the residuals. But with the exception of a special case, there has been no rigorous proof. Here we prove that the entropy of an arbitrary mixture of symmetric and unimodal distributions is minimized by aligning the modes. The result generalizes to unimodal and rotation-invariant distributions in R^n. We illustrate the result through some experiments with natural images.
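The mode-alignment claim can be checked numerically on a small case. The Python sketch below mixes two two-sided geometric distributions on the integers and compares the entropy of the mixture with the modes aligned and with one mode shifted. The parameters (`thetas`, equal mixture weights, the truncated support `half_width`) and the function names are illustrative assumptions, not taken from the paper.

```python
import math


def two_sided_geometric(theta, center, support):
    """Probabilities proportional to theta**|k - center| on a truncated support."""
    weights = [theta ** abs(k - center) for k in support]
    total = sum(weights)
    return [w / total for w in weights]


def entropy_bits(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def mixture_entropy(shift, thetas=(0.5, 0.8), half_width=200):
    """Entropy of an equal-weight mixture; the second mode is offset by `shift`."""
    support = range(-half_width, half_width + 1)
    a = two_sided_geometric(thetas[0], 0, support)
    b = two_sided_geometric(thetas[1], shift, support)
    mix = [0.5 * (x + y) for x, y in zip(a, b)]
    return entropy_bits(mix)
```

Comparing `mixture_entropy(0)` with `mixture_entropy(3)` shows the aligned mixture having the lower entropy, in line with the stated result. This is a spot check on one pair of distributions, not a substitute for the proof.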
The JPEG baseline algorithm codes the dc of a block by giving its difference with the dc of the previous block. We propose to use ac coefficients for this purpose. Our method computes the difference of the sum of pixels of two boundary columns (or rows), one belonging to the current block and the other to a previous block, and then manipulates it in the discrete cosine transform (DCT) domain so that the average of the coded differences for the whole image is near zero. Experimental results show that our method reduces the average JPEG dc residual by about 75% for images compressed at the default quality level. The reduction is even higher for unquantized DCT blocks. (c) 2008 Society of Photo-optical Instrumentation Engineers.
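For reference, the baseline scheme that this abstract takes as its starting point can be sketched in a few lines of Python. The sketch covers only the standard differential coding of dc values, with the predictor starting at zero; it does not implement the proposed boundary-column method, and the function names are illustrative assumptions.

```python
def dc_differences(dc_values):
    """Baseline-style differential coding: each dc minus the previous block's dc."""
    diffs = []
    previous = 0  # predictor starts at zero
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs


def dc_reconstruct(diffs):
    """Invert dc_differences with a running sum."""
    values = []
    previous = 0
    for d in diffs:
        previous += d
        values.append(previous)
    return values
```

For example, `dc_differences([52, 55, 61, 59])` yields `[52, 3, 6, -2]`: neighboring blocks tend to have similar dc values, so the residuals after the first block are small. A better predictor aims to shrink those residuals further.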
To identify and categorize complex stimuli such as familiar objects or speech, the human brain integrates information that is abstracted at multiple levels from its sensory inputs. Using cross-modal priming for spoken words and sounds, this functional magnetic resonance imaging study identified 3 distinct classes of visuoauditory incongruency effects: visuoauditory incongruency effects were selective for 1) spoken words in the left superior temporal sulcus (STS), 2) environmental sounds in the left angular gyrus (AG), and 3) both words and sounds in the lateral and medial prefrontal cortices (IFS/mPFC). From a cognitive perspective, these incongruency effects suggest that prior visual information influences the neural processes underlying speech and sound recognition at multiple levels, with the STS being involved in phonological, AG in semantic, and mPFC/IFS in higher conceptual processing. In terms of neural mechanisms, effective connectivity analyses (dynamic causal modeling) suggest that these incongruency effects may emerge via greater bottom-up effects from early auditory regions to intermediate multisensory integration areas (i.e., STS and AG). This is consistent with a predictive coding perspective on hierarchical Bayesian inference in the cortex where the domain of the prediction error (phonological vs. semantic) determines its regional expression (middle temporal gyrus/STS vs. AG/intraparietal sulcus).
Human face-to-face communication is essentially audiovisual. Typically, people talk to us face-to-face, providing concurrent auditory and visual input. Understanding someone is easier when there is visual input, because visual cues like mouth and tongue movements provide complementary information about speech content. Here, we hypothesized that, even in the absence of visual input, the brain optimizes both auditory-only speech and speaker recognition by harvesting speaker-specific predictions and constraints from distinct visual face-processing areas. To test this hypothesis, we performed behavioral and neuroimaging experiments in two groups: subjects with a face recognition deficit (prosopagnosia) and matched controls. The results show that observing a specific person talking for 2 min improves subsequent auditory-only speech and speaker recognition for this person. In both prosopagnosics and controls, behavioral improvement in auditory-only speech recognition was based on an area typically involved in face-movement processing. Improvement in speaker recognition was only present in controls and was based on an area involved in face-identity processing. These findings challenge current unisensory models of speech processing, because they show that, in auditory-only speech, the brain exploits previously encoded audiovisual correlations to optimize communication. We suggest that this optimization is based on speaker-specific audiovisual internal models, which are used to simulate a talking face.
Visual changes in feature movies, as in real life, can be partitioned into global flow due to self/camera motion, local/differential flow due to object motion, and residuals, for example, due to illumination changes. We correlated these measures with brain responses of human volunteers viewing movies in an fMRI scanner. Early visual areas responded only to residual changes, thus lacking responses to equally large motion-induced changes, consistent with predictive coding. Motion activated V5+ (MT+), V3A, medial posterior parietal cortex (mPPC) and, weakly, lateral occipital cortex (LOC). V5+ responded to local/differential motion and depended on visual contrast, whereas mPPC responded to global flow spanning the whole visual field and was contrast independent. mPPC thus codes for flow compatible with unbiased heading estimation in natural scenes and for the comparison of visual flow with nonretinal, multimodal motion cues in it or downstream. mPPC was functionally connected to anterior portions of V5+, whereas the laterally neighboring putative homologue of the lateral intraparietal area (LIP) was connected with the frontal eye fields. Our results demonstrate a progression of selectivity from local and contrast-dependent motion processing in V5+ toward global and contrast-independent motion processing in mPPC. The function, connectivity, and anatomical neighborhood of mPPC imply several parallels to the monkey ventral intraparietal area (VIP).
ISBN: 9781424417650 (print)
Edges provide critical information which enables viewers to better discern objects in images. Although transform-based image compression schemes have been successful, they are unable to efficiently represent 2D edges. Wedgelets capture geometrical structures in images by explicitly defining an edge. In this paper, we introduce high order wedgelets in a more generalized form of quad-tree partitioning to realize improved compression performance compared to existing compression methods.
An algorithm for designing linear prediction-based two-channel multiple-description predictive vector quantizers (MD-PVQs) for packet-loss channels is presented. This algorithm iteratively improves the encoder partition, the set of multiple description codebooks, and the linear predictor for a given channel loss probability, based on a training set of source data. The effectiveness of the designs obtained with the given algorithm is demonstrated using a waveform coding example involving a Markov source as well as vector quantization of speech line spectral pairs.
ISBN: 9780769534893 (print)
Image compression reduces the time and cost of image storage without significantly reducing image quality. This paper puts forward a wavelet-based predictive image coding algorithm, which has a higher coding rate than traditional coding algorithms. Building on this algorithm, the paper adopts a selective image compression technique to compress facial images. The algorithm attains compression ratios from ten to several tens and handles the transmission and storage problem well.