ISBN: 9781424417513 (print)
It is well known that speech production and perception are inherently bimodal, consisting of audio and visual components. Recently there has been increased interest in using the visual modality in combination with the acoustic modality for improved speech processing. This field of study has become known as audio-visual speech processing. Lip movement recognition, also known as lip reading, is a communication skill that involves interpreting lip movements in order to estimate important parameters of the lips, including but not limited to size, shape and orientation. In this paper, we present a hybrid framework for lip reading based on both audio and visual speech parameters extracted from a video stream of isolated spoken words. The proposed algorithm is self-tuned in the sense that it starts with an estimate of the speech parameters based on visual lip features, and the coefficients of the algorithm are then fine-tuned based on the extracted audio parameters. In the audio speech processing part, extracted audio features are used to generate a vector containing information about the speech phonemes. This information is used later to enhance the recognition and matching process. For lip feature extraction, we use a modified version of the method used by F. Huang and T. Chen for tracking multiple faces. This method is based on statistical color modeling and a deformable template. Experiments based on the proposed framework showed interesting results in the recognition of isolated words.
ISBN: 9786165904773 (digital)
ISBN: 9786165904773 (print)
Perception processing in the human brain is mediated by functional connectivity among brain areas. To a certain extent, functional connectivity builds a communication channel among brain regions and even supports our perception and consciousness functions. However, information does not always flow in the same direction between different parts of the brain; exchanges of information can also be bidirectional. The role of bidirected functional connectivity in the brain is still not well explored. More specifically, we want to know how bidirectional communication between regions in different brain areas helps with information processing. To do this, we used directed information to quantify information exchange in simulated neural signals and in real neural data covering each part of the visual cortex region, and it showed that directed information can discover causal effects in both simulation and real experiments. In this study, we tried to understand information processing with bidirected functional connectivity and explored both feedforward and feedback information flow between brain regions. On the one hand, we discovered feedback functional connectivity between visual regions, such as LRSC, LLOC, LOPA, and RPPA. Biologically speaking, this makes sense because natural images, such as objects, places, or more complex images, are typically represented in high-level visual regions. On the other hand, we also found that there is information flow between the scene-selective areas, e.g., OPA, PPA, RSC, and object-selective regions, e.g., LOC. As a result, directed information gives us a better understanding, from an information-theoretic perspective, of how information is shared and communicated.
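For readers unfamiliar with the quantity, the standard definition of directed information (due to Massey) is given below. The abstract does not say which formulation or estimator the authors used, so this is the textbook definition only, with the symbols $X^n$ and $Y^n$ introduced here as assumed names for the two neural signals.

```latex
% Directed information from X^n = (X_1, ..., X_n) to Y^n = (Y_1, ..., Y_n)
I(X^n \to Y^n) \;=\; \sum_{i=1}^{n} I\!\left(X^{i};\, Y_i \,\middle|\, Y^{i-1}\right)
```

Unlike mutual information, this quantity is not symmetric: in general $I(X^n \to Y^n) \neq I(Y^n \to X^n)$, which is what makes it possible to measure feedforward and feedback flow between two regions separately.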
ISBN: 9781713845393 (print)
As deep networks begin to be deployed as autonomous agents, the issue of how they can communicate with each other becomes important. Here, we train two deep nets from scratch to perform large-scale referent identification through unsupervised emergent communication. We show that the partially interpretable emergent protocol allows the nets to successfully communicate even about object classes they did not see at training time. The visual representations induced as a by-product of our training regime, moreover, when re-used as generic visual features, show comparable quality to a recent self-supervised learning model. Our results provide concrete evidence of the viability of (interpretable) emergent deep net communication in a more realistic scenario than previously considered, as well as establishing an intriguing link between this field and self-supervised visual learning.
ISBN: 9786163618238 (print)
Some products are explicitly or implicitly designed so that objects can be seen as a face; this may support fluent human-environment communication. The present study investigated the effects of seeing objects as faces on human visual search performance by means of psychological experiments. The participants were asked to search for a target among distractors on a computer display as quickly as possible. The target and distractors differed in vertical orientation. The participants were randomly assigned to a face task or a triangle task. In the face task, the visual stimulus was either a cartoon face or three dots arranged in a triangle that could be seen as a face, and the participants were instructed to search for an upright or inverted face among distractors. In the triangle task, the visual stimulus was either the same three dots as in the face task or a line-drawing triangle, and the participants were instructed to search for a triangle. In both tasks, the two types of stimuli were randomly presented during the trial sequence. We found that visual search for the three-dot target was slower in the face task than in the triangle task. However, when participants were informed of the target stimulus immediately before each trial, the results were reversed; visual search for the three-dot target was faster in the face task than in the triangle task. These results suggest that, even if the target stimulus per se is identical, seeing the target as a face modulates visual search performance, and that the effects interact with expectation of or preparation for the subsequent target.
This paper deals with the extension of information theory to the assessment of visual communication from scene to observer. The mathematical development rigorously unites the electro-optical design of image-gathering and display devices with the digital processing algorithms for image coding and restoration. Results show that end-to-end system analysis closely correlates with measurable and perceptual performance characteristics, such as data rate and image quality, respectively. The goal of producing the best possible image at the lowest data rate can be realized only if (a) the electro-optical design of the image-gathering device is optimized for the maximum-realizable information rate and (b) the image-restoration algorithm properly accounts for the perturbations in the visual communication channel.
ISBN: 9781424479078 (print)
Noise interference and data loss are two major problems that affect the results of image data transmission and storage. Restoring the lost information of an image from the existing information is the essence of inpainting. In this paper, a new algorithm based on sample-and-hold interpolation and iteration is proposed for reconstructing damaged images from existing regions, and it is compared to some other methods. The experimental results show the superiority of the proposed method in visual quality and PSNR performance. It is observed that this approach can efficiently fill in the holes with visually plausible information.
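The abstract does not describe the algorithm in detail, so the following is only a minimal sketch of the two generic ingredients it names: a single sample-and-hold pass (each missing pixel takes the value of the nearest known pixel to its left in the same row) and the standard PSNR metric. The function names `sample_and_hold_fill` and `psnr`, the row-wise hold direction, and the absence of any iteration are assumptions for illustration, not the authors' method.

```python
import numpy as np


def sample_and_hold_fill(image, mask):
    """Fill missing pixels by holding the last known value along each row.

    image: 2-D array of pixel values.
    mask:  boolean array of the same shape, True where a pixel is missing.
    (Hypothetical helper; a simplified single pass, not the paper's algorithm.)
    """
    filled = image.astype(float).copy()
    rows, cols = filled.shape
    for r in range(rows):
        known = np.flatnonzero(~mask[r])  # sorted column indices of known pixels
        if known.size == 0:
            continue  # nothing to hold in this row; leave it unchanged
        # For every column, find the most recent known column at or before it.
        idx = np.searchsorted(known, np.arange(cols), side="right") - 1
        # Columns before the first known pixel borrow the first known value.
        idx = np.clip(idx, 0, known.size - 1)
        filled[r] = filled[r, known[idx]]
    return filled


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    diff = reference.astype(float) - estimate.astype(float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

A typical use would be to damage a test image with a known mask, call `sample_and_hold_fill`, and compare the result against the undamaged original with `psnr`; higher values indicate a closer reconstruction.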
ISBN: 9783037851579 (print)
B/S (browser/server) mode does not require the client to download any software as long as a browser is installed, and it can implement multipoint-to-multipoint multimedia communication. The visual e-commerce platform applies video technology to the e-commerce field to address the credibility problem. This paper first introduces the technical framework of the visual e-commerce platform, then presents two interface modes of the platform, and finally summarizes the platform's innovations: a B/S-based architecture, a simple e-commerce model, ease of operation, and visual, interactive presentation.
By utilizing computer graphics and image processing technology, we can effectively enhance the effectiveness of visual communication design. Specifically, using the principle of three primary colors to create a set of...
ISBN: 9783031243660; 9783031243677 (print)
Computer imaging technology uses digital photography, with the computer as a medium, to realize interactive communication between humans and machines through the collection and processing of images and the editing and storage of graphic information. The purpose of this paper is to study the design of a 3D image visual communication system based on computer image technology, in order to improve the mastery of 3D image technology and to design the visual communication system. This article mainly uses experimental and comparative methods to analyze feature extraction in the 3D image visual communication system, and finds that the error of the improved RANSAC algorithm in image feature extraction is about 54%, while the errors of the unimproved algorithm and of other algorithms are greater. This shows that the improved algorithm proposed in this paper outperforms the compared algorithms in the 3D image visual communication system.
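The abstract does not say how RANSAC was improved, so the sketch below shows only the generic, unimproved RANSAC loop, applied to the simplest possible model (a 2-D line) rather than to image feature matching. The function name `ransac_line` and the parameters `n_iters`, `threshold`, and `seed` are assumptions chosen for illustration.

```python
import numpy as np


def ransac_line(points, n_iters=200, threshold=1.0, seed=0):
    """Return a boolean inlier mask for the best 2-D line found by basic RANSAC.

    points: array of shape (N, 2).
    (Hypothetical helper showing the textbook loop, not the paper's variant.)
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        # 1. Sample the minimal set needed to define the model: two points.
        i, j = rng.choice(len(points), size=2, replace=False)
        p, q = points[i], points[j]
        direction = q - p
        norm = np.hypot(direction[0], direction[1])
        if norm == 0:
            continue  # degenerate sample (coincident points)
        # 2. Score the candidate line by perpendicular distance to every point.
        normal = np.array([-direction[1], direction[0]]) / norm
        distances = np.abs((points - p) @ normal)
        inliers = distances < threshold
        # 3. Keep the candidate with the largest consensus set.
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers
```

In feature extraction and matching, the same loop is usually run with a richer model (for example a homography estimated from four correspondences) in place of the line, and the final model is typically refit on the returned inliers.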