ISBN: 0780370414 (print)
Currently available personal video recorders find and store whole TV programs. Our system, Video Scouting, not only finds and stores programs; it also automatically segments and indexes story segments from the programs according to viewers' profiles. The extracted descriptions serve viewers' content requests for program segment selection, e.g., "play the three-minute interview with Hillary Clinton." To achieve this, the system combines information from the audio, visual, and transcript domains in a probabilistic framework based on Bayesian networks. In this paper we describe the overall architecture and a system implementation, and discuss some experimental results.
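The fusion of audio, visual, and transcript evidence described in this abstract can be illustrated with a minimal sketch. It assumes the simplest possible Bayesian network, a naive Bayes model in which the three observations are conditionally independent given the segment class. The abstract does not specify the network structure, so the function name `fuse_posterior`, the class labels, and all probabilities below are hypothetical.

```python
def fuse_posterior(prior, likelihoods):
    """Combine per-modality likelihoods into a posterior over segment classes.

    prior:        dict mapping class -> P(class)
    likelihoods:  list of dicts, one per modality, mapping class -> P(obs | class)

    Assumes the modalities are conditionally independent given the class
    (a naive Bayes network). This is an illustrative simplification, not
    the structure used by the Video Scouting system.
    """
    unnormalized = dict(prior)
    for modality in likelihoods:
        for cls in unnormalized:
            unnormalized[cls] *= modality[cls]
    total = sum(unnormalized.values())
    return {cls: value / total for cls, value in unnormalized.items()}


# Hypothetical numbers for one candidate segment.
prior = {"interview": 0.5, "other": 0.5}
audio = {"interview": 0.8, "other": 0.4}       # e.g. speech-dominated audio
visual = {"interview": 0.6, "other": 0.3}      # e.g. two faces on screen
transcript = {"interview": 0.9, "other": 0.1}  # e.g. question/answer cue words

posterior = fuse_posterior(prior, [audio, visual, transcript])
```

With these made-up inputs, each modality alone is only mildly informative, but their product concentrates the posterior on the "interview" class, which is the basic benefit of multimodal fusion.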
Authors: Baazaoui, Abir; Barhoumi, Walid; Zagrouba, Ezzeddine
Affiliations: Univ Tunis El Manar, Inst Super Informat El Manar, Res Team Intelligent Syst Imaging & Artificial Vi, Lab Rech Informat Modelisat & Traitement Informat, 2 Rue Abou Raihane Bayrouni, Ariana 2080, Tunisia; Univ Carthage, ENICarthage, Rue Abou Raihane Bayrouni, 45 Rue Entrepreneurs, Ariana 2080, Tunisia
ISBN: 9781538633687 (print)
The semantic gap, the difference between low-level image features and their high-level semantics, has attracted great interest over the last two decades. This paper addresses this problem and proposes a hybrid approach that learns image semantic concepts for modeling visual features in a discriminative learning stage. It combines the advantages of human-in-the-loop and discriminative semantic models. We exploit domain knowledge and expertise, with an expert in the loop, to determine the relevant medical knowledge. The semantic models aim to learn the correlations between low-level features and textual words in order to describe malignancy signs in terms of semantic visual descriptors. These descriptors are automatically generated from low-level image features by exploiting semantic concepts based on clinicians' medical knowledge. Results reported on the Mammographic Image Analysis Society (MIAS) database demonstrate the effectiveness of this work and show that it outperforms the compared approaches.
The infographic, a form of presenting information that combines visuals with text, allows readers to easily understand sequential or complex information. This paper begins with a brief explanation of how the brain processes information and of the history of the infographic, which has become part of the visual communication design heritage. It then discusses how infographics in both print and digital media convey comprehensive information in a way that relates to the human information-processing system. The article concludes with a clearer picture of how the infographic's visual approach affects readers and helps them understand complex information.
ISBN: 9789869000604 (print)
We propose an effective method to measure the capture-to-display delay (CDD) of a visual communication application. The method requires neither modifications to the existing system nor synchronized encoder and decoder clocks. Furthermore, we propose a solution to the problem of multiple overlapped timestamps caused by the response time of the display and the exposure time of the camera. We implemented the method in software to measure the capture-to-display delay of a cellphone video chat application over various types of networks. Experiments confirmed the effectiveness of the proposed methods.
ISBN: 9781424417513 (print)
It is well known that speech production and perception are inherently bimodal, consisting of audio and visual components. Recently there has been increased interest in using the visual modality in combination with the acoustic modality for improved speech processing. This field of study has become known as audio-visual speech processing. Lip movement recognition, also known as lip reading, is a communication skill that involves interpreting lip movements in order to estimate important parameters of the lips, including but not limited to size, shape, and orientation. In this paper, we present a hybrid framework for lip reading based on both audio and visual speech parameters extracted from a video stream of isolated spoken words. The proposed algorithm is self-tuned in the sense that it starts with an estimate of the speech parameters based on visual lip features, and the coefficients of the algorithm are then fine-tuned based on the extracted audio parameters. In the audio processing part, the extracted audio features are used to generate a vector containing information about the speech phonemes. This information is used later to enhance the recognition and matching process. For lip feature extraction, we use a modified version of the method used by F. Huang and T. Chen for tracking multiple faces, which is based on statistical color modeling and a deformable template. Experiments with the proposed framework showed interesting results in the recognition of isolated words.
ISBN: 0780377508 (print)
Semantic indexing of sports videos is a subject of great interest to researchers working on multimedia content characterization. Sports programs appeal to large audiences, and their efficient distribution over various networks should contribute to widespread usage of multimedia services. In this paper, we propose a semantic indexing algorithm for soccer programs that uses both audio and visual information for content characterization. The video signal is processed first by extracting low-level visual descriptors from the MPEG compressed bit-stream. The temporal evolution of these descriptors during a semantic event is assumed to be governed by a controlled Markov chain. This makes it possible to determine, based on the maximum likelihood criterion, a list of those video segments where a semantic event of interest is likely to be found. The audio information is then used to refine the results of the video classification procedure by ranking the candidate video segments in the list so that the segments associated with the event of interest appear in the very first positions of the ordered list. The proposed method is applied to goal detection. Experimental results show the effectiveness of the proposed cross-modal approach.
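The two-stage procedure in this abstract (likelihood-based selection of candidate video segments, followed by audio-based ranking) can be sketched as follows. This is an illustrative simplification, not the authors' algorithm: it uses a plain first-order Markov chain over quantized descriptor states instead of a controlled Markov chain, and the function names, the two competing models, and the scalar audio score are all assumptions.

```python
import math


def log_likelihood(states, initial, transition):
    """Log-likelihood of a state sequence under a first-order Markov chain."""
    ll = math.log(initial[states[0]])
    for prev, cur in zip(states, states[1:]):
        ll += math.log(transition[prev][cur])
    return ll


def rank_segments(segments, event_model, background_model, audio_scores):
    """Select candidate segments by maximum likelihood, then rank by audio.

    segments:          dict name -> list of quantized visual descriptor states
    event_model:       (initial, transition) for the event of interest
    background_model:  (initial, transition) for everything else
    audio_scores:      dict name -> hypothetical audio score (e.g. loudness)
    """
    candidates = []
    for name, states in segments.items():
        event_ll = log_likelihood(states, *event_model)
        background_ll = log_likelihood(states, *background_model)
        # Keep the segment if the event model explains it better.
        if event_ll > background_ll:
            candidates.append(name)
    # Audio stage: the highest-scoring candidates come first.
    return sorted(candidates, key=lambda n: audio_scores[n], reverse=True)


# Hypothetical two-state models (0 = low motion, 1 = high motion).
event_model = ([0.5, 0.5], [[0.2, 0.8], [0.3, 0.7]])
background_model = ([0.5, 0.5], [[0.8, 0.2], [0.7, 0.3]])
segments = {"a": [0, 1, 1, 1], "b": [0, 0, 0, 0], "c": [1, 1, 0, 1]}
audio_scores = {"a": 0.4, "b": 0.9, "c": 0.7}

ranked = rank_segments(segments, event_model, background_model, audio_scores)
```

In this toy run, segment "b" has the highest audio score but is rejected by the visual stage, so the audio information only reorders segments that the video classifier already accepted.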
ISBN: 9786165904773 (digital)
ISBN: 9786165904773 (print)
Perception processing in the human brain is mediated by functional connectivity among brain areas. To a certain extent, functional connectivity builds a communication channel among brain regions and even supports our perception and consciousness functions. However, information does not always flow in the same direction between different parts of the brain; it is also exchanged bidirectionally. The role of bidirected functional connectivity in the brain is still not well explored. More specifically, we want to know how bidirectional communication between regions in different brain areas helps with information processing. To do this, we used directed information to quantify information exchange in simulated neural signals and in real neural data covering each part of the visual cortex, and found that directed information can discover causal effects in both simulated and real experiments. In this study, we tried to understand information processing with bidirected functional connectivity and explored both feedforward and feedback information flow among brain regions. On the one hand, we discovered feedback functional connectivity between visual regions such as LRSC, LLOC, LOPA, and RPPA. Biologically, this makes sense because natural images, such as objects, places, or more complex images, are typically represented in high-level visual regions. On the other hand, we also found information flow between the scene-selective areas (e.g., OPA, PPA, RSC) and object-selective regions (e.g., LOC). As a result, directed information gives us a better understanding, from an information-theoretic perspective, of how information is shared and communicated.
ISBN: 0780370414 (print)
We present constrained Cramer-Rao bounds for multi-input multi-output (MIMO) channel and source estimation. We find the MIMO Fisher information matrix (FIM) and consider its properties, including the maximum rank of the unconstrained FIM, and develop necessary conditions for the FIM to achieve full rank. Equality constraints provide a means to study the potential value of side information, such as training (semi-blind case), constant modulus (CM) sources, or source non-Gaussianity. Nonredundant constraints may be combined in an arbitrary fashion, so that side information may be different for different sources. The bounds are useful for evaluating various MIMO source and channel estimation algorithms. We present an example using the constant modulus blind equalization algorithm.
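For reference, one widely used formulation of a constrained Cramer-Rao bound (due to Stoica and Ng) is sketched below for a real parameter vector. The abstract does not state which formulation the paper uses, so this is offered as background under that assumption; the symbols $\theta$, $f$, $F$, $U$, and $J$ are notation introduced here.

```latex
% Equality constraints on the parameter vector \theta \in \mathbb{R}^n:
%   f(\theta) = 0, \qquad f : \mathbb{R}^n \to \mathbb{R}^k, \quad k < n.
% Constraint Jacobian (assumed full row rank, i.e. nonredundant constraints):
\[
  F(\theta) = \frac{\partial f(\theta)}{\partial \theta^{T}} \in \mathbb{R}^{k \times n}
\]
% Let U \in \mathbb{R}^{n \times (n-k)} be an orthonormal basis of the null space of F:
\[
  F(\theta)\, U = 0, \qquad U^{T} U = I .
\]
% With J the (possibly singular) unconstrained Fisher information matrix,
% any unbiased estimator \hat{\theta} satisfying the constraints obeys
\[
  \mathrm{E}\!\left[(\hat{\theta} - \theta)(\hat{\theta} - \theta)^{T}\right]
  \;\succeq\; U \left(U^{T} J\, U\right)^{-1} U^{T},
\]
% provided that U^{T} J U is nonsingular.
```

The bound requires only that $U^{T} J U$ be invertible, not $J$ itself, which is why side information expressed as equality constraints can yield a finite bound even when the unconstrained FIM is rank deficient.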
ISBN: 9781713845393 (print)
As deep networks begin to be deployed as autonomous agents, the issue of how they can communicate with each other becomes important. Here, we train two deep nets from scratch to perform large-scale referent identification through unsupervised emergent communication. We show that the partially interpretable emergent protocol allows the nets to successfully communicate even about object classes they did not see at training time. The visual representations induced as a by-product of our training regime, moreover, when re-used as generic visual features, show comparable quality to a recent self-supervised learning model. Our results provide concrete evidence of the viability of (interpretable) emergent deep net communication in a more realistic scenario than previously considered, as well as establishing an intriguing link between this field and self-supervised visual learning.
ISBN: 0819422347 (print)
Interactive communication concerns the number of bits that a person must transmit to convey information to another, and how this number of bits can be reduced if the two communicators are allowed to interact.