The robustness against geometrical attacks remains one of the most challenging issues in watermarking of images and video. This paper presents several improvements of the video watermarking approach presented in Haitsma et al. (2001), namely (i) using a temporally low-pass watermark and (ii) synchronization to resist attacks along the temporal axis. In order to improve the watermark detection performance, we propose to use an amplitude-limiting filter and a whitening filter during the watermark extraction process. Experimental results show that the proposed techniques achieve good performance.
We describe integrated multimedia processing for Video Scout, a system that segments and indexes TV programs according to their audio, visual, and transcript information. Video Scout represents a future direction for personal video recorders. In addition to using electronic program guide metadata and a user profile, Scout allows users to request specific topics within a program. For example, users can request the video clip of the U.S. president speaking from a half-hour news program. Video Scout has three modules: (i) video pre-processing, (ii) segmentation and indexing, and (iii) storage and user interface. Segmentation and indexing, the core of the system, incorporates a Bayesian framework that integrates information from the audio, visual, and transcript (closed captions) domains. This framework uses three layers to process low-, mid-, and high-level multimedia information. The high-level layer generates semantic information about TV program topics. This paper describes the elements of the system and presents results from running Video Scout on real TV programs.
ISBN: 0819437034 (print)
For MPEG-2 and other hybrid MC/DPCM/DCT based video coding standards, it is very important to reconstruct the predicted frames based on the block motion information. In the case of transmission over unreliable channels, error concealment methods are introduced to recover the lost or erroneous motion vectors. In this paper, a novel side motion estimation method is proposed to recover the lost motion vectors by selecting from a candidate motion vector set. The outer boundary of the lost block is used to perform motion estimation, and the recovered motion vector is the one that minimises the squared error of the block boundary pixels between two consecutive frames. The method takes advantage of the fact that most blocks and their boundaries share the same motion direction. It relaxes the boundary pixel gray level continuity assumption of traditional boundary match/side match approaches so that better estimation results can be achieved. Overlapped block motion compensation is also incorporated in the proposed method to reduce the blocking artefacts. By reducing the number of motion vectors in the candidate set, the performance of the proposed algorithm can be further improved.
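The candidate-selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the one-pixel-wide boundary ring, and the frame layout (2D lists of gray levels) are assumptions made for illustration, and overlapped block motion compensation is left out.

```python
def boundary_ring(top, left, size):
    """Coordinates of the one-pixel ring just outside a size x size block.

    The ring width of one pixel is an assumption of this sketch.
    """
    coords = []
    for x in range(left - 1, left + size + 1):
        coords.append((top - 1, x))      # row above the block
        coords.append((top + size, x))   # row below the block
    for y in range(top, top + size):
        coords.append((y, left - 1))     # column left of the block
        coords.append((y, left + size))  # column right of the block
    return coords


def recover_motion_vector(prev, cur, top, left, size, candidates):
    """Pick the candidate (dy, dx) whose displaced boundary in the previous
    frame best matches the lost block's outer boundary in the current frame.

    prev, cur: 2D lists of gray levels with identical dimensions.
    Returns None if no candidate keeps the boundary inside the frame.
    """
    height, width = len(cur), len(cur[0])
    # Keep only ring pixels that exist in the current frame.
    ring = [(y, x) for y, x in boundary_ring(top, left, size)
            if 0 <= y < height and 0 <= x < width]
    best_mv, best_cost = None, None
    for dy, dx in candidates:
        cost = 0
        in_bounds = True
        for y, x in ring:
            ry, rx = y + dy, x + dx
            if not (0 <= ry < height and 0 <= rx < width):
                in_bounds = False
                break
            diff = cur[y][x] - prev[ry][rx]
            cost += diff * diff  # squared error over boundary pixels
        if in_bounds and (best_cost is None or cost < best_cost):
            best_mv, best_cost = (dy, dx), cost
    return best_mv
```

In this sketch the candidate set would typically hold the motion vectors of neighbouring blocks plus the zero vector; a smaller set means fewer boundary comparisons per lost block.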
Digital images have become an important source of information in the modern world of communication systems. In their raw form, digital images require a tremendous amount of memory. Many research efforts have been devoted to the problem of image compression in the last two decades. Two different compression categories must be distinguished: lossless and lossy. Lossless compression is achieved if no distortion is introduced in the coded image. Applications requiring this type of compression include medical imaging and satellite photography. For applications such as video telephony or multimedia applications, some loss of information is usually tolerated in exchange for a high compression ratio. In this two-part paper, the major building blocks of image coding schemes are overviewed. Part I covers still image coding, and Part II covers motion picture sequences. In this first part, still image coding schemes have been classified into predictive, block transform, and multiresolution approaches. Predictive methods are suited to lossless and low-compression applications. Transform-based coding schemes achieve higher compression ratios for lossy compression but suffer from blocking artifacts at high compression ratios. Multiresolution approaches are suited for lossy as well as for lossless compression. At lossy high compression ratios, the typical artifact visible in the reconstructed images is the ringing effect. New applications in a multimedia environment drove the need for new functionalities of the image coding schemes. For that purpose, second-generation coding techniques segment the image into semantically meaningful parts. Therefore, parts of these methods have been adapted to work for arbitrarily shaped regions. In order to add another functionality, such as progressive transmission of the information, specific quantization algorithms must be defined. A final step in the compression scheme is achieved by the codeword assignment. Finally, coding results are presented.
ISBN: 0818679204 (print)
In this research, we analyzed conversations between a pair of subjects under two conditions: face-to-face conversation, which provides visual contact, and conversation over a telephone line, which does not. From the recorded videotape we extracted the subjects' actions, focusing especially on head movements. Comparing the dialogues under the two conditions suggests that there are two types of head movement: one intended to give a response to the partner and the other intended to send a signal. We plan to analyze how visual information contributes to the perception of spoken dialogue, and the possibility of adopting it in a multimodal human interface.
ISBN: 0818681837 (print)
In [6], we introduced Complexity Distortion Theory, a mathematical framework characterizing the design of programmable communication systems. In this paper, we show how Complexity Distortion Theory fits in the MPEG-4 context, and more generally in any system allowing programmability, by formalizing the concept of programmable decoders. We also show how it can be used to design intelligent encoders at two flexibility levels: the first corresponds to the case where flexibility in the algorithm selection is allowed, and the second to the case where downloadability of new tools for representation is also allowed.
ISBN: 7505338900 (print)
From the early days of multimedia communication development, ITU-T has paid close attention to establishing standard systems and refining them. This is necessary in the transition to the world of the Information Highway. In this paper, the authors systematically summarize the ITU-T H-series and T-series Recommendations on audiographic and audiovisual multimedia services. The summary draws on the research on multimedia communication that our lab has carried out in recent years, and on our experience of participating in the research and discussion of ITU-T standards documents on image and multimedia communication, which the telecommunication department organized in order to establish our country's own corresponding standards.
ISBN: 0819417572 (print)
This paper involves data structures in planning to combine engineering research areas considered as communication modes: image, outline sketches, and speech. Images are enhanced, compressed, and transmitted, but in graphics solid display is central, while in speech, recognition and identification dominate. Outside computing, graphics uses sketches, outline drawings, or schematic summaries of other data (photographic images). Practical image processing involves comparisons, features/edges, shape, and segmentation, using both transforms and other global analyses. Most speech work involves domain restriction. This limit can be removed by focusing on data structures: they can link word and picture domains, and allow for captioning and for indexing or highlighting domains for users. This shows that data structures enable useful functions and support information handling with synergistic benefits, which is the paper's theme. Data structuring is also the theme of recent research literature on alternate means for visual presentation of multiple-measure numerical data. This paper briefly surveys these materials. We show how research from the data structure field enables new methods for addressing visualization issues, improves large-record data handling, and aids greater use of visual and numerical records. (This expands on a talk presented 8 July 1994 at Argonne National Laboratory.)
Multiple resource theory proposes that attention is a process of resource allocation. These resources may be shifted among different modalities and information-processing tasks. This study investigated whether selective attention to a particular television modality results in different levels of attention to the visual and auditory modalities. Two independent variables manipulated selective attention: the modality with the most information (audio or video) and viewers' instructed focus (audio or video). These variables were fully crossed in a within-subjects experimental design. Attention levels were investigated by measuring reaction times to cues in each modality (audio tones and color flashes). All five manipulation checks suggest that subjects were able to focus on a particular message channel. Reactions to cues were faster, however, when the audio channel contained the most information and when viewers focused on the audio channel. These results suggest a common pool of limited attentional resources and therefore bimodal attention.
A wide variety of image interchange and communication (de facto) standards are employed today in different systems, applications and environments. Within the AMICS project a framework, called the Image communication O...