ISBN: (print) 9781622767595
In this paper, we address the problem of automatic discovery of speech patterns using audio-visual information fusion. Unlike previous studies based on the audio modality alone, our work uses not only the acoustic information but also visual features extracted from the mouth region. To make effective use of the multimodal information, we apply several audio-visual fusion strategies, including feature concatenation, similarity weighting and decision fusion. Specifically, our decision fusion approach retains the reliable patterns discovered in the audio and visual modalities. Moreover, we use canonical correlation analysis (CCA) to address the temporal asynchrony between the audio and visual speech modalities, and we adopt unbounded dynamic time warping (UDTW) to search for speech patterns through audio and visual similarity matrices calculated on the aligned audio and visual sequences. Experiments on an audio-visual corpus show that, for the first time, speech pattern discovery can be improved by the use of visual information. The decision fusion approach outperforms standard feature concatenation and similarity weighting. CCA-based audio-visual synchronization plays an important role in the performance improvement.
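The similarity-weighting strategy mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the weighting rule, so the convex combination below, the cosine similarity measure, and the names `cosine_similarity_matrix`, `fuse_similarity` and `audio_weight` are all assumptions made for illustration. The sketch also assumes the audio and visual feature sequences have already been aligned frame by frame.

```python
import math


def cosine_similarity_matrix(seq_a, seq_b):
    """Return the matrix of cosine similarities between two feature sequences."""
    def cosine(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        norm_u = math.sqrt(sum(x * x for x in u))
        norm_v = math.sqrt(sum(y * y for y in v))
        # Treat an all-zero frame as dissimilar to everything.
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    return [[cosine(u, v) for v in seq_b] for u in seq_a]


def fuse_similarity(sim_audio, sim_visual, audio_weight=0.7):
    """Blend two same-shaped similarity matrices (assumed rule, not the paper's).

    audio_weight is a hypothetical tuning parameter in [0, 1].
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    visual_weight = 1.0 - audio_weight
    return [
        [audio_weight * a + visual_weight * v for a, v in zip(row_a, row_v)]
        for row_a, row_v in zip(sim_audio, sim_visual)
    ]
```

A pattern search such as dynamic time warping would then run on the fused matrix rather than on the audio matrix alone.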
Computer graphic art design and visual communication design are two closely related fields. Computer graphic art design is a process of using computer technology to create visual effects, while visual communication de...
ISBN: (print) 9781628415001
This work deals with the problem of high computational complexity in image registration. A hierarchical multiresolution strategy is used to speed up SIFT processing by starting on a low-resolution octave, where an initial affine transformation model is obtained. In each subsequent octave, we apply the affine transformation model obtained from the upper octave to the current octave and then combine it with the geometrical distribution of the matched keypoints to remove further incorrect mappings and update the affine transformation model. The strategy ends with the best affine transformation model on the bottom octave (the full-size image). Experimental results show that the proposed method achieves comparable accuracy with less computation than the original SIFT.
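The coarse-to-fine step described in this abstract can be sketched as follows. When an affine model p' = A p + t is estimated at a coarser octave and carried to an octave with twice the resolution, the linear part A is unchanged and only the translation t is scaled. The function names, the `max_error` threshold, and the simple residual test are assumptions for illustration; the residual test stands in for the paper's geometric-distribution criterion, which the abstract does not specify.

```python
import math


def upscale_affine(A, t, factor=2.0):
    """Carry an affine model to an octave whose coordinates are `factor` times larger.

    With P = factor * p and P' = factor * p', p' = A p + t becomes
    P' = A P + factor * t, so only the translation changes.
    """
    return A, (t[0] * factor, t[1] * factor)


def apply_affine(A, t, point):
    """Map a 2-D point through p' = A p + t."""
    x, y = point
    return (A[0][0] * x + A[0][1] * y + t[0],
            A[1][0] * x + A[1][1] * y + t[1])


def filter_matches(matches, A, t, max_error):
    """Keep (src, dst) keypoint pairs that agree with the predicted mapping.

    max_error is a hypothetical pixel threshold.
    """
    kept = []
    for src, dst in matches:
        px, py = apply_affine(A, t, src)
        if math.hypot(px - dst[0], py - dst[1]) <= max_error:
            kept.append((src, dst))
    return kept
```

In a full pipeline the surviving matches would be used to re-estimate the affine model before moving to the next octave.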
ISBN: (print) 9781467368551
Nowadays most people in developed economies interact with software every day. As computer systems expand into every sphere of human activity, the problem of moving from visual and command interfaces to natural language user interfaces is thrown into sharp relief. This article describes computational linguistics and natural language processing methods. Methods of natural language manipulation are applied in machine translation software, data search and exchange systems, text annotation and expert systems. A prototype natural language user interface to a structured data source is developed; it converts a user's natural language query into an SQL query to a database. The natural language user interface is built for a predefined subject field and interacts with a database that contains information about existing program libraries and frameworks. Consequently, natural language processing methods make it possible to develop a natural language user interface through which a user can interact with a machine.
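To make the idea of converting a natural language query into SQL concrete, here is a deliberately small keyword-matching sketch. It is not the prototype described in the abstract: the table `libraries`, its columns, the vocabulary sets and the sample rows are all hypothetical, and a real system would use proper linguistic analysis rather than word lookup.

```python
import re
import sqlite3

# Hypothetical vocabulary mapping question words to column values.
LANGUAGES = {"python", "java", "rust"}
KINDS = {"library": "library", "libraries": "library",
         "framework": "framework", "frameworks": "framework"}


def to_sql(question):
    """Translate a restricted English question into (sql, params)."""
    words = re.findall(r"[a-z+#]+", question.lower())
    clauses, params = [], []
    for word in words:
        if word in LANGUAGES:
            clauses.append("language = ?")
            params.append(word)
        elif word in KINDS:
            clauses.append("kind = ?")
            params.append(KINDS[word])
    sql = "SELECT name FROM libraries"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY name", params


def demo():
    """Run one question against an in-memory database of made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE libraries (name TEXT, language TEXT, kind TEXT)")
    conn.executemany("INSERT INTO libraries VALUES (?, ?, ?)", [
        ("Django", "python", "framework"),
        ("NumPy", "python", "library"),
        ("Spring", "java", "framework"),
    ])
    sql, params = to_sql("Which frameworks are written in Python?")
    return [row[0] for row in conn.execute(sql, params)]
```

Values extracted from the question are passed as query parameters rather than pasted into the SQL string, which keeps user text out of the statement itself.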
ISBN: (print) 9781509040322
RaptorQ is the most advanced Raptor code and has an overhead-failure curve close to that of the random fountain code over the GF(256) finite field. Theoretically, it is possible to encode and decode in linear time with an inactivation decoding algorithm, which is a hybrid of belief propagation and Gaussian elimination. However, achieving linear time complexity in a real-world implementation is a challenging task. In this paper, we provide an algorithm and data structure to implement inactivation decoding with near-linear time complexity using a graphics processing unit (GPU).
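The belief-propagation half of inactivation decoding can be pictured as a peeling process on an erasure channel: any received equation with exactly one unknown symbol reveals that symbol, which is then substituted into the other equations. The sketch below is a simplified illustration over GF(2) with integers combined by XOR; it is not the paper's GPU algorithm, it does not reach linear time, and the function name and input layout are assumptions. When peeling stalls, an inactivation decoder would mark some symbols inactive and finish with Gaussian elimination, which is omitted here.

```python
def peel_decode(equations):
    """Solve XOR equations by repeatedly resolving those with one unknown.

    equations: iterable of (indices, value) pairs, where `value` is the XOR
    of the source symbols at `indices`. Returns a dict of recovered symbols;
    symbols that peeling alone cannot reach are left out.
    """
    # Copy the index sets so the caller's data is not modified.
    pending = [(set(indices), value) for indices, value in equations]
    solved = {}
    progress = True
    while progress:
        progress = False
        for k, (indices, value) in enumerate(pending):
            # Substitute every symbol that is already known.
            for j in [j for j in indices if j in solved]:
                value ^= solved[j]
                indices.discard(j)
            pending[k] = (indices, value)
            # A degree-one equation directly yields its remaining symbol.
            if len(indices) == 1:
                solved[indices.pop()] = value
                progress = True
    return solved
```

For example, the equations `({0}, 5)`, `({0, 1}, 12)` and `({1, 2}, 5)` peel completely, whereas three equations that each involve two unknowns give the peeling step nothing to start from.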
ISBN: (print) 9798400708305
To address the difficulties of a flight test mission evaluation system based on a monolithic architecture, such as high module coupling, limited data processing capability and a single evaluation function, a distributed system is designed based on microservices. The system takes data processing as its core, uses containers to decouple processes and orchestrate service modules, and calls service modules remotely through a distributed communication framework, which enhances dynamic extension and data processing capability. Moreover, the system uses a plugin framework to design the interactive terminal for visual and intelligent analysis, and reproduces the entire flight test immersively, enriching the sensory experience and improving the quality of the evaluation. Currently, the system has been effectively implemented in the flight tests of various aircraft types, playing a crucial role in reducing test cycles, ensuring safety, and enhancing efficiency.
ISBN: (print) 9786163618238
Low-delay and error-resilient video coding is critical for real-time video communication over wireless networks. Intra-refresh coding, which embeds intra-coded regions into inter frames, can achieve a relatively smooth bit rate and terminate the error propagation caused by transmission loss. In this paper, we propose a novel linear model for selecting the intra-refresh cycle size that adapts to the network packet loss rate and the motion in the video content. Experimental results show that this linear model works efficiently: the modelled cycle size achieves almost the same quality as the optimal cycle size under different packet loss rates.
ISBN: (print) 9781728185897
"Multimedia is a book that can never be read" [1]. Users can reorganize information according to their purpose and cognitive characteristics, add, delete or modify nodes, and re-establish links. Multimedia technology integrates mature image processing, sound processing, video processing, three-dimensional animation and other processing technologies, enriching the computer's traditional processing of plain characters and two-dimensional graphics and making the human-machine interface more unified and coordinated [2]. With the development of the new era, multimedia has gradually entered people's lives, especially in education and digital entertainment. In practical applications, it provides people with a rich technical experience, including images, sounds, videos, animations and other audio-visual information, as well as an excellent human-computer interaction operating system. This article briefly analyzes and discusses the forms in which multimedia technology is applied in museum exhibitions, and provides guidelines for promoting museum exhibition work.
In order to improve the design and analysis capabilities of industrial product packaging colours while constructing an industrial product packaging colour bionic feature extraction system under visual communication, t...
ISBN: (print) 9781467359658; 9780769549415
In image processing, mosaic images are images made by cementing together small tiles. The tiles "tessellate" a source image with the purpose of reproducing the original visual information rendered into a new mosaic-like style. Creating mosaic images from a sequence of partial views is a powerful means of obtaining a larger view of a scene than is available within a single view, and it has been used in a wide range of applications. A general framework for retinal and document images is proposed in this paper. This paper also reviews different applications of image mosaicing, mainly in the areas of retinal image mosaicing and document image mosaicing.