ISBN (digital): 9781665495486
ISBN (print): 9781665495486
Facial expression animation plays an important role in character animation. The Expression Blendshape Model (EBM) provides a simple representation of various expressions through a linear combination of base blendshapes with expression coefficients. However, it is challenging to distinguish subtle expression changes. In this paper, we propose a method that combines local features and global features to regress the expression coefficients. Furthermore, local metric learning (LML) and global metric learning (GML) are proposed to enhance the recognizability of cross-individual expression features. Specifically, the LML increases the feature distance of each blendshape that appears or disappears from the perspective of local representation, resulting in better capture of local appearance changes, while the GML increases the feature distance between neutral and emotional expressions in the high-dimensional feature space from the global perspective. Experimental results and feature visualizations on the FEAFA dataset show the effectiveness of local and global metric learning.
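As a rough illustration of the linear blendshape combination mentioned in the abstract above, the sketch below evaluates a face as a neutral shape plus coefficient-weighted offsets. The offset (delta) formulation, the flat coordinate lists, and the function name `blend` are assumptions made for illustration; the abstract does not specify how the EBM is parameterized.

```python
def blend(neutral, deltas, coeffs):
    """Return neutral + sum_i coeffs[i] * deltas[i], coordinate by coordinate.

    neutral: flat list of vertex coordinates for the neutral face (assumed layout).
    deltas:  one offset list per base blendshape, same length as `neutral`.
    coeffs:  one expression coefficient per base blendshape.
    """
    if len(deltas) != len(coeffs):
        raise ValueError("need exactly one coefficient per blendshape")
    result = list(neutral)
    for delta, weight in zip(deltas, coeffs):
        if len(delta) != len(neutral):
            raise ValueError("blendshape size does not match the neutral shape")
        for j, offset in enumerate(delta):
            result[j] += weight * offset
    return result


# Toy example: two coordinates, two blendshapes.
face = blend([0.0, 0.0], [[1.0, 0.0], [0.0, 2.0]], [0.5, 0.25])
```

Regressing expression coefficients, as the abstract describes, amounts to predicting the `coeffs` vector from an image; the metric-learning terms are not modeled here.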
ISBN (digital): 9781665495486
ISBN (print): 9781665495486
Efficient use of available bandwidth is vital when streaming 360-degree videos, as users rarely have enough bandwidth for a pleasant experience. A promising solution is the combination of viewport-dependent streaming using tiled video and rate adaptation, where the goal is to spend most of the available bandwidth on the viewport tiles. However, head motions that change the viewport tiles briefly cause low-quality rendering until the new tiles can be replaced with high-quality versions. Previously, viewport margins (fixed regions around the viewport rendered at a medium quality) were proposed to make viewport changes less abrupt. Later on, Head-motion-aware Viewport Margins (HMAVMs) were implemented to further smooth the transitions at the expense of increased bandwidth consumption. In this paper, we better manage the overall bandwidth cost of HMAVMs by first developing a set of algorithms that trade off the quality of some viewport tiles and then making the margin selection part of the rate-adaptation algorithm.
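The fixed viewport margin idea described above can be sketched as a tile-labelling step: viewport tiles get high quality, tiles within a fixed distance of the viewport get medium quality, and the rest get low quality. The tile grid, the Chebyshev distance, the horizontal wrap-around, and the name `assign_tile_quality` are assumptions for illustration; this is not the head-motion-aware scheme or the rate-adaptation algorithm from the paper.

```python
def assign_tile_quality(cols, rows, viewport, margin=1):
    """Label every tile of a cols x rows grid as 'high', 'medium', or 'low'.

    viewport: set of (col, row) tiles currently visible.
    margin:   width of the fixed margin, in tiles (hypothetical parameter).
    """
    quality = {}
    for row in range(rows):
        for col in range(cols):
            if (col, row) in viewport:
                quality[(col, row)] = "high"
                continue
            in_margin = False
            for v_col, v_row in viewport:
                # Columns wrap around because the video covers 360 degrees.
                d_col = abs(col - v_col)
                d_col = min(d_col, cols - d_col)
                if max(d_col, abs(row - v_row)) <= margin:
                    in_margin = True
                    break
            quality[(col, row)] = "medium" if in_margin else "low"
    return quality


labels = assign_tile_quality(cols=6, rows=3, viewport={(0, 1)}, margin=1)
```

A rate-adaptation algorithm would then map these labels to concrete bitrates under a bandwidth budget; that step is outside this sketch.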
ISBN (print): 9781424414369
Retrieval in a multimedia database usually involves combining information from different modalities of data, such as text and images. However, not all modalities of the data may be available to form the query. The results from such a partial query are often less than satisfactory. In this paper, we present an approach to complete a partial query by estimating the missing features in the query. Our experiments with a database of images and their associated captions show that, with an initial text-only query, our completion method has similar performance to a full query with both image and text features. In addition, when we use relevance feedback, our approach outperforms the results obtained using a full query.
ISBN (digital): 9781665495486
ISBN (print): 9781665495486
The online detection of action start in video data has attracted increasing attention from both academia and industry because of its many use cases (e.g., an alert mechanism for surveillance videos that can automatically record key frames and timestamps). Conventional approaches rely heavily on frame-level annotations and other prior knowledge that can only be applied to limited categories. In this paper, we introduce Generic Action Start Detection (GASD): a new task that aims to detect taxonomy-free action starts in an online manner. Furthermore, we propose a novel yet simple design, a 3D MLP-Mixer-based architecture with a multiscale sampling training strategy, which makes the GASD algorithm favorable for edge-device deployment. The GASD task is validated on two large-scale datasets, THUMOS'14 and ActivityNet1.2. Results demonstrate that the proposed architecture achieves state-of-the-art performance on the GASD task compared with other online action start detection algorithms.
ISBN (print): 9798350351439; 9798350351422
Detecting objects in stationary scenes using fisheye cameras poses challenges due to fluctuations in object sizes and distortions at different image locations, which can degrade the accuracy of existing classifiers. To address the unique challenges posed by fisheye cameras in stationary object detection scenarios, we introduce FisheyeAdapt, a novel framework that seamlessly integrates tailored post-processing techniques with distortion-aware training strategies, enabling robust and precise object recognition in highly distorted fisheye imagery. We present OmniDet (Scene Context-Aware LEarning for Fisheye), a novel approach that dynamically adjusts confidence thresholds based on object categories and sizes, while leveraging scene context-aware model training. Through extensive experiments, we demonstrate that OmniDet consistently improves the performance of object detection across various fisheye camera-based models, showcasing its wide applicability and effectiveness.
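One way to picture confidence thresholds that depend on object category and size, as mentioned in the abstract above, is a lookup table applied as a post-processing filter. Everything concrete here is an assumption for illustration: the detection dictionary layout, the threshold values, the function names, and the size buckets (which borrow the 32x32 and 96x96 pixel area cutoffs used by the COCO evaluation). The abstract does not say how the thresholds are chosen or adjusted.

```python
def size_bucket(box):
    """Bucket a box (x1, y1, x2, y2) by pixel area: small, medium, or large."""
    x1, y1, x2, y2 = box
    area = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    if area < 32 * 32:
        return "small"
    if area < 96 * 96:
        return "medium"
    return "large"


def filter_detections(detections, thresholds, default=0.5):
    """Keep detections whose score meets the threshold for their (label, size)."""
    kept = []
    for det in detections:
        key = (det["label"], size_bucket(det["box"]))
        if det["score"] >= thresholds.get(key, default):
            kept.append(det)
    return kept


# Hypothetical thresholds: be lenient with small cars, strict with large ones.
thresholds = {("car", "small"): 0.25, ("car", "large"): 0.6}
detections = [
    {"label": "car", "box": (0, 0, 10, 10), "score": 0.3},
    {"label": "car", "box": (0, 0, 200, 200), "score": 0.3},
]
kept = filter_detections(detections, thresholds)
```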
In this paper, we present a novel 3D model alignment method by analyzing the voxels of 3D meshes and a visual similarity based 3D model matching and retrieving method using active tabu search. Firstly, each 3D model i...
ISBN (print): 9798350351439; 9798350351422
In recent years, there has been a significant increase in applications of multimodal signal processing and analysis, largely driven by the increased availability of multimodal datasets and the rapid progress in multimodal learning systems. Well-known examples include autonomous vehicles, audiovisual generative systems, vision-language systems, and so on. Such systems integrate multiple signal modalities: text, speech, images, video, LiDAR, etc., to perform various tasks. A key issue for understanding such systems is the relationship between various modalities and how it impacts task performance. In this paper, we employ the concept of mutual information (MI) to gain insight into this issue. Taking advantage of the recent progress in entropy modeling and estimation, we develop a system called InfoMeter to estimate MI between modalities in a multimodal learning system. We then apply InfoMeter to analyze a multimodal 3D object detection system over a large-scale dataset for autonomous driving. Our experiments on this system suggest that a lower MI between modalities is beneficial for detection accuracy. This new insight may facilitate improvements in the development of future multimodal learning systems.
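For reference, the mutual information discussed above has a simple closed form when the joint distribution of two discrete variables is known: I(X;Y) = sum over x, y of p(x,y) log2( p(x,y) / (p(x) p(y)) ). The sketch below computes it from a joint probability table. This is only the textbook definition; it is not the InfoMeter estimator, which according to the abstract relies on entropy modeling and estimation, and the function name is hypothetical.

```python
import math


def mutual_information_bits(joint):
    """Mutual information, in bits, of a joint probability table.

    joint[i][j] is p(X = i, Y = j); entries are assumed to sum to 1.
    """
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    total = 0.0
    for i, row in enumerate(joint):
        for j, p_xy in enumerate(row):
            if p_xy > 0.0:  # zero-probability cells contribute nothing
                total += p_xy * math.log2(p_xy / (p_x[i] * p_y[j]))
    return total


independent = mutual_information_bits([[0.25, 0.25], [0.25, 0.25]])
identical = mutual_information_bits([[0.5, 0.0], [0.0, 0.5]])
```

With independent variables the result is 0 bits, and with two perfectly correlated fair bits it is 1 bit, which gives a sense of the scale on which "lower MI between modalities" is measured.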
ISBN (print): 0879426322
A semantic-relation-based retrieval pattern that cooperates with tree-based retrieval using multimedia processing techniques is discussed. It is designed for the Videotex Service on the metropolitan area computer networks of Nanjing. The information structure, mathematical model, features, and relevant algorithms are given.
ISBN (print): 9798350351439; 9798350351422
The rapid proliferation of short videos on social media platforms has led to a significant increase in derivative content, often repurposing the same video segments. In response to this phenomenon, our study specifically addresses short video formats, developing an innovative methodology to swiftly identify content that is either similar or identical. Recognizing the inefficiency inherent in comparing the entirety of visual content, our approach initially involves the dimensionality reduction and categorization of short videos through textual data. For the analysis of visual data, we implemented a two-fold strategy: initially removing the template and segmenting videos in the preprocessing phase, followed by the application of two distinct analytical models, ViSiL and CLIP. These models are employed to detect reused materials across a variety of contexts. Our empirical findings revealed multiple instances of footage repetition. This research contributes to two main aspects: (a) analyzing the effectiveness of additional video information for clustering, and (b) developing a more efficient method for detecting identical source materials.
ISBN (print): 9781538618578
In this paper we describe a system for retrieving information about artwork based on textual cues, descriptive of the relevant art pieces, made available through the metadata itself. Large datasets of artwork can easily be mined by using alternative queries and search methodologies. The most common search methodology is a text-based query entered with a keyboard. We propose a method for searching, finding, and recommending digital media content based on pre-set metadata text queries organized in two categories, then mapped to speech sentiment cues extracted from the emotion layer of speech alone. We also account for the difference in sentiment expression between male and female speakers and further suggest that this differentiation may improve system performance.