ISBN (print): 9798350307184
We present DreamPose, a diffusion-based method for generating animated fashion videos from still images. Given an image and a sequence of human body poses, our method synthesizes a video containing both human and fabric motion. To achieve this, we transform a pretrained text-to-image model (Stable Diffusion [16]) into a pose-and-image guided video synthesis model, using a novel finetuning strategy, a set of architectural changes to support the added conditioning signals, and techniques to encourage temporal consistency. We fine-tune on a collection of fashion videos from the UBC Fashion dataset [50]. We evaluate our method on a variety of clothing styles and poses, and demonstrate that our method produces state-of-the-art results on fashion video animation. Video results are available on our project page: https://***/projects/dreampose
ISBN (print): 9798350307184
With recent advances in computing hardware and the surge of deep-learning architectures, learning-based deep image registration methods have surpassed their traditional counterparts in both metric performance and inference time. However, these methods focus on improving performance measures such as Dice, leaving less attention for model behaviors that are equally desirable for registration, especially in medical imaging. This paper investigates these behaviors for popular learning-based deep registration methods under a sanity-checking microscope. We find that most existing registration methods suffer from low inverse consistency and nondiscrimination of identical pairs due to overly optimized image similarities. To rectify these behaviors, we propose a novel regularization-based sanity-enforcer method that imposes two sanity checks on the deep model to simultaneously reduce its inverse consistency errors and increase its discriminative power. Moreover, we derive a set of theoretical guarantees for our sanity-checked image registration method, and our experimental results support these theoretical findings and show that the method increases model sanity without sacrificing performance.
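The abstract names inverse consistency and the handling of identical pairs as the two sanity checks, but it does not give their formulas. Below is a minimal sketch of how the two quantities could be measured. It rests on our own assumptions: 1D displacement fields, linear interpolation, clamping at the borders, and a mean absolute error. The function names are hypothetical, and this is not the paper's regularizer or code.

```python
from typing import Sequence


def sample_linear(field: Sequence[float], x: float) -> float:
    """Linearly interpolate a 1D field at position x, clamped to the grid."""
    last = len(field) - 1
    x = min(max(x, 0.0), float(last))  # clamp to [0, last]
    i = int(x)                          # left neighbour (x >= 0, so int() floors)
    if i == last:                       # exactly on (or clamped to) the last point
        return field[last]
    t = x - i                           # fractional offset toward i + 1
    return (1.0 - t) * field[i] + t * field[i + 1]


def inverse_consistency_error(u_ab: Sequence[float], u_ba: Sequence[float]) -> float:
    """Mean |u_ab(x) + u_ba(x + u_ab(x))| over grid points (ideally 0)."""
    if len(u_ab) != len(u_ba) or not u_ab:
        raise ValueError("fields must be non-empty and the same length")
    total = 0.0
    for x, d in enumerate(u_ab):
        # Map x forward to x + d, then apply the backward displacement there.
        # A perfectly inverse-consistent pair lands back on x: d + u_ba(x + d) == 0.
        total += abs(d + sample_linear(u_ba, x + d))
    return total / len(u_ab)


def identity_pair_magnitude(u_aa: Sequence[float]) -> float:
    """Mean |displacement| when an image is registered to itself (ideally 0)."""
    if not u_aa:
        raise ValueError("field must be non-empty")
    return sum(abs(d) for d in u_aa) / len(u_aa)
```

Consider a forward shift of +1 everywhere. Paired with a backward shift of -1, it gives an inverse consistency error of 0.0. Paired with a zero backward field, it gives 1.0. A sanity-enforcing regularizer in the spirit of the abstract would penalize both quantities during training; how the paper actually formulates and weights them is not stated here.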
ISBN (print): 9798350307443
Modern neural encoders offer unprecedented text-image retrieval (TIR) accuracy, but their high computational cost impedes their adoption for large-scale image search. To lower this cost, model cascades use an expensive encoder to refine the ranking produced by a cheap encoder. However, existing cascading algorithms focus on cross-encoders, which jointly process text-image pairs, and do not consider cascades of bi-encoders, which process texts and images separately. We introduce the small-world search scenario as a realistic setting in which bi-encoder cascades can reduce costs. We then propose a cascading algorithm that leverages the small-world search scenario to reduce the lifetime image encoding costs of a TIR system. Our experiments show cost reductions of up to 6x.
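The abstract does not spell out its cascading algorithm. The sketch below is only a generic, hypothetical illustration of a bi-encoder cascade. A cheap image encoder indexes every image up front. An expensive image encoder runs lazily, only on images that reach some query's top-k, and its results are cached so that no image is expensively encoded twice. The class name and the `k` parameter are assumptions. So are the bag-of-words toy encoders, which stand in for neural text and image encoders (with captions standing in for images).

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Encoder = Callable[[str], Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def bag_of_words(vocab: Sequence[str]) -> Encoder:
    """Hypothetical toy encoder: count of each vocabulary word in a text or caption."""
    return lambda text: [float(text.split().count(w)) for w in vocab]


class BiEncoderCascade:
    """Hypothetical cheap-then-expensive bi-encoder cascade for text-image retrieval."""

    def __init__(self, images: Sequence[str], cheap_img: Encoder, cheap_txt: Encoder,
                 exp_img: Encoder, exp_txt: Encoder, k: int) -> None:
        self.images = list(images)
        # Stage-1 index: every image is encoded once with the cheap encoder.
        self.cheap_index = [cheap_img(im) for im in self.images]
        self.cheap_txt = cheap_txt
        self.exp_img = exp_img
        self.exp_txt = exp_txt
        self.k = k
        # Expensive image embeddings are computed on demand and cached for good.
        self.exp_cache: Dict[int, Vector] = {}
        self.expensive_calls = 0

    def _expensive(self, idx: int) -> Vector:
        if idx not in self.exp_cache:
            self.exp_cache[idx] = self.exp_img(self.images[idx])
            self.expensive_calls += 1
        return self.exp_cache[idx]

    def search(self, query: str, top_n: int = 1) -> List[int]:
        # Stage 1: rank all images by cheap similarity and keep the top-k.
        q_cheap = self.cheap_txt(query)
        ranked = sorted(range(len(self.images)),
                        key=lambda i: dot(q_cheap, self.cheap_index[i]), reverse=True)
        candidates = ranked[: self.k]
        # Stage 2: re-rank only those candidates with the expensive encoder.
        q_exp = self.exp_txt(query)
        reranked = sorted(candidates,
                          key=lambda i: dot(q_exp, self._expensive(i)), reverse=True)
        return reranked[:top_n]
```

As an example, take the captions `["red cat", "blue cat", "red car", "blue dog"]`, a cheap encoder that ignores colour, and `k=2`. The query `"blue cat"` triggers two expensive encodings and returns index 1. A follow-up query `"red cat"` reuses both cached embeddings. Under this assumed design, savings come from images that never reach any query's top-k.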
The accessibility of public spaces for visually impaired individuals is a major concern. One of the challenges faced by blind people in public spaces is the difficulty in locating vacant seats. The presented paper dis...
Rice is susceptible to mold and mildew during storage, and the metabolites produced by mold, such as aflatoxin, can seriously harm consumers. To meet the need for rapid detection of normal rice adulterated with moldy r...
Image dehazing, the removal of atmospheric haze from an image, is a significant challenge because the haze must be removed without degrading the inherent quality of the image...
ISBN (print): 9798350349405; 9798350349399
Among the various technical approaches to machine vision coding, Image Coding for Machines (ICM) stands out for its capability to fulfill both human perception and machine vision needs simultaneously. However, it is often criticized for its lack of efficiency in terms of rate-analytics performance. In this paper, we propose an Appearance Redundancy Reduction (ARR) module, designed to function as a plug-in for existing ICM frameworks, that further enhances coding efficiency in terms of rate-analytics performance without any changes to the ICM itself. Specifically, our work pays additional attention to the intrinsic correlation between low-level image structure and high-level vision analytics, and proposes a novel colour quantization mechanism to squeeze out redundant appearance information that the analytics do not need. Moreover, a differentiable soft quantization operation is derived to enable end-to-end training within the ICM framework. Extensive experimental results show that integrating the proposed ARR module yields substantial improvements in rate-analytics performance, even surpassing the feature coding paradigm, while maintaining generalizability across different tasks and an acceptable perceptual representation.
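The abstract mentions a differentiable soft quantization but not its form. One common relaxation, assumed here rather than taken from the paper, replaces the hard nearest-colour choice with a softmax over negative squared distances to a palette, scaled by a temperature `tau`. The output is then the weighted mix of palette colours. The palette, `tau`, and the function names are hypothetical. The sketch is plain Python for a single RGB pixel; a real ICM pipeline would express the same arithmetic in an autograd framework so that gradients flow through it.

```python
import math
from typing import Sequence, Tuple

Color = Tuple[float, float, float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def soft_quantize(pixel: Color, palette: Sequence[Color], tau: float) -> Color:
    """Softmax-weighted mix of palette colours (hypothetical relaxation).

    Weights are softmax(-||pixel - c_k||^2 / tau). The result is a smooth
    function of the pixel and the palette; as tau -> 0 it approaches the
    hard nearest-colour quantization below.
    """
    logits = [-_sq_dist(pixel, c) / tau for c in palette]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(s - m) for s in logits]
    z = sum(exps)
    weights = [e / z for e in exps]
    r, g, b = (sum(w * c[ch] for w, c in zip(weights, palette)) for ch in range(3))
    return (r, g, b)


def hard_quantize(pixel: Color, palette: Sequence[Color]) -> Color:
    """Nearest palette colour: a non-differentiable argmin."""
    return min(palette, key=lambda c: _sq_dist(pixel, c))
```

Take a black-and-white palette. A pixel at (0.5, 0.5, 0.5) is equidistant from both colours and maps to (0.5, 0.5, 0.5) for any `tau`. At (0.2, 0.2, 0.2) with `tau=0.01`, the result is essentially black, matching `hard_quantize`. A large `tau` instead blends the two colours almost evenly. The temperature therefore trades fidelity to hard quantization against smoother gradients.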
Image translation is an important and challenging area of computer vision. It aims to design models that translate source-domain images into target-domain images, with applications such as data augmentation, style migrati...
Cheapfakes can compromise the integrity of information and erode trust in multimedia content, making their detection critical. Identifying out-of-context misuse of media is essential to preventing the spread of misinform...
Computer science methods are now actively used in many areas of Industry 4.0, including the postal service, where they automate the transportation of postal items. The problem of transportation is particularly acute f...