ISBN: (print) 9798331314385
The content of visual and audio scenes is multi-faceted: a video stream can be paired with various audio streams, and vice versa. In the video-to-audio generation task, it is therefore imperative to introduce steering approaches for controlling the generated audio. While video-to-audio generation is a well-established generative task, existing methods lack such controllability. In this work, we propose VATT, a multi-modal generative framework that takes a video and an optional text prompt as input and generates audio and an optional textual description (caption) of the audio. Such a framework has two unique advantages: i) the video-to-audio generation process can be refined and controlled via text that complements the context of the visual information, and ii) the model can suggest what audio to generate for the video by generating audio captions. VATT consists of two key modules: VATT Converter, an LLM that has been fine-tuned for instructions and includes a projection layer that maps video features to the LLM vector space, and VATT Audio, a bi-directional transformer that generates audio tokens from visual frames and an optional text prompt using iterative parallel decoding. A pretrained neural codec then uses the audio tokens and the text prompt to produce a waveform. Our experiments show that, when compared with existing video-to-audio generation methods on objective metrics on benchmarks such as the VGGSound audiovisual dataset, VATT achieves competitive performance when the audio caption is not provided. When the audio caption is provided as a prompt, VATT achieves even more refined performance (with the lowest KLD score of 1.41). Furthermore, in subjective studies that ask participants to choose the most compatible generated audio for a given silent video, audio generated by VATT Audio was, on average, preferred over audio generated by existing methods. VATT enables controllable video-to-audio generation through text as well as suggest
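The abstract names iterative parallel decoding but does not give its details. The following is a minimal toy sketch of one common variant (mask-predict decoding with a cosine unmasking schedule). The schedule, the `MASK` sentinel, and `toy_predictor` (a random stand-in for the real model) are assumptions made for illustration and are not taken from VATT.

```python
import math
import random

MASK = -1  # hypothetical sentinel for a not-yet-decoded token


def toy_predictor(tokens, rng, vocab_size):
    """Hypothetical stand-in for the model.

    Returns a (token, confidence) guess for every position in parallel.
    A real model would condition on video features and the text prompt.
    """
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def iterative_parallel_decode(length, steps, vocab_size=16, seed=0):
    """Fill a fully masked sequence in a fixed number of parallel steps."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(1, steps + 1):
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        if not masked:
            break
        preds = toy_predictor(tokens, rng, vocab_size)
        # Cosine schedule (assumed): how many tokens stay masked after this step.
        keep_masked = int(length * math.cos(0.5 * math.pi * step / steps))
        n_unmask = max(1, len(masked) - keep_masked)
        # Commit the most confident predictions; the rest stay masked.
        best = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:n_unmask]
        for i in best:
            tokens[i] = preds[i][0]
    return tokens
```

With this schedule, few tokens are committed in the early steps and many in the late ones, and the final step commits everything that is still masked.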
The increase in the use of autonomous vehicles has motivated a paradigm shift in the transportation domain as it redefines the boundaries of urban mobility by augmenting safety measures. This research paper explores a...
The rise of autonomous vehicles has generated substantial interest due to their potential to transform transportation, enhance safety, and reduce traffic congestion in urban areas. This research investigates strategie...
Recently, most developing countries have faced traffic congestion issues. As a result, vehicular communication technologies such as DSRC and Cellular-V2X are standardized by the 3GPP to support the tran...
We study temporal step size control of explicit Runge-Kutta (RK) methods for compressible computational fluid dynamics (CFD), including the Navier-Stokes equations and hyperbolic systems of conservation laws such as the Euler equations. We demonstrate that error-based approaches are convenient in a wide range of applications and compare them to more classical step size control based on a Courant-Friedrichs-Lewy (CFL) number. Our numerical examples show that the error-based step size control is easy to use, robust, and efficient, e.g., for (initial) transient periods, complex geometries, nonlinear shock capturing approaches, and schemes that use nonlinear entropy projections. We demonstrate these properties for problems ranging from well-understood academic test cases to industrially relevant large-scale computations with two disjoint code bases, the open source Julia packages *** with *** and the C/Fortran code SSDC based on PETSc.
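To make the contrast between the two step size strategies concrete, here is a minimal sketch, not the paper's implementation. It assumes a scalar ODE, the embedded Heun(2)/Euler(1) pair, and a basic integral controller. The tolerances, the safety factor, and the function names are illustrative choices.

```python
import math


def cfl_step(cfl, dx, max_wave_speed):
    """Classical CFL-based step size: dt = CFL * dx / (fastest wave speed)."""
    return cfl * dx / max_wave_speed


def heun_euler_step(f, t, y, dt):
    """One step of the embedded Heun(2)/Euler(1) pair.

    Returns the second-order solution and an estimate of the local error.
    """
    k1 = f(t, y)
    k2 = f(t + dt, y + dt * k1)
    y_low = y + dt * k1                  # first order (explicit Euler)
    y_high = y + 0.5 * dt * (k1 + k2)    # second order (Heun)
    return y_high, abs(y_high - y_low)


def integrate_adaptive(f, t0, y0, t_end, dt0=0.1, rtol=1e-6, atol=1e-9,
                       safety=0.9, min_factor=0.2, max_factor=5.0):
    """Error-based step size control with a basic integral controller."""
    t, y, dt = t0, y0, dt0
    accepted, rejected = 0, 0
    while t < t_end:
        dt = min(dt, t_end - t)          # do not step past the end time
        y_new, err = heun_euler_step(f, t, y, dt)
        tol = atol + rtol * max(abs(y), abs(y_new))
        ratio = err / tol
        if ratio <= 1.0:                 # accept the step
            t, y = t + dt, y_new
            accepted += 1
        else:                            # reject and retry with a smaller dt
            rejected += 1
        # The error estimate scales like dt**2, hence the exponent 1/2.
        factor = safety * (1.0 / max(ratio, 1e-12)) ** 0.5
        dt *= min(max_factor, max(min_factor, factor))
    return y, accepted, rejected
```

For example, `integrate_adaptive(lambda t, y: -y, 0.0, 1.0, 1.0)` integrates y' = -y and chooses its steps from the tolerances alone. `cfl_step` instead requires the user to supply a mesh size, a wave speed estimate, and a CFL number.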
How well can we approximate a quantum channel output state using a random codebook with a certain size? In this work, we study the quantum soft covering problem, which uses a pairwise-independent random codebook to ap...
A system’s fault tolerance is its capacity to function even if one or more of its components fail. Implementing a fault-tolerant network becomes an important criterion for reliable computing. Reliability measures pla...
The current resource allocation in 5G vehicular networks for mobile cloud communication faces several challenges, such as low user utilization, unbalanced resource allocation, and extended adaptive allocation time. We propose an adaptive allocation algorithm for mobile cloud communication resources in 5G vehicular networks to address these challenges. This study analyzes the components of the 5G vehicular network architecture to determine the performance of different ***. It is ascertained that the communication modes in 5G vehicular networks for mobile cloud communication include in-band and out-of-band modes. This study also analyzes the single-hop and multi-hop modes in mobile cloud communication and calculates the resource transmission rate and bandwidth in different communication modes. This study also determines the scenario of one-way and two-way vehicle lane cloud communication network connectivity, calculates the probability of vehicle network connectivity under different mobile cloud communication radii, and determines the amount of cloud communication resources required by vehicles in different lane ***. Based on the communication status of users in 5G vehicular networks, this study calculates the bandwidth and transmission rate of the allocated channels using Shannon’s ***. It determines the adaptive allocation of cloud communication resources, introduces an objective function to obtain the optimal solution after allocation, and completes the adaptive allocation ***. The experimental results demonstrate that, with the application of the proposed method, the maximum utilization of user communication resources reaches approximately 99%. The balance coefficient curve approaches 1, and the allocation time remains under 2 ***. This indicates that the proposed method has higher adaptive allocation efficiency.
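The abstract does not spell out the rate and bandwidth calculation. As a hedged illustration, the sketch below assumes it refers to the standard Shannon-Hartley capacity relation C = B log2(1 + SNR). The function names and example values are hypothetical and not taken from the paper.

```python
import math


def snr_db_to_linear(snr_db):
    """Convert a signal-to-noise ratio from decibels to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


def shannon_capacity(bandwidth_hz, snr_linear):
    """Upper bound on the achievable rate in bit/s: C = B * log2(1 + SNR)."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)


def required_bandwidth(target_rate_bps, snr_linear):
    """Bandwidth in Hz needed to reach a target rate at a given SNR."""
    return target_rate_bps / math.log2(1.0 + snr_linear)
```

An allocator could, for instance, call `required_bandwidth` per vehicle to size its channel and `shannon_capacity` to check the rate that an allocated channel can support.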
Most near-field (NF) localization algorithms cannot deal with the underdetermined case, while those which can are computationally expensive due to employment of fourth-order cumulants. In this work, a low-complexity s...
As problems grow in variety and in number of variables, the speed of solving optimization problems has become critical. The Harris Hawk optimization method is a recent intelligent method that resolves optimizat...