ISBN: (print) 1577358872
This study introduces an efficient and effective method, MeDM, that utilizes pre-trained image Diffusion Models for video-to-video translation with consistent temporal flow. The proposed framework can render videos from scene position information, such as a normal G-buffer, or perform text-guided editing on videos captured in real-world scenarios. We employ explicit optical flows to construct a practical coding that enforces physical constraints on generated frames and mediates independent frame-wise scores. By leveraging this coding, maintaining temporal consistency in the generated videos can be framed as an optimization problem with a closed-form solution. To ensure compatibility with Stable Diffusion, we also suggest a workaround for modifying observation-space scores in latent Diffusion Models. Notably, MeDM does not require fine-tuning or test-time optimization of the Diffusion Models. Through extensive qualitative, quantitative, and subjective experiments on various benchmarks, the study demonstrates the effectiveness and superiority of the proposed approach. Our project page can be found at https://***/.
To meet the real-time detection and processing requirements of on-board targets in the field of remote sensing image processing, this paper carries out relevant research from the perspective of software optim...
Existing image inpainting methods have shown impressive completion results for low-resolution images. However, most of these algorithms fail at high resolutions and require powerful hardware, limiting their deployment...
ISBN: (print) 9783031671913; 9783031671920
Traditional ergonomic evaluations often overlook the dynamic and uncertain nature of human movements, leading to potential musculoskeletal disorders (MSDs) and impacting worker health, efficiency, and company costs. Disassembly cells, crucial for sustainability and circular economy efforts, pose unique challenges and opportunities for ergonomic optimization. This study introduces an innovative approach for ergonomic risk assessment in the manufacturing industry, particularly within disassembly cells, by integrating real-time video processing and fuzzy logic. Our research fills a significant gap in ergonomic assessment by utilizing a multi-camera computer vision technique to capture and analyze worker motions in real time, allowing for dynamic ergonomic risk assessment in a disassembly cell. The fuzzy logic inference enhances the system's ability to handle the variability and subjectivity of human posture, offering a more nuanced and accurate risk assessment than binary logic systems. Experimental validation in a laboratory setting confirms the feasibility of our approach, demonstrating its potential to improve worker safety and productivity by providing a more responsive and adaptable tool for ergonomic assessment in industrial environments. This work marks a significant advancement in the field, suggesting a path forward for the development of ergonomic interventions that are both more effective and applicable in diverse manufacturing settings.
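To illustrate how fuzzy inference can grade posture more smoothly than a binary threshold, here is a minimal sketch. It is not the authors' system: the single input (trunk flexion angle), the membership breakpoints (10, 30, 60 degrees), the three rules, and the risk scores (1, 5, 9) are all assumptions chosen for illustration, and the defuzzification is a simple zero-order Sugeno weighted average.

```python
def left_shoulder(x, b, c):
    """Membership that is 1 up to b and falls linearly to 0 at c."""
    if x <= b:
        return 1.0
    if x >= c:
        return 0.0
    return (c - x) / (c - b)


def right_shoulder(x, a, b):
    """Membership that is 0 up to a and rises linearly to 1 at b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def triangle(x, a, b, c):
    """Triangular membership: 0 at a and c, 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trunk_risk(angle_deg):
    """Map a trunk flexion angle to a risk score in [1, 9].

    Hypothetical rules: small angle -> low risk, moderate angle ->
    medium risk, large angle -> high risk. Breakpoints are assumed.
    """
    strengths = {
        "low": left_shoulder(angle_deg, 10.0, 30.0),
        "medium": triangle(angle_deg, 10.0, 30.0, 60.0),
        "high": right_shoulder(angle_deg, 30.0, 60.0),
    }
    scores = {"low": 1.0, "medium": 5.0, "high": 9.0}  # assumed outputs
    total = sum(strengths.values())
    # Weighted average of rule outputs (zero-order Sugeno defuzzification).
    return sum(strengths[k] * scores[k] for k in strengths) / total


if __name__ == "__main__":
    for angle in (0, 20, 30, 45, 90):
        print(angle, trunk_risk(angle))
```

Because adjacent membership functions overlap, an angle of 20 degrees is partly "low" and partly "medium", so the score moves gradually between categories instead of jumping at a cutoff.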
ISBN: (print) 9781510666313; 9781510666320
Video super-resolution is the task of converting low-resolution video to high-resolution video. Existing methods that produce visually better results are mainly based on convolutional neural networks (CNNs), but their architectures are heavy, resulting in slow inference. To address this problem, this paper proposes a real-time video super-resolution Transformer (RVSRT) that can quickly complete the super-resolution task while preserving the visual fluency of transitions between video frames. Unlike traditional CNN-based methods, this paper does not process video frames separately with different network modules in the temporal domain, but processes adjacent frames as a batch through a single end-to-end, UNet-style Transformer network. Moreover, this paper sets up two-stage interpolation sampling before and after the end-to-end network to make the most of traditional computer vision algorithms. The experimental results show that, compared with the state-of-the-art TMNet [1], RVSRT has only about 20% of the network size (2.3M vs 12.3M parameters) while delivering comparable performance, and its speed is about 80% higher (26.2 fps vs 14.3 fps at a frame size of 720×576).
ISBN: (print) 9789819708338; 9789819708345
Current cloud-based multi-party video conferencing suffers from heavy workloads on media servers caused by video transcoding. Emerging edge computing can assist in offloading transcoding tasks to edge nodes. However, the resource-limited nature of edge nodes poses new challenges. First, edge nodes can transcode a video in real time into only a subset of representations, raising the video transcoding problem: which set of representations should each participant's video stream be transcoded into? Second, since participants' downlink resources are limited, one needs to solve the representation selection problem: which representation should each participant select for receiving another participant's video? Third, these two problems are coupled and should be optimized simultaneously. Hence, this paper studies the joint video transcoding and representation selection problem for edge-assisted multi-party video conferencing, with the aim of maximizing the overall QoE under the resource and real-time video transcoding constraints. The problem is formulated as a non-linear integer program and is NP-hard. To solve it, we leverage submodular optimization and propose a (1 - 1/e)-approximate algorithm with polynomial computational complexity. Finally, extensive trace-driven simulations are conducted to evaluate the proposed algorithm. The results show that it outperforms the alternatives by 1.5-2.5x on average in terms of overall QoE.
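For context on where a (1 - 1/e) guarantee typically comes from, the sketch below shows the classic greedy algorithm for maximizing a monotone submodular function under a cardinality constraint, which is known to achieve that ratio. It is not the paper's algorithm, whose constraints and objective are not given in the abstract. The objective here is a toy coverage function, and the representation names and participant labels are hypothetical.

```python
def greedy_max_coverage(candidates, k):
    """Pick up to k candidates, each time taking the largest marginal gain.

    candidates: dict mapping a candidate name to the set of items it covers.
    Coverage (size of the union) is monotone and submodular, so this greedy
    rule is within a factor (1 - 1/e) of the best k-subset.
    """
    chosen = []
    covered = set()
    for _ in range(k):
        best, best_gain = None, 0
        for name, items in candidates.items():
            if name in chosen:
                continue
            gain = len(items - covered)  # marginal gain of adding `name`
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # nothing left adds any coverage
            break
        chosen.append(best)
        covered |= candidates[best]
    return chosen, covered


if __name__ == "__main__":
    # Hypothetical data: which participants each representation can serve.
    reps = {
        "1080p": {"A", "B"},
        "720p": {"B", "C", "D"},
        "360p": {"D", "E"},
    }
    print(greedy_max_coverage(reps, 2))
```

The greedy rule runs in polynomial time because each of the k rounds scans every candidate once, which is the usual reason this style of algorithm is attractive for NP-hard selection problems.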
This paper presents the latest Ethernet standardization for in-vehicle networks and the future trends of automotive Ethernet technology. The proposed system provides a design and optimization algorithm for in-vehicle networking based on Ethernet Audio Video Bridge (AVB) technology. We present a design of an in-vehicle network system as well as the optimization of AVB for automotive use. The proposed Reduced Latency of Machine to Machine (RLMM) scheme plays a significant role in reducing the latency between devices. Applying RLMM to realistic test cases showed a latency reduction of about 30.41%. The optimized settings for the actual automotive network environment are expected to greatly shorten the development and design process. The latency results for each function are trustworthy because they are average values obtained from repeated tests over several months. This would considerably benefit the industry, because being able to analyze the delay between functions in a short period of time is highly valuable. In addition, the proposed real-time camera and video streaming, enabled by the optimized AVB settings, is expected to greatly help artificial intelligence (AI) algorithms in autonomous driving understand and analyze images in real time.
In the cardiac operating room, several operators are essential to assist the surgeon, including the physician managing and monitoring the artificial heart-lung machine. The custodian must interpret the patient's v...
With the increasing demand for Cloud-based Video Surveillance as a Service (VSaaS), the efficient processing of vast amounts of video data poses significant challenges. The framework leverages Fog computing at the net...