ISBN (print): 9798400701597
Face reenactment methods attempt to restore and re-animate portrait videos as realistically as possible. Existing methods face a dilemma in quality versus controllability: 2D GAN-based methods achieve higher image quality but suffer in fine-grained control of facial attributes compared with 3D counterparts. In this work, we propose StyleAvatar, a real-time photo-realistic portrait avatar reconstruction method using StyleGAN-based networks, which can generate high-fidelity portrait avatars with faithful expression control. We expand the capabilities of StyleGAN by introducing a compositional representation and a sliding window augmentation method, which enable faster convergence and improve translation generalization. Specifically, we divide the portrait scenes into three parts for adaptive adjustments: facial region, non-facial foreground region, and the background. Besides, our network leverages the best of UNet, StyleGAN and time coding for video learning, which enables high-quality video generation. Furthermore, a sliding window augmentation method together with a pre-training strategy are proposed to improve translation generalization and training performance, respectively. The proposed network can converge within two hours while ensuring high image quality and a forward rendering time of only 20 milliseconds. Furthermore, we propose a real-time live system, which further pushes research into applications. Results and experiments demonstrate the superiority of our method in terms of image quality, full portrait video generation, and real-time re-animation compared to existing facial reenactment methods. Training and inference code for this paper are at https://***/LizhenWangT/StyleAvatar.
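The three-region decomposition described in the abstract can be sketched as a mask-based compositing step. The function and mask names below are illustrative assumptions, not the paper's implementation, which performs the adaptive adjustments inside a StyleGAN-based network:

```python
import numpy as np

def composite_portrait(face, foreground, background, face_mask, fg_mask):
    """Blend three independently generated regions into one frame.

    face, foreground, background: HxWx3 float arrays in [0, 1].
    face_mask, fg_mask: HxW soft masks in [0, 1] (hypothetical names).
    The facial region takes priority, then the non-facial foreground,
    then the background.
    """
    face_mask = face_mask[..., None]
    fg_mask = fg_mask[..., None]
    out = face_mask * face
    out += (1 - face_mask) * fg_mask * foreground
    out += (1 - face_mask) * (1 - fg_mask) * background
    return out
```

Decomposing the scene this way lets each region be modeled and updated at its own rate, which is one plausible reason the paper reports faster convergence.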
ISBN (print): 9798350304572
In recent years, many surveillance cameras have been installed in cities, and human tracking technology has received much attention. In most current human-tracking technologies, servers collect images of people and then analyze their features from the data. In this method, the network loads on the servers increase as the number of people tracked increases, causing problems such as packet loss and loss of real-time performance. In this paper, we propose two real-time human tracking methods. The methods conduct the human tracking process without servers by sharing extracted human features among devices. Experimental evaluations of the amount of communication traffic and processing time using multiple cameras have shown that the two proposed methods can distribute the network load with slight deterioration in processing speed and tracking accuracy.
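The core idea of sharing extracted features among devices rather than raw images can be sketched as a feature-matching step on each camera. The cosine-similarity metric and the 0.8 threshold are illustrative assumptions, not the paper's exact protocol:

```python
import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_person(new_feature, shared_features, threshold=0.8):
    """Match a locally extracted feature vector against features
    shared by peer cameras. Returns the best-matching track id, or
    None if nothing clears the (illustrative) similarity threshold."""
    best_id, best_sim = None, threshold
    for track_id, feat in shared_features.items():
        sim = cosine_similarity(new_feature, feat)
        if sim > best_sim:
            best_id, best_sim = track_id, sim
    return best_id
```

Only compact feature vectors cross the network, which is why such a scheme can distribute load that would otherwise concentrate on a server.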
ISBN (print): 9781728198354
Versatile Video Coding (VVC) allows for large compression efficiency gains over its predecessor, High Efficiency Video Coding (HEVC). The added efficiency comes at the cost of increased runtime complexity, especially for encoding. It is thus highly relevant to explore all available runtime reduction options. This paper proposes a novel first pass for two-pass rate control in all-intra configuration, using low-complexity video analysis and a Random Forest (RF)-based machine learning model to derive the data required for driving the second pass. The proposed method is validated using VVenC, an open and optimized VVC encoder. Compared to the default two-pass rate control algorithm in VVenC, the proposed method achieves around 32% reduction in encoding time for the faster preset, while on average only causing 2% BD-rate increase and achieving similar rate control accuracy.
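The idea of replacing a full first encoding pass with cheap analysis plus a Random Forest can be sketched as below. The feature set, the synthetic training target, and the scikit-learn usage are illustrative assumptions, not the paper's actual design:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def frame_features(frame):
    """Low-complexity spatial statistics standing in for the paper's
    video analysis (an illustrative feature choice)."""
    gx = np.abs(np.diff(frame, axis=1)).mean()  # horizontal gradient energy
    gy = np.abs(np.diff(frame, axis=0)).mean()  # vertical gradient energy
    return [frame.mean(), frame.std(), gx, gy]

# Train an RF to predict the per-frame bit cost that a real first-pass
# encode would have measured. The training data here is synthetic: a
# toy proxy maps spatial complexity to bits.
rng = np.random.default_rng(0)
frames = [rng.random((64, 64)) * s for s in rng.uniform(0.2, 1.0, 200)]
X = np.array([frame_features(f) for f in frames])
y = X[:, 1] * 1000 + X[:, 2] * 500  # toy target: complexity -> bits

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
predicted_bits = model.predict(X[:5])  # drives the second pass
```

The predicted per-frame bit budgets would then be fed to the second pass in place of measured first-pass statistics, trading a small rate-control error for a large runtime saving.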
The detection of potentially illicit behaviors from recorded video footage is an emerging field of study in the domain of image processing and computer vision. Detecting suspicious activities is essential for maintain...
ISBN (print): 9798350348439; 9798350384611
Many GPUs have incorporated hardware-accelerated video encoders, which allow video encoding tasks to be offloaded from the main CPU and provide higher power efficiency. Over the years, many new video codecs such as H.265/HEVC, VP9, and AV1 were added to the latest GPU boards. Recently, the rise of live video content such as VTuber streams, game live-streaming, and live event broadcasts drives the demand for high-efficiency hardware encoders in GPUs to tackle these real-time video encoding tasks, especially at higher resolutions such as 4K/8K UHD. In this paper, the RD performance, encoding speed, and power consumption of hardware encoders in several generations of NVIDIA and Intel GPUs, as well as Qualcomm Snapdragon mobile SoCs, were evaluated and compared to their software counterparts, including the latest H.266/VVC codec, using several metrics including PSNR, SSIM, and the machine-learning-based VMAF. The results show that modern GPU hardware encoders can match the RD performance of software encoders in real-time encoding scenarios, and while encoding speed increased in newer hardware, there is mostly negligible RD performance improvement between hardware generations. Finally, the bitrate required for each hardware encoder to match YouTube transcoding quality was also calculated.
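PSNR, one of the metrics used in such comparisons, is computed from the mean squared error between the reference frame and its encoded-then-decoded version; this is the standard definition, not anything specific to this paper:

```python
import numpy as np

def psnr(reference, distorted, max_val=255.0):
    """Peak signal-to-noise ratio in dB. Higher is better; identical
    frames give infinity."""
    ref = reference.astype(np.float64)
    dst = distorted.astype(np.float64)
    mse = np.mean((ref - dst) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```

RD curves are then built by sweeping bitrate and plotting PSNR (or SSIM/VMAF) against the achieved rate for each encoder.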
Denoising videos in real-time is critical in many applications, including robotics and medicine, where varying-light conditions, miniaturized sensors, and optics can substantially compromise image quality. This work proposes the first video denoising method based on a deep neural network that achieves state-of-the-art performance on dynamic scenes while running in real-time on VGA video resolution with no frame latency. The backbone of our method is a novel, remarkably simple, temporal network of cascaded blocks with forward block output propagation. We train our architecture with short, long, and global residual connections by minimizing the restoration loss of pairs of frames, leading to a more effective training across noise levels. It is robust to heavy noise following Poisson-Gaussian noise statistics. The algorithm is evaluated on RAW and RGB data. We propose a denoising algorithm that requires no future frames to denoise a current frame, reducing its latency considerably. The visual and quantitative results show that our algorithm achieves state-of-the-art performance among efficient algorithms, achieving from two-fold to two-orders-of-magnitude speed-ups on standard benchmarks for video denoising.
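The Poisson-Gaussian statistics the method is said to be robust against model signal-dependent shot noise plus sensor read noise, and can be simulated as follows; the parameter names and default values are illustrative:

```python
import numpy as np

def add_poisson_gaussian_noise(image, gain=0.01, read_sigma=0.02, rng=None):
    """Simulate sensor noise on a float image in [0, 1]:
    Poisson shot noise with camera gain `gain`, plus additive
    Gaussian read noise of standard deviation `read_sigma`.
    The variance at intensity x is approximately gain*x + read_sigma**2."""
    rng = np.random.default_rng() if rng is None else rng
    shot = gain * rng.poisson(image / gain)
    read = rng.normal(0.0, read_sigma, image.shape)
    return shot + read
```

Training on such synthetic pairs across noise levels is a common way to obtain robustness to heavy, signal-dependent noise in RAW data.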
Versatile Video Coding (VVC) offers compression efficiency improvements of 50% and 75% compared to High Efficiency Video Coding (HEVC) and Advanced Video Coding (AVC), respectively. However, the VVC encoder software (...
ISBN (print): 9798400704123
Volumetric capture is an important topic in eXtended reality (XR) as it enables the integration of realistic three-dimensional content into virtual scenarios and immersive applications. Certain systems are even capable of delivering these volumetric captures live and in real-time, opening the door to interactive use cases such as immersive videoconferencing. One example of such systems is FVV Live, a Free Viewpoint Video (FVV) application capable of working in real-time with low delay. Current breakthroughs in Artificial Intelligence (AI) in general, and deep learning in particular, report great success when applied to the computer vision tasks involved in volumetric capture, helping to overcome the quality and bandwidth restrictions that these systems often face. Despite their promising results, state-of-the-art approaches still come with the disadvantage of requiring large processing power and time. This project aims to advance the volumetric capture state of the art by applying the previously mentioned deep learning techniques, optimizing the models to work in real-time while still delivering high quality. The technology developed will be validated by integrating it into immersive video communication systems such as FVV Live in order to overcome their main restrictions and to improve the quality delivered to the end user.
ISBN (print): 9798350310856
The trend of recent years is the continuous development of the Internet of Things (IoT). Among such things, a significant share is occupied by visual sensors and video cameras that generate large amounts of data. In turn, real-time video analytics inevitably demands significant storage resources, transmission throughput, and processing power. Thus, the combination of smart cameras with the Cloud/Edge computing paradigm and IoT architectures forms the next generation of video surveillance systems, called the "Internet of Video Things" (IoVT). In this paper, a new IoVT platform is developed that, in addition to harmoniously combining Edge/Cloud computing, uses SDN to overcome challenges such as flexible management, control, and maintenance of IoVT devices. In particular, within the proposed IoVT platform, an algorithm for the dynamic selection of Edge or Cloud computing is implemented using an SDN controller to provide effective video analytics in real-time. This algorithm considers such parameters as the priority of computational tasks, the number of video streams, and the image quality, with the ability to adapt to a specific application by software configuration of the IoVT platform. We also demonstrate the effectiveness of the proposed solutions on real equipment and discuss several promising areas of application of the developed platform.
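A highly simplified sketch of what a dynamic Edge/Cloud selection rule over the three stated parameters (task priority, number of streams, image quality) might look like; the thresholds and decision order are placeholder assumptions, not the paper's algorithm, which runs in the SDN controller:

```python
def select_compute(priority, num_streams, resolution_px,
                   edge_capacity=4, high_priority=2,
                   edge_max_px=1280 * 720):
    """Decide whether a video-analytics task runs at the Edge or in
    the Cloud. All thresholds are illustrative placeholders that a
    real deployment would set via software configuration."""
    if priority >= high_priority and resolution_px <= edge_max_px:
        return "edge"   # latency-critical and light enough for the edge
    if num_streams > edge_capacity or resolution_px > edge_max_px:
        return "cloud"  # edge node overloaded or frames too large
    return "edge"
```

In the paper's setting the equivalent decision would be recomputed as stream counts and priorities change, with the SDN controller steering flows accordingly.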
Visually impaired or blind people need guidance in order to avoid collision risks with outdoor obstacles. Recently, technology has been proving its presence in all aspects of human life, and new devices provide assistance to humans on a daily basis. However, due to real-time dynamics or a lack of specialized knowledge, object detection confronts a reliability difficulty. To overcome the challenge, YOLO Glass, a video-based smart object detection model, has been proposed for visually impaired persons to navigate effectively in indoor and outdoor environments. Initially, the captured video is converted into key frames and pre-processed using a Correlation Fusion-based disparity approach. The pre-processed images were augmented to prevent overfitting of the trained model. The proposed method uses an obstacle detection system based on a Squeeze and Attendant Block YOLO Network model (SAB-YOLO). The proposed system assists visually impaired users in detecting multiple objects and their locations relative to their line of sight, and alerts them by providing audio messages via headphones. The system assists blind and visually impaired people in managing their daily tasks and navigating their surroundings. The experimental results show that the proposed system achieves an accuracy of 98.99%, proving that it can accurately identify objects. The detection accuracy of the proposed method is 5.15%, 7.15% and 9.7% better than existing YOLO v6, YOLO v5 and YOLO v3, respectively.
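The key-frame extraction step mentioned above can be sketched with simple frame differencing; this is an illustrative stand-in, since the abstract does not specify the exact criterion used:

```python
import numpy as np

def extract_key_frames(frames, threshold=10.0):
    """Keep a frame when its mean absolute pixel difference from the
    last kept frame exceeds `threshold` (an illustrative value).
    Reduces redundant frames before detection runs on each key frame."""
    keys = [frames[0]]
    for frame in frames[1:]:
        diff = np.mean(np.abs(frame.astype(np.float64) -
                              keys[-1].astype(np.float64)))
        if diff > threshold:
            keys.append(frame)
    return keys
```

Running the detector only on key frames keeps the pipeline responsive on wearable hardware, where per-frame inference would be too slow.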