ISBN: 9781510679344; 9781510679351 (print)
Reference Picture Resampling (RPR) is a powerful tool for improving the coding efficiency of next-generation video codecs such as Versatile Video Coding (VVC) or the Enhanced Compression Model (ECM). The feature is designed to support resolution changes between frames without inserting an instantaneous decoder refresh (IDR) or intra random access point (IRAP) picture. Video streaming and low-delay scenarios can take advantage of RPR to ensure smooth frame-based bitrate adaptation, in contrast to traditional techniques that can generate bitrate leaps. This paper proposes an encoder method that selects the picture resolution change parameters according to the characteristics of the video signal. The resolution change decision is made by a low-complexity neural network before the encoding process and without RD-score computations, which makes the approach suitable for real-time and low-delay implementations. Experiments under the Random Access (RA) and All Intra (AI) configurations of the VVC Test Model (VTM-21.1) show that the proposed method achieves luma BD-rate gains of 1.46% and 0.95%, respectively, over the VVC Test Model anchor.
ISBN: 9798350385434; 9798350385427 (print)
Visual-based translator systems, which use image and video data for real-time translation, represent a captivating research area. While previous research has explored text-based translation, it is limited in capturing the nuances of human communication. Visual-based approaches address this gap by leveraging deep learning and image processing to extract information such as objects, faces, and environmental context. This research project investigates the application of deep learning to sign language translation using Automated AI. By analyzing visual data, the system recognizes signs and translates them into spoken languages or generates text descriptions. This technology holds significant promise for various fields - education, entertainment, tourism, and healthcare - and can contribute to the advancement of information technology and artificial intelligence systems.
With the advancement of technology and due to the recent pandemic situation, the education sector has turned to the online teaching method. But the main problem here is the inconvenience and irregularities in the stud...
Real-time image processing involves the transformation of incoming signals, primarily from a camera, into a format that can be readily interpreted by a display device. This process is heavily reliant on precise timing...
Distortions like blur and smoke in real-time laparoscopic videos often result from lens contamination. Detecting these distortions automatically and "in real time" is a step preceding automatic lens cleaning ...
Computer vision is a promising domain that focuses on emerging approaches, algorithms and technologies that give machines the computing capability to analyze visual data, such as image files, video files and real tim...
USB 2.0 is used to design a video retrieval system for sports events. The system uses an image sensor as its data source and includes data transmission, real-time display and corner capture. The detection principle and implem...
ISBN: 9781510679344; 9781510679351 (print)
With the constant increase in video resolution and frame rate, notably for immersive applications, there is a need for efficient and reliable coding technologies that deliver very high visual quality at very low latency. Immersive applications can use the low latency and high bandwidth of 5G networks for increased mobility. JPEG XS is a low-complexity coding standard that can be implemented with very low latency. It is designed to provide visually lossless quality while offering compression efficiency, making it suitable for immersive applications that rely on video content. This paper reports a quality evaluation of omnidirectional video using 360-degree test sequences coded with JPEG XS. A subjective quality experiment was performed using an alternating double-stimulus method in a VR environment, where subjects could freely switch between the reference and the distorted video. Test sequences were encoded at five different bitrates, ranging from 0.35 bpp to 2 bpp. These bitrates are suitable for real-time high-resolution video transmission over 5G networks. It was concluded that JPEG XS provides an effective low-latency solution suitable for high-quality immersive applications using 5G networks.
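To give a feel for what the 0.35 bpp to 2 bpp range means as a transmission rate, the following minimal sketch converts bits per pixel into Mbit/s. The resolution and frame rate (7680x3840 at 30 fps) are illustrative assumptions, not values stated in the abstract, and `bitrate_mbps` is a hypothetical helper name.

```python
def bitrate_mbps(bpp: float, width: int, height: int, fps: float) -> float:
    """Convert a bits-per-pixel budget into a video bitrate in Mbit/s."""
    bits_per_second = bpp * width * height * fps
    return bits_per_second / 1e6


# Assumed sequence format (not given in the abstract): 8K equirectangular, 30 fps.
WIDTH, HEIGHT, FPS = 7680, 3840, 30

# The two endpoints of the bitrate range reported in the abstract.
for bpp in (0.35, 2.0):
    rate = bitrate_mbps(bpp, WIDTH, HEIGHT, FPS)
    print(f"{bpp:.2f} bpp -> {rate:.1f} Mbit/s")
```

Under these assumed parameters the range spans roughly 0.3 to 1.8 Gbit/s; other resolutions or frame rates scale the result linearly.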
ISBN: 9781665491907 (print)
This paper presents a video inpainting algorithm that enables monocular-camera-laser-based pipeline inspection robots to capture both color and 3D information using only one video stream. Conventional monocular-camera-laser inspection methods are limited to capturing either 2D color images or 3D point clouds, since the laser tends to overexpose the actual color of the scanning area. We propose a real-time video inpainting method that solves this problem with minimal hardware needs and can be easily integrated with conventional pipeline profiling robots. The algorithm is accelerated by two components: a lightweight network that directly predicts the complete optical flow and simplifies the algorithm pipeline, and a polar coordinate transformation, which significantly reduces the image processing complexity. Real-world experiments demonstrate that our online algorithm achieves color estimation accuracy comparable to or better than state-of-the-art offline algorithms, while running at 23 frames per second (FPS) on a laptop computer at a resolution of 1024x1024 pixels. In addition, we verify that this method can be used for video pre-processing for downstream tasks that require high-quality visual inputs, such as Simultaneous Localization and Mapping (SLAM). To the best of our knowledge, this is the first real-time video inpainting algorithm that can be used for in-pipe environments, serving as an important building block for highly compact RGB-D inspection sensors and robots for the pipeline industry.
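The polar coordinate transformation mentioned above can be illustrated with a small sketch. This is not the authors' implementation: it is a generic nearest-neighbour resampling in plain Python, and the function name `to_polar` and its parameters are assumptions made for illustration. The idea it shows is that a ring centred in the image (such as a circular laser trace seen along a pipe axis, which is an assumed geometry here) maps to a single row of the polar grid, so ring-shaped regions can be processed as straight strips.

```python
import math


def to_polar(image, n_radii, n_angles):
    """Resample a 2D grid (list of rows) onto a (radius, angle) grid.

    Uses nearest-neighbour sampling around the image centre; samples that
    fall outside the image are left at 0.
    """
    height, width = len(image), len(image[0])
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    max_r = min(cx, cy)
    polar = [[0] * n_angles for _ in range(n_radii)]
    for ri in range(n_radii):
        # Radii are spread evenly from 0 to max_r.
        r = max_r * ri / max(n_radii - 1, 1)
        for ai in range(n_angles):
            theta = 2.0 * math.pi * ai / n_angles
            x = int(round(cx + r * math.cos(theta)))
            y = int(round(cy + r * math.sin(theta)))
            if 0 <= x < width and 0 <= y < height:
                polar[ri][ai] = image[y][x]
    return polar


# Toy input: a ring of radius about 3 pixels in a 9x9 image.
SIZE = 9
ring = [[1 if round(math.hypot(x - 4, y - 4)) == 3 else 0
         for x in range(SIZE)] for y in range(SIZE)]

# With 5 radii (0..4 pixels), the ring lands in the row for radius 3.
polar = to_polar(ring, n_radii=5, n_angles=8)
for row in polar:
    print(row)
```

A production version would typically use interpolated sampling on real image arrays; the plain-list version here only keeps the example self-contained.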
ISBN: 9781665496209 (digital); 9781665496209 (print)
Video enhancement is an important computer vision task that aims to remove artifacts from lossy compressed video and to improve its visual properties through photo-realistic restoration of the video contents. Decades of research have produced a multitude of efficient algorithms that reduce the memory footprint of transferred video contents in a continuously growing network of video streaming services. In this work, we propose VETRAN, a low-latency real-time online Video Enhancement TRANsformer based on spatial and temporal attention mechanisms. We validate our method on recent video enhancement benchmarks from the NTIRE and AIM challenges, i.e. REDS/REDS4, LDV, and IntVID. We improve over the compared state-of-the-art methods both quantitatively and qualitatively, while maintaining a low inference time.