Versatile Video Coding (VVC) offers compression efficiency improvements of 50% and 75% compared to High Efficiency Video Coding (HEVC) and Advanced Video Coding (AVC), respectively. However, the VVC encoder software (...
ISBN:
(Print) 9781510673878; 9781510673861
Modern wafer inspection systems in Integrated Circuit (IC) manufacturing utilize deep neural networks. Training such networks requires a very large number of defective or faulty die patterns on a wafer, called wafer maps, yet the number of defective wafer maps on a production line is often limited. Generative models can therefore be used to synthesize realistic defective wafer maps in the quantities that training demands. This paper compares three generative models commonly used for image synthesis: the Generative Adversarial Network (GAN), the Variational Auto-Encoder (VAE), and CycleGAN, a variant of GAN. The comparison is carried out on the public domain wafer map dataset WM-811K. The quality of the generated wafer map images is evaluated with five metrics: peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), inception score (IS), Fréchet inception distance (FID), and kernel inception distance (KID). Furthermore, the computational efficiency of these generative networks is examined in terms of their deployment in a real-time inspection system.
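The pairwise metrics among these are straightforward to compute with standard tooling. Below is a minimal sketch of scoring a synthesized wafer map against a real one with PSNR and SSIM using scikit-image; the real and fake arrays are random placeholders rather than actual WM-811K wafer maps, and a full evaluation would also include the set-level metrics IS, FID, and KID.

    import numpy as np
    from skimage.metrics import peak_signal_noise_ratio, structural_similarity

    # Placeholder images standing in for a real and a generated wafer map.
    real = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
    fake = np.random.randint(0, 256, (64, 64), dtype=np.uint8)

    # Pairwise quality metrics; IS/FID/KID operate on whole image sets
    # and additionally require a pretrained feature network.
    psnr = peak_signal_noise_ratio(real, fake, data_range=255)
    ssim = structural_similarity(real, fake, data_range=255)
    print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")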
The detection of potentially illicit behaviors from recorded video footage is an emerging field of study in the domain of image processing and computer vision. Detecting suspicious activities is essential for maintain...
The field of image processing plays a vital role in driving technological changes that result in real-time applications. Image scaling is one such fundamental method that helps to resolve storage issues and al...
ISBN:
(Print) 9781728198354
Versatile Video Coding (VVC) allows for large compression efficiency gains over its predecessor, High Efficiency Video Coding (HEVC). The added efficiency comes at the cost of increased runtime complexity, especially for encoding, so it is highly relevant to explore all available runtime reduction options. This paper proposes a novel first pass for two-pass rate control in the all-intra configuration, using low-complexity video analysis and a Random Forest (RF)-based machine learning model to derive the data required for driving the second pass. The proposed method is validated using VVenC, an open and optimized VVC encoder. Compared to the default two-pass rate control algorithm in VVenC, the proposed method achieves around a 32% reduction in encoding time for the "faster" preset, while on average causing only a 2% BD-rate increase and achieving similar rate control accuracy.
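As an illustrative sketch of the machine-learning component only (the paper's actual features and prediction targets are not listed in this abstract): a Random Forest regressor maps cheap per-frame analysis features to the per-frame statistics that would otherwise come from a full first encoding pass. All feature and target choices below are hypothetical.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    # Hypothetical low-complexity features, e.g. variance, gradient energy, SAD.
    X = rng.random((500, 3))
    # Hypothetical target: per-frame bit estimates a first pass would produce.
    y = 1000 + 5000 * X[:, 0] + rng.normal(0, 50, 500)

    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X, y)
    # Predicted statistics then drive the second-pass rate control.
    print(model.predict(rng.random((1, 3))))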
Denoising videos in real time is critical in many applications, including robotics and medicine, where varying light conditions and miniaturized sensors and optics can substantially compromise image quality. This work proposes the first video denoising method based on a deep neural network that achieves state-of-the-art performance on dynamic scenes while running in real time on VGA video resolution with no frame latency. The backbone of our method is a novel, remarkably simple temporal network of cascaded blocks with forward block output propagation. We train our architecture with short, long, and global residual connections by minimizing the restoration loss of pairs of frames, leading to more effective training across noise levels, and the resulting model is robust to heavy noise following Poisson-Gaussian statistics. Because the algorithm requires no future frames to denoise the current frame, its latency is considerably reduced. Evaluated on RAW and RGB data, the visual and quantitative results show that our algorithm achieves state-of-the-art performance among efficient algorithms, with two-fold to two-orders-of-magnitude speed-ups on standard benchmarks for video denoising.
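The abstract does not spell out the architecture, but the idea of cascaded blocks with forward block output propagation can be sketched as follows: each block receives the current input together with its own output from the previous time step, so no future frames are ever needed. This is a toy PyTorch sketch under those assumptions, not the authors' network.

    import torch
    import torch.nn as nn

    class DenoiseBlock(nn.Module):
        def __init__(self, ch=16):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(6, ch, 3, padding=1), nn.ReLU(),
                nn.Conv2d(ch, 3, 3, padding=1),
            )

        def forward(self, x, prev_out):
            # Short residual connection around the block.
            return x + self.net(torch.cat([x, prev_out], dim=1))

    blocks = nn.ModuleList([DenoiseBlock() for _ in range(2)])
    frames = torch.randn(8, 3, 64, 64)                    # noisy clip (T, C, H, W)
    states = [torch.zeros(1, 3, 64, 64) for _ in blocks]  # zero state at t = 0
    with torch.no_grad():
        for t in range(frames.shape[0]):
            x = frames[t:t + 1]
            for i, blk in enumerate(blocks):
                x = blk(x, states[i])
                states[i] = x  # each block's output propagates forward to t + 1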
ISBN:
(Print) 9798400704123
Volumetric capture is an important topic in eXtended Reality (XR) as it enables the integration of realistic three-dimensional content into virtual scenarios and immersive applications. Certain systems are even capable of delivering these volumetric captures live and in real time, opening the door to interactive use cases such as immersive videoconferencing. One example of such systems is FVV Live, a Free Viewpoint Video (FVV) application capable of working in real time with low delay. Current breakthroughs in Artificial Intelligence (AI) in general, and deep learning in particular, report great success when applied to the computer vision tasks involved in volumetric capture, helping to overcome the quality and bandwidth restrictions that these systems often face. Despite their promising results, state-of-the-art approaches still come with the disadvantage of requiring large processing power and time. This project aims to advance the volumetric capture state of the art by applying the aforementioned deep learning techniques, optimizing the models to work in real time while still delivering high quality. The technology developed will be validated by integrating it into immersive video communication systems such as FVV Live, in order to overcome their main restrictions and improve the quality delivered to the end user.
ISBN:
(Print) 9798350353006
Recent Large Language Models (LLMs) have been enhanced with vision capabilities, enabling them to comprehend images, videos, and interleaved vision-language content. However, the learning methods of these large multimodal models (LMMs) typically treat videos as predetermined clips, rendering them less effective and efficient at handling streaming video inputs. In this paper, we propose a novel Learning-In-Video-Stream (LIVE) framework, which enables temporally aligned, long-context, and real-time dialogue within a continuous video stream. Our LIVE framework comprises comprehensive approaches to achieve video streaming dialogue, encompassing: (1) a training objective designed to perform language modeling for continuous streaming inputs, (2) a data generation scheme that converts offline temporal annotations into a streaming dialogue format, and (3) an optimized inference pipeline to speed up interactive chat in real-world video streams. With our LIVE framework, we develop a simplified model called videoLLM-online and demonstrate its significant advantages in processing streaming videos. For instance, our videoLLM-online-7B model can operate at over 10 FPS on an A100 GPU for a 5-minute video clip from Ego4D narration. Moreover, videoLLM-online also showcases state-of-the-art performance on public offline video benchmarks, such as recognition, captioning, and forecasting. The code, model, data, and demo have been made available at ***/videollm-online.
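Item (1), the streaming training objective, can be illustrated with a toy snippet: conceptually, every frame time step is trained to emit a special "silent" token unless the temporally aligned annotation says the assistant should be speaking, so the model learns when to talk as well as what to say. The token IDs and logit shapes below are invented for illustration; this is not the LIVE code.

    import torch
    import torch.nn.functional as F

    vocab_size, silent_id = 100, 0
    logits = torch.randn(10, vocab_size)        # per-step logits from an LMM backbone
    targets = torch.full((10,), silent_id)      # default: stay silent on frame steps
    targets[6:] = torch.tensor([42, 17, 3, 9])  # annotated reply tokens start at t = 6
    loss = F.cross_entropy(logits, targets)     # one language-modeling loss per step
    print(loss.item())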
ISBN:
(Print) 9798350310856
The trend of recent years is the continuous development of the Internet of Things (IoT). Among such things, a significant share is occupied by visual sensors and video cameras that generate large amounts of data. In turn, real-time video analytics inevitably demands significant storage resources, transmission throughput, and processing power. Thus, the combination of smart cameras with the Cloud/Edge computing paradigm and IoT architectures forms the next generation of video surveillance systems, called the "Internet of Video Things" (IoVT). In this paper, a new IoVT platform is developed that, in addition to harmoniously combining Edge/Cloud computing, uses SDN to overcome challenges such as flexible management, control, and maintenance of IoVT devices. In particular, within the proposed IoVT platform, an algorithm for the dynamic selection of Edge or Cloud computing is implemented using an SDN controller to provide effective video analytics in real time. This algorithm considers parameters such as the priority of computational tasks, the number of video streams, and the image quality, and can adapt to a specific application through software configuration of the IoVT platform. We also demonstrate the effectiveness of the proposed solutions on real equipment and discuss several promising areas of application for the developed platform.
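As a purely hypothetical illustration of such a placement decision (the paper's actual rules and thresholds are not given in this abstract), the SDN controller could weigh the three named parameters like this:

    # Illustrative thresholds only, not taken from the paper.
    def select_compute(task_priority: int, n_streams: int, quality: str) -> str:
        """Return 'edge' for latency-critical light loads, else 'cloud'."""
        heavy = n_streams > 8 or quality == "4k"
        if task_priority >= 2 and not heavy:
            return "edge"   # low latency, load fits edge resources
        return "cloud"      # offload heavy or low-priority analytics

    print(select_compute(task_priority=3, n_streams=4, quality="1080p"))  # edge
    print(select_compute(task_priority=1, n_streams=12, quality="4k"))    # cloud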
Visually impaired or blind people need guidance in order to avoid collision risks with outdoor obstacles. Recently, technology has been proving its presence in all aspects of human life, and new devices provide assistance to humans on a daily basis. However, due to real-time dynamics or a lack of specialized knowledge, object detection confronts a reliability difficulty. To overcome this challenge, YOLO Glass, a video-based smart object detection model, has been proposed to help visually impaired persons navigate effectively in indoor and outdoor environments. Initially, the captured video is converted into key frames and pre-processed using a Correlation Fusion-based disparity approach. The pre-processed images are augmented to prevent overfitting of the trained model. The proposed method uses an obstacle detection system based on a Squeeze and Attendant Block YOLO Network model (SAB-YOLO). The system assists visually impaired users in detecting multiple objects and their locations relative to their line of sight, alerting them with audio messages delivered via headphones, and thereby helps blind and visually impaired people manage their daily tasks and navigate their surroundings. The experimental results show that the proposed system achieves an accuracy of 98.99%, proving that it can accurately identify objects; its detection accuracy is 5.15%, 7.15%, and 9.7% better than that of the existing YOLO v6, YOLO v5, and YOLO v3, respectively.
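The overall pipeline shape (capture frames, run the detector, speak an alert) can be sketched as below. The detect and announce functions are hypothetical stand-ins for SAB-YOLO inference and the headphone text-to-speech output, respectively; none of this is the paper's implementation.

    import cv2

    def detect(frame):
        # Placeholder for SAB-YOLO inference; returns (label, confidence) pairs.
        return [("chair", 0.91)]

    def announce(label):
        # Stand-in for the text-to-speech alert sent to the headphones.
        print(f"[audio] {label} ahead")

    cap = cv2.VideoCapture(0)   # any video source works here
    for _ in range(30):         # process a short burst of frames
        ok, frame = cap.read()
        if not ok:
            break
        for label, conf in detect(frame):
            if conf > 0.5:
                announce(label)
    cap.release()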