ISBN: 9798331529543; 9798331529550 (print)
With the rapid development of video-on-demand (VOD) and real-time streaming video technologies, accurate objective assessment of streaming video Quality of Experience (QoE) has become a focal point for optimizing streaming-related technologies. However, evaluating streaming video QoE is challenging because of the transmission distortions caused by poor Quality of Service (QoS) conditions, such as intermittent stalling, rebuffering, and drastic changes in video sharpness due to bitrate fluctuations. This paper introduces a large and diverse in-the-wild streaming video QoE evaluation dataset, the SJLIVE-1k dataset. It addresses the limitations of existing datasets, which lack in-the-wild video sequences captured under real network conditions and contain too little video content. Furthermore, we propose an end-to-end objective QoE evaluation strategy that extracts video content and QoS features from the video itself without using any extra information. By using self-supervised contrastive learning as the "reminder" that bridges the gap between the different types of features, our approach achieves state-of-the-art results across three datasets. Our proposed dataset will be released to facilitate further research.
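The idea of using a contrastive objective to align two feature types can be sketched as follows. The abstract does not say which loss the authors use, so this is only an illustration under the assumption of an InfoNCE-style objective; the names `info_nce`, `content_feats`, `qos_feats`, and `temperature` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(content_feats, qos_feats, temperature=0.1):
    """InfoNCE-style loss (assumed, not taken from the paper).

    The i-th content feature and the i-th QoS feature come from the same
    video and form the positive pair; every other pairing is a negative.
    """
    n = len(content_feats)
    total = 0.0
    for i in range(n):
        logits = [
            cosine(content_feats[i], qos_feats[j]) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / n


# Toy features: two videos, two-dimensional embeddings.
content = [[1.0, 0.0], [0.0, 1.0]]
qos_aligned = [[0.9, 0.1], [0.1, 0.9]]
qos_swapped = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = info_nce(content, qos_aligned)
loss_swapped = info_nce(content, qos_swapped)
```

The loss is small when the content and QoS features of the same video point in similar directions and large when they are mismatched, which is the sense in which such an objective pulls the two feature spaces together.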
ISBN: 9798350365474 (print)
In this paper, we present EdgeRelight360, an approach for real-time video portrait relighting on mobile devices, utilizing text-conditioned generation of 360-degree high dynamic range image (HDRI) maps. Our method introduces diffusion-based text-to-360-degree image generation in the HDR domain, taking advantage of the HDR10 standard. This technique facilitates the generation of high-quality, realistic lighting conditions from textual descriptions, offering flexibility and control in the portrait video relighting task. Unlike previous relighting frameworks, our proposed system performs video relighting directly on-device, enabling real-time inference with real 360-degree HDRI maps. This on-device processing both preserves privacy and guarantees low runtime, providing an immediate response to changes in lighting conditions or user inputs. Our approach paves the way for new possibilities in real-time video applications, including video conferencing, gaming, and augmented reality, by allowing dynamic, text-based control of lighting conditions.
This study introduces an integrated image processing pipeline to enhance vehicle detection and counting precision in real-time video streams. The method pinpoints the areas of a video where cars are present by using Regions of Interest (ROIs) and segmenting the frames. To ensure efficient processing, the quality and dimensions of the images are optimized. Convolutional Neural Networks (CNNs) are used for feature extraction, learning discriminative features through hierarchical layers. Machine learning models then refine the extracted features before they are used to classify automobiles. There are two post-processing tasks: implementing vehicle counting methods that consider the discovered ROIs, and optimizing image size further. The primary aim is precise and efficient vehicle counting and identification, which is essential for tasks like traffic surveillance. Based on the experimental findings, the system effectively balances processing efficiency and accuracy in vehicle recognition and classification. With this integrated infrastructure, real-time enhancements could be made to the processing of traffic surveillance video streams.
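The ROI-aware counting step described above can be illustrated with a minimal sketch. It assumes that an upstream detector (not shown, and not specified in the abstract) already yields labelled bounding boxes per frame; `Detection`, `count_vehicles`, the label set, and the score threshold are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str    # class name from the (assumed) upstream classifier
    score: float  # classifier confidence in [0, 1]
    box: tuple    # (x_min, y_min, x_max, y_max) in pixels


def center(box):
    """Center point of an axis-aligned bounding box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def in_roi(point, roi):
    """True if the point lies inside the rectangular region of interest."""
    x, y = point
    x_min, y_min, x_max, y_max = roi
    return x_min <= x <= x_max and y_min <= y <= y_max


def count_vehicles(detections, roi, min_score=0.5,
                   vehicle_labels=("car", "truck", "bus")):
    """Count confident vehicle detections whose centers fall inside the ROI."""
    counts = {}
    for det in detections:
        if det.label not in vehicle_labels or det.score < min_score:
            continue
        if in_roi(center(det.box), roi):
            counts[det.label] = counts.get(det.label, 0) + 1
    return counts


# One frame: the ROI covers the road area below y = 100.
roi = (0, 100, 640, 480)
frame_detections = [
    Detection("car", 0.9, (10, 120, 110, 200)),    # inside ROI
    Detection("truck", 0.8, (300, 50, 400, 90)),   # outside ROI
    Detection("car", 0.3, (50, 300, 150, 380)),    # below score threshold
    Detection("person", 0.9, (400, 200, 430, 300)),  # not a vehicle
    Detection("car", 0.7, (200, 300, 300, 400)),   # inside ROI
]
counts = count_vehicles(frame_detections, roi)
```

This counts per frame only. Counting across a video stream would additionally need some form of tracking so that the same vehicle is not counted in every frame it appears in.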
ISBN: 9781510673878; 9781510673861 (print)
Recent advancements in volumetric displays have opened doors to immersive, glass-free holographic experiences in our everyday environments. This paper introduces Holoportal, a real-time, low-latency system that captures, processes, and displays 3D video of two physically separated individuals as if they were conversing face-to-face in the same location. The evolution of work in multi-view immersive video communication from a Space-time-Flow (STF) media technology to real-time Holoportal communication is also discussed. Multiple cameras at each location capture subjects from various angles, with wireless synchronization for precise video-frame alignment. Through this technology we envision a future where any living space can transform into a Holoportal with a wireless network of cameras placed on various objects, including TVs, speakers, and refrigerators.
ISBN: 9781510673199; 9781510673182 (print)
The Multiply-Accumulate (MAC) operation is widely used in real-time image processing tasks, ranging from Convolutional Neural Networks to digital filtering, and significantly impacts overall system performance. In this work the Self-Adapting Reconfigurable Multiply-Accumulate (SR-MAC) is proposed as a new instrument for finding the optimal trade-off between operation throughput, power consumption, and physical resource utilization in real-time image processing applications. The proposed system dynamically reconfigures the hardware resources on the basis of the current computational requirements. This is achieved by monitoring overflow and over-representation occurrences at each accumulation cycle, and properly considering the relevant portion of the accumulation result. A custom architecture of the proposed algorithm has been designed and implemented on an AMD Xilinx Artix-7 FPGA through a Verilog description and compared to the AMD Xilinx fixed-point macro (floating-point fused multiply-accumulate). The SR-MAC achieves reductions of 83% (82%), 79% (93%), and 87.2% (94%) in the number of LUTs, the number of FFs, and the power dissipation, P-dynN, respectively. The SR-MAC has also been used to replace arithmetic units in typical real-time image processing applications. In these cases, it reduced FFs and P-dynN by up to 6% and 14%, respectively, while increasing f(Max) by up to 14%. These results highlight the significant performance enhancement achieved with respect to both single operators and entire systems, making SR-MAC an excellent design choice in real-time image processing applications.
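The monitoring idea, checking at each accumulation cycle whether the running sum overflows or is over-represented by the current word width, can be illustrated in software. This is not the SR-MAC architecture, whose details the abstract does not give; it is a behavioral sketch under the assumption that the accumulator is a two's-complement integer, and `bits_needed` and `adaptive_mac` are hypothetical helper names.

```python
def bits_needed(value):
    """Minimum two's-complement width that can hold a signed integer."""
    if value >= 0:
        return value.bit_length() + 1
    return (-value - 1).bit_length() + 1


def adaptive_mac(pairs, start_width=8):
    """Accumulate products and adapt the tracked word width every cycle.

    Each cycle is labelled "overflow" when the result needs more bits than
    the current width, "over-representation" when it needs fewer, and "fit"
    otherwise. The width is then set to exactly what the result requires.
    """
    acc = 0
    width = start_width
    log = []
    for a, b in pairs:
        acc += a * b
        need = bits_needed(acc)
        if need > width:
            event = "overflow"
        elif need < width:
            event = "over-representation"
        else:
            event = "fit"
        width = need
        log.append((acc, width, event))
    return acc, log


result, trace = adaptive_mac([(3, 4), (100, 2), (-10, 20)])
for acc, width, event in trace:
    print(f"acc={acc:4d} width={width:2d} {event}")
```

In hardware, a narrower active width would translate into fewer switching resources per cycle; in this sketch the width is only tracked, since Python integers are unbounded.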
There has been a rise in the frequency of fire-related calamities all over the globe, which leads to the need for an efficient fire detection system to avoid high losses or fatalities. This paper focuses on real-time ...
ISBN: 9798350391893; 9798350391886 (print)
In the dynamic landscape of modern communication, the demand for innovative virtual video conferencing solutions is ever-increasing. Our work presents an approach to building a virtual video conferencing system that remote users can access through a web page. Our system allows remote participants, joining via a web browser, to freely navigate and view the virtual environment from any angle, enhancing spatial awareness and engagement. Additionally, our system lets participants view the environment independently, even if the host restricts certain views, which is one of the main drawbacks of current video conferencing systems. Unlike current platforms, our solution also allows users to choose where they appear within the virtual space. Furthermore, our system is highly customizable, enabling the integration of features such as recording specific portions of the screen, which are not available in existing video conferencing tools. This flexibility ensures a more immersive, interactive, and personalized meeting experience, significantly advancing the capabilities of remote collaboration technologies. Our work highlights the results of research carried out to create a virtual conference setting in the Unity environment and to establish successful real-time communication between the webpage and the Unity environment. In this virtual setting, monitors act as participants, and participants can choose on which monitor they want to appear. Participants join this virtual meeting setup from a webpage, which consists of two windows: the first window shows the participants themselves, and the second window displays the virtual meeting setup. Participants can observe this environment from any perspective they want, navigating with a keyboard and a mouse. Since we are implementing everything from scratch, we have full control over every feature and functionality with the Agora video SDK.
ISBN: 9781728198354 (print)
Video Object Detection (VOD) is one of the fundamental problems in video understanding, with applications ranging from surveillance to autonomous driving. However, many such real-world applications are unable to leverage existing VOD models owing to their high computational complexity, which reduces inference speed. Instead, single-stage still-image object detection models are applied naively, without any use of video information. In this paper, we present a YOLOX-based VOD model, YOLO-MaxVOD, which provides a better trade-off between accuracy and inference time than current real-time VOD solutions. Specifically, we propose a temporal fusion module that integrates within the YOLOX architecture to take advantage of the high speed that the YOLOX model offers. In our experiments on the ImageNet-VID dataset, YOLO-MaxVOD achieves a 4.4-5.6% AP50 improvement over the baseline YOLOX, across different versions, with just a 1-2 ms increase in latency on an NVIDIA 1080Ti GPU.
ISBN: 9781510673199; 9781510673182 (print)
Recent algorithmic developments, specifically in deep learning, have propelled computer vision forward for practical applications. However, the high computational complexity and the resulting power consumption are often overlooked. This is a problem not only when systems need to be installed in the wild, where often only a limited electricity supply is available, but also in the broader context of high energy consumption. To address both aspects, we explore the intersection of green artificial intelligence and real-time computer vision, focusing on the use of single-board computers. To this end, we take into account the limitations of single-board computers, including limited processing power and storage capacity, and demonstrate how algorithm and data optimization ensure high-quality results at a drastically reduced computational effort. Energy efficiency can be increased, aligning with the goals of Green AI and making such systems less dependent on a permanent electrical power supply.
ISBN: 9781510673199; 9781510673182 (print)
Optical image processing, which capitalizes on the distinctive characteristics of light, facilitates the manipulation of visual data in real time and at high speed. This technology is instrumental in performing tasks such as enhancing edges, recognizing patterns, and extracting features, all of which are crucial in fields like medical imaging, surveillance, and industrial automation. In this study, we present the successful demonstration of a photonic integrated circuit (PIC) made of lithium niobate on insulator, enabling matrix-vector multiplications for image classification. By surpassing an electrical bandwidth of 15 GHz, our experiment showcases the PIC's capability for live edge detection and video streaming. Remarkably, its energy efficiency surpasses the limit imposed by electronic systems for each operation by consuming < 10 fJ/bit.
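To see why a matrix-vector multiplier is enough for edge detection, the filtering step can be written as one matrix-vector product: each image patch becomes a row of a matrix and the filter kernel becomes the vector. The sketch below shows this in software only. How the PIC actually encodes patches and weights is not described in the abstract, and the choice of a Sobel kernel and the names `im2col`, `matvec`, and `edge_response` are assumptions for illustration.

```python
def im2col(image, k):
    """Unroll every k x k patch of a 2D list into one row of a matrix."""
    height, width = len(image), len(image[0])
    rows = []
    for r in range(height - k + 1):
        for c in range(width - k + 1):
            rows.append([image[r + i][c + j]
                         for i in range(k) for j in range(k)])
    return rows


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# 3 x 3 Sobel kernel for horizontal gradients, flattened row by row.
SOBEL_X = [-1, 0, 1,
           -2, 0, 2,
           -1, 0, 1]


def edge_response(image):
    """Correlate the image with SOBEL_X via a single matrix-vector product."""
    patches = im2col(image, 3)
    out_width = len(image[0]) - 2
    flat = matvec(patches, SOBEL_X)
    return [flat[i:i + out_width] for i in range(0, len(flat), out_width)]


# A vertical step edge between the third and fourth columns.
image = [
    [0, 0, 0, 10, 10],
    [0, 0, 0, 10, 10],
    [0, 0, 0, 10, 10],
    [0, 0, 0, 10, 10],
]
response = edge_response(image)
```

The response is zero over the flat region and large wherever a patch straddles the step, so the same multiply-and-sum primitive that serves classification also yields an edge map.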