ISBN: (Print) 9798400704437
Real-time video analytics typically requires video frames to be processed by a query to identify objects or activities of interest while adhering to an end-to-end frame processing latency constraint. This imposes a continuous and heavy load on backend compute and network infrastructure. Video data has inherent redundancy and does not always contain an object of interest for a given query. We leverage this property of video streams to propose a lightweight Load Shedder that can be deployed on edge servers or on inexpensive edge devices co-located with cameras. The proposed Load Shedder uses pixel-level color-based features to compute a utility score for each ingress video frame and a minimum utility threshold to select interesting frames to send for query processing. Dropping unnecessary frames enables the video analytics query in the backend to meet the end-to-end latency constraint with fewer compute and network resources. To guarantee a bounded end-to-end latency at runtime, we introduce a control loop that monitors the backend load and dynamically adjusts the utility threshold. Performance evaluations show that the proposed Load Shedder selects a large portion of the frames containing each object of interest while meeting the end-to-end frame processing latency constraint. Furthermore, it does not impose a significant latency overhead when running on edge devices with modest compute resources.
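The utility-score-plus-threshold idea can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the histogram feature, the proportional control-loop gain, and all parameter values are assumptions.

```python
import numpy as np

def utility_score(frame, reference_hist, bins=16):
    """Score a frame by how much its color distribution deviates from a
    reference (background) histogram; larger deviation suggests an object
    of interest may be present."""
    hist, _ = np.histogramdd(
        frame.reshape(-1, 3), bins=(bins, bins, bins),
        range=((0, 256),) * 3, density=True)
    # L1 distance between normalized color histograms
    return float(np.abs(hist - reference_hist).sum())

class LoadShedder:
    """Drop frames whose utility falls below a threshold; a simple
    proportional control loop nudges the threshold toward a target
    backend load."""
    def __init__(self, threshold=0.1, target_load=0.8, gain=0.05):
        self.threshold = threshold
        self.target_load = target_load
        self.gain = gain

    def admit(self, score):
        return score >= self.threshold

    def update(self, observed_load):
        # Overloaded backend -> raise the threshold (shed more frames);
        # underloaded backend -> lower it (admit more frames).
        self.threshold += self.gain * (observed_load - self.target_load)
        self.threshold = max(0.0, self.threshold)
```

A frame identical to the reference background scores zero and is shed, while frames whose color statistics shift are admitted.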
ISBN: (Print) 9798331529543; 9798331529550
In-loop filtering (ILF) is a key technology in image/video coding for reducing artifacts. Recently, neural network-based in-loop filtering methods have achieved remarkable coding gains beyond the capability of advanced video coding standards, establishing themselves as a promising candidate tool for future standards. However, the use of deep neural networks (DNNs) brings high computational complexity and demands dedicated hardware, which makes such methods challenging to deploy in general use. To address this limitation, we study an efficient in-loop filtering scheme based on look-up tables (LUTs). After training a DNN with a predefined reference range for in-loop filtering, we cache the output values of the DNN into a LUT by traversing all possible inputs. In the coding process, the filtered pixel is generated by locating the input pixels (the to-be-filtered pixel and its reference pixels) and interpolating between the cached values. To enable a larger reference range within the limited LUT storage, we introduce an enhanced indexing mechanism in the filtering process and a clipping/finetuning mechanism in training. The proposed method is implemented in the Versatile Video Coding (VVC) reference software, VTM-11.0. Experimental results show that the proposed method, with three different configurations, achieves on average 0.13% to 0.51% and 0.10% to 0.39% BD-rate reduction under the all-intra (AI) and random-access (RA) configurations, respectively. The proposed method incurs only a 1% to 8% time increase, an additional computation of 0.13 to 0.93 kMAC/pixel, and a storage cost of 164 to 1148 KB for a single model. Our method thus offers a new and more practical approach to neural network-based ILF.
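The cache-then-interpolate pipeline can be illustrated with a toy scalar filter standing in for the trained DNN. Everything here is a simplified assumption for exposition: the 4-bit indexing, the blend filter, and the two-input case (one to-be-filtered pixel plus one reference pixel) are not the paper's actual configuration.

```python
import numpy as np

QBITS = 4                    # index with the 4 most-significant bits
STEP = 1 << (8 - QBITS)      # 16 levels per dimension -> 17 grid points

def toy_filter(center, neighbor):
    """Stand-in for the trained DNN: blend a pixel with its reference.
    (The real method caches an actual network's outputs.)"""
    return 0.75 * center + 0.25 * neighbor

# Build the LUT once by traversing the quantized input grid.
grid = np.arange(0, 257, STEP, dtype=np.float64)   # 0, 16, ..., 256
LUT = toy_filter(grid[:, None], grid[None, :])     # shape (17, 17)

def lut_filter(center, neighbor):
    """Filter one pixel via LUT lookup + bilinear interpolation
    between the cached values at the four surrounding grid points."""
    i, fi = divmod(center, STEP)
    j, fj = divmod(neighbor, STEP)
    wi, wj = fi / STEP, fj / STEP
    return ((1 - wi) * (1 - wj) * LUT[i, j]
            + wi * (1 - wj) * LUT[i + 1, j]
            + (1 - wi) * wj * LUT[i, j + 1]
            + wi * wj * LUT[i + 1, j + 1])
```

Because the toy filter is linear, interpolation is exact here; for a real DNN the LUT output approximates the network between grid points, which is where the enhanced indexing and clipping/finetuning mechanisms matter.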
Urban public safety management relies heavily on video surveillance systems, which provide crucial visual data for resolving a wide range of incidents and controlling unlawful activities. Traditional methods for target detection predominantly employ a two-stage approach, focusing on precision in identifying objects such as pedestrians and vehicles. These objects, typically sparse in large-scale, lower-quality surveillance footage, induce considerable redundant computation during the initial processing stage. This redundancy constrains real-time detection capabilities and escalates processing costs. Furthermore, transmitting raw images and videos laden with superfluous information to centralized back-end systems significantly burdens network communications and fails to capitalize on the computational resources available at diverse surveillance nodes. This study introduces DiffRank, a novel preprocessing method for fixed-angle video imagery in urban surveillance. The method strategically generates candidate regions during preprocessing, thereby reducing redundant object detection and improving the efficiency of the detection algorithm. Drawing on change detection principles, a background feature learning approach based on shallow features has been developed. This approach prioritizes learning the characteristics of fixed-area backgrounds over direct background identification. As a result, alterations in Regions of Interest (ROIs) are efficiently discerned using computationally cheap shallow features, markedly accelerating ROI proposal generation and diminishing the computational demands of subsequent object detection and classification. Comparative analysis on various public and private datasets shows that DiffRank, while maintaining high accuracy, substantially outperforms existing baselines in speed, particularly at larger image sizes (e.g., an improvement exceeding 300% at 1920×1080 resolution). Moreover, the method demonstrates en...
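A minimal version of shallow-feature change detection over a fixed camera might look like the following. The per-block mean-intensity feature, update rate, and threshold are illustrative stand-ins for DiffRank's learned background features, not its actual design.

```python
import numpy as np

class BlockChangeDetector:
    """Shallow-feature change detection for a fixed camera: keep a
    running mean of per-block intensities and flag blocks that deviate,
    yielding candidate ROIs for a downstream object detector."""
    def __init__(self, shape, block=32, alpha=0.05, thresh=15.0):
        self.block = block
        self.alpha = alpha          # background update rate
        self.thresh = thresh        # mean-intensity change threshold
        h, w = shape
        self.grid = (h // block, w // block)
        self.bg = None

    def _block_means(self, gray):
        b = self.block
        gh, gw = self.grid
        return gray[:gh * b, :gw * b].reshape(gh, b, gw, b).mean(axis=(1, 3))

    def propose(self, gray):
        """Return (y, x, h, w) ROI candidates for changed blocks."""
        means = self._block_means(gray.astype(np.float64))
        if self.bg is None:         # first frame seeds the background
            self.bg = means
            return []
        changed = np.abs(means - self.bg) > self.thresh
        # slowly adapt the background model toward the current frame
        self.bg = (1 - self.alpha) * self.bg + self.alpha * means
        b = self.block
        return [(r * b, c * b, b, b) for r, c in zip(*np.nonzero(changed))]
```

The detector's cost is one mean per block per frame, which is why such shallow features scale well to large frame sizes.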
ISBN: (Print) 9798350349405; 9798350349399
Real-time near-infrared (NIR) face alignment holds significant importance across various domains, such as security, healthcare, and augmented reality. However, existing face alignment techniques tailored for visible-light (VIS) imagery suffer a decline in accuracy when applied in NIR settings. This decline stems from the domain discrepancy between the VIS and NIR facial domains and the absence of meticulously annotated NIR facial data. To address this issue, we introduce a system and strategy for gathering paired VIS-NIR facial images and annotating precise landmarks. Our system streamlines dataset preparation by automatically transferring annotations from VIS images to their corresponding NIR counterparts. Following this approach, we constructed a first-of-its-kind dataset comprising high-frame-rate paired VIS-NIR facial images with landmark annotations. Additionally, to enhance the diversity of facial data, we augment our dataset through VIS-NIR image-to-image (img2img) translation using publicly available facial landmark datasets. By retraining face alignment models and evaluating them, we demonstrate a noteworthy improvement in face alignment accuracy under NIR conditions using our dataset. Furthermore, the augmented dataset yields further accuracy gains, particularly notable for different individuals' facial features.
Machine vision enables machines to extract rich information from image or video data and make intelligent decisions. However, existing artificial-synapse hardware systems significantly limit the real-time performance and accuracy of machine vision segmentation in complex environments. To address this, we propose a novel three-terminal adaptive artificial-light-emitting synapse (AALS) capable of dual photoelectric output along with adaptive behavior. The device uses silver nanowires (AgNWs) as polar conductive bridges to reduce reliance on transparent electrodes, while polyvinyl alcohol (PVA) dielectric layers adaptively modulate charge carrier concentrations in the conductive channels. Additionally, we have designed an adaptive parallel neural network (APNN) and applied it to autonomous-driving image processing. This innovation significantly reduces adaptation time and notably enhances mean pixel accuracy (MPA) for semantic segmentation under overexposure and low-light conditions, by 142.2% and 304.4%, respectively. This work therefore introduces new strategies for advanced adaptive vision, with significant potential in intelligent driving and neuromorphic computing.
ISBN: (Print) 9798350303582; 9798350303599
In an increasingly visual world, people with blindness and low vision (pBLV) face substantial challenges in navigating their surroundings and interpreting visual information. From our previous work, VIS4ION is a smart wearable that helps pBLV with their daily challenges. It enables multiple microservices based on artificial intelligence (AI), such as visual scene processing, navigation, and vision-language inference. These microservices require powerful computational resources and, in some cases, stringent inference times, hence the need to offload computation to edge servers. This paper introduces a novel video streaming platform that improves the capabilities of VIS4ION by providing real-time support for the microservices at the network edge. When video is offloaded wirelessly to the edge, the time-varying nature of the wireless network requires adaptation strategies for a seamless video service. We demonstrate the performance of our adaptive real-time video streaming platform through experimentation with an open-source 5G deployment based on OpenAirInterface (OAI). The experiments demonstrate the ability to provide microservices robustly under time-varying network conditions.
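One concrete (and deliberately simplified) example of such an adaptation strategy is for the sender to pick the highest rung of a bitrate ladder that fits the measured throughput. The ladder values and safety margin below are assumptions for illustration, not the platform's actual policy.

```python
def select_bitrate(ladder_kbps, throughput_kbps, safety=0.8):
    """Pick the highest bitrate rung that fits within a safety margin
    of the measured throughput; fall back to the lowest rung when even
    that does not fit (degrade rather than stall)."""
    budget = safety * throughput_kbps
    feasible = [r for r in sorted(ladder_kbps) if r <= budget]
    return feasible[-1] if feasible else min(ladder_kbps)
```

Re-running this selection on every throughput estimate gives the basic adapt-to-the-wireless-channel behavior the platform needs; real systems additionally smooth the throughput estimate to avoid oscillation.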
ISBN: (Print) 9798350344998; 9798350345001
Remote driving aims to improve transport systems by promoting efficiency, sustainability, and accessibility. In the railway sector, remote driving makes it possible to increase flexibility, as the driver no longer has to be in the cab. However, this brings several challenges, as it has to provide at least the same level of safety obtained when the driver is in the cab. To achieve this, wireless networks and video streaming technologies gain importance, as they must provide real-time track visualization and obstacle detection capabilities to the remote driver. Low-latency camera capture, onboard media processing devices, and streaming protocols adapted to wireless links are the necessary enablers to be developed and integrated into the railway infrastructure. This paper compares video streaming protocols such as the Real-Time Streaming Protocol (RTSP) and Web Real-Time Communication (WebRTC), as they are the main low-latency alternatives based on the Real-time Transport Protocol (RTP). As latency is the main performance metric, the paper also provides a solution to calculate the end-to-end video streaming latency analytically. Finally, the paper proposes a rate control algorithm that adapts the video stream to the network capacity. The objective is to keep the latency as low as possible while avoiding visual artifacts. The proposed solutions are tested in different setups and scenarios to prove their effectiveness before the planned field testing.
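An analytical end-to-end latency model of this kind can be sketched as a simple additive pipeline. The stage breakdown and the rate-control bound below are illustrative assumptions, not the authors' exact formulation.

```python
def e2e_latency_ms(frame_bits, capture_ms, encode_ms, decode_ms,
                   render_ms, bandwidth_bps, propagation_ms):
    """End-to-end latency for one frame: fixed processing stages plus
    serialization delay (frame_bits / bandwidth) and propagation."""
    serialization_ms = 1000.0 * frame_bits / bandwidth_bps
    return (capture_ms + encode_ms + serialization_ms
            + propagation_ms + decode_ms + render_ms)

def max_bitrate_for_budget(budget_ms, fixed_ms, propagation_ms,
                           fps, bandwidth_bps):
    """Rate-control bound: the highest encoder bitrate (bps) whose
    per-frame serialization delay still fits the latency budget."""
    slack_s = (budget_ms - fixed_ms - propagation_ms) / 1000.0
    if slack_s <= 0:
        return 0.0
    # bits per frame = bitrate / fps; serialization = bits / bandwidth,
    # so bitrate <= slack * fps * bandwidth
    return slack_s * fps * bandwidth_bps
```

For example, a 1 Mbit frame over a 10 Mbit/s link adds 100 ms of serialization delay on top of the processing and propagation stages, which is exactly the term a rate controller must keep in check.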
Fish are a critical component of marine ecosystems; therefore, the accurate identification and counting of fish are essential for the objective monitoring and assessment of marine biological resources. High-frequency adaptive resolution imaging sonar (ARIS) is widely used for underwater object detection and imaging, and it quickly obtains close-up video of free-swimming fish in high-turbidity water environments. Nonetheless, processing the massive data output of imaging sonars remains a major challenge. Here, the authors developed an automatic image-processing programme that fuses K-nearest neighbour background subtraction with DeepSort target tracking to automatically track and count fish. The automatic programme was evaluated on four test data sets with different target sizes, observation ranges, and sonar deployments. According to the results, the approach successfully counted free-swimming fish targets with an accuracy index of 73% and a completeness index of 70%. Under appropriate conditions, this approach could replace time-consuming semi-automatic approaches and improve the efficiency of imaging sonar data processing, while providing technical support for future real-time data processing.
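The first stage of the pipeline, per-pixel KNN background subtraction, can be sketched as below. This is a NumPy reimplementation of the general idea for exposition; all parameter values here are assumptions, and a production pipeline would use an off-the-shelf implementation.

```python
import numpy as np

class KNNBackgroundSubtractor:
    """Per-pixel KNN background model: a pixel is foreground when
    fewer than k of its stored background samples lie within `radius`
    of its current intensity."""
    def __init__(self, n_samples=10, k=2, radius=20.0,
                 update_prob=0.1, seed=0):
        self.n_samples, self.k, self.radius = n_samples, k, radius
        self.update_prob = update_prob
        self.samples = None
        self.rng = np.random.default_rng(seed)

    def apply(self, gray):
        """Return a boolean foreground mask for one grayscale frame."""
        gray = gray.astype(np.float64)
        if self.samples is None:    # seed the model from the first frame
            self.samples = np.repeat(gray[None], self.n_samples, axis=0)
            return np.zeros(gray.shape, dtype=bool)
        close = np.abs(self.samples - gray) < self.radius  # (N, H, W)
        fg = close.sum(axis=0) < self.k
        # stochastically refresh one stored sample at background pixels
        refresh = (~fg) & (self.rng.random(gray.shape) < self.update_prob)
        idx = self.rng.integers(0, self.n_samples)
        self.samples[idx][refresh] = gray[refresh]
        return fg
```

The resulting foreground blobs are what a tracker such as DeepSort associates across frames so that each fish is counted once.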
In recent years, the scarcity of effective communication systems has been an essential issue for disabled people [physically disabled, with locomotor disability, or with amyotrophic lateral sclerosis (ALS)] who cannot speak, walk, or move their hands. Such disabled people depend on others for survival, so they need assistive technology to live independently. This research paper aims to develop an efficient real-time eye-gaze communication system for disabled persons using a low-cost webcam. The proposed work develops a Video-Oculography (VOG) based system that operates under natural head movements, using a 5-point user-specific calibration (algorithmic calibration) approach for eye tracking and cursor movement. During calibration, several parameters are calculated and then used to control the computer with the eyes. Additionally, we designed a graphical user interface (GUI) to examine the performance and fulfill the basic daily needs of disabled individuals. The proposed method enables disabled persons to operate a computer by moving and blinking their eyes, much like a typical computer user. The overall cost of the developed system is low (under $50, varying with the camera used) compared to various existing systems. The proposed system was tested with disabled and non-disabled individuals and achieved an average blinking accuracy of 97.66%. The designed system attained an average typing speed of 15 and 20 characters per minute for disabled and non-disabled participants, respectively. On average, the system achieved a visual angle accuracy of 2.2 degrees for disabled participants and 0.8 degrees for non-disabled participants. The experimental outcomes demonstrate that the developed system is robust and accurate.
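A few-point calibration of this kind can be illustrated as fitting an affine pupil-to-screen map by least squares. This is a common formulation assumed for illustration; the paper's exact calibration parameters may differ.

```python
import numpy as np

def fit_gaze_mapping(pupil_xy, screen_xy):
    """Fit an affine map screen = [px, py, 1] @ A from calibration
    pairs; 5 points over-determine the 6 affine parameters, so the
    least-squares fit also averages out measurement noise."""
    P = np.hstack([pupil_xy, np.ones((len(pupil_xy), 1))])  # (N, 3)
    A, *_ = np.linalg.lstsq(P, screen_xy, rcond=None)       # (3, 2)
    return A

def gaze_to_cursor(A, pupil_xy):
    """Map one pupil position to a screen cursor position."""
    p = np.array([pupil_xy[0], pupil_xy[1], 1.0])
    return p @ A
```

After calibration, each tracked pupil position is pushed through `gaze_to_cursor` to move the cursor, and blinks are detected separately to trigger clicks.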
The appearance of the log cross-section provides important information when assessing the quality of a log; properties to consider include pith location and the density of annual rings. This makes tasks like pith location estimation and annual ring detection of great interest. However, creating labeled training data for these tasks can be time-consuming and subject to misjudgments. For this reason, we aim to create generated training data with controlled pith location and number of annual rings. We propose a two-step generator based on generative adversarial networks in which we can completely avoid manual labeling, not only when generating training data but also during training of the generator itself. This opens up the possibility of training the generator on other types of log end data without the need to manually label new training data. The same method is used to create two generated training datasets: one of entire log ends and one of patches of log ends. To evaluate how the generated data compares to real data, we train two deep learning models to perform pith location estimation and ring counting, respectively. The models are trained separately on real and generated data and evaluated on real data only. The results show that the performance of both pith location estimation and ring counting can be improved by replacing real training data with larger sets of generated training data.