ISBN (digital): 9781665496209
ISBN (print): 9781665496209
Video analytics systems designed for computer vision tasks use deep learning models that rely on high-quality input data to maximize performance. However, in a real-world system, these inputs are often compressed using video codecs such as HEVC. Video compression degrades the quality of the inputs, thereby degrading the performance of these models. Region-of-interest (ROI) coding enables bits to be allocated to improve performance; however, the method for selecting regions should be computationally simple, since it must run during or before the video is compressed and transmitted for further processing. In this paper, we propose a task-aware quad-tree (TA-QT) partitioning and quantization method to achieve ROI coding for HEVC and other video coding standards. TA-QT uses a lightweight edge-based model to guide task-aware video encoding, improving end-stage video analytics (ESVA) performance while reducing both bit-rate and encoding time. We demonstrate the effectiveness of our approach in terms of (a) the performance of the ESVA on compressed inputs, (b) transmission bit-rates, and (c) encoding time.
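The block-level idea behind task-aware quad-tree partitioning can be sketched as follows. The block sizes, QP values, and purity-based split rule here are illustrative assumptions, not the paper's exact design: blocks overlapping the task-relevance (ROI) mask are split further and given a lower QP, while background blocks stay coarse with a higher QP.

```python
# Sketch of task-aware quad-tree partitioning for one CTU (illustrative only).
# roi_mask is a 2-D list of 0/1 task-relevance values; returns leaf blocks
# as (x, y, size, qp) tuples.

def taqt_partition(roi_mask, x, y, size, min_size=16, qp_roi=22, qp_bg=37):
    """Recursively split mixed ROI/background blocks; assign per-leaf QPs."""
    block = [row[x:x + size] for row in roi_mask[y:y + size]]
    covered = sum(map(sum, block))
    if covered == 0:                       # pure background: coarse block, high QP
        return [(x, y, size, qp_bg)]
    if covered == size * size or size == min_size:
        return [(x, y, size, qp_roi)]      # pure ROI (or smallest block): low QP
    half = size // 2                       # mixed block: recurse into quadrants
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += taqt_partition(roi_mask, x + dx, y + dy, half,
                                     min_size, qp_roi, qp_bg)
    return leaves


# Example: a 64x64 CTU whose top-left 16x16 region is task-relevant.
mask = [[1 if xx < 16 and yy < 16 else 0 for xx in range(64)] for yy in range(64)]
leaves = taqt_partition(mask, 0, 0, 64)
```

The single ROI corner forces two levels of splitting, so the CTU ends up as one low-QP 16x16 leaf plus six high-QP background leaves.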
ISBN (print): 9798350359329; 9798350359312
The rapid increase in camera installations on offshore drilling platforms has intensified the challenge of high-concurrency video data processing. Traditional single-cloud-server video analysis is becoming inadequate, leading to heightened processing latency and bandwidth overuse. In response, we propose an edge-cloud collaborative video object detection architecture based on a GRU-Enhanced Double Deep Q-Network (GE-DDQN). Our architecture utilizes the YOLOv8 algorithm for object detection and incorporates a GE-DDQN model for efficient task offloading between edge and cloud computing. A token bucket mechanism is employed to regulate data offloading rates from edge devices, optimizing collaborative efficiency for high-performance detection. Comparative experiments on real offshore drilling platform video data underscore the superiority of our method in managing dynamic and complex video streams. The architecture demonstrates remarkable video analysis performance in this domain, achieving a precision of 90.2% and a processing speed of 25.45 FPS, marking a significant advancement in edge-cloud video analytics.
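The token-bucket regulation of offloading rates can be sketched as follows; the refill rate, bucket capacity, and one-token-per-frame cost are illustrative assumptions, not parameters from the paper.

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter for edge-to-cloud frame offloading
    (illustrative sketch; rate/capacity values are assumptions)."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; otherwise reject the frame."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

An edge device would call `allow()` per frame before offloading; rejected frames are processed locally, which is how the bucket keeps the uplink within its budget while still permitting short bursts.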
The COVID pandemic has led to the wide adoption of online video calls in recent years. However, the increasing reliance on video calls provides opportunities for new impersonation attacks by fraudsters using the advan...
ISBN (print): 9798400703300
Silent speech is unaffected by ambient noise, increases accessibility, and enhances privacy and security. Yet current silent speech recognizers operate in a phrase-in/phrase-out manner, and are thus slow, error prone, and impractical for mobile devices. We present MELDER, a Mobile Lip Reader that operates in real time by splitting the input video into smaller temporal segments and processing them individually. An experiment revealed that this substantially improves computation time, making it suitable for mobile devices. We further optimize the model for everyday use by exploiting knowledge from a high-resource vocabulary using a transfer learning model. We then compare MELDER in both stationary and mobile settings with two state-of-the-art silent speech recognizers, where MELDER demonstrates superior overall performance. Finally, we compare two visual feedback methods of MELDER with the visual feedback method of Google Assistant. The outcomes shed light on how these proposed feedback methods influence users' perceptions of the model's performance.
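The core segmentation step can be sketched as follows; the segment length and stride are assumed values, not MELDER's actual parameters. Overlapping windows let each segment be decoded as soon as it is complete instead of waiting for the whole phrase.

```python
# Sketch of splitting a frame stream into overlapping temporal segments
# (seg_len and stride are illustrative assumptions).

def temporal_segments(frames, seg_len=20, stride=10):
    """Return overlapping segments of `seg_len` frames taken every `stride` frames."""
    segments = []
    for start in range(0, max(len(frames) - seg_len + 1, 1), stride):
        segments.append(frames[start:start + seg_len])
    return segments


# Example: a 50-frame clip yields four 20-frame segments starting at 0, 10, 20, 30.
segs = temporal_segments(list(range(50)))
```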
Deep learning-based action classification technology has been applied to various fields, such as social safety and medical services. Classifying an action on a practical level requires tracking multiple human bodies in an image in real time and simultaneously classifying their actions. There are various related studies on the real-time classification of actions in an image. However, existing deep learning-based action classification models have prolonged response speeds, so there is a limit to real-time analysis. In addition, they have low accuracy for the action of each object if multiple objects appear in the image. Moreover, they need to be improved since they incur a memory overhead in processing image frames. Deep learning-based action classification using one-shot object detection is proposed to overcome the limitations of multiframe-based analysis methods. The proposed method uses a one-shot object detection model and a multi-object tracking algorithm to detect and track multiple objects in the image. Then, a deep learning-based pattern classification model is used to classify the body action of each object in the image by reducing the data for each object to an action vector. Compared to the existing studies, the constructed model shows a higher accuracy of 74.95%, and in terms of speed, it offers better performance than current studies at 0.234 s per frame. The proposed model makes it possible to classify some actions only through action-vector learning, without additional image learning, because of the vector-learning feature of the posterior neural network. Therefore, it is expected to contribute significantly to commercializing realistic streaming-data analysis technologies, such as CCTV.
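The "reduce each object to an action vector" step can be sketched as below; the displacement-based feature and nearest-centroid classifier are illustrative stand-ins, not the paper's actual pattern classification model.

```python
# Sketch: compress a tracked object's per-frame keypoint values into a
# fixed-length action vector, then classify by nearest centroid
# (feature and classifier are illustrative assumptions).

def action_vector(keypoint_frames):
    """Mean frame-to-frame absolute displacement, one value per keypoint dimension."""
    n = len(keypoint_frames) - 1
    dims = len(keypoint_frames[0])
    return [sum(abs(keypoint_frames[t + 1][k] - keypoint_frames[t][k])
                for t in range(n)) / n
            for k in range(dims)]

def classify(vec, centroids):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda lbl: sum((a - b) ** 2 for a, b in zip(vec, centroids[lbl])))


# Example: a keypoint moving steadily along dimension 0 maps to the 'walk' centroid.
vec = action_vector([[0, 0], [1, 0], [2, 0]])
label = classify(vec, {'walk': [1.0, 0.0], 'stand': [0.0, 0.0]})
```

The point of the vector representation is that the classifier operates on a few numbers per object instead of raw frames, which is what removes the per-frame image-model cost.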
ISBN (print): 9798350394283; 9798350394276
This paper presents a real-time semantic segmentation framework for camera-based environment perception of objects and infrastructure elements in autonomous scale cars. It is specifically targeted towards student competitions such as the Carolo Cup or the Bosch Future Mobility Challenge. To reduce pixel-wise manual annotation effort, our framework involves a mixture of both synthetic and real image data, carefully tuned towards the unique requirements of the given scenario. Real images are acquired from a 1:10-scale vehicle equipped with a single monocular camera and are manually annotated. Synthetic image data with automatic pixel-wise annotation is obtained via a custom Unity-based simulation pipeline. We evaluate various mixed real-synthetic data strategies to train different state-of-the-art deep neural networks, with a focus on both segmentation performance and real-time capability, using an NVIDIA Jetson AGX Xavier platform as the in-vehicle test bed. Our experimental results show a significant improvement in semantic segmentation performance of the mixed real-synthetic data approach at real-time speeds of approximately 60 FPS on the target platform.
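One common way to realize such a mixed real-synthetic strategy is fixed-ratio batch sampling, sketched below; the batch size and real-data fraction are illustrative assumptions, not values reported by the paper.

```python
import random

# Sketch: build each training batch from a fixed fraction of manually
# annotated real samples plus auto-annotated synthetic samples
# (batch_size and real_fraction are illustrative assumptions).

def mixed_batch(real_set, synth_set, batch_size=8, real_fraction=0.25, rng=random):
    """Draw a shuffled batch with `real_fraction` of samples from the real set."""
    n_real = round(batch_size * real_fraction)
    batch = rng.sample(real_set, n_real) + rng.sample(synth_set, batch_size - n_real)
    rng.shuffle(batch)
    return batch


# Example: 10 real and 100 synthetic samples; every batch of 8 contains 2 real ones.
real = [('real', i) for i in range(10)]
synth = [('synth', i) for i in range(100)]
batch = mixed_batch(real, synth)
```

Keeping the real fraction fixed per batch, rather than sampling from the pooled set, prevents the much larger synthetic set from crowding out the scarce real annotations.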
The high level of compression achieved by high efficiency video coding (HEVC) helps reduce network traffic loads and mitigate data rate requirements. However, HEVC is vulnerable to error-prone channels, where transmission errors can result in severe degradation of video quality. In this paper, a saliency-aware encoding scheme is proposed to improve the error robustness of HEVC streaming by reducing temporal error propagation in case of packet loss. The proposed scheme first introduces a saliency detection model in the compressed domain, based on two HEVC features derived from the depth splitting of the coding unit and the residual. Incorporating the saliency map, an improved reference frame selection strategy is then introduced to reduce the inter-prediction mismatch that occurs at the decoder after packet loss. Specifically, the reference frames are dynamically selected based on saliency-weighted Lagrangian optimisation, which not only reduces the number of prediction units (PUs) that depend on a single reference in saliency regions but also chooses the optimal coding mode for non-saliency regions. Finally, the most salient PUs are required to select a reference block in which most of the pixels are coded in intra mode, providing a more robust reference for saliency regions. Simulation results show that the proposed reference picture selection scheme outperforms other reference methods, with higher error robustness and a smaller loss in coding efficiency. Compared to the HEVC reference software, the proposed scheme improves the quality of recovered video after packet loss, achieving average PSNR gains of up to 1.92 dB.
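Saliency-weighted Lagrangian mode selection can be sketched as below, following the usual rate-distortion formulation J = D + λ·R; the specific weighting function (shrinking λ as saliency grows) is an assumption for illustration, not the paper's exact formula.

```python
# Sketch of saliency-weighted Lagrangian mode decision (weighting is an
# illustrative assumption). Candidates are (mode_name, distortion, rate).

def best_mode(candidates, saliency, lam):
    """Pick the candidate minimising J = D + (lambda / (1 + saliency)) * R.

    High saliency shrinks the effective lambda, so distortion dominates and
    salient regions get more robust (typically higher-rate, e.g. intra) modes.
    """
    eff_lam = lam / (1.0 + saliency)
    return min(candidates, key=lambda m: m[1] + eff_lam * m[2])


# Example: intra is costly in rate but low in distortion; it wins only
# where saliency pushes the effective lambda down.
modes = [('intra', 2.0, 10.0), ('inter', 4.0, 2.0)]
```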
Existing spectral imaging technologies based on compressed coding require tens of minutes or even hours to obtain high-quality spectral data, which rules out real dynamic scenarios and confines such methods to theoretical discussion. Therefore, we propose a non-iterative algorithm model based on an image reflection intensity-estimation aid (IRI-EA). The algorithm exploits the approximately proportional relationship between the reflection intensity of the RGB image and the corresponding spectral image, and reconstructs high-quality spectral data within about 20 s. By solving the difference map of the corresponding spectral scene, combining it with the spectral data of the IRI method, and introducing the total guidance (TG) filter, the reconstruction error can be significantly reduced and the spectral reconstruction quality improved. Numerous experimental results indicate the advantages of this method in reconstruction quality and efficiency over other advanced methods. Specifically, compared with existing advanced methods, the average efficiency of our method improves by at least 85%. Our reconstruction model opens up the possibility of processing real-time video and accelerating other methods. (c) 2024 Optica Publishing Group. All rights, including for text and data mining (TDM), Artificial Intelligence (AI) training, and similar technologies, are reserved.
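The proportionality assumption at the heart of the IRI idea can be illustrated in miniature; this toy function is an assumption-laden sketch, not the authors' pipeline, and omits the difference map and TG-filter refinement entirely.

```python
# Toy sketch of reflection-intensity-based spectrum estimation: if a pixel's
# spectrum is approximately proportional to its RGB reflection intensity,
# an unknown pixel's spectrum can be estimated by scaling a known reference
# spectrum by the intensity ratio (illustrative assumption only).

def estimate_spectrum(ref_spectrum, ref_intensity, pixel_intensity):
    """Scale the reference spectrum by the pixel/reference intensity ratio."""
    scale = pixel_intensity / ref_intensity
    return [band * scale for band in ref_spectrum]


# Example: a pixel at half the reference intensity gets half the spectrum.
est = estimate_spectrum([2.0, 4.0, 6.0], ref_intensity=1.0, pixel_intensity=0.5)
```

Because this estimate is a direct scaling rather than an iterative inversion, it is cheap; the paper's difference-map and TG-filter steps then correct the residual error.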
Nowadays, China’s digital media technology is relatively lagging behind, and its application in the field of teaching is the goal pursued by many scholars. A real-time teacher-student interaction environment has been...
ISBN (print): 9781728198354
Most current video codecs support only translational motion models. However, real motion is often complex and cannot be precisely estimated using translational models alone. To handle complex motions such as panning, zooming, scaling, shearing, and rotation, the AOMedia AV1 encoder provides two tools, Global and Local Warped Motion Compensation (LWMC). This paper presents two dedicated hardware designs for the AV1 LWMC interpolation filters. The presented hardware can process up to UHD 8K video at 60 fps. The architecture was synthesized for 40 nm TSMC standard cells, requiring 454.37K gates with a power dissipation of 189.35 mW. To the best of the authors' knowledge, this is the first work in the literature targeting a dedicated hardware design for the AV1 LWMC tool.
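The affine model behind warped motion compensation can be sketched as follows; the parameter names are illustrative, and the bilinear tap shown here is a simplification of the longer subpel interpolation filters that the hardware in the paper actually implements.

```python
# Sketch of affine warped motion compensation (illustrative parameter names):
# each output sample is fetched from a possibly fractional reference position
# given by an affine model, covering rotation/zoom/shear that a single
# translational motion vector cannot express.

def warp_position(x, y, a, b, c, d, tx, ty):
    """Map output sample (x, y) to a reference position via an affine model."""
    return (a * x + b * y + tx, c * x + d * y + ty)

def bilinear_sample(ref, fx, fy):
    """Bilinear interpolation at fractional (fx, fy); real AV1 encoders use
    longer subpel filter taps here instead."""
    x0, y0 = int(fx), int(fy)
    dx, dy = fx - x0, fy - y0
    p00, p01 = ref[y0][x0], ref[y0][x0 + 1]
    p10, p11 = ref[y0 + 1][x0], ref[y0 + 1][x0 + 1]
    return (p00 * (1 - dx) + p01 * dx) * (1 - dy) + \
           (p10 * (1 - dx) + p11 * dx) * dy


# Example: the identity warp (a=d=1, b=c=tx=ty=0) leaves positions unchanged;
# a half-pel position blends the four neighbouring reference samples.
ref = [[0.0, 1.0], [2.0, 3.0]]
pos = warp_position(3, 2, a=1, b=0, c=0, d=1, tx=0, ty=0)
val = bilinear_sample(ref, 0.5, 0.5)
```

It is exactly this per-sample fractional fetch that makes the interpolation filters the natural target for a dedicated hardware datapath.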