ISBN (digital): 9789464593617
ISBN (print): 9798331519773
The huge computation burden of state-of-the-art video coding technologies can be mitigated with Region-of-Interest (ROI) techniques that limit the highest coding effort to salient regions. However, the complexity overhead of saliency detection can easily cancel out the speed gain of ROI coding. This work introduces a lightweight ROI tracking technique that can be used in place of compute-intensive ROI detection to guide a video encoder in inter coding. Low computational overhead is achieved by feeding motion vectors (MVs) of a video encoder back to our neural network that is trained for accurate estimation of ROI movement and size changes. The network training is carried out with our new dataset that is also released in this work to foster the development of head tracking techniques in applications like video conferencing. Our experimental results demonstrate substantial speedups with minimal accuracy tradeoffs over traditional salient object detection (SOD) methods. In scenarios where a single ROI is tracked with a 64-frame detection interval, our solution obtains up to 50-fold speedup with an accuracy of 87% and an average ROI center error of 16 pixels. These results confirm that our ROI tracking approach is a promising technique for low-cost and low-power streaming media applications.
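The abstract does not disclose the network internals, but the core idea of reusing encoder motion vectors to propagate an ROI between detection intervals can be illustrated with a minimal, non-learned sketch (the function name and the median-vector heuristic below are assumptions of this sketch, standing in for the paper's trained estimator):

```python
import numpy as np

def update_roi(roi, mv_field):
    """Shift an ROI box by the median motion vector inside it.

    roi: (x, y, w, h) in pixels; mv_field: H x W x 2 array of per-pixel
    motion vectors (dx, dy), e.g. encoder MVs upsampled to pixel grid.
    The median makes the estimate robust to outlier vectors.
    """
    x, y, w, h = roi
    region = mv_field[y:y + h, x:x + w]       # MVs under the current ROI
    dx = float(np.median(region[..., 0]))     # robust translation estimate
    dy = float(np.median(region[..., 1]))
    return (int(round(x + dx)), int(round(y + dy)), w, h)
```

A learned estimator, as in the paper, would additionally predict size changes rather than only translation.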
With the advances of embedded GPU programming models like GLES and OpenCL, mobile processors have gained more parallel computing capability, which enables real-time image processing on portable devices. GLES i...
There is a strong need for non-reference video quality metrics for user-generated video content to prevent loss of video quality caused by distortion during recording, compression, and signal transmission. Here we contribute to advancing the issue of streaming quality by creating a large-scale dataset with video compression and transmission artefacts. Our final dataset consists of 4.1 million video quality perceptual thresholds by users. We also created the first non-reference video quality metric that includes the psychophysical features of the user's video experience, which provides stability in predicting the user's subjective rating of a video. Our experimental results show that the proposed video quality metric achieves the most stable performance on three independent video datasets. We believe our study will expand further research into deep learning-based video quality metric modelling.
ISBN (digital): 9798350368949
ISBN (print): 9798350368956
In the rapidly evolving landscape of digital transactions, the efficiency and accuracy of billing systems are paramount. The checkout process, in both ordinary retail shops and online shops, demands speed and accuracy. AutoBill is a new AI checkout system that transforms the current retail checkout method through machine learning, augmented by image recognition and real-time data storage. It is built on the Raspberry Pi platform, using a TensorFlow object detection model created with Google's Teachable Machine, and integrates MongoDB for dynamic data handling. AutoBill removes the human factor from billing, providing a seamless and efficient solution that reduces manual intervention and enhances overall transaction accuracy. The paper details the system design, implementation, and performance evaluation, which show promise for revolutionizing contactless shopping experiences.
ISBN (digital): 9798350386974
ISBN (print): 9798350386981
In order to achieve unattended tape storage management, this article designs a tape barcode recognition and positioning technology based on video and images. The algorithm uses the YOLOv5s network model to quickly recognize the tape barcodes in the image, as well as the QR codes used to record actual positions on the plane, and then uses the ZBar toolkit to decode them. Finally, the QR codes serve as geometric correction control points for the image, and the actual plane position of each tape is calculated from the pixel position of its barcode. In tests on the dataset, the recognition time for each tape is 0.007 s, and the detection rate (the proportion of tapes that can be found and correctly recognized) is 97.13%, achieving the goal of efficient tape inventory. Moreover, when the camera position remains unchanged, the positioning accuracy of the tapes reaches 99.99% after geometric correction of the image. The experimental results show that the proposed approach achieves its goals well.
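The geometric-correction step can be sketched as fitting a pixel-to-plane transform from the QR-code control points and then mapping each barcode's pixel center through it. Assuming, as a simplification, an affine model (the paper does not specify the correction model, and the function names here are illustrative):

```python
import numpy as np

def fit_affine(px, plane):
    """Least-squares affine map from pixel to plane coordinates.

    px, plane: (N, 2) arrays of matched points, N >= 3. The QR codes,
    whose true plane positions are known, serve as control points.
    """
    A = np.hstack([px, np.ones((len(px), 1))])   # rows: [x, y, 1]
    M, *_ = np.linalg.lstsq(A, plane, rcond=None)
    return M                                      # 3x2 coefficient matrix

def to_plane(M, pt):
    """Map one barcode pixel position to its plane position."""
    return np.array([pt[0], pt[1], 1.0]) @ M
```

A projective (homography) model would handle camera tilt more faithfully; the affine version keeps the sketch to a single least-squares solve.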
ISBN (digital): 9798350349399
ISBN (print): 9798350349405
The 1-ms visual feedback system is critical for seamless actuation in robotics, as any delay affects its performance in handling dynamic situations. Specular reflections cause problems in many visual technologies, making specular detection crucial in 1-ms visual feedback systems. However, existing real-time methods, which target Neumann architecture, fail to achieve the 1-ms delay due to spatial memory paths resulting from extensive frame-based processing. This research aims to develop a 1-ms specular detection system from both algorithm and architecture perspectives, proposing 1) a temporal clustering and temporal reference based specular detection method, which leverages temporal domain information to address the requirements of frame-based processing; and 2) a global-local integrated specular detection architecture, which enables the coexistence of local and global processing within a 1-ms stream-based architecture. The proposed methods are implemented on FPGA. The evaluation shows that the proposed system supports sensing and processing a 1000-fps sequence with a delay of 0.941 ms/frame.
Exposure errors in images, including both underexposure and overexposure, significantly diminish images' contrast and visual appeal. Existing deep learning-based exposure correction methods either require large networks or long inference times and are thus not applicable to embedded devices and real-time applications. To address these issues, a lightweight network is proposed in this paper to correct exposure errors with limited memory occupation and few inference steps. It adopts the Laplacian pyramid to incrementally recover the color and details of the image through a layer-by-layer procedure. A structural re-parameterization module is designed both to reduce model size for inference speed-up and to improve performance with a multi-branch learning structure. Extensive experiments demonstrate that our method achieves a better performance-efficiency trade-off than other exposure correction methods.
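The layer-by-layer recovery described above rests on the Laplacian pyramid being a lossless decomposition. A bare-bones sketch (with a crude 2x decimation standing in for the usual Gaussian blur, and helper names of this sketch's own invention) shows the build/reconstruct round trip:

```python
import numpy as np

def down(img):
    """2x decimation (crude stand-in for Gaussian blur + subsample)."""
    return img[::2, ::2]

def up(img):
    """Nearest-neighbour 2x upsample."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def build_pyramid(img, levels):
    """Laplacian pyramid: each level stores the detail lost by downsampling."""
    laps, cur = [], img
    for _ in range(levels):
        small = down(cur)
        laps.append(cur - up(small))   # high-frequency residual
        cur = small
    laps.append(cur)                   # coarsest low-frequency band
    return laps

def reconstruct(laps):
    """Invert the decomposition: add detail back layer by layer."""
    cur = laps[-1]
    for lap in reversed(laps[:-1]):
        cur = up(cur) + lap
    return cur
```

A correction network in this style predicts an adjustment per pyramid level, letting small sub-networks handle color at the coarse level and detail at the fine levels.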
Real-time eyeblink detection in the wild can widely serve fatigue detection, face anti-spoofing, emotion analysis, etc. Existing research efforts generally focus on single-person cases in trimmed videos. However, the multi-person scenario within untrimmed videos is also important for practical applications, yet has not received much attention. To address this, we shed light on this research field for the first time with essential contributions on dataset, theory, and practice. In particular, a large-scale dataset termed MPEblink, involving 686 untrimmed videos with 8748 eyeblink events, is proposed under multi-person conditions. The samples are captured from unconstrained films to reveal "in the wild" characteristics. Meanwhile, a real-time multi-person eyeblink detection method is also proposed. Different from existing counterparts, our proposition runs in a one-stage spatio-temporal way with end-to-end learning capacity. Specifically, it simultaneously addresses the sub-tasks of face detection, face tracking, and human instance-level eyeblink detection. This paradigm holds two main advantages: (1) eyeblink features can be facilitated via the face's global context (e.g., head pose and illumination condition) with joint optimization and interaction, and (2) addressing these sub-tasks in parallel rather than sequentially saves time remarkably, meeting the real-time running requirement. Experiments on MPEblink verify the essential challenges of real-time multi-person eyeblink detection in the wild for untrimmed video. Our method also outperforms existing approaches by large margins at a high inference speed.
Moving targets always defocus and shift outside the scene in video synthetic aperture radar (video SAR) image sequences. However, the shadows of moving targets are immune to these issues and can reveal the true positions of the moving targets. As such, by tracking the shadows of moving targets in the video SAR image sequence, it becomes feasible to keep track of these targets. Nevertheless, due to the small pixel size and time-varying characteristics of the target shadow, current prevailing tracking methods often prove insufficient for direct tracking of the shadow. In this letter, a shadow-assisted tracking method for moving targets based on a multilevel discriminant correlation filters network (MDCFnet) is proposed. First, we design a reverse feature pyramid network (RFPN) that integrates multiple high-level features into low-level features to obtain features with higher distinguishability and resolution, thereby enhancing the final tracking accuracy and precision. Furthermore, we devise multilevel discriminant correlation filters (MDCFs) to perform filtering-based tracking over multiple feature maps. Results on real datasets demonstrate that the proposed method outperforms other state-of-the-art methods.
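The abstract leaves the MDCFs unspecified; the underlying discriminant correlation filter idea, in its simplest single-feature closed form (a MOSSE-style filter, used here only to illustrate what one level of such a filter stack computes, not the paper's exact formulation), can be sketched as:

```python
import numpy as np

def train_dcf(x, g, lam=1e-4):
    """Closed-form correlation filter H = G * conj(X) / (|X|^2 + lam).

    x: feature patch (e.g., a shadow template), g: desired response with
    a peak at the target location, lam: regularizer for stability.
    """
    X, G = np.fft.fft2(x), np.fft.fft2(g)
    return (G * np.conj(X)) / (X * np.conj(X) + lam)

def respond(H, z):
    """Filter a search patch; the response peak gives the target shift."""
    return np.real(np.fft.ifft2(H * np.fft.fft2(z)))
```

A multilevel variant, as in the MDCFs, trains one such filter per feature map and fuses the response maps before locating the peak.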
An effective tool for violence detection is in high demand to address the rising crime rate in today's era. Artificial Intelligence can play a significant role in violence detection and monitoring to tackle vari...